Image to Prompt Skill
Overview
Reverse-engineers uploaded images into two artifacts:
- Prompt Template: A reusable structured prompt with [PLACEHOLDER] variables for generating similar images with different subjects.
- Prompt Case: The concrete prompt with placeholders filled using actual content observed in the input image.
This skill is the reverse counterpart of image-creation-prompt-skill:
- image-creation-prompt-skill = forward: text description -> structured prompt
- image-to-prompt-skill = reverse: image -> structured prompt + reusable template
Trigger Conditions
Activate when:
- User uploads an image (base64, file path, or URL) and asks for prompt generation
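- Explicit requests: "根据这张图生成 prompt" (generate a prompt from this image), "reverse engineer this image", "分析这张图片的风格" (analyze the style of this image), "拆解这张图的结构" (break down the structure of this image)
- Keywords: "image to prompt", "图片转 prompt" (image to prompt), "prompt 模板" (prompt template), "逆向图片" (reverse-engineer an image), "根据图片生成 prompt" (generate a prompt from an image)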
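The conditions above could be checked with a simple phrase match. This is only a sketch: the function name `should_activate`, the `has_image` flag, and the choice to require an attached image for every trigger are assumptions for illustration, not part of the skill definition.

```python
# Trigger phrases copied from the list above; matching is a plain substring test.
TRIGGER_PHRASES = (
    "image to prompt",
    "reverse engineer this image",
    "图片转 prompt",
    "prompt 模板",
    "逆向图片",
    "根据图片生成 prompt",
    "根据这张图生成 prompt",
    "分析这张图片的风格",
    "拆解这张图的结构",
)


def should_activate(message: str, has_image: bool) -> bool:
    """Return True when an image is attached and the text contains a trigger phrase.

    Assumption: an image must be present for the skill to activate at all.
    """
    if not has_image:
        return False
    text = message.casefold()
    return any(phrase.casefold() in text for phrase in TRIGGER_PHRASES)


print(should_activate("Please reverse engineer this image", has_image=True))
```

A real host would likely rely on the model's own intent detection rather than literal matching; the sketch only makes the listed phrases concrete.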
10-Step Visual Analysis Framework
Before generating output, systematically analyze the image across these 10 dimensions:
| Step | Dimension | What to Extract |
| ---- | ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| 1 | Overall Style | Art movement, visual genre, aesthetic label (e.g., "graffiti collage poster", "minimalist flat illustration") |
| 2 | Color Scheme | Primary/secondary/accent colors, contrast level, saturation, palette type (monochrome, complementary, analogous, triadic) |
| 3 | Main Subject | Central figure/object: identity, pose, expression, clothing, accessories, physical traits, proportions relative to frame |
| 4 | Background Design | Environment, depth of field, background elements, layering, spatial relationship to subject |
| 5 | Text & Typography | All visible text content, font styles, sizes, orientations, languages, placement strategy, hierarchy |
| 6 | Composition & Perspective | Framing, rule of thirds, symmetry/asymmetry, diagonal lines, camera angle, focal point, visual hierarchy |
| 7 | Material & Texture | Surface qualities: paper, metal, fabric, digital smoothness, brush strokes, torn edges, grain, gloss/matte |
| 8 | Lighting & Mood | Light direction, intensity, shadows, highlights, emotional tone, atmosphere, time of day feel |
| 9 | Decorative & Auxiliary Elements | Icons, borders, geometric shapes, patterns, filters, overlays, vignettes, watermarks, corner decorations |
| 10 | Quality & Technical Parameters | Estimated aspect ratio, resolution cues, suspected AI model parameters (e.g., --ar 9:16 --v 5 --style raw) |
Output Format
Produce two code blocks in a single response.
Block 1: Prompt Template
=== Overall Style ===
<[STYLE_NAME] art style — style description with [ART_STYLE], [ERA], [GENRE] placeholders>
=== Color Scheme ===
<Color palette description with [PRIMARY_COLOR], [SECONDARY_COLOR], [ACCENT_COLOR] placeholders>
=== Main Subject ===
<Subject template with [SUBJECT_NAME], [AGE], [GENDER], [POSE], [EXPRESSION], [CLOTHING], [ACCESSORIES] placeholders>
=== Background Design ===
<Background template with [SETTING], [DEPTH_OF_FIELD], [BG_ELEMENTS] placeholders>
=== Text & Typography ===
<Text template with [HEADLINE], [BODY_TEXT], [FONT_STYLE], [TEXT_PLACEMENT] placeholders>
=== Composition & Perspective ===
<Composition template with [FRAMING], [PERSPECTIVE], [FOCAL_POINT], [VISUAL_WEIGHT] placeholders>
=== Material & Texture ===
<Material template with [SURFACE_TYPE], [TEXTURE_DETAIL], [FINISH] placeholders>
=== Lighting & Mood ===
<Lighting template with [LIGHT_DIRECTION], [LIGHT_QUALITY], [MOOD], [ATMOSPHERE] placeholders>
=== Decorative & Auxiliary Elements ===
<Decorative template with [DECOR_TYPE], [PATTERN], [OVERLAY] placeholders>
=== Quality & Technical Parameters ===
<Quality tags with [ASPECT_RATIO], [MODEL_VERSION], [STYLE_PARAM], [QUALITY_TAGS] placeholders>
Block 2: Prompt Case
Same 10 sections, but all placeholders are replaced with actual values observed in the input image. The case should read as a single coherent prompt that could be pasted directly into an image generator.
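The relationship between the two blocks can be illustrated as a substitution step: the case is the template with each placeholder replaced by an observed value. The helper names `fill_template` and `unfilled` below are hypothetical, and the sketch assumes placeholders are written as uppercase names in square brackets.

```python
import re

# Matches placeholders such as [PRIMARY_COLOR]; group 1 is the bare name.
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [NAME] with values["NAME"]; unknown placeholders stay as-is."""
    def substitute(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))

    return PLACEHOLDER.sub(substitute, template)


def unfilled(text: str) -> list[str]:
    """List placeholder names still present, in order of first appearance."""
    seen: list[str] = []
    for name in PLACEHOLDER.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


template = "[PRIMARY_COLOR] and [SECONDARY_COLOR] with [ACCENT_COLOR] accents"
case = fill_template(template, {"PRIMARY_COLOR": "black", "SECONDARY_COLOR": "orange"})
print(case)
print(unfilled(case))
```

Checking `unfilled(case)` before returning a Prompt Case is one way to catch placeholders that were never given an observed value.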
Placeholder Naming Convention
- All caps, snake_case, wrapped in square brackets: [SUBJECT_NAME], [PRIMARY_COLOR]
- Semantic naming: the placeholder name must describe what it represents
- Language-agnostic: placeholders use English for universal readability since the skill is open-source
- Predefined common placeholders (use these when applicable; invent new ones only when necessary):
  - [STYLE_NAME]
  - [SUBJECT_NAME] / [SUBJECT_TYPE] / [SUBJECT_DESCRIPTION]
  - [AGE] / [GENDER] / [POSE] / [EXPRESSION] / [FACIAL_FEATURES]
  - [CLOTHING] / [ACCESSORIES] / [HAIR_STYLE] / [BODY_TYPE]
  - [PRIMARY_COLOR] / [SECONDARY_COLOR] / [ACCENT_COLOR] / [BACKGROUND_COLOR]
  - [SETTING] / [BG_ELEMENTS] / [DEPTH_OF_FIELD] / [ENVIRONMENT]
  - [HEADLINE] / [BODY_TEXT] / [FONT_STYLE] / [TEXT_PLACEMENT] / [LANGUAGE]
  - [FRAMING] / [PERSPECTIVE] / [FOCAL_POINT] / [VISUAL_WEIGHT] / [CAMERA_ANGLE]
  - [SURFACE_TYPE] / [TEXTURE_DETAIL] / [FINISH] / [MATERIAL]
  - [LIGHT_DIRECTION] / [LIGHT_QUALITY] / [LIGHT_COLOR] / [MOOD] / [ATMOSPHERE]
  - [DECOR_TYPE] / [PATTERN] / [OVERLAY] / [BORDER_STYLE]
  - [ASPECT_RATIO] / [MODEL_VERSION] / [STYLE_PARAM] / [QUALITY_TAGS] / [RESOLUTION]
  - [ART_STYLE] / [ERA] / [GENRE]
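The naming convention can be expressed as a pattern check. The following is a minimal sketch; the function name `invalid_placeholders` is hypothetical, and allowing digits after the first letter is an assumption the convention does not address.

```python
import re

# Uppercase words joined by single underscores, wrapped in square brackets.
# Assumption: digits are permitted after the first letter.
VALID_PLACEHOLDER = re.compile(r"\[[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*\]")
# Any single-line bracketed token, so malformed names can be reported as well.
ANY_BRACKETED = re.compile(r"\[[^\[\]\n]*\]")


def invalid_placeholders(template: str) -> list[str]:
    """Return bracketed tokens that break the uppercase snake_case convention."""
    return [
        token
        for token in ANY_BRACKETED.findall(template)
        if not VALID_PLACEHOLDER.fullmatch(token)
    ]


sample = "[SUBJECT_NAME] wearing [clothing] near [Bg-Elements] in [PRIMARY_COLOR]"
print(invalid_placeholders(sample))
```

The check covers only the syntactic rule; whether a name is semantic still needs a human or model judgment.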
Content Quality Rules
- Be specific, not vague. Instead of "modern design," write "asymmetrical layout with strong diagonal lines and overlapping geometric shapes."
- Preserve proportions and relationships. If the subject occupies 30% of the frame and is positioned left-of-center, say so.
- Distinguish observed from inferred. Observed: "black and orange color scheme." Inferred: "likely generated with --v 5 --style raw."
- Template must be truly reusable. A user should be able to swap [SUBJECT_NAME] and [SUBJECT_TYPE] and get a coherent new prompt for a different subject in the same style.
- Quality tags go last. Always end with technical/quality parameters so they can be easily copied to image generators.
- Use the 10 sections consistently. Every output must have all 10 sections. If a section has minimal content (e.g., no text in a pure photograph), explicitly state "No text elements present" rather than omitting the section.
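The structural rules above (all 10 sections present, quality parameters last, no section left empty) lend themselves to an automated check. This is a sketch under the assumption that section headers follow the `=== Title ===` form from the output format; the function name `check_sections` is hypothetical.

```python
import re

# Section titles in the order given by the output format.
SECTIONS = [
    "Overall Style",
    "Color Scheme",
    "Main Subject",
    "Background Design",
    "Text & Typography",
    "Composition & Perspective",
    "Material & Texture",
    "Lighting & Mood",
    "Decorative & Auxiliary Elements",
    "Quality & Technical Parameters",
]
HEADER = re.compile(r"^=== (.+?) ===[ \t]*$", re.MULTILINE)


def check_sections(prompt: str) -> list[str]:
    """Return a list of problems; an empty list means the structure looks valid."""
    found = HEADER.findall(prompt)
    problems = [f"missing section: {name}" for name in SECTIONS if name not in found]
    if not problems and found != SECTIONS:
        problems.append("sections are duplicated, unknown, or out of order")
    # Splitting on a pattern with one group yields [preamble, title, body, title, body, ...].
    parts = HEADER.split(prompt)[1:]
    for title, body in zip(parts[0::2], parts[1::2]):
        if not body.strip():
            problems.append(f"empty section: {title}")
    return problems


good = "\n".join(f"=== {name} ===\ncontent" for name in SECTIONS)
print(check_sections(good))
```

A section that legitimately has nothing to describe passes the check as long as it carries an explicit statement such as "No text elements present".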
Few-Shot Learning
Before generating, review examples in examples/ to understand expected analysis depth and output quality. Each example demonstrates the 10-step framework applied to a real image.
Example Reference Files
| Example | Style | Key Learning |
| ------------------------------------------- | --------------------------- | ---------------------------------------------------------------------------- |
| examples/urban-collage-poster-example.md | Graffiti collage poster | Complex multi-layer composition with text, portrait, and urban elements |