Desktop Scene Tagger — 桌面场景图片打标签
Overview
Analyze desk/workspace scene photos against a fixed tag taxonomy, then output a formatted Excel report with one row per image, one column per tag, and an inferred user persona for each image.
Workflow
Step 1: Load the Tag System
Before analyzing any image, read references/tag-system.md to understand the complete tag taxonomy, all valid values, and the persona inference rules. All analysis must strictly follow this reference — never deviate from the defined tag names or value options.
Step 2: Collect Images
Ask the user to provide the images to analyze. Accept images via:
- Direct file upload in the conversation
- A folder path containing images
If a folder path is provided, list all image files (.jpg, .jpeg, .png, .webp, .bmp, .heic) and confirm with the user before processing.
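The folder-listing step can be sketched with the standard library. This is a minimal sketch: `list_images` is a hypothetical helper, not part of the skill's scripts. It scans only the top level of the folder and matches extensions case-insensitively (so `IMG_01.JPG` is included). Both of those behaviors are assumptions, not rules stated by the skill.

```python
from pathlib import Path

# Extensions accepted by the skill. Matching ignores case (an assumption: the
# skill only lists lowercase extensions).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".heic"}


def list_images(folder: str) -> list[str]:
    """Return sorted file names in `folder` (non-recursive) with an accepted image extension."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"not a folder: {folder}")
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Show the returned list to the user and wait for confirmation before analyzing any file.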
Step 3: Analyze Each Image Individually
For each image, use multimodal vision to inspect the photo and assign a value to every tag in every category. The analysis must cover each tag defined in the tag system:
- Environment (装修环境) — lighting, carpet, placement
- Computer (电脑类) — screen count, screen orientation
- Keyboard & Mouse (键鼠类) — keyboard, mouse, wrist rest
- Audio (音频类) — speaker, headphone, audio interface, microphone
- Mood Items (情绪类) — aromatherapy, plant, artwork, figure, ambient light
- Ergonomics (人体工学类) — lumbar support, footrest, standing desk, monitor arm, ergonomic chair, desk mat
Key principles for accurate tagging:
- Be thorough: Scan the entire image — foreground, background, desk surface, under desk, walls, floor.
- Default to "未识别": Only tag "有" when the item is clearly visible and identifiable. When in doubt, use "未识别". Do not guess.
- Count screens carefully: Look for monitor bezels, distinct display areas, laptops (a laptop plus one external monitor counts as 2), and all-in-one PCs.
- Distinguish device types: A laptop counts as both a screen and a keyboard. An all-in-one counts as a screen.
- Check orientation: Look at aspect ratio and on-screen content layout to determine portrait vs landscape.
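The screen-counting rules can be sketched as a small helper that turns per-type counts into a `screen_count` value. The function name and its parameters are hypothetical and are not taken from the skill's scripts. The `confident` flag is also hypothetical. So is the choice to return "未识别" when nothing is counted, because the taxonomy has no "0" option.

```python
def screen_count_tag(monitors: int, laptops: int = 0, all_in_ones: int = 0,
                     confident: bool = True) -> str:
    """Map visible display counts to a screen_count value.

    monitors: standalone external displays
    laptops: open laptop screens (each is one screen)
    all_in_ones: all-in-one PCs (each is one screen)
    confident: False when the count cannot be read reliably (hypothetical flag)
    """
    if not confident:
        return "未识别"  # default to "unrecognized" when in doubt
    total = monitors + laptops + all_in_ones
    if total <= 0:
        return "未识别"  # assumption: the taxonomy has no "0" value
    return str(total) if total <= 3 else "4个及以上"
```

For example, a laptop next to one external monitor is `screen_count_tag(1, laptops=1)`, which yields "2".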
Step 4: Compile Analysis Results
After analyzing all images, compile the results into a JSON array. Each object represents one image. Use exactly the keys defined in scripts/generate_excel.py. The required keys are:
```json
[
  {
    "image": "filename.jpg",
    "lighting": "明亮|昏暗|未识别",
    "carpet": "有|无|未识别",
    "placement": "窗边|墙边|房间中央|角落|床边|未识别",
    "screen_count": "1|2|3|4个及以上|未识别",
    "screen_orientation": "1横|1竖|2横|1横1竖|2横1竖|3横|全横|全竖|混合|未识别",
    "keyboard": "有|未识别",
    "mouse": "有|未识别",
    "wrist_rest": "有|未识别",
    "speaker": "有|未识别",
    "headphone": "有|未识别",
    "audio_interface": "有|未识别",
    "microphone": "有|未识别",
    "aromatherapy": "有|未识别",
    "plant": "有|未识别",
    "artwork": "有|未识别",
    "figure": "有|未识别",
    "ambient_light": "有|未识别",
    "lumbar_support": "有|未识别",
    "footrest": "有|未识别",
    "standing_desk": "有|未识别",
    "monitor_arm": "有|未识别",
    "ergonomic_chair": "有|未识别",
    "desk_mat": "有|未识别",
    "persona": "升级装备党|极简主义者|氛围营造者|健康办公族|影音发烧友|电竞玩家|普通办公族|创意工作者"
  }
]
```
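Before saving the array, you can check each record against the schema. This is a minimal sketch: `validate_record` is a hypothetical helper, not part of `generate_excel.py`. The allowed values are copied from the key list in this step. The persona set comes from the Quick Reference table, and we assume it matches `references/tag-system.md`.

```python
# Allowed values per key, copied from the Step 4 key list.
PRESENCE = {"有", "未识别"}
ALLOWED = {
    "lighting": {"明亮", "昏暗", "未识别"},
    "carpet": {"有", "无", "未识别"},
    "placement": {"窗边", "墙边", "房间中央", "角落", "床边", "未识别"},
    "screen_count": {"1", "2", "3", "4个及以上", "未识别"},
    "screen_orientation": {"1横", "1竖", "2横", "1横1竖", "2横1竖", "3横",
                           "全横", "全竖", "混合", "未识别"},
    # Every presence-only tag accepts just "有" or "未识别".
    **{key: PRESENCE for key in (
        "keyboard", "mouse", "wrist_rest", "speaker", "headphone",
        "audio_interface", "microphone", "aromatherapy", "plant", "artwork",
        "figure", "ambient_light", "lumbar_support", "footrest",
        "standing_desk", "monitor_arm", "ergonomic_chair", "desk_mat")},
}
# Persona names from the Quick Reference table (assumed to match tag-system.md).
PERSONAS = {"升级装备党", "极简主义者", "氛围营造者", "健康办公族",
            "影音发烧友", "电竞玩家", "普通办公族", "创意工作者"}
REQUIRED_KEYS = {"image", "persona", *ALLOWED}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems for one image record; an empty list means valid."""
    problems = []
    keys = set(record)
    missing = REQUIRED_KEYS - keys
    extra = keys - REQUIRED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    for key, allowed in ALLOWED.items():
        if key in record and record[key] not in allowed:
            problems.append(f"{key}={record[key]!r} not in allowed values")
    if "persona" in record and record["persona"] not in PERSONAS:
        problems.append(f"persona={record['persona']!r} is not a known persona")
    return problems
```

A common slip is writing "无" for a presence-only tag such as `keyboard`. Only `carpet` accepts "无", and the validator reports any other use.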
Step 5: Infer User Persona
For each image, apply the persona inference rules from references/tag-system.md Section 3. The persona is image-level (one per photo), not global. If a user uploaded multiple images showing different setups, each gets its own persona.
Step 6: Generate Excel Report
Save the compiled JSON to a temporary file (e.g., analysis_result.json), then run:
```bash
python scripts/generate_excel.py analysis_result.json labels_output.xlsx
```
The script is at <skill-directory>/scripts/generate_excel.py. It produces a formatted Excel file with:
- Row 1: Category group headers (装修环境, 电脑类, 键鼠类, 音频类, 情绪类, 人体工学类, 画像推断)
- Row 2: Individual tag column headers
- Rows 3+: One row per image
- Color coding: Green = "有", Light red = "未识别", Persona column = persona-specific color
- Frozen panes at B3 (rows 1-2 and column A stay visible while scrolling), auto-filter enabled
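The save-and-run step can be sketched in Python. This is a sketch under stated assumptions: `save_results`, `generate_excel_command`, and `run_generate_excel` are hypothetical helpers. `skill_dir` stands in for `<skill-directory>`. The JSON is written with `json.dumps`' default `ensure_ascii=True`, so the Chinese values are stored as `\uXXXX` escapes. This is a deliberate choice: the file is plain ASCII and parses correctly whatever text encoding the script opens it with.

```python
import json
import subprocess
import sys
from pathlib import Path


def save_results(records: list[dict], path: str = "analysis_result.json") -> Path:
    """Write the compiled records to a JSON file (non-ASCII escaped, so pure ASCII)."""
    out = Path(path)
    out.write_text(json.dumps(records, indent=2), encoding="ascii")
    return out


def generate_excel_command(skill_dir: str, json_path: str,
                           xlsx_path: str = "labels_output.xlsx") -> list[str]:
    """Build the argv for scripts/generate_excel.py using the current interpreter."""
    script = Path(skill_dir) / "scripts" / "generate_excel.py"
    return [sys.executable, str(script), json_path, xlsx_path]


def run_generate_excel(skill_dir: str, json_path: str,
                       xlsx_path: str = "labels_output.xlsx") -> None:
    """Run the report script; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(generate_excel_command(skill_dir, json_path, xlsx_path), check=True)
```

Passing an explicit script path means the command works from any working directory. The relative `scripts/generate_excel.py` form only works from inside the skill directory.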
Step 7: Present Results
Present the Excel file to the user using present_files. Provide a brief summary:
- Number of images analyzed
- Most common persona detected
- Notable patterns (e.g., "3/5 images show dual-screen setups", "every image shows both a keyboard and a mouse")
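The summary figures can be computed from the same records used for the report. This is a minimal sketch: `summarize` and its output keys are hypothetical. "Dual-screen" is taken to mean `screen_count == "2"`, which is an assumption. When two personas tie, `Counter.most_common` keeps the one seen first.

```python
from collections import Counter


def summarize(records: list[dict]) -> dict:
    """Compute the Step 7 summary figures from the compiled image records."""
    total = len(records)
    personas = Counter(r.get("persona", "未识别") for r in records)
    top_persona, top_count = personas.most_common(1)[0] if records else (None, 0)
    dual_screen = sum(1 for r in records if r.get("screen_count") == "2")
    # Count how many images mark each tag as present ("有"), for spotting patterns.
    present = Counter(k for r in records for k, v in r.items() if v == "有")
    return {
        "images": total,
        "top_persona": top_persona,
        "top_persona_count": top_count,
        "dual_screen": f"{dual_screen}/{total}",
        "most_common_items": present.most_common(3),
    }
```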
Quick Reference: All Tags at a Glance
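| Category | Tags |
|---|---|
| 环境 (Environment) | 灯光情况 (lighting), 地毯 (carpet), 设备摆放位置 (placement) |
| 电脑 (Computer) | 屏幕数量 (screen count), 屏幕方向 (screen orientation) |
| 键鼠 (Keyboard & Mouse) | 键盘 (keyboard), 鼠标 (mouse), 腕托 (wrist rest) |
| 音频 (Audio) | 音箱 (speaker), 耳机 (headphone), 声卡/音频接口 (audio interface), 麦克风 (microphone) |
| 情绪 (Mood Items) | 香薰 (aromatherapy), 绿植 (plant), 插画/装饰画 (artwork), 手办/摆件 (figure), 氛围灯 (ambient light) |
| 工学 (Ergonomics) | 腰靠 (lumbar support), 脚踏 (footrest), 升降桌 (standing desk), 显示器支架 (monitor arm), 人体工学椅 (ergonomic chair), 桌垫 (desk mat) |
| 画像 (Persona) | 用户画像 (user persona): 升级装备党 (gear upgrader), 极简主义者 (minimalist), 氛围营造者 (ambience builder), 健康办公族 (health-conscious worker), 影音发烧友 (audio/video enthusiast), 电竞玩家 (esports gamer), 普通办公族 (regular office worker), 创意工作者 (creative professional) |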