Authentication
All requests require a dLazy API key. The recommended way to sign in is dlazy login:
dlazy login
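This command uses the device-code flow, so it also works from a remote terminal. After a successful login it writes the API key to the local CLI config automatically, with no manual copy and paste.
Alternative: set the API key manually
If you already have an API key, you can save it directly: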
dlazy auth set YOUR_API_KEY
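The CLI stores the key in your user config directory (~/.dlazy/config.json on macOS/Linux, %USERPROFILE%\.dlazy\config.json on Windows), with file permissions restricted to the current operating-system user. You can also pass the key per invocation through the DLAZY_API_KEY environment variable.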
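The lookup described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say which source wins when both are present, so the precedence of DLAZY_API_KEY over the config file is assumed, and the JSON field name apiKey is hypothetical (the layout of config.json is not documented here).

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def config_path(home: Optional[Path] = None) -> Path:
    """Return <home>/.dlazy/config.json, the location named in the text."""
    base = home if home is not None else Path.home()
    return base / ".dlazy" / "config.json"


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    home: Optional[Path] = None,
    key_field: str = "apiKey",  # hypothetical field name, not confirmed
) -> Optional[str]:
    """Assumed order: DLAZY_API_KEY first, then the saved config file."""
    from_env = env.get("DLAZY_API_KEY")
    if from_env:
        return from_env
    path = config_path(home)
    if not path.is_file():
        return None
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    value = data.get(key_field) if isinstance(data, dict) else None
    return value if isinstance(value, str) and value else None
```

Path.home() resolves to the user profile directory on Windows, so the same helper covers both locations listed above.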
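Getting an API key manually
- Sign in at dlazy.com, or create an account there
- Go to dlazy.com/dashboard/organization/api-key
- Copy the key shown in the API Key section
Each key belongs to your own dLazy organization and can be rotated or revoked at any time from the same dashboard.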
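About and Provenance
- CLI source code: github.com/dlazyai/cli
- Maintainer: dlazyai
- npm package: @dlazy/cli (this skill's install field is pinned to version 1.0.9)
- Website: dlazy.com
If you would rather not keep a global CLI on your system long term, you can run it on demand: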
npx @dlazy/cli@latest <command>
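If you choose a global install, the skill's metadata.clawdbot.install field is pinned to npm install -g @dlazy/cli@latest. Reviewing the source in the GitHub repository before installing is recommended.
How It Works
This skill is a thin wrapper around the hosted dLazy API. When it is invoked:
- The prompt and parameters you provide are sent to the dLazy API (api.dlazy.com) for inference.
- Local file paths passed in image / video / audio fields are uploaded by the CLI to dLazy media storage (files.dlazy.com) so the model can read them, the same flow as any cloud generation API.
- Generated result URLs returned by the API are hosted on files.dlazy.com.
This is the standard SaaS calling pattern; the skill itself does not access the network or file system beyond that scope, and every action is performed by the dLazy CLI.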
Idea → Video Generation Plan
Turn a user's idea into the full pipeline: story → characters → 3-view portraits → scenes → shots → keyframes → shot videos → concat. First emit a plan template for the user to confirm, then expand it into canvas shapes and call drawToCanvas.
Workflow Overview (5 states)
Every reply must start with this line:
**Current State:** [state] | **Next:** [goal]
| State | Goal | Needs user confirmation |
| ------------------------ | -------------------------------------- | ----------------------- |
| 1. Requirement gathering | Lock idea / audience / style / scale | ✅ |
| 2. Plan generation | Build plan template; show node summary | ✅ (strict gate) |
| 3. Plan adjustment | Patch the template per user feedback | ✅ |
| 4. Canvas expansion | Expand template into flat shapes | ❌ (internal) |
| 5. Apply to canvas | Call drawToCanvas to write shapes | ❌ |
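The state table and the mandatory header line can be modelled with a small state machine. The sketch below is one possible reading, not the skill's actual implementation: the names State, next_state and header are hypothetical, and it assumes that a confirmed plan (from state 2 or 3) moves straight to canvas expansion while a change request moves to plan adjustment.

```python
from enum import IntEnum


class State(IntEnum):
    REQUIREMENT_GATHERING = 1
    PLAN_GENERATION = 2
    PLAN_ADJUSTMENT = 3
    CANVAS_EXPANSION = 4
    APPLY_TO_CANVAS = 5


# States marked with a check mark in the "Needs user confirmation" column.
NEEDS_CONFIRMATION = {
    State.REQUIREMENT_GATHERING,
    State.PLAN_GENERATION,
    State.PLAN_ADJUSTMENT,
}

PLAN_STATES = (State.PLAN_GENERATION, State.PLAN_ADJUSTMENT)


def next_state(current: State, confirmed: bool = False, wants_changes: bool = False) -> State:
    """Advance the workflow; gated states hold until the user confirms."""
    if current in PLAN_STATES and wants_changes:
        return State.PLAN_ADJUSTMENT
    if current in NEEDS_CONFIRMATION and not confirmed:
        return current
    if current in PLAN_STATES:
        return State.CANVAS_EXPANSION  # assumed: a confirmed plan skips adjustment
    if current is State.APPLY_TO_CANVAS:
        return current
    return State(current + 1)


def header(state_label: str, goal: str) -> str:
    """Build the line every reply must start with."""
    return f"**Current State:** {state_label} | **Next:** {goal}"
```

For example, header("Plan generation", "Confirm plan") yields the required first line, and next_state(State.PLAN_GENERATION) stays in place until confirmed=True is passed.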
State 1: Requirement Gathering
Collect these inputs; ask if any is missing:
- idea: the core creative seed (one sentence to one paragraph)
- user_requirement: audience / runtime / max scenes / max shots (optional)
- style: visual style ("realistic warm", "cyberpunk", "watercolor 2D"...)
- aspectRatio: defaults to 16:9; alternatives 9:16 / 1:1
- sceneCount: let the model decide by default, but disclose the count it chose
- shotsPerScene: let the model decide by default
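One way to hold these inputs and decide what still has to be asked is a small dataclass. This is a sketch under assumptions: the Requirements class and missing_fields helper are hypothetical names, and treating idea and style as the only mandatory fields is an interpretation of the list above (the other fields are either optional or have defaults).

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_ASPECT_RATIOS = ("16:9", "9:16", "1:1")


@dataclass
class Requirements:
    idea: Optional[str] = None
    style: Optional[str] = None
    user_requirement: Optional[str] = None  # optional per the list above
    aspectRatio: str = "16:9"               # documented default
    sceneCount: Optional[int] = None        # None means the model decides
    shotsPerScene: Optional[int] = None     # None means the model decides


def missing_fields(req: Requirements) -> List[str]:
    """Return the names of inputs that still need a question to the user."""
    missing = [name for name in ("idea", "style") if not getattr(req, name)]
    if req.aspectRatio not in ALLOWED_ASPECT_RATIOS:
        missing.append("aspectRatio")
    return missing
```

An empty result from missing_fields means the requirement list can be presented for confirmation.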
Output a bulleted requirement list, ending with:
<suggestion>Requirements ready — confirm to enter plan generation?</suggestion>
State 2: Plan Generation
Build a plan template per the Plan Template Schema (see Appendix A).
Construction rules:
- Strictly use models registered in config/models/. Recommended for idea2video:
  - qwen3_6-plus: every LLM step (story / characters / script / storyboard / shot decomposition)
  - banana-pro: character 3-view portraits, shot first/last frames
  - veo_3_1-fast: shot videos (i2v)
  - merge: video concatenation
- Mirror the canonical 7-segment idea2video structure (Appendix B):
  - develop_story (LLM)
  - extract_characters (LLM, parse=json)
  - portraits (map: front → side/back)
  - write_script (LLM, parse=json)
  - scenes map (with nested shots map)
    - storyboard (LLM, parse=json)
    - shots map: shot_desc → first_frame → last_frame (when) → shot_video
    - scene_concat (merge)
  - final_video (merge)
- Reference rules (critical, do not get wrong):
  - Whole-text injection of an upstream → promptRefs: ["$node.X"]; do not inline shape:// inside prompt.
  - Sub-field injection from upstream JSON → keep the {{$node.X.json.field}} placeholder inside prompt.
  - Media references (image/video/audio) → put in images / videos / audio arrays; values use $node.X or shape://shape:X.
  - Cross-iteration aggregation inside a map → $node.<mapId>[*].<bodyId> (e.g. $node.portraits[*].front).
  - Inside a map, current item is $item, index is $idx; nested maps access outer index via $ctx.<outerMapId>.idx.
- Do not paraphrase tool prompts; keep field names aligned with each model's inputSchema.
- when for conditional nodes (e.g. last_frame only when variation_type ∈ {medium, large}): "when": { "$in": ["$node.shot_desc.json.variation_type", ["medium", "large"]] }
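To make the when rule concrete, here is a minimal Python sketch of how a {"$in": [ref, allowed]} condition could be evaluated against completed upstream results. It is an illustration only: lookup and eval_when are hypothetical helpers, the results mapping (node id to parsed JSON output) is an assumed shape, and a missing upstream is reported as None rather than resolved.

```python
from typing import Any, Dict, Optional


def lookup(ref: str, results: Dict[str, Any]) -> Any:
    """Resolve "$node.<id>.json.<field>" against completed results.

    `results` maps node id -> parsed JSON output. A KeyError means the
    upstream node (or the requested field) is not available yet.
    """
    parts = ref.split(".")
    if len(parts) < 3 or parts[0] != "$node" or parts[2] != "json":
        raise ValueError(f"unsupported reference: {ref}")
    value = results[parts[1]]
    for field in parts[3:]:
        value = value[field]
    return value


def eval_when(when: Dict[str, Any], results: Dict[str, Any]) -> Optional[bool]:
    """Evaluate {"$in": [ref, allowed]}; None means upstream is still pending."""
    ref, allowed = when["$in"]
    try:
        return lookup(ref, results) in allowed
    except KeyError:
        return None


when = {"$in": ["$node.shot_desc.json.variation_type", ["medium", "large"]]}
print(eval_when(when, {"shot_desc": {"variation_type": "large"}}))  # True
print(eval_when(when, {}))  # None, because shot_desc has not completed
```

Returning None for the pending case keeps "condition is false" distinct from "condition cannot be decided yet".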
When presenting to the user, summarize in plain language; do not expose raw JSON:
The plan will create X nodes:
· 1 story node
· 1 character-extraction node
· Character 3-views (front + side + back, expanded per character)
· 1 scenes node
· Per scene: 1 storyboard node + N shots (each shot = shot description + first frame + [last frame] + video) + 1 concat node
· 1 final concat node
Models:
· LLM: qwen3_6-plus
· Image: banana-pro
· Video: veo_3_1-fast
· Concat: merge
End with:
<suggestion>Plan ready — confirm to expand to canvas? Or tell me what to adjust.</suggestion>
State 3: Plan Adjustment
Common requests:
- Swap a model ("use doubao-seedream-4_5 for image")
- Change structure ("drop the last-frame branch", "add a narration audio node")
- Change scale ("limit to 1 character", "fix 3 shots per scene")
Patch the template, re-summarize, wait for explicit confirmation again.
State 4: Canvas Expansion (internal)
Expand the plan template into a flat shape list suitable for drawToCanvas.
Expansion rules
- tool node → 1 shape:
  - Shape type is determined by the model's output type:
    - qwen3_6-plus → text
    - banana-pro / doubao-seedream-* → image
    - veo_* / doubao-seedance-* / kling-* → video
    - merge → video (or audio if merging audios)
  - shape.id = shape:<templatePath> or shape:<templatePath>__i<iter> (inside a map)
  - shape.props.model = template model
  - shape.props.input = template input, with all $node.X / $item.X / {{...}} resolved to literals or shape://shape:Y whenever possible
  - shape.props.input.promptRefs is built from template promptRefs: each $node.X → shape://shape:X
  - shape.parentId = enclosing frame shape id (when inside a map)
  - shape.meta.fromTemplateId = the dotted template path (e.g., scenes.shots.first_frame)
- map node → 1 frame shape + body subtree per iteration:
  - frame type: "frame", props.name = the map's name
  - frame itself runs no model
- Skip nodes whose when is false. If when references an upstream not yet completed (e.g. shot_desc.json.variation_type), expand optimistically: still emit the shape with status: "pending"; the runtime expander will reconcile after upstream completes.
- Unresolved {{$node.X.json.field}} placeholders stay in the prompt string (status pending). Do not substitute placeholder text.
- Coordinates (x, y, w, h) are not part of the plan; compute them at drawToCanvas time:
  - Lay out columns along data flow; 800px column gap.
  - Stack same-column nodes vertically with 100px gap.
  - Frame size = bounding box of children + 100px padding.
  - Map children: horizontal vs. vertical follows direction.
  - Default sizes: text 600×400, image 1600×900 (16:9) or 1024×1024 (1:1), video 1600×900, frame auto.
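Several of the expansion rules above are mechanical enough to sketch in Python. The helpers below are hypothetical names, not part of any documented API, and two points are assumptions: the merging_audio flag stands in for however the expander detects an audio merge, and the 100px frame padding is applied on every side of the children's bounding box.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# (model pattern, shape type), checked in order; taken from the rules above.
MODEL_SHAPE_TYPES: List[Tuple[str, str]] = [
    ("qwen3_6-plus", "text"),
    ("banana-pro", "image"),
    ("doubao-seedream-*", "image"),
    ("veo_*", "video"),
    ("doubao-seedance-*", "video"),
    ("kling-*", "video"),
]


def shape_type(model: str, merging_audio: bool = False) -> str:
    """Map a model id to the shape type its output produces."""
    if model == "merge":
        return "audio" if merging_audio else "video"
    for pattern, kind in MODEL_SHAPE_TYPES:
        if fnmatchcase(model, pattern):
            return kind
    raise ValueError(f"no shape type known for model {model!r}")


def shape_id(template_path: str, iteration: Optional[int] = None) -> str:
    """shape:<templatePath>, with an __i<iter> suffix inside a map."""
    base = f"shape:{template_path}"
    return base if iteration is None else f"{base}__i{iteration}"


def to_shape_ref(node_ref: str) -> str:
    """Rewrite "$node.X" to "shape://shape:X" for promptRefs."""
    prefix = "$node."
    if not node_ref.startswith(prefix):
        raise ValueError(f"not a $node reference: {node_ref}")
    return f"shape://shape:{node_ref[len(prefix):]}"


Box = Tuple[float, float, float, float]  # x, y, w, h


def frame_bounds(children: List[Box], padding: float = 100.0) -> Box:
    """Bounding box of the children, grown by `padding` on each side (assumed)."""
    left = min(x for x, _, _, _ in children) - padding
    top = min(y for _, y, _, _ in children) - padding
    right = max(x + w for x, _, w, _ in children) + padding
    bottom = max(y + h for _, y, _, h in children) + padding
    return (left, top, right - left, bottom - top)
```

For instance, shape_id("scenes.shots.first_frame", 2) gives shape:scenes.shots.first_frame__i2, and two stacked 600×400 text shapes with a 100px gap produce an 800×1100 frame.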
State 5: Apply to Canvas
Call drawToCanvas with createShapes = the expanded shape list.
Pre-flight checks before the call:
- Every shape's props.input validates against the corresponding model's inputSchema (drawToCanvas re-checks; pre-checking saves a round-trip).
- Every shape://shape:X reference points to an X present in the same createShapes payload.
- Frames appear before children (parentId exists).
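The second and third checks are structural and can be sketched without knowing any model schema. The following Python is a minimal illustration with hypothetical helper names; it assumes shapes are plain dicts with id, parentId and props.input keys as described in State 4, and it leaves inputSchema validation out because the schema format is not shown here.

```python
import re
from typing import Any, Dict, Iterator, List

# Matches shape://shape:<id> where <id> uses word characters, dots and hyphens.
SHAPE_REF = re.compile(r"shape://(shape:[\w.\-]+)")


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string nested anywhere inside dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def preflight(shapes: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the checks pass."""
    errors: List[str] = []
    all_ids = {shape["id"] for shape in shapes}
    seen = set()
    for shape in shapes:
        parent = shape.get("parentId")
        if parent is not None and parent not in seen:
            errors.append(f"{shape['id']}: parent {parent} must appear earlier")
        for text in iter_strings(shape.get("props", {}).get("input", {})):
            for target in SHAPE_REF.findall(text):
                if target not in all_ids:
                    errors.append(f"{shape['id']}: dangling reference {target}")
        seen.add(shape["id"])
    return errors
```

Collecting all problems instead of stopping at the first one makes it easier to explain a failed call to the user in a single reply.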
After success, reply:
✅ Plan added to canvas (N nodes, M pending).
Click "Run Workflow" on the canvas to execute the whole pipeline.
Appendix A: Plan Template Schema (for construction)
Top level:
{
  "version": 1,
  "name": "idea2video",
  "inputs": { "idea": {...}, "user_requirement": {...}, "style": {...} },
  "output": "$node.final_video.url",
  "nodes": [ /* tool or map nodes */ ]
}
Nodes:
// tool node
{
  "id": "<unique>",
  "kind": "tool",
  "model": "<id registered in config/models>",
  "name": "<display name; may use {{$item.X}} / {{$idx}} templates>",
  "parse": "json",                 // optional: url contains JSON
  "when": { "$in": [...] },        // optional: conditional node
  "input": {
    "prompt": "...containing {{$node.X.json.field}} placeholders...",
    "promptRefs": ["$node.upstream"],   // whole-text injection
    "images": ["$node.front"],          // media references
    "imageSize": "1K",
    ...
  }
}
// map node
{
  "id": "<unique>",
  "kind": "map",
  "name": "<frame name>",
  "over": "$node.upstream.json",   // must resolve to an array
  "mode": "parallel" | "sequential",
  "direction": "horizontal" | "vertical",
  "body": [ /* child template nodes */ ]
}
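A template built to this schema can be sanity-checked before it is summarized for the user. The Python sketch below is a hypothetical validator, not part of the documented tooling; it assumes node ids must be unique across the whole plan (including map bodies), which is one reading of "<unique>", and it checks only the fields shown above.

```python
from typing import Any, Dict, List, Optional, Set


def validate_nodes(
    nodes: List[Dict[str, Any]],
    seen: Optional[Set[str]] = None,
    path: str = "",
) -> List[str]:
    """Walk tool and map nodes recursively and collect structural problems."""
    seen = set() if seen is None else seen
    errors: List[str] = []
    for node in nodes:
        node_id = node.get("id")
        if not node_id:
            errors.append(f"{path or '<root>'}: node without id")
            continue
        where = f"{path}{node_id}"
        if node_id in seen:
            errors.append(f"{where}: duplicate id")
        seen.add(node_id)
        kind = node.get("kind")
        if kind == "tool":
            if not node.get("model"):
                errors.append(f"{where}: tool node needs a model")
        elif kind == "map":
            if not node.get("over"):
                errors.append(f"{where}: map node needs over")
            errors.extend(validate_nodes(node.get("body", []), seen, f"{where}."))
        else:
            errors.append(f"{where}: unknown kind {kind!r}")
    return errors
```

The dotted prefix in each message mirrors the template path convention used for shape.meta.fromTemplateId.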
Appendix B: Canonical idea2video Structure
Assemble the plan with this fixed shape:
- develop_story: qwen3_6-plus; prompt uses {{$input.idea}} and {{$input.user_requirement}}
- extract_characters: qwen3_6-plus; parse: "json"; promptRefs: ["$node.develop_story"]
- portraits: map(over=$node.extract_characters.json, parallel, horizontal)
  - body: front → side (images:[$node.front]) → back (images:[$node.front])
- write_script: qwen3_6-plus; parse: "json"; promptRefs: ["$node.develop_story"]
- scenes: map(over=$node.write_script.json, sequential, vertical)
  - body:
    - storyboard: qwen3_6-plus; parse: "json"; promptRefs: ["$node.extract_characters"]
    - shots: map(over=$node.storyboard.json, parallel, vertical)
      - body:
        - shot_desc: qwen3_6-plus; parse: "json"; promptRefs: ["$node.extract_characters"]
        - first_frame: banana-pro; prompt with {{$node.shot_desc.json.ff_desc}}; images: $node.portraits[*].front
        - last_frame: banana-pro; when: variation_type ∈ {medium, large}; images: [$node.first_frame]
        - shot_video: veo_3_1-fast; prompt with {{$node.shot_desc.json.motion_desc}} and {{$node.shot_desc.json.audio_desc}}; images includes first_frame (+ last_frame if present)
    - scene_concat: merge; videos: $node.shots[*].shot_video
- final_video: merge; videos: $node.scenes[*].scene_concat
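The [*] aggregation used by first_frame, scene_concat and final_video can be illustrated for the simplest case, a top-level map such as portraits. This Python sketch rests on assumptions that the text does not confirm: iteration indices start at 0, a body node's template path is <mapId>.<bodyId>, and the hypothetical expand_aggregate helper ignores nested maps, where the outer iteration would also have to be encoded.

```python
import re
from typing import List

AGGREGATE = re.compile(r"^\$node\.(?P<map_id>\w+)\[\*\]\.(?P<body_id>\w+)$")


def expand_aggregate(ref: str, iterations: int) -> List[str]:
    """Expand "$node.<mapId>[*].<bodyId>" into one shape reference per iteration."""
    match = AGGREGATE.match(ref)
    if match is None:
        raise ValueError(f"not an aggregation reference: {ref}")
    template_path = f"{match['map_id']}.{match['body_id']}"
    # Assumed: iterations are numbered from 0 in the __i<iter> suffix.
    return [f"shape://shape:{template_path}__i{i}" for i in range(iterations)]


print(expand_aggregate("$node.portraits[*].front", 2))
# ['shape://shape:portraits.front__i0', 'shape://shape:portraits.front__i1']
```

With two extracted characters, the images array of first_frame would therefore receive two references, one per front portrait.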
Output Norms
- Never expose raw plan JSON, shape JSON, or taskId to the user; summarize with bullets.
- If drawToCanvas fails (usually input not matching the model's schema), explain why and propose a fix.
- Strict gates: states 2 and 3 must wait for an explicit "confirm / continue / proceed" before advancing.
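🛠️ Execution and Generation Guide (CRITICAL EXECUTION INSTRUCTIONS)
You are an intelligent agent that can execute terminal commands!
[Strictly prohibited]
- Prohibited: saving prompts to any file (such as txt or md).
- Prohibited: asking the user to generate images themselves on a third-party platform (such as Midjourney).
- Prohibited: generating all images in one batch, or executing multiple commands at once.
[Required interaction and execution flow] You must proceed strictly step by step and stop at every step to wait for the user's reply:
- Step 1: Proactively gather requirements. When the user states a need, do not design or generate anything; first ask the user questions (product features, target audience, how many images they want, and so on). You must wait for the user's answer.
- Step 2: Output a draft and ask for confirmation. Based on the user's answers, draw up the image-set plan and output a draft prompt for the first image. Ask the user: "Do you confirm this prompt? Can I start generating the first image?" You must wait for the user to reply "confirm".
- Step 3: Execute a single terminal command. After the user confirms, you must run the command in the terminal (for example dlazy seedream-4.5 --prompt "..."), one generation command at a time. Important: use synchronous commands only. Never append & to the end of a command and never use &&, because this runs under Windows PowerShell!
- Step 4: Deliver and loop. Once the command returns, send the image URL to the user and ask "Are you happy with this one? Shall we go on to generate the next one?" Continue to the next step only after receiving confirmation.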
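If the agent drives the CLI from a script rather than typing into a shell, the "one synchronous command, no & or &&" rule can be enforced by construction. The Python sketch below is only an illustration: build_command and run_one are hypothetical helpers, the only CLI syntax used is the dlazy <model> --prompt form shown in Step 3, and the raw stdout is returned as-is because the output format is not documented here.

```python
import shutil
import subprocess
from typing import List


def build_command(model: str, prompt: str) -> List[str]:
    """Build the argument list for a single generation call."""
    return ["dlazy", model, "--prompt", prompt]


def run_one(argv: List[str]) -> str:
    """Run exactly one command and block until it exits.

    Passing an argument list without a shell means there is no way to
    background the call with & or chain a second command with &&.
    """
    executable = shutil.which(argv[0])  # also finds .cmd shims on Windows
    if executable is None:
        raise FileNotFoundError(f"{argv[0]} is not on PATH")
    completed = subprocess.run(
        [executable, *argv[1:]],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI reports a failure
    )
    return completed.stdout


# One call per confirmed prompt; wait for the user before building the next.
argv = build_command("seedream-4.5", "a red fox in watercolor")
print(argv)  # ['dlazy', 'seedream-4.5', '--prompt', 'a red fox in watercolor']
```

Calling run_one(argv) returns only after the generation finishes, which matches the deliver-then-ask loop in Step 4.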