Auto
Plan → [ExitPlanMode batch-approve] → Execute in Batches → Verify
Manual invocation only. Type /auto. Never auto-triggers.
Announce at start: "I'm using the auto skill to plan and execute this task."
When to Use
Type /auto. That is the only trigger.
Use for: tasks with 2+ distinct steps, multi-file changes, cross-domain work. Skip for: single-line fixes, pure Q&A, reading files.
Modes
The user's intent determines the mode. No keyword matching — understand what they want from the request.
| Mode | User Intent | Behavior |
|------|-------------|----------|
| PLAN | User wants a plan only, no execution | EnterPlanMode -> design plan -> save to docs/plans/ -> ExitPlanMode -> STOP |
| BUILD | User wants to execute an existing plan | Load plan from docs/plans/ -> critical review -> execute in batches with checkpoints |
| FULL | (default) Plan then execute | EnterPlanMode -> present plan -> ExitPlanMode(allowedPrompts) -> execute all steps |
| AUTO | User wants autonomous execution, minimal prompts | Plan internally -> ExitPlanMode(allowedPrompts) -> execute without review gates |
Workflow
Step 0: Initialize Skill Index
Every auto invocation starts here. The skill index at references/skill-index.json is the single source of truth for skill matching. Keep it fresh.
Two paths, depending on whether the index already exists:
| Scenario | Behavior |
|----------|----------|
| First run (index missing) | Full scan → read every SKILL.md body → classify each skill from full content |
| Subsequent run (index exists) | Diff scan → only classify new/changed skills via full body read → delete removed entries |
Step 0a: Full Scan (First Run — No Index Exists)
This is the expensive path. It happens ONCE.
- Run: `python scripts/scan_skills.py --json-stdout`
- Read the scan output. `classifications_needed` will contain ALL discovered skills.
- For every skill in `classifications_needed`:
  a. Read the full SKILL.md body using the file path from the scan output. No shortcuts. No pattern pre-classification. No name+description guessing.
  b. Classify using the Classification Guidelines below: ops, domain, prereqs, summary (one sentence capturing the actual behavior, not the frontmatter description), use_for (2-5 specific tasks), do_not_use_for (1-3 likely misapplications).
  c. The scan_skills.py `pre_classified` field is a hint only. Verify it against the full body and override it when it disagrees.
- Write all classifications to `skill-index.json` in v2 format: `{version, scanned_at, skills: {name: {ops, domain, prereqs, summary, use_for, do_not_use_for, content_hash}}}`
- Save. The index now exists. Proceed to Step 1.
Constraint: Process in batches of 20-30 skills. After each batch, write partial results to the index so a crash doesn't lose all progress.
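The batching constraint above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `write_index_in_batches` and the `classify` callback are hypothetical names, and the sketch assumes the `version` field holds the integer 2 and that `scanned_at` is an ISO 8601 timestamp.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Iterable, List


def _flush(index: dict, batch: List[str], classify: Callable[[str], dict], index_path: Path) -> None:
    """Classify one batch and persist the whole index to disk."""
    for name in batch:
        index["skills"][name] = classify(name)
    index["scanned_at"] = datetime.now(timezone.utc).isoformat()
    index_path.parent.mkdir(parents=True, exist_ok=True)
    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")


def write_index_in_batches(
    skills: Iterable[str],
    classify: Callable[[str], dict],  # hypothetical: returns one skill's classification
    index_path: Path,
    batch_size: int = 25,  # the text asks for 20-30 skills per batch
) -> dict:
    """Classify skills batch by batch, saving partial results after each batch."""
    # Assumption: the v2 format stores the integer 2 under "version".
    index = {"version": 2, "scanned_at": None, "skills": {}}
    batch: List[str] = []
    for name in skills:
        batch.append(name)
        if len(batch) == batch_size:
            _flush(index, batch, classify, index_path)
            batch = []
    if batch:  # final, possibly smaller batch
        _flush(index, batch, classify, index_path)
    return index
```

Because the file is rewritten after every batch, a crash loses at most one batch of classifications.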
Step 0b: Incremental Update (Subsequent Runs — Index Exists)
This is the cheap path. It happens on every subsequent /auto invocation.
- Run: `python scripts/scan_skills.py --json-stdout`
- Read the scan output:
  - `classifications_needed` has only new + changed skills
  - `deleted` has removed skills
- If `classifications_needed` is non-empty:
  a. For each skill: read the full SKILL.md body before classifying (same classification rules as first run)
  b. Merge into index: add/update entries in `skills`, update `scanned_at`
- If `deleted` is non-empty: Remove those keys from `skills` in the index.
- If both are empty: Index is fresh. Proceed to Step 1.
- Save the updated index.
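The merge logic of the incremental path might look like the sketch below. `merge_scan` is a hypothetical helper. It assumes the newly classified entries arrive as a dict keyed by skill name, and it refreshes `scanned_at` on deletions as well as additions, which is a choice the text does not spell out.

```python
from datetime import datetime, timezone
from typing import Dict, List


def merge_scan(index: dict, classified: Dict[str, dict], deleted: List[str]) -> dict:
    """Return a new index with classified entries merged in and deleted ones removed."""
    skills = dict(index.get("skills", {}))
    skills.update(classified)  # add new skills, overwrite changed ones
    for name in deleted:
        skills.pop(name, None)  # tolerate names that are already gone
    updated = dict(index, skills=skills)
    if classified or deleted:
        # Assumption: any change to the index refreshes the timestamp.
        updated["scanned_at"] = datetime.now(timezone.utc).isoformat()
    return updated
```

When both inputs are empty the function returns the index content unchanged, matching the "index is fresh" case.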
Fallback
If scan_skills.py fails (Python not available, etc.):
- Read `skill-index.json` directly and proceed with whatever data is available
- Warn: "Skill index may be outdated."
Index freshness rule: Re-scan if scanned_at > 3 days ago OR user mentions installing/removing skills.
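The time-based half of the freshness rule can be checked mechanically. The sketch below assumes `scanned_at` is stored as an ISO 8601 string (the document does not specify the format) and treats timestamps without a timezone as UTC; `index_is_stale` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def index_is_stale(scanned_at: str, now: Optional[datetime] = None, max_age_days: int = 3) -> bool:
    """Return True when the index was scanned more than max_age_days ago."""
    scanned = datetime.fromisoformat(scanned_at)
    if scanned.tzinfo is None:
        # Assumption: naive timestamps are UTC.
        scanned = scanned.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - scanned > timedelta(days=max_age_days)
```

The other trigger, the user mentioning installed or removed skills, depends on reading the request and is not covered here.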
Step 1: Plan
- Parse the request into ordered, bite-sized tasks — each 2-5 minutes of work, one concrete action
- Assign each task: exact file paths, tool to use, expected output
- For each task, decide: invoke a skill (see Skill Matching below), or use direct tools
- Present plan compactly:
Plan: <one-line summary> | Tasks: N | Mode: <mode>
-> Task 1..N: <brief sequence>
Skills: <list or "direct">
Plan file naming (PLAN / FULL modes): docs/plans/YYYY-MM-DD-<slug>.md
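As an illustration of the naming scheme, the following sketch builds a plan path from a one-line summary. The slug rule (lowercase, runs of non-alphanumeric characters collapsed to hyphens) is an assumption, since the document only shows the `<slug>` placeholder, and `plan_path` is a hypothetical helper.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def plan_path(summary: str, today: Optional[date] = None) -> Path:
    """Build docs/plans/YYYY-MM-DD-<slug>.md from a plan summary."""
    today = today or date.today()
    # Assumed slug rule: lowercase, non-alphanumeric runs become single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    return Path("docs/plans") / f"{today.isoformat()}-{slug}.md"


print(plan_path("Add OAuth login!", date(2024, 5, 3)).as_posix())
# docs/plans/2024-05-03-add-oauth-login.md
```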
Step 2: Get Approval
Use ExitPlanMode with allowedPrompts — the official Claude Code batch-approval mechanism. The user approves once, all listed operations are pre-authorized.
In AUTO mode: still call ExitPlanMode (required by the harness). For truly zero-prompt execution, pre-configure permissions.allow in settings.json:
{
  "permissions": {
    "allow": [
      "Bash(git *)",
      "Bash(npm *)",
      "Bash(cargo *)",
      "Bash(rtk *)",
      "WebSearch",
      "WebFetch(*)"
    ]
  }
}
Use /update-config or /fewer-permission-prompts to build this list from actual usage.
Step 3: Execute in Batches
Adopted from executing-plans: batch of 3 tasks -> report -> checkpoint.
- FULL mode: execute batch -> report -> auto-continue to next batch.
- BUILD mode: execute batch -> report -> wait for feedback before next batch.
- AUTO mode: skip review gates entirely. Continue until done or blocked.
Re-evaluate after each batch: if a later task's inputs changed due to earlier results, update it before executing.
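The batch loop described above can be sketched as a small driver. Everything here is illustrative: `run_in_batches` and its callbacks are hypothetical, and the sketch assumes a report is still produced in AUTO mode, with only the feedback gate skipped.

```python
from typing import Callable, List, Sequence


def run_in_batches(
    tasks: Sequence[str],
    mode: str,  # "FULL", "BUILD", or "AUTO"
    run_task: Callable[[str], None],
    report: Callable[[List[str]], None],
    wait_for_feedback: Callable[[], bool],  # returns False to halt
    batch_size: int = 3,
) -> int:
    """Execute tasks in batches and return how many tasks were run."""
    done = 0
    for start in range(0, len(tasks), batch_size):
        batch = list(tasks[start:start + batch_size])
        for task in batch:
            run_task(task)
            done += 1
        report(batch)
        is_last = start + batch_size >= len(tasks)
        # Only BUILD pauses between batches; FULL and AUTO continue on their own.
        if mode == "BUILD" and not is_last and not wait_for_feedback():
            break
    return done
```

A real driver would also re-evaluate the remaining tasks between batches; that step is omitted to keep the control flow visible.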
Per-task execution:
- Announce: `Task N/M: <action>`
- Invoke matched skill via Skill tool, or use direct tools
- Run verification for the task
- Mark complete with TaskUpdate
Track all tasks with TaskCreate / TaskUpdate (Claude Code's official task tracking).
Step 4: Verify
Iron law (from verification-before-completion):
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
For each task and at final backpressure:
- IDENTIFY: What command proves this claim?
- RUN: Execute the full command (fresh, complete)
- READ: Full output, check exit code, count failures
- VERIFY: Does output confirm the claim?
- ONLY THEN: Make the claim
Never use "should work", "probably", or "seems to". Run the command. Read the output. Then claim.
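The RUN and READ steps map naturally onto a subprocess call that captures the full output and the exit code. A minimal sketch, assuming the verification command is given as an argument list; `verify` is a hypothetical helper and the command in the usage lines is a stand-in for a real test command.

```python
import subprocess
import sys
from typing import List, Tuple


def verify(command: List[str]) -> Tuple[bool, str]:
    """Run a verification command fresh and return (passed, full output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    output = result.stdout + result.stderr
    # Only the exit code is judged here; the caller still has to read the output.
    return result.returncode == 0, output


if __name__ == "__main__":
    # Stand-in command; a real check would run the project's own test command.
    passed, output = verify([sys.executable, "-c", "print('4 passed')"])
    print(passed, output.strip())
```

A zero exit code alone is not the claim; the output still needs to be read for failure counts before anything is reported as done.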
When to Stop
Adopted from executing-plans — STOP immediately when:
- Hit a blocker (missing dependency, test fails, instruction unclear)
- Verification fails after 3 attempts with different approaches (per-task counter, resets each new task)
- You don't understand an instruction
Ask for clarification rather than guessing. Don't force through blockers.
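The three-attempt limit can be expressed as a per-task loop whose counter lives inside the function, so it resets for each new task. This is a sketch under assumptions: `Blocked` and `attempt_task` are hypothetical names, and each approach is modelled as a callable that returns True when its verification passes.

```python
from typing import Callable, Sequence


class Blocked(Exception):
    """Raised when a task must stop and wait for clarification."""


def attempt_task(task: str, approaches: Sequence[Callable[[], bool]], max_attempts: int = 3) -> int:
    """Try distinct approaches in order; return the attempt number that verified."""
    tried = 0
    for approach in approaches[:max_attempts]:
        tried += 1
        if approach():  # True means verification passed
            return tried
    # Stop instead of forcing through: the caller should ask the user.
    raise Blocked(f"{task}: verification failed after {tried} attempts")
```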
Skill Matching
The skill index at references/skill-index.json is the single source of truth. Match each task against the index using the algorithm below.
Matching Algorithm
Given a task with an operation tag:
- Pre-filter: If the task is a simple file read, shell command, or local file search (Glob/Grep/Read), skip skills entirely and use direct tools.
- Compatibility Gate: Remove skills whose `prereqs` are not met in the current environment (no git repo -> remove git-prereq skills, etc.)
- Operation Filter: Keep only skills where at least one `ops` tag matches the task's operation. Hard gates:
  - `explore:local` tasks -> never match `explore:web` skills
  - `create` tasks -> never match review-only skills
  - `design` tasks -> never match execute-only skills
- Rank Candidates (additive scoring):
  - +4: exact operation tag match
  - +3: `domain` matches task domain
  - +3 per word: skill name keywords appear in task description (up to +6)
  - +1: skill's `use_for` entries semantically match the task
  - -2 per word: skill's `do_not_use_for` entries semantically match the task
- Select: Take the highest-ranked skill with score >= 3. If no skill scores >= 3, use direct tools.
- Tiebreaker (when multiple skills have equal score):
  - Prefer the skill with fewer words in its name (simpler = more general-purpose)
  - Prefer the skill with more ops tags (broader applicability)
  - If still tied, pick the first alphabetically
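The gates, scoring, and tiebreakers can be sketched in code as shown below. Two simplifications are assumptions of this sketch, not part of the algorithm: semantic matching is approximated by plain word overlap, and the pre-filter is left out. The names `score` and `select_skill` are hypothetical.

```python
import re
from typing import Dict, Optional, Set


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(name: str, skill: dict, op: str, domain: str, description: str) -> int:
    """Additive score for one skill; word overlap stands in for semantic matching."""
    task_words = _words(description)
    total = 0
    if op in skill["ops"]:
        total += 4  # exact operation tag match
    if skill.get("domain") == domain:
        total += 3
    total += min(6, 3 * len(_words(name) & task_words))  # name keywords, capped at +6
    if any(_words(entry) & task_words for entry in skill.get("use_for", [])):
        total += 1
    avoid = set().union(*(_words(e) for e in skill.get("do_not_use_for", [])))
    total -= 2 * len(avoid & task_words)  # -2 per overlapping word
    return total


def select_skill(
    skills: Dict[str, dict], op: str, domain: str, description: str, available: Set[str]
) -> Optional[str]:
    """Return the best skill name, or None to fall back to direct tools."""
    candidates = []
    for name, skill in skills.items():
        if not set(skill.get("prereqs", [])) <= available:
            continue  # compatibility gate
        if op not in skill["ops"]:
            continue  # operation filter; also excludes review-only and execute-only skills
        if op == "explore:local" and "explore:web" in skill["ops"]:
            continue  # hard gate
        s = score(name, skill, op, domain, description)
        if s >= 3:
            name_len = len(re.findall(r"[a-z0-9]+", name.lower()))
            # Sort key: highest score, then fewer name words, more ops, alphabetical.
            candidates.append((-s, name_len, -len(skill["ops"]), name))
    return min(candidates)[3] if candidates else None
```

Encoding the tiebreakers as a tuple sort key keeps their priority order explicit: each later element is consulted only when all earlier ones are equal.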
Standardized Tags
Operation Tags (assign exactly ONE per task)
| Tag | Meaning |
|-----|---------|
| create | Making new files, features, content from scratch |
| update | Modifying, refactoring, fixing existing things |
| review | Reading, analyzing, auditing, explaining |
| design | Planning, brainstorming, architecting, estimating |
| execute:local | Running local commands, builds, tests, scripts |
| execute:remote | Deploying, pushing, remote API calls |
| explore:local | Searching/reading local codebase |
| explore:web | Web research, external data fetching |
Domain Tags
meta backend frontend devops testing docs git security ml research utility performance
Prereq Tags
git git:diff web node python pip mcp api:anthropic
Classification Guidelines
When classifying a skill from its description and source context:
ops (choose 1-4):
- `create` if it produces new files/code/content
- `update` if it modifies existing things
- `review` if it reads, analyzes, audits, explains, or inspects
- `design` if it plans, brainstorms, architects, or estimates
- `execute:local` if it runs local commands (build, test, install, cli)
- `execute:remote` if it deploys, pushes, or calls remote services
- `explore:local` if it searches/reads the local codebase
- `explore:web` if it does web searches or fetches external data
domain (exactly 1): Infer from description keywords and source context.
- Plugin skills under "plugin-dev/" -> `meta`
- CLI-anything skills -> `utility`
- Skills mentioning React/UI/frontend/css -> `frontend`
- Skills mentioning API/server/backend/database -> `backend`
- Skills mentioning deploy/k8s/infra/docker -> `devops`
- Skills mentioning test/verify/QA/TDD -> `testing`
- Skills mentioning docs/write/presentation/README -> `docs`
- Skills mentioning git/commit/PR/branch -> `git`
- Skills mentioning security/vulnerability/audit -> `security`
- Skills mentioning ML/model/training -> `ml`
- Skills mentioning research/search/data -> `research`
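The keyword rules above could serve as a first-pass hint, along the lines of the sketch below. It assumes the rules are checked top to bottom with the first match winning, matches whole words case-insensitively, and detects CLI-anything skills by a `cli-anything` name prefix; none of these details are stated in the list, and `infer_domain` is a hypothetical helper. The result should still be checked against the full skill body.

```python
import re
from typing import Optional

# Order mirrors the list above; the first matching rule wins (an assumption).
DOMAIN_KEYWORDS = [
    ("frontend", {"react", "ui", "frontend", "css"}),
    ("backend", {"api", "server", "backend", "database"}),
    ("devops", {"deploy", "k8s", "infra", "docker"}),
    ("testing", {"test", "verify", "qa", "tdd"}),
    ("docs", {"docs", "write", "presentation", "readme"}),
    ("git", {"git", "commit", "pr", "branch"}),
    ("security", {"security", "vulnerability", "audit"}),
    ("ml", {"ml", "model", "training"}),
    ("research", {"research", "search", "data"}),
]


def infer_domain(name: str, source_path: str, description: str) -> Optional[str]:
    """Suggest a domain tag, or None when no rule applies."""
    if "plugin-dev/" in source_path:
        return "meta"
    if name.startswith("cli-anything"):  # assumed naming convention
        return "utility"
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    for domain, keywords in DOMAIN_KEYWORDS:
        if words & keywords:
            return domain
    return None
```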
prereqs (0-4): Infer from description. Git commands -> git. pip install -> pip. npm/node -> node. Web searches -> web. MCP tools -> mcp.
use_for (2-5 short phrases): What specific tasks does this skill handle well? Be specific.
do_not_use_for (1-3 short phrases): What common tasks would be a bad fit? Focus on likely mistakes.
Batch strategy: Classify in groups by source context for consistency (e.g., all sc-* skills together, all cli-anything-* together).
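The count limits in these guidelines and the standardized tag lists lend themselves to a quick validation pass before an entry is written to the index. A minimal sketch; `validate_entry` is a hypothetical helper and the error messages are illustrative.

```python
from typing import List

OPS = {"create", "update", "review", "design",
       "execute:local", "execute:remote", "explore:local", "explore:web"}
DOMAINS = {"meta", "backend", "frontend", "devops", "testing", "docs",
           "git", "security", "ml", "research", "utility", "performance"}
PREREQS = {"git", "git:diff", "web", "node", "python", "pip", "mcp", "api:anthropic"}


def validate_entry(entry: dict) -> List[str]:
    """Return a list of problems; an empty list means the entry looks valid."""
    errors = []
    ops = entry.get("ops", [])
    if not 1 <= len(ops) <= 4 or not set(ops) <= OPS:
        errors.append("ops: expected 1-4 known operation tags")
    if entry.get("domain") not in DOMAINS:
        errors.append("domain: expected exactly one known domain tag")
    prereqs = entry.get("prereqs", [])
    if len(prereqs) > 4 or not set(prereqs) <= PREREQS:
        errors.append("prereqs: expected 0-4 known prereq tags")
    if not 2 <= len(entry.get("use_for", [])) <= 5:
        errors.append("use_for: expected 2-5 phrases")
    if not 1 <= len(entry.get("do_not_use_for", [])) <= 3:
        errors.append("do_not_use_for: expected 1-3 phrases")
    return errors
```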
File-Based Memory
For tasks with 5+ steps:
| File | Purpose |
|------|---------|
| docs/plans/YYYY-MM-DD-<slug>.md | Tasks, progress, decisions |
| findings.md | Research discoveries |
| progress.md | Session log |
Reboot check (after context gaps): Read plan file -> check current phase -> resume from last completed task.
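The reboot check amounts to finding the first task that is not yet marked done. The sketch below assumes the plan file tracks tasks as markdown checkboxes (`- [x]` for done, `- [ ]` for pending); the document does not define the plan file's internal format, so both that layout and the `next_task` helper are hypothetical.

```python
import re
from typing import Optional

# Assumed task line format: "- [x] Task 1: ..." or "- [ ] Task 2: ..."
TASK_LINE = re.compile(r"^- \[(?P<done>[ xX])\] (?P<title>.+)$")


def next_task(plan_text: str) -> Optional[str]:
    """Return the first unchecked task title, or None when all are complete."""
    for line in plan_text.splitlines():
        match = TASK_LINE.match(line.strip())
        if match and match.group("done") == " ":
            return match.group("title")
    return None
```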
Completion
After all tasks complete and verified (from finishing-a-development-branch):
- Verify that the project's test command passes
- Run final backpressure check (build, test, lint)
- Report with evidence:
All N tasks complete. <verification output>.