Auto

Plan → [ExitPlanMode batch-approve] → Execute in Batches → Verify

Manual invocation only. Type /auto. Never auto-triggers.

Announce at start: "I'm using the auto skill to plan and execute this task."

When to Use

Type /auto. That is the only trigger.

Use for: tasks with 2+ distinct steps, multi-file changes, cross-domain work. Skip for: single-line fixes, pure Q&A, reading files.

Modes

The user's intent determines the mode. No keyword matching — understand what they want from the request.

| Mode | User Intent | Behavior |
|------|-------------|----------|
| PLAN | User wants a plan only, no execution | EnterPlanMode -> design plan -> save to docs/plans/ -> ExitPlanMode -> STOP |
| BUILD | User wants to execute an existing plan | Load plan from docs/plans/ -> critical review -> execute in batches with checkpoints |
| FULL | (default) Plan then execute | EnterPlanMode -> present plan -> ExitPlanMode(allowedPrompts) -> execute all steps |
| AUTO | User wants autonomous execution, minimal prompts | Plan internally -> ExitPlanMode(allowedPrompts) -> execute without review gates |

Workflow

Step 0: Initialize Skill Index

Every auto invocation starts here. The skill index at references/skill-index.json is the single source of truth for skill matching. Keep it fresh.

There are two paths, depending on whether the index already exists:

| Scenario | Behavior |
|----------|----------|
| First run (index missing) | Full scan → read every SKILL.md body → classify each skill from full content |
| Subsequent run (index exists) | Diff scan → only classify new/changed skills via full body read → delete removed entries |

Step 0a: Full Scan (First Run — No Index Exists)

This is the expensive path. It happens ONCE.

1. Run: python scripts/scan_skills.py --json-stdout
2. Read the scan output. classifications_needed will contain ALL discovered skills.
3. For every skill in classifications_needed:
   - Read the full SKILL.md body using the file path from the scan output. No shortcuts. No pattern pre-classification. No name+description guessing.
   - Classify using the Classification Guidelines below: ops, domain, prereqs, summary (one sentence capturing the actual behavior, not the frontmatter description), use_for (2-5 specific tasks), do_not_use_for (1-3 likely misapplications).
   - Treat the scan_skills.py pre_classified field as a hint only. Verify it against the full body and override it when they disagree.
4. Write all classifications to skill-index.json in v2 format: {version, scanned_at, skills: {name: {ops, domain, prereqs, summary, use_for, do_not_use_for, content_hash}}}
5. Save. The index now exists. Proceed to Step 1.

Constraint: Process in batches of 20-30 skills. After each batch, write partial results to the index so a crash doesn't lose all progress.
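A minimal Python sketch of the first-run write path with per-batch saves. It assumes each entry in classifications_needed is a dict with a "name" key and that classification is done by a hypothetical classify callback (in practice, the model reading each full SKILL.md body). The version value 2, the batch size of 25, and the temp-file-plus-rename write are illustrative choices, not behavior of scan_skills.py.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Callable, List

# The path comes from this skill; the version value and batch size are assumptions.
INDEX_PATH = "references/skill-index.json"
INDEX_VERSION = 2
BATCH_SIZE = 25  # inside the 20-30 range
FIELDS = ("ops", "domain", "prereqs", "summary",
          "use_for", "do_not_use_for", "content_hash")


def save_index(index: dict, path: str = INDEX_PATH) -> None:
    """Write to a temp file, then rename, so a crash never leaves half a file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(index, f, indent=2, sort_keys=True)
    os.replace(tmp_path, path)


def full_scan(needed: List[dict], classify: Callable[[dict], dict],
              path: str = INDEX_PATH) -> dict:
    """Classify every skill and persist the index after each batch."""
    index = {"version": INDEX_VERSION, "scanned_at": None, "skills": {}}
    for start in range(0, len(needed), BATCH_SIZE):
        for skill in needed[start:start + BATCH_SIZE]:
            result = classify(skill)  # hypothetical: full-body read + classification
            index["skills"][skill["name"]] = {k: result[k] for k in FIELDS}
        index["scanned_at"] = datetime.now(timezone.utc).isoformat()
        save_index(index, path)  # partial results survive a crash
    return index
```

Saving after every batch means a crash loses at most one batch of classifications, and the rename keeps a half-written file from replacing a good one.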

Step 0b: Incremental Update (Subsequent Runs — Index Exists)

This is the cheap path. It happens on every subsequent /auto invocation.

1. Run: python scripts/scan_skills.py --json-stdout
2. Read the scan output:
   - classifications_needed has only new + changed skills
   - deleted has removed skills
3. If classifications_needed is non-empty:
   - For each skill, read the full SKILL.md body before classifying (same classification rules as the first run).
   - Merge into the index: add/update entries in skills and update scanned_at.
4. If deleted is non-empty: remove those keys from skills in the index.
5. If both are empty, the index is fresh: skip the save and proceed to Step 1.
6. Otherwise, save the updated index, then proceed to Step 1.
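The merge rules above, sketched in Python. It assumes index is already a dict in the v2 shape and that scan mirrors the --json-stdout fields: classifications_needed as dicts with a "name" key, deleted as a list of skill names. classify is a hypothetical callback for the full-body read and classification. The boolean return lets the caller skip the save when nothing changed.

```python
from datetime import datetime, timezone
from typing import Callable


def incremental_update(index: dict, scan: dict,
                       classify: Callable[[dict], dict]) -> bool:
    """Apply one diff scan to an existing v2 index, in place.

    Returns True when the index changed and should be saved.
    """
    needed = scan.get("classifications_needed", [])
    deleted = scan.get("deleted", [])
    if not needed and not deleted:
        return False  # index is fresh: nothing to save
    for skill in needed:
        # classify() is hypothetical: read the full SKILL.md body, then classify.
        index["skills"][skill["name"]] = classify(skill)  # add or overwrite
    for name in deleted:
        index["skills"].pop(name, None)  # ignore names that are already absent
    index["scanned_at"] = datetime.now(timezone.utc).isoformat()
    return True
```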

Fallback

If scan_skills.py fails (Python not available, etc.):

Read skill-index.json directly and proceed with whatever data is available
Warn: "Skill index may be outdated."

Index freshness rule: Re-scan if scanned_at is more than 3 days old OR the user mentions installing or removing skills.
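The freshness rule as a small Python check. It assumes scanned_at is stored as an ISO 8601 timestamp with a UTC offset (as produced by isoformat() on a timezone-aware datetime); the user_changed_skills flag is a hypothetical stand-in for noticing that the user installed or removed skills.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=3)


def needs_rescan(scanned_at: Optional[str], user_changed_skills: bool = False,
                 now: Optional[datetime] = None) -> bool:
    """Stale after 3 days, or whenever the user reports skill changes."""
    if user_changed_skills or not scanned_at:
        return True  # explicit change, or no timestamp recorded yet
    now = now or datetime.now(timezone.utc)
    age = now - datetime.fromisoformat(scanned_at)  # needs an offset-aware timestamp
    return age > MAX_AGE
```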

Step 1: Plan

1. Parse the request into ordered, bite-sized tasks: each 2-5 minutes of work, one concrete action.
2. Assign each task: exact file paths, tool to use, expected output.
3. For each task, decide: invoke a skill (see Skill Matching below), or use direct tools.
4. Present the plan compactly:

Plan: <one-line summary> | Tasks: N | Mode: <mode>
-> Task 1..N: <brief sequence>
Skills: <list or "direct">

Plan file naming (PLAN / FULL modes): docs/plans/YYYY-MM-DD-<slug>.md
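One way to derive the plan path from the plan's one-line summary, sketched in Python. Only the docs/plans/YYYY-MM-DD-<slug>.md pattern comes from this skill; the slug rules (lowercase, runs of non-alphanumerics collapsed to "-", a 50-character cap, "plan" as a fallback) are assumptions.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def plan_path(summary: str, today: Optional[date] = None,
              root: str = "docs/plans") -> Path:
    """Build docs/plans/YYYY-MM-DD-<slug>.md from a one-line plan summary."""
    today = today or date.today()
    # Hypothetical slug rules: lowercase, collapse non-alphanumerics to "-", cap length.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")[:50].rstrip("-")
    return Path(root) / f"{today.isoformat()}-{slug or 'plan'}.md"
```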

Step 2: Get Approval

Use ExitPlanMode with allowedPrompts — the official Claude Code batch-approval mechanism. The user approves once, all listed operations are pre-authorized.

In AUTO mode: still call ExitPlanMode (required by the harness). For truly zero-prompt execution, pre-configure permissions.allow in settings.json:

```json
{
  "permissions": {
    "allow": [
      "Bash(git *)",
      "Bash(npm *)",
      "Bash(cargo *)",
      "Bash(rtk *)",
      "WebSearch",
      "WebFetch(*)"
    ]
  }
}
```

Use /update-config or /fewer-permission-prompts to build this list from actual usage.

Step 3: Execute in Batches

Adopted from executing-plans: batch of 3 tasks -> report -> checkpoint.

- FULL mode: execute batch -> report -> auto-continue to the next batch.
- BUILD mode: execute batch -> report -> wait for feedback before the next batch.
- AUTO mode: skip review gates entirely. Continue until done or blocked.

Re-evaluate after each batch: if a later task's inputs changed due to earlier results, update it before executing.

Per-task execution:

1. Announce: Task N/M: <action>
2. Invoke the matched skill via the Skill tool, or use direct tools
3. Run verification for the task
4. Mark complete with TaskUpdate

Track all tasks with TaskCreate / TaskUpdate (Claude Code's official task tracking).
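The batch loop and per-task steps above, sketched in Python. The callbacks are hypothetical: run_task covers invoking a skill or direct tools plus the task's verification, revise stands in for re-evaluating later tasks, wait_for_feedback models the BUILD checkpoint, and the done flag stands in for TaskUpdate. In this sketch FULL and AUTO execute identically; per the Modes table, their difference (presenting the plan versus planning internally) happens before execution.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    action: str
    done: bool = False  # stands in for TaskUpdate's completed status


def execute_plan(
    tasks: List[Task],
    mode: str,                                   # "FULL", "BUILD" or "AUTO"
    run_task: Callable[[Task], bool],            # hypothetical: do the work + verify
    revise: Callable[[List[Task]], None] = lambda remaining: None,
    wait_for_feedback: Callable[[], bool] = lambda: True,
    log: Callable[[str], None] = print,
    batch_size: int = 3,
) -> bool:
    """Execute tasks in batches; return True only if every task completed."""
    total = len(tasks)
    for start in range(0, total, batch_size):
        for n, task in enumerate(tasks[start:start + batch_size], start + 1):
            log(f"Task {n}/{total}: {task.action}")
            if not run_task(task):  # blocker or failed verification: STOP
                log(f"Blocked at task {n}/{total}; asking for clarification.")
                return False
            task.done = True
        log(f"Batch done: {sum(t.done for t in tasks)}/{total} tasks complete.")
        more = start + batch_size < total
        if mode == "BUILD" and more and not wait_for_feedback():
            return False  # the user asked to stop before the next batch
        if more:
            revise(tasks[start + batch_size:])  # update later tasks whose inputs changed
    return True
```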

Step 4: Verify

Iron law (from verification-before-completion):

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

For each task, and again at the final backpressure check (build, test, lint):

1. IDENTIFY: What command proves this claim?
2. RUN: Execute the full command (fresh, complete).
3. READ: Full output, check exit code, count failures.
4. VERIFY: Does the output confirm the claim?
5. ONLY THEN: Make the claim.

Never use "should work", "probably", or "seems to". Run the command. Read the output. Then claim.
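The gate above can be sketched in Python with subprocess.run, which captures the full output and exit code of a fresh run. The Evidence type and the function names are illustrative. Counting failures depends on the test runner, so this sketch only refuses a claim on a non-zero exit code.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    command: List[str]
    exit_code: int
    output: str  # stdout + stderr, kept whole so it can be read in full


def run_fresh(command: List[str], timeout: int = 600) -> Evidence:
    """RUN: execute the full command now instead of trusting an earlier result."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return Evidence(command, proc.returncode, proc.stdout + proc.stderr)


def make_claim(claim: str, evidence: Evidence) -> str:
    """VERIFY, then claim: refuse when the exit code is non-zero."""
    cmd = shlex.join(evidence.command)
    if evidence.exit_code != 0:
        raise RuntimeError(f"Cannot claim {claim!r}: `{cmd}` exited "
                           f"{evidence.exit_code}\n{evidence.output}")
    return f"{claim}. Evidence: `{cmd}` exited 0."
```

For example, make_claim("All tests pass", run_fresh(["npm", "test"])) either returns a claim that quotes the command or raises with the full output attached.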

When to Stop

Adopted from executing-plans — STOP immediately when:

Hit a blocker (missing dependency, test fails, instruction unclear)
Verification fails after 3 attempts with different approaches (per-task counter, resets each new task)
You don't understand an instruction

Ask for clarification rather than guessing. Don't force through blockers.
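A sketch of the per-task attempt counter in Python. It assumes each approach is a hypothetical callable that applies a different fix and returns whether verification passed. Because the counter lives inside one call, it resets automatically for the next task.

```python
from typing import Callable, List


class Blocked(Exception):
    """Signals STOP: ask the user instead of forcing through."""


def attempt_task(approaches: List[Callable[[], bool]], max_attempts: int = 3) -> int:
    """Return the 1-based number of the first approach whose verification passes."""
    tried = 0  # per-task counter: each call starts from zero
    for approach in approaches[:max_attempts]:
        tried += 1
        if approach():  # each approach applies a different fix, then verifies
            return tried
    raise Blocked(f"Verification failed after {tried} approaches; ask for clarification.")
```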

Skill Matching

The skill index at references/skill-index.json is the single source of truth. Match each task against the index using the algorithm below.

Matching Algorithm

Given a task with an operation tag:

1. Pre-filter: If the task is a simple file read, shell command, or local file search (Glob/Grep/Read), skip skills entirely and use direct tools.
2. Compatibility Gate: Remove skills whose prereqs are not met in the current environment (no git repo -> remove git-prereq skills, etc.)
3. Operation Filter: Keep only skills where at least one ops tag matches the task's operation. Hard gates:
   - explore:local tasks -> never match explore:web skills
   - create tasks -> never match review-only skills
   - design tasks -> never match execute-only skills
4. Rank Candidates (additive scoring):
   - +4: exact operation tag match
   - +3: domain matches task domain
   - +3 per word: skill name keywords appear in task description (up to +6)
   - +1: skill's use_for entries semantically match the task
   - -2 per word: skill's do_not_use_for entries semantically match the task
5. Select: Take the highest-ranked skill with score >= 3. If no skill scores >= 3, use direct tools.
6. Tiebreaker (when multiple skills have equal score):
   - Prefer the skill with fewer words in its name (simpler = more general-purpose)
   - Prefer the skill with more ops tags (broader applicability)
   - If still tied, pick the first alphabetically
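A Python sketch of the matching algorithm, assuming index entries shaped like the v2 schema and an env set holding the prereq tags satisfied in the current environment. Two parts are interpretations rather than spec: "review-only" and "execute-only" are read as skills whose ops contain nothing else (while any explore:web tag blocks an explore:local task), and "semantically match" is approximated by shared words of four or more letters, with the do_not_use_for penalty applied per shared word.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

EXECUTE_OPS = {"execute:local", "execute:remote"}


@dataclass
class Skill:
    name: str
    ops: List[str]
    domain: str
    prereqs: List[str]
    use_for: List[str]
    do_not_use_for: List[str]


@dataclass
class Task:
    op: str               # exactly one operation tag
    domain: str
    description: str
    simple: bool = False  # plain Read/Glob/Grep or a one-off shell command


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def name_words(name: str) -> List[str]:
    return [w for w in re.split(r"[-_: ]+", name.lower()) if w]


def gated(task: Task, skill: Skill) -> bool:
    """Hard gates (interpretation: 'X-only' means no other ops tags)."""
    ops = set(skill.ops)
    if task.op == "explore:local" and "explore:web" in ops:
        return True
    if task.op == "create" and ops == {"review"}:
        return True
    return task.op == "design" and ops <= EXECUTE_OPS


def score(task: Task, skill: Skill) -> int:
    task_words = words(task.description)

    def overlap(phrase: str) -> Set[str]:
        # Stand-in for "semantically match": shared words of 4+ letters.
        return {w for w in words(phrase) if len(w) >= 4} & task_words

    s = 4 if task.op in skill.ops else 0
    s += 3 if skill.domain == task.domain else 0
    hits = sum(w in task_words for w in name_words(skill.name))
    s += min(3 * hits, 6)
    s += 1 if any(overlap(p) for p in skill.use_for) else 0
    s -= 2 * sum(len(overlap(p)) for p in skill.do_not_use_for)
    return s


def match(task: Task, skills: Dict[str, Skill], env: Set[str]) -> Optional[str]:
    """Return the chosen skill name, or None to use direct tools."""
    if task.simple:                                   # 1. pre-filter
        return None
    candidates = [
        s for s in skills.values()
        if set(s.prereqs) <= env                      # 2. compatibility gate
        and task.op in s.ops and not gated(task, s)   # 3. operation filter
    ]
    ranked = sorted(                                  # 4. rank, 6. tiebreak
        candidates,
        key=lambda s: (-score(task, s), len(name_words(s.name)), -len(s.ops), s.name),
    )
    if ranked and score(task, ranked[0]) >= 3:        # 5. select
        return ranked[0].name
    return None
```

In the skill itself the semantic checks are judged by the model; the word overlap only keeps the sketch runnable.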

Standardized Tags

Operation Tags (assign exactly ONE per task)

| Tag | Meaning |
|-----|---------|
| create | Making new files, features, content from scratch |
| update | Modifying, refactoring, fixing existing things |
| review | Reading, analyzing, auditing, explaining |
| design | Planning, brainstorming, architecting, estimating |
| execute:local | Running local commands, builds, tests, scripts |
| execute:remote | Deploying, pushing, remote API calls |
| explore:local | Searching/reading local codebase |
| explore:web | Web research, external data fetching |

Domain Tags

meta backend frontend devops testing docs git security ml research utility performance

Prereq Tags

git git:diff web node python pip mcp api:anthropic

Classification Guidelines

When classifying a skill from its full SKILL.md body and source context:

ops (choose 1-4):

create if it produces new files/code/content
update if it modifies existing things
review if it reads, analyzes, audits, explains, or inspects
design if it plans, brainstorms, architects, or estimates
execute:local if it runs local commands (build, test, install, cli)
execute:remote if it deploys, pushes, or calls remote services
explore:local if it searches/reads the local codebase
explore:web if it does web searches or fetches external data

domain (exactly 1): Infer from keywords in the full SKILL.md body and from source context.

Plugin skills under "plugin-dev/" -> meta
CLI-anything skills -> utility
Skills mentioning React/UI/frontend/css -> frontend
Skills mentioning API/server/backend/database -> backend
Skills mentioning deploy/k8s/infra/docker -> devops
Skills mentioning test/verify/QA/TDD -> testing
Skills mentioning docs/write/presentation/README -> docs
Skills mentioning git/commit/PR/branch -> git
Skills mentioning security/vulnerability/audit -> security
Skills mentioning ML/model/training -> ml
Skills mentioning research/search/data -> research

prereqs (0-4): Infer from the full SKILL.md body. Git commands -> git. pip install -> pip. npm/node -> node. Web searches -> web. MCP tools -> mcp.

use_for (2-5 short phrases): What specific tasks does this skill handle well? Be specific.

do_not_use_for (1-3 short phrases): What common tasks would be a bad fit? Focus on likely mistakes.

Batch strategy: Classify in groups by source context for consistency (e.g., all sc-* skills together, all cli-anything-* together).

File-Based Memory

For tasks with 5+ steps:

| File | Purpose |
|------|---------|
| docs/plans/YYYY-MM-DD-<slug>.md | Tasks, progress, decisions |
| findings.md | Research discoveries |
| progress.md | Session log |

Reboot check (after context gaps): Read plan file -> check current phase -> resume from last completed task.

Completion

After all tasks are complete and verified (from finishing-a-development-branch):

1. Verify that the project's test command passes.
2. Run the final backpressure check (build, test, lint).
3. Report with evidence: All N tasks complete. <verification output>.