โ† Back to skills
extension
Category: Development & EngineeringNo API key required

doc-splitter

Split large documentation (10K+ pages) into focused sub-skills with intelligent routing. Use for massive doc sites like Godot, AWS, or MSDN.

personAuthor: jakexiaohubgithub

Documentation Splitter Skill

Purpose

Single responsibility: Split large documentation sites into multiple focused sub-skills with an optional router skill for intelligent navigation. (BP-4)

Grounding Checkpoint (Archetype 1 Mitigation)

Before executing, VERIFY:

  • [ ] Total page count is known (run estimation first)
  • [ ] Documentation categories are identifiable
  • [ ] Target skill size determined (default: 5,000 pages per skill)
  • [ ] Router strategy selected (category, size, or hybrid)

DO NOT split without understanding documentation structure.

Uncertainty Escalation (Archetype 2 Mitigation)

ASK USER instead of guessing when:

  • Category boundaries unclear
  • Optimal skill size uncertain for target use case
  • Cross-references between sections complicate splitting
  • Router vs flat structure decision needed

NEVER arbitrarily split - seek user guidance on boundaries.

Context Scope (Archetype 3 Mitigation)

| Context Type | Included | Excluded | |--------------|----------|----------| | RELEVANT | Doc structure, categories, page counts | Actual page content | | PERIPHERAL | Similar large doc examples | Other documentation | | DISTRACTOR | Content quality concerns | Individual page issues |

Size Guidelines

| Documentation Size | Recommendation | Strategy | |-------------------|----------------|----------| | < 5,000 pages | One skill | No splitting | | 5,000 - 10,000 pages | Consider splitting | Category-based | | 10,000 - 30,000 pages | Recommended | Router + Categories | | 30,000+ pages | Strongly recommended | Router + Categories |

Workflow Steps

Step 1: Estimate Documentation Size (Grounding)

# Quick estimation with skill-seekers
skill-seekers estimate configs/large-docs.json

# Output:
# ๐Ÿ“Š ESTIMATION RESULTS
# โœ… Pages Discovered: 28,450
# ๐Ÿ“ˆ Estimated Total: 32,000
# โฑ๏ธ  Time Elapsed: 2.1 minutes
# ๐Ÿ’ก Recommended: Split into 6-7 sub-skills

Step 2: Analyze Category Structure

# Identify natural category boundaries
skill-seekers analyze --config configs/large-docs.json --categories

# Output:
# Categories detected:
# - scripting: 8,200 pages
# - 2d: 5,400 pages
# - 3d: 9,100 pages
# - physics: 4,300 pages
# - networking: 2,800 pages
# - editor: 2,200 pages

Step 3: Choose Split Strategy

| Strategy | Best For | Description | |----------|----------|-------------| | category | Clear topic divisions | Split by documentation sections | | size | Uniform distribution | Split every N pages | | router | User navigation | Hub skill + specialized sub-skills | | hybrid | Complex docs | Categories + size limits per category |

Step 4: Execute Split

Option A: With skill-seekers

# Category-based split
skill-seekers split --config configs/godot.json --strategy category

# Router-based split (recommended for large docs)
skill-seekers split --config configs/godot.json --strategy router

# Size-based split
skill-seekers split --config configs/godot.json --strategy size --pages-per-skill 5000

Option B: Manual split configuration

{
  "name": "godot",
  "max_pages": 40000,
  "split_strategy": "router",
  "split_config": {
    "target_pages_per_skill": 5000,
    "create_router": true,
    "categories": {
      "scripting": {
        "patterns": ["/scripting/", "/gdscript/", "/c_sharp/"],
        "max_pages": 8000
      },
      "2d": {
        "patterns": ["/2d/", "/sprite/", "/tilemap/"],
        "max_pages": 6000
      },
      "3d": {
        "patterns": ["/3d/", "/mesh/", "/spatial/"],
        "max_pages": 10000
      },
      "physics": {
        "patterns": ["/physics/", "/collision/", "/rigidbody/"],
        "max_pages": 5000
      }
    }
  }
}

Step 5: Scrape Sub-Skills

# Scrape all sub-skills in parallel
for config in configs/godot-*.json; do
  skill-seekers scrape --config $config &
done
wait

# Or sequentially with progress
for config in configs/godot-*.json; do
  echo "Processing: $config"
  skill-seekers scrape --config $config
done

Step 6: Generate Router Skill

# Auto-generate router from sub-skills
skill-seekers generate-router configs/godot-*.json

# Creates godot-router skill that intelligently routes queries

Step 7: Validate Split Results

# Check sub-skill sizes
for dir in output/godot-*/; do
  echo "$dir: $(find $dir -name "*.md" | wc -l) files"
done

# Verify router coverage
cat output/godot-router/SKILL.md | grep -A 50 "## Sub-Skills"

Recovery Protocol (Archetype 4 Mitigation)

On error:

  1. PAUSE - Note which sub-skill failed
  2. DIAGNOSE - Check error type:
    • Category overlap โ†’ Refine URL patterns
    • Uneven split โ†’ Adjust page limits
    • Orphan pages โ†’ Add catch-all category
    • Router incomplete โ†’ Regenerate after all sub-skills done
  3. ADAPT - Modify split configuration
  4. RETRY - Re-split affected category (max 3 attempts)
  5. ESCALATE - Present split preview, ask user for boundary adjustments

Checkpoint Support

State saved to: .aiwg/working/checkpoints/doc-splitter/

checkpoints/doc-splitter/
โ”œโ”€โ”€ estimation.json         # Page count results
โ”œโ”€โ”€ category_analysis.json  # Category breakdown
โ”œโ”€โ”€ split_plan.json         # Planned split configuration
โ”œโ”€โ”€ progress/
โ”‚   โ”œโ”€โ”€ godot-scripting.json
โ”‚   โ”œโ”€โ”€ godot-2d.json
โ”‚   โ””โ”€โ”€ ...
โ””โ”€โ”€ router_draft.md         # Router skill draft

Output Structure

After splitting large documentation:

configs/
โ”œโ”€โ”€ godot.json              # Original config
โ”œโ”€โ”€ godot-scripting.json    # Generated sub-config
โ”œโ”€โ”€ godot-2d.json
โ”œโ”€โ”€ godot-3d.json
โ”œโ”€โ”€ godot-physics.json
โ””โ”€โ”€ godot-router.json       # Router config

output/
โ”œโ”€โ”€ godot-scripting/        # Sub-skill
โ”‚   โ”œโ”€โ”€ SKILL.md
โ”‚   โ””โ”€โ”€ references/
โ”œโ”€โ”€ godot-2d/               # Sub-skill
โ”œโ”€โ”€ godot-3d/               # Sub-skill
โ”œโ”€โ”€ godot-physics/          # Sub-skill
โ””โ”€โ”€ godot-router/           # Router skill
    โ”œโ”€โ”€ SKILL.md            # Routing logic
    โ””โ”€โ”€ references/
        โ””โ”€โ”€ routing-table.md

Router Skill Structure

The generated router skill:

# Godot Documentation Router

## Purpose
Route queries to the appropriate specialized Godot sub-skill.

## Sub-Skills

| Topic | Skill | Coverage |
|-------|-------|----------|
| GDScript, C#, scripting patterns | godot-scripting | 8,200 pages |
| 2D graphics, sprites, tilemaps | godot-2d | 5,400 pages |
| 3D graphics, meshes, materials | godot-3d | 9,100 pages |
| Physics, collisions, rigid bodies | godot-physics | 4,300 pages |

## Routing Rules

1. **Scripting questions** โ†’ godot-scripting
   - Keywords: script, gdscript, c#, function, variable, class

2. **2D graphics questions** โ†’ godot-2d
   - Keywords: sprite, 2d, tilemap, animation2d, canvas

3. **3D graphics questions** โ†’ godot-3d
   - Keywords: mesh, 3d, spatial, material, shader, camera3d

4. **Physics questions** โ†’ godot-physics
   - Keywords: physics, collision, rigidbody, area, raycast

## Usage

Ask your question naturally. This router will direct you to the appropriate specialized skill.

Example:
- "How do I create a player movement script?" โ†’ godot-scripting
- "How do I set up tilemap collisions?" โ†’ godot-2d
- "How do I apply materials to a mesh?" โ†’ godot-3d

Troubleshooting

| Issue | Diagnosis | Solution | |-------|-----------|----------| | Uneven splits | Category size varies | Use hybrid strategy with max_pages | | Orphan pages | URL patterns incomplete | Add catch-all or refine patterns | | Router confusion | Overlapping keywords | Make routing rules more specific | | Too many skills | Over-segmented | Merge related categories |

References

  • Skill Seekers Large Documentation: https://github.com/jmagly/Skill_Seekers/blob/main/docs/LARGE_DOCUMENTATION.md
  • REF-001: Production-Grade Agentic Workflows (BP-4, BP-9 KISS)
  • REF-002: LLM Failure Modes (Archetype 3 context filtering, Archetype 4 recovery)