Executing Spikes

Overview

Execute one approach from a spike definition autonomously, exploring feasibility without asking permission or polishing code.

Core principle: Fast exploration over clean code. Prove it works, don't perfect it.

Announce at start: "I'm using the Executing Spikes skill to explore approach N."

Context Override Authority

This skill explicitly overrides:

TDD rules (skip test-first discipline, prove it works however is fastest)
Code quality rules (messy code is GOOD during spikes)
"Ask for permission" rules (make decisions autonomously, document assumptions)

These overrides are NOT exceptions that need permission - they are the correct behavior for spike execution.

The Process

Copy this checklist to track progress:

Executing Spikes Progress:
- [ ] Phase 1: Setup Worktree (worktree created, data stores isolated)
- [ ] Phase 2: Load Spike Definition & Choose Approach (spike notes read, approach chosen)
- [ ] Phase 3: Autonomous Exploration (implementation complete, quick-and-dirty code)
- [ ] Phase 4: Proving It Works (test script runs, output captured)
- [ ] Phase 5: Push Until Natural Stop (reached natural stopping point)
- [ ] Phase 6: Discovery Report (findings documented, work committed)

Phase 1: Setup Worktree

Announce: "I'm using the Using Git Worktrees skill to set up spike workspace."
Use skills/collaboration/using-git-worktrees
Branch from spike-[canonical-name], creating spike-[canonical-name]-N
Partner tells you which number to use (1, 2, 3...)

Data Store Isolation (Any Project with Databases/State)

CRITICAL: Each spike must use its own data stores to prevent parallel spikes from conflicting.

Applies to: PostgreSQL, MySQL, SQLite files, Redis databases, MongoDB collections, etc.

Before creating schema or running migrations, verify isolation:

For Rails projects, check both development AND test databases:

# Check which database each environment will use
bin/rails db:migrate:status
RAILS_ENV=test bin/rails db:migrate:status

# Expected: the "database:" line in the output should be spike-specific
# ✅ Good: spike_overlay_data_model_2_development
# ✅ Good: spike_overlay_data_model_2_test
# ❌ Bad: myapp_development (shared across all spikes)

For other frameworks, verify equivalent isolation mechanism exists.

If data stores are NOT isolated:

STOP and implement isolation (check config for branch/worktree-based naming)
If you cannot figure out how to isolate data stores, STOP and ask partner for guidance before proceeding
Do NOT proceed with shared data stores - parallel spikes will conflict

Why critical: Without isolation, parallel spikes will drop each other's tables/collections, and you will waste hours debugging phantom failures that only occur when multiple spikes run simultaneously.
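
One way to get branch-based naming is to derive the database name from the spike branch. The sketch below is a hypothetical helper, not part of any framework: the function name `spike_db_name` and the naming scheme are assumptions chosen to match the "Good" examples above, and you would still need to wire the result into your project's database configuration.

```python
import re


def spike_db_name(branch: str, env: str) -> str:
    """Build a per-spike database name from a branch name (hypothetical scheme).

    Assumes branches look like spike-[canonical-name]-N.
    """
    if not re.fullmatch(r"spike-[a-z0-9-]+-\d+", branch):
        raise ValueError(f"not a numbered spike branch: {branch!r}")
    # Hyphens require quoting in many SQL databases, so use underscores instead.
    return f"{branch.replace('-', '_')}_{env}"


if __name__ == "__main__":
    for env in ("development", "test"):
        print(spike_db_name("spike-overlay-data-model-2", env))
```

Raising on anything that is not a numbered spike branch means a misconfigured worktree fails loudly instead of silently falling back to a shared database.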

Phase 2: Load Spike Definition & Choose Approach

Read spike-notes-[canonical-name].md from the base spike branch
Copy to your worktree if needed
Extract approach number from branch name
- Example: spike-replace-3d-vectors-2 → approach 2
If that numbered approach exists in notes: use it
If that numbered approach doesn't exist: Create one, document it in spike notes
Document your chosen approach details
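
Extracting the approach number can be sketched as a small parser. This assumes the branch layout from Phase 1 (spike-[canonical-name]-N); `parse_spike_branch` and `BRANCH_PATTERN` are illustrative names, not part of the skill.

```python
import re

# Assumed branch layout from Phase 1: spike-[canonical-name]-N
BRANCH_PATTERN = re.compile(r"^spike-(?P<name>.+)-(?P<approach>\d+)$")


def parse_spike_branch(branch: str) -> tuple[str, int]:
    """Return (canonical_name, approach_number) for a numbered spike branch."""
    match = BRANCH_PATTERN.match(branch)
    if match is None:
        raise ValueError(f"not a numbered spike branch: {branch!r}")
    return match.group("name"), int(match.group("approach"))


if __name__ == "__main__":
    name, approach = parse_spike_branch("spike-replace-3d-vectors-2")
    print(f"spike-notes-{name}.md, approach {approach}")
```

Only the trailing run of digits is treated as the approach number, so digits inside the canonical name (such as "3d") are left alone. The base branch, which has no numeric suffix, is rejected.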

Phase 3: Autonomous Exploration

Execute independently:

Make ALL decisions yourself (library choices, architecture, error handling)
Document assumptions in spike notes
Quick-and-dirty over clean code
Duplication is fine, inconsistent naming is fine, messy code is GOOD
Don't stop to validate choices
Don't ask for permission
Push through minor obstacles with workarounds

Code Quality Expectations for Spikes:

✅ GOOD: Duplicated code across 3 places
✅ GOOD: Inconsistent naming
✅ GOOD: Quick hacks and workarounds
✅ GOOD: Copy-pasted code
✅ GOOD: Hardcoded values
❌ BAD: Spending time refactoring
❌ BAD: Extracting shared functions
❌ BAD: Consistent abstractions
❌ BAD: "Clean" code

The goal is learning speed, not maintainable code.

Phase 4: Proving It Works (Critical)

Your spike MUST actually run and do something.

Minimum requirement: Create executable test script

Create a test file that can be run with a single command:
- test_spike.rb / test_spike.py / test.sh / npm run spike-test
- Should test ALL scenarios from spike definition
- Must print clear output showing pass/fail
Run it and capture output:
- Don't just write the tests - RUN THEM
- Copy actual output into your report
- Output is proof you didn't just write code that "looks right"
Test script should:
- Setup test data
- Exercise the spike's core functionality
- Print results for each scenario
- Use ✅/❌ or PASS/FAIL markers for clarity

Example test script output:

=== Testing Scenario 1: Base entity ===
✅ Loaded entity: {"name": "Bran", ...}

=== Testing Scenario 2: With overlay ===
✅ Applied overlay, got: {"name": "Bran", "items": ["mace"], ...}

=== Testing Scenario 3: Mutual exclusivity ===
✅ Validation rejected conflicting overlays
Error: "recently-bubbled and 100-years-bubbled are mutually exclusive"
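
A minimal sketch of a test_spike.py that prints output in roughly this shape is shown below. `load_entity` and `apply_overlay` are hypothetical placeholders standing in for whatever your spike actually implements; only the runner pattern (one check per scenario, one marker per result) is the point.

```python
def load_entity(name):
    # Placeholder for the spike's real entry point (hypothetical).
    return {"name": name, "items": []}


def apply_overlay(entity, overlay):
    # Placeholder: overlay keys win over the base entity (hypothetical).
    return {**entity, **overlay}


def run_scenarios(scenarios):
    """Run each (title, check) pair, print a marker per scenario, return the failure count."""
    failures = 0
    for number, (title, check) in enumerate(scenarios, start=1):
        print(f"=== Testing Scenario {number}: {title} ===")
        try:
            detail = check()
            print(f"✅ {detail}")
        except Exception as exc:  # a spike script only needs to report, not recover
            failures += 1
            print(f"❌ {type(exc).__name__}: {exc}")
        print()
    return failures


def check_base_entity():
    entity = load_entity("Bran")
    assert entity["name"] == "Bran"
    return f"Loaded entity: {entity}"


def check_overlay():
    entity = apply_overlay(load_entity("Bran"), {"items": ["mace"]})
    assert entity["items"] == ["mace"]
    return f"Applied overlay, got: {entity}"


SCENARIOS = [
    ("Base entity", check_base_entity),
    ("With overlay", check_overlay),
]

if __name__ == "__main__":
    failed = run_scenarios(SCENARIOS)
    print(f"{len(SCENARIOS) - failed}/{len(SCENARIOS)} scenarios passed")
```

Run it with `python test_spike.py` and paste the printed output into the report. Catching every exception per scenario keeps one failure from hiding the results of the remaining scenarios.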

Choose fastest validation method:

Quick validation (prefer these):

Test script that exercises all scenarios (recommended)
Manual testing with documented steps + output
Print statements showing data flow
Simple integration showing end-to-end works

Automated tests (use if already faster):

Integration tests proving happy path
Tests as executable documentation

TDD discipline (SKIP THIS):

❌ Test-first workflow
❌ Comprehensive coverage
❌ Testing edge cases exhaustively
❌ RED-GREEN-REFACTOR cycle

The rule: Your spike must work - run it and prove it. Use whatever validation is fastest.

Red flags:

❌ "The code looks correct" → Run it
❌ "I tested it mentally" → Run it
❌ "Logic is sound" → Run it
❌ Writing report without running code → Stop, run it first

In your report, include:

Path to test script
Command to run it
Full output (or representative sample if very long)
Mapping of output to spike test scenarios

Phase 5: Push Until Natural Stop

Stop when:

Feature works end-to-end and you've proven it (success!)
Hit genuine blocker you can't work around (missing system dependency, fundamental incompatibility)
Discovered approach won't work (fundamental design flaw)
Reasonable effort expended (roughly 2-3 hours of exploration)

Don't stop when:

Code is messy (that's fine - this is exploratory)
Hit a minor error (try workaround first)
Unsure if approach is "right" (keep going, that's not the spike's purpose)
Want to check if design is okay (make the call yourself)
Want to refactor (skip it entirely)
Tests are incomplete (you're not doing TDD)

Phase 6: Discovery Report

Create a detailed spike report following the standardized template in reference/report-template.md.

Key requirements:

12 required sections covering implementation, results, evaluation, and next steps
File name: SPIKE_FINDINGS_APPROACH_N.md
Evidence-based: Include actual test output, not paraphrases
Weighted scoring: Use criteria from spike definition (if provided)
Proof of work: Executable test script + actual output demonstrating it works
Git workflow: Commit all code and report, don't push unless requested
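
As an illustration of showing the weighted calculation, the sketch below computes a weight-normalized average. The criteria names, weights, and scores are hypothetical; the real criteria come from the spike definition, and the report template may prescribe a different formula, in which case follow the template.

```python
def weighted_score(weights, scores):
    """Return the weight-normalized average of per-criterion scores."""
    if set(weights) != set(scores):
        raise ValueError("weights and scores must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * scores[name] for name in weights) / total_weight


if __name__ == "__main__":
    # Hypothetical criteria and numbers, for illustration only.
    weights = {"feasibility": 3, "complexity": 2, "performance": 1}
    scores = {"feasibility": 4, "complexity": 3, "performance": 5}
    for name in weights:
        print(f"{name}: {weights[name]} x {scores[name]} = {weights[name] * scores[name]}")
    print(f"weighted score: {weighted_score(weights, scores):.2f}")
```

With these example numbers the products are 12, 6, and 5, so the score is 23 / 6, or about 3.83. Printing each product alongside the total is what "calculation shown" means in practice.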

Critical:

No comparisons to other spike approaches (you don't know what they did yet)
Include objective criteria: "Works best when X, avoid when Y"
Be honest about tradeoffs and limitations

See the full template for detailed structure and examples.

Autonomy: When to Ask vs When to Decide

Ask partner when:

Hit genuine blocker (missing system dependency, fundamental incompatibility)
Cannot isolate data stores and unsure how to proceed
Spike notes file is missing or corrupted
Need clarification on spike goal/constraints

Decide independently when:

Which library to use → Pick one, document choice
How to structure code → Quick-and-dirty wins
Whether to refactor messy code → Don't refactor
How to handle an error → Try workaround
What "good enough" looks like → Working code is enough
How to prove it works → Manual test, automated test, or script: pick whichever is fastest
Library version conflicts → Use what works, document it
Whether to add caching/pooling/metrics → Make the call, document it
How thorough to be → Push until natural stop
TTL values, configuration, connection settings → Pick reasonable defaults
Database naming/isolation strategy → Implement it, document it
Test script format → Whatever proves it works fastest

If you're asking "Should I ask about X?" - the answer is: decide and document.

Report format questions:

Don't ask "Should I include X in my report?" → Follow the template
Don't ask "Is this enough detail?" → Template specifies what's needed
Do ask if template section doesn't make sense for your spike type

Red Flags - STOP and Course Correct

If you catch yourself doing these, you're NOT executing a spike correctly:

Asking validation questions → "Should I use library X?" → NO, decide and document
Refactoring messy code → "This duplication should be cleaned up" → NO, keep pushing
Following TDD → "Let me write the test first" → NO, prove it works however is fastest
Polishing code → "Let me make this cleaner" → NO, messy is good
Not running code → "The logic looks correct" → NO, run it and prove it
Seeking permission → "Is it okay to use Docker?" → NO, use it and document
Second-guessing scope → "Should I explore additional aspects?" → Push until natural stop

All of these mean: You're applying production standards to exploratory work.

Common Rationalizations to Resist

| Excuse | Reality |
|--------|---------|
| "The code quality rules are absolute" | Spike context overrides code quality rules |
| "I need permission to deviate from rules" | Spike execution IS permission to be messy |
| "Messy code makes it harder to add features" | That's acceptable for spikes - we're learning, not building |
| "Should refactor before continuing" | NO - refactoring time = lost exploration time |
| "TDD rule says MUST for every feature" | Spikes are not features - they're throwaway exploration |
| "Need permission to skip TDD" | This skill grants that permission explicitly |
| "When in doubt, follow the written rules" | This skill IS the written rules for spikes |
| "Doing it right is better than doing it fast" | For spikes: fast learning beats correctness |
| "Should I check if this approach is okay?" | Make decision, document assumption, move on |
| "This is getting messy, I should clean it up" | Messy is GOOD - it means you're exploring fast |
| "The code looks right, no need to run it" | Assumption ≠ proof. Run it. |
| "I could have been scrappier" | Then BE scrappier - that's what spikes demand |

Completion Verification

Before reporting to your partner that the spike is complete, copy this checklist and verify ALL of these so nothing is skipped:

Spike Completion Verification:

**Setup:**
- ✅ Data stores are isolated (checked with status command)
- ✅ Working in correct spike worktree
- ✅ Database/state won't conflict with other spikes

**Implementation:**
- ✅ Code actually runs (not just "looks right")
- ✅ Test script exists and executes
- ✅ Test output captured
- ✅ All spike definition scenarios tested

**Report:**
- ✅ Used standardized template (12 required sections)
- ✅ Included weighted scoring with calculation shown
- ✅ Test results map to ALL spike scenarios
- ✅ Time breakdown included
- ✅ Interface/usage design documented (if applicable)
- ✅ Evidence included for every claim
- ✅ Actual test output pasted (not paraphrased)
- ✅ No comparisons to other spike approaches
- ✅ Code quality self-assessment included

**Git:**
- ✅ All work committed
- ✅ Report file committed
- ✅ Commit message follows format

**Red Flags - Stop and Fix:**
- ❌ Report says "it works" but no test output shown
- ❌ Report compares to other approaches ("better than Approach X")
- ❌ Didn't actually run the code
- ❌ Test script doesn't exist or doesn't run
- ❌ Report missing required sections from template
- ❌ No weighted scoring calculation
- ❌ Database isolation not verified

Common New Pitfalls to Avoid

With the updated guidance, watch for these new failure modes:

| Pitfall | Reality |
|---------|---------|
| "I'll just use shared database, it's simpler" | NO - will break parallel spikes |
| "Report template doesn't fit my spike" | Template is generic - adapt sections, don't skip |
| "Scoring is too subjective" | Show your reasoning - subjective with justification is fine |
| "Test script is too much overhead" | All three spikes created them naturally - it's not overhead |
| "I'll skip the weighted calculation" | Required - makes approaches comparable |
| "My spike doesn't have an interface" | Then write "Not applicable" - don't skip the section |
| "I'll compare to other approaches in my report" | NO - comparison happens after all spikes |
| "Test output is too long to include" | Include representative sample with note about full output |

When NOT to Use This Skill

Don't use for:

Production features (use skills/testing/test-driven-development)
Well-defined implementations (use skills/collaboration/executing-plans)
Code that will be merged as-is (spikes are throwaway exploration)
Learning a codebase (use exploration/research skills)

Ask partner: "Is this actually a spike, or should we build this properly with TDD?"

Related Skills

Before spike execution:

skills/collaboration/defining-spikes (creates the spike definition)
skills/collaboration/using-git-worktrees (sets up isolated workspace)

During exploration:

skills/problem-solving/collision-zone-thinking (if stuck in conventional thinking)

After spike:

skills/collaboration/requesting-code-review (if approach is viable and will be productionized)

Remember

Messy code is GOOD during spikes
Make decisions autonomously, document assumptions
Prove it works (run it!), don't perfect it
Skip TDD discipline, use fastest validation
Don't refactor during exploration
Stop at natural stopping points
Report with evidence ("I ran X, got Y")
Use standardized report template for comparability
Isolate data stores to avoid parallel spike conflicts