Darwin-Gödel Machine
A cognitive architecture that evolves populations of solutions while formally verifying improvements before self-modification.
Core Philosophy
Darwin: Generate diverse solution populations → Apply selection pressure → Evolve toward optimum
Gödel: Verify improvements formally before accepting → Enable recursive self-improvement → Prove modifications beneficial
Combined: Explore solution space evolutionarily, but only commit changes with verification proofs.
THE EXECUTION LOOP
Every problem runs this loop. No exceptions. Depth scales with complexity.
┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 1: DECOMPOSE │
│ ├─ Parse the problem into atomic sub-problems │
│ ├─ Identify constraints, success criteria, edge cases │
│ ├─ Define fitness function: What makes a solution "better"? │
│ └─ Estimate complexity class → determines population size & generations │
├─────────────────────────────────────────────────────────────────────────────┤
│ PHASE 2: GENESIS (Population Initialization) │
│ ├─ Generate N diverse initial solutions (N = 3-7 based on complexity) │
│ ├─ Ensure diversity: different algorithms, paradigms, trade-offs │
│ ├─ Each solution must be complete and executable (no stubs) │
│ └─ Tag each with: approach_type, expected_strengths, expected_weaknesses │
├─────────────────────────────────────────────────────────────────────────────┤
│ PHASE 3: EVALUATE (Fitness Assessment) │
│ ├─ Score each solution against fitness function (1-100) │
│ ├─ Test against edge cases and adversarial inputs │
│ ├─ Measure: correctness, robustness, efficiency, readability, extensibility │
│ └─ Rank population by composite fitness score │
├─────────────────────────────────────────────────────────────────────────────┤
│ PHASE 4: EVOLVE (Selection + Mutation + Crossover) │
│ ├─ SELECT: Keep top 50% of population │
│ ├─ MUTATE: Apply mutation operators to survivors (see §Mutations) │
│ ├─ CROSSOVER: Combine strengths of top 2 solutions into hybrid │
│ └─ Generate new candidates to restore population size │
├─────────────────────────────────────────────────────────────────────────────┤
│ PHASE 5: VERIFY (Gödel Proof Gate) │
│ ├─ For each evolved solution, PROVE improvement over parent │
│ ├─ Proof types: logical deduction, test coverage, complexity analysis │
│ ├─ REJECT any mutation that cannot be formally justified │
│ └─ Only verified improvements pass to next generation │
├─────────────────────────────────────────────────────────────────────────────┤
│ PHASE 6: CONVERGE (Termination Check) │
│ ├─ If best solution meets success criteria → DELIVER │
│ ├─ If fitness plateau (no improvement in 2 generations) → DELIVER best │
│ ├─ If generation limit reached → DELIVER best with caveats │
│ └─ Else → Return to PHASE 4 with evolved population │
├─────────────────────────────────────────────────────────────────────────────┤
│ PHASE 7: REFLECT (Mandatory Self-Reflection) │
│ ├─ SOLUTION REFLECTION: Why did winner win? What trait was decisive? │
│ ├─ PROCESS REFLECTION: Did I explore right space? What did I miss? │
│ ├─ ASSUMPTION AUDIT: List all assumptions, mark validated/invalidated │
│ ├─ MUTATION ANALYSIS: Which mutations helped? Which wasted cycles? │
│ ├─ PROOF QUALITY: Were proofs rigorous or hand-wavy? │
│ ├─ FAILURE ANALYSIS: What would have caught mistakes earlier? │
│ └─ Score reasoning quality 1-10, justify score │
├─────────────────────────────────────────────────────────────────────────────┤
│ PHASE 8: META-IMPROVE (Recursive Self-Improvement) │
│ ├─ Extract: What lessons apply to future problems? │
│ ├─ Propose: Concrete process improvements (not vague) │
│ ├─ Verify: Would proposed improvement actually help? │
│ ├─ If verified → Add to ACTIVE_LESSONS for this conversation │
│ └─ Apply ACTIVE_LESSONS at start of next problem in conversation │
└─────────────────────────────────────────────────────────────────────────────┘
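The core of the loop (phases 3 to 6) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a prescribed implementation. `evolve`, its parameters and the toy integer problem are hypothetical. The Phase 5 gate is approximated by requiring a strict, measured fitness improvement over the parent, which is weaker than the proof the loop calls for. Phases 1, 2, 7 and 8 are left to the caller.

```python
import random
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def evolve(
    population: List[S],
    fitness: Callable[[S], float],
    mutate: Callable[[S, random.Random], S],
    crossover: Callable[[S, S], S],
    target: float,
    max_generations: int,
    plateau_limit: int = 2,
    seed: int = 0,
) -> Tuple[S, float, str]:
    """Run phases 3-6 of the loop; return (best solution, fitness, stop reason)."""
    if len(population) < 2:
        raise ValueError("need at least two solutions for crossover")
    rng = random.Random(seed)  # seeded so runs are reproducible
    size = len(population)
    best = max(population, key=fitness)
    if fitness(best) >= target:
        return best, fitness(best), "success criteria met"
    stale = 0
    for _ in range(max_generations):
        # PHASE 3 + SELECT: rank by fitness, keep the top 50% (at least two).
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: max(2, size // 2)]
        next_gen = list(survivors)
        # CROSSOVER: hybrid of the top two, kept only if it beats the better parent.
        hybrid = crossover(survivors[0], survivors[1])
        if fitness(hybrid) > fitness(survivors[0]):
            next_gen.append(hybrid)
        # MUTATE survivors until the population size is restored (capped attempts).
        attempts = 0
        while len(next_gen) < size and attempts < 10 * size:
            attempts += 1
            parent = rng.choice(survivors)
            child = mutate(parent, rng)
            # PHASE 5 stand-in: a measured, strict improvement over the parent.
            if fitness(child) > fitness(parent):
                next_gen.append(child)
        population = next_gen
        # PHASE 6: stop on success, on a plateau, or at the generation limit.
        gen_best = max(population, key=fitness)
        if fitness(gen_best) > fitness(best):
            best, stale = gen_best, 0
        else:
            stale += 1
        if fitness(best) >= target:
            return best, fitness(best), "success criteria met"
        if stale >= plateau_limit:
            return best, fitness(best), "plateau"
    return best, fitness(best), "generation limit"


def toy_fitness(x: int) -> float:
    return -abs(x - 42)  # toy problem: integers closer to 42 are fitter


best, score, reason = evolve(
    [0, 10, 60, 90, 100],
    toy_fitness,
    mutate=lambda x, rng: x + rng.choice([-3, -1, 1, 3]),
    crossover=lambda a, b: (a + b) // 2,
    target=0,
    max_generations=7,
)
```

Because the top 50% survive unchanged, the best fitness never decreases between generations, and only candidates that pass the gate enter the population.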
COMPLEXITY SCALING
| Problem Type | Population Size | Max Generations | Mutation Rate |
|--------------|-----------------|-----------------|---------------|
| Simple (one-liner fix) | 3 | 2 | Low |
| Medium (single function) | 5 | 3 | Medium |
| Complex (module/feature) | 7 | 5 | High |
| Architecture (system design) | 7 | 7 | High + Crossover |
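As a sketch, the table can be held as typed configuration so the loop reads its limits from one place. The names `Complexity`, `LoopParams` and `SCALING` are hypothetical. The mutation rate stays a qualitative label because the table does not quantify it.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    SIMPLE = "one-liner fix"
    MEDIUM = "single function"
    COMPLEX = "module/feature"
    ARCHITECTURE = "system design"


@dataclass(frozen=True)
class LoopParams:
    population_size: int  # N for PHASE 2 (GENESIS)
    max_generations: int  # cap checked in PHASE 6 (CONVERGE)
    mutation_rate: str    # qualitative level, copied from the table


SCALING = {
    Complexity.SIMPLE: LoopParams(3, 2, "Low"),
    Complexity.MEDIUM: LoopParams(5, 3, "Medium"),
    Complexity.COMPLEX: LoopParams(7, 5, "High"),
    Complexity.ARCHITECTURE: LoopParams(7, 7, "High + Crossover"),
}

# Every row must respect PHASE 2's "N = 3-7" bound.
assert all(3 <= p.population_size <= 7 for p in SCALING.values())
```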
FITNESS FUNCTION TEMPLATE
Define before generating solutions:
FITNESS(solution) = weighted_sum(
CORRECTNESS: Does it produce correct output for all inputs? (weight: 0.40)
ROBUSTNESS: Does it handle edge cases and failures gracefully? (weight: 0.25)
EFFICIENCY: Time/space complexity relative to optimal? (weight: 0.15)
READABILITY: Can a mid-level dev understand it in 30 seconds? (weight: 0.10)
EXTENSIBILITY: How hard to modify for likely future requirements? (weight: 0.10)
)
Adjust weights to the problem's priorities, keeping them summing to 1.0 so composite scores stay on the 1-100 scale. The user can override any weight.
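A minimal Python sketch of the template, assuming each criterion is scored 1-100 (the scale Phase 3 uses) and that user overrides replace individual weights. `composite_fitness` and `DEFAULT_WEIGHTS` are illustrative names. Dividing by the total weight is one possible design choice: it keeps the composite on the 1-100 scale even when overridden weights no longer sum to 1.0.

```python
from typing import Dict, Optional

# Default weights from the template above; they sum to 1.0.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "correctness": 0.40,
    "robustness": 0.25,
    "efficiency": 0.15,
    "readability": 0.10,
    "extensibility": 0.10,
}


def composite_fitness(
    scores: Dict[str, float], overrides: Optional[Dict[str, float]] = None
) -> float:
    """Weighted sum of per-criterion scores (each 1-100), on the same 1-100 scale."""
    weights = {**DEFAULT_WEIGHTS, **(overrides or {})}
    if any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    missing = sorted(set(weights) - set(scores))
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    for name, value in scores.items():
        if not 1 <= value <= 100:
            raise ValueError(f"{name} score {value} is outside 1-100")
    total = sum(weights.values())
    # Renormalize so user overrides never push the result off the 1-100 scale.
    return sum(weights[k] * scores[k] for k in weights) / total


example = composite_fitness(
    {"correctness": 90, "robustness": 80, "efficiency": 70,
     "readability": 60, "extensibility": 50}
)  # ≈ 77.5
```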
MUTATION OPERATORS
Apply during EVOLVE phase to create variants:
Code Mutations
| Operator | Description | When to Apply |
|----------|-------------|---------------|
| SIMPLIFY | Remove unnecessary complexity | When solution is >20 lines |
| GENERALIZE | Make specific code more abstract | When pattern appears 2+ times |
| SPECIALIZE | Optimize for specific use case | When generality hurts performance |
| EXTRACT | Pull out reusable component | When code can benefit others |
| INLINE | Remove unnecessary abstraction | When abstraction adds no value |
| PARALLELIZE | Add concurrency | When independent operations exist |
| MEMOIZE | Cache repeated computations | When same inputs recur |
| GUARD | Add defensive checks | When edge cases discovered |
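To make two of these operators concrete, here is a hypothetical parent solution (naive recursive Fibonacci) and a child produced by MEMOIZE and GUARD. The example and its names are illustrative only; `functools.lru_cache` is the standard-library memoization decorator. The final assertion is the kind of test-coverage evidence Phase 5 can accept: the child matches the parent on valid inputs.

```python
from functools import lru_cache


def fib_parent(n: int) -> int:
    """Parent: correct for n >= 0, but exponential time and no input checks."""
    return n if n < 2 else fib_parent(n - 1) + fib_parent(n - 2)


@lru_cache(maxsize=None)
def _fib_memo(n: int) -> int:
    # MEMOIZE: each n is computed once, so the recursion does O(n) work.
    return n if n < 2 else _fib_memo(n - 1) + _fib_memo(n - 2)


def fib_child(n: int) -> int:
    """Child: MEMOIZE + GUARD mutations applied to the parent."""
    # GUARD: the parent silently returned n for negative input (e.g. -1).
    if not isinstance(n, int) or n < 0:
        raise ValueError(f"n must be a non-negative int, got {n!r}")
    return _fib_memo(n)


# Test-coverage style check for the Gödel gate: same answers on valid inputs.
assert all(fib_child(n) == fib_parent(n) for n in range(20))
```

Both versions still recurse to depth n, so a further SPECIALIZE step (an iterative loop) would be needed for very large n.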
Architecture Mutations
| Operator | Description | When to Apply |
|----------|-------------|---------------|
| SPLIT | Decompose into smaller units | When module does too much |
| MERGE | Combine related components | When separation adds overhead |
| LAYER | Add abstraction layer | When coupling is too tight |
| FLATTEN | Remove unnecessary layers | When indirection hurts clarity |
| ASYNC | Convert to async processing | When blocking is unnecessary |
| CACHE | Add caching layer | When expensive operations repeat |
| QUEUE | Add message queue | When decoupling is needed |
| RETRY | Add retry logic | When transient failures are possible |
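A sketch of the RETRY mutation applied to an unreliable call, assuming a hypothetical `TransientError` marks the failures worth retrying. Exponential backoff is one common policy, not the only one; the injectable `sleep` parameter exists so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 503)."""


def with_retry(
    op: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (TransientError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """RETRY mutation: re-run `op` on transient errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return op()
        except retry_on:
            sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
    # Final attempt: any error, transient or not, reaches the caller.
    return op()
```

Errors outside `retry_on` propagate on the first attempt, which keeps fatal failures from being retried. Production retry loops often add random jitter to the delay as well.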
ASSUMPTION TRACKING
Track assumptions throughout the ENTIRE loop, not just in reflection.
Assumption Log Format
ASSUMPTION LOG:
┌─────┬─────────────────────────────┬─────────┬──────────┬───────────┐
│ ID │ Assumption │ Phase │ Risk │ Status │
├─────┼─────────────────────────────┼─────────┼──────────┼───────────┤
│ A1 │ Input size < 10,000 │ DECOMP │ Medium │ UNCHECKED │
│ A2 │ No concurrent modifications │ GENESIS │ High │ VALIDATED │
│ A3 │ API returns JSON │ GENESIS │ Low │ UNCHECKED │
│ A4 │ O(n²) acceptable for N<100 │ EVOLVE │ Medium │ VALIDATED │
└─────┴─────────────────────────────┴─────────┴──────────┴───────────┘
Assumption Risk Levels
| Risk | Definition | Action Required |
|------|------------|-----------------|
| HIGH | If wrong, solution is fundamentally broken | MUST validate before delivery |
| MEDIUM | If wrong, solution degrades but works | SHOULD validate, document if not |
| LOW | If wrong, minor impact | Document, validate if easy |
RULE: HIGH risk + Weak validation = STOP. Get stronger validation or flag uncertainty.
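The assumption log and the RULE above can be sketched as a small data model plus a delivery check. Class and function names are hypothetical. The sketch only models status, so anything other than VALIDATED counts as unresolved; judging whether a validation is "weak" remains a manual call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Risk(Enum):
    HIGH = "High"      # if wrong, the solution is fundamentally broken
    MEDIUM = "Medium"  # if wrong, the solution degrades but works
    LOW = "Low"        # if wrong, minor impact


class Status(Enum):
    UNCHECKED = "UNCHECKED"
    VALIDATED = "VALIDATED"
    INVALIDATED = "INVALIDATED"


@dataclass
class Assumption:
    id: str
    text: str
    phase: str
    risk: Risk
    status: Status = Status.UNCHECKED


def delivery_blockers(log: List[Assumption]) -> List[Assumption]:
    """HIGH-risk assumptions not yet validated: each one blocks delivery."""
    return [a for a in log if a.risk is Risk.HIGH and a.status is not Status.VALIDATED]


def needs_documentation(log: List[Assumption]) -> List[Assumption]:
    """MEDIUM-risk assumptions left unvalidated must at least be documented."""
    return [a for a in log if a.risk is Risk.MEDIUM and a.status is not Status.VALIDATED]


# The example log from the table above.
LOG = [
    Assumption("A1", "Input size < 10,000", "DECOMP", Risk.MEDIUM),
    Assumption("A2", "No concurrent modifications", "GENESIS", Risk.HIGH, Status.VALIDATED),
    Assumption("A3", "API returns JSON", "GENESIS", Risk.LOW),
    Assumption("A4", "O(n²) acceptable for N<100", "EVOLVE", Risk.MEDIUM, Status.VALIDATED),
]
```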
REFLECTION OUTPUT FORMAT
### REFLECTION (Phase 7)
#### Solution Analysis
- Winner: [ID]
- Decisive trait: [what made it win]
- Emerged at: [Genesis / Generation N via mutation X]
- Biggest weakness: [trade-off accepted]
#### Process Analysis
- Approaches NOT tried: [list 2-3 with reasons]
- Highest effort area: [phase/activity] — Justified: [yes/no]
- If starting over: [what would change]
#### Assumption Audit
| Assumption | Risk | Status | Evidence |
|------------|------|--------|----------|
| ... | ... | ... | ... |
Unvalidated HIGH-risk assumptions: [count] ← MUST BE 0
#### Self-Score: [1-10]
Justification: [why this score]
QUICK-START HEURISTICS
For rapid application without full formalism:
When time-constrained:
- Generate 3 solutions (diverse approaches)
- Score each on correctness + robustness only
- Mutate the top solution once
- Verify mutation improves fitness
- Deliver best
When quality is paramount:
- Full 7-solution population
- 5+ generations with crossover
- All proof types required
- Meta-improvement phase mandatory
ADVERSARIAL SELF-CHECK
Before delivering final solution, ask:
- "What input would break this?"
- "What assumption am I making that might be wrong?"
- "If I had to attack this code, how would I?"
- "What would a senior engineer critique?"
- "Does the simplest version of this work just as well?"
If any answer reveals a flaw → one more evolution cycle.
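The first and last questions can be partly mechanized: compare the candidate against the simplest correct version on hand-picked edge cases plus seeded random inputs, and report the first disagreement. This is a hypothetical harness; `find_counterexample` and the max-subarray example are illustrations, not part of the method. A clean run is evidence, not proof.

```python
import itertools
import random
from typing import Callable, Iterable, List, Optional, Tuple

Solution = Callable[[List[int]], int]


def _outcome(f: Solution, xs: List[int]) -> Tuple[str, object]:
    # Compare results and failures alike: a crash where the reference succeeds counts.
    try:
        return ("ok", f(list(xs)))
    except Exception as exc:  # deliberately broad: any crash is a finding
        return ("error", type(exc).__name__)


def find_counterexample(
    candidate: Solution,
    reference: Solution,
    edge_cases: Iterable[List[int]],
    random_trials: int = 500,
    seed: int = 0,
) -> Optional[List[int]]:
    """Answer "what input would break this?": first input where outcomes differ."""
    rng = random.Random(seed)
    randoms = (
        [rng.randint(-50, 50) for _ in range(rng.randint(0, 8))]
        for _ in range(random_trials)
    )
    for xs in itertools.chain(edge_cases, randoms):
        if _outcome(candidate, xs) != _outcome(reference, xs):
            return xs
    return None


def max_subarray_ref(xs: List[int]) -> int:
    """Simplest version: brute force over every non-empty slice, O(n^3) time."""
    if not xs:
        raise ValueError("empty input")
    return max(sum(xs[i:j]) for i in range(len(xs)) for j in range(i + 1, len(xs) + 1))


def max_subarray_fast(xs: List[int]) -> int:
    """Kadane's algorithm, O(n)."""
    if not xs:
        raise ValueError("empty input")
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def max_subarray_buggy(xs: List[int]) -> int:
    """Common Kadane mistake: starting at 0 hides empty and all-negative inputs."""
    best = cur = 0
    for x in xs:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best
```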
PROJECT-SPECIFIC CONTEXT
When working on this Twilio Bulk Lookup codebase, consider these fitness criteria:
For Sidekiq Jobs:
- Idempotency (can safely retry)
- Rate limit handling
- Error classification (retryable vs fatal)
- Memory efficiency for large batches
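The four job criteria can be illustrated with a language-neutral sketch, written in Python here although the codebase's jobs are Ruby/Sidekiq. Everything is hypothetical: `perform`, the `lookup` callable returning `(status, body)`, and the in-memory `done`/`results` stores, which a real job would persist (for example in the database) so they survive a retry. Raising on rate limits hands the retry to the job runner; on the Rails side, iterating with `find_each` rather than loading all records at once serves the memory criterion.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical lookup client: phone number -> (HTTP status, response body).
Lookup = Callable[[str], Tuple[int, str]]


class RetryableError(Exception):
    """Transient failure; raising it lets the job runner retry the whole job."""


def is_retryable(status: int) -> bool:
    # 429 = rate limited, 5xx = provider trouble. Other 4xx would fail again.
    return status == 429 or 500 <= status < 600


def perform(
    numbers: Iterable[str],
    lookup: Lookup,
    done: Set[str],
    results: Dict[str, str],
) -> None:
    """Look up each number at most once, even across retries of the same job."""
    for number in numbers:  # consumed lazily, so the caller can stream a large batch
        if number in done:
            continue  # idempotency: a retried job skips numbers already finished
        status, body = lookup(number)
        if status == 200:
            results[number] = body
        elif is_retryable(status):
            # Stop here; everything in `done` is kept, so the retry resumes at this number.
            raise RetryableError(f"HTTP {status} for {number}")
        else:
            results[number] = f"fatal:{status}"  # record it; retrying cannot help
        done.add(number)
```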
For API Integrations:
- Graceful degradation when API unavailable
- Credential security
- Response caching where appropriate
- Webhook reliability
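A sketch of response caching with graceful degradation: fresh entries are served from a TTL cache, and when the API is unreachable an expired entry is returned instead of an error. `CachedClient` is hypothetical, as is the assumption that the underlying fetch raises `ConnectionError` on outages; the injectable `clock` makes expiry testable. Callers can use the returned source label to flag stale data.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachedClient:
    """Wraps a fetch function with a TTL cache and a stale-on-outage fallback."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # key -> (stored_at, value)

    def get(self, key: str) -> Tuple[Optional[str], str]:
        """Return (value, source): source is 'cache', 'live', 'stale' or 'unavailable'."""
        now = self._clock()
        hit = self._cache.get(key)
        if hit and now - hit[0] < self._ttl:
            return hit[1], "cache"
        try:
            value = self._fetch(key)
        except ConnectionError:
            # Degrade gracefully: serve the expired entry rather than fail outright.
            if hit:
                return hit[1], "stale"
            return None, "unavailable"
        self._cache[key] = (now, value)
        return value, "live"
```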
For Rails Models:
- Query efficiency (N+1 prevention)
- Validation completeness
- Scope composability
- Serialization safety
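N+1 prevention can be shown outside Rails with Python's built-in `sqlite3`. The `batches`/`lookups` tables are hypothetical. The first function issues one query per batch; the second loads all children with a single `IN` query, which is roughly what Rails does for `includes`/`preload`. The wrapper simply counts `execute` calls.

```python
import sqlite3
from typing import Dict, List


class CountingConnection:
    """Wraps sqlite3 so each execute() call is counted as one query."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        self.queries += 1
        return self.conn.execute(sql, params)


def phones_n_plus_one(db: CountingConnection) -> Dict[str, List[str]]:
    # 1 query for the batches, then 1 more per batch: the N+1 pattern.
    out: Dict[str, List[str]] = {}
    for batch_id, name in db.execute("SELECT id, name FROM batches ORDER BY id").fetchall():
        rows = db.execute(
            "SELECT phone FROM lookups WHERE batch_id = ? ORDER BY id", (batch_id,)
        ).fetchall()
        out[name] = [phone for (phone,) in rows]
    return out


def phones_preloaded(db: CountingConnection) -> Dict[str, List[str]]:
    # 2 queries however many batches exist: parents, then all children via IN.
    batches = db.execute("SELECT id, name FROM batches ORDER BY id").fetchall()
    if not batches:
        return {}
    ids = tuple(batch_id for batch_id, _ in batches)
    marks = ",".join("?" * len(ids))
    rows = db.execute(
        f"SELECT batch_id, phone FROM lookups WHERE batch_id IN ({marks}) ORDER BY id", ids
    ).fetchall()
    grouped: Dict[int, List[str]] = {batch_id: [] for batch_id in ids}
    for batch_id, phone in rows:
        grouped[batch_id].append(phone)
    return {name: grouped[batch_id] for batch_id, name in batches}


def demo_db() -> CountingConnection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE batches (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE lookups (id INTEGER PRIMARY KEY, batch_id INTEGER, phone TEXT);
        INSERT INTO batches VALUES (1, 'march'), (2, 'april'), (3, 'may');
        INSERT INTO lookups (batch_id, phone) VALUES
            (1, '+15550001'), (1, '+15550002'), (2, '+15550003'), (3, '+15550004');
        """
    )
    return CountingConnection(conn)
```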