AI Debug

Figure out why an existing AI feature is broken.

Works with:

Linear MCP - Pull issue/bug details
Manual - Describe the symptoms

Entry Point

When this skill is invoked, start with:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 AI DEBUG
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

When AI fails, teams blame the model.
But 90% of failures are context failures.

What's going wrong?

  1. Provide a Linear issue ID
  2. Describe the symptoms

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Usage

/ai-debug                 # Describe symptoms manually
/ai-debug LIN-123         # Start from Linear bug/issue

What It Does

Works backwards from symptoms to root cause using the 4D audit:

| Symptom | Likely Root Cause | Focus Area | |---------|-------------------|------------| | Hallucinations | Missing domain context, no grounding | D2, D4 | | Inconsistency | Vague job definition, missing rules | D1, D4 | | Generic outputs | Missing user/environment context | D2 | | Wrong tone/format | Missing constraints, no examples | D1, D4 | | Slow responses | Too much context, bad discovery | D2, D3 | | High costs | Dumping everything in prompt | D2, D3 | | Demo vs prod mismatch | Discovery strategy broken | D3, D4 |

Key insight: When AI fails, teams blame the model. But 90% of failures are context failures.

The 4D Audit

D1: Was the Job Defined?

Can you articulate exactly what the model should produce?
Is there a written spec for inputs, outputs, constraints?
Do engineers and PMs agree on what "good" looks like?

D2: Is Context Right?

What context is the model actually receiving?
Walk through the 6 layers: Intent, User, Domain, Rules, Environment, Exposition
Is context structured or dumped as raw text?
Is there too much context (token bloat)?

D3: Is Context Fetched Reliably?

How is each piece of context being fetched at runtime?
What happens when a data source is unavailable?
Is there visibility into what context is used per request?

D4: Are Failures Being Caught?

Are there pre-checks before calling the model?
Are there post-checks validating output?
What's the fallback UX when things break?
Is there a feedback loop capturing failures?

Output

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 CONTEXT AUDIT COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Feature: [Name]
Symptoms: [What was reported]

  D1 Demand:      [CLEAR / GAP / CRITICAL]
  D2 Data:        [CLEAR / GAP / CRITICAL]
  D3 Discovery:   [CLEAR / GAP / CRITICAL]
  D4 Defense:     [CLEAR / GAP / CRITICAL]

Primary Issue: [Root cause summary]

RECOMMENDED FIXES (prioritized):
1. [Highest impact fix]
2. [Second fix]
3. [Third fix]

Quick Win: [Smallest change that would help]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Workflow

Collect symptoms (what's going wrong)
Map symptoms to likely causes using the table above
Audit each D dimension with diagnostic questions
Identify root cause and prioritize fixes
Offer to add findings to Linear or export

Questions to ask at each step:

"What specific behavior are you seeing?"
"What should it be doing instead?"
"When did this start happening?"
"Does it happen every time or intermittently?"

Framework: 4D Context Canvas (Aakash Gupta & Miqdad Jaffer) Best for: Debugging hallucinations, inconsistency, performance issues in AI features