Metric Diagnosis Workflow

Purpose

Systematically investigate why a metric changed unexpectedly (dropped or spiked) by segmenting data, distinguishing internal vs. external factors, and testing hypotheses to identify root cause. Prevents rushing to wrong conclusions and ensures evidence-based responses.

When to Use This Workflow

Use this workflow when:

Key metrics drop or spike unexpectedly
Leadership asks "what happened to [metric]?"
Post-mortem analysis needed after incident
A/B test shows unexpected patterns
Need to validate suspected root cause
User behavior changed suddenly without obvious reason

Skills Sequence

This workflow applies the root-cause-diagnosis skill three times, then assesses North Star impact:

1. Root Cause Diagnosis (Phase 1)
   ↓ (Segment across 4 dimensions: People, Geography, Technology, Time)
2. Root Cause Diagnosis (Phase 2)
   ↓ (Distinguish intrinsic vs extrinsic factors)
3. Root Cause Diagnosis (Phase 3)
   ↓ (Build hypothesis table and test against data)
4. North Star Alignment
   ↓ (Assess impact on top-line metrics)
   
OUTPUT: Narrowed scope, tested hypotheses, identified root cause, recommended actions

Required Inputs

Gather this information before starting:

Metric Change Information

The metric that changed
- Name and definition
- Example: "Weekly Active Users (WAU)"
Magnitude of change
- Absolute and percentage
- Example: "10,000 → 9,000 (10% drop)"
Timeframe
- When was it noticed?
- Over what period did change occur?
- Example: "Noticed today, change happened over past week"

Data Access

Segmented data availability
- Can you segment by user type, geography, platform?
- Are raw logs accessible?
Comparison data
- Historical baselines
- Year-over-year seasonality
- Competitor benchmarks (if available)

Recent Activity Log

Recent releases (past 1-4 weeks)
- Feature launches
- Bug fixes
- Infrastructure changes
Running experiments
- A/B tests active
- Configuration changes
External context
- Marketing campaigns
- News events
- Competitor activity

Workflow Steps

Phase 0: Data Quality Verification (5 minutes)

Before investigating, confirm the data is real:

Check reporting systems
- Is tracking still functional?
- Any pipeline issues?
- Data freshness correct?
Cross-reference sources
- Do multiple data sources show same pattern?
- Internal dashboards vs. external tools aligned?
Validate sample sizes
- Sufficient data for statistical significance?
- Time periods comparable?

Red flags that indicate data quality issues:

Metric went to exactly zero
Change happened at exact midnight
Only visible in one data source
Coincides with known tracking deployment

If data quality issue found: Fix tracking first, then re-investigate if problem persists.

If data is valid: Proceed to Phase 1.

Phase 1: Narrow the Scope (15-20 minutes)

Use the root-cause-diagnosis skill - 4 Dimension Segmentation

Ask clarifying questions across all four dimensions:

Dimension 1: People (User Segments)

Questions to ask:

Does it affect all users equally?
Specific demographics (age, gender, location)?
New vs. returning users?
Free vs. paid users?
Power users vs. casual users?
Specific cohorts or user types?

Request data segmentation by user attributes.

Dimension 2: Geography

Questions to ask:

Specific countries or regions affected?
Certain cities?
Urban vs. rural?
Climate zones?
Time zones?

Request data segmentation by location.

Dimension 3: Technology (Platform)

Questions to ask:

iOS vs. Android?
Web vs. mobile app?
Desktop vs. mobile web?
Specific browser or OS versions?
Device types?
Network types (WiFi vs. cellular)?

Request data segmentation by platform/device.

Dimension 4: Time

Questions to ask:

When exactly did it start?
Gradual decline or sudden drop?
Day of week patterns?
Time of day patterns?
Recurring or one-time?
Seasonal effects?

Request time-series data with granular breakdown.

Output of Phase 1:

Narrow problem statement from:

❌ "Usage is down"

To:

✓ "iOS usage among teens in US dropped 10% starting Sept 1st"

Document:

## Narrowed Scope

**Original statement:** [Broad problem]
**Narrowed statement:** [Specific problem]

**Affected segments:**
- People: [Which user groups]
- Geography: [Which locations]
- Technology: [Which platforms]
- Time: [When and pattern]

**Unaffected segments:**
- [What stayed stable - equally important]

Phase 2: Generate Hypotheses (15-20 minutes)

Use the root-cause-diagnosis skill - Intrinsic vs. Extrinsic Factor Analysis

Brainstorm potential causes in both categories:

Intrinsic Factors (Internal Changes)

Engineering & Product:

Recent code releases (check deploy logs)
Bug introductions
Feature launches by any team
A/B experiments started/stopped
Algorithm or ranking changes
Performance degradations

Check with:

Engineering team (deploys, errors, performance)
Other product teams (their launches)
Data team (tracking changes)

Operations & Business:

Pricing changes
Policy updates
Support process changes
Marketing campaign changes

Check with:

Operations team
Marketing team
Customer success team

Extrinsic Factors (External Events)

Competition:

Competitor product launches
Competitor pricing changes
Competitor marketing campaigns

Market:

Economic shifts
Industry trends
Regulatory changes
Platform policy changes (iOS, Android, etc.)

Calendar/Seasonal:

Holidays
School schedules
Weather patterns
Cultural events

News/PR:

News coverage (positive or negative)
Social media trends
External events affecting user behavior

Check with:

Sales team (competitive intelligence)
Customer success (user feedback)
Marketing (market trends)
Industry news sources

Output of Phase 2:

List of 4-6 hypotheses (mix of intrinsic and extrinsic):

## Hypothesis List

**Intrinsic (Internal):**
1. [Hypothesis name]: [Description]
2. [Hypothesis name]: [Description]
3. [Hypothesis name]: [Description]

**Extrinsic (External):**
4. [Hypothesis name]: [Description]
5. [Hypothesis name]: [Description]
6. [Hypothesis name]: [Description]

Phase 3: Test Hypotheses (20-30 minutes)

Use the root-cause-diagnosis skill - Hypothesis Table Method

Step 3.1: Create Hypothesis Table

Columns: Data points observed (from Phase 1) Rows: Potential causes (from Phase 2)

Fill each cell with predicted impact:

↓ = Would decrease this segment
↑ = Would increase this segment
→ = No effect on this segment
? = Uncertain

Step 3.2: Match Patterns

For each hypothesis:

Do predictions match observations?
If ALL predictions match → Possible cause
If ANY predictions contradict → Rule out

Step 3.3: Request Differentiating Data

If multiple hypotheses fit:

Identify data points that would differ between them
Request additional segmentation
Narrow to single most likely cause

Example Table:

| Cause | iOS ↓ | Android → | Teens ↓ | Adults → | US ↓ | EU → | |-------|-------|-----------|---------|----------|------|------| | iOS bug in latest update | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | | School started (US only) | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | | Competitor targeted teens | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ |

Pattern match: "School started (US)" fits all observations → Most likely cause

Output of Phase 3:

## Hypothesis Testing Results

**Ruled out:**
- [Hypothesis]: [Why ruled out]
- [Hypothesis]: [Why ruled out]

**Possible causes (remaining):**
1. [Most likely hypothesis]
   - Evidence supporting: [List]
   - Evidence against: [List if any]
   - Confidence: [High/Medium/Low]

2. [Second most likely] (if applicable)
   - Evidence supporting: [List]
   - Confidence: [High/Medium/Low]

**Additional data needed to confirm:**
- [Data point that would differentiate remaining hypotheses]

Phase 4: Assess North Star Impact (10 minutes)

Use the north-star-alignment skill

Evaluate whether identified root cause impacts company-level metrics:

Questions:

Does this affect North Star metrics?
Is impact temporary or permanent?
Does it require immediate action?
What's the business impact (revenue, growth, retention)?

Severity classification:

Critical (Act immediately):

Impacts revenue directly
Affects North Star metrics substantially
Permanent degradation if not fixed
Competitive disadvantage

Important (Act soon):

Impacts user experience
Affects intermediate metrics
May worsen over time
Needs monitoring

Low Priority (Monitor):

Expected variation (seasonality)
Affects non-critical segments
Self-resolving
No action needed

Output of Phase 4:

## North Star Impact Assessment

**Affected North Star metrics:**
- [Metric]: [Impact description]
- [Metric]: [Impact description]

**Business impact:**
- Revenue: [Direct/Indirect/None]
- Growth: [Slowed/Accelerated/Neutral]
- Retention: [Harmed/Improved/Neutral]

**Severity:** [Critical/Important/Low Priority]

**Urgency:** [Immediate/This week/Monitor]

Phase 5: Recommend Actions (10 minutes)

Based on root cause and severity, recommend next steps:

Action categories:

Fix (for bugs/issues)
- What needs to be fixed?
- Who owns the fix?
- Timeline for fix?
Mitigate (for external factors)
- What can reduce harm?
- Counter-measures available?
- Timeline for mitigation?
Monitor (for expected variations)
- What tracking is needed?
- Alert thresholds?
- Review cadence?
No Action (for acceptable changes)
- Why no action needed?
- What validates this decision?

Output document:

# Metric Diagnosis Report: [Metric Name]

## Executive Summary
[One paragraph: What happened, why, what to do]

## Problem Statement
- **Initial observation:** [Broad statement]
- **Narrowed scope:** [Specific statement with segments]
- **Magnitude:** [Absolute and % change]
- **Timeframe:** [When occurred]

## Root Cause Analysis

### Narrowed Scope (Phase 1)
- **Affected:** [User segments, geography, platforms, time pattern]
- **Unaffected:** [What stayed stable]

### Hypotheses Tested (Phase 2-3)
| Hypothesis | Category | Evidence | Conclusion |
|------------|----------|----------|------------|
| [Name] | Intrinsic/Extrinsic | [Supporting/Contradicting] | Ruled Out/Possible/Likely |

### Identified Root Cause
**Most likely cause:** [Hypothesis name]

**Evidence supporting:**
1. [Data point matching prediction]
2. [Data point matching prediction]
3. [Stakeholder confirmation]

**Confidence level:** [High/Medium/Low]

### North Star Impact (Phase 4)
- **Affected metrics:** [List with impact]
- **Severity:** [Critical/Important/Low]
- **Urgency:** [Immediate/Soon/Monitor]

## Recommended Actions

### Primary action:
**[Fix/Mitigate/Monitor/No Action]**
- What: [Specific action]
- Who: [Owner]
- When: [Timeline]
- Success criteria: [How to know it worked]

### Secondary actions:
- [Action 1]
- [Action 2]

### Monitoring plan:
- Metrics to track: [List]
- Review frequency: [Daily/Weekly]
- Alert thresholds: [When to escalate]

## Appendix
- Hypothesis table: [Detailed]
- Data sources: [Links]
- Stakeholders consulted: [Names/teams]

Common Diagnostic Patterns

Pattern 1: Seasonal Variation

Symptoms: Recurring pattern (weekly, yearly)
Investigation: Compare to historical same period
Action: Usually monitor, adjust baselines
Example: School year start, holiday season

Pattern 2: Competitor Impact

Symptoms: Sudden shift, specific demographic
Investigation: Check competitor news, market research
Action: Counter-strategy, feature parity
Example: Competing app launch with targeted marketing

Pattern 3: Internal Bug

Symptoms: Sudden drop, platform-specific
Investigation: Check recent releases, error logs
Action: Hotfix, rollback if needed
Example: iOS update broke key feature

Pattern 4: Product Change Side Effect

Symptoms: Gradual shift after release
Investigation: Analyze feature adoption patterns
Action: Iterate on feature, A/B test variations
Example: Redesign changed user behavior

Pattern 5: Tracking Issue (False Alarm)

Symptoms: Exact numbers, midnight timing, single source
Investigation: Verify data pipeline
Action: Fix tracking, dismiss false alarm
Example: Analytics tag removed in deploy

Common Mistakes

| Mistake | Fix | |---------|-----| | Jumping to conclusions without segmentation | Complete Phase 1 (4D segmentation) first | | Only considering one type of factor | Brainstorm both intrinsic AND extrinsic | | Accepting first matching hypothesis | Test all hypotheses systematically | | Skipping data quality check | Always verify data is valid first | | Not consulting stakeholders | Talk to sales, CS, engineering teams | | Immediate rollback without diagnosis | Understand cause before acting |

Success Criteria

Metric diagnosis succeeds when:

Data quality verified
Problem narrowed to specific segments (4 dimensions)
4-6 hypotheses brainstormed (intrinsic + extrinsic)
Hypotheses tested against data systematically
Root cause identified (or top 2-3 candidates with confidence levels)
North Star impact assessed
Recommended actions clear with owners and timelines
Diagnosis document created and shared
Stakeholders aligned on next steps

Real-World Example: Microsoft Teams Downloads Drop

Initial Report

"New app downloads down 50% from last week"

Phase 1: Narrow Scope (15 min)

Questions asked:

Geography? → "Predominantly US and Europe"
User type? → "Enterprise customers, not consumers"
Platform? → "Web and desktop primarily, some mobile"
Other metrics? → "Usage unchanged (existing users still active)"

Narrowed statement: "Enterprise customer downloads via web/desktop down 50% in US/Europe, starting last week. Existing user engagement unchanged."

Phase 2: Generate Hypotheses (15 min)

Intrinsic:

Bug in download flow
Recent release broke something
A/B test reducing visibility

Extrinsic: 4. Marketing promotion ended 5. Competitor launch 6. Economic downturn affecting enterprise spending

Stakeholder consults:

Engineering: No recent releases, no bugs reported
Marketing: "We were running 50% off for enterprises switching from Slack"
Sales: No competitor news, pipeline healthy

Phase 3: Test Hypotheses (20 min)

| Hypothesis | Enterprise | Consumer | US/EU | Usage Same | Downloads Only | |------------|------------|----------|-------|------------|----------------| | Bug in flow | ✗ Both | ✗ Both | ✗ All | ? | ✓ | | Promo ended | ✓ | ✗ | ✓ | ✓ | ✓ | | Competitor | ✗ Both | ✗ Both | ✗ All | ? | ✓ |

Pattern match: "Promotion ended" fits all observations

Confirmation: Marketing confirms promotion ended last weekend

Phase 4: North Star Impact (5 min)

Root cause: End of promotional campaign

Downloads during promo = artificially inflated baseline
Current downloads = return to normal baseline

North Star impact:

MAU: No impact (existing users unaffected)
Revenue: No impact (existing customers retained)
Severity: Low (false alarm, not actual problem)

Phase 5: Recommended Actions (5 min)

Primary action: No action needed

Not an actual decline, just return to baseline
Adjust baselines for future comparisons

Secondary actions:

Update dashboards to flag promotional periods
Establish "normal" baseline for future monitoring
Document for future reference (prevent false alarms)

Time to diagnosis: 60 minutes

Related Skills

This workflow orchestrates these skills:

root-cause-diagnosis (Phases 1-3) - Core investigation method
north-star-alignment (Phase 4) - Assess strategic impact

Related Workflows

tradeoff-decision: May use diagnosis to understand why metrics changed before deciding
metrics-definition: Uses similar systematic approach for defining vs. diagnosing

Time Estimate

Total: 60-90 minutes

Phase 0 (Data quality): 5 min
Phase 1 (Narrow scope): 15-20 min
Phase 2 (Hypotheses): 15-20 min
Phase 3 (Test): 20-30 min
Phase 4 (Impact): 10 min
Phase 5 (Actions): 10 min

For complex issues, may require multiple iterations or longer investigation time.

metric-diagnosis

Metric Diagnosis Workflow

Purpose

When to Use This Workflow

Skills Sequence

Required Inputs

Metric Change Information

Data Access

Recent Activity Log

Workflow Steps

Phase 0: Data Quality Verification (5 minutes)

Phase 1: Narrow the Scope (15-20 minutes)

Dimension 1: People (User Segments)

Dimension 2: Geography

Dimension 3: Technology (Platform)

Dimension 4: Time

Phase 2: Generate Hypotheses (15-20 minutes)

Intrinsic Factors (Internal Changes)

Extrinsic Factors (External Events)

Phase 3: Test Hypotheses (20-30 minutes)

Step 3.1: Create Hypothesis Table

Step 3.2: Match Patterns

Step 3.3: Request Differentiating Data

Phase 4: Assess North Star Impact (10 minutes)

Phase 5: Recommend Actions (10 minutes)

Common Diagnostic Patterns

Pattern 1: Seasonal Variation

Pattern 2: Competitor Impact

Pattern 3: Internal Bug

Pattern 4: Product Change Side Effect

Pattern 5: Tracking Issue (False Alarm)

Common Mistakes

Success Criteria

Real-World Example: Microsoft Teams Downloads Drop

Initial Report

Phase 1: Narrow Scope (15 min)

Phase 2: Generate Hypotheses (15 min)

Phase 3: Test Hypotheses (20 min)

Phase 4: North Star Impact (5 min)

Phase 5: Recommended Actions (5 min)

Related Skills

Related Workflows

Time Estimate