Back to skills
extension
Category: Data & AnalyticsNo API key required

metric-diagnosis

Use when investigating unexpected metric changes - systematically narrows root cause through 4D segmentation, intrinsic vs extrinsic factor analysis, hypothesis testing, and North Star impact assessment

personAuthor: jakexiaohubgithub

Metric Diagnosis Workflow

Purpose

Systematically investigate why a metric changed unexpectedly (dropped or spiked) by segmenting data, distinguishing internal vs. external factors, and testing hypotheses to identify root cause. Prevents rushing to wrong conclusions and ensures evidence-based responses.

When to Use This Workflow

Use this workflow when:

  • Key metrics drop or spike unexpectedly
  • Leadership asks "what happened to [metric]?"
  • Post-mortem analysis needed after incident
  • A/B test shows unexpected patterns
  • Need to validate suspected root cause
  • User behavior changed suddenly without obvious reason

Skills Sequence

This workflow applies the root-cause-diagnosis skill three times, then assesses North Star impact:

1. Root Cause Diagnosis (Phase 1)
   ↓ (Segment across 4 dimensions: People, Geography, Technology, Time)
2. Root Cause Diagnosis (Phase 2)
   ↓ (Distinguish intrinsic vs extrinsic factors)
3. Root Cause Diagnosis (Phase 3)
   ↓ (Build hypothesis table and test against data)
4. North Star Alignment
   ↓ (Assess impact on top-line metrics)
   
OUTPUT: Narrowed scope, tested hypotheses, identified root cause, recommended actions

Required Inputs

Gather this information before starting:

Metric Change Information

  • The metric that changed
    • Name and definition
    • Example: "Weekly Active Users (WAU)"
  • Magnitude of change
    • Absolute and percentage
    • Example: "10,000 → 9,000 (10% drop)"
  • Timeframe
    • When was it noticed?
    • Over what period did change occur?
    • Example: "Noticed today, change happened over past week"

Data Access

  • Segmented data availability
    • Can you segment by user type, geography, platform?
    • Are raw logs accessible?
  • Comparison data
    • Historical baselines
    • Year-over-year seasonality
    • Competitor benchmarks (if available)

Recent Activity Log

  • Recent releases (past 1-4 weeks)
    • Feature launches
    • Bug fixes
    • Infrastructure changes
  • Running experiments
    • A/B tests active
    • Configuration changes
  • External context
    • Marketing campaigns
    • News events
    • Competitor activity

Workflow Steps

Phase 0: Data Quality Verification (5 minutes)

Before investigating, confirm the data is real:

  1. Check reporting systems

    • Is tracking still functional?
    • Any pipeline issues?
    • Data freshness correct?
  2. Cross-reference sources

    • Do multiple data sources show same pattern?
    • Internal dashboards vs. external tools aligned?
  3. Validate sample sizes

    • Sufficient data for statistical significance?
    • Time periods comparable?

Red flags that indicate data quality issues:

  • Metric went to exactly zero
  • Change happened at exact midnight
  • Only visible in one data source
  • Coincides with known tracking deployment

If data quality issue found: Fix tracking first, then re-investigate if problem persists.

If data is valid: Proceed to Phase 1.

Phase 1: Narrow the Scope (15-20 minutes)

Use the root-cause-diagnosis skill - 4 Dimension Segmentation

Ask clarifying questions across all four dimensions:

Dimension 1: People (User Segments)

Questions to ask:

  • Does it affect all users equally?
  • Specific demographics (age, gender, location)?
  • New vs. returning users?
  • Free vs. paid users?
  • Power users vs. casual users?
  • Specific cohorts or user types?

Request data segmentation by user attributes.

Dimension 2: Geography

Questions to ask:

  • Specific countries or regions affected?
  • Certain cities?
  • Urban vs. rural?
  • Climate zones?
  • Time zones?

Request data segmentation by location.

Dimension 3: Technology (Platform)

Questions to ask:

  • iOS vs. Android?
  • Web vs. mobile app?
  • Desktop vs. mobile web?
  • Specific browser or OS versions?
  • Device types?
  • Network types (WiFi vs. cellular)?

Request data segmentation by platform/device.

Dimension 4: Time

Questions to ask:

  • When exactly did it start?
  • Gradual decline or sudden drop?
  • Day of week patterns?
  • Time of day patterns?
  • Recurring or one-time?
  • Seasonal effects?

Request time-series data with granular breakdown.

Output of Phase 1:

Narrow problem statement from:

  • ❌ "Usage is down"

To:

  • ✓ "iOS usage among teens in US dropped 10% starting Sept 1st"

Document:

## Narrowed Scope

**Original statement:** [Broad problem]
**Narrowed statement:** [Specific problem]

**Affected segments:**
- People: [Which user groups]
- Geography: [Which locations]
- Technology: [Which platforms]
- Time: [When and pattern]

**Unaffected segments:**
- [What stayed stable - equally important]

Phase 2: Generate Hypotheses (15-20 minutes)

Use the root-cause-diagnosis skill - Intrinsic vs. Extrinsic Factor Analysis

Brainstorm potential causes in both categories:

Intrinsic Factors (Internal Changes)

Engineering & Product:

  • Recent code releases (check deploy logs)
  • Bug introductions
  • Feature launches by any team
  • A/B experiments started/stopped
  • Algorithm or ranking changes
  • Performance degradations

Check with:

  • Engineering team (deploys, errors, performance)
  • Other product teams (their launches)
  • Data team (tracking changes)

Operations & Business:

  • Pricing changes
  • Policy updates
  • Support process changes
  • Marketing campaign changes

Check with:

  • Operations team
  • Marketing team
  • Customer success team

Extrinsic Factors (External Events)

Competition:

  • Competitor product launches
  • Competitor pricing changes
  • Competitor marketing campaigns

Market:

  • Economic shifts
  • Industry trends
  • Regulatory changes
  • Platform policy changes (iOS, Android, etc.)

Calendar/Seasonal:

  • Holidays
  • School schedules
  • Weather patterns
  • Cultural events

News/PR:

  • News coverage (positive or negative)
  • Social media trends
  • External events affecting user behavior

Check with:

  • Sales team (competitive intelligence)
  • Customer success (user feedback)
  • Marketing (market trends)
  • Industry news sources

Output of Phase 2:

List of 4-6 hypotheses (mix of intrinsic and extrinsic):

## Hypothesis List

**Intrinsic (Internal):**
1. [Hypothesis name]: [Description]
2. [Hypothesis name]: [Description]
3. [Hypothesis name]: [Description]

**Extrinsic (External):**
4. [Hypothesis name]: [Description]
5. [Hypothesis name]: [Description]
6. [Hypothesis name]: [Description]

Phase 3: Test Hypotheses (20-30 minutes)

Use the root-cause-diagnosis skill - Hypothesis Table Method

Step 3.1: Create Hypothesis Table

Columns: Data points observed (from Phase 1) Rows: Potential causes (from Phase 2)

Fill each cell with predicted impact:

  • ↓ = Would decrease this segment
  • ↑ = Would increase this segment
  • → = No effect on this segment
  • ? = Uncertain

Step 3.2: Match Patterns

For each hypothesis:

  • Do predictions match observations?
  • If ALL predictions match → Possible cause
  • If ANY predictions contradict → Rule out

Step 3.3: Request Differentiating Data

If multiple hypotheses fit:

  • Identify data points that would differ between them
  • Request additional segmentation
  • Narrow to single most likely cause

Example Table:

| Cause | iOS ↓ | Android → | Teens ↓ | Adults → | US ↓ | EU → | |-------|-------|-----------|---------|----------|------|------| | iOS bug in latest update | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | | School started (US only) | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | | Competitor targeted teens | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ |

Pattern match: "School started (US)" fits all observations → Most likely cause

Output of Phase 3:

## Hypothesis Testing Results

**Ruled out:**
- [Hypothesis]: [Why ruled out]
- [Hypothesis]: [Why ruled out]

**Possible causes (remaining):**
1. [Most likely hypothesis]
   - Evidence supporting: [List]
   - Evidence against: [List if any]
   - Confidence: [High/Medium/Low]

2. [Second most likely] (if applicable)
   - Evidence supporting: [List]
   - Confidence: [High/Medium/Low]

**Additional data needed to confirm:**
- [Data point that would differentiate remaining hypotheses]

Phase 4: Assess North Star Impact (10 minutes)

Use the north-star-alignment skill

Evaluate whether identified root cause impacts company-level metrics:

Questions:

  • Does this affect North Star metrics?
  • Is impact temporary or permanent?
  • Does it require immediate action?
  • What's the business impact (revenue, growth, retention)?

Severity classification:

Critical (Act immediately):

  • Impacts revenue directly
  • Affects North Star metrics substantially
  • Permanent degradation if not fixed
  • Competitive disadvantage

Important (Act soon):

  • Impacts user experience
  • Affects intermediate metrics
  • May worsen over time
  • Needs monitoring

Low Priority (Monitor):

  • Expected variation (seasonality)
  • Affects non-critical segments
  • Self-resolving
  • No action needed

Output of Phase 4:

## North Star Impact Assessment

**Affected North Star metrics:**
- [Metric]: [Impact description]
- [Metric]: [Impact description]

**Business impact:**
- Revenue: [Direct/Indirect/None]
- Growth: [Slowed/Accelerated/Neutral]
- Retention: [Harmed/Improved/Neutral]

**Severity:** [Critical/Important/Low Priority]

**Urgency:** [Immediate/This week/Monitor]

Phase 5: Recommend Actions (10 minutes)

Based on root cause and severity, recommend next steps:

Action categories:

  1. Fix (for bugs/issues)

    • What needs to be fixed?
    • Who owns the fix?
    • Timeline for fix?
  2. Mitigate (for external factors)

    • What can reduce harm?
    • Counter-measures available?
    • Timeline for mitigation?
  3. Monitor (for expected variations)

    • What tracking is needed?
    • Alert thresholds?
    • Review cadence?
  4. No Action (for acceptable changes)

    • Why no action needed?
    • What validates this decision?

Output document:

# Metric Diagnosis Report: [Metric Name]

## Executive Summary
[One paragraph: What happened, why, what to do]

## Problem Statement
- **Initial observation:** [Broad statement]
- **Narrowed scope:** [Specific statement with segments]
- **Magnitude:** [Absolute and % change]
- **Timeframe:** [When occurred]

## Root Cause Analysis

### Narrowed Scope (Phase 1)
- **Affected:** [User segments, geography, platforms, time pattern]
- **Unaffected:** [What stayed stable]

### Hypotheses Tested (Phase 2-3)
| Hypothesis | Category | Evidence | Conclusion |
|------------|----------|----------|------------|
| [Name] | Intrinsic/Extrinsic | [Supporting/Contradicting] | Ruled Out/Possible/Likely |

### Identified Root Cause
**Most likely cause:** [Hypothesis name]

**Evidence supporting:**
1. [Data point matching prediction]
2. [Data point matching prediction]
3. [Stakeholder confirmation]

**Confidence level:** [High/Medium/Low]

### North Star Impact (Phase 4)
- **Affected metrics:** [List with impact]
- **Severity:** [Critical/Important/Low]
- **Urgency:** [Immediate/Soon/Monitor]

## Recommended Actions

### Primary action:
**[Fix/Mitigate/Monitor/No Action]**
- What: [Specific action]
- Who: [Owner]
- When: [Timeline]
- Success criteria: [How to know it worked]

### Secondary actions:
- [Action 1]
- [Action 2]

### Monitoring plan:
- Metrics to track: [List]
- Review frequency: [Daily/Weekly]
- Alert thresholds: [When to escalate]

## Appendix
- Hypothesis table: [Detailed]
- Data sources: [Links]
- Stakeholders consulted: [Names/teams]

Common Diagnostic Patterns

Pattern 1: Seasonal Variation

  • Symptoms: Recurring pattern (weekly, yearly)
  • Investigation: Compare to historical same period
  • Action: Usually monitor, adjust baselines
  • Example: School year start, holiday season

Pattern 2: Competitor Impact

  • Symptoms: Sudden shift, specific demographic
  • Investigation: Check competitor news, market research
  • Action: Counter-strategy, feature parity
  • Example: Competing app launch with targeted marketing

Pattern 3: Internal Bug

  • Symptoms: Sudden drop, platform-specific
  • Investigation: Check recent releases, error logs
  • Action: Hotfix, rollback if needed
  • Example: iOS update broke key feature

Pattern 4: Product Change Side Effect

  • Symptoms: Gradual shift after release
  • Investigation: Analyze feature adoption patterns
  • Action: Iterate on feature, A/B test variations
  • Example: Redesign changed user behavior

Pattern 5: Tracking Issue (False Alarm)

  • Symptoms: Exact numbers, midnight timing, single source
  • Investigation: Verify data pipeline
  • Action: Fix tracking, dismiss false alarm
  • Example: Analytics tag removed in deploy

Common Mistakes

| Mistake | Fix | |---------|-----| | Jumping to conclusions without segmentation | Complete Phase 1 (4D segmentation) first | | Only considering one type of factor | Brainstorm both intrinsic AND extrinsic | | Accepting first matching hypothesis | Test all hypotheses systematically | | Skipping data quality check | Always verify data is valid first | | Not consulting stakeholders | Talk to sales, CS, engineering teams | | Immediate rollback without diagnosis | Understand cause before acting |

Success Criteria

Metric diagnosis succeeds when:

  • Data quality verified
  • Problem narrowed to specific segments (4 dimensions)
  • 4-6 hypotheses brainstormed (intrinsic + extrinsic)
  • Hypotheses tested against data systematically
  • Root cause identified (or top 2-3 candidates with confidence levels)
  • North Star impact assessed
  • Recommended actions clear with owners and timelines
  • Diagnosis document created and shared
  • Stakeholders aligned on next steps

Real-World Example: Microsoft Teams Downloads Drop

Initial Report

"New app downloads down 50% from last week"

Phase 1: Narrow Scope (15 min)

Questions asked:

  • Geography? → "Predominantly US and Europe"
  • User type? → "Enterprise customers, not consumers"
  • Platform? → "Web and desktop primarily, some mobile"
  • Other metrics? → "Usage unchanged (existing users still active)"

Narrowed statement: "Enterprise customer downloads via web/desktop down 50% in US/Europe, starting last week. Existing user engagement unchanged."

Phase 2: Generate Hypotheses (15 min)

Intrinsic:

  1. Bug in download flow
  2. Recent release broke something
  3. A/B test reducing visibility

Extrinsic: 4. Marketing promotion ended 5. Competitor launch 6. Economic downturn affecting enterprise spending

Stakeholder consults:

  • Engineering: No recent releases, no bugs reported
  • Marketing: "We were running 50% off for enterprises switching from Slack"
  • Sales: No competitor news, pipeline healthy

Phase 3: Test Hypotheses (20 min)

| Hypothesis | Enterprise | Consumer | US/EU | Usage Same | Downloads Only | |------------|------------|----------|-------|------------|----------------| | Bug in flow | ✗ Both | ✗ Both | ✗ All | ? | ✓ | | Promo ended | ✓ | ✗ | ✓ | ✓ | ✓ | | Competitor | ✗ Both | ✗ Both | ✗ All | ? | ✓ |

Pattern match: "Promotion ended" fits all observations

Confirmation: Marketing confirms promotion ended last weekend

Phase 4: North Star Impact (5 min)

Root cause: End of promotional campaign

  • Downloads during promo = artificially inflated baseline
  • Current downloads = return to normal baseline

North Star impact:

  • MAU: No impact (existing users unaffected)
  • Revenue: No impact (existing customers retained)
  • Severity: Low (false alarm, not actual problem)

Phase 5: Recommended Actions (5 min)

Primary action: No action needed

  • Not an actual decline, just return to baseline
  • Adjust baselines for future comparisons

Secondary actions:

  • Update dashboards to flag promotional periods
  • Establish "normal" baseline for future monitoring
  • Document for future reference (prevent false alarms)

Time to diagnosis: 60 minutes

Related Skills

This workflow orchestrates these skills:

  • root-cause-diagnosis (Phases 1-3) - Core investigation method
  • north-star-alignment (Phase 4) - Assess strategic impact

Related Workflows

  • tradeoff-decision: May use diagnosis to understand why metrics changed before deciding
  • metrics-definition: Uses similar systematic approach for defining vs. diagnosing

Time Estimate

Total: 60-90 minutes

  • Phase 0 (Data quality): 5 min
  • Phase 1 (Narrow scope): 15-20 min
  • Phase 2 (Hypotheses): 15-20 min
  • Phase 3 (Test): 20-30 min
  • Phase 4 (Impact): 10 min
  • Phase 5 (Actions): 10 min

For complex issues, may require multiple iterations or longer investigation time.