Scientific Graph Interpreter

Interpret and explain scientific graphs, charts, and data visualizations for research publications, clinical presentations, and academic communications with precision and clarity.

Quick Start

from scripts.graph_interpreter import GraphInterpreter

interpreter = GraphInterpreter()

# Comprehensive graph analysis
analysis = interpreter.interpret(
    image_path="figure_1.png",
    graph_type="kaplan_meier",
    context="oncology_phase3_trial",
    audience="clinicians"
)

print(analysis.statistical_summary)
print(analysis.clinical_significance)
print(analysis.suggested_caption)

Core Capabilities

1. Multi-Type Graph Analysis

analysis = interpreter.analyze(
    graph_type="forest_plot",
    data={
        "studies": ["Study A", "Study B", "Study C"],
        "effect_sizes": [1.2, 0.8, 1.5],
        "confidence_intervals": [[1.0, 1.4], [0.6, 1.0], [1.2, 1.8]],
        "overall_effect": 1.15,
        "heterogeneity_p": 0.04
    }
)

Supported Graph Types:

| Graph Type | Common Use | Key Elements to Extract | |------------|------------|------------------------| | Kaplan-Meier | Survival analysis | Median survival, HR, 95% CI, log-rank p | | Forest Plot | Meta-analysis | Effect size, CI, heterogeneity (I²), weights | | ROC Curve | Diagnostic accuracy | AUC, sensitivity, specificity, optimal cutoff | | Box Plot | Distribution comparison | Median, IQR, outliers, whiskers | | Scatter Plot | Correlation | R², p-value, trend line, outliers | | Bar Chart | Group comparisons | Means, SEM/SD, significance indicators | | Heatmap | Expression/omics | Scale, clustering, row/column annotations | | Volcano Plot | Differential analysis | Fold change, p-value, FDR threshold |

2. Statistical Interpretation

stats = interpreter.extract_statistics(
    graph_data,
    extract=[
        "p_values",
        "confidence_intervals", 
        "effect_sizes",
        "sample_sizes",
        "statistical_tests"
    ]
)

Statistical Reporting Standards:

# Example output structure
{
    "primary_outcome": {
        "measure": "Hazard Ratio",
        "value": 0.72,
        "ci_95": [0.58, 0.89],
        "p_value": 0.003,
        "interpretation": "32% risk reduction"
    },
    "secondary_outcomes": [...],
    "significance_level": 0.05,
    "multiple_comparison_adjusted": True
}

3. Audience-Specific Explanations

explanations = interpreter.generate_multi_audience(
    analysis,
    audiences=["researchers", "clinicians", "patients", "policy_makers"]
)

Explanation Templates:

For Researchers:

"The Kaplan-Meier analysis demonstrates a statistically significant survival advantage for the experimental arm (HR 0.72, 95% CI 0.58-0.89, p=0.003). Median survival improved from 14.2 to 19.6 months. The proportional hazards assumption was verified (p=0.42)."

For Clinicians:

"This trial shows patients on the new treatment lived about 5 months longer on average compared to standard care. The 32% reduction in death risk is significant and clinically meaningful. Consider this option for eligible patients."

For Patients:

"The study found that people taking the new treatment lived longer than those on standard treatment. About 1 in 3 patients benefited from the new treatment. Side effects were manageable."

4. Figure Caption Generation

caption = interpreter.generate_caption(
    analysis,
    style="journal",  # or "presentation", "poster"
    word_limit=250,
    include_statistics=True
)

Caption Structure:

Figure X. [Brief title]. [What is shown: X-axis shows..., Y-axis shows..., 
lines/bars represent...]. [Key finding: Group A showed... compared to 
Group B...]. [Statistics: HR 0.72 (95% CI 0.58-0.89), p=0.003]. 
[Conclusion: This demonstrates...].

5. Critical Appraisal

appraisal = interpreter.critical_appraisal(
    graph_data,
    check=[
        "appropriate_graph_type",
        "axis_scaling",
        "error_bars_present",
        "sample_size_adequate",
        "confounding_controlled",
        "generalizability"
    ]
)

Common Graph Pitfalls:

| Issue | Problem | Better Approach | |-------|---------|-----------------| | Truncated y-axis | Exaggerates differences | Start at 0 or clearly indicate break | | No error bars | Hides variability | Include SD, SEM, or 95% CI | | 3D effects | Distorts perception | Use 2D with clear labels | | Dual y-axes | Confusing comparison | Separate graphs or normalized scale | | p-hacking indicators | Multiple comparisons | Adjusted p-values, Bonferroni |

CLI Usage

# Comprehensive analysis
python scripts/graph_interpreter.py \
  --image survival_curve.png \
  --type kaplan_meier \
  --context "phase_3_oncology" \
  --audience clinicians \
  --output analysis.json

# Generate publication caption
python scripts/graph_interpreter.py \
  --image forest_plot.png \
  --type forest_plot \
  --generate caption \
  --journal-style nature \
  --word-limit 200

# Batch process figures
python scripts/graph_interpreter.py \
  --batch figures/ \
  --output report.html \
  --template comprehensive

Common Patterns

Pattern 1: Clinical Trial Primary Endpoint

# Analyze survival curve
analysis = interpreter.interpret(
    graph_type="kaplan_meier",
    primary_endpoint="overall_survival",
    treatment_arms=["Experimental", "Control"],
    key_metrics=["median_os", "hr", "ci", "p_value"]
)

# Generate regulatory-ready summary
regulatory_summary = interpreter.generate_regulatory_summary(
    analysis,
    guideline="ICH_E3"
)

Pattern 2: Meta-Analysis Forest Plot

# Interpret meta-analysis
analysis = interpreter.interpret_forest_plot(
    studies=included_studies,
    check_heterogeneity=True,
    assess_publication_bias=True
)

# Generate GRADE assessment
grade_rating = interpreter.generate_grade_rating(analysis)

Pattern 3: Diagnostic Accuracy ROC

# Analyze diagnostic test
analysis = interpreter.interpret_roc(
    curves=["Test A", "Test B", "Combined"],
    optimal_cutoffs=True,
    clinical Utility=True
)

# Clinical decision support
decision_aid = interpreter.generate_decision_aid(analysis)

Quality Checklist

Before Interpretation:

[ ] Graph type appropriate for data
[ ] Axes clearly labeled with units
[ ] Sample sizes indicated
[ ] Statistical tests specified
[ ] Confidence intervals present

During Interpretation:

[ ] Effect size calculated
[ ] Clinical significance assessed
[ ] Confidence intervals interpreted
[ ] Limitations noted
[ ] Generalizability considered

After Interpretation:

[ ] Explanation appropriate for audience
[ ] Statistical terms explained
[ ] Uncertainty communicated
[ ] Actionable insights highlighted

Best Practices

Statistical Communication:

Always report confidence intervals with point estimates
Distinguish statistical from clinical significance
Note limitations and generalizability
Avoid causal language in observational studies

Visual Analysis:

Check axis scales for distortion
Note truncated axes or breaks
Identify outliers and their impact
Verify error bar representation (SD vs SEM)

Common Pitfalls

❌ Correlation = Causation: "X causes Y because they're correlated" ✅ Cautious Interpretation: "X is associated with Y; other factors may explain this"

❌ Overstating Significance: "Highly significant (p<0.001)" as meaning large effect ✅ Proper Framing: "Statistically significant but modest effect size (d=0.2)"

❌ Ignoring Confidence Intervals: Reporting point estimate only ✅ Interval Reporting: "Effect: 1.5 (95% CI: 0.9-2.4), suggesting uncertainty"

Skill ID: 209 | Version: 1.0 | License: MIT