Category: Data and Analysis (no API key required)

Survival Analysis (KM)

Generates Kaplan-Meier survival curves, computes survival statistics (log-rank test, median survival time), and estimates hazard ratios for clinical and biological...

Author: aipoch-aihubclawhub

Survival Analysis (Kaplan-Meier)

Kaplan-Meier survival analysis tool for clinical and biological research. Generates publication-ready survival curves with statistical tests.

Features

  • Kaplan-Meier Curve Generation: Publication-quality survival plots with confidence intervals
  • Statistical Tests: Log-rank test, Wilcoxon test, Peto-Peto test
  • Hazard Ratios: Cox proportional hazards regression with 95% CI
  • Summary Statistics: Median survival time, restricted mean survival time (RMST)
  • Multi-group Analysis: Supports 2+ comparison groups
  • Risk Tables: Optional at-risk table below curves
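
The summary statistics listed above can be read directly off the Kaplan-Meier step function. The following is a minimal pure-Python sketch, not the skill's own implementation: the function names `km_steps`, `median_survival`, and `rmst`, the truncation time `tau`, and the toy data are all assumptions made for illustration. The median is taken here as the first event time at which the estimated survival drops to 0.5 or below, which is one common convention.

```python
def km_steps(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    data = sorted(zip(times, events))
    n_at_risk, surv = len(data), 1.0
    steps, i = [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        # Collect everyone who leaves the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= removed
    return steps


def median_survival(steps):
    """First event time with S(t) <= 0.5, or None if the curve never gets there."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None


def rmst(steps, tau):
    """Area under the KM step function between 0 and tau (hypothetical cutoff)."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


# Toy data (illustrative only): 1 = event, 0 = censored.
steps = km_steps([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
print(median_survival(steps), rmst(steps, tau=5))
```

If the curve never falls to 0.5 (for example under heavy censoring), the median is undefined and the sketch returns `None`; RMST remains defined for any chosen `tau`.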

Usage

Python Script

python scripts/main.py --input data.csv --time time_col --event event_col --group group_col --output results/

Arguments

| Argument | Description | Required |
|----------|-------------|----------|
| --input | Input CSV file path | Yes |
| --time | Column name for survival time | Yes |
| --event | Column name for event indicator (1=event, 0=censored) | Yes |
| --group | Column name for grouping variable | Optional |
| --output | Output directory for results | Yes |
| --conf-level | Confidence level (default: 0.95) | Optional |
| --risk-table | Include risk table in plot | Optional |

Input Format

CSV with columns:

  • Time column: Numeric, time to event or censoring
  • Event column: Binary (1 = event occurred, 0 = censored/right-censored)
  • Group column: Categorical variable for stratification

Example:

patient_id,time_months,death,treatment_group
P001,24.5,1,Drug_A
P002,36.2,0,Drug_A
P003,18.7,1,Placebo
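
Before any curve is fitted, it can help to check that a file matches the format described above. The sketch below uses only the standard library; `load_survival_rows` is a hypothetical helper, not part of `scripts/main.py`, and the rejection rules (negative times, event values other than 0 or 1) are assumptions derived from the column descriptions.

```python
import csv
import io


def load_survival_rows(handle, time_col, event_col, group_col=None):
    """Parse CSV rows into (time, event, group) tuples, validating each one."""
    reader = csv.DictReader(handle)
    required = [time_col, event_col] + ([group_col] if group_col else [])
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, record in enumerate(reader, start=2):  # line 1 is the header
        time = float(record[time_col])  # raises ValueError on non-numeric input
        if time < 0:
            raise ValueError(f"line {line_no}: negative time {time}")
        event = record[event_col].strip()
        if event not in ("0", "1"):
            raise ValueError(f"line {line_no}: event must be 0 or 1, got {event!r}")
        rows.append((time, int(event), record[group_col] if group_col else None))
    return rows


# The example file from this section, held in memory for the demo.
sample = io.StringIO(
    "patient_id,time_months,death,treatment_group\n"
    "P001,24.5,1,Drug_A\n"
    "P002,36.2,0,Drug_A\n"
    "P003,18.7,1,Placebo\n"
)
rows = load_survival_rows(sample, "time_months", "death", "treatment_group")
print(rows)
```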

Output Files

  • km_curve.png: Kaplan-Meier survival curve
  • km_curve.pdf: Vector version for publications
  • survival_stats.csv: Statistical summary (median survival, confidence intervals)
  • hazard_ratios.csv: Cox regression results with HR and 95% CI
  • logrank_test.csv: Pairwise comparison p-values
  • report.txt: Human-readable summary report

Technical Details

Statistical Methods

  1. Kaplan-Meier Estimator: Non-parametric maximum likelihood estimate of survival function

    • Product-limit estimator: Ŝ(t) = Π(tᵢ≤t) (1 - dᵢ/nᵢ), where dᵢ is the number of events and nᵢ the number at risk at event time tᵢ
    • Greenwood's formula for variance estimation
  2. Log-Rank Test: Most widely used test for comparing survival curves

    • Null hypothesis: No difference between groups
    • Gives equal weight to every event time (the Wilcoxon variant instead weights each event time by the number at risk)
  3. Cox Proportional Hazards: Semi-parametric regression model

    • h(t|X) = h₀(t) × exp(β₁X₁ + β₂X₂ + ...)
    • Proportional hazards assumption checked via Schoenfeld residuals
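
The first two methods above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the code in `scripts/main.py` (which, per the dependency list, relies on lifelines): the function names and toy data are hypothetical, the log-rank version handles exactly two groups, and its p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def km_with_greenwood(times, events):
    """Return [(t, S(t), standard error)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk, surv, gw_sum = len(data), 1.0, 0.0
    out, i = [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk  # product-limit step
            if n_at_risk > deaths:  # Greenwood term is undefined once S(t) hits 0
                gw_sum += deaths / (n_at_risk * (n_at_risk - deaths))
            out.append((t, surv, surv * math.sqrt(gw_sum)))
        n_at_risk -= removed
    return out


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Return (chi-square statistic, p-value) for the two-sample log-rank test."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected in A
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 df
    return chi2, p_value


# Toy data (illustrative only): 1 = event, 0 = censored.
km = km_with_greenwood([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
chi2, p = logrank_two_groups([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(km[0], chi2, p)
```

Every event time contributes its observed-minus-expected count with the same weight, which is what distinguishes the log-rank test from the Wilcoxon-type variants listed under Features.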

Dependencies

  • lifelines: Core survival analysis library
  • matplotlib, seaborn: Visualization
  • pandas, numpy: Data handling
  • scipy: Statistical tests

Technical Difficulty: High ⚠️

This skill involves advanced statistical modeling. Results should be reviewed by a biostatistician, especially for:

  • Proportional hazards assumption violations
  • Small sample sizes (< 30 per group)
  • Heavy censoring (> 50%)
  • Time-varying covariates
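
Two of the conditions above (small groups and heavy censoring) can be flagged mechanically before a result is sent for review. The helper below is a hypothetical sketch, not a feature of the skill; its thresholds default to the values quoted in the list, and it does not attempt to detect proportional hazards violations or time-varying covariates.

```python
from collections import Counter


def review_flags(events, groups, min_group_size=30, max_censoring=0.5):
    """Return human-readable warnings for small groups and heavy censoring."""
    flags = []
    for name, size in sorted(Counter(groups).items()):
        if size < min_group_size:
            flags.append(f"small group: {name} (n={size})")
    censored_fraction = 1.0 - sum(events) / len(events)
    if censored_fraction > max_censoring:
        flags.append(f"heavy censoring: {censored_fraction:.0%}")
    return flags


# Toy data (illustrative only): 1 = event, 0 = censored.
flags = review_flags([1, 0, 0, 0], ["A", "A", "B", "B"])
print(flags)
```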

References

See references/ folder for:

  • Kaplan EL, Meier P (1958) original paper
  • Cox DR (1972) regression models paper
  • Sample datasets for testing
  • Clinical reporting guidelines (ATN, CONSORT)

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| --input | str | Required | Input CSV file path |
| --time | str | Required | Column name for survival time |
| --event | str | Required | Column name for event indicator (1=event, 0=censored) |
| --group | str | Optional | Column name for grouping variable |
| --output | str | Required | Output directory for results |
| --conf-level | float | 0.95 | Confidence level |
| --risk-table | flag | Optional | Include risk table in plot |
| --figsize | str | '10 | |
| --dpi | int | 300 | |

Example

# Basic survival curve
python scripts/main.py \
  --input clinical_data.csv \
  --time overall_survival_months \
  --event death \
  --group treatment_arm \
  --output ./results/ \
  --risk-table

Output includes:

  • Survival curves with 95% confidence bands
  • Median survival: Drug A = 28.4 months (95% CI: 24.1-32.7), Placebo = 18.2 months (95% CI: 15.3-21.1)
  • Log-rank test p-value: 0.0023
  • Hazard ratio: 0.62 (95% CI: 0.45-0.85), p = 0.003
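
A hazard ratio like the one reported above is exp(β) from the Cox model described under Statistical Methods. As a rough illustration of where such a number comes from, the sketch below fits a single binary covariate by Newton-Raphson on the partial likelihood, using the Breslow convention for tied event times and a Wald interval with the normal quantile 1.96. The function name, the toy data, and the iteration limit are assumptions; this is not the skill's implementation and it does not reproduce the figures quoted above.

```python
import math


def cox_single_covariate(times, events, x, iterations=25):
    """Return (hazard ratio, lower 95% bound, upper 95% bound) for one covariate."""
    beta = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for _ in range(iterations):
        score, info = 0.0, 0.0
        for t in event_times:
            risk = [xi for ti, xi in zip(times, x) if ti >= t]  # risk set at t
            w = [math.exp(beta * xi) for xi in risk]
            s0 = sum(w)
            s1 = sum(wi * xi for wi, xi in zip(w, risk))
            s2 = sum(wi * xi * xi for wi, xi in zip(w, risk))
            d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
            sx = sum(xi for ti, ei, xi in zip(times, events, x) if ti == t and ei)
            score += sx - d * s1 / s0                  # first derivative
            info += d * (s2 / s0 - (s1 / s0) ** 2)     # negative second derivative
        step = score / info
        beta += step
        if abs(step) < 1e-9:
            break
    se = 1.0 / math.sqrt(info)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


# Toy data (illustrative only): x = 1 for treated, 0 for control.
times = [5, 8, 12, 15, 2, 3, 6, 9]
events = [1, 1, 1, 0, 1, 1, 1, 1]
x = [1, 1, 1, 1, 0, 0, 0, 0]
hr, lower, upper = cox_single_covariate(times, events, x)
print(hr, lower, upper)
```

A hazard ratio below 1 means the group coded 1 has the lower estimated hazard; with only eight subjects the interval is wide, which is the small-sample caveat raised in the Technical Difficulty section.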

Risk Assessment

| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |

Security Checklist

  • [ ] No hardcoded credentials or API keys
  • [ ] No unauthorized file system access (../)
  • [ ] Output does not expose sensitive information
  • [ ] Prompt injection protections in place
  • [ ] Input file paths validated (no ../ traversal)
  • [ ] Output directory restricted to workspace
  • [ ] Script execution in sandboxed environment
  • [ ] Error messages sanitized (no stack traces exposed)
  • [ ] Dependencies audited
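
The two path-related items in this checklist (no `../` traversal, output restricted to the workspace) could be enforced with a check along the following lines. This is a sketch using the standard library `pathlib`; `resolve_inside` and the `/tmp/ws` workspace path are hypothetical and nothing here is taken from the skill's scripts.

```python
from pathlib import Path


def resolve_inside(workspace, user_path):
    """Return the resolved path, or raise if it falls outside the workspace."""
    root = Path(workspace).resolve()
    candidate = (root / user_path).resolve()  # collapses any ../ segments
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate


# Hypothetical workspace directory; it does not need to exist for the check.
safe = resolve_inside("/tmp/ws", "results/km_curve.png")
print(safe)
```

Resolving before comparing matters: a plain string prefix check would accept a path such as `results/../../etc/passwd`, whereas the resolved form exposes the escape.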

Prerequisites

# Python dependencies
pip install -r requirements.txt

Evaluation Criteria

Success Metrics

  • [ ] Successfully executes main functionality
  • [ ] Output meets quality standards
  • [ ] Handles edge cases gracefully
  • [ ] Performance is acceptable

Test Cases

  1. Basic Functionality: Standard input → Expected output
  2. Edge Case: Invalid input → Graceful error handling
  3. Performance: Large dataset → Acceptable processing time

Lifecycle Status

  • Current Stage: Draft
  • Next Review Date: 2026-03-06
  • Known Issues: None
  • Planned Improvements:
    • Performance optimization
    • Additional feature support