返回 Skill 列表
extension
分类: 开发与工程无需 API Key

data-extraction

用于从医学研究PDF中提取结构化数据,解析研究特征、患者人口统计学信息、结果和结论。调用以从论文中系统地收集评审数据。

person作者: jakexiaohubgithub

Data Extraction Skill

This skill guides structured data extraction from research papers for systematic reviews.

When to Use

Invoke this skill when the user:

  • Asks to extract data from a PDF
  • Needs study characteristics pulled
  • Wants patient demographics collected
  • Requests outcome data extraction
  • Mentions "data extraction" or "data collection"

Data Elements to Extract

1. Study Identification

| Field | Description | Example | |-------|-------------|---------| | study_id | FirstAuthorYear format | "Smith2023" | | pmid | PubMed ID | "37654321" | | doi | Digital Object Identifier | "10.1001/jamasurg.2023.1234" | | title | Full article title | "..." |

2. Study Characteristics

| Field | Description | Values | |-------|-------------|--------| | year | Publication year | 2020 | | country | Study location | "USA", "Japan" | | study_design | Design type | "RCT", "Retrospective cohort" | | multicenter | Single/multi | true/false | | study_period | Enrollment dates | "2015-2020" |

3. Patient Demographics

| Field | Format | Notes | |-------|--------|-------| | sample_size | Integer | Total N | | age_mean | Number | Mean age | | age_sd | Number | Standard deviation | | age_median | Number | If no mean | | age_iqr | [Q1, Q3] | Interquartile range | | male_percent | 0-100 | Percentage male |

4. Clinical Characteristics (Neurosurgery)

Common scales and measures:

  • GCS (Glasgow Coma Scale): 3-15
  • GOS (Glasgow Outcome Scale): 1-5
  • mRS (modified Rankin Scale): 0-6
  • NIHSS (NIH Stroke Scale): 0-42
  • Hunt-Hess: I-V
  • Fisher Grade: 1-4
  • WHO Grade: I-IV (tumors)

5. Intervention Details

intervention:
  name: "Decompressive craniectomy"
  type: "Surgical"
  technique: "Unilateral frontotemporoparietal"
  timing: "Within 48 hours"
  details: "Bone flap ≥12cm diameter"

6. Outcome Data

Binary Outcomes (events/total)

outcomes:
  - name: "Mortality"
    type: "binary"
    timepoint: "30 days"
    intervention:
      events: 12
      total: 50
    control:
      events: 25
      total: 52

Continuous Outcomes (mean ± SD)

outcomes:
  - name: "Length of stay"
    type: "continuous"
    timepoint: "discharge"
    intervention:
      mean: 14.5
      sd: 6.2
      n: 50
    control:
      mean: 18.3
      sd: 7.1
      n: 52

Effect Estimates

effect_estimate:
  measure: "OR"  # OR, RR, HR, MD, SMD
  value: 0.65
  ci_lower: 0.42
  ci_upper: 0.98
  p_value: 0.038

Extraction Principles

DO:

  1. Extract only explicitly stated data
  2. Record the exact numbers from the paper
  3. Note units (mg, mm, days, months)
  4. Specify timepoints for each outcome
  5. Flag unclear or ambiguous values with "?"
  6. Document page numbers for key data

DON'T:

  1. Calculate or derive values (unless necessary)
  2. Assume missing data
  3. Interpret unclear statements
  4. Mix timepoints within outcomes

Quality Checks

After extraction, verify:

  • [ ] Sample sizes sum correctly across groups
  • [ ] Event counts ≤ total participants
  • [ ] Percentages add to ~100%
  • [ ] CIs contain the point estimate
  • [ ] P-values align with CI (crossing 1 for OR/RR)

Common Issues

Converting Median/IQR to Mean/SD

When only median and IQR reported:

Mean ≈ Median (for symmetric distributions)
SD ≈ IQR / 1.35 (for normal distributions)

Extracting from Figures

  • Use WebPlotDigitizer for graph data
  • Note "extracted from figure" in comments
  • Estimate uncertainty

Missing Control Group (Single-Arm)

For case series without controls:

outcomes:
  - name: "Mortality"
    type: "binary"
    timepoint: "in-hospital"
    single_arm:
      events: 15
      total: 100

Output Format

Use YAML format for structured extraction:

study_id: "Smith2023"
pmid: "37654321"
doi: "10.1001/jamasurg.2023.1234"
year: 2023
country: "USA"
study_design: "Retrospective cohort"
sample_size: 150

patient_demographics:
  age_mean: 58.3
  age_sd: 12.4
  male_percent: 62

intervention:
  name: "Decompressive craniectomy"
  type: "Surgical"

outcomes:
  - name: "Mortality"
    type: "binary"
    timepoint: "30 days"
    intervention:
      events: 12
      total: 75
    control:
      events: 18
      total: 75

notes: "Single-center study. High crossover rate (15%)."

Validation

After extraction, use the validate_extraction tool to check against schema:

mcp__neuroresearch__validate_extraction(data, schema_type="study")