OpenClaw R Stats

When to Use

User asks for any statistical analysis, hypothesis testing, group comparison, prediction, association, survival analysis, meta-analysis, causal inference, power/sample size, or mentions R statistical packages.

What This Skill Does NOT Do

Claim causality from observational data (use "associated with")
Run large exploratory fishing without clear user intent
Silently ignore assumption violations
Report only p-values (always include effect sizes and CIs)

Pre-Flight (Mandatory)

Confirm dataset exists and is readable
Run schema inspection: bash {baseDir}/scripts/run-rstats.sh schema --data <path>
Report: rows, columns, types, missing values
If missing > 5%, warn. If n < 30, warn small sample.

Environment Setup

First time or errors: bash {baseDir}/scripts/run-rstats.sh doctor

Install by profile (only when needed):

| Profile | Script | Methods | |---------|--------|---------| | Core | install-core.R | t-test, regression, ANOVA, chi-sq | | Survival | install-survival.R | KM, Cox, competing risks, RMST | | Missing | install-missing.R | MICE, MCAR test | | Mixed | install-mixed.R | LMM, GLMM, GEE, ICC | | Bayes | install-bayes.R | brms, Bayes factors | | Causal | install-causal.R | PSM, IPTW, IV, DiD, RDD, TMLE | | Meta | install-meta.R | meta-analysis, NMA | | SEM | install-sem.R | SEM, CFA, lavaan | | Diagnostic | install-diagnostic.R | ROC, kappa, alpha | | Advanced | install-advanced.R | GAM, quantile, zero-inflated | | Power | install-power.R | power/sample size |

Workflow

Determine analysis type (see references/METHOD_TABLE.md)
Inspect dataset schema
Build JSON spec:

{
  "dataset_path": "<path>",
  "analysis_type": "<type>",
  "outcome": "<column>",
  "predictors": ["<col1>"],
  "hypothesis": "<plain language>",
  "alpha": 0.05,
  "seed": 42,
  "output_dir": "<path>"
}

Save as .json, run: bash {baseDir}/scripts/run-rstats.sh analyze --spec <path>
Read summary.json + report.md
Present: Summary → Statistics → Interpretation → Plots → Assumptions → Caveats

Analysis Selection

For the complete 82-method table with user intent mapping, see references/METHOD_TABLE.md.

Quick lookup — most common:

| Intent | analysis_type | |--------|--------------| | Compare 2 groups | ttest or wilcoxon | | Compare 3+ groups | anova or kruskal | | Categorical association | chisq or fisher | | Predict continuous | linear_regression | | Predict binary | logistic_regression | | Survival curves | kaplan_meier | | Survival regression | cox_regression | | Meta-analysis | meta_analysis | | Causal effect | propensity_match or did | | Power/sample size | power_analysis |

Automatic Method Switching

Non-normal + n < 30 → wilcoxon over ttest
Unequal variance → Welch t-test (equal_var: false)
Expected cells < 5 → fisher over chisq
Overdispersion in Poisson → suggest negative binomial
Heteroscedastic residuals → robust SE warning

Reporting Rules (Non-Negotiable)

Every analysis MUST include:

Sample size (n) and missing data handling
Method name and rationale
Point estimates with confidence intervals
Effect sizes (Cohen's d, η², R², OR, HR, etc.)
Assumption check results
Limitations

Language: "associated with" / "evidence suggests" — NEVER "proves" / "causes"

Spec Field Reference

See references/SPEC_REFERENCE.md for required/optional fields per analysis_type.