Litmus — Parallel Autonomous ML Research Agents

Litmus spawns multiple OpenClaw subagents that experiment on your GPU overnight. Each runs on its own git branch in a shared lab repository — every experiment is a commit, agents can read each other's code, cherry-pick breakthroughs, and build on the global best at any time.

Validated techniques accumulate in a Skills library (~/.litmus/shared/skills/). A Synthesizer runs at 04:00 to distill collective knowledge into skills and write a research agenda for the next day. A Director runs every 2 hours to steer workers, trigger Compass Resets on stagnation, and orchestrate cross-agent knowledge transfer.

What makes it more than autoresearch:

Git worktrees: agents share one repo, each on their own branch — full experiment history, cherry-pick, and cross-agent code inspection via git -C ~/.litmus/repo log --all
Skills library: validated techniques persist and compound — agents don't re-discover wins
Synthesizer: distills all overnight notes into reusable skills and a research agenda
Compass Reset: Director detects stagnation and forces structured pivots using the skills gap
Two-phase experiment budget: quick 90-second check before committing to a full run
Structured attempt records: JSON per experiment in shared/attempts/ for rich analytics
Leisure mode (03:00–06:00): workers read papers, write moonshot hypotheses, identify gaps
Morning digest: research narrative delivered to your chat at 08:00

Everything is a native OpenClaw subagent. No external processes, no PID files.

First-Time Setup

Recommended — ask your OpenClaw agent (runs a guided onboarding conversation):

"Install https://clawhub.ai/kuberwastaken/litmus and set it up for my machine"

Full onboarding instructions: {baseDir}/references/onboarding.md — read that file first.

Or manually:

git clone https://github.com/kuberwastaken/litmus ~/.litmus
bash ~/.litmus/scripts/setup.sh

Clones Karpathy's training harness, builds the shared lab git repo at ~/.litmus/repo/, installs Python deps via uv, downloads ~1 GB of training data. Wait for it to finish.

Starting Research

1 — Prepare workspaces (creates git worktrees)

bash {baseDir}/scripts/prepare-agents.sh --agents 4 --templates architecture,optimizer,general,general

Creates git worktrees under ~/.litmus/agents/, each on its own branch in ~/.litmus/repo/. The shared lab git repo means every agent's experiments are immediately visible to all others:

git -C ~/.litmus/repo log --all --oneline --graph

2 — Spawn research subagents

sessions_spawn
  task: "Read program.md in your current directory and run the research loop forever."
  runtime: "subagent"
  mode: "session"
  agentId: "litmus-worker-arch-1"
  cwd: "~/.litmus/agents/arch-1"

Repeat for each agent, then:

sessions_yield message: "Research agents running. I'll notify you on new discoveries."

Templates: architecture · optimizer · regularization · general Full template details: {baseDir}/references/templates/

3 — Start the Director Layer

bash {baseDir}/scripts/setup-cron.sh --timezone "Your/Timezone"

Registers 6 cron jobs:

| Cron | Default schedule | Role | |---|---|---| | litmus-director | Every 2h during research hours | Reviews results, steers workers, Compass Reset on stagnation | | litmus-leisure | 03:00 daily | Switches workers to paper-reading / creative thinking mode | | litmus-synthesizer | 04:00 daily | Distills notes into skills library, writes research agenda | | litmus-dawn | 06:00 daily | Wakes workers, queues synthesizer's priority experiments | | litmus-watchdog | Every 30 min | Liveness check, escape mode on zero improvements | | litmus-digest | 08:00 daily | Morning research narrative delivered to your chat |

All times are configurable during onboarding — the setup agent pitches defaults and asks what you'd like to change. Common presets: night owl (01:00/02:00/04:00/07:00), early bird (23:00/00:30/02:00/05:30), intensive (1h director). Pass custom times to scripts/setup-cron.sh with --leisure-start, --synthesizer-time, --dawn-time, --digest-time, --director-hours, --watchdog-minutes.

Managing Agents

Status (experiment counts, best val_bpb, git tree):

bash {baseDir}/scripts/status.sh

Leaderboard (cross-agent, from shared/attempts/ JSON):

bash {baseDir}/scripts/results.sh --top 10
bash {baseDir}/scripts/results.sh --agent arch-1  # single agent

Full lab git history (all agents' experiments as a tree):

git -C ~/.litmus/repo log --all --oneline --graph

Inspect any experiment:

git -C ~/.litmus/repo show <commit-hash>  # see what changed
cat ~/.litmus/shared/attempts/<hash>.json  # see the metrics

Steer (redirect mid-run, no restart):

subagents action: "steer"  target: "litmus-worker-arch-1"
  message: "Stop refining depth. Checkout the best commit from opt-2 and combine their LR with DEPTH=10."

Stop:

subagents action: "kill"  target: "all"

What Agents Write Overnight

| Path | Contents | |---|---| | ~/.litmus/shared/attempts/<hash>.json | Structured record for every experiment (agent, val_bpb, status, title) | | ~/.litmus/shared/skills/<name>.md | Validated reusable techniques with YAML frontmatter | | ~/.litmus/shared/notes/discoveries/ | Per-improvement discovery notes | | ~/.litmus/shared/notes/anomalies/ | Unexpected result notes | | ~/.litmus/shared/notes/moonshots/ | Speculative hypotheses from leisure | | ~/.litmus/shared/notes/synthesis/ | Synthesizer's research agenda and combination matrix | | ~/.litmus/shared/discoveries.md | Cross-agent knowledge base (flat, for quick reading) | | ~/.litmus/shared/midnight-reflections.md | Leisure agent's nightly narrative | | ~/.litmus/repo/ (git) | All experiment commits across all agents on their branches |

Reference Files

{baseDir}/references/onboarding.md — first-time setup conversation
{baseDir}/references/program.md — worker agent loop (git-aware, skills-reading, two-phase budget)
{baseDir}/references/director.md — Director cron (Compass Reset, cross-pollination)
{baseDir}/references/leisure.md — Leisure mode (paper reading, structured notes, skill extraction)
{baseDir}/references/synthesizer.md — Synthesizer cron (knowledge distillation, skills library)
{baseDir}/references/dawn.md — Dawn cron (wake workers, queue experiments)
{baseDir}/references/watchdog.md — Watchdog cron (liveness, escape mode)
{baseDir}/references/digest.md — Morning digest (research narrative)
{baseDir}/references/templates/ — Research focus templates
{baseDir}/references/clawrxiv.md — ClawRxiv integration (optional auto-publishing)