Research Lab · /lab

Autonomous Multi-Agent Paper Writing

Give it a topic. Nine specialised agents drive it through ten stages (questioning, literature survey, outline, code drafting, sandboxed Python experiments, analysis, drafting, reviewer iteration, citation verification, and finalisation) until the reviewers converge or the budget is exhausted.

What you get

A Markdown report with verified citations, a BibTeX bundle, and — when the topic admits experiments — the engineer's runnable Python script plus generated plots and result CSVs. The entire run is reproducible from a single command and a topic string.

Realistic positioning

This is arXiv-grade preprint quality, not top-conference (NeurIPS-grade) quality. 2026 LLMs (across all providers) hit a ceiling on novel research that even sophisticated multi-agent debate can't push through. Use this as a co-pilot that compresses roughly 80% of the writing work, not as an autonomous PhD substitute. Read the output before posting.

Quick start

CLI

cheetahclaws
# in the REPL:
/lab start "Compare logistic regression and random forest on the iris
            dataset, report test accuracy with cross-validation"

# while it runs (typically 15-60 minutes):
/lab status                       # all runs
/lab status lab_a3b1c8e9f012      # detail for one run
/lab logs   lab_a3b1c8e9f012      # recent agent messages
/lab abort  lab_a3b1c8e9f012      # cancel cooperatively

Web UI

cheetahclaws --web --port 8080
# browser → http://127.0.0.1:8080/lab

The web UI gives you a launch form, a recent-runs table, live progress (stage pills + agent message stream auto-refreshing every 5 s), and an in-page Markdown render of the final report.

The 9 specialised agents

🏫

PI

Principal Investigator — picks the most promising research question, signs off on the outline when 2 of 3 reviewers approve, makes final decisions.

Questioner

Drafts 3–5 candidate research questions that frame the topic. Diverges so the PI has real options.

👪

Lay Reader

Sanity-checks the question for accessibility and external relevance, not just technical correctness.

📚

Surveyor

Produces a focused literature review and gap analysis with inline citations across arXiv, Semantic Scholar, and OpenAlex.

📝

Designer

Drafts the paper outline: section structure, claims, and intended experiments, all before any code is written.

👨‍💻

Engineer

Writes the Python script. Runs it in a sandboxed subprocess with stdout / stderr / exit-code capture and matplotlib figures collected.

Drafter

Turns the outline + experiment results into a full Markdown draft with figure references and proper citations.

🔎

Reviewer × 3

Three independent reviewers critique the draft. They iterate with the Drafter until 2 of 3 approve or the budget runs out.

Citation Checker

Verifies every citation against arXiv / Semantic Scholar / CrossRef. Flags fabrications and normalises BibTeX entries.
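As a rough illustration of what a citation check of this kind might look like, the sketch below compares a cited title against candidate titles returned by an index lookup. The lookup callable, the 0.9 similarity threshold, and the record fields are all assumptions made for illustration, not the tool's actual implementation; a real checker would query arXiv, Semantic Scholar, or CrossRef instead of the stand-in used here.

```python
import difflib
import re
from typing import Callable, Iterable


def normalise_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def verify_citation(
    cited_title: str,
    lookup: Callable[[str], Iterable[str]],  # hypothetical: returns candidate titles
    threshold: float = 0.9,                  # assumed similarity cut-off
) -> dict:
    """Return a small verification record for one citation."""
    target = normalise_title(cited_title)
    best_title, best_score = None, 0.0
    for candidate in lookup(cited_title):
        score = difflib.SequenceMatcher(
            None, target, normalise_title(candidate)
        ).ratio()
        if score > best_score:
            best_title, best_score = candidate, score
    return {
        "cited_title": cited_title,
        "matched_title": best_title,
        "score": round(best_score, 3),
        "verified": best_score >= threshold,
    }


def fake_lookup(query: str) -> list[str]:
    """Stand-in for a real index query (assumption: no network access here)."""
    return ["Random Forests", "Classification and Regression Trees"]


record_ok = verify_citation("Random forests.", fake_lookup)
record_bad = verify_citation("Quantum Iris Classification at Scale", fake_lookup)
```

Injecting the lookup as a callable keeps the matching logic testable offline; a record whose `verified` field is false would be the kind of entry flagged as a possible fabrication.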

Stage graph

[topic]
   ↓
QUESTIONING       Questioner drafts 3-5 candidate questions; PI picks; Lay Reader checks.
   ↓
SURVEY            Surveyor produces focused literature review + gap analysis.
   ↓
OUTLINE           Designer drafts; Reviewer × 3 critique; PI signs off when 2/3 pass.
   ↓
CODE_DRAFT        Engineer writes initial Python script.
   ↓
EXPERIMENT        Sandboxed subprocess execution: stdout, stderr, exit code, figures.
   ↓
ANALYSIS          Engineer interprets results; flags failed runs for retry.
   ↓
DRAFTING          Drafter composes Markdown report with figure references.
   ↓
REVIEW LOOP       Reviewer × 3 ⇄ Drafter until 2/3 approve or N rounds.
   ↓
CITATION VERIFY   Every reference checked against arXiv / Semantic Scholar / CrossRef.
   ↓
FINALISE          Bundle: report.md + references.bib + workspace/
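One way to picture the stage graph above is as an ordered enumeration with a "next stage" helper. The stage names mirror the diagram; the Stage class and the next_stage and walk functions are illustrative assumptions rather than the tool's internals, and the sketch deliberately ignores the retry and review loops.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    QUESTIONING = 1
    SURVEY = 2
    OUTLINE = 3
    CODE_DRAFT = 4
    EXPERIMENT = 5
    ANALYSIS = 6
    DRAFTING = 7
    REVIEW_LOOP = 8
    CITATION_VERIFY = 9
    FINALISE = 10


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after FINALISE."""
    ordered = list(Stage)  # Enum members iterate in definition order
    index = ordered.index(current)
    return ordered[index + 1] if index + 1 < len(ordered) else None


def walk(start: Stage = Stage.QUESTIONING) -> list[Stage]:
    """Follow the linear path from `start` to the final stage."""
    path = []
    stage: Optional[Stage] = start
    while stage is not None:
        path.append(stage)
        stage = next_stage(stage)
    return path
```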

Output artifacts

When the run finishes, the output bundle lands in ~/.cheetahclaws/research_papers/<run_id>/:

report.md                   ← main deliverable
references.bib              ← verified citations
citations_verified.json     ← per-citation verification log
workspace/
  experiment.py             ← engineer's final script
  stdout.txt
  stderr.txt
  exit_code.txt
  figure_1.png              ← any matplotlib output
  results.csv               ← any data files the engineer wrote
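The stdout.txt, stderr.txt, and exit_code.txt files listed above suggest a capture step along the following lines. This is a minimal sketch, assuming a plain subprocess with a timeout; the run_script name, the 600-second default, and the -1 timeout sentinel are assumptions, and a bare subprocess is not by itself a security sandbox.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script: Path, workspace: Path, timeout_s: int = 600) -> int:
    """Run `script` with `workspace` as its working directory and
    write stdout, stderr, and the exit code to files there."""
    workspace.mkdir(parents=True, exist_ok=True)
    try:
        proc = subprocess.run(
            [sys.executable, str(script.resolve())],
            cwd=workspace,          # relative paths in the script resolve here
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        stdout, stderr, code = proc.stdout, proc.stderr, proc.returncode
    except subprocess.TimeoutExpired:
        # Assumed convention: -1 marks a run killed by the timeout.
        stdout, stderr, code = "", f"timed out after {timeout_s}s", -1

    (workspace / "stdout.txt").write_text(stdout)
    (workspace / "stderr.txt").write_text(stderr)
    (workspace / "exit_code.txt").write_text(str(code))
    return code
```

Because the working directory is the workspace, any figures or CSV files the script saves with relative paths end up next to the captured logs.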

Web UI

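Run cheetahclaws --web --port 8080, then visit /lab. The dashboard gives you the launch form, recent-runs table, live progress view, and in-page report render described under Quick start.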

Budget and convergence

Every run is bounded. The PI tracks token, call, and wall-clock budgets and stops the pipeline when any one of three exit conditions triggers: (1) the reviewer loop converges (at least 2 of 3 reviewers approve), (2) the budget is exhausted, or (3) the user calls /lab abort. Every final report goes through the citation-verification pass, so fabricated citations are flagged before the report ships.
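The three exit conditions can be sketched as a small loop. Everything here is a hypothetical illustration: the Budget dataclass, the review_loop function, the fixed per-call token cost, and the reviewer callables are assumptions, not the tool's real interfaces.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class Budget:
    max_tokens: int
    max_calls: int
    max_seconds: float
    tokens_used: int = 0
    calls_made: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def exhausted(self) -> bool:
        """True once any one of the three limits has been reached."""
        return (
            self.tokens_used >= self.max_tokens
            or self.calls_made >= self.max_calls
            or time.monotonic() - self.started_at >= self.max_seconds
        )


def review_loop(
    reviewers: Sequence[Callable[[int], bool]],  # each returns True to approve
    budget: Budget,
    abort_requested: Callable[[], bool],
    approvals_needed: int = 2,
    tokens_per_call: int = 1000,                 # assumed flat cost per review
) -> str:
    """Return why the loop stopped: 'converged', 'budget', or 'aborted'."""
    round_no = 0
    while True:
        if abort_requested():
            return "aborted"
        if budget.exhausted():
            return "budget"
        votes = []
        for review in reviewers:
            votes.append(bool(review(round_no)))
            budget.calls_made += 1
            budget.tokens_used += tokens_per_call
        if sum(votes) >= approvals_needed:
            return "converged"
        round_no += 1
```

Checking abort and budget at the top of each round means a run never starts a review round it cannot pay for, which matches the cooperative cancellation described for /lab abort.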

Ready to Run Your First Lab?

Install in 30 seconds, launch your first paper in 5 minutes.