Autonomous Multi-Agent Paper Writing
Give it a topic. Nine specialised agents drive it through ten stages (questioning, literature survey, outline, code drafting, sandboxed Python experiments, analysis, drafting, reviewer iteration, citation verification, finalisation) until convergence or budget exhaustion.
What you get
A Markdown report with verified citations, a BibTeX bundle, and — when the topic admits experiments — the engineer's runnable Python script plus generated plots and result CSVs. The entire run is reproducible from a single command and a topic string.
Realistic positioning
This is arXiv-grade preprint quality, not top-conference (NeurIPS-grade) quality. As of 2026, LLMs from every provider hit a ceiling on novel research that even sophisticated multi-agent debate can't push through. Use this as a co-pilot that compresses 80% of the writing work, not as an autonomous PhD substitute. Read the output before posting.
Quick start
CLI
cheetahclaws
# in the REPL:
/lab start "Compare logistic regression and random forest on the iris dataset, report test accuracy with cross-validation"
# while it runs (typically 15-60 minutes):
/lab status # all runs
/lab status lab_a3b1c8e9f012 # detail for one run
/lab logs lab_a3b1c8e9f012 # recent agent messages
/lab abort lab_a3b1c8e9f012 # cancel cooperatively
Web UI
cheetahclaws --web --port 8080
# browser → http://127.0.0.1:8080/lab
The web UI gives you a launch form, a recent-runs table, live progress (stage pills + agent message stream auto-refreshing every 5 s), and an in-page Markdown render of the final report.
The 9 specialised agents
PI
Principal Investigator — picks the most promising research question, signs off on the outline when 2 of 3 reviewers approve, makes final decisions.
Questioner
Drafts 3–5 candidate research questions that frame the topic. Diverges so the PI has real options.
Lay Reader
Sanity-checks the question for accessibility and external relevance, not just technical correctness.
Surveyor
Produces a focused literature review and gap analysis with inline citations across arXiv, Semantic Scholar, and OpenAlex.
Designer
Drafts the paper outline: section structure, claims, and intended experiments, all before any code is written.
Engineer
Writes the Python script and runs it in a sandboxed subprocess, capturing stdout, stderr, and the exit code and collecting any matplotlib figures.
Drafter
Turns the outline + experiment results into a full Markdown draft with figure references and proper citations.
Reviewer × 3
Three independent reviewers critique the draft. They iterate with the Drafter until 2 of 3 approve or the budget runs out.
Citation Checker
Verifies every citation against arXiv / Semantic Scholar / CrossRef. Flags fabrications and normalises BibTeX entries.
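The Citation Checker's job can be pictured with a small sketch. Everything in it is hypothetical: the Citation type, the resolver callables, and the rule "verified if at least one source confirms the title" are assumptions made for illustration, not the tool's actual logic. Real resolvers would query arXiv, Semantic Scholar, or CrossRef; here they are stubbed with in-memory sets so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Citation:
    key: str    # BibTeX key (hypothetical field)
    title: str  # title as it appears in the draft


# A resolver answers: "does this source know a paper with this title?"
Resolver = Callable[[str], bool]


def normalise(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(title.lower().split())


def make_stub_resolver(known_titles: List[str]) -> Resolver:
    """Offline stand-in for a real bibliographic lookup."""
    known = {normalise(t) for t in known_titles}
    return lambda title: normalise(title) in known


def verify_citations(
    citations: List[Citation], resolvers: Dict[str, Resolver]
) -> Dict[str, dict]:
    """Return a per-citation log; unconfirmed entries are the ones to flag."""
    report = {}
    for c in citations:
        confirmed_by = [name for name, r in resolvers.items() if r(c.title)]
        report[c.key] = {
            "verified": bool(confirmed_by),  # assumption: one source suffices
            "confirmed_by": confirmed_by,
        }
    return report
```

Passing the resolvers in as plain callables keeps the lookup sources swappable, which is convenient for testing without network access.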
Stage graph
[topic]
↓
QUESTIONING Questioner drafts 3-5 candidate questions; PI picks; Lay Reader checks.
↓
SURVEY Surveyor produces focused literature review + gap analysis.
↓
OUTLINE Designer drafts; Reviewer × 3 critique; PI signs off when 2/3 pass.
↓
CODE_DRAFT Engineer writes initial Python script.
↓
EXPERIMENT Sandboxed subprocess execution: stdout, stderr, exit code, figures.
↓
ANALYSIS Engineer interprets results; flags failed runs for retry.
↓
DRAFTING Drafter composes Markdown report with figure references.
↓
REVIEW LOOP Reviewer × 3 ⇄ Drafter until 2/3 approve or N rounds.
↓
CITATION VERIFY Every reference checked against arXiv / Semantic Scholar / CrossRef.
↓
FINALISE Bundle: report.md + references.bib + workspace/
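As a rough illustration, the stage graph above can be modelled as an ordered enumeration. The Stage class and next_stage helper are hypothetical names, not part of cheetahclaws, and the sketch is strictly linear: it does not model the retry that ANALYSIS can flag or the iterations inside the review loop.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Pipeline stages in the order shown in the stage graph."""
    QUESTIONING = "questioning"
    SURVEY = "survey"
    OUTLINE = "outline"
    CODE_DRAFT = "code_draft"
    EXPERIMENT = "experiment"
    ANALYSIS = "analysis"
    DRAFTING = "drafting"
    REVIEW_LOOP = "review_loop"
    CITATION_VERIFY = "citation_verify"
    FINALISE = "finalise"


# Enum members iterate in definition order, which gives the pipeline order.
PIPELINE = list(Stage)


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows current, or None after the last stage."""
    i = PIPELINE.index(current)
    return PIPELINE[i + 1] if i + 1 < len(PIPELINE) else None
```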
Output artifacts
When the run finishes, the report lands at ~/.cheetahclaws/research_papers/<run_id>/:
report.md ← main deliverable
references.bib ← verified citations
citations_verified.json ← per-citation verification log
workspace/
  experiment.py ← engineer's final script
  stdout.txt
  stderr.txt
  exit_code.txt
  figure_1.png ← any matplotlib output
  results.csv ← any data files the engineer wrote
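The workspace files map naturally onto a captured subprocess run. The sketch below is a minimal illustration using Python's standard subprocess module; run_experiment, its timeout, and the choice of -1 as the exit code on timeout are all assumptions, and a plain child process like this is not a real sandbox (no filesystem or network isolation), so it shows only the capture step.

```python
import subprocess
import sys
from pathlib import Path


def run_experiment(script: Path, workspace: Path, timeout_s: float = 600.0) -> int:
    """Run script in a child process and save its outputs into workspace."""
    workspace.mkdir(parents=True, exist_ok=True)
    try:
        proc = subprocess.run(
            [sys.executable, str(script.resolve())],
            cwd=workspace,        # relative writes (figures, CSVs) land here
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode output to str
            timeout=timeout_s,
        )
        stdout, stderr, code = proc.stdout, proc.stderr, proc.returncode
    except subprocess.TimeoutExpired:
        # Assumption: a timeout is recorded as exit code -1.
        stdout, stderr, code = "", f"timed out after {timeout_s} s\n", -1

    (workspace / "stdout.txt").write_text(stdout)
    (workspace / "stderr.txt").write_text(stderr)
    (workspace / "exit_code.txt").write_text(f"{code}\n")
    return code
```

Running the script with the workspace as its working directory means anything it saves with a relative path, such as a figure or a CSV, ends up next to the captured logs.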
Web UI
Run cheetahclaws --web --port 8080, then visit /lab. The dashboard gives you:
- Launch form — topic + provider + budget
- Live progress — stage pills update as the pipeline advances; agent message stream auto-refreshes every 5 s
- Recent runs table — sortable, filterable, with run-id click-through
- In-page Markdown render of the final report when finished
- Abort button for cooperative cancellation mid-run
Budget and convergence
Every run is bounded. The PI tracks token, call, and wall-clock budgets and stops the pipeline when any of three exit conditions triggers: (1) the reviewer loop converges (at least 2 of 3 reviewers approve), (2) the budget is exhausted, or (3) the user calls /lab abort. Final reports always include the verified-citation pass, so bad citations never make it through.
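The three exit conditions can be sketched as a small loop. This is an illustrative model only: Budget, review_loop, and the callables it takes are hypothetical names, and the sketch tracks just review rounds and wall-clock time, leaving out the token and call budgets mentioned above.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple, TypeVar

D = TypeVar("D")  # whatever type represents a draft


@dataclass
class Budget:
    max_rounds: int     # review rounds allowed (hypothetical limit)
    max_seconds: float  # wall-clock limit in seconds


def review_loop(
    draft: D,
    reviewers: List[Callable[[D], bool]],  # each returns True to approve
    revise: Callable[[D], D],              # produces the next draft
    budget: Budget,
    aborted: Callable[[], bool] = lambda: False,
    approvals_needed: int = 2,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[D, str]:
    """Iterate until enough reviewers approve, the budget ends, or the user aborts."""
    start = clock()
    for _ in range(budget.max_rounds):
        if aborted():
            return draft, "aborted"
        if clock() - start > budget.max_seconds:
            return draft, "budget_exhausted"
        approvals = sum(1 for review in reviewers if review(draft))
        if approvals >= approvals_needed:
            return draft, "converged"
        draft = revise(draft)
    return draft, "budget_exhausted"
```

Checking the abort flag at the top of each round mirrors cooperative cancellation: the loop stops at the next safe point rather than being killed mid-round.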
Ready to Run Your First Lab?
Install in 30 seconds, launch your first paper in 5 minutes.