By Parsa Khazaeepoul, co-founder of Pane. Tested every agent manager in this comparison set in production. .
We publish reproducible measurements of agent managers each quarter. Methodology is pre-registered before any numbers are taken; raw logs ship as JSON in the public kit repo; corrections route through GitHub issues, not email. Pane is one of the tools measured — we publish the rows it loses on the same page as the ones it wins.
Methodology
The rules of measurement — pre-registered before any data is captured. Metric definitions, the process-set rule, statistical handling, the pinned agent and model.
Q2 2026 results scheduled
Methodology is pre-registered. Measurements in progress — raw data lands soon. We're not linking the page from the hub until real numbers replace the placeholders.
Reproduce
Step-by-step on how to clone the kit, run the scripts, and submit your results back as a PR. Anyone with a modern laptop and an Anthropic API key can re-run.
The kit (github.com/dcouple/agent-manager-benchmark)
Public 5-package TypeScript monorepo, OS-agnostic measurement scripts, per-manager workflow walkthroughs, raw logs per run. MIT licensed so anyone can fork.
Context on the category lives on the compare pages; longer-form perspective on why this exists is on the author page.