
Benchmarks

By Parsa Khazaeepoul, co-founder of Pane. Tested every agent manager in this comparison set in production.

We publish reproducible measurements of agent managers each quarter. Methodology is pre-registered before any numbers are taken; raw logs ship as JSON in the public kit repo; corrections route through GitHub issues, not email. Pane is one of the tools measured — we publish the rows it loses on the same page as the ones it wins.

Methodology

The rules of measurement, pre-registered before any data is captured. Covers metric definitions, the process-set rule, statistical handling, and the pinned agent and model.

Q2 2026 results scheduled

Methodology is pre-registered. Measurements in progress — raw data lands soon. We're not linking the page from the hub until real numbers replace the placeholders.

Reproduce

Step-by-step instructions for cloning the kit, running the scripts, and submitting your results back as a PR. Anyone with a modern laptop and an Anthropic API key can re-run the measurements.

The kit (github.com/dcouple/agent-manager-benchmark)

A public 5-package TypeScript monorepo with OS-agnostic measurement scripts, per-manager workflow walkthroughs, and raw logs for every run. MIT-licensed, so anyone can fork it.

- - - - - - - - - - - - - - - -

Context on the category lives on the compare pages; longer-form perspective on why this exists is on the author page.