
Reproduce the benchmark

By Parsa Khazaeepoul, co-founder of Pane. Tested every agent manager in this comparison set in production.

The whole kit is public. If a number on the Q2 run page looks wrong, you should be able to re-run it on your own hardware in an afternoon, then either reproduce the number or open a correction issue with your delta. This page walks through how.

what you'll need

clone the kit

gh repo clone dcouple/agent-manager-benchmark
cd agent-manager-benchmark
pnpm install

The repository's packages/ directory holds the five target packages — small, realistic TypeScript files with console.log calls scattered through them. The fixed task spec in TASK.md is what each manager will be given.

run the resource measurements

Open your target manager, spawn four parallel panes or workspaces, and wait for each to be ready. Then, from the kit's root:

# macOS / Linux
./scripts/measure-resource.sh pane

# Windows (PowerShell)
.\scripts\measure-resource.ps1 pane

The script prompts for the launcher PID. capture-process-set.sh then walks that process's descendants via pgrep -P, and the totals are written to runs/<run-id>/<manager>.json. Repeat for each manager installed on your platform.
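To make the descendant walk concrete, here is a minimal TypeScript sketch of the same idea applied to a process snapshot held in memory. It is not the kit's code: the ProcessRow shape, the rssKb field, the function names, and the choice to count the launcher itself are all assumptions made for illustration. The kit's scripts decide what is actually measured.

```typescript
// One row of a process snapshot. Field names are hypothetical, not the kit's schema.
interface ProcessRow {
  pid: number;
  ppid: number;
  rssKb: number; // assumed metric: resident memory in KiB
}

// Collect the launcher and all of its descendants, breadth-first.
// This mirrors what repeated `pgrep -P <pid>` calls do: list the children
// of each PID, then the children of those children, and so on.
function collectProcessSet(rows: ProcessRow[], launcherPid: number): number[] {
  const childrenOf = new Map<number, number[]>();
  for (const row of rows) {
    const children = childrenOf.get(row.ppid) ?? [];
    children.push(row.pid);
    childrenOf.set(row.ppid, children);
  }

  const seen = new Set<number>();
  const queue: number[] = [launcherPid];
  while (queue.length > 0) {
    const pid = queue.shift() as number;
    if (seen.has(pid)) continue; // guard against malformed snapshots with cycles
    seen.add(pid);
    for (const child of childrenOf.get(pid) ?? []) {
      queue.push(child);
    }
  }
  return Array.from(seen);
}

// Sum the assumed metric over the launcher's whole process set.
function totalRssKb(rows: ProcessRow[], launcherPid: number): number {
  const members = new Set<number>(collectProcessSet(rows, launcherPid));
  let total = 0;
  for (const row of rows) {
    if (members.has(row.pid)) total += row.rssKb;
  }
  return total;
}
```

Processes that are not descendants of the launcher are left out of the total, which is why the launcher PID you enter at the prompt matters.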

run the workflow measurements

Each manager has its own keystroke script in workflow/<manager>.md. Open the file, start your screen recorder, and follow the steps exactly. Every line ends with [click] or [key] so you can count the clicks and keystrokes.
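Because each step carries a trailing marker, the count can also be tallied mechanically and checked against your recording. The sketch below is an illustration only: countSteps and StepCounts are hypothetical names, and it assumes the markers appear exactly as [click] and [key] at the end of a line, optionally followed by whitespace.

```typescript
// Tally of the two marker types. The type name is hypothetical.
interface StepCounts {
  clicks: number;
  keys: number;
}

// Count lines that end with a [click] or [key] marker.
// Lines without a trailing marker (blank lines, notes) are ignored.
function countSteps(script: string): StepCounts {
  let clicks = 0;
  let keys = 0;
  for (const line of script.split(/\r?\n/)) {
    if (/\[click\]\s*$/.test(line)) {
      clicks += 1;
    } else if (/\[key\]\s*$/.test(line)) {
      keys += 1;
    }
  }
  return { clicks, keys };
}
```

Pass in the text of a workflow file as a string. If the totals differ from what you counted by hand, recheck the recording before submitting.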

submitting your results

Open a PR against the kit repo with your runs/<run-id>/ directory. Disagreements with a number on a published run page go through the correction template; disputes with a methodology rule use the methodology-question template.
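When you file a correction, it helps to state your delta as both an absolute and a relative difference. The following sketch shows the arithmetic. The function and field names are hypothetical, and nothing here implies a tolerance threshold, since this page does not define one.

```typescript
// Difference between a published number and your own measurement.
// The type and field names are hypothetical, chosen for this sketch.
interface Delta {
  absolute: number;           // measured minus published, in the metric's own unit
  relativePct: number | null; // percent of the published value; null if published is 0
}

function computeDelta(published: number, measured: number): Delta {
  const absolute = measured - published;
  // A relative difference is undefined when the published value is zero.
  const relativePct = published === 0 ? null : (absolute * 100) / published;
  return { absolute, relativePct };
}
```

For example, a published value of 400 and a measured value of 430 give an absolute delta of 30 and a relative delta of 7.5 percent.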

Methodology details live on the methodology page. Latest published numbers are on /benchmarks/2026-q2.