Over the last year my co-founder and I shipped a lot of software without writing much of it by hand. Not because we found a magic prompt — because we slowly built a machine. A conversation becomes a ticket. The ticket goes into a repo. We run /discussion, then /plan, then /implement, and at each step a reviewer checks the work before it moves on — or we hand the loop to an agent, pointing something like Codex at a higher-level goal and letting it run those same commands itself, which works surprisingly well. Either way we step back in for the final review: by the time a pull request opens, three different reviewers have already torn it apart. The thing to notice is where our attention lands — almost all of it in those first two steps, the conversation and the discussion. After that, the machine mostly drives.
The interesting part isn't any one command. It's that the whole thing is a pipeline of gates — tests, typecheck, lint, plan review, implementation review, PR review. Semi-deterministic checks an agent has to pass before it's allowed to continue. Stack enough of them and you can take an ambitious feature from nothing to merged, mostly hands-off, at almost any scope.
Which raised a question we couldn't stop turning over: could the same machine build business work? Not code — proposals, decks, pricing, memos, the stuff that actually moves a company. The honest answer turned out to be yes, but only after we found the one piece that was missing. This is the story of what that piece is.
the software factory, in one breath
Worth sketching the thing we already had, because the business version is a remix of it:
- → Capture — a standup or a call becomes a ticket with context, intent, and an explicit list of what not to do.
- → Discussion — the agent probes us before any plan exists.
- → Plan + review — a plan gets written, then an adversarial reviewer attacks it — deliberately a different model than the one that wrote it, so Claude checks Codex's work and Codex checks Claude's.
- → Implement + review — code gets written, then a reviewer checks it against the plan.
- → PR — cloud reviewers run in a loop until the thing is clean.
Every arrow is a gate. And the repo is what makes the gates possible — it's the shared reality every stage reads from and writes back to.
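To make "every arrow is a gate" concrete, here is a minimal Python sketch of a gated pipeline. It is not our actual tooling. Stage, run_pipeline, GateFailed, and has_plan are hypothetical names, and each stage's run function stands in for an agent call. The one idea it shows is that a stage's output becomes the next stage's input only after every gate attached to that stage passes.

```python
from dataclasses import dataclass, field
from typing import Callable

# A gate inspects a stage's output and returns (passed, reason_if_failed).
Gate = Callable[[str], tuple[bool, str]]


@dataclass
class Stage:
    name: str                          # e.g. "plan" or "implement" (illustrative)
    run: Callable[[str], str]          # stand-in for an agent call (hypothetical)
    gates: list[Gate] = field(default_factory=list)


class GateFailed(Exception):
    pass


def run_pipeline(stages: list[Stage], item: str, max_attempts: int = 3) -> str:
    """Pass a work item through each stage; it moves on only once every gate passes."""
    for stage in stages:
        failures: list[str] = []
        for _ in range(max_attempts):
            candidate = stage.run(item)
            failures = []
            for gate in stage.gates:
                ok, reason = gate(candidate)
                if not ok:
                    failures.append(reason)
            if not failures:
                item = candidate  # gate cleared: this output feeds the next stage
                break
        else:
            raise GateFailed(f"{stage.name} failed its gates: {failures}")
    return item


def has_plan(text: str) -> tuple[bool, str]:
    return ("PLAN:" in text, "no plan to implement against")


stages = [
    Stage("plan", lambda t: t + "\nPLAN: add an export button"),
    Stage("implement", lambda t: t + "\nCODE: export handler", [has_plan]),
]
result = run_pipeline(stages, "TICKET: users want CSV export")
```

Retrying a failing stage matters more with a real model, whose output changes from run to run; with the deterministic lambdas here, a failure simply fails every attempt.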
the question with no obvious answer
Software is forgiving to automate because so much of it is deterministic. A test passes or it doesn't. Types check or they don't. Lint is binary. Business work has almost none of that. There's no compiler for a proposal. So when we asked what the business version of the factory looked like, the first instinct — ours included — was to reach for the only obviously-business-shaped gate: review. Write the thing, then run a tough adversarial pass over it.
That instinct is half-right, which makes it a trap. Review is real, but it's the last third of the pipeline. If review is the only structure you add, you're just polishing something that was built on vibes. The software factory doesn't work because of PR review. It works because of everything that happens before a line of code is written.
if it isn't in the context base, it doesn't exist to the agent
So we went looking for what comes before. The gap was obvious the moment we named it. A coding agent is powerful because it has a repo: one place where the truth lives. Architecture, conventions, prior decisions, the reason you picked library X over library Y. We wrote about this in "Before Your First /discussion": the repository is the system of record, and if context isn't in the repo, the agent makes it up.
Business context has no repo. It's scattered across Slack threads, Gmail, a call recording, a CRM, a Notion page, last quarter's deck, and three things that only live in someone's head. Humans hop between all of it without noticing. An agent can't. That's the real reason business work feels hard to hand off — not the writing, the context.
So the first job isn't to write anything. It's to build the repo that doesn't exist yet.
building the piece that was missing
The missing piece is the repo business never had: a context base. The catch is that building a good one has two halves — an inside view and an outside one.
- → A filesystem context base. Pull the scattered stuff into plain markdown files an agent can read. Connected apps make this mechanical now: through MCP, the standard way agents plug into outside tools, an agent can reach into Gmail, a calendar, a CRM, or a drive and write what it finds into a .business/ folder of known facts, assumptions, constraints, and sources. Same principle as a repo, assembled on demand. No special workspace required; any agent setup that can call tools and write files can do it.
- → A research-adversary pass. This one we didn't expect. Internal context tells you what you believe. It says nothing about what the market actually thinks. So we added a stage whose only job is to go read the outside world: the objections buyers really raise, the language they really use, what competitors get praised and dragged for. It's the codebase-explorer of business work, the step that reads everything already out there before any plan exists, except here the codebase is the internet and the people in your category. And it runs before the spec, as context, not as a polish pass at the end.
Put those together and the agent finally has something to reason over: an internal picture of the truth and an external picture of reality. Now the gates have something to check against.
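Here is a rough sketch of the filesystem half, assuming the MCP pulls have already happened and come back as simple records. The record shape, the file names (facts.md, assumptions.md, constraints.md, sources.md), and write_context_base are all hypothetical. The point is only that the output is plain markdown with provenance on every line.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical record shape: {"kind", "text", "source"}. In practice an agent would
# fill these in from MCP tool calls (Gmail, calendar, CRM, drive); that step is omitted.
KINDS = ("facts", "assumptions", "constraints")


def write_context_base(root: Path, records: list[dict]) -> dict[str, Path]:
    """Write one plain-markdown file per kind, plus a deduplicated sources list."""
    base = root / ".business"
    base.mkdir(parents=True, exist_ok=True)
    grouped: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        if rec["kind"] not in KINDS:
            raise ValueError(f"unknown kind: {rec['kind']!r}")
        grouped[rec["kind"]].append(rec)

    files: dict[str, list[str]] = {}
    for kind in KINDS:
        # Provenance rides along with every line so later stages can cite it.
        files[kind] = [f"- {r['text']} (source: {r['source']})" for r in grouped[kind]]
    files["sources"] = [f"- {s}" for s in sorted({r["source"] for r in records})]

    paths: dict[str, Path] = {}
    for name, lines in files.items():
        path = base / f"{name}.md"
        path.write_text("\n".join([f"# {name.capitalize()}", "", *lines]) + "\n",
                        encoding="utf-8")
        paths[name] = path
    return paths


# Illustrative records only.
records = [
    {"kind": "facts", "text": "Customer is on the annual plan", "source": "crm:acct-123"},
    {"kind": "constraints", "text": "Proposal must fit in 5 pages", "source": "gmail:thread-9"},
    {"kind": "assumptions", "text": "Budget owner is the VP of Ops", "source": "gmail:thread-9"},
]
paths = write_context_base(Path(tempfile.mkdtemp()), records)
```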
the map
Once the context problem was solved, the rest of the pipeline mapped over almost one-to-one. Here's the whole translation:
| software factory | business factory |
|---|---|
| GitHub / monorepo | a filesystem context base (.business/) pulled out of connected apps via MCP |
| AGENTS.md (map) + CLAUDE.md (manual) | BUSINESS_AGENTS.md + BUSINESS_CONTEXT.md |
| a Linear ticket | the task or brief |
| reading the codebase & prior art | business-context: facts, assumptions, constraints |
| codebase-explorer | business-research-adversary: what the outside world actually thinks |
| /discussion | business-discussion |
| /plan + plan-reviewer | business-spec + business-spec-reviewer |
| /implement | business-artifact |
| implementation-reviewer | business-artifact-reviewer |
| pull request + cloud review | role-based adversarial review until it clears |
| typecheck | every claim maps to evidence (the claim-evidence ledger) |
| eslint / zero warnings | no vague claims, no fake precision, one clear ask |
| tests | stakeholder acceptance criteria |
| build passes | the artifact is actually sendable |
| merge to master | the human gate / release |
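The eslint row is the easiest one to approximate mechanically. The sketch below is a deliberately crude lint pass, assuming a hypothetical convention that the artifact states its request on a line starting with "Ask:". The vague-word list and the fake-precision regex are illustrative guesses, not rules we ship. It catches the obvious cases and leaves the judgment calls to the reviewers.

```python
import re

# Illustrative word list; a real one would be tuned per team.
VAGUE = ("best-in-class", "world-class", "synergy", "seamless", "significantly")
# Heuristic for fake precision: a percentage with decimals, e.g. "37.4%".
FAKE_PRECISION = re.compile(r"\b\d+\.\d+%")
# Hypothetical convention: the artifact marks its single request with a line starting "Ask:".
ASK = re.compile(r"^Ask:", re.MULTILINE)


def lint_artifact(text: str) -> list[str]:
    """Return lint problems; an empty list means this crude pass found nothing."""
    problems = []
    lowered = text.lower()
    for word in VAGUE:
        if word in lowered:
            problems.append(f"vague claim: {word!r}")
    for match in FAKE_PRECISION.finditer(text):
        problems.append(f"possible fake precision: {match.group()!r}")
    asks = len(ASK.findall(text))
    if asks != 1:
        problems.append(f"expected exactly one ask, found {asks}")
    return problems
```

Zero warnings here means only that the cheap checks passed; it says nothing about whether the argument holds.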
why it's a pipeline, not one big prompt
There's a subtle lesson buried in here. You can't cram all of this into a single skill. Each stage has to run with fresh context and hand the next stage a file on disk.
It's the same reason a person doesn't write, review, and ship in one sitting. A reviewer who just watched you write the thing can't see it clearly — they're carrying the same assumptions you are. Fresh eyes catch what tired eyes miss. So the business skills do exactly what the engineering skills do: business-context writes files, business-spec reads them cold and writes a spec, business-artifact-reviewer reads the finished artifact cold and attacks it. The handoff through files is the feature, not an implementation detail.
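A minimal sketch of the cold handoff, assuming a hypothetical call_model function in place of a real agent client. run_stage builds each prompt only from its instructions and the files it is handed, then writes its result back to disk. The reviewer never sees the spec writer's instructions, only the spec.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical stand-in for whatever model or agent client you actually use.
Model = Callable[[str], str]


def run_stage(name: str, instructions: str, inputs: list[Path], output: Path,
              call_model: Model) -> Path:
    """Run one stage cold: its only context is its instructions plus the files it is handed."""
    parts = [f"# Stage: {name}", instructions]
    for path in inputs:
        parts.append(f"## {path.name}\n{path.read_text(encoding='utf-8')}")
    result = call_model("\n\n".join(parts))  # a fresh call; no chat history carried over
    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_text(result, encoding="utf-8")
    return output


# Usage with a fake model that records every prompt it receives.
base = Path(tempfile.mkdtemp())
(base / "facts.md").write_text("- Customer is on the annual plan\n", encoding="utf-8")
prompts: list[str] = []


def fake_model(prompt: str) -> str:
    prompts.append(prompt)
    return f"output #{len(prompts)}"


spec = run_stage("business-spec", "Write a spec.", [base / "facts.md"],
                 base / "spec.md", fake_model)
review = run_stage("business-spec-reviewer", "Attack this spec.", [spec],
                   base / "review.md", fake_model)
```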
One practical corollary: each task spins up its own .business/ context base, so the workflow wants one isolated worktree per task — the same way we give every engineering task its own git worktree, a throwaway, isolated copy of the project. Clean context in, no bleed between tasks, and a dozen of them can run in parallel. That isolation is most of why we run all of this in Pane: a fresh worktree, and its own context base, for every task.
The flip side matters just as much. A task's .business/ folder is disposable — it dies with the worktree. So anything worth keeping graduates into a docs/ folder at the repo root: the durable, first-principles sources of truth a business actually runs on — positioning, pricing logic, who the customer is, hard-won facts — kept current so every new task starts from them instead of rediscovering them. It's the same split engineering already lives with: throwaway per-task scratch versus a permanent, maintained manual. docs/ is the business CLAUDE.md.
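One way the graduation step could look, as a sketch: a hypothetical "[durable]" marker flags lines worth keeping, and graduate copies them from a task's .business/ file into a docs/ file, skipping anything already there so re-runs are safe. The marker and the directory layout are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical convention: a stage tags a line worth keeping with this marker.
MARKER = "[durable]"


def graduate(task_file: Path, doc_file: Path) -> list[str]:
    """Copy marked lines from a disposable .business/ file into a permanent docs/ file."""
    existing = doc_file.read_text(encoding="utf-8").splitlines() if doc_file.exists() else []
    promoted: list[str] = []
    for line in task_file.read_text(encoding="utf-8").splitlines():
        if MARKER in line:
            clean = line.replace(MARKER, "").strip()
            # Skip duplicates so graduating the same task twice is harmless.
            if clean and clean not in existing and clean not in promoted:
                promoted.append(clean)
    if promoted:
        doc_file.parent.mkdir(parents=True, exist_ok=True)
        doc_file.write_text("\n".join(existing + promoted) + "\n", encoding="utf-8")
    return promoted


root = Path(tempfile.mkdtemp())
task = root / "task-worktree" / ".business" / "facts.md"
task.parent.mkdir(parents=True)
task.write_text("- Customer is on the annual plan [durable]\n- Call moved to Tuesday\n",
                encoding="utf-8")
doc = root / "repo" / "docs" / "customer.md"
first = graduate(task, doc)
second = graduate(task, doc)  # nothing new to promote on a re-run
```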
where the human actually shows up
Nine named stages can make this look like nine things you sit and run. It isn't — and missing that is missing the whole point. The shape is the same as engineering: human attention is front-loaded, and the rest is meant to run on its own.
You do two things by hand. You have the conversation that gets captured into a ticket, and you run the discussion. That's where your judgment is worth the most — deciding what the deliverable is really for and who it's aimed at. None of the support stages are things you chain by hand. Before the discussion even talks to you, it pulls the context base together — the internal facts plus the research-adversary pass — so you're reacting to something real, not an empty repo. After it, the spec and the artifact run in succession and pull in their own reviewers. The main stages invoke the support stages themselves, the same way our /plan spins up its own reviewer without being asked.
Then you come back once, at the end, and only when it earns it. The review itself decides whether the artifact is high-stakes or under-specified and routes it to a human gate if it is. A throwaway internal note ships itself; a six-figure proposal waits for you. That is the line between a checklist and a factory — the factory decides when it needs you.
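The routing rule itself is small once the reviewer has done the judging. In this sketch, ReviewResult and route are hypothetical names, and the stakes and underspecified fields are assumed to be set by the review model. The code only decides among sending the artifact back, stopping at the human gate, or letting it ship.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class ReviewResult:
    stakes: Literal["low", "high"]   # assumed to be judged by the review model
    underspecified: bool             # likewise a model judgment
    open_issues: list[str] = field(default_factory=list)


def route(result: ReviewResult) -> str:
    """Decide what happens to an artifact after review."""
    if result.open_issues:
        return "revise"       # back through the review loop until it clears
    if result.stakes == "high" or result.underspecified:
        return "human_gate"   # wait for a person before anything goes out
    return "ship"             # e.g. a throwaway internal note
```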
what's actually different about business
I don't want to oversell the symmetry. The real gap is determinism. Software has a compiler and a test suite telling it the moment it's wrong. Business has neither — taste, relationships, and reading-the-room don't live in any file — so we manufacture the closest checks we can.
The closest thing to a typecheck is a claim-evidence ledger. Every meaningful claim in the artifact gets tagged: supported, needs a source, or overreaching. It's not a compiler, but it's the same spirit: make the artifact prove it isn't lying. And the final adversarial review, the one that asks "what are we only approving because we worked hard on it?", is the PR review. The anti-sycophancy gate.
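A claim-evidence ledger can be as plain as a list of tagged claims. In this sketch, Claim, Status, and ledger_gate are hypothetical names, and the tagging is assumed to come from a model. The gate only enforces what the tags say: anything not supported fails, and so does a claim marked supported that cites nothing.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUPPORTED = "supported"
    NEEDS_SOURCE = "needs a source"
    OVERREACHING = "overreaching"


@dataclass
class Claim:
    text: str
    evidence: list[str]   # pointers into the context base, e.g. ".business/facts.md"
    status: Status        # assumed to be assigned by a model, not by this code


def ledger_gate(claims: list[Claim]) -> tuple[bool, list[str]]:
    """Pass only if every claim is supported and actually cites something."""
    problems = []
    for c in claims:
        if c.status is not Status.SUPPORTED:
            problems.append(f"{c.status.value}: {c.text!r}")
        elif not c.evidence:
            problems.append(f"marked supported but cites nothing: {c.text!r}")
    return (not problems, problems)
```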
what we're not pretending
The hard part isn't the pipeline — it's the context base. "Pull it from your apps" is one sentence hiding a lot of work: which sources are authoritative, what's stale, what's missing entirely. Everything downstream is capped by how good that folder is. Feed it garbage and you get a confident, well-formatted garbage deliverable.
And the checks are softer than they look. In software the compiler is ground truth: it's right, or you don't ship. Here the claim-evidence ledger and the adversarial reviews are themselves model judgments — they raise the floor, they don't guarantee correctness. The only real ground truth is the human at the gate, which is exactly why the gate isn't optional for anything that matters.
Last thing: this is a machine for deliverables that earn it — a proposal, a pricing model, an investor memo. Nobody should run nine stages and a research pass to answer a Slack message. The fast path exists for a reason.
the bigger idea
Strip it all down and every good agent workflow turns out to need the same four things: a context base, a task, a staged process, and review gates. Software got those first because GitHub handed them to us. Business work has to assemble them by hand. But it's the same machine.
Software agents need a repo. Business agents need one too — we just have to build it before we can use it.
The business skills are open source, the same as our engineering skills: github.com/dcouple/skills