ADR-0155shipped

Three-Stage Story Pipeline: Implement → Prove → Judge

2026-02-27T00:00:00.000Z

The existing agent-loop infrastructure (packages/system-bus/src/inngest/functions/agent-loop/) accumulated jank: 500+ line files, Docker sandbox management, Claude auth wrangling, retry ladders, shared state between stages, tool ranking systems. Stories frequently skip or fail due to infrastructure complexity rather than code problems.

Joel wants a clean pipeline: three independent codex calls per story, each starting fresh with no shared state except the git repo.

Decision

Replace the multi-file agent-loop with a single Inngest function agent/story.pipeline that runs three stages per story:

Stage 1: Implement

Fresh codex exec receives:

Story description and acceptance criteria from PRD
Repo context (path, test/typecheck commands)
Any judgment from a previous failed attempt

Codex implements the feature and commits.

Stage 2: Prove

Fresh codex exec receives:

The story’s acceptance criteria
Instruction to verify the implementation: run tests, check types, lint, manually inspect behavior
Permission to fix obvious issues and commit fixes
Must produce a proof-of-work summary

Stage 3: Judge

Fresh codex exec receives:

The story’s acceptance criteria
The diff since before Stage 1
The proof-of-work from Stage 2
Instruction to pass or fail with specific reasoning

Pass → story marked complete, next story starts. Fail → judgment written, story sent back to Stage 1 with the judgment as context. Max 3 attempts before marking as blocked.

Architecture

agent/story.start event
  → agent/story.pipeline function
    → step.run("implement") — codex exec
    → step.run("prove")     — codex exec  
    → step.run("judge")     — codex exec
    → if pass: step.sendEvent("agent/story.start", next story)
    → if fail: step.sendEvent("agent/story.start", same story + judgment)

Single file. No Docker sandbox. No retry ladders. No tool rankings. No Claude auth. Just codex exec with good prompts.

PRD Format

Same JSON format as existing PRDs. Stories have id, title, description, acceptance_criteria, priority, depends_on. The pipeline reads the PRD, picks the next uncompleted story by priority respecting dependencies, and runs it.

State

PRD stored in Redis (key: story-pipeline:{prdId}) with story statuses
Each story tracks: status (pending/implementing/proving/judging/done/blocked), attempts, lastJudgment
No filesystem state files, no progress.txt, no .out files

Consequences

Good:

Simple — one file, three codex calls, clear prompts
Observable — each step is an Inngest step with full trace
Restartable — Inngest handles retries per step
No infrastructure baggage from old loop
Fresh codex per stage means no accumulated context/confusion

Bad:

No Docker sandbox isolation (codex runs in workspace)
No retry ladder across tools (codex only, no Claude/pi fallback)
Three codex calls per story = 3x the API cost vs single-pass

Acceptable because:

Codex sandbox mode (workspace-write) provides sufficient isolation
If codex can’t do it, the story is probably underspecified — fix the PRD
Cost is negligible vs quality improvement from independent verification

Boundary Update (2026-03-06)

This ADR remains shipped for the three-stage story pattern itself.

What changed is the runtime assumption underneath it:

shared workspace-write is still acceptable for simpler single-lane story work in a controlled checkout
it is not the governing isolation model for durable multi-agent Restate-driven story execution
for that class of work, ADR-0205 and ADR-0206 now own the runtime direction
ADR-0172 mail reservations remain useful coordination, but they are not a substitute for per-story filesystem isolation

Three-Stage Story Pipeline: Implement → Prove → Judge

Status

Context

Decision

Stage 1: Implement

Stage 2: Prove

Stage 3: Judge

Architecture

PRD Format

State

Consequences

Boundary Update (2026-03-06)