Executable Documents as Agent Proof-of-Work
verify command is a mechanical reviewer — re-runs all blocks and diffs output, same pattern as agent loop judge step
Executable Documents as Agent Proof-of-Work
Simon Willison’s Showboat flips the script on agent output. Instead of asking an agent to produce code and then separately verifying it worked, the document itself is the proof. A markdown file accumulates executable code blocks and their captured output as the agent works. Then showboat verify re-runs every block and diffs against the recorded output. If anything changed, it fails. The artifact and the test are the same thing.
The CLI is deliberately agent-shaped — init, note, exec, pop, verify, extract. An agent builds the document incrementally with exec (which prints stdout so it can react to errors), backs out mistakes with pop, and the whole thing is reproducible from scratch via extract. Written in Go, distributed via PyPI, which is a fun deployment choice. The exec command returns the child process exit code so agents can branch on failure without parsing output.
The verify-as-diff pattern is the real insight here. It’s not just “run the tests.” It’s “re-run the exact sequence of commands that built this artifact and confirm the world hasn’t changed.” That’s a stronger guarantee than a test suite because it captures the full build narrative, not just assertions about the end state. The --output flag on verify writes an updated copy without modifying the original — non-destructive verification with a paper trail.
This maps cleanly onto the agent loop architecture. The reviewer step already asks “does this work?” — but Showboat gives you a mechanical answer instead of an LLM judgment call. An agent that builds a Showboat document as it implements a story produces its own verifiable receipt. The judge step could just run showboat verify and get a binary pass/fail with diffs.
Key Ideas
- Document-as-test: the artifact and its verification are the same file — no separate test suite to maintain
- Incremental building with
pop: agents can back out failed commands without corrupting the document, which is undo for append-only logs verifyas mechanical reviewer: re-execute all blocks, diff output, exit 0 or 1 — no LLM needed for the “did it work?” questionextractfor reproducibility: emit the exact CLI commands to rebuild a document from scratch, which means documents are both human-readable and machine-replayable- Agent-first CLI design: stdout passthrough on
exec, exit code forwarding, stdin piping — every command is designed for non-human callers - Proof-of-work pattern: the verified document proves the agent actually ran the commands and got the outputs, not that it hallucinated a plausible-looking transcript
Links
- Showboat on GitHub
- Example demo document (built by Claude Code)
- Claude Code transcript showing the agent building the demo
- Simon Willison’s blog — by Simon Willison