Supervised Soak as a Confidence Gate for New System Capabilities
Supervised soak maps directly onto the agent loop's reviewer/judge gate: human-in-the-loop validation before autonomous operation is fully trusted.
This is an internal reference for ADR-0217, Story 5, the supervised soak phase. A supervised soak is a deployment validation pattern: you ship the capability into real conditions, let it run under observation, and accept it only once you’ve watched it behave correctly across enough real workloads. No synthetic tests. No staging environment. The real thing, with eyes on it.
The concept matters because confidence in automation is earned, not assumed. That is especially true in a system like joelclaw, where agent loops, Inngest durable functions, and OTEL telemetry are all load-bearing: a supervised soak is the step between “it works in tests” and “it runs unsupervised forever.” The system bus worker executes 110+ functions, so new capabilities need to prove themselves under real event load before they’re trusted.
The observer role in the joelclaw loop architecture already encodes this instinct: the judge step is a mechanical gate the loop must pass before it advances. A supervised soak extends that gate to deployment itself. You’re not just reviewing code; you’re reviewing the running system’s behavior over time. OTEL events flowing into Typesense, Langfuse traces capturing LLM calls, and `joelclaw otel list` are the instrumentation that makes a soak legible rather than a vibes check.
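To make “legible” concrete, here is a minimal sketch of a judge-style check that only counts a finished run as evidence if it can be matched to telemetry. The `RunRecord` and `OtelEvent` shapes, their field names, and the idea of joining them on a run ID are hypothetical assumptions for illustration; they are not the actual output format of `joelclaw runs` or `joelclaw otel list`.

```typescript
// Hypothetical record shapes; NOT the real joelclaw CLI output format.
interface RunRecord {
  runId: string;
  functionName: string;
  status: "completed" | "failed" | "running";
}

interface OtelEvent {
  runId: string; // assumption: each telemetry event carries the ID of the run it belongs to
  name: string;
}

interface LegibilityReport {
  observed: RunRecord[]; // finished runs with at least one matching telemetry event
  unobserved: RunRecord[]; // finished runs with no telemetry: they cannot count as evidence
}

// Split finished runs into those a reviewer can actually read and those it cannot.
function checkLegibility(runs: RunRecord[], events: OtelEvent[]): LegibilityReport {
  const tracedRunIds = new Set(events.map((e) => e.runId));
  const finished = runs.filter((r) => r.status !== "running");
  return {
    observed: finished.filter((r) => tracedRunIds.has(r.runId)),
    unobserved: finished.filter((r) => !tracedRunIds.has(r.runId)),
  };
}

// Usage with made-up sample data.
const runs: RunRecord[] = [
  { runId: "r1", functionName: "example-fn", status: "completed" },
  { runId: "r2", functionName: "example-fn", status: "failed" },
  { runId: "r3", functionName: "example-fn", status: "running" },
];
const events: OtelEvent[] = [{ runId: "r1", name: "example.started" }];
const report = checkLegibility(runs, events);
console.log(report.observed.map((r) => r.runId), report.unobserved.map((r) => r.runId));
```

Unobserved runs are reported separately rather than silently counted as passes; accepting a capability on evidence nobody could read is exactly the vibes check a soak is meant to replace.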
That Story 5 is specifically labeled “supervised soak” suggests ADR-0217 has an explicit acceptance phase built into its implementation plan. That’s the right shape: write the capability, deploy it, watch it run, then accept or reject it based on observable evidence. The bar is not “it merged” but “it ran correctly N times under real conditions.”
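The accept-or-reject rule can be written down as a small decision function. This is a sketch under assumptions: `requiredSuccesses` and `maxFailures` are hypothetical parameters (this note does not say what N ADR-0217 uses, so the value 10 below is purely illustrative), and a real soak would also weigh what the failures were, not just how many.

```typescript
type Verdict = "accept" | "reject" | "keep-soaking";

interface SoakCriteria {
  requiredSuccesses: number; // hypothetical "N": correct runs needed before acceptance
  maxFailures: number; // hypothetical tolerance; 0 means any failure rejects
}

// Decide the soak outcome from observed run outcomes (true = ran correctly).
function judgeSoak(outcomes: boolean[], criteria: SoakCriteria): Verdict {
  const successes = outcomes.filter((ok) => ok).length;
  const failures = outcomes.length - successes;
  // Failures are checked first, so volume of successes cannot outweigh them.
  if (failures > criteria.maxFailures) return "reject";
  if (successes >= criteria.requiredSuccesses) return "accept";
  return "keep-soaking"; // not enough evidence yet: keep watching real runs
}

// Usage with an illustrative threshold.
const criteria: SoakCriteria = { requiredSuccesses: 10, maxFailures: 0 };
console.log(judgeSoak(new Array<boolean>(10).fill(true), criteria));
console.log(judgeSoak([true, true, false], criteria));
console.log(judgeSoak([true, true, true], criteria));
```

The third verdict matters: “not yet” is distinct from “no,” which is what keeps a soak from being closed early just because nothing has gone wrong so far.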
Key Ideas
- Supervised soak = deploy + observe + accept — three distinct phases, not one. Shipping is not the finish line.
- The “supervised” qualifier means a human (or judge agent) is actively watching runs, not just waiting for alerts
- Pairs naturally with OTEL observability — the soak is only useful if you can read what’s happening
- Distinct from a canary deploy (traffic splitting): the soak runs at full traffic, with heightened attention and explicit acceptance criteria
- Maps to the agent loop judge step: mechanical gates before advancing, applied at the infrastructure layer
- ADR-0217 Story 5 is the soak phase of a multi-story implementation: the capability exists; now it needs to earn trust
- `joelclaw otel list --hours 24` and `joelclaw runs --count 20` are the primary instruments for conducting a soak in this system
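The three phases in the list above can be modeled as an explicit state machine, so that “shipped” and “accepted” are different states and acceptance requires a recorded sign-off from a reviewer (a human or a judge agent). This is a hypothetical sketch: the phase names, the `Reviewer` type, and the transition rules are illustrative and are not taken from ADR-0217.

```typescript
// Hypothetical phases and reviewer types; illustrative only.
type Phase = "deployed" | "observing" | "accepted" | "rejected";
type Reviewer = { kind: "human" | "judge-agent"; id: string };

interface SoakState {
  phase: Phase;
  signedOffBy?: Reviewer; // set only when a reviewer accepts or rejects
}

type SoakEvent =
  | { type: "start-observing" }
  | { type: "accept"; reviewer: Reviewer }
  | { type: "reject"; reviewer: Reviewer };

// Pure transition function: illegal transitions throw instead of silently passing.
function transition(state: SoakState, event: SoakEvent): SoakState {
  switch (state.phase) {
    case "deployed":
      if (event.type === "start-observing") return { phase: "observing" };
      break;
    case "observing":
      // Only an explicit reviewer decision leaves the observing phase.
      if (event.type === "accept") return { phase: "accepted", signedOffBy: event.reviewer };
      if (event.type === "reject") return { phase: "rejected", signedOffBy: event.reviewer };
      break;
    case "accepted":
    case "rejected":
      break; // terminal states: no further transitions
  }
  throw new Error(`illegal transition: ${event.type} while ${state.phase}`);
}

// Usage: deploy, observe, then accept with a judge-agent sign-off.
let state: SoakState = { phase: "deployed" };
state = transition(state, { type: "start-observing" });
state = transition(state, { type: "accept", reviewer: { kind: "judge-agent", id: "judge-1" } });
console.log(state.phase, state.signedOffBy?.kind);
```

Jumping straight from `deployed` to `accepted` throws, which is one way to encode “shipping is not the finish line” so that it cannot be skipped by accident.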
Links
- ADR-0217 — source ADR for this story
- ADR-0015 — agent loop architecture with reviewer/judge gates
- ADR-0157 — agent lifecycle CLI (proposed)
- joelclaw OTEL observability — instrumentation that makes soaks legible
- Inngest — durable function runtime where soak behavior is observable via run traces
- Langfuse — LLM call tracing for observing model behavior during a soak