Supervised Soak as a Confidence Gate for New System Capabilities

2026-03-07·https://example.com/adr-0217-story5-discovery-1772926820195

article · infrastructure · agent-loops · observability · deployment · pattern · validation

Supervised soak maps directly to the agent loop reviewer/judge gate: human-in-the-loop validation before autonomous operation is fully trusted.

This is an internal reference from ADR-0217, Story 5 — the supervised soak phase. A supervised soak is a deployment validation pattern: you ship the capability into real conditions, let it run under observation, and only accept it once you’ve watched it behave correctly across enough real workloads. No synthetic tests. No staging environment. The real thing, with eyes on it.

The concept matters because confidence in automation is earned, not assumed. That holds especially in a system like joelclaw, where agent loops, Inngest durable functions, and OTEL telemetry are all load-bearing: a supervised soak is the step between “it works in tests” and “it runs unsupervised forever.” The system bus worker executes 110+ functions, so new capabilities need to prove themselves under real event load before they’re trusted.

The observer role in the joelclaw loop architecture already encodes this instinct — the judge step is a mechanical gate before the loop advances. A supervised soak extends that gate to deployment itself. You’re not just reviewing code; you’re reviewing the running system’s behavior over time. OTEL events flowing into Typesense, Langfuse traces capturing LLM calls, joelclaw otel list — all of this is the instrumentation that makes a soak legible rather than a vibes check.
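As a rough illustration of what "legible" could mean in practice, the sketch below groups telemetry events by function over a soak window and computes an error rate for each. The `SoakEvent` shape and the `summarize` helper are hypothetical names introduced here; they are not the actual joelclaw OTEL schema or the output format of joelclaw otel list.

```typescript
// Hypothetical event shape (assumption); the real joelclaw OTEL schema may differ.
interface SoakEvent {
  fn: string;                       // function that emitted the event
  level: "info" | "warn" | "error"; // severity
  timestamp: number;                // epoch milliseconds
}

interface FnSummary {
  total: number;     // events seen inside the window
  errors: number;    // events with level "error"
  errorRate: number; // errors / total
}

// Group events inside the soak window by function and compute an error rate for each.
function summarize(events: SoakEvent[], sinceMs: number): Map<string, FnSummary> {
  const out = new Map<string, FnSummary>();
  for (const e of events) {
    if (e.timestamp < sinceMs) continue; // outside the soak window
    const s = out.get(e.fn) ?? { total: 0, errors: 0, errorRate: 0 };
    s.total += 1;
    if (e.level === "error") s.errors += 1;
    s.errorRate = s.errors / s.total;
    out.set(e.fn, s);
  }
  return out;
}

// Usage with made-up sample data.
const sample: SoakEvent[] = [
  { fn: "new-capability", level: "info", timestamp: 1000 },
  { fn: "new-capability", level: "error", timestamp: 2000 },
  { fn: "existing-fn", level: "info", timestamp: 2500 },
  { fn: "new-capability", level: "info", timestamp: 500 }, // before the window, ignored
];
const summary = summarize(sample, 1000);
console.log(summary.get("new-capability"));
```

A per-function view like this gives the observer something concrete to check against, instead of scanning a raw event stream by eye.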

Story 5 being specifically labeled “supervised soak” suggests ADR-0217 has an explicit acceptance phase built into its implementation plan. That’s the right shape: write the capability, deploy it, watch it run, accept or reject based on observable evidence. Not just “it merged” — “it ran correctly N times under real conditions.”
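The "ran correctly N times" criterion can be sketched as a small decision function. This is a minimal illustration under stated assumptions: the `RunRecord` shape, the `soakVerdict` name, and the reject-on-any-failure policy are all hypothetical choices made here, not something ADR-0217 specifies.

```typescript
// Hypothetical run record (assumption); field names are illustrative only.
interface RunRecord {
  id: string;
  ok: boolean; // true if the reviewer (human or judge agent) judged the run correct
}

type Verdict = "accept" | "reject" | "keep-soaking";

// Assumed policy: reject on any failed run, accept once `requiredRuns`
// reviewed runs have all passed, otherwise keep observing.
function soakVerdict(runs: RunRecord[], requiredRuns: number): Verdict {
  if (runs.some((r) => !r.ok)) return "reject";
  return runs.length >= requiredRuns ? "accept" : "keep-soaking";
}

// Usage with made-up runs and N = 3.
const reviewed: RunRecord[] = [
  { id: "run-1", ok: true },
  { id: "run-2", ok: true },
];
console.log(soakVerdict(reviewed, 3));
```

A real soak might tolerate some failures or require a minimum time span as well as a run count; the strict policy here just keeps the sketch short.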

Key Ideas

Supervised soak = deploy + observe + accept — three distinct phases, not one. Shipping is not the finish line.
The “supervised” qualifier means a human (or judge agent) is actively watching runs, not just waiting for alerts
Pairs naturally with OTEL observability — the soak is only useful if you can read what’s happening
Distinct from a canary deploy (traffic splitting) — this is full-traffic with heightened attention and explicit acceptance criteria
Maps to the agent loop judge step: mechanical gates before advancing, applied at the infrastructure layer
ADR-0217 Story 5 is the soak phase of a multi-story implementation — the capability exists, now it needs to earn trust
joelclaw otel list --hours 24 and joelclaw runs --count 20 are the primary instruments for conducting a soak in this system
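The deploy, observe, accept sequence from the list above can be modeled as a tiny state machine in which a capability cannot move from deployed to accepted without passing through observation. The phase and action names below are hypothetical, chosen for this sketch, and do not come from ADR-0217.

```typescript
// Hypothetical phase and action names (assumption) for the three-phase soak.
type SoakPhase = "deployed" | "observing" | "accepted" | "rejected";
type SoakAction = "start-observation" | "accept" | "reject";

// Return the next phase, or throw if the transition is not allowed.
// "accepted" and "rejected" are terminal: every action from them throws.
function nextPhase(phase: SoakPhase, action: SoakAction): SoakPhase {
  switch (phase) {
    case "deployed":
      if (action === "start-observation") return "observing";
      break;
    case "observing":
      if (action === "accept") return "accepted";
      if (action === "reject") return "rejected";
      break;
  }
  throw new Error(`invalid transition: ${action} from ${phase}`);
}

// Usage: the only path to "accepted" goes through "observing".
const afterDeploy = nextPhase("deployed", "start-observation");
const afterReview = nextPhase(afterDeploy, "accept");
console.log(afterDeploy, afterReview);
```

Making the skipped transition an error is one way to encode "shipping is not the finish line" mechanically, in the same spirit as the judge step.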
