ADR-0189 (accepted)

Gateway Guardrails

2026-03-04 Reality Check

Status remains accepted until the remaining guardrails are actually shipped; not all guardrails are implemented yet.

Prompt/docs surface shipped (commit 6910d1a, 2026-03-01)

  • Guardrails 1–3 were added to the gateway role/docs as explicit operator discipline
  • Guardrail 9 (gateway role/docs availability-first)
  • Guardrails 10–11 (system-awareness/skills discovery + operator-vs-system routing policy)
  • Guardrail 12 (dedicated SYSTEM role profile + role alias routing)

Runtime enforcement shipped (2026-03-06)

  • Guardrail 1 now auto-schedules deploy verification after successful git push when HEAD touched apps/web/ or root config
  • Guardrail 3 now has a daemon-enforced tool-budget checkpoint tripwire with Telegram status updates
  • Guardrail hits now emit OTEL under daemon.guardrails
  • Added focused test coverage in packages/gateway/src/guardrails.test.ts

Not yet shipped

  • Guardrails 4–8 remain proposed; see the Decision section for the full list
  • Runtime enforcement is still partial (no scope cap, commit velocity breaker, or tsc gate yet)

Context

On 2026-03-01, a gateway session ran for 476 minutes, made 54 git pushes (52 successful), committed 57 times across 6 unrelated workstreams, and never once verified deploy status. Commit fafdfda broke the Vercel build with a ReferenceError in copy-as-prompt.tsx. The gateway kept shipping features on top of the broken build for 45 minutes. Zero Next.js skills were loaded despite heavy apps/web/ work.

The session loaded 23 unique skills (44 reads) but none of next-best-practices, next-cache-components, nextjs-static-shells, or vercel-debug.

Joel was not consulted or updated at any point during the 8-hour session.

Decision

Add the following guardrails to the gateway runtime prompt surface (roles/gateway.md, docs/gateway.md, and gateway AGENTS where applicable):

1. Post-Push Deploy Verification (SHIPPED)

After every git push touching apps/web/ or root config (turbo.json, package.json, pnpm-lock.yaml):

  • Wait 60–90s
  • Run vercel ls --yes 2>&1 | head -10
  • ● Error → STOP all work, fix the build
  • ● Ready → continue
  • Never stack commits on a broken deploy
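The rule above can be sketched as two pure helpers: one decides whether a push needs verification, the other reads captured `vercel ls` output. The function names are hypothetical and are not the daemon's actual API. The sketch also assumes that `vercel ls` lists the newest deployment first and that each deployment row carries a "● Error" or "● Ready" marker, as the bullets describe.

```typescript
// Hypothetical helpers for Guardrail 1; names are illustrative only.
const WEB_PREFIX = "apps/web/";
const ROOT_CONFIG_FILES = ["turbo.json", "package.json", "pnpm-lock.yaml"];

/** True when the pushed change set touches apps/web/ or a root config file. */
export function needsDeployVerification(changedPaths: string[]): boolean {
  return changedPaths.some(
    (p) => p.startsWith(WEB_PREFIX) || ROOT_CONFIG_FILES.indexOf(p) !== -1,
  );
}

export type DeployVerdict = "stop" | "continue" | "unknown";

/**
 * Classify the first deployment row found in captured `vercel ls` output.
 * Assumption: the newest deployment is listed first.
 */
export function classifyLatestDeploy(vercelLsOutput: string): DeployVerdict {
  const firstRow = vercelLsOutput
    .split("\n")
    .find((line) => line.includes("●"));
  if (firstRow === undefined) return "unknown";
  if (firstRow.includes("● Error")) return "stop"; // fix the build first
  if (firstRow.includes("● Ready")) return "continue";
  return "unknown"; // e.g. still building: wait and check again
}
```

Returning "unknown" for any other state keeps the caller from treating an in-progress build as healthy, which would defeat the "never stack commits" rule.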

2. Mandatory Skill Loading by Domain (SHIPPED)

Path                   Required Skills
apps/web/              next-best-practices, next-cache-components, nextjs-static-shells, vercel-debug
packages/system-bus/   inngest-durable-functions, inngest-steps, inngest-events, inngest-flow-control, system-bus
packages/gateway/      gateway, telegram
k8s/                   k8s
Content/articles       joel-writing-style, joelclaw-web
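As a rough illustration, the mapping above could be held as data and checked against the paths a task is about to touch. The prefixes and skill names come from the table; the lookup functions are hypothetical. The Content/articles row is left out of this sketch on the assumption that it names a task category rather than a repository path.

```typescript
// Skill names and prefixes are from the table above; the functions are a sketch.
const REQUIRED_SKILLS: ReadonlyArray<{ prefix: string; skills: string[] }> = [
  {
    prefix: "apps/web/",
    skills: ["next-best-practices", "next-cache-components", "nextjs-static-shells", "vercel-debug"],
  },
  {
    prefix: "packages/system-bus/",
    skills: ["inngest-durable-functions", "inngest-steps", "inngest-events", "inngest-flow-control", "system-bus"],
  },
  { prefix: "packages/gateway/", skills: ["gateway", "telegram"] },
  { prefix: "k8s/", skills: ["k8s"] },
];

/** Skills required for a set of touched paths, de-duplicated, in table order. */
export function requiredSkillsFor(paths: string[]): string[] {
  const out: string[] = [];
  for (const rule of REQUIRED_SKILLS) {
    if (!paths.some((p) => p.startsWith(rule.prefix))) continue;
    for (const skill of rule.skills) {
      if (out.indexOf(skill) === -1) out.push(skill);
    }
  }
  return out;
}

/** Required skills that have not been loaded yet in this session. */
export function missingSkills(paths: string[], loaded: string[]): string[] {
  return requiredSkillsFor(paths).filter((s) => loaded.indexOf(s) === -1);
}
```

Applied to the incident in Context, any apps/web/ path checked against the skills that session had loaded would have reported all four Next.js/Vercel skills as missing.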

3. Steering Check-In Cadence (SHIPPED)

  • Send status at session start
  • Check in every 60–120 seconds during active work
  • Hard cap: max 2 autonomous actions before a steering check-in
  • Always check in on state changes: delegated, blocked, recovered, done
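One way to read these rules is as a small tripwire that reports when a check-in is due. The class below is a hypothetical sketch, not the daemon's tool-budget checkpoint implementation; it uses the upper bound of the 60–120 second window as the silence limit and takes timestamps as arguments so it stays deterministic.

```typescript
// Hypothetical sketch of the Guardrail 3 cadence rules.
export type SteeringState = "delegated" | "blocked" | "recovered" | "done";

export class CheckInTracker {
  private actionsSinceCheckIn = 0;
  private lastCheckInMs: number;
  private readonly maxActions: number;
  private readonly maxSilenceMs: number;

  constructor(nowMs: number, maxActions: number = 2, maxSilenceMs: number = 120 * 1000) {
    this.lastCheckInMs = nowMs; // the session-start status counts as a check-in
    this.maxActions = maxActions;
    this.maxSilenceMs = maxSilenceMs;
  }

  /** Record one autonomous action (tool call, commit, and so on). */
  recordAction(): void {
    this.actionsSinceCheckIn += 1;
  }

  /** Record that a status update was sent; resets both budgets. */
  recordCheckIn(nowMs: number): void {
    this.actionsSinceCheckIn = 0;
    this.lastCheckInMs = nowMs;
  }

  /** True on any state change, when the action cap is used up, or after too much silence. */
  checkInDue(nowMs: number, stateChange?: SteeringState): boolean {
    return (
      stateChange !== undefined ||
      this.actionsSinceCheckIn >= this.maxActions ||
      nowMs - this.lastCheckInMs >= this.maxSilenceMs
    );
  }
}
```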

4. Session Scope Cap (PROPOSED)

Max 2 unrelated workstreams per session. If a 3rd appears, checkpoint current state and either start a new session or get Joel’s approval to continue.
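A minimal sketch of the cap, assuming workstreams are identified by short string labels (how the gateway would actually detect that work is "unrelated" is not specified here). The class and method names are hypothetical.

```typescript
// Hypothetical sketch of the Guardrail 4 scope cap.
export class ScopeCap {
  private readonly active: string[] = [];
  private readonly max: number;

  constructor(max: number = 2) {
    this.max = max;
  }

  /**
   * Returns "ok" when the workstream is already active or fits under the cap.
   * Returns "checkpoint" when it would exceed the cap without operator approval;
   * in that case the workstream is not added.
   */
  enter(workstream: string, operatorApproved: boolean = false): "ok" | "checkpoint" {
    if (this.active.indexOf(workstream) !== -1) return "ok";
    if (this.active.length >= this.max && !operatorApproved) return "checkpoint";
    this.active.push(workstream);
    return "ok";
  }
}
```

On "checkpoint" the caller would save current state and either start a new session or ask Joel for approval, then call `enter` again with `operatorApproved` set.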

5. Commit Velocity Circuit Breaker (PROPOSED)

If >10 commits in a session without a human message, STOP and send Joel a Telegram status update. Do not resume code work until acknowledged or 5 minutes pass.
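The breaker can be sketched as a counter with a cooldown. This is a hypothetical illustration: it treats any human message as the acknowledgement, and it resets the counter when the breaker trips so that work resumed after the 5-minute timeout can trip it again. The proposal itself does not say what should happen after the timeout, so that reset is an assumption.

```typescript
// Hypothetical sketch of the Guardrail 5 circuit breaker.
export class CommitVelocityBreaker {
  private commitsSinceHuman = 0;
  private trippedAtMs: number | null = null;
  private readonly limit: number;
  private readonly cooldownMs: number;

  constructor(limit: number = 10, cooldownMs: number = 5 * 60 * 1000) {
    this.limit = limit;
    this.cooldownMs = cooldownMs;
  }

  /** A human message acknowledges the breaker and resets the count. */
  recordHumanMessage(): void {
    this.commitsSinceHuman = 0;
    this.trippedAtMs = null;
  }

  /** Returns true when this commit trips the breaker (send the status update now). */
  recordCommit(nowMs: number): boolean {
    this.commitsSinceHuman += 1;
    if (this.commitsSinceHuman > this.limit) {
      this.commitsSinceHuman = 0;
      this.trippedAtMs = nowMs;
      return true;
    }
    return false;
  }

  /** Code work may resume once acknowledged or after the cooldown has passed. */
  mayResume(nowMs: number): boolean {
    return this.trippedAtMs === null || nowMs - this.trippedAtMs >= this.cooldownMs;
  }
}
```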

6. TSC Gate Before Commit (PROPOSED)

After codex finishes any apps/web/ change, run bunx tsc --noEmit before committing. This catches stale references, missing imports, and type errors.

7. Deploy Failure Dedup (PROPOSED)

When ❌ Deploy Failed arrives as an automated event, check vercel ls for a newer Ready deploy before alerting Joel or taking action. Stale failure alerts are noise.
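The staleness check reduces to one comparison once deployments are available as records. The record shape below is hypothetical; parsing `vercel ls` output into it is out of scope for this sketch.

```typescript
// Hypothetical record shape for Guardrail 7; not Vercel's actual output format.
export interface DeployRecord {
  state: "Ready" | "Error" | "Building";
  createdAtMs: number;
}

/** A failure alert is stale when some Ready deploy was created after the failed one. */
export function isStaleFailure(failedAtMs: number, deploys: DeployRecord[]): boolean {
  return deploys.some((d) => d.state === "Ready" && d.createdAtMs > failedAtMs);
}
```

A stale failure would be logged without alerting Joel; a failure with no newer Ready deploy still needs action.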

8. Session Start Scope Declaration (PROPOSED)

The first message to Joel in each session should declare the planned scope. Example: “Starting up. Pending: Typesense recovery, content migration. Will check in after each.” This makes drift visible.

9. Availability-First Delegation Mode (SHIPPED)

Gateway behavior is orchestration-first:

  • do not go heads-down on heavy implementation/debug/research inside the gateway session
  • acknowledge quickly, delegate immediately, then track and report status
  • stay interruptible and high-availability during active work
  • keep frequent check-ins while delegated tasks run
  • prompt/suggest required skill loading before delegated domain work

10. System Awareness + Skill Discovery Contract (SHIPPED)

Gateway must maintain live awareness of health components and debugging prerequisites:

  • triage incidents by layer first (process, Redis, Inngest, worker, channel path, telemetry)
  • suggest required skills before debug/implementation actions
  • if coverage is unclear or missing, use find-skills flow and explicitly recommend missing canonical skills
  • recurring missing-skill patterns should trigger recommendation to create canonical skills

11. Operator vs System Message Routing (SHIPPED)

Gateway must classify inbound messages into:

  • user/operator messages (Joel direct messages)
  • system/automation messages (## 🔔, ## 📋, ## ❌, ## ⚠️, ## VIP)

Routing policy:

  • do not forward all system chatter to operator
  • escalate only actionable/high-signal system states (blocked flow, repeated unresolved failure, safety/security risk, explicit decision point)
  • low-signal/transient system traffic is triaged/logged/monitored without operator interruption
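The classification and escalation rules above might look like the following. The header prefixes are the ones listed for system/automation messages; the signal labels and function names are hypothetical, and how a message gets assigned a signal is left open. The sketch matches prefixes exactly, so a warning emoji sent without its variation selector would need its own entry.

```typescript
// Prefixes are from the list above; everything else is an illustrative sketch.
const SYSTEM_PREFIXES = ["## 🔔", "## 📋", "## ❌", "## ⚠️", "## VIP"];

export type MessageKind = "operator" | "system";

/** Messages that open with a known automation header are system traffic. */
export function classifyMessage(text: string): MessageKind {
  const head = text.trim();
  return SYSTEM_PREFIXES.some((p) => head.startsWith(p)) ? "system" : "operator";
}

// Hypothetical labels for the escalation criteria in the routing policy.
export type SystemSignal =
  | "blocked-flow"
  | "repeated-unresolved-failure"
  | "safety-security-risk"
  | "decision-point"
  | "transient";

/** Only actionable, high-signal system states interrupt the operator. */
export function shouldEscalate(signal: SystemSignal): boolean {
  return signal !== "transient";
}
```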

12. SYSTEM Role Profile for System pi Sessions (SHIPPED)

Add a dedicated roles/system.md profile for “system pi” sessions:

  • similar operational discipline to gateway (availability-first, high-signal check-ins, anti-frenzy)
  • geared toward deeper system work (cross-component diagnostics, reliability fixes, architecture maintenance)
  • requires explicit system-health awareness and skills discovery before debug/implementation work
  • role routing supports JOELCLAW_ROLE=system via identity-inject role alias resolution
  • identity-inject must reload role files on each session_start so role changes apply immediately to new sessions
  • identity-inject startup logs must print resolved rolePath for unambiguous role verification

Consequences

  • Gateway sessions have hard limits on autonomous scope
  • Deploy breakage is caught within 90 seconds instead of 45 minutes
  • Joel has visibility into what the gateway is doing without asking
  • Some overhead per push (~90s for deploy verification) — acceptable given the alternative
  • Proactively loaded skills reduce the class of “obvious” build errors

Status

  • Guardrails 1–3 shipped to the prompt/docs surface in commit 6910d1a on 2026-03-01
  • Guardrail 1 now also has daemon-level deploy verification scheduling (2026-03-06)
  • Guardrail 3 now also has daemon-level tool-budget checkpoint enforcement (2026-03-06)
  • Guardrail 9 shipped on 2026-03-01 (gateway role/docs availability-first update)
  • Guardrails 10–11 shipped on 2026-03-01 (system-awareness/skills discovery + operator-vs-system routing policy)
  • Guardrail 12 shipped on 2026-03-01 (dedicated SYSTEM role profile + role alias routing)
  • Guardrails 4–8 remain proposed / partially implemented pending follow-on runtime work