ADR-0221

Local Sandbox Isolation Primitives

Status

Accepted

Context and Problem Statement

joelclaw already proved the local host-worker sandbox runner is a viable phase-1 isolation surface, but the current substrate is still thinner than it should be for serious local parallel work.

What exists now is enough to run isolated story execution without dirtying the operator checkout. What does not exist yet is a strong, explicit local isolation contract covering the non-git parts that actually collide in practice:

  • Docker Compose project names
  • container and network names
  • ports
  • env files and runtime identity
  • devcontainer materialization strategy
  • active sandbox inventory and cleanup
  • the difference between “minimal” and “full” local execution

This gap matters because git worktrees only isolate files. They do not isolate runtime state. Without explicit local isolation primitives, multiple local sandboxes can still collide through Docker, ports, env drift, shared service names, or sloppy teardown.

BranchBox is useful prior art here. It is not the runtime substrate we want for joelclaw — it is a local feature-environment product — but it surfaced several ideas worth adopting for our local execution path.

This ADR captures those ideas as joelclaw-native primitives, without adopting BranchBox’s product model or replacing the broader cloud-native direction in ADR-0205.

Decision

Adopt a worktree-backed local sandbox isolation contract for @joelclaw/agent-execution and related runtime surfaces.

The local sandbox path will standardize around these primitives:

1. Worktree-backed sandbox materialization

Every local sandbox gets its own real git worktree, materialized at the requested baseSha.

This becomes the canonical local repo surface for code-changing runs.

Requirements:

  • one sandbox workspace per requestId
  • deterministic pathing under a dedicated sandbox root
  • baseSha pinned at materialization time
  • host operator checkout must remain untouched
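The requirements above can be sketched in a few lines. This is an illustrative sketch, not the shipped @joelclaw/agent-execution API: the function names are assumptions, and only the sandbox root (~/.joelclaw/sandboxes/, per the phase-2 notes below) comes from this ADR.

```typescript
import { join } from "node:path";

// Deterministic sandbox pathing under a dedicated root (names illustrative).
const SANDBOX_ROOT = join(process.env.HOME ?? "/tmp", ".joelclaw", "sandboxes");

export function sandboxPath(requestId: string): string {
  // one sandbox workspace per requestId, same path every time
  return join(SANDBOX_ROOT, requestId);
}

export function worktreeAddCommand(requestId: string, baseSha: string): string[] {
  // run from the canonical repo; --detach pins the worktree at baseSha
  // without creating a branch or touching the operator checkout
  return ["git", "worktree", "add", "--detach", sandboxPath(requestId), baseSha];
}
```

The key property is that the same requestId always resolves to the same path, so retries and cleanup can find the workspace without extra bookkeeping.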

2. Unique compose/runtime identity per sandbox

Every sandbox must get a unique runtime identity, not just a unique filesystem path.

Minimum contract:

  • unique COMPOSE_PROJECT_NAME
  • unique sandbox id / slug
  • sandbox-specific env file
  • sandbox-specific service naming for local containers
  • derived runtime URLs/ports where required
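One way to derive that identity, sketched under assumptions (helper names and the `jc-` prefix are illustrative, not the shipped contract). It preserves a request-derived hash in the slug, which is exactly the property the phase-3 path-collision fix below depends on:

```typescript
import { createHash } from "node:crypto";

export interface SandboxIdentity {
  sandboxId: string;
  slug: string;
  composeProjectName: string;
  envFile: string;
}

export function deriveIdentity(requestId: string): SandboxIdentity {
  // keep a request-derived hash so long shared requestId prefixes
  // still yield distinct identities after truncation
  const hash = createHash("sha256").update(requestId).digest("hex").slice(0, 8);
  const prefix = requestId.toLowerCase().replace(/[^a-z0-9]+/g, "-").slice(0, 20);
  const slug = `${prefix}-${hash}`;
  return {
    sandboxId: slug,
    slug,
    composeProjectName: `jc-${slug}`, // unique COMPOSE_PROJECT_NAME per sandbox
    envFile: ".sandbox.env",          // sandbox-local env file
  };
}
```

Deriving everything from one slug means the compose project, env file, and cleanup path can never disagree about which sandbox they belong to.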

This is the main correction to the false assumption that worktrees alone are sufficient isolation.

3. Copy-first devcontainer materialization

When a sandbox needs devcontainer configuration, materialize .devcontainer/ by copying it into the sandbox by default.

Rules:

  • default strategy: copy
  • optional override: symlink, only when explicitly requested
  • exclude runtime-generated env files, secret material, and other sandbox-specific artifacts from the copied set
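A minimal sketch of those rules, assuming an exclusion list and option shape that are illustrative rather than the shipped helper's exact contract:

```typescript
import { cpSync, symlinkSync } from "node:fs";
import { basename, join } from "node:path";

// Illustrative exclusion patterns: env files, secret material, sandbox artifacts.
const EXCLUDED = [/\.env$/, /secret/i, /^\.sandbox\./];

export function shouldMaterialize(fileName: string): boolean {
  return !EXCLUDED.some((re) => re.test(fileName));
}

export function materializeDevcontainer(
  repoRoot: string,
  sandboxRoot: string,
  opts: { strategy?: "copy" | "symlink" } = {},
): void {
  const src = join(repoRoot, ".devcontainer");
  const dest = join(sandboxRoot, ".devcontainer");
  if (opts.strategy === "symlink") {
    symlinkSync(src, dest); // shared indirection only when explicitly requested
    return;
  }
  // default: an independent copy per sandbox, minus excluded files
  cpSync(src, dest, { recursive: true, filter: (p) => shouldMaterialize(basename(p)) });
}
```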

The default must favor sandbox independence and debuggability over clever shared indirection.

4. Sandbox registry

Maintain a lightweight local registry of active and recent sandboxes.

Each entry must record enough metadata for observability and cleanup:

  • requestId
  • workflowId
  • storyId
  • baseSha
  • local path
  • mode (minimal or full)
  • runtime identity (composeProjectName, sandbox slug)
  • status / lifecycle timestamps
  • teardown state
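One possible shape for an entry covering those fields, plus a minimal upsert helper mirroring the registry read/write contract. Field names beyond the ADR's own list (createdAt, updatedAt) are illustrative assumptions:

```typescript
export type SandboxMode = "minimal" | "full";
export type SandboxStatus = "running" | "completed" | "failed" | "cancelled";

export interface SandboxRegistryEntry {
  requestId: string;
  workflowId: string;
  storyId: string;
  baseSha: string;
  localPath: string;
  mode: SandboxMode;
  runtime: { composeProjectName: string; slug: string };
  status: SandboxStatus;
  createdAt: string; // lifecycle timestamps, ISO-8601
  updatedAt: string;
  teardown: "pending" | "done" | "forced";
}

// Keyed by requestId so there is exactly one entry per sandbox workspace.
export function upsertEntry(
  registry: Record<string, SandboxRegistryEntry>,
  entry: SandboxRegistryEntry,
): Record<string, SandboxRegistryEntry> {
  registry[entry.requestId] = entry;
  return registry;
}
```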

The registry exists to prevent feral cleanup and to make the operator surface truthful.

5. Minimal vs full local modes

Local sandboxes must support at least two explicit execution modes:

  • minimal — worktree + env + toolchain, with no full stack materialization unless required
  • full — worktree + devcontainer/compose/services + runtime identity setup

This keeps docs/refactor/test tasks cheap while still allowing real app/runtime validation when needed.
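The split can be expressed as a materialization plan per mode. The plan item names here are illustrative, not a shipped API:

```typescript
export type SandboxMode = "minimal" | "full";

export function materializationPlan(mode: SandboxMode): string[] {
  // minimal: worktree + env + toolchain only
  const base = ["worktree", "env", "toolchain"];
  // full: adds devcontainer/compose/services and runtime identity setup
  return mode === "full"
    ? [...base, "devcontainer", "compose-services", "runtime-identity"]
    : base;
}
```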

6. Hot image remains the speed layer, not the correctness layer

ADR-0206 still governs startup optimization.

The local isolation contract here is the correctness substrate. Prebuilt images, cache warming, and faster startup sit on top of it. They do not replace it.

7. Deliberate shared host mounts

Local sandboxes may project selected host-side tool state to avoid pointless re-auth and repeated bootstrap work.

Rules:

  • minimum surface only
  • prefer read-only mounts where possible
  • shared tool state is a local convenience mechanism, not the long-term secret/proxy model
  • ADR-0219 still governs the stronger host-side credential boundary for durable runtime work

8. Teardown guardrails are part of isolation

Sandbox isolation includes cleanup discipline.

Every local sandbox lifecycle must provide:

  • deterministic teardown of worktree, compose project, network, and sandbox-specific env artifacts
  • dirty-state detection before destructive teardown
  • explicit force path when the operator or workflow chooses to override those checks
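A minimal sketch of the guardrail, assuming illustrative names: dirty-state detection gates destructive teardown, with force as the only escape hatch.

```typescript
export interface TeardownState {
  dirtyWorktree: boolean;   // uncommitted changes in the sandbox worktree
  composeRunning: boolean;  // compose project still has live containers
}

export function teardownDecision(
  state: TeardownState,
  opts: { force?: boolean } = {},
): { proceed: boolean; reason?: string } {
  if ((state.dirtyWorktree || state.composeRunning) && !opts.force) {
    return { proceed: false, reason: "dirty state detected; rerun with force to override" };
  }
  // proceed: remove worktree, compose project, network, sandbox env artifacts
  return { proceed: true };
}
```

Making the decision a pure function keeps the destructive path testable without actually destroying anything.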

What This Is Not

This ADR does not do any of the following:

  • adopt BranchBox as joelclaw’s execution runtime
  • replace ADR-0205’s k8s/cloud-native direction
  • replace ADR-0219’s proxy-policy and typed-result goals
  • turn sandbox execution into long-lived human feature environments as a product
  • declare hot images done

BranchBox is prior art for local isolation primitives only.

Why This Shape

Worktrees are necessary but not sufficient

A git worktree isolates code. It does not isolate runtime identity. The practical collisions in local parallel development come from Docker, env, ports, and cleanup — not from Git itself.

Symlink-heavy setups look elegant until one sandbox mutates shared config and contaminates the others. Copy-first defaults produce more predictable local behavior and easier debugging.

Local registry prevents invisible mess

Without a sandbox registry, the system eventually loses track of active environments, leaked networks, stale worktrees, and half-torn-down runtime state.

Minimal/full split keeps the system honest

Not every job needs a full app stack. Encoding this as a first-class mode avoids expensive over-provisioning and makes local execution intent legible.

Consequences

Positive

  • local sandbox execution becomes more collision-resistant
  • Docker/runtime isolation stops being implicit and fragile
  • operators can inspect and clean up local sandboxes with real metadata
  • minimal tasks stay cheap while full-stack tasks remain possible
  • ADR-0206 hot-image work has a clearer correctness substrate underneath it
  • the local path becomes stronger without contaminating the long-term k8s/cloud architecture

Negative

  • more lifecycle metadata to manage
  • more setup logic around compose/env/devcontainer materialization
  • cleanup paths become stricter and must be tested honestly

Risks

  • overfitting local isolation to joelclaw monorepo assumptions
  • accidentally drifting toward BranchBox-style persistent feature environments instead of execution sandboxes
  • broad shared host mounts becoming a security shortcut instead of a bounded convenience layer

Required Skills Preflight

Load before implementing or extending this ADR:

  • system-architecture — understand how local host execution, Restate, queueing, and future k8s execution fit together
  • docker-sandbox — existing sandbox/runtime isolation patterns and container ergonomics
  • adr-skill — maintain the ADR as an executable implementation contract

Current gap:

  • there is no canonical skill yet for local sandbox execution inside @joelclaw/agent-execution. Until that exists, implementers must read ADR-0205, ADR-0206, ADR-0217, ADR-0219, and the current package code directly.

Implementation Plan

Contract surface

Extend @joelclaw/agent-execution to add explicit local-isolation metadata and lifecycle helpers.

At minimum:

  • sandbox identity generation (sandboxId, slug, compose project name)
  • sandbox mode (minimal / full)
  • env materialization helper
  • devcontainer materialization helper with copy default
  • registry read/write helpers
  • teardown helpers with dirty-state/force semantics

Affected paths

Initial expected touch points:

  • packages/agent-execution/ — canonical local isolation contract and helpers
  • packages/restate/ — local sandbox runner integration points
  • scripts/restate/ or adjacent runtime launch surfaces — environment handoff and cleanup wiring
  • k8s/agent-runner.yaml and related docs only where the local-vs-k8s boundary needs clarification
  • docs/architecture.md / docs/deploy.md / skills/system-architecture/SKILL.md when implementation lands

Patterns to follow

  • sandbox paths are deterministic and scoped to request identity
  • runtime identity is derived once and reused everywhere (COMPOSE_PROJECT_NAME, env file naming, cleanup)
  • local shared mounts are explicit, bounded, and documented
  • hot-image logic stays separate from correctness logic
  • promotion boundary remains patch/artifact-first; sandboxes do not become implicit merge surfaces

What to avoid

  • treating worktree creation alone as full isolation
  • default symlink materialization for mutable config
  • leaking sandbox-specific env/runtime state back into the operator checkout
  • broad host home-directory mounts as a shortcut
  • mixing local sandbox lifecycle rules with unrelated k8s runtime concerns

Implementation Progress (2026-03-09)

Phase 1 — package primitives

Phase-1 package slice shipped in the monorepo:

  • added packages/agent-execution/src/local.ts
  • exported the new local helpers from packages/agent-execution/src/index.ts
  • added packages/agent-execution/__tests__/local.test.ts
  • updated docs/architecture.md
  • updated skills/system-architecture/SKILL.md

Shipped helpers in phase 1:

  • deterministic local sandbox identity generation
  • deterministic sandbox path resolution
  • per-sandbox env materialization
  • minimal/full mode vocabulary
  • JSON registry read/write/upsert/remove helpers
  • layout create/remove helpers for local sandbox directories

Phase 2 — host-worker integration

The real local backend now consumes those helpers in packages/system-bus/src/inngest/functions/agent-dispatch.ts:

  • local sandbox runs allocate deterministic paths under ~/.joelclaw/sandboxes/
  • running snapshots materialize .sandbox.env before execution
  • local sandbox state is written to the JSON registry on start and terminal completion/cancellation
  • local artifact bundles are persisted into the sandbox directory
  • inbox snapshots now include localSandbox metadata so operators and cancellation logic can see the sandbox identity/path/env/registry surface

Current honest limit after phase 2:

  • this is still minimal-mode-first
  • it does not yet implement copy-first devcontainer materialization or full compose/network lifecycle
  • teardown policy is not finished; local sandboxes are now intentionally inspectable rather than disposable temp dirs

Phase 3 — retention, devcontainer copy-first helper, and concurrency proof

This slice extends both the package contract and the live host-worker path:

  • @joelclaw/agent-execution now resolves terminal retention policy with explicit cleanupAfter deadlines
  • the local registry now carries retention metadata so cleanup is inspectable instead of implicit
  • expired retained local sandboxes are opportunistically pruned when a new local sandbox run starts
  • copy-first .devcontainer materialization now exists as a canonical helper, with exclusions for env/secret junk and optional symlink override when explicitly requested
  • the live local sandbox runner now injects the reserved sandbox env into the actual agent process, so COMPOSE_PROJECT_NAME and related identity are not just written to disk — they are present at execution time
  • live dogfood exposed a real path-collision bug: composeProjectName diverged, but sandboxId initially did not survive long shared requestId prefixes; identity generation was corrected so a request-derived hash is preserved in both sandbox path and compose identity
  • live completion diagnosis exposed a second bug class: the worker accepted abbreviated baseSha values in requests but repo materialization compared them against full commit SHAs, and a dispatch crash before write-inbox left inbox snapshots lying in running; repo materialization now accepts abbreviated SHAs that resolve correctly, and system/agent-dispatch now forces a terminal failed inbox snapshot on non-cancel failure
  • package tests now include a concurrent proof that two local sandboxes keep distinct compose identity and isolated copied devcontainer state
  • a repeatable operator probe now exists at bun scripts/verify-local-sandbox-dispatch.ts; it dispatches one happy-path local sandbox run and one intentional bad-SHA run, then waits for truthful terminal inbox state
  • the agent-workloads workload run --dry-run front door was used again to shape the slice and inspect the canonical runtime request instead of inventing queue payloads by hand
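The retention behavior this slice adds can be sketched as a predicate. Field names mirror the ADR's vocabulary (cleanupAfter, terminal status), but the shipped signature may differ:

```typescript
export interface RetainedEntry {
  status: "running" | "completed" | "failed" | "cancelled";
  cleanupAfter?: string; // ISO deadline recorded in the registry
}

export function isExpired(entry: RetainedEntry, now: Date): boolean {
  if (entry.status === "running") return false; // never prune active sandboxes
  if (!entry.cleanupAfter) return false;        // no deadline recorded yet
  return now.getTime() > Date.parse(entry.cleanupAfter);
}
```

Writing the deadline into the registry, rather than computing it at prune time, is what makes cleanup inspectable instead of implicit.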

Phase 4 — full local mode and workflow-rig dogfood

This slice extends the live local backend and the public front door:

  • joelclaw workload run can now carry --sandbox-mode minimal|full, and the canonical system/agent.requested payload now preserves that choice as sandboxMode
  • the live host-worker path now maps the requested cwd into the cloned sandbox checkout instead of forcing all work to repo root
  • full local mode now discovers compose files relative to that sandbox workdir, reserves the sandbox-specific COMPOSE_PROJECT_NAME, brings the compose project up before agent execution, and tears it down afterward
  • a tiny real fixture now lives under packages/agent-execution/__fixtures__/full-mode-runtime/ so full-mode dogfood can exercise compose + devcontainer surfaces without colliding with production ports
  • workflow-rig dogfood exposed a real substrate bug outside ADR-0221 proper: a stale long-running Restate worker rejected workload/requested as unregistered until it was restarted and reloaded the current queue registry
  • workflow-rig dogfood also exposed a second phase-4 bug in the stage contract itself: stage-2 agents could run scripts/verify-workload-full-mode.ts from inside the sandbox, and that verifier launches another joelclaw workload run plus inbox wait. This produced recursive self-dogfood instead of terminal completion, so nested workflow-rig execution now needs to be blocked by default inside sandboxed stage runs
  • a new canonical workflow-rig skill now front-loads workload planning and runtime invocation, with agent-workloads and restate-workflows demoted to compatibility aliases

Current honest limit after phase 4:

  • the main false-nonterminal failure mode was recursive self-dogfood (scripts/verify-workload-full-mode.ts → nested joelclaw workload run), not compose startup itself
  • nested workflow-rig execution is now blocked by default inside sandboxed stage runs, with explicit override only for deliberate recursion debugging
  • guarded rerun is earned: bun scripts/verify-workload-full-mode.ts produced WR_20260310_013158, stage-2 completed terminally, the compose runtime came up healthy, the returned summary included the required full-mode-ok|full|... proof line, and teardown left zero running containers
  • expiry pruning is still opportunistic at local-sandbox startup; a dedicated janitor/operator surface is the last operational gap

Phase 5 — operator surface and dedicated janitor path

This slice closes the remaining operational gap:

  • @joelclaw/agent-execution now exposes targeted cleanup helpers for local sandbox registry entries, plus an explicit isLocalSandboxEntryExpired(...) predicate for operator surfaces
  • joelclaw workload sandboxes list is now the operator-facing registry view for ADR-0221 local sandboxes
  • joelclaw workload sandboxes cleanup is the bounded manual cleanup path with selector-based targeting, --dry-run, and active-sandbox protection unless --force is explicit
  • joelclaw workload sandboxes janitor is the dedicated expired-sandbox cleanup path, so TTL pruning no longer has to wait for the next sandbox startup in order to run
  • the operator surface now reconciles registry entries against per-sandbox sandbox.json metadata before reporting or deleting, so older partial writeback residue stops lying about terminal state
  • live dogfood now proves the operator surface against the real registry:
    • workload sandboxes list --limit 5 returned current registry/filesystem truth
    • workload sandboxes cleanup --request-id WR_20260310_005002 --dry-run correctly refused to delete a running sandbox without --force
    • workload sandboxes janitor --dry-run previewed the current expired set
    • workload sandboxes janitor ran the dedicated janitor path successfully (no expired entries at the time, so it was an honest no-op)

Current honest state after phase 5:

  • ADR-0221 core implementation is earned end-to-end for the host-worker local sandbox path
  • full local mode is proved through the real workflow rig with terminal truth and clean teardown
  • operators now have a first-class CLI surface to inspect retained sandboxes and run janitor cleanup on demand
  • the CLI can now self-heal stale registry truth from per-sandbox metadata, which means completed sandboxes no longer require --force just because an older partial writeback left the registry behind
  • remaining follow-up is optional ergonomics and institutional-memory work, not missing correctness

Phase 6 — scheduled janitoring and bounded residue cleanup

This follow-through turns the on-demand janitor into an always-on maintenance surface:

  • repo-managed launchd asset infra/launchd/com.joel.local-sandbox-janitor.plist now schedules ADR-0221 janitoring at load and every 30 minutes
  • scripts/local-sandbox-janitor.sh is the single host wrapper, and it calls the canonical CLI path joelclaw workload sandboxes janitor instead of inventing a second cleanup implementation
  • bounded operator cleanup was also used to remove the known stale terminal residues whose inbox truth had already reached failed while the older sandbox metadata still said running

Current honest state after phase 6:

  • expired retained sandboxes no longer depend on a future sandbox startup or a human remembering to run janitor manually
  • scheduled janitoring stays CLI-first and repo-tracked instead of living as an opaque hand-edited launchd one-off
  • the remaining non-terminal stale running residues, if any, are now clearly separated from the cleaned terminal residue set and can be debugged on their own merits instead of hiding inside general sandbox clutter

Phase 7 — terminal closeout hardening for subprocess capture

This follow-through closes the historical false-running residue hole at the source:

  • host-worker system/agent-dispatch command capture now uses exit-driven temp files instead of waiting on stdout/stderr pipe EOF for codex/claude/bash subprocesses and sandbox infra commands
  • that closes the exact bug class where descendants inherited the parent descriptors, the real parent process exited, but terminal inbox writeback never happened because capture was still waiting for pipe closure
  • regression proof now covers both historical residue shapes:
    • background-child descriptor inheritance no longer blocks terminal completion handling
    • explicit command timeout returns promptly with timeout truth instead of hanging behind descendant-held pipes

Current honest state after phase 7:

  • verifier timeout and nested queue-admission failure residues no longer depend on janitor cleanup to disappear; the host worker can now reach terminal closeout even when descendants keep stdout/stderr open briefly after the parent exits
  • ADR-0221 cleanup remains necessary for historical garbage and normal retention, but the source runtime path is harder to strand in fake running state going forward

Phase 8 — deterministic non-LLM timeout canary

The next follow-through adds a narrow proof lane for the dispatch substrate itself:

  • system/agent-dispatch now accepts a deterministic verification tool (tool: "canary") for fixed scenarios only, not arbitrary operator shelling
  • the canary path reuses the same host-worker subprocess timeout/capture/writeback machinery as real codex/claude/pi work, but skips model behaviour entirely so timeout proof is deterministic
  • the canonical live proof script is bun scripts/verify-agent-dispatch-timeout.ts, which sends a sandboxed sleep-timeout canary, waits for terminal inbox truth, verifies registry state, and confirms the request does not remain in the running-sandbox surface

Current honest state after phase 8:

  • the outer timeout path can now be forced live on demand without relying on LLM cooperation or hoping Codex obeys a long-sleep prompt
  • ADR-0221 now has both regression tests and a deterministic live canary for the exact terminal-closeout path that historically left false running residue

Phase 9 — on-demand health surface

The timeout canary should not stay a tribal manual proof script.

  • joelclaw status --agent-dispatch-canary is now the canonical on-demand health surface for the deterministic system/agent-dispatch timeout proof
  • the CLI keeps the default fast health check path intact, but when the flag is present it runs the timeout verifier, folds the result into the returned envelope, and marks the whole status check unhealthy if the canary truth is wrong
  • this keeps the proof operator-facing and cheap to invoke without turning every routine status poll into a sandbox churn machine

Current honest state after phase 9:

  • the timeout proof no longer requires remembering a bespoke script path; it hangs off the existing joelclaw status health surface
  • scheduled automation remains optional future work, not a hidden default side effect of routine health polling

Phase 10 — gated scheduled health check

The follow-through lands the scheduled path without turning it into ambient noise:

  • the existing check/system-health-signals-schedule pipeline can now request the deterministic timeout canary, but only when the live worker environment sets HEALTH_AGENT_DISPATCH_CANARY_SCHEDULE=signals
  • default remains off, so routine health cadence does not silently churn local sandboxes
  • the scheduled health slice now carries the canary result in OTEL and marks the slice degraded if the timeout proof comes back wrong

Current honest state after phase 10:

  • the timeout proof is available in both operator-driven and scheduled health surfaces
  • scheduled execution is explicit and gated, not hidden background magic

Phase 11 — operator truth tightening

The next polish pass tightens the human-facing truth surfaces around the already-shipped proof lane:

  • joelclaw status now exposes the latest persisted deterministic canary result in-band, so operators can see the last proof outcome without digging through runs or inbox files
  • joelclaw workload run now writes a terminal inbox snapshot immediately when queue admission fails before the runtime request is accepted, instead of leaving the request with no inbox truth artifact at all

Current honest state after phase 11:

  • the timeout canary is visible both as an active proof run and as a last-known-good/last-known-bad operator summary
  • queue-admission failure now leaves an immediate terminal inbox artifact instead of an informational void

Verification

  • local sandbox creation produces a unique worktree path and unique runtime identity for each request
  • two concurrent local sandboxes can start with distinct COMPOSE_PROJECT_NAME values, distinct sandbox paths even under long shared requestId prefixes, and isolated copied devcontainer state (package proof plus live runner allocation proof)
  • default devcontainer materialization uses copy mode and does not mutate sibling sandboxes
  • minimal mode vocabulary and env materialization exist without provisioning unnecessary full-stack services
  • full mode now provisions a real compose-backed local runtime through the workflow rig, completes guarded stage-2 dogfood terminally, emits the required proof line inside the returned summary, and tears the runtime down cleanly
  • sandbox registry truthfully reports active sandboxes, retention state, and teardown state, and the CLI now reconciles stale registry entries from per-sandbox metadata before surfacing or cleaning them
  • teardown helpers exist for sandbox-specific directory artifacts, and the live host-worker path now carries an explicit retention/cleanup policy with startup-time expiry pruning
  • live sandbox completion is now proven by a repeatable operator probe: one local sandbox request reaches completed, and one intentional bad-SHA request reaches failed instead of lying in running
  • operators can list retained local sandboxes, preview cleanup, refuse active cleanup unless forced, and run a dedicated janitor path from the installed CLI
  • scheduled janitoring is repo-managed via launchd and runs the canonical CLI cleanup path instead of relying on startup opportunism or human memory
  • host-worker subprocess capture no longer waits on stdout/stderr pipe EOF, so descendant-held descriptors cannot strand sandbox runs in fake running state after the real parent process exits
  • the dispatch substrate now has a deterministic non-LLM timeout canary (tool: "canary" + scripts/verify-agent-dispatch-timeout.ts) that forces the live outer-timeout path and proves terminal inbox + registry closeout
  • the canary is exposed through an operator-facing on-demand health surface (joelclaw status --agent-dispatch-canary) instead of living as a bespoke manual proof script only
  • the existing scheduled health pipeline can include the same timeout proof behind an explicit live-worker gate (HEALTH_AGENT_DISPATCH_CANARY_SCHEDULE=signals) while defaulting to off
  • the default status surface now exposes the latest persisted deterministic canary result without requiring manual run archaeology
  • queue-admission failure now writes an immediate terminal inbox snapshot so the request has truthful closeout even when the runtime never gets admitted
  • documentation and system-architecture skill are updated in the same implementation session when code lands

Follow-up

  • create a canonical local-sandbox skill once implementation patterns stabilize
  • treat this ADR as the local correctness layer underneath ADR-0206 speed work
  • keep the launchd janitor bounded: it should continue calling the canonical CLI path instead of growing a shadow cleanup implementation
  • keep ADR-0219 focused on stronger credential/proxy boundaries rather than letting local convenience mounts become the long-term security story