Local Sandbox Isolation Primitives
Status
Accepted
Context and Problem Statement
joelclaw already proved the local host-worker sandbox runner is a viable phase-1 isolation surface, but the current substrate is still thinner than it should be for serious local parallel work.
What exists now is enough to run isolated story execution without dirtying the operator checkout. What does not exist yet is a strong, explicit local isolation contract covering the non-git parts that actually collide in practice:
- Docker Compose project names
- container and network names
- ports
- env files and runtime identity
- devcontainer materialization strategy
- active sandbox inventory and cleanup
- the difference between “minimal” and “full” local execution
This gap matters because git worktrees only isolate files. They do not isolate runtime state. Without explicit local isolation primitives, multiple local sandboxes can still collide through Docker, ports, env drift, shared service names, or sloppy teardown.
BranchBox is useful prior art here. It is not the runtime substrate we want for joelclaw — it is a local feature-environment product — but it surfaced several ideas worth adopting for our local execution path.
This ADR captures those ideas as joelclaw-native primitives, without adopting BranchBox’s product model or replacing the broader cloud-native direction in ADR-0205.
Decision
Adopt a worktree-backed local sandbox isolation contract for @joelclaw/agent-execution and related runtime surfaces.
The local sandbox path will standardize around these primitives:
1. Worktree-backed sandbox materialization
Every local sandbox gets its own real git worktree, materialized at the requested baseSha.
This becomes the canonical local repo surface for code-changing runs.
Requirements:
- one sandbox workspace per requestId
- deterministic pathing under a dedicated sandbox root
- baseSha pinned at materialization time
- host operator checkout must remain untouched
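A minimal sketch of what materialization could look like under this contract. SANDBOX_ROOT, sandboxPathFor, and materializeWorktree are hypothetical names for illustration, not the actual package API:

```typescript
import { execFileSync } from "node:child_process";
import { join } from "node:path";

// Hypothetical root; the real package decides the dedicated sandbox root.
const SANDBOX_ROOT = join(process.env.HOME ?? "/tmp", ".joelclaw", "sandboxes");

// Deterministic pathing: the same requestId always resolves to the same path.
export function sandboxPathFor(requestId: string): string {
  return join(SANDBOX_ROOT, requestId.toLowerCase());
}

// Materialize a real git worktree pinned at baseSha. Worktrees share the
// object store with the operator checkout but never touch its index or files.
export function materializeWorktree(repoDir: string, requestId: string, baseSha: string): string {
  const sandboxPath = sandboxPathFor(requestId);
  execFileSync("git", ["worktree", "add", "--detach", sandboxPath, baseSha], { cwd: repoDir });
  return sandboxPath;
}
```

Detached worktrees keep the sandbox off any named branch, which avoids branch-checkout conflicts between concurrent sandboxes on the same base.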
2. Unique compose/runtime identity per sandbox
Every sandbox must get a unique runtime identity, not just a unique filesystem path.
Minimum contract:
- unique COMPOSE_PROJECT_NAME
- unique sandbox id / slug
- sandbox-specific env file
- sandbox-specific service naming for local containers
- derived runtime URLs/ports where required
This is the main correction to the false assumption that worktrees alone are sufficient isolation.
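One way to sketch the identity derivation, assuming a short request-derived hash is preserved so long shared requestId prefixes still diverge (all names here are illustrative, not the package's exported API):

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch: keep a short request-derived hash in both the sandbox
// slug and the compose project name so truncation cannot collapse identity.
export function deriveSandboxIdentity(requestId: string) {
  const hash = createHash("sha256").update(requestId).digest("hex").slice(0, 8);
  const base = requestId.toLowerCase().replace(/[^a-z0-9]+/g, "-").slice(0, 24);
  const slug = `${base}-${hash}`;
  return {
    sandboxId: slug,
    composeProjectName: `sbx-${slug}`, // unique COMPOSE_PROJECT_NAME per sandbox
    envFile: ".sandbox.env",           // sandbox-specific env file name
  };
}
```

Deriving all identity fields from one hash call means the filesystem path, compose project, and env file can never disagree about which sandbox they belong to.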
3. Copy-first devcontainer materialization
When a sandbox needs devcontainer configuration, materialize .devcontainer/ by copying it into the sandbox by default.
Rules:
- default strategy: copy
- optional override: symlink, only when explicitly requested
- exclude runtime-generated env files, secret material, and other sandbox-specific artifacts from the copied set
The default must favor sandbox independence and debuggability over clever shared indirection.
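A hedged sketch of the copy-first default. The exclusion patterns and helper names (EXCLUDED, shouldCopy, materializeDevcontainer) are assumptions for illustration, not the canonical helper's real surface:

```typescript
import { cpSync, mkdirSync } from "node:fs";
import { basename, join } from "node:path";

// Illustrative exclusion set: runtime-generated env files and secret
// material never travel with the copied devcontainer config.
const EXCLUDED = [/\.env$/, /secret/i];

export function shouldCopy(path: string): boolean {
  return !EXCLUDED.some((pattern) => pattern.test(basename(path)));
}

// Copy-first default: materialize .devcontainer/ into the sandbox by copying,
// so one sandbox mutating its config can never contaminate siblings.
export function materializeDevcontainer(sourceRepo: string, sandboxDir: string): void {
  mkdirSync(sandboxDir, { recursive: true });
  cpSync(join(sourceRepo, ".devcontainer"), join(sandboxDir, ".devcontainer"), {
    recursive: true,
    filter: (src) => shouldCopy(src),
  });
}
```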
4. Sandbox registry
Maintain a lightweight local registry of active and recent sandboxes.
Each entry must record enough metadata for observability and cleanup:
- requestId
- workflowId
- storyId
- baseSha
- local path
- mode (minimal or full)
- runtime identity (composeProjectName, sandbox slug)
- status / lifecycle timestamps
- teardown state
The registry exists to prevent feral cleanup and to make the operator surface truthful.
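An illustrative shape for a registry entry and its upsert, mirroring the field list above; these are sketch types, not the exact exported package types:

```typescript
// Illustrative registry shapes mirroring the metadata contract above.
type SandboxMode = "minimal" | "full";
type SandboxStatus = "running" | "completed" | "failed" | "cancelled";

interface SandboxRegistryEntry {
  requestId: string;
  workflowId?: string;
  storyId?: string;
  baseSha: string;
  localPath: string;
  mode: SandboxMode;
  composeProjectName: string;
  slug: string;
  status: SandboxStatus;
  createdAt: string; // lifecycle timestamps
  updatedAt: string;
  tornDown: boolean; // teardown state
}

// Upsert keyed by requestId so restarts and writebacks never duplicate entries.
export function upsertEntry(
  registry: SandboxRegistryEntry[],
  entry: SandboxRegistryEntry,
): SandboxRegistryEntry[] {
  return [...registry.filter((e) => e.requestId !== entry.requestId), entry];
}
```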
5. Minimal vs full local modes
Local sandboxes must support at least two explicit execution modes:
- minimal — worktree + env + toolchain, with no full stack materialization unless required
- full — worktree + devcontainer/compose/services + runtime identity setup
This keeps docs/refactor/test tasks cheap while still allowing real app/runtime validation when needed.
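As an illustrative default only (the real mode is an explicit request parameter, not inferred), the split could read like this:

```typescript
type SandboxMode = "minimal" | "full";

// Illustrative heuristic: docs/refactor/test work stays cheap by default;
// anything needing real app/runtime validation opts into full.
export function defaultModeFor(taskKind: string): SandboxMode {
  return ["docs", "refactor", "test"].includes(taskKind) ? "minimal" : "full";
}
```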
6. Hot image remains the speed layer, not the correctness layer
ADR-0206 still governs startup optimization.
The local isolation contract here is the correctness substrate. Prebuilt images, cache warming, and faster startup sit on top of it. They do not replace it.
7. Deliberate shared host mounts
Local sandboxes may project selected host-side tool state to avoid pointless re-auth and repeated bootstrap work.
Rules:
- minimum surface only
- prefer read-only mounts where possible
- shared tool state is a local convenience mechanism, not the long-term secret/proxy model
- ADR-0219 still governs the stronger host-side credential boundary for durable runtime work
8. Teardown guardrails are part of isolation
Sandbox isolation includes cleanup discipline.
Every local sandbox lifecycle must provide:
- deterministic teardown of worktree, compose project, network, and sandbox-specific env artifacts
- dirty-state detection before destructive teardown
- explicit force path when the operator or workflow chooses to override those checks
What This Is Not
This ADR does not do any of the following:
- adopt BranchBox as joelclaw’s execution runtime
- replace ADR-0205’s k8s/cloud-native direction
- replace ADR-0219’s proxy-policy and typed-result goals
- turn sandbox execution into long-lived human feature environments as a product
- declare hot images done
BranchBox is prior art for local isolation primitives only.
Why This Shape
Worktrees are necessary but not sufficient
A git worktree isolates code. It does not isolate runtime identity. The practical collisions in local parallel development come from Docker, env, ports, and cleanup — not from Git itself.
Copy-first beats symlink cleverness
Symlink-heavy setups look elegant until one sandbox mutates shared config and contaminates the others. Copy-first defaults produce more predictable local behavior and easier debugging.
Local registry prevents invisible mess
Without a sandbox registry, the system eventually loses track of active environments, leaked networks, stale worktrees, and half-torn-down runtime state.
Minimal/full split keeps the system honest
Not every job needs a full app stack. Encoding this as a first-class mode avoids expensive over-provisioning and makes local execution intent legible.
Consequences
Positive
- local sandbox execution becomes more collision-resistant
- Docker/runtime isolation stops being implicit and fragile
- operators can inspect and clean up local sandboxes with real metadata
- minimal tasks stay cheap while full-stack tasks remain possible
- ADR-0206 hot-image work has a clearer correctness substrate underneath it
- the local path becomes stronger without contaminating the long-term k8s/cloud architecture
Negative
- more lifecycle metadata to manage
- more setup logic around compose/env/devcontainer materialization
- cleanup paths become stricter and must be tested honestly
Risks
- overfitting local isolation to joelclaw monorepo assumptions
- accidentally drifting toward BranchBox-style persistent feature environments instead of execution sandboxes
- broad shared host mounts becoming a security shortcut instead of a bounded convenience layer
Required Skills Preflight
Load before implementing or extending this ADR:
- system-architecture — understand how local host execution, Restate, queueing, and future k8s execution fit together
- docker-sandbox — existing sandbox/runtime isolation patterns and container ergonomics
- adr-skill — maintain the ADR as an executable implementation contract
Current gap:
- there is no canonical skill yet for local sandbox execution inside @joelclaw/agent-execution; until that exists, implementers must read ADR-0205, ADR-0206, ADR-0217, ADR-0219, and the current package code directly
Implementation Plan
Contract surface
Extend @joelclaw/agent-execution to add explicit local-isolation metadata and lifecycle helpers.
At minimum:
- sandbox identity generation (sandboxId, slug, compose project name)
- sandbox mode (minimal/full)
- env materialization helper
- devcontainer materialization helper with copy default
- registry read/write helpers
- teardown helpers with dirty-state/force semantics
Affected paths
Initial expected touch points:
- packages/agent-execution/ — canonical local isolation contract and helpers
- packages/restate/ — local sandbox runner integration points
- scripts/restate/ or adjacent runtime launch surfaces — environment handoff and cleanup wiring
- k8s/agent-runner.yaml and related docs only where the local-vs-k8s boundary needs clarification
- docs/architecture.md / docs/deploy.md / skills/system-architecture/SKILL.md when implementation lands
Patterns to follow
- sandbox paths are deterministic and scoped to request identity
- runtime identity is derived once and reused everywhere (COMPOSE_PROJECT_NAME, env file naming, cleanup)
- local shared mounts are explicit, bounded, and documented
- hot-image logic stays separate from correctness logic
- promotion boundary remains patch/artifact-first; sandboxes do not become implicit merge surfaces
What to avoid
- treating worktree creation alone as full isolation
- default symlink materialization for mutable config
- leaking sandbox-specific env/runtime state back into the operator checkout
- broad host home-directory mounts as a shortcut
- mixing local sandbox lifecycle rules with unrelated k8s runtime concerns
Implementation Progress (2026-03-09)
Phase 1 — package primitives
Phase-1 package slice shipped in the monorepo:
- added packages/agent-execution/src/local.ts
- exported the new local helpers from packages/agent-execution/src/index.ts
- added packages/agent-execution/__tests__/local.test.ts
- updated docs/architecture.md
- updated skills/system-architecture/SKILL.md
Shipped helpers in phase 1:
- deterministic local sandbox identity generation
- deterministic sandbox path resolution
- per-sandbox env materialization
- minimal/full mode vocabulary
- JSON registry read/write/upsert/remove helpers
- layout create/remove helpers for local sandbox directories
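For the env materialization helper in that list, a minimal sketch of the serialization (the format shown is an assumption; the real helper defines the canonical one):

```typescript
// Hedged sketch of per-sandbox env rendering (.sandbox.env contents).
// Keys arrive already derived from the sandbox's runtime identity.
export function renderSandboxEnv(vars: Record<string, string>): string {
  return Object.entries(vars)
    .map(([key, value]) => `${key}=${value}`)
    .join("\n") + "\n";
}
```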
Phase 2 — host-worker integration
The real local backend now consumes those helpers in packages/system-bus/src/inngest/functions/agent-dispatch.ts:
- local sandbox runs allocate deterministic paths under ~/.joelclaw/sandboxes/
- running snapshots materialize .sandbox.env before execution
- local sandbox state is written to the JSON registry on start and terminal completion/cancellation
- local artifact bundles are persisted into the sandbox directory
- inbox snapshots now include localSandbox metadata so operators and cancellation logic can see the sandbox identity/path/env/registry surface
Current honest limit after phase 2:
- this is still minimal-mode-first
- it does not yet implement copy-first devcontainer materialization or full compose/network lifecycle
- teardown policy is not finished; local sandboxes are now intentionally inspectable rather than disposable temp dirs
Phase 3 — retention, devcontainer copy-first helper, and concurrency proof
This slice extends both the package contract and the live host-worker path:
- @joelclaw/agent-execution now resolves terminal retention policy with explicit cleanupAfter deadlines
- the local registry now carries retention metadata so cleanup is inspectable instead of implicit
- expired retained local sandboxes are opportunistically pruned when a new local sandbox run starts
- copy-first .devcontainer materialization now exists as a canonical helper, with exclusions for env/secret junk and an optional symlink override when explicitly requested
- the live local sandbox runner now injects the reserved sandbox env into the actual agent process, so COMPOSE_PROJECT_NAME and related identity are not just written to disk — they are present at execution time
- live dogfood exposed a real path-collision bug: composeProjectName diverged, but sandboxId initially did not survive long shared requestId prefixes; identity generation was corrected so a request-derived hash is preserved in both the sandbox path and the compose identity
- live completion diagnosis exposed a second bug class: the worker accepted abbreviated baseSha values in requests but repo materialization compared them against full commit SHAs, and a dispatch crash before write-inbox left inbox snapshots lying in running; repo materialization now accepts abbreviated SHAs that resolve correctly, and system/agent-dispatch now forces a terminal failed inbox snapshot on non-cancel failure
- package tests now include a concurrent proof that two local sandboxes keep distinct compose identity and isolated copied devcontainer state
- a repeatable operator probe now exists at bun scripts/verify-local-sandbox-dispatch.ts; it dispatches one happy-path local sandbox run and one intentional bad-SHA run, then waits for truthful terminal inbox state
- the agent-workloads → workload run --dry-run front door was used again to shape the slice and inspect the canonical runtime request instead of inventing queue payloads by hand
Phase 4 — full local mode and workflow-rig dogfood
This slice extends the live local backend and the public front door:
- joelclaw workload run can now carry --sandbox-mode minimal|full, and the canonical system/agent.requested payload now preserves that choice as sandboxMode
- the live host-worker path now maps the requested cwd into the cloned sandbox checkout instead of forcing all work to repo root
- full local mode now discovers compose files relative to that sandbox workdir, reserves the sandbox-specific COMPOSE_PROJECT_NAME, brings the compose project up before agent execution, and tears it down afterward
- a tiny real fixture now lives under packages/agent-execution/__fixtures__/full-mode-runtime/ so full-mode dogfood can exercise compose + devcontainer surfaces without colliding with production ports
- workflow-rig dogfood exposed a real substrate bug outside ADR-0221 proper: a stale long-running Restate worker rejected workload/requested as unregistered until it was restarted and reloaded the current queue registry
- workflow-rig dogfood also exposed a second phase-4 bug in the stage contract itself: stage-2 agents could run scripts/verify-workload-full-mode.ts from inside the sandbox, and that verifier launches another joelclaw workload run plus inbox wait; this produced recursive self-dogfood instead of terminal completion, so nested workflow-rig execution now needs to be blocked by default inside sandboxed stage runs
- a new canonical workflow-rig skill now front-loads workload planning and runtime invocation, with agent-workloads and restate-workflows demoted to compatibility aliases
Current honest limit after phase 4:
- the main false-nonterminal failure mode was recursive self-dogfood (scripts/verify-workload-full-mode.ts → nested joelclaw workload run), not compose startup itself
- nested workflow-rig execution is now blocked by default inside sandboxed stage runs, with explicit override only for deliberate recursion debugging
- guarded rerun is earned: bun scripts/verify-workload-full-mode.ts produced WR_20260310_013158, stage-2 completed terminally, the compose runtime came up healthy, the returned summary included the required full-mode-ok|full|... proof line, and teardown left zero running containers
- expiry pruning is still opportunistic at local-sandbox startup; a dedicated janitor/operator surface is the last operational gap
Phase 5 — operator surface and dedicated janitor path
This slice closes the remaining operational gap:
- @joelclaw/agent-execution now exposes targeted cleanup helpers for local sandbox registry entries, plus an explicit isLocalSandboxEntryExpired(...) predicate for operator surfaces
- joelclaw workload sandboxes list is now the operator-facing registry view for ADR-0221 local sandboxes
- joelclaw workload sandboxes cleanup is the bounded manual cleanup path with selector-based targeting, --dry-run, and active-sandbox protection unless --force is explicit
- joelclaw workload sandboxes janitor is the dedicated expired-sandbox cleanup path, so TTL pruning no longer waits on the next sandbox startup to run at all
- the operator surface now reconciles registry entries against per-sandbox sandbox.json metadata before reporting or deleting, so older partial writeback residue stops lying about terminal state
- live dogfood now proves the operator surface against the real registry:
  - workload sandboxes list --limit 5 returned current registry/filesystem truth
  - workload sandboxes cleanup --request-id WR_20260310_005002 --dry-run correctly refused to delete a running sandbox without --force
  - workload sandboxes janitor --dry-run previewed the current expired set
  - workload sandboxes janitor ran the dedicated janitor path successfully (no expired entries at the time, so it was an honest no-op)
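In the spirit of the isLocalSandboxEntryExpired(...) predicate mentioned above, a minimal illustrative version (field names are assumptions, not the package's actual types):

```typescript
// Illustrative expiry predicate: an entry is prunable only when it is
// terminal AND past its cleanupAfter deadline; active sandboxes never expire.
export function isExpired(
  entry: { status: string; cleanupAfter?: string },
  now: Date = new Date(),
): boolean {
  const terminal = ["completed", "failed", "cancelled"].includes(entry.status);
  if (!terminal || !entry.cleanupAfter) return false;
  return new Date(entry.cleanupAfter).getTime() <= now.getTime();
}
```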
Current honest state after phase 5:
- ADR-0221 core implementation is earned end-to-end for the host-worker local sandbox path
- full local mode is proved through the real workflow rig with terminal truth and clean teardown
- operators now have a first-class CLI surface to inspect retained sandboxes and run janitor cleanup on demand
- the CLI can now self-heal stale registry truth from per-sandbox metadata, which means completed sandboxes no longer require --force just because an older partial writeback left the registry behind
- remaining follow-up is optional ergonomics and institutional-memory work, not missing correctness
Phase 6 — scheduled janitoring and bounded residue cleanup
This follow-through turns the on-demand janitor into an always-on maintenance surface:
- repo-managed launchd asset infra/launchd/com.joel.local-sandbox-janitor.plist now schedules ADR-0221 janitoring at load and every 30 minutes
- scripts/local-sandbox-janitor.sh is the single host wrapper, and it calls the canonical CLI path joelclaw workload sandboxes janitor instead of inventing a second cleanup implementation
- bounded operator cleanup was also used to remove the known stale terminal residues whose inbox truth had already reached failed while the older sandbox metadata still said running
Current honest state after phase 6:
- expired retained sandboxes no longer depend on a future sandbox startup or a human remembering to run janitor manually
- scheduled janitoring stays CLI-first and repo-tracked instead of living as an opaque hand-edited launchd one-off
- the remaining non-terminal stale running residues, if any, are now clearly separated from the cleaned terminal residue set and can be debugged on their own merits instead of hiding inside general sandbox clutter
Phase 7 — terminal closeout hardening for subprocess capture
This follow-through closes the historical false-running residue hole at the source:
- host-worker system/agent-dispatch command capture now uses exit-driven temp files instead of waiting on stdout/stderr pipe EOF for codex/claude/bash subprocesses and sandbox infra commands
- that closes the exact bug class where descendants inherited the parent descriptors, the real parent process exited, but terminal inbox writeback never happened because capture was still waiting for pipe closure
- regression proof now covers both historical residue shapes:
- background-child descriptor inheritance no longer blocks terminal completion handling
- explicit command timeout returns promptly with timeout truth instead of hanging behind descendant-held pipes
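The exit-driven capture pattern above can be sketched as follows. This is an assumption-laden illustration of the technique, not the host worker's actual implementation: output is redirected to a temp file descriptor and read back on the child's exit event, so descendants holding an inherited pipe cannot block completion.

```typescript
import { spawn } from "node:child_process";
import { closeSync, mkdtempSync, openSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Exit-driven capture: write child stdout to a temp file instead of a pipe,
// then read the file once the child process itself exits.
export function runCaptured(cmd: string, args: string[]): Promise<{ code: number; stdout: string }> {
  const dir = mkdtempSync(join(tmpdir(), "capture-"));
  const outPath = join(dir, "stdout.log");
  const outFd = openSync(outPath, "w");
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ["ignore", outFd, "inherit"] });
    child.on("error", reject);
    // "exit" fires when the process exits, regardless of whether descendants
    // still hold the inherited descriptor open ("close" would wait on streams).
    child.on("exit", (code) => {
      closeSync(outFd);
      resolve({ code: code ?? -1, stdout: readFileSync(outPath, "utf8") });
    });
  });
}
```

Listening on "exit" rather than "close" is the load-bearing choice: "close" only fires after all stdio streams end, which is exactly the hang this phase removed.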
Current honest state after phase 7:
- verifier timeout and nested queue-admission failure residues no longer depend on janitor cleanup to disappear; the host worker can now reach terminal closeout even when descendants keep stdout/stderr open briefly after the parent exits
- ADR-0221 cleanup remains necessary for historical garbage and normal retention, but the source runtime path is harder to strand in fake running state going forward
Phase 8 — deterministic non-LLM timeout canary
The next follow-through adds a narrow proof lane for the dispatch substrate itself:
- system/agent-dispatch now accepts a deterministic verification tool (tool: "canary") for fixed scenarios only, not arbitrary operator shelling
- the canary path reuses the same host-worker subprocess timeout/capture/writeback machinery as real codex/claude/pi work, but skips model behaviour entirely so timeout proof is deterministic
- the canonical live proof script is bun scripts/verify-agent-dispatch-timeout.ts, which sends a sandboxed sleep-timeout canary, waits for terminal inbox truth, verifies registry state, and confirms the request does not remain in the running-sandbox surface
Current honest state after phase 8:
- the outer timeout path can now be forced live on demand without relying on LLM cooperation or hoping Codex obeys a long-sleep prompt
- ADR-0221 now has both regression tests and a deterministic live canary for the exact terminal-closeout path that historically left false running residue
Phase 9 — on-demand health surface
The timeout canary should not stay a tribal manual proof script.
- joelclaw status --agent-dispatch-canary is now the canonical on-demand health surface for the deterministic system/agent-dispatch timeout proof
- the CLI keeps the default fast health check path intact, but when the flag is present it runs the timeout verifier, folds the result into the returned envelope, and marks the whole status check unhealthy if the canary truth is wrong
- this keeps the proof operator-facing and cheap to invoke without turning every routine status poll into a sandbox churn machine
Current honest state after phase 9:
- the timeout proof no longer requires remembering a bespoke script path; it hangs off the existing joelclaw status health surface
- scheduled automation remains optional future work, not a hidden default side effect of routine health polling
Phase 10 — gated scheduled health check
The follow-through lands the scheduled path without turning it into ambient noise:
- the existing check/system-health-signals-schedule pipeline can now request the deterministic timeout canary, but only when the live worker environment sets HEALTH_AGENT_DISPATCH_CANARY_SCHEDULE=signals
- default remains off, so routine health cadence does not silently churn local sandboxes
- the scheduled health slice now carries the canary result in OTEL and marks the slice degraded if the timeout proof comes back wrong
Current honest state after phase 10:
- the timeout proof is available in both operator-driven and scheduled health surfaces
- scheduled execution is explicit and gated, not hidden background magic
Phase 11 — operator truth tightening
The next polish pass tightens the human-facing truth surfaces around the already-shipped proof lane:
- joelclaw status now exposes the latest persisted deterministic canary result in-band, so operators can see the last proof outcome without digging through runs or inbox files
- joelclaw workload run now writes a terminal inbox snapshot immediately when queue admission fails before the runtime request is accepted, instead of leaving the request with no inbox truth artifact at all
Current honest state after phase 11:
- the timeout canary is visible both as an active proof run and as a last-known-good/last-known-bad operator summary
- queue-admission failure now leaves an immediate terminal inbox artifact instead of an informational void
Verification
- local sandbox creation produces a unique worktree path and unique runtime identity for each request
- two concurrent local sandboxes can start with distinct COMPOSE_PROJECT_NAME values, distinct sandbox paths even under long shared requestId prefixes, and isolated copied devcontainer state (package proof plus live runner allocation proof)
- default devcontainer materialization uses copy mode and does not mutate sibling sandboxes
- minimal mode vocabulary and env materialization exist without provisioning unnecessary full-stack services
- full mode now provisions a real compose-backed local runtime through the workflow rig, completes guarded stage-2 dogfood terminally, emits the required proof line inside the returned summary, and tears the runtime down cleanly
- sandbox registry truthfully reports active sandboxes, retention state, and teardown state, and the CLI now reconciles stale registry entries from per-sandbox metadata before surfacing or cleaning them
- teardown helpers exist for sandbox-specific directory artifacts, and the live host-worker path now carries an explicit retention/cleanup policy with startup-time expiry pruning
- live sandbox completion is now proven by a repeatable operator probe: one local sandbox request reaches completed, and one intentional bad-SHA request reaches failed instead of lying in running
- operators can list retained local sandboxes, preview cleanup, refuse active cleanup unless forced, and run a dedicated janitor path from the installed CLI
- scheduled janitoring is repo-managed via launchd and runs the canonical CLI cleanup path instead of relying on startup opportunism or human memory
- host-worker subprocess capture no longer waits on stdout/stderr pipe EOF, so descendant-held descriptors cannot strand sandbox runs in fake running state after the real parent process exits
- the dispatch substrate now has a deterministic non-LLM timeout canary (tool: "canary" + scripts/verify-agent-dispatch-timeout.ts) that forces the live outer-timeout path and proves terminal inbox + registry closeout
- the canary is exposed through an operator-facing on-demand health surface (joelclaw status --agent-dispatch-canary) instead of living as a bespoke manual proof script only
- the existing scheduled health pipeline can include the same timeout proof behind an explicit live-worker gate (HEALTH_AGENT_DISPATCH_CANARY_SCHEDULE=signals) while defaulting to off
- the default status surface now exposes the latest persisted deterministic canary result without requiring manual run archaeology
- queue-admission failure now writes an immediate terminal inbox snapshot so the request has truthful closeout even when the runtime never gets admitted
- documentation and system-architecture skill are updated in the same implementation session when code lands
Follow-up
- create a canonical local-sandbox skill once implementation patterns stabilize
- treat this ADR as the local correctness layer underneath ADR-0206 speed work
- keep the launchd janitor bounded: it should continue calling the canonical CLI path instead of growing a shadow cleanup implementation
- keep ADR-0219 focused on stronger credential/proxy boundaries rather than letting local convenience mounts become the long-term security story