ADR-0239 (superseded)

Headless user-domain boot bridge for critical launchd services

2026-04-12T00:00:00.000Z

> Superseded on 2026-04-12 by ADR-0240: Boot-safe LaunchDaemons for critical host services.
>
> Why: the installed bridge never earned the reboot path on Panda. The system daemon ran, but launchctl bootstrap user/501 <plist> kept failing with Input/output error, so the bridge could not reliably restore the critical services after headless boot.

Context

A post-reboot failure left Panda in a headless state: the machine was up, the user/$UID launchd domain existed, but the Aqua gui/$UID domain did not. Critical services that had only been managed as GUI LaunchAgents never came back:

com.joel.colima
com.joel.k8s-reboot-heal
com.joel.agent-secrets
com.joel.system-bus-worker
com.joel.gateway
com.joel.typesense-portforward
com.joelclaw.agent-mail

The recovery required manual nohup starts just to get the system back on its feet. That is not earned infrastructure.

The reboot also exposed two configuration drifts:

The repo-tracked Colima launchd asset still declared 4 CPU / 8 GiB / 60 GiB instead of the stable 8 / 16 / 100 profile.
Several important launchd assets (gateway, typesense-portforward, agent-mail) still lived as hand-edited files under ~/Library/LaunchAgents instead of as repo-tracked sources in infra/launchd/.

Decision

Adopt a headless boot bridge:

Canonical launchd assets live in the repo under infra/launchd/, including newly tracked plists for:
- com.joel.gateway
- com.joel.typesense-portforward
- com.joelclaw.agent-mail
Add a system LaunchDaemon asset: infra/launchd/com.joel.headless-bootstrap.plist.
That LaunchDaemon runs infra/headless-bootstrap.sh as root on boot and every 60 seconds.
The script detects whether gui/$UID exists:
- if GUI is absent: bootstrap the critical repo-managed launch agents into user/$UID
- if GUI is present again: boot out the temporary user/$UID copies so normal GUI ownership can resume without duplicate processes
Add infra/install-headless-bootstrap.sh as the canonical installer:
- symlink critical user launch agents from ~/Library/LaunchAgents/ back to repo sources
- install the system LaunchDaemon to /Library/LaunchDaemons/
- bootstrap and kickstart the system bridge
Correct the repo-tracked Colima launchd asset to the stable runtime profile: 8 CPU / 16 GiB / 100 GiB.
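
The detect-and-hand-off step in the decision above can be sketched as a single pass that the LaunchDaemon would re-run on each 60-second tick. This is an illustration, not the contents of infra/headless-bootstrap.sh: the function names target_exists and bridge_pass are hypothetical, the label list is assumed to match the services named in Context, and treating a failing launchctl print as "absent" is an assumption about how the real script detects the domain.

```shell
#!/bin/sh
# Sketch only: the real infra/headless-bootstrap.sh is not shown in this ADR.

# Assumed to match the critical services listed under Context.
CRITICAL_LABELS="
com.joel.colima
com.joel.k8s-reboot-heal
com.joel.agent-secrets
com.joel.system-bus-worker
com.joel.gateway
com.joel.typesense-portforward
com.joelclaw.agent-mail
"

# Succeeds when launchd can print the target: a domain such as gui/501,
# or a service such as user/501/<label>.
target_exists() {
  launchctl print "$1" >/dev/null 2>&1
}

# One pass of the bridge.
#   $1 = numeric UID of the owning user
#   $2 = directory holding one <label>.plist per critical service
bridge_pass() {
  local uid="$1" plist_dir="$2" label
  if target_exists "gui/${uid}"; then
    # GUI is back: release the temporary user-domain copies.
    for label in $CRITICAL_LABELS; do
      if target_exists "user/${uid}/${label}"; then
        launchctl bootout "user/${uid}/${label}" \
          || echo "bootout failed: ${label}" >&2
      fi
    done
  else
    # Headless: load anything not yet present in the user domain.
    for label in $CRITICAL_LABELS; do
      if ! target_exists "user/${uid}/${label}"; then
        launchctl bootstrap "user/${uid}" "${plist_dir}/${label}.plist" \
          || echo "bootstrap failed: ${label}" >&2
      fi
    done
  fi
}
```

A run would look like bridge_pass 501 "$HOME/Library/LaunchAgents", executed as root. Failures are logged and skipped instead of aborting the pass, so one bad plist does not block the others. The supersession note records that the bootstrap call itself failed on Panda, so this sketch shows the intended flow, not a working recovery path.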

Why this

Survives headless reboots — boot no longer depends on Aqua login just to restore the core control plane.
No shadow plist drift — launchd assets become git-tracked truth, not hand-edited local snowflakes.
Minimal change to service code — keep existing launchd-managed services; add a domain bridge instead of rewriting every runtime.
Clean handoff when GUI returns — the bridge is temporary ownership, not a permanent duplicate runtime.

Consequences

Positive

Core services recover automatically after reboot even when no GUI session exists.
Colima boot no longer regresses to the stale undersized profile from the repo asset.
Gateway, agent-mail, and Typesense port-forward become canonical repo-managed launchd assets.

Negative

The bridge installer requires root once (sudo infra/install-headless-bootstrap.sh).
Critical services now have a cross-domain handoff path that must stay documented and tested.
CLI/ops commands that assume gui/$UID still need gradual cleanup to become fully domain-aware.

Implementation notes

Repo assets landed in the reboot-hardening session:

infra/launchd/com.joel.gateway.plist
infra/launchd/com.joel.typesense-portforward.plist
infra/launchd/com.joelclaw.agent-mail.plist
infra/launchd/com.joel.headless-bootstrap.plist
infra/headless-bootstrap.sh
infra/install-headless-bootstrap.sh
infra/launchd/com.joel.colima.plist updated to 8 / 16 / 100
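
For orientation, the symlink step of the installer might look roughly like the sketch below. It is not the actual infra/install-headless-bootstrap.sh: link_agents is a hypothetical helper, and both directories are passed in as arguments so that no path is assumed.

```shell
#!/bin/sh
# Sketch of the symlink step only; the LaunchDaemon install is not covered.

# link_agents <repo_launchd_dir> <launch_agents_dir> <label>...
# Points <launch_agents_dir>/<label>.plist at the repo-tracked source.
link_agents() {
  local src_dir="$1" dest_dir="$2" label src
  shift 2
  mkdir -p "$dest_dir"
  for label in "$@"; do
    src="${src_dir}/${label}.plist"
    if [ ! -f "$src" ]; then
      echo "missing repo asset: ${src}" >&2
      return 1
    fi
    # -s: make a symlink; -f: replace an existing file or link;
    # -n: do not follow an existing link that points at a directory.
    ln -sfn "$src" "${dest_dir}/${label}.plist"
  done
}
```

An invocation from the repo root could be link_agents "$PWD/infra/launchd" "$HOME/Library/LaunchAgents" com.joel.gateway com.joel.typesense-portforward com.joelclaw.agent-mail. Replacing the hand-edited copy with a link is what removes the drift: later edits have to go through the repo file.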

Follow-up

Make CLI launchd management domain-aware (gui/$UID vs user/$UID) for gateway, worker, Talon, and secrets surfaces.
Add a deterministic smoke test for the headless bridge install path.
Extend the daily steering repair work to agent-mail search reliability, which was independently degraded during the same recovery window.
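
As a starting point for the first follow-up item, a domain-aware helper could resolve the owning domain at call time instead of hard-coding gui/$UID. The names resolve_domain and restart_service are hypothetical, and the sketch assumes that a successful launchctl print gui/<uid> is a good enough signal that the Aqua session exists.

```shell
#!/bin/sh
# Sketch only: not an existing CLI surface in the repo.

# Prints the launchd domain that should manage a user's services:
# gui/<uid> when an Aqua session exists, otherwise user/<uid>.
resolve_domain() {
  local uid="$1"
  if launchctl print "gui/${uid}" >/dev/null 2>&1; then
    printf 'gui/%s\n' "$uid"
  else
    printf 'user/%s\n' "$uid"
  fi
}

# Restarts a service in whichever domain currently owns it.
# kickstart -k stops a running instance before starting it again.
restart_service() {
  local uid="$1" label="$2" domain
  domain="$(resolve_domain "$uid")"
  launchctl kickstart -k "${domain}/${label}"
}
```

With this shape, a command such as restart_service "$(id -u)" com.joel.gateway behaves the same in a GUI session and on a headless boot, which is the property the follow-up asks for.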