Headless user-domain boot bridge for critical launchd services
> Superseded on 2026-04-12 by ADR-0240 — Boot-safe LaunchDaemons for critical host services.
>
> Why: the installed bridge never earned the reboot path on Panda. The system daemon ran, but launchctl bootstrap user/501 <plist> kept failing with Input/output error, so the bridge could not reliably restore the critical services after headless boot.
Context
A post-reboot failure left Panda in a headless state: the machine was up, the user/$UID launchd domain existed, but the Aqua gui/$UID domain did not. Critical services that had only been managed as GUI LaunchAgents never came back:
- com.joel.colima
- com.joel.k8s-reboot-heal
- com.joel.agent-secrets
- com.joel.system-bus-worker
- com.joel.gateway
- com.joel.typesense-portforward
- com.joelclaw.agent-mail
The recovery required manual nohup starts just to get the system back on its feet. That is not earned infrastructure.
The reboot also exposed two configuration drifts:
- the repo-tracked Colima launchd asset still declared 4 CPU / 8 GiB / 60 GiB instead of the stable 8 / 16 / 100 profile.
- several important launchd assets (gateway, typesense-portforward, agent-mail) still lived as hand-edited files under ~/Library/LaunchAgents instead of repo-tracked sources in infra/launchd/.
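The second kind of drift can be detected mechanically. The following is a minimal sketch, assuming each installed agent is supposed to be a symlink to a same-named plist in the repo's launchd directory. The function name check_drift and its arguments are illustrative and are not part of the repo.

```shell
#!/bin/sh
# Hypothetical helper: flag launch agents whose installed plist is not a
# symlink to the repo-tracked source.
# Usage: check_drift <agents_dir> <absolute_repo_launchd_dir> <label>...
check_drift() {
  agents_dir=$1
  repo_dir=$2
  shift 2
  drift_status=0
  for label in "$@"; do
    installed="$agents_dir/$label.plist"
    expected="$repo_dir/$label.plist"
    if [ -L "$installed" ] && [ "$(readlink "$installed")" = "$expected" ]; then
      echo "ok $label"
    else
      # Regular file, missing file, or a symlink that points elsewhere.
      echo "drift $label"
      drift_status=1
    fi
  done
  return "$drift_status"
}
```

Because the comparison is a plain string match on the link target, the repo directory has to be passed as the same absolute path the installer used when creating the symlinks.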
Decision
Adopt a headless boot bridge:
- Canonical launchd assets live in the repo under infra/launchd/, including newly tracked plists for:
  - com.joel.gateway
  - com.joel.typesense-portforward
  - com.joelclaw.agent-mail
- Add a system LaunchDaemon asset: infra/launchd/com.joel.headless-bootstrap.plist.
- That LaunchDaemon runs infra/headless-bootstrap.sh as root on boot and every 60 seconds.
- The script detects whether gui/$UID exists:
  - if GUI is absent: bootstrap the critical repo-managed launch agents into user/$UID
  - if GUI is present again: boot out the temporary user/$UID copies so normal GUI ownership can resume without duplicate processes
- Add infra/install-headless-bootstrap.sh as the canonical installer:
  - symlink critical user launch agents from ~/Library/LaunchAgents/ back to repo sources
  - install the system LaunchDaemon to /Library/LaunchDaemons/
  - bootstrap and kickstart the system bridge
- Correct the repo-tracked Colima launchd asset to the stable runtime profile: 8 CPU / 16 GiB / 100 GiB.
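The per-run decision the script makes can be sketched roughly as below. This is an illustration of the intended logic, not the contents of infra/headless-bootstrap.sh. It assumes launchctl print exits non-zero for a domain or service that is not loaded, and the names LAUNCHCTL, TARGET_UID, AGENT_DIR, and bridge_once are hypothetical. Per the supersession note at the top, the bootstrap call into user/501 is the step that kept failing on Panda.

```shell
#!/bin/sh
# Sketch only. LAUNCHCTL, TARGET_UID and AGENT_DIR are assumed knobs, not
# names taken from the real script.
LAUNCHCTL="${LAUNCHCTL:-launchctl}"
TARGET_UID="${TARGET_UID:-501}"
AGENT_DIR="${AGENT_DIR:-infra/launchd}"

CRITICAL_LABELS="com.joel.colima com.joel.k8s-reboot-heal com.joel.agent-secrets
com.joel.system-bus-worker com.joel.gateway com.joel.typesense-portforward
com.joelclaw.agent-mail"

# True when launchd knows the given domain or service target.
target_exists() {
  "$LAUNCHCTL" print "$1" >/dev/null 2>&1
}

# One pass of the bridge: bootstrap into user/<uid> while the GUI domain is
# absent, boot the temporary copies out again once it is back.
bridge_once() {
  for label in $CRITICAL_LABELS; do
    if target_exists "gui/$TARGET_UID"; then
      if target_exists "user/$TARGET_UID/$label"; then
        "$LAUNCHCTL" bootout "user/$TARGET_UID/$label"
      fi
    elif ! target_exists "user/$TARGET_UID/$label"; then
      "$LAUNCHCTL" bootstrap "user/$TARGET_UID" "$AGENT_DIR/$label.plist"
    fi
  done
}
```

Checking whether a label is already loaded before acting keeps the 60-second loop idempotent: repeated runs in the same state issue no further launchctl mutations.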
Why this
- Survives headless reboots — boot no longer depends on Aqua login just to restore the core control plane.
- No shadow plist drift — launchd assets become git-tracked truth, not hand-edited local snowflakes.
- Minimal change to service code — keep existing launchd-managed services; add a domain bridge instead of rewriting every runtime.
- Clean handoff when GUI returns — the bridge is temporary ownership, not a permanent duplicate runtime.
Consequences
Positive
- Core services recover automatically after reboot even when no GUI session exists.
- Colima boot no longer regresses to the stale undersized profile from the repo asset.
- Gateway, agent-mail, and Typesense port-forward become canonical repo-managed launchd assets.
Negative
- The bridge installer requires root once (sudo infra/install-headless-bootstrap.sh).
- Critical services now have a cross-domain handoff path that must stay documented and tested.
- CLI/ops commands that assume gui/$UID still need gradual cleanup to become fully domain-aware.
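For reference, the one-time root install amounts to three steps: symlink the agents, install the daemon plist, then bootstrap and kickstart it. A minimal sketch follows, assuming the daemon plist is copied rather than symlinked. The function install_bridge, its arguments, and the LAUNCHCTL override are hypothetical and do not reflect the actual installer.

```shell
#!/bin/sh
# Sketch only: install_bridge and its parameters are illustrative.
LAUNCHCTL="${LAUNCHCTL:-launchctl}"
DAEMON_LABEL="com.joel.headless-bootstrap"

# Usage: install_bridge <repo_root> <agents_dir> <daemon_dir> <label>...
install_bridge() {
  repo_root=$1    # absolute path to the repo checkout
  agents_dir=$2   # the user's ~/Library/LaunchAgents
  daemon_dir=$3   # /Library/LaunchDaemons on a real host
  shift 3

  # 1. Point each critical agent back at its repo-tracked source,
  #    replacing any hand-edited file that is in the way.
  for label in "$@"; do
    ln -sfn "$repo_root/infra/launchd/$label.plist" "$agents_dir/$label.plist"
  done

  # 2. Install the system LaunchDaemon (copy is an assumption; under sudo
  #    the copy ends up owned by root).
  cp "$repo_root/infra/launchd/$DAEMON_LABEL.plist" "$daemon_dir/$DAEMON_LABEL.plist"
  chmod 644 "$daemon_dir/$DAEMON_LABEL.plist"

  # 3. Load the daemon into the system domain and start it immediately.
  "$LAUNCHCTL" bootstrap system "$daemon_dir/$DAEMON_LABEL.plist"
  "$LAUNCHCTL" kickstart -k "system/$DAEMON_LABEL"
}
```

A re-runnable installer would likely need to boot out an already-loaded daemon before step 3, since bootstrapping a label that is already loaded fails.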
Implementation notes
Repo assets landed in the reboot-hardening session:
- infra/launchd/com.joel.gateway.plist
- infra/launchd/com.joel.typesense-portforward.plist
- infra/launchd/com.joelclaw.agent-mail.plist
- infra/launchd/com.joel.headless-bootstrap.plist
- infra/headless-bootstrap.sh
- infra/install-headless-bootstrap.sh
- infra/launchd/com.joel.colima.plist updated to 8 / 16 / 100
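The headless-bootstrap LaunchDaemon is described above as running on boot and every 60 seconds, which maps onto the standard launchd keys RunAtLoad and StartInterval. The sketch below renders a plist of that shape. It is not the tracked asset: the generator render_bridge_plist, the /bin/sh interpreter, and the script path argument are assumptions.

```shell
#!/bin/sh
# Hypothetical generator for a plist shaped like the bridge daemon.
# Usage: render_bridge_plist <absolute_path_to_headless-bootstrap.sh>
render_bridge_plist() {
  script_path=$1
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.joel.headless-bootstrap</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/sh</string>
    <string>$script_path</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>StartInterval</key>
  <integer>60</integer>
</dict>
</plist>
EOF
}
```

A plist loaded from /Library/LaunchDaemons/ runs in the system domain as root unless a UserName key says otherwise, which matches the "as root" requirement without extra configuration.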
Follow-up
- Make CLI launchd management domain-aware (gui/$UID vs user/$UID) for gateway, worker, Talon, and secrets surfaces.
- Add a deterministic smoke test for the headless bridge install path.
- Extend the daily steering repair work to agent-mail search reliability, which was independently degraded during the same recovery window.
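Domain-aware management could start from a small resolver that finds which domain currently owns a label instead of hard-coding gui/$UID. A minimal sketch, assuming launchctl print exits non-zero for a service target that is not loaded; resolve_service_target and the LAUNCHCTL override are hypothetical names.

```shell
#!/bin/sh
# Sketch only: resolve_service_target is an illustrative helper.
LAUNCHCTL="${LAUNCHCTL:-launchctl}"

# Print the service target (domain/label) that currently has the label
# loaded, preferring gui/<uid> over user/<uid>. Fails if neither does.
# Usage: resolve_service_target <uid> <label>
resolve_service_target() {
  uid=$1
  label=$2
  for domain in "gui/$uid" "user/$uid"; do
    if "$LAUNCHCTL" print "$domain/$label" >/dev/null 2>&1; then
      echo "$domain/$label"
      return 0
    fi
  done
  return 1
}
```

An ops command could then run launchctl kickstart -k "$(resolve_service_target "$(id -u)" com.joel.gateway)" and behave the same in headless and GUI sessions. Because the launchctl binary is injectable, the same helper can be exercised with a stub in a deterministic smoke test.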