ADR-0089shipped
Single-Source Inngest Worker Deployment (Retire Dual-Clone Sync)
Status
implemented
Context
Inngest function code currently exists in two working copies:
- Monorepo source:
~/Code/joelhooks/joelclaw/packages/system-bus/src - Runtime clone:
~/Code/system-bus-worker/packages/system-bus/src
This creates operational drag and drift risk:
- Manual/automated rsync copy flow is required (
joelclaw inngest sync-worker) - Agent loop completion includes special clone sync/restart logic
- Startup scripts perform
git pull+bun installat runtime - Debugging requires checking which copy is actually live
The local cluster already runs Inngest server in connect mode and is the intended long-term runtime target for durable workflows.
Drift Evidence (validated 2026-02-21)
- Gateway launch path is monorepo-based:
~/.joelclaw/scripts/gateway-start.shpoints at~/Code/joelhooks/joelclaw/packages/gateway/src/daemon.ts. - Worker launch path is clone-based:
~/Library/LaunchAgents/com.joel.system-bus-worker.plistruns~/Code/system-bus-worker/packages/system-bus/start.sh. - Worker startup is mutable at runtime:
start.shexecutesgit pull --ff-onlyandbun installon every restart. - Live logs showed drift symptoms: repeated worker import failures for
../../observability/emitduring restart cycles while gateway/runtime behavior remained healthy enough to drain events. - Result: deploy state is not derivable from a single git SHA in the monorepo.
Decision
Adopt a single source of truth for Inngest worker code in the joelclaw monorepo and deploy workers as immutable k8s workloads built from that source.
What changes
packages/system-businjoelclawbecomes the only authoritative source for worker code.- The worker runtime moves from launchd + separate clone to k8s Deployment(s).
- The rsync-based dual-copy workflow is removed.
- Host-coupled functions are split explicitly from cluster-safe functions during migration.
- Runtime
git pull/bun installmutation is removed from worker boot paths.
Runtime model
- Primary worker (k8s): runs cluster-safe Inngest functions.
- Host worker (temporary, optional): runs only functions that require host-only capabilities (local filesystem paths under
/Users/joel, launchd interaction, host-only binaries/secrets flow). - Both register against the same Inngest server in connect mode.
Consequences
Positive
- One place to edit functions; no copy/sync tax.
- Reproducible deploys via image tags instead of mutable local state.
- Fewer hidden failure modes from stale clones.
- Cleaner path to CI/CD and rollback.
Negative
- Requires migration work to classify/split host-coupled functions.
- Secret management must be formalized for k8s worker pods.
- Short-term complexity while running dual workers during transition.
Risks
- Moving all functions too quickly could break host-dependent behavior.
- If function ownership boundaries are unclear, duplicate registration may occur.
Non-Goals
- No full rewrite of existing Inngest functions.
- No immediate elimination of all host-executed workflows on day one.
- No dependency on publishing NPM packages from k8s.
Implementation Plan
Phase 0: Stop further drift (immediate)
- Mark
~/Code/system-bus-workeras transitional runtime-only and block new feature edits there. - Add explicit warning in
packages/system-bus/start.shand related ops docs that dual-clone runtime is deprecated. - Record current launch points in docs:
~/Library/LaunchAgents/com.joel.system-bus-worker.plist~/.joelclaw/scripts/gateway-start.shpackages/system-bus/start.sh
Phase 1: Define ownership boundaries
- Classify every function in
packages/system-bus/src/inngest/functions/as:cluster-safehost-required
- Add explicit function group exports:
packages/system-bus/src/inngest/functions/index.cluster.tspackages/system-bus/src/inngest/functions/index.host.ts
- Ensure no function appears in both groups.
Phase 2: Containerize system-bus worker
- Add a production Dockerfile for
packages/system-buswith deterministic install/build. - Add k8s manifests for
system-bus-workerDeployment + Service ink8s/. - Move required env/secrets from launchd/start script assumptions into k8s Secret/config wiring.
Phase 3: Route registrations by worker role
- Update
packages/system-bus/src/serve.tsto register function lists by worker role env:
WORKER_ROLE=cluster-> registerclusterfunctions onlyWORKER_ROLE=host-> registerhostfunctions only
- Keep connect-mode registration behavior compatible with current Inngest server deployment.
Phase 4: Remove dual-clone coupling
- Remove rsync-based sync command from
packages/cli/src/commands/inngest.ts:
- delete
sync-workersubcommand - delete hardcoded source/target clone paths
- Remove worker-clone sync logic from
packages/system-bus/src/inngest/functions/agent-loop/complete.ts. - Deprecate and then remove runtime
git pullbehavior from worker startup scripts.
Phase 5: Operability hardening
- Add health/readiness checks for worker Deployment.
- Add a quick diagnostics command showing:
- registered function count by worker role
- last registration timestamp
- duplicate function-id detection result
- Document rollout and rollback playbook in k8s/inngest docs.
Verification
- New/updated function code is edited in exactly one repo path (
joelclawmonorepo) before deployment. -
joelclaw inngest status(or equivalent) reports healthy worker registration without rsync sync step. - No code path references
~/Code/system-bus-workerafter migration completion. - Worker pods in k8s pass readiness and execute cluster-safe functions successfully.
- Host-required functions continue to execute during transition without regressions.
- Duplicate function IDs across worker roles are detected and blocked.
- Gateway and worker are both running code derived from the same monorepo git SHA at deploy time.
- A synthetic gateway event (
joelclaw gateway test/system.fatalprobe) lands inotel_eventswithsource=gatewaywithin the expected ingestion window.
Implementation Outcome (2026-02-21)
Shipped now
- Worker
launchdruntime now points at monorepopackages/system-bus/start.shinstead of~/Code/system-bus-worker. - Runtime startup mutation was removed from worker bootstrap (
git pull/bun installno longer run at startup). - Explicit worker role split was implemented:
packages/system-bus/src/inngest/functions/index.host.tspackages/system-bus/src/inngest/functions/index.cluster.tsWORKER_ROLErouting inpackages/system-bus/src/serve.ts
- Worker diagnostics surface was added and wired:
joelclaw inngest workers- role counts, duplicate function-id detection, last registration timestamp.
- Dual-clone coupling was removed from primary operator paths:
joelclaw inngest sync-workernow acts as a compatibility alias for restart/register only (no file copy)- removed worker-clone sync/restart step from
agent-loop/complete.ts.
- Live observability verification passed with this runtime:
- gateway events drained
source=gatewayevents written intootel_events- immediate fatal escalation path recorded (
events.immediate_telegram).
Live verification snapshot (2026-02-21 18:06 UTC)
joelclaw gateway testpushedtest.gateway-e2eevent id67c7130c-de7a-4756-b67d-d0be4f6e0bc4; gateway drained queue (joelclaw gateway eventsreturnedtotalCount: 0).joelclaw otel list --source gateway --hours 1 --limit 20confirmed active gateway ingestion withfound: 124and recent events including:redis-channel.events.triagedredis-channel.events.dispatchedcommand-queue.queue.enqueued
- Fatal escalation probe:
joelclaw gateway push --type system.fatal ...pushed event ided718c28-5475-439c-9865-51de1b55480f.joelclaw otel search "events.immediate_telegram" --source gateway --hours 1 --limit 5returnedfound: 2including fresh event idad68ee74-98f5-4568-a620-88e59a87d32e.
- Canonical worker write-path probe:
POST http://localhost:3111/observability/emitwith fatal event idf2018c37-ca71-4393-a522-c9dc34c81ccareturnedtypesense.written=trueandconvex.written=true.joelclaw otel list --source verification --hours 1 --limit 5returned the same fatal event with actionprobe.convex.mirror.- Direct Convex query to
contentResources.listByType(type=\"otel_event\")returnedresourceId: otel:f2018c37-ca71-4393-a522-c9dc34c81ccawithlevel: fatal.
Remaining for full completion
- Remove all residual references to
~/Code/system-bus-worker. - Expand cluster-safe ownership beyond the initial activation set and retire host role over time.
- Complete k8s worker deployment + readiness rollout/rollback runbook.
Cluster activation update (2026-02-21)
- GHCR image published and deployed:
ghcr.io/joelhooks/system-bus-worker:20260221-110606. - K8s deployment manifest now uses:
imagePullSecrets: ghcr-pullWORKER_ROLE=clusterINNGEST_APP_ID=system-bus-cluster(role-specific app identity)INNGEST_SERVE_HOST=""(connect-mode default)
- Publish/deploy flow is codified in:
k8s/publish-system-bus-worker.sh(build + push GHCR + apply + rollout wait)
- Live pod verification:
- Running pod:
system-bus-worker-5f6ffd6999-ggxkp - Health endpoint from inside pod:
status=200,count=9 - Worker role counts:
host=55,cluster=9,active=9 - Duplicate function IDs: none
- Running pod:
- Initial cluster-safe function set activated:
approvalRequest,approvalResolvetodoistCommentAdded,todoistTaskCompleted,todoistTaskCreatedfrontMessageReceived,frontMessageSent,frontAssigneeChangedtodoistMemoryReviewBridge
- Cluster boot logs no longer attempt local
secrets lease:"[secrets] skipping local webhook secret leasing in cluster worker role"
Enforcement update (2026-02-22)
- Added hard runtime guard in
packages/system-bus/start.sh:- worker startup exits non-zero when launched from legacy clone path
~/Code/system-bus-worker.
- worker startup exits non-zero when launched from legacy clone path
- Added explicit source-verification surface in CLI:
joelclaw inngest source [--repair]inspects launchd binding and can re-install the monorepo plist.
- Added automatic source enforcement on worker lifecycle commands:
joelclaw inngest restart-workerandjoelclaw inngest sync-workernow enforce ADR-0089 launchd binding before restart/register.
- Added runtime source telemetry:
- worker health payload now includes
runtime.cwd,runtime.deploymentModel, andruntime.legacyCloneDetected.
- worker health payload now includes
- Swept legacy clone wording from non-ADR historical artifacts:
- marked
packages/system-bus/prd.jsonandpackages/system-bus/prd-harden.jsonas[historical]and removedsync-worker-clonereferences.
- marked
- Added CI policy guard for future drift (shared validators workflow):
scripts/validate-no-legacy-worker-clone.ts.github/workflows/agent-contracts.yml- fails when
Code/system-bus-workerappears outside ADR history and runtime guard allowlist.
Migration Trigger to Revisit
Revisit this ADR when host-required functions drop to zero; at that point, remove the host worker and run a single k8s worker role.