ADR-0185 (shipped)

Session-Scoped Webhook Subscriptions with NDJSON Streaming

  • Status: shipped
  • Date: 2026-03-01
  • Deciders: Joel, Panda
  • Relates to: ADR-0035, ADR-0048, ADR-0058, ADR-0103, ADR-0123, ADR-0169

Context

joelclaw already had a webhook gateway (ADR-0048) that normalized inbound provider payloads and emitted Inngest events.

What was missing:

  1. No joelclaw webhook ... command surface for operator/session-level webhook workflows.
  2. No first-class subscription model binding webhook patterns to a specific pi session.
  3. No session-targeted bridge for webhook matches to wake a session immediately.
  4. NDJSON stream behavior existed for gateway stream, but not for webhook-specific subscriptions.

Result: webhooks could arrive, but we could not express: “subscribe this session to workflow completions, then act when artifacts arrive.”

Decision

1) Add joelclaw webhook command tree

joelclaw webhook
├── subscribe <provider> <event>
│     [--repo <owner/repo>] [--workflow <name>] [--branch <name>] [--conclusion <status>]
│     [--session <session-id>] [--ttl <duration>] [--stream]
├── unsubscribe <subscription-id>
├── list [--provider <provider>] [--session <session-id>]
└── stream <subscription-id> [--timeout <seconds>] [--replay <count>]

Rules:

  • Non-stream commands return standard HATEOAS envelopes (ok, command, result, next_actions).
  • --stream emits NDJSON lines and ends with terminal result or error (ADR-0058 contract).
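
As a rough illustration of the stream rule above, a consumer could split the output on newlines and stop at the first terminal line. This is a minimal sketch: the `type` discriminator and the `StreamLine` shape are assumptions made for the example, not the ADR-0058 wire format.

```typescript
// Hypothetical line shape: the ADR only states that a stream ends with a
// terminal result or error. The `type` field is an assumption.
type StreamLine =
  | { type: "event"; data: unknown }
  | { type: "result"; data: unknown }
  | { type: "error"; message: string };

// Split a chunk of NDJSON text into parsed lines, skipping blank lines.
function parseNdjson(chunk: string): StreamLine[] {
  return chunk
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as StreamLine);
}

// A stream is finished once a result or error line has been seen.
function isTerminal(line: StreamLine): boolean {
  return line.type === "result" || line.type === "error";
}
```

A reader loop would call `parseNdjson` on each buffered chunk and close the subprocess or socket as soon as `isTerminal` returns true.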

2) Introduce Redis-backed webhook subscription registry

Canonical keys:

  • joelclaw:webhook:subscriptions (hash: id -> json)
  • joelclaw:webhook:index:<provider>:<event> (set of subscription IDs)
  • joelclaw:webhook:events:<subscription-id> (replay list)
  • joelclaw:webhook:notify:<subscription-id> (pub/sub channel)
  • joelclaw:webhook:dedup:<subscription-id>:<delivery-key> (idempotency)
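
The key layout above can be kept in one place with small builder functions, so callers never concatenate key strings by hand. The `keys` object and its method names are hypothetical; only the key patterns come from this ADR.

```typescript
const PREFIX = "joelclaw:webhook";

// Hypothetical helper: one builder per canonical key pattern listed above.
const keys = {
  // hash: subscription id -> JSON document
  subscriptions: `${PREFIX}:subscriptions`,
  // set of subscription ids for one provider/event pair
  index: (provider: string, event: string): string =>
    `${PREFIX}:index:${provider}:${event}`,
  // replay list for one subscription
  events: (subscriptionId: string): string =>
    `${PREFIX}:events:${subscriptionId}`,
  // pub/sub channel for one subscription
  notify: (subscriptionId: string): string =>
    `${PREFIX}:notify:${subscriptionId}`,
  // idempotency marker for one delivery to one subscription
  dedup: (subscriptionId: string, deliveryKey: string): string =>
    `${PREFIX}:dedup:${subscriptionId}:${deliveryKey}`,
};
```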

Subscription document:

{
  "id": "whs_...",
  "provider": "github",
  "event": "workflow_run.completed",
  "filters": {
    "repo": "joelhooks/joelclaw",
    "workflow": "CI",
    "branch": "main",
    "conclusion": "success"
  },
  "sessionId": "gateway",
  "createdAt": "2026-03-01T...Z",
  "expiresAt": "2026-03-02T...Z",
  "active": true
}
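
One way to type the document above is sketched here, together with the liveness check that pruning would rely on. Which fields are optional is an assumption inferred from the optional CLI flags, and `isLive` is a hypothetical helper name.

```typescript
// Field names mirror the JSON document above; optionality is an assumption.
interface WebhookSubscription {
  id: string;
  provider: string;
  event: string;
  filters: {
    repo?: string;
    workflow?: string;
    branch?: string;
    conclusion?: string;
  };
  sessionId?: string;
  createdAt: string; // ISO 8601 timestamp
  expiresAt: string; // ISO 8601 timestamp
  active: boolean;
}

// A subscription is live when it is active and its expiry is a valid
// timestamp that is still in the future. Anything else is a prune candidate.
function isLive(sub: WebhookSubscription, now: Date = new Date()): boolean {
  const expires = Date.parse(sub.expiresAt);
  return sub.active && Number.isFinite(expires) && expires > now.getTime();
}
```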

3) Add dispatch function for matched subscriptions

A dedicated Inngest function fans out normalized webhook events to matching subscriptions.

First shipped slice:

  • trigger: github/workflow_run.completed
  • function id: webhook-subscription-dispatch-github-workflow-run-completed

Behavior:

  • load candidate subscriptions by indexed provider/event key
  • apply deterministic filter matching (repo, workflow, branch, conclusion)
  • prune invalid/expired subscriptions
  • dedupe per subscription delivery key
  • write matched payload to replay list and notify channel
  • when sessionId exists, push webhook.subscription.matched to gateway with originSession

4) Best-effort artifact enrichment for workflow completions

For github/workflow_run.completed:

  • call GET /repos/{owner}/{repo}/actions/runs/{run_id}/artifacts
  • include artifact metadata when available
  • if fetch fails, continue delivery and include artifactFetchError
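
A best-effort enrichment step along these lines could be sketched as follows. The function never throws: any failure is folded into an `artifactFetchError` field so delivery can continue. The `fetchArtifacts` name, the injected `fetchImpl` parameter, and the token argument are assumptions made for the example; the endpoint and the `artifacts` array in the response are GitHub's.

```typescript
// Subset of the fields GitHub returns per artifact.
interface ArtifactMeta {
  id: number;
  name: string;
  size_in_bytes: number;
}

type Enrichment = { artifacts: ArtifactMeta[] } | { artifactFetchError: string };

// Minimal fetch-like signature so the function can be tested without a network.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function fetchArtifacts(
  repo: string, // "owner/repo"
  runId: number,
  token: string,
  fetchImpl: FetchLike,
): Promise<Enrichment> {
  const url = `https://api.github.com/repos/${repo}/actions/runs/${runId}/artifacts`;
  try {
    const res = await fetchImpl(url, {
      headers: {
        Accept: "application/vnd.github+json",
        Authorization: `Bearer ${token}`,
      },
    });
    if (!res.ok) return { artifactFetchError: `GitHub responded ${res.status}` };
    const body = (await res.json()) as { artifacts?: ArtifactMeta[] };
    return { artifacts: body.artifacts ?? [] };
  } catch (err) {
    // Network or parse failure: report it as data instead of failing dispatch.
    return { artifactFetchError: err instanceof Error ? err.message : String(err) };
  }
}
```

The caller would merge the returned object into the matched payload, so consumers see either an artifact list or an explicit error field.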

5) Safety and durability constraints

  • idempotency key uses provider delivery id + run id + subscription id
  • default TTL is 24h unless overridden
  • expired subscriptions are pruned during list/match evaluation
  • no raw webhook payload execution
  • matching is pure data filtering
  • cluster worker dispatch path requires Redis runtime env (REDIS_HOST, REDIS_PORT)
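
The idempotency and TTL constraints above can be illustrated with a small sketch. The key format, the helper names, and the in-memory `DedupStore` are assumptions: the store stands in for the Redis dedup keys so the logic can be read and tested without a Redis instance.

```typescript
const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default from this ADR

// Delivery key built from provider delivery id + run id (format is assumed).
function deliveryKey(deliveryId: string, runId: number): string {
  return `${deliveryId}:${runId}`;
}

// Expiry timestamp for a new subscription, using the default TTL unless overridden.
function expiresAt(createdAt: Date, ttlMs: number = DEFAULT_TTL_MS): string {
  return new Date(createdAt.getTime() + ttlMs).toISOString();
}

// In-memory stand-in for the per-subscription Redis dedup keys.
class DedupStore {
  private seen = new Set<string>();

  // Returns true the first time a delivery is seen for a subscription,
  // false for every later duplicate.
  firstDelivery(subscriptionId: string, key: string): boolean {
    const full = `joelclaw:webhook:dedup:${subscriptionId}:${key}`;
    if (this.seen.has(full)) return false;
    this.seen.add(full);
    return true;
  }
}
```

Scoping the dedup key by subscription id means a redelivered webhook is dropped per subscription, while a second subscription matching the same event still receives it once.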

Consequences

Good

  • External systems can drive autonomous session behavior directly.
  • Webhook workflows are inspectable and scriptable via CLI.
  • NDJSON stream works for both humans and agents.
  • Clear path to add more providers/events.

Tradeoffs

  • Adds another subscription subsystem (alongside feed subscriptions).
  • Session wake routing depends on Redis/gateway health.
  • Artifact enrichment introduces an external API failure path (handled as a soft-error field).

Out of Scope

  • Web UI for subscriptions
  • expression/regex filter language
  • provider-specific action executors beyond wake-and-context handoff

Implementation (Shipped)

CLI

  • packages/cli/src/commands/webhook.ts (new)
  • packages/cli/src/commands/webhook.test.ts (new)
  • packages/cli/src/cli.ts (register root command)
  • packages/cli/src/commands/gateway.ts (safe stream disconnect fix)
  • packages/cli/src/commands/send.ts (safe stream disconnect fix)
  • packages/cli/src/commands/watch.ts (safe stream disconnect fix)

System bus / webhook pipeline

  • packages/system-bus/src/lib/webhook-subscriptions.ts (new)
  • packages/system-bus/src/lib/webhook-subscriptions.test.ts (new)
  • packages/system-bus/src/inngest/functions/webhook-subscription-dispatch.ts (new)
  • packages/system-bus/src/inngest/functions/index.ts (register)
  • packages/system-bus/src/inngest/functions/index.cluster.ts (register)
  • packages/system-bus/src/inngest/client.ts (typed event schema)
  • packages/system-bus/src/webhooks/providers/github.ts (deliveryId normalization)

Docs

  • docs/cli.md
  • docs/inngest-functions.md
  • docs/webhooks.md

Pi extension integration (shipped)

  • packages/pi-extensions/inngest-monitor/index.ts
    • new tools: webhook_subscribe, webhook_monitors
    • non-blocking monitor lifecycle + compact status widget
  • packages/pi-extensions/inngest-monitor/run-tracker.ts
    • extracted Inngest run polling lifecycle
  • packages/pi-extensions/inngest-monitor/webhook-tracker.ts
    • extracted webhook stream monitor lifecycle + reconnect behavior
  • packages/pi-extensions/sdk/joelclaw-cli-sdk.ts (new)
    • typed JoelclawCliSdk interface for pi extensions
    • standardized envelope parsing and error taxonomy (JoelclawCliError)
    • best-effort OTEL emission for CLI command start/finish/failure

Infra/runtime hardening discovered during E2E

  • k8s/system-bus-worker.yaml: set
    • REDIS_HOST=redis
    • REDIS_PORT=6379

Without these variables, the dispatch run failed with ioredis MaxRetriesPerRequestError.
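
One possible guard against this class of failure is to validate the Redis environment at worker startup, so a missing variable fails fast with a clear message instead of surfacing later as a retry error. This is a sketch only; `redisConfigFromEnv` is a hypothetical helper, not part of the shipped worker.

```typescript
interface RedisConfig {
  host: string;
  port: number;
}

// Reads REDIS_HOST and REDIS_PORT from an env-like object and throws
// immediately when either is missing or malformed.
function redisConfigFromEnv(env: Record<string, string | undefined>): RedisConfig {
  const host = env.REDIS_HOST;
  const port = Number(env.REDIS_PORT);
  if (!host) throw new Error("REDIS_HOST is not set");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error("REDIS_PORT is not a valid port");
  }
  return { host, port };
}
```

In a Node worker this would be called once with `process.env` before any Redis client is created.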

Verification

  • joelclaw webhook subscribe github workflow_run.completed --repo joelhooks/joelclaw --stream --timeout 5 emits NDJSON and exits cleanly.
  • Synthetic github/workflow_run.completed event triggers webhook.subscription.matched for matching subscriptions only.
  • Matched payload includes artifact list or explicit artifactFetchError.
  • Session-targeted subscription triggers gateway event (webhook.subscription.matched) with immediate routing.
  • Duplicate delivery for same subscription is ignored (idempotent behavior).
  • Expired subscriptions are not matched and are removed during dispatch/list pruning.

Evidence (run IDs):

  • 01KJN5BHFHDCNBQWY6H8EAJJD5 — dispatch completed, matchedSubscriptions=1, notifiedSessions=1
  • 01KJN5SY6BW8NNBHN0XCT59E3A — first delivery (duplicates=0)
  • 01KJN5SZ5R78FRK79Q0C6FZPKG — duplicate delivery (duplicates=1, notifiedSessions=0)
  • 01KJN5VE79J91C26F9K6MNWCWW — expired/no-match noop (matchedSubscriptions=0)

Vector Clock (Execution Order)

  1. CLI command surface + NDJSON stream wiring
  2. Redis registry + matcher + replay/notify channels
  3. Dispatch function + function registration
  4. GitHub artifact enrichment + error-tolerant payload path
  5. Session wake routing (originSession)
  6. Stream disconnect hardening in existing commands
  7. E2E validation + runtime Redis env fix in cluster worker
  8. Pi extension integration with non-blocking webhook monitor + CLI SDK abstraction

Notes

The initial implementation was intentionally scoped to GitHub workflow completions. The contract is now stable and shipped; provider expansion can follow this pattern without changing CLI semantics. The Pi extension integration now uses a typed joelclaw CLI SDK boundary, so additional extensions can reuse the same error handling and OTEL instrumentation instead of re-implementing subprocess plumbing.