ADR-0140shipped

One place to pick and trace LLM models

2026-02-25T00:00:00.000Z

Context and Problem Statement

joelclaw currently routes LLM calls through multiple incompatible paths:

pi -p --no-session --no-extensions calls in packages/system-bus/src/inngest/functions/*.
The shared infer() utility in packages/system-bus/src/lib/inference.ts.
Gateway session-based inference with its own fallback controller in packages/gateway/src/model-fallback.ts and packages/gateway/src/daemon.ts.
Direct inference providers in CLI and workers (for example packages/cli/src/commands/recall.ts and packages/system-bus/src/inngest/functions/transcript-process.ts).
step.ai.infer in packages/system-bus/src/inngest/functions/daily-digest.ts, which uses direct provider model selection (claude-3-5-sonnet-latest) disconnected from current model allowlists.

This fragmentation causes:

Model policy drift across components (packages/gateway/src/commands/config.ts vs packages/system-bus/src/lib/models.ts vs ad hoc env-configured defaults).
Cost opacity (no single routing policy, incomplete model attribution across all execution planes).
Observability gaps for a subset of inference calls (partial Langfuse + partial OTEL instrumentation).
Reduced model flexibility (every caller tends to re-harden policy in-place).

The goal is a single system-level inference router that normalizes model selection, supports rich fallback/policy logic, and emits uniform telemetry while preserving the same inference surface from CLI, gateway, and Inngest worker functions.

Decision Drivers

We already have successful fallback behavior in gateway (ModelFallbackController) and should keep its failure semantics while centralizing the policy and decision logic.
Cost discipline requires a consistent model tiering strategy with auditable spend attribution, not per-call ad hoc policy.
Langfuse is already accepted as the LLM observability plane, and OTEL remains the canonical event fabric for system-wide triage.
Maximum model flexibility is needed because callers change by task (triage, summarization, vision, classification, batch workflows) and provider ecosystems evolve quickly.
The existing router implementation from claw-llm-router provides a proven local classification + tier fallback pattern we can adapt to joelclaw’s architecture.

Considered Options

Option A: Keep current paths and add best-effort wrappers per caller
- Keep current inference implementations and add wrapper helpers around all call sites.
- Pros: minimal immediate refactor.
- Cons: never reaches single source of truth; long-term debt remains and policy remains distributed.
Option B: Force everything through gateway session inference
- Route all inference through gateway and retire system-bus/CLI-specific calls.
- Pros: quick to standardize at transport level.
- Cons: introduces avoidable coupling, recursion risks, and weaker fit for non-interactive/background Inngest jobs that do not need sessions.
Option C: Build/introduce a joelclaw inference-router service used by gateway, system-bus, and CLI
- Add a small, policy-first inference control plane with explicit model registry, classifiers, fallback policy, and provider adapters.
- Pros: centralized policy, highest flexibility, and uniform OTEL + Langfuse wiring.
- Cons: larger migration effort and requires strict rollout guardrails.

Decision Outcome

Chosen option: Option C, because it preserves task-specific optimization and allows joelclaw to keep pi where useful while centralizing policy, validation, cost metadata, and fallback governance across all AI consumers.

Consequences

Good, because all inference callsites will consume the same policy contract (InferRequest) and no longer duplicate routing assumptions.
Good, because router policy can be adjusted without editing unrelated functions (just update registry + policy version).
Good, because every call can emit:
- OTEL semantic events for operational triage.
- Langfuse traces for token/cost/quality review.
Bad, because this is a migration-heavy change touching gateway, system-bus, and CLI paths with temporary dual-stack risk.
Bad, because we need robust policy governance to avoid introducing centralized misconfiguration as a new blast radius.

Scope

In scope:
- Canonical model registry and aliases (anthropic/claude-*, openai/openai-codex, and provider models).
- Central inference router service API and shared client library.
- Migration of infer, recall, daily-digest, and high-frequency Inngest function inference callsites.
- Unified cost/event telemetry + policy observability.
- Policy-driven fallback and retry strategy consistent across gateway, system-bus, and CLI.
Out of scope:
- Rewriting all domain logic and model prompts.
- Changing gateway transport mechanics unrelated to LLM invocation.
- Introducing new vector/embedding infrastructure in this ADR (may happen later under a separate ADR).

Deep-dive findings before decision lock

Source-of-truth drift discovered

Model allowlists are split and inconsistent:
- packages/gateway/src/commands/config.ts
- packages/system-bus/src/lib/models.ts
- inline hard-coded models in packages/cli/src/commands/recall.ts and packages/system-bus/src/cli helpers.
Direct and indirect inference calls are mixed:
- packages/cli/src/commands/recall.ts (direct ANTHROPIC_MESSAGES_URL and OPENAI_CHAT_COMPLETIONS_URL).
- packages/system-bus/src/inngest/functions/transcript-process.ts (direct Anthropic).
- direct pi -p --no-session --no-extensions in x-content-hook, x-discovery-hook, discovery-capture, summarize.
- step.ai.infer in daily-digest.ts with model value outside current allowlist conventions.

Git history consulted

packages/system-bus/src/lib/inference.ts: migrations to shared pi utility (4123d9d, 2c8a997).
packages/gateway/src/model-fallback.ts: fallback model controller and event surface are already foundational.
packages/system-bus/src/lib/langfuse.ts and packages/system-bus/src/lib/models.ts are recent but still partial relative to total call surface.

External architecture reference

claw-llm-router already demonstrates:
- rule-based tier classification,
- provider matrix + override and fallback chain,
- adapter boundaries for direct provider calls vs gateway fallback,
- recursion protection for provider/model override.

Implementation Plan

Phase 1 — Canonical policy and contract (foundational)

Add new package module:
- packages/inference-router/ (or equivalent joelclaw-owned package).
- Core contract:
  - schema.ts with InferencePolicy, ModelRef, ProviderRef, RouteAttempt, InferenceEvent.
  - catalog.ts with canonical model IDs and compatibility aliases (including legacy aliases such as anthropic/claude-haiku -> anthropic/claude-haiku-4-5 where safe).
- Runtime config source:
  - file-based default policy under repo config and Redis override for live updates.
- Add deterministic event names:
  - model_router.request, model_router.route, model_router.fallback, model_router.result, model_router.fail.
Add a migration note for policy versioning and deprecation windows.

Phase 2 — Central client API + telemetry parity

Update inference entrypoints to consume the router:
- packages/system-bus/src/lib/inference.ts: re-implement using router contract instead of ad hoc model handling.
- packages/cli/src/commands/recall.ts: replace dual-provider direct calls with the same router client.
- packages/system-bus/src/inngest/functions/daily-digest.ts: replace step.ai.infer model path with central router request wrapper and explicit policy scope (digest, reasoning, speed).
Instrument in one place:
- OTEL span lifecycle:
  - inference.request, inference.classify, inference.fallback, inference.success, inference.failure.
- Langfuse per-call trace wrapper with:
  - policy version,
  - model/tier,
  - caller key (component.function_id),
  - estimated-cost metadata.
Preserve direct pi fallback behavior temporarily:
- where pi sessions are currently mandatory for existing auth flows, route via router provider adapters rather than bypassing policy.

Phase 3 — Gateway alignment

Replace direct model-id decisions in gateway command/config with policy-backed IDs.
- packages/gateway/src/commands/config.ts moves to canonical catalog names.
- packages/gateway/src/model-fallback.ts and packages/gateway/src/daemon.ts emit router events and consume the same routing telemetry fields.
Add a policy bridge:
- if request uses Anthropic OAuth-like auth or gateway-native session mode, router sets explicit provider override and avoids recursive routing loops.

Phase 4 — Inngest and background worker migration

Replace remaining direct calls and shell invocations:
- packages/system-bus/src/inngest/functions/transcript-process.ts
- packages/system-bus/src/inngest/functions/x-content-hook.ts
- packages/system-bus/src/inngest/functions/x-discovery-hook.ts
- packages/system-bus/src/inngest/functions/discovery-capture.ts
- packages/system-bus/src/inngest/functions/summarize.ts
- all other files from inventory that still use raw model strings (rg-discoverable hotspots in friction-fix, vip-email-received, observe, task-triage, content-sync, media-process, etc.).
Add optional Inngest guardrails:
- use inngest-events event names for policy rejections and degraded-mode escalation,
- use inngest-flow-control concurrency keys by policy (inference.<policy_id>) and failure classes (provider.*, model.*).

Phase 5 — Enforcement and rollout

Add a lint/lint-like check:
- block direct fetch to model provider endpoints in non-router modules (except gateway/provider adapters and temporary migration windows).
Add config validation at startup:
- unknown model IDs fail fast,
- ambiguous model aliases require explicit compatibility mapping.
Rollout by traffic class:
- low-risk background functions -> user-facing gateway flows -> recall/classification functions -> high-QPS ingestion functions.

Affected paths

packages/system-bus/src/lib/{inference.ts, models.ts, langfuse.ts}
packages/system-bus/src/inngest/functions/{channel-message-classify.ts,daily-digest.ts,discovery-capture.ts,x-content-hook.ts,x-discovery-hook.ts,summarize.ts,transcript-process.ts,observe.ts,sleep-mode.ts,task-triage.ts,vip-email-received.ts,friction-fix.ts} (exact set to be finalized in phase 4)
packages/gateway/src/{commands/config.ts,model-fallback.ts,daemon.ts}
packages/cli/src/commands/recall.ts
packages/inference-router/ new package (or equivalent package boundary)
apps/web/content/adrs/ and ~/Vault/docs/decisions/ for ADR records and discoverability

Patterns to follow

Use one inference request schema and enforce all callsites use it.
Keep policies declarative and versioned (policy_version).
Keep provider adapter logic isolated to one package.
Emit both OTEL and Langfuse metadata on the same inference lifecycle event.

Patterns to avoid

Do not add provider-specific branching in every function.
Do not keep local fallback arrays per callsite.
Do not emit success telemetry without prompt length/tier/model metadata.

Configuration

Add policy configuration:
- canonical model catalog,
- tier targets (simple, medium, complex, reasoning, vision, json),
- fallback chain per tier,
- provider quotas/limits.
Add opt-in environment flags:
- JOELCLAW_INFERENCE_ROUTER_URL,
- JOELCLAW_INFERENCE_POLICY_VERSION,
- JOELCLAW_INFERENCE_STRICT_MODE=true.

Migration order (explicit)

Read-only mode: router computes route but callsites still execute old calls while comparing emitted events.
Dual-write mode: emit OTEL + Langfuse for old and new paths in test environments.
Hard switch: all production callsites use router, direct provider calls removed.

Verification

rg inventory has zero direct provider endpoints in business callsites except whitelisted adapter modules (packages/inference-router, packages/gateway/src/providers, temporary migration files).
pnpm (or equivalent workspace test harness) runs smoke check for:
- packages/system-bus/src/inngest/functions/daily-digest.ts
- packages/system-bus/src/inngest/functions/observe.ts
- packages/cli/src/commands/recall.ts
- packages/gateway/src/daemon.ts that all invoke the shared inference client.
OTEL shows router lifecycle events for at least 4 policy branches (simple/medium/complex/reasoning) in one sample hour.
Langfuse receives parity traces for the same sample set with model/tier/cost metadata.
Policy conflict can be simulated in staging: force one provider to fail and validate fallback progression (no silent recursion, no unbounded retry loop).
A migration report shows number of direct pi -p --no-session --no-extensions calls reduced by phase and identifies remaining exceptions.

Implementation progress update (2026-02-25)

packages/system-bus/src/lib/inference.ts is the shared inference entrypoint in use for migrated callsites.
Added infer() adoption to:
- packages/system-bus/src/inngest/functions/content-review.ts
- packages/system-bus/src/inngest/functions/reflect.ts
- packages/system-bus/src/inngest/functions/memory/batch-review.ts
- packages/system-bus/src/inngest/functions/self-healing-router.ts
- packages/system-bus/src/inngest/functions/nas-backup.ts
- packages/system-bus/src/inngest/functions/discovery-capture.ts
- packages/system-bus/src/inngest/functions/meeting-analyze.ts
- packages/system-bus/src/inngest/functions/transcript-process.ts
- packages/cli/src/commands/recall.ts
Remaining in-scope direct pi-style / provider call paths still pending:
- packages/gateway/src/commands/config.ts (legacy model allowlist alignment)

ADR Review (Phase 3 checklist summary)

Context is understandable without prior tribal knowledge.
Trigger is explicit: distributed inference policy and observability fragmentation.
Decision is concrete and executable.
Scope includes in/out bullets.
Consequences list includes risks and benefits with follow-up tasks.
Plan names affected paths and configuration changes.
Verification criteria are specific and testable.
At least two options are compared with rejection reasons.
ADR status/metadata and filename convention match ADR conventions.

Resolved questions (2026-02-25)

Package boundary: Keep inference-router as separate packages/inference-router/. CLI and gateway already import independently — separate package enforces the contract boundary.
Fallback telemetry: OTEL always emits. Langfuse always emits (cloud-hosted, cost nominal vs inference spend). Complete coverage — every call is traceable for cost attribution, latency analysis, and prompt version tracking.
Enforcement mode: Strict in CI/deploy (unknown model ID = error), permissive in dev (warn + pass through). Environment-split avoids blocking experimentation while catching config drift in production.

Open questions

What migration SLA is acceptable for bringing gateway/src/commands/config.ts and non-router gateway command surfaces fully onto the same policy metadata discipline as system bus CLI inference callsites? Resolved 2026-02-25: Gateway resolveModel() and providerForModel() now use catalog lookup from @joelclaw/inference-router. Hardcoded MODEL_PROVIDERS map removed. Commit 865e10d.

Ship log (2026-02-25)

Package: 16/16 tests green, resolveModelFromCatalog, normalizeCatalogModel, routeInference exported
Gateway wiring: daemon.ts:resolveModel() uses catalog with console.log for observability. config.ts:providerForModel() delegates to catalog, falls back to “anthropic”
Fallback chain: env vars PI_MODEL/PI_MODEL_ID/PI_MODEL_PROVIDER still honored — catalog is advisory layer
Biome enforcement (ADR-0144): noRestrictedImports prevents regression to hardcoded maps
Validation scheduled: 2026-02-26 — verify Langfuse traces, catalog resolution logs, fallback behavior in production