ADR-0192 (accepted)

Recall Rewrite Reliability Contract

  • Status: proposed
  • Date: 2026-03-02
  • Deciders: Joel, Panda
  • Relates to: ADR-0096, ADR-0140, ADR-0190, ADR-0191

Context

The query rewrite step in joelclaw recall is intended to improve retrieval quality, but recent traces show it frequently falling back (inference_rewrite_empty). Each fallback pays the rewrite latency again and yields little semantic gain.

Rewrite must be treated as an optimization layer, not a mandatory step.

Decision

1) Make rewrite strictly optional at runtime

Recall retrieval must always succeed without rewrite.

  • Rewrite path may run only when budget policy and circuit state allow.
  • If rewrite is skipped or fails, retrieval proceeds immediately with the normalized raw query.
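
The fall-through behavior can be sketched as follows. This is a minimal illustration, not the adapter's actual code: `recall`, `normalizeQuery`, `Rewriter`, `Searcher`, and the `rewriteAllowed` flag are hypothetical names standing in for whatever the budget policy and circuit check produce.

```typescript
// Hypothetical shapes for illustration; the real adapter may differ.
type Rewriter = (query: string) => Promise<string | null>;
type Searcher = (query: string) => Promise<string[]>;

// Collapse whitespace so the raw query can be sent as-is.
function normalizeQuery(raw: string): string {
  return raw.trim().replace(/\s+/g, " ");
}

// Retrieval never depends on rewrite: a skip, an empty output, or a thrown
// error all fall through to the normalized raw query.
async function recall(
  raw: string,
  rewriteAllowed: boolean,
  rewrite: Rewriter,
  search: Searcher,
): Promise<{ query: string; hits: string[] }> {
  const normalized = normalizeQuery(raw);
  let query = normalized;
  if (rewriteAllowed) {
    try {
      const rewritten = await rewrite(normalized);
      if (rewritten && rewritten.trim().length > 0) query = rewritten.trim();
    } catch {
      // Rewrite failure is non-fatal; keep the normalized raw query.
    }
  }
  return { query, hits: await search(query) };
}
```

The search call sits outside the `try` block on purpose, so only rewrite errors are swallowed and retrieval errors still surface.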

2) Add rewrite skip heuristics

Skip rewrite before any LLM call when the query is already simple and high-signal:

  • short exact-keyword queries,
  • quoted literal queries,
  • low-entropy command-like queries,
  • direct IDs/slugs/paths.

Emit reason codes (e.g. skip.short_query, skip.literal_query).

3) Bound rewrite failure loops

Apply ADR-0191 circuit behavior to component=recall-cli, action=recall.rewrite:

  • repeated inference_rewrite_empty opens circuit,
  • open circuit bypasses rewrite and logs skip,
  • half-open probes test recovery after cooldown.

4) Cache successful rewrite outputs

For identical normalized input query + context fingerprint:

  • cache successful rewrite result for a short TTL,
  • reuse cached rewrite to avoid repeated LLM calls,
  • invalidate cache on context fingerprint change.

5) Promote rewrite outcome observability

Every recall execution must record:

  • rewriteStrategy: disabled|skipped|haiku|openai|fallback
  • rewriteReason: success|skip.*|failure.*|circuit_open
  • rewriteDurationMs
  • rewriteUsed: boolean (true when the rewritten query materially differs from the original)
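
One way to type this record is sketched below. The four field names come from the list above. The helper names and the definition of "materially differs" (different after trimming, whitespace collapsing, and lowercasing) are assumptions made for illustration.

```typescript
// Field names come from the list above; the helpers are hypothetical.
type RewriteStrategy = "disabled" | "skipped" | "haiku" | "openai" | "fallback";
type RewriteReason =
  | "success"
  | "circuit_open"
  | `skip.${string}`
  | `failure.${string}`;

interface RewriteTelemetry {
  rewriteStrategy: RewriteStrategy;
  rewriteReason: RewriteReason;
  rewriteDurationMs: number;
  rewriteUsed: boolean;
}

// Assumption: two queries differ "materially" when they are still different
// after trimming, collapsing whitespace, and lowercasing.
function materiallyDiffers(original: string, rewritten: string): boolean {
  const canon = (s: string) => s.trim().replace(/\s+/g, " ").toLowerCase();
  return canon(original) !== canon(rewritten);
}

function buildRewriteTelemetry(
  strategy: RewriteStrategy,
  reason: RewriteReason,
  startedAtMs: number,
  finishedAtMs: number,
  originalQuery: string,
  finalQuery: string,
): RewriteTelemetry {
  return {
    rewriteStrategy: strategy,
    rewriteReason: reason,
    rewriteDurationMs: Math.max(0, finishedAtMs - startedAtMs),
    rewriteUsed: materiallyDiffers(originalQuery, finalQuery),
  };
}
```

Building the record in one place makes it harder for a skip or failure path to return without emitting all four fields.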

6) Set reliability SLO

Target contract for steady state:

  • fallback rate ≤ 20% over rolling window,
  • rewrite empty-output rate ≤ 5%,
  • median rewrite latency ≤ 1.2s,
  • rewrite path disabled automatically when outside SLO.
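
A rough sketch of how the three numeric targets could be checked over a window of recent runs. The thresholds are the ones listed above. The `RewriteSample` shape, the function names, and the choice to treat an empty window as within SLO are assumptions.

```typescript
// One observed recall run; field names are illustrative assumptions.
interface RewriteSample {
  fellBack: boolean;    // rewrite was attempted but the raw query was used
  emptyOutput: boolean; // rewrite returned empty (inference_rewrite_empty)
  durationMs: number;   // time spent in the rewrite path
}

// Thresholds taken from the contract above.
const SLO = { maxFallbackRate: 0.2, maxEmptyRate: 0.05, maxMedianMs: 1200 };

function median(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Returns true when the window meets all three targets. An empty window is
// treated as within SLO so rewrite is not disabled before any data exists.
function withinSlo(window: RewriteSample[]): boolean {
  if (window.length === 0) return true;
  const rate = (pred: (s: RewriteSample) => boolean) =>
    window.filter(pred).length / window.length;
  return (
    rate((s) => s.fellBack) <= SLO.maxFallbackRate &&
    rate((s) => s.emptyOutput) <= SLO.maxEmptyRate &&
    median(window.map((s) => s.durationMs)) <= SLO.maxMedianMs
  );
}
```

A caller would disable the rewrite path whenever `withinSlo` returns false for the current rolling window. How the window is sized and stored is left open here.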

Consequences

Good

  • recall becomes robust even when rewrite is unstable,
  • eliminates repeated no-op rewrite spend,
  • preserves semantic gains where rewrite actually works.

Tradeoffs

  • added cache/state complexity in CLI recall adapter,
  • rewrite quality variability may reduce semantic expansion in the short term.

Required Skills (Preflight)

  • recall — retrieval semantics, trust pass, budget profiles
  • langfuse — rewrite strategy/error visibility and trend validation
  • system-architecture — interaction with system-bus inference and telemetry
  • joelclaw — CLI behavior validation and operational checks

Implementation Plan (vector clock)

  1. V1: add skip-heuristic classifier in packages/cli/src/capabilities/adapters/typesense-recall.ts.
  2. V2: add rewrite circuit checks and state transitions (ADR-0191 contract) for recall.rewrite.
  3. V3: add rewrite-result cache keyed by normalized query + context fingerprint.
  4. V4: enrich Langfuse/OTEL payloads with strategy/reason/used flags.
  5. V5: add regression tests for skip, fallback, circuit-open, and cache reuse paths.

Implementation Progress

V1 (skip heuristics) — pre-existing

detectRewriteSkipReason() already skips rewrite for: short queries (≤24 chars, ≤3 tokens), quoted literals, direct identifiers (paths/slugs), and command-like queries (show|find|list|get|open). Emits rewriteReason: skip.*.
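
For orientation, a classifier along these lines might look like the sketch below. It is an illustrative re-implementation, not the actual detectRewriteSkipReason(). The function name, the regular expressions, the rule order, and the reason codes skip.direct_id and skip.command_query are hypothetical. It also assumes both short-query limits must hold together.

```typescript
// Illustrative only; not the actual detectRewriteSkipReason().
// skip.direct_id and skip.command_query are hypothetical reason codes.
type SkipReason =
  | "skip.literal_query"
  | "skip.direct_id"
  | "skip.command_query"
  | "skip.short_query";

// Command verbs listed in the note above.
const COMMAND_VERBS = /^(show|find|list|get|open)\b/i;
// A single token containing a path separator, or a slug of words/digits
// joined by hyphens or underscores (for example "adr-0192").
const DIRECT_ID = /^(?:[\w.~-]*\/[\w.\/~-]*|[a-z0-9]+(?:[-_][a-z0-9]+)+)$/i;
// Whole query wrapped in matching single or double quotes.
const QUOTED = /^(["']).+\1$/;

function classifySkip(rawQuery: string): SkipReason | null {
  const q = rawQuery.trim();
  if (QUOTED.test(q)) return "skip.literal_query";
  if (DIRECT_ID.test(q)) return "skip.direct_id";
  if (COMMAND_VERBS.test(q)) return "skip.command_query";
  const tokens = q.split(/\s+/);
  // Assumes both limits must hold: <= 24 characters AND <= 3 tokens.
  if (q.length <= 24 && tokens.length <= 3) return "skip.short_query";
  return null; // no skip reason: the query is a rewrite candidate
}
```

All checks are plain string tests, so the classifier runs before any LLM call and adds no measurable latency.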

V2 (circuit breaker) — shipped 2026-03-04

File-persisted circuit breaker at ~/.joelclaw/state/recall-rewrite-circuit.json. State survives across CLI invocations (each joelclaw run is a fresh process).

  • Threshold: 3 consecutive failures → open circuit
  • Cooldown: 5 minutes → half-open probe
  • States: closed → open (after N failures) → half-open (after cooldown) → closed (on success)
  • OTEL: rewriteReason: circuit_open
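
The state machine can be sketched as pure functions over a small state record, with reading and writing the JSON file left out. The threshold and cooldown are the values listed above. The `CircuitState` layout, the function names, and the rule that a failed half-open probe re-opens the circuit are assumptions, since the persisted format and that transition are not described here.

```typescript
// Values from the notes above: 3 consecutive failures, 5 minute cooldown.
const FAILURE_THRESHOLD = 3;
const COOLDOWN_MS = 5 * 60 * 1000;

// Hypothetical persisted shape; the real JSON layout is not documented here.
interface CircuitState {
  consecutiveFailures: number;
  openedAtMs: number | null; // null while the circuit is closed
}

type CircuitMode = "closed" | "open" | "half-open";

const CLOSED: CircuitState = { consecutiveFailures: 0, openedAtMs: null };

function modeAt(state: CircuitState, nowMs: number): CircuitMode {
  if (state.openedAtMs === null) return "closed";
  return nowMs - state.openedAtMs >= COOLDOWN_MS ? "half-open" : "open";
}

// Rewrite may run when closed, or as a probe once the cooldown has elapsed.
function allowsRewrite(state: CircuitState, nowMs: number): boolean {
  return modeAt(state, nowMs) !== "open";
}

// Any success closes the circuit and resets the failure count.
function recordSuccess(): CircuitState {
  return { ...CLOSED };
}

function recordFailure(state: CircuitState, nowMs: number): CircuitState {
  const failures = state.consecutiveFailures + 1;
  // Assumption: a failed half-open probe re-opens and restarts the cooldown.
  const open = state.openedAtMs !== null || failures >= FAILURE_THRESHOLD;
  return { consecutiveFailures: failures, openedAtMs: open ? nowMs : null };
}
```

Because each CLI invocation is a fresh process, the mode is derived from the stored timestamp rather than from a timer. The caller would load the state, check `allowsRewrite`, and write back the result of `recordSuccess` or `recordFailure`.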

V3 (rewrite cache) — shipped 2026-03-04

File-persisted rewrite cache at ~/.joelclaw/state/recall-rewrite-cache.json. Keyed on normalized query text.

  • TTL: 3 minutes
  • Max entries: 50 (LRU eviction)
  • Performance: 6.2s cold → 0.4s cache hit (15x speedup)
  • OTEL: rewriteReason: cache_hit
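
The TTL and LRU behavior can be illustrated with an in-memory sketch. The TTL and entry limit are the values listed above. File persistence is omitted, and the class, its method names, and the `cacheKey` normalization (trim, collapse whitespace, lowercase) are assumptions rather than the shipped code.

```typescript
// Values from the notes above: 3 minute TTL, at most 50 entries.
const TTL_MS = 3 * 60 * 1000;
const MAX_ENTRIES = 50;

interface CacheEntry {
  rewritten: string;
  storedAtMs: number;
}

// Assumption: "normalized" means trimmed, whitespace-collapsed, lowercased.
function cacheKey(query: string): string {
  return query.trim().replace(/\s+/g, " ").toLowerCase();
}

class RewriteCache {
  // Map preserves insertion order, so the first key is least recently used.
  private entries = new Map<string, CacheEntry>();

  get(key: string, nowMs: number): string | null {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (nowMs - entry.storedAtMs >= TTL_MS) {
      this.entries.delete(key); // expired
      return null;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.rewritten;
  }

  set(key: string, rewritten: string, nowMs: number): void {
    this.entries.delete(key);
    this.entries.set(key, { rewritten, storedAtMs: nowMs });
    while (this.entries.size > MAX_ENTRIES) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Only successful rewrites should be stored, so that a cached empty or failed output cannot mask recovery of the rewrite path.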

Timeout fix — shipped 2026-03-04

Default REWRITE_TIMEOUT_MS raised from 2s to 6s. Pi cold-start on M4 Pro takes ~3-4s, so the 2s timeout guaranteed a failure on every first invocation. This was the root cause of the 41% empty recall rate seen in the ADR-0190 scorecard. The timeout is configurable via the JOELCLAW_RECALL_REWRITE_TIMEOUT env var.
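
A sketch of how the timeout might be resolved and applied. The env var name and the 6s default come from the note above. The function names are hypothetical. It also assumes the variable holds a positive integer in milliseconds, which is not stated here.

```typescript
const DEFAULT_REWRITE_TIMEOUT_MS = 6000;

// Assumption: the env var holds a positive integer in milliseconds.
// Missing or invalid values fall back to the default.
function resolveRewriteTimeoutMs(
  env: Record<string, string | undefined>,
): number {
  const raw = env["JOELCLAW_RECALL_REWRITE_TIMEOUT"];
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0
    ? parsed
    : DEFAULT_REWRITE_TIMEOUT_MS;
}

// Resolves to null if the work has not settled within timeoutMs, so the
// caller can treat a timeout like any other rewrite miss. The underlying
// work is not cancelled by this helper.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T | null> {
  return new Promise<T | null>((resolve, reject) => {
    const timer = setTimeout(() => resolve(null), timeoutMs);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

In the CLI this would be called as `resolveRewriteTimeoutMs(process.env)`, with the result passed to `withTimeout` around the rewrite call.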

V4 (OTEL enrichment) — pre-existing

Every recall already emits: rewriteStrategy, rewriteReason, rewriteDurationMs, rewriteModel, rewriteProvider, rewriteUsage, budgetApplied, budgetReason.

Verification Checklist

  • recall succeeds with equivalent output quality when rewrite is disabled/circuit-open
  • repeated rewrite-empty failures open rewrite circuit and stop repeated rewrite calls
  • successful rewrites are cached and reused for identical inputs
  • strategy/reason telemetry is present on every recall run
  • fallback and empty-output rates trend down after rollout (monitor via joelclaw memory scorecard)
  • V5: regression tests for skip, fallback, circuit-open, and cache reuse paths