Recall Rewrite Reliability Contract
- Status: proposed
- Date: 2026-03-02
- Deciders: Joel, Panda
- Relates to: ADR-0096, ADR-0140, ADR-0190, ADR-0191
Context
The joelclaw recall query rewrite step is intended to improve retrieval quality, but recent traces show it frequently falls back (inference_rewrite_empty), paying the rewrite latency on each attempt for little semantic gain.
Rewrite must be treated as an optimization layer, not a mandatory step.
Decision
1) Make rewrite strictly optional at runtime
Recall retrieval must always succeed without rewrite.
- Rewrite path may run only when budget policy and circuit state allow.
- If rewrite is skipped/fails, retrieval proceeds immediately with normalized raw query.
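The fallback rule above can be sketched as a pure selection step. This is a minimal illustration, not the adapter's actual code: the `RewriteAttempt` type, `normalizeRawQuery`, and `selectRetrievalQuery` are hypothetical names, and the normalization shown (trim plus whitespace collapse) is an assumption.

```typescript
// Hypothetical outcome of the optional rewrite step.
type RewriteAttempt =
  | { kind: "skipped"; reason: string }
  | { kind: "failed"; reason: string }
  | { kind: "ok"; rewritten: string };

// Assumed normalization: trim and collapse internal whitespace.
function normalizeRawQuery(raw: string): string {
  return raw.trim().replace(/\s+/g, " ");
}

// Always returns a usable query. The rewrite is used only when it
// succeeded and produced non-empty text; every other outcome falls
// back to the normalized raw query.
function selectRetrievalQuery(
  raw: string,
  attempt: RewriteAttempt
): { query: string; source: "rewrite" | "raw" } {
  if (attempt.kind === "ok") {
    const rewritten = attempt.rewritten.trim();
    if (rewritten.length > 0) {
      return { query: rewritten, source: "rewrite" };
    }
  }
  return { query: normalizeRawQuery(raw), source: "raw" };
}
```

Because the function has no failure path of its own, retrieval can call it unconditionally regardless of what happened to the rewrite.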
2) Add rewrite skip heuristics
Skip rewrite before any LLM call when query is already simple/high-signal:
- short exact-keyword queries,
- quoted literal queries,
- low-entropy command-like queries,
- direct IDs/slugs/paths.
Emit reason codes (e.g. skip.short_query, skip.literal_query).
3) Bound rewrite failure loops
Apply ADR-0191 circuit behavior to component=recall-cli, action=recall.rewrite:
- repeated inference_rewrite_empty opens circuit,
- open circuit bypasses rewrite and logs skip,
- half-open probes test recovery after cooldown.
4) Cache successful rewrite outputs
For identical normalized input query + context fingerprint:
- cache successful rewrite result for a short TTL,
- reuse cached rewrite to avoid repeated LLM calls,
- invalidate cache on context fingerprint change.
5) Promote rewrite outcome observability
Every recall execution must record:
- rewriteStrategy: disabled|skipped|haiku|openai|fallback
- rewriteReason: success|skip.*|failure.*|circuit_open
- rewriteDurationMs
- rewriteUsed: boolean (rewritten query materially differs)
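One way to pin down the record shape listed above is a TypeScript type plus a small builder. The field names and allowed values come from the list; the `buildRewriteTelemetry` helper and its definition of "materially differs" (comparison after lowercasing and whitespace collapse) are assumptions made for illustration.

```typescript
type RewriteStrategy = "disabled" | "skipped" | "haiku" | "openai" | "fallback";
type RewriteReason =
  | "success"
  | "circuit_open"
  | `skip.${string}`
  | `failure.${string}`;

interface RewriteTelemetry {
  rewriteStrategy: RewriteStrategy;
  rewriteReason: RewriteReason;
  rewriteDurationMs: number;
  rewriteUsed: boolean;
}

// Hypothetical builder. "Materially differs" is interpreted here as
// "differs after lowercasing and collapsing whitespace" (an assumption).
function buildRewriteTelemetry(input: {
  strategy: RewriteStrategy;
  reason: RewriteReason;
  durationMs: number;
  originalQuery: string;
  finalQuery: string;
}): RewriteTelemetry {
  const canonical = (q: string): string =>
    q.trim().toLowerCase().replace(/\s+/g, " ");
  return {
    rewriteStrategy: input.strategy,
    rewriteReason: input.reason,
    rewriteDurationMs: Math.max(0, Math.round(input.durationMs)),
    rewriteUsed: canonical(input.originalQuery) !== canonical(input.finalQuery),
  };
}
```

Typing the reason as a template literal keeps the `skip.*` and `failure.*` families open-ended while still rejecting arbitrary strings at compile time.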
6) Set reliability SLO
Target contract for steady state:
- fallback rate ≤ 20% over rolling window,
- rewrite empty-output rate ≤ 5%,
- median rewrite latency ≤ 1.2s,
- rewrite path disabled automatically when outside SLO.
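The SLO thresholds above can be checked with a small evaluator over a rolling window of samples. This is a sketch under assumptions: the `RewriteSample` shape and `evaluateRewriteSlo` name are hypothetical, the window size is left to the caller, and an empty window is treated as within SLO.

```typescript
// Hypothetical per-recall sample collected over the rolling window.
interface RewriteSample {
  fellBack: boolean;
  emptyOutput: boolean;
  durationMs: number;
}

interface SloReport {
  fallbackRate: number;
  emptyRate: number;
  medianLatencyMs: number;
  withinSlo: boolean;
}

// Thresholds taken from the contract above.
const MAX_FALLBACK_RATE = 0.2;
const MAX_EMPTY_RATE = 0.05;
const MAX_MEDIAN_LATENCY_MS = 1200;

function median(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

function evaluateRewriteSlo(samples: RewriteSample[]): SloReport {
  const total = samples.length;
  // Assumption: with no samples there is no evidence of a breach.
  const fallbackRate =
    total === 0 ? 0 : samples.filter((s) => s.fellBack).length / total;
  const emptyRate =
    total === 0 ? 0 : samples.filter((s) => s.emptyOutput).length / total;
  const medianLatencyMs = median(samples.map((s) => s.durationMs));
  const withinSlo =
    fallbackRate <= MAX_FALLBACK_RATE &&
    emptyRate <= MAX_EMPTY_RATE &&
    medianLatencyMs <= MAX_MEDIAN_LATENCY_MS;
  return { fallbackRate, emptyRate, medianLatencyMs, withinSlo };
}
```

A caller would disable the rewrite path whenever `withinSlo` is false, matching the last bullet of the contract.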
Consequences
Good
- recall becomes robust even when rewrite is unstable,
- eliminates repeated no-op rewrite spend,
- preserves semantic gains where rewrite actually works.
Tradeoffs
- added cache/state complexity in CLI recall adapter,
- rewrite quality variability may reduce semantic expansion in the short term.
Required Skills (Preflight)
- recall: retrieval semantics, trust pass, budget profiles
- langfuse: rewrite strategy/error visibility and trend validation
- system-architecture: interaction with system-bus inference and telemetry
- joelclaw: CLI behavior validation and operational checks
Implementation Plan (vector clock)
- V1: add skip-heuristic classifier in packages/cli/src/capabilities/adapters/typesense-recall.ts.
- V2: add rewrite circuit checks and state transitions (ADR-0191 contract) for recall.rewrite.
- V3: add rewrite-result cache keyed by normalized query + context fingerprint.
- V4: enrich Langfuse/OTEL payloads with strategy/reason/used flags.
- V5: add regression tests for skip, fallback, circuit-open, and cache reuse paths.
Implementation Progress
V1 (skip heuristics) — pre-existing
detectRewriteSkipReason() already skips rewrite for: short queries (≤24 chars, ≤3 tokens), quoted literals, direct identifiers (paths/slugs), and command-like queries (show|find|list|get|open). Emits rewriteReason: skip.*.
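The heuristics described above could look roughly like the following. This is an illustrative sketch, not the body of detectRewriteSkipReason(): the function name `sketchSkipReason`, the regular expressions, the rule ordering, and the reason codes `skip.direct_id` and `skip.command_like` are assumptions. It also assumes the short-query rule requires both limits (≤24 chars and ≤3 tokens) to hold.

```typescript
type SkipReason =
  | "skip.literal_query"
  | "skip.direct_id" // hypothetical reason code
  | "skip.command_like" // hypothetical reason code
  | "skip.short_query";

function sketchSkipReason(rawQuery: string): SkipReason | null {
  const query = rawQuery.trim();
  const tokens = query.split(/\s+/).filter((t) => t.length > 0);

  // Quoted literal: the whole query is wrapped in matching quotes.
  if (/^(["']).+\1$/.test(query)) return "skip.literal_query";

  // Direct identifier: a single token that looks like a path or a slug/ID.
  const singleToken = tokens.length === 1;
  const looksLikePath = query.includes("/");
  const looksLikeSlug = /^[A-Za-z0-9]+(?:[-_.][A-Za-z0-9]+)+$/.test(query);
  if (singleToken && (looksLikePath || looksLikeSlug)) return "skip.direct_id";

  // Command-like: starts with one of the verbs listed in the ADR.
  if (/^(show|find|list|get|open)\b/i.test(query)) return "skip.command_like";

  // Short query: at most 24 characters and at most 3 tokens.
  if (query.length <= 24 && tokens.length <= 3) return "skip.short_query";

  return null; // no skip rule matched; rewrite may proceed
}
```

All checks are plain string operations, so the classifier can run before any LLM call without adding measurable latency.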
V2 (circuit breaker) — shipped 2026-03-04
File-persisted circuit breaker at ~/.joelclaw/state/recall-rewrite-circuit.json. State survives across CLI invocations (each joelclaw run is a fresh process).
- Threshold: 3 consecutive failures → open circuit
- Cooldown: 5 minutes → half-open probe
- States: closed → open (after N failures) → half-open (after cooldown) → closed (on success)
- OTEL: rewriteReason: circuit_open
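The state transitions above can be modeled as two pure functions over a serializable state object. This is a sketch only: the names `CircuitState`, `beforeRewrite`, and `afterRewrite` are hypothetical, file persistence is omitted, and reopening the circuit on a failed half-open probe is an assumption the ADR does not spell out.

```typescript
type CircuitStatus = "closed" | "open" | "half-open";

interface CircuitState {
  status: CircuitStatus;
  consecutiveFailures: number;
  openedAtMs: number | null;
}

const FAILURE_THRESHOLD = 3; // consecutive failures before opening
const COOLDOWN_MS = 5 * 60 * 1000; // 5 minutes before a half-open probe

const initialCircuit: CircuitState = {
  status: "closed",
  consecutiveFailures: 0,
  openedAtMs: null,
};

// Decide whether a rewrite may run, moving open -> half-open after cooldown.
function beforeRewrite(
  state: CircuitState,
  nowMs: number
): { allowed: boolean; state: CircuitState } {
  if (state.status !== "open") return { allowed: true, state };
  const cooledDown =
    state.openedAtMs !== null && nowMs - state.openedAtMs >= COOLDOWN_MS;
  if (!cooledDown) return { allowed: false, state };
  return { allowed: true, state: { ...state, status: "half-open" } };
}

// Record the outcome of a rewrite that was allowed to run.
function afterRewrite(
  state: CircuitState,
  succeeded: boolean,
  nowMs: number
): CircuitState {
  if (succeeded) return { ...initialCircuit };
  const failures = state.consecutiveFailures + 1;
  // Assumption: a failed half-open probe reopens the circuit immediately.
  if (state.status === "half-open" || failures >= FAILURE_THRESHOLD) {
    return { status: "open", consecutiveFailures: failures, openedAtMs: nowMs };
  }
  return { status: "closed", consecutiveFailures: failures, openedAtMs: null };
}
```

Keeping the state a plain JSON-compatible object is what would let it round-trip through a state file between CLI invocations.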
V3 (rewrite cache) — shipped 2026-03-04
File-persisted rewrite cache at ~/.joelclaw/state/recall-rewrite-cache.json. Keyed on normalized query text.
- TTL: 3 minutes
- Max entries: 50 (LRU eviction)
- Performance: 6.2s cold → 0.4s cache hit (15x speedup)
- OTEL: rewriteReason: cache_hit
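The TTL and LRU limits above can be illustrated with an in-memory model. This is a sketch under assumptions, not the shipped cache: `RewriteCacheSketch` and `normalizeQuery` are hypothetical names, the normalization rule is assumed, file persistence is left out, and the clock is passed in explicitly so expiry is easy to reason about.

```typescript
const TTL_MS = 3 * 60 * 1000; // 3 minutes
const MAX_ENTRIES = 50;

interface CacheEntry {
  rewritten: string;
  storedAtMs: number;
}

// Assumed normalization: trim, lowercase, collapse whitespace.
function normalizeQuery(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

class RewriteCacheSketch {
  // Map preserves insertion order, so the first key is the least recently used.
  private entries = new Map<string, CacheEntry>();

  get(query: string, nowMs: number): string | undefined {
    const key = normalizeQuery(query);
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (nowMs - entry.storedAtMs >= TTL_MS) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.rewritten;
  }

  set(query: string, rewritten: string, nowMs: number): void {
    const key = normalizeQuery(query);
    this.entries.delete(key);
    this.entries.set(key, { rewritten, storedAtMs: nowMs });
    if (this.entries.size > MAX_ENTRIES) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  size(): number {
    return this.entries.size;
  }
}
```

Only successful rewrites would be passed to `set`, so a cache hit never replays an empty or failed result.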
Timeout fix — shipped 2026-03-04
Default REWRITE_TIMEOUT_MS bumped from 2s to 6s. Pi cold-start on M4 Pro is ~3-4s, so the 2s timeout guaranteed failure on every first invocation — this was the root cause of the 41% empty recall rate seen in ADR-0190 scorecard. Configurable via JOELCLAW_RECALL_REWRITE_TIMEOUT env var.
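Resolving the timeout from the environment might look like the sketch below. It assumes the JOELCLAW_RECALL_REWRITE_TIMEOUT value is expressed in milliseconds and that missing or invalid values fall back to the 6s default; the `resolveRewriteTimeoutMs` helper is a hypothetical name, and the environment is passed in as a plain object to keep the function easy to test.

```typescript
const DEFAULT_REWRITE_TIMEOUT_MS = 6000;

// Assumption: the env var holds a positive number of milliseconds.
function resolveRewriteTimeoutMs(
  env: Record<string, string | undefined>
): number {
  const raw = env["JOELCLAW_RECALL_REWRITE_TIMEOUT"];
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_REWRITE_TIMEOUT_MS;
  }
  const parsed = Number(raw);
  if (!Number.isFinite(parsed) || parsed <= 0) {
    return DEFAULT_REWRITE_TIMEOUT_MS; // ignore malformed overrides
  }
  return Math.floor(parsed);
}
```

In a Node process the call would typically be `resolveRewriteTimeoutMs(process.env)`.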
V4 (OTEL enrichment) — pre-existing
Every recall already emits: rewriteStrategy, rewriteReason, rewriteDurationMs, rewriteModel, rewriteProvider, rewriteUsage, budgetApplied, budgetReason.
Verification Checklist
- recall succeeds with equivalent output quality when rewrite is disabled/circuit-open
- repeated rewrite-empty failures open rewrite circuit and stop repeated rewrite calls
- successful rewrites are cached and reused for identical inputs
- strategy/reason telemetry is present on every recall run
- fallback and empty-output rates trend down after rollout (monitor via joelclaw memory scorecard)
- V5: regression tests for skip, fallback, circuit-open, and cache reuse paths