ADR-0096 · accepted

Budget-Aware Memory Retrieval Policy

2026-02-22T00:00:00.000Z

Update (2026-02-22)

Initial implementation slice landed:
- joelclaw recall now supports --budget lean|balanced|deep|auto
- budget plan controls rewrite enablement, fetch depth, and inject cap behavior
- recall OTEL metadata now includes budget selection diagnostics
- memory context prefetch now supports budget-profile-based fetch scaling
ADR status remains proposed until quality/latency validation gates are met.

Context

ADR-0077 deferred budget-aware retrieval, and ADR-0078 established token-cost pressure as a system concern. Current memory retrieval uses mostly fixed settings (rewrite attempts, fetch depth, inject caps), which serve different contexts poorly:

- simple recalls overpay in latency and token cost;
- complex recalls may under-search;
- no explicit budget policy exists for operators or agents.

Budget-aware retrieval is needed to trade off quality/latency/cost intentionally.

Decision

Introduce a shared retrieval budget policy with three profiles:

- lean: minimal cost/latency
- balanced: default
- deep: higher-quality effort for hard queries

All memory retrieval paths must declare or infer a profile, then apply profile-specific limits for rewrite/search/ranking/injection.

Policy Contract

Profile matrix (initial)

lean
- query rewrite: disabled by default
- candidate fetch: low
- trust-pass: strict
- injected memories: 3-5
balanced
- query rewrite: single attempt
- candidate fetch: medium
- trust-pass: standard
- injected memories: 6-10
deep
- query rewrite: full fallback chain
- candidate fetch: high
- trust-pass: permissive with diagnostics
- injected memories: up to configured max
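One way to keep every retrieval path on the same limits is to encode the matrix above as a single typed table that callers read after resolving a profile. The TypeScript below is a minimal sketch under stated assumptions: the numeric fetch depths standing in for "low/medium/high" and the `CONFIGURED_MAX_INJECT` value are hypothetical placeholders, `planFor` is an illustrative helper name, and the sketch keeps only the upper end of each injection range (3-5, 6-10) as a cap.

```typescript
// Hypothetical encoding of the ADR-0096 profile matrix.
// Fetch depths and CONFIGURED_MAX_INJECT are placeholders, not tuned values.

type BudgetProfile = "lean" | "balanced" | "deep";
type RewriteMode = "disabled" | "single-attempt" | "fallback-chain";
type TrustPass = "strict" | "standard" | "permissive";

interface BudgetPlan {
  rewrite: RewriteMode;      // lean: disabled by default
  candidateFetch: number;    // candidates pulled before ranking ("low/medium/high")
  trustPass: TrustPass;
  trustDiagnostics: boolean; // deep runs a permissive pass "with diagnostics"
  injectCap: number;         // upper end of the injected-memory range
}

// Placeholder for the operator-configured injection maximum used by deep.
const CONFIGURED_MAX_INJECT = 20;

const BUDGET_PROFILES: Readonly<Record<BudgetProfile, Readonly<BudgetPlan>>> = {
  lean: {
    rewrite: "disabled",
    candidateFetch: 10, // "low" (placeholder)
    trustPass: "strict",
    trustDiagnostics: false,
    injectCap: 5, // matrix range 3-5
  },
  balanced: {
    rewrite: "single-attempt",
    candidateFetch: 30, // "medium" (placeholder)
    trustPass: "standard",
    trustDiagnostics: false,
    injectCap: 10, // matrix range 6-10
  },
  deep: {
    rewrite: "fallback-chain",
    candidateFetch: 80, // "high" (placeholder)
    trustPass: "permissive",
    trustDiagnostics: true,
    injectCap: CONFIGURED_MAX_INJECT, // "up to configured max"
  },
};

// Resolve a profile once, then read all limits from one place.
// Returns a copy so callers cannot mutate the shared table.
function planFor(profile: BudgetProfile): BudgetPlan {
  return { ...BUDGET_PROFILES[profile] };
}
```

Returning a copy lets a caller tighten a limit for one request (for example, an inject override) without leaking that change into other retrieval paths.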

Budget metadata must be visible in CLI JSON and OTEL events.

Implementation Plan

1) Shared policy module

packages/system-bus/src/memory/retrieval-budget.ts (new)
packages/cli/src/commands/recall.ts (consume policy)
packages/system-bus/src/memory/context-prefetch.ts (consume policy)

2) CLI/API profile controls

Add profile selection and auto mode:

--budget lean|balanced|deep|auto
optional --max-latency-ms and --max-inject

Files:

packages/cli/src/commands/recall.ts
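A minimal sketch of validating these flags, assuming a plain `Record` of raw flag strings (the real CLI framework in `recall.ts` is not specified here). `parseRecallBudgetOptions`, `effectiveInjectCap`, and the choice of `balanced` as the fallback when `--budget` is absent are illustrative assumptions. The sketch treats `--max-inject` as an override that can only tighten a profile's cap, and it carries `--max-latency-ms` through untouched because this ADR does not define how latency caps alter the plan.

```typescript
type BudgetFlag = "lean" | "balanced" | "deep" | "auto";

interface RecallBudgetOptions {
  budget: BudgetFlag;
  maxLatencyMs?: number; // passed through; enforcement is left to the caller
  maxInject?: number;
}

const BUDGET_FLAGS: readonly BudgetFlag[] = ["lean", "balanced", "deep", "auto"];

// Parse an optional positive-integer flag, rejecting junk like "abc" or "2.5".
function parsePositiveInt(name: string, raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

// Hypothetical flag parser; "balanced" mirrors the documented default profile.
function parseRecallBudgetOptions(
  flags: Record<string, string | undefined>,
  defaultBudget: BudgetFlag = "balanced",
): RecallBudgetOptions {
  const raw = flags["budget"] ?? defaultBudget;
  if (!(BUDGET_FLAGS as readonly string[]).includes(raw)) {
    throw new Error(`--budget must be one of ${BUDGET_FLAGS.join("|")}, got "${raw}"`);
  }
  return {
    budget: raw as BudgetFlag,
    maxLatencyMs: parsePositiveInt("--max-latency-ms", flags["max-latency-ms"]),
    maxInject: parsePositiveInt("--max-inject", flags["max-inject"]),
  };
}

// An override may lower the profile's inject cap but never raise it.
function effectiveInjectCap(profileCap: number, maxInject?: number): number {
  return maxInject === undefined ? profileCap : Math.min(profileCap, maxInject);
}
```

Clamping with `Math.min` keeps `--max-inject` from silently turning a lean recall into something costlier than the profile allows.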

3) Auto profile inference

Infer the budget profile from query complexity, caller context, and an optional cost mode.

packages/system-bus/src/memory/retrieval-budget.ts
packages/system-bus/src/inngest/functions/check-email.ts (and other prefetch callers)
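To make auto mode explainable, inference can return the chosen profile together with a human-readable reason. The sketch below reuses the `budgetRequested`, `budgetApplied`, and `budgetReason` field names that the validation notes report in output/OTEL; everything else is an assumption: the word-count thresholds, the `CallerContext` and `CostMode` labels, and the ordering of signals are hypothetical heuristics, not the shipped logic.

```typescript
type BudgetProfile = "lean" | "balanced" | "deep";
type CallerContext = "interactive-cli" | "background-prefetch"; // hypothetical labels
type CostMode = "normal" | "economy";                           // hypothetical labels

interface AutoBudgetInput {
  query: string;
  caller: CallerContext;
  costMode?: CostMode;
}

interface BudgetSelection {
  budgetRequested: "auto";
  budgetApplied: BudgetProfile;
  budgetReason: string;
}

// Placeholder thresholds; real values would come from corpus tuning.
const SHORT_QUERY_WORDS = 4;
const LONG_QUERY_WORDS = 8;

function inferBudget(input: AutoBudgetInput): BudgetSelection {
  const words = input.query.trim().split(/\s+/).filter(Boolean).length;
  const select = (budgetApplied: BudgetProfile, budgetReason: string): BudgetSelection => ({
    budgetRequested: "auto",
    budgetApplied,
    budgetReason,
  });

  // An explicit cost mode overrides query-shape signals.
  if (input.costMode === "economy") return select("lean", "cost mode is economy");

  // Background prefetch runs often, so it never escalates past balanced here.
  if (input.caller === "background-prefetch") {
    const profile: BudgetProfile = words >= LONG_QUERY_WORDS ? "balanced" : "lean";
    return select(profile, `background prefetch, ${words}-word query`);
  }

  // Interactive callers: use query length as a crude complexity proxy.
  if (words <= SHORT_QUERY_WORDS) return select("lean", `short query (${words} words)`);
  if (words >= LONG_QUERY_WORDS) return select("deep", `long query (${words} words)`);
  return select("balanced", `medium query (${words} words)`);
}
```

Word count is only a stand-in for "query complexity"; the point of the sketch is the shape of the result, where every automatic choice carries the reason that selected it.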

4) Observability + governance

Emit profile and budget diagnostics:

- profile selected
- rewrite attempts
- candidate count
- injected count
- latency

Files:

packages/cli/src/commands/recall.ts
packages/system-bus/src/observability/*
packages/cli/src/commands/inngest.ts (health/reporting surfaces)
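The diagnostics listed above could be flattened into one attribute map so the CLI JSON and OTEL events carry identical fields. In this sketch only `budget_profile` is a name taken from this ADR; the other attribute keys, the `RecallOutcome` shape, and the `withBudgetDiagnostics` wrapper are hypothetical, and latency is measured with `Date.now()` around whatever async retrieval the caller passes in.

```typescript
type BudgetProfile = "lean" | "balanced" | "deep";
type OtelAttributes = Record<string, string | number | boolean>;

// Counters a retrieval path would report after one recall (hypothetical shape).
interface RecallOutcome {
  rewriteAttempts: number;
  candidateCount: number;
  injectedCount: number;
}

// Flatten diagnostics into OTEL-style attributes.
// budget_profile comes from the ADR; the remaining keys are illustrative.
function budgetDiagnostics(
  profile: BudgetProfile,
  outcome: RecallOutcome,
  latencyMs: number,
): OtelAttributes {
  return {
    budget_profile: profile,
    rewrite_attempts: outcome.rewriteAttempts,
    candidate_count: outcome.candidateCount,
    injected_count: outcome.injectedCount,
    latency_ms: Math.round(latencyMs),
  };
}

// Wrap any async retrieval so every path measures latency the same way.
async function withBudgetDiagnostics<T extends RecallOutcome>(
  profile: BudgetProfile,
  run: () => Promise<T>,
): Promise<{ result: T; attributes: OtelAttributes }> {
  const start = Date.now();
  const result = await run();
  return { result, attributes: budgetDiagnostics(profile, result, Date.now() - start) };
}
```

Building the map in one helper keeps the CLI `--json` output and the emitted event from drifting apart as new diagnostics are added.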

Acceptance Criteria

Every retrieval path emits budget_profile and budget diagnostics in OTEL.
joelclaw recall --budget <profile> deterministically changes retrieval behavior.
Lean profile reduces latency/cost for simple queries without catastrophic quality loss.
Deep profile improves difficult-query hit quality compared to balanced baseline.
Default auto profile is explainable in output (why this profile was selected).

Verification Commands

bunx tsc --noEmit -p packages/system-bus/tsconfig.json
bunx tsc --noEmit -p packages/cli/tsconfig.json
bun test packages/cli/src/commands/recall.test.ts
joelclaw recall "redis lock pattern" --budget lean --json
joelclaw recall "cross-session memory dedupe failure mode" --budget deep --json
joelclaw otel search "budget_profile|memory.recall" --hours 24

Non-Goals

Per-provider billing reconciliation in this ADR.
Changing memory storage backend.
Knowledge-graph retrieval.

Consequences

Positive

Predictable quality/cost/latency tradeoffs.
Better defaults for autonomous agents under variable workloads.
Strong foundation for future global budget controls.

Negative / Risks

Mis-tuned profile defaults can degrade relevance.
More policy surface to maintain and test.

References

ADR-0077: Memory System — Next Phase
ADR-0078: Opus Token Reduction
ADR-0095: Typesense-Native Memory Categories (dependency for domain-aware budgeting)

More Information

2026-02-22 validation snapshot

Budget corpus run (6 queries, --limit 5, lean vs deep):

lean average latency: ~618ms
deep average latency: ~5375ms
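For scale, simple arithmetic on the two averages above (not an additional measurement) puts deep at roughly 8.7 times the latency of lean in this run:

```latex
\frac{\bar{t}_{\text{deep}}}{\bar{t}_{\text{lean}}}
  \approx \frac{5375\ \text{ms}}{618\ \text{ms}} \approx 8.7,
\qquad
\bar{t}_{\text{deep}} - \bar{t}_{\text{lean}}
  \approx 5375\ \text{ms} - 618\ \text{ms} = 4757\ \text{ms}
```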

Observed behavior:

Lean consistently disables rewrite and returns quickly.
Deep consistently enables rewrite and higher fetch depth.
Auto mode surfaces explicit selection reason in output/OTEL (budgetRequested, budgetApplied, budgetReason).
Context prefetch now emits dedicated OTEL budget diagnostics (memory.context_prefetch.completed) with budget_profile, fetch depth, and filter/drop metrics.
Added a daily ADR evidence-capture loop (system/memory-adr-evidence-capture) so that memory/adr-evidence.daily.captured records rolling budget diagnostics across a 7-day window.
Quality uplift from deep over baseline is mixed in this corpus; not yet consistently better.
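A rolling 7-day view of daily evidence can be computed as a run-weighted mean per profile. The sketch assumes a hypothetical `DailyBudgetEvidence` record (date, profile, run count, average latency); this ADR only states that daily records cover a rolling 7-day window, so the field names and the run weighting are illustrative choices, not the captured event's actual schema.

```typescript
type BudgetProfile = "lean" | "balanced" | "deep";

// Hypothetical shape of one daily evidence record.
interface DailyBudgetEvidence {
  date: string; // ISO date, YYYY-MM-DD (UTC)
  profile: BudgetProfile;
  runs: number;
  avgLatencyMs: number;
}

const WINDOW_DAYS = 7;
const DAY_MS = 24 * 60 * 60 * 1000;

// Run-weighted mean latency per profile over the WINDOW_DAYS days ending at `today`
// (inclusive). Days outside the window and records with no runs are ignored.
function rollingLatencyByProfile(
  records: DailyBudgetEvidence[],
  today: string,
): Partial<Record<BudgetProfile, number>> {
  const end = Date.parse(`${today}T00:00:00Z`);
  const start = end - (WINDOW_DAYS - 1) * DAY_MS;
  const totals = new Map<BudgetProfile, { weighted: number; runs: number }>();

  for (const r of records) {
    const t = Date.parse(`${r.date}T00:00:00Z`);
    if (t < start || t > end || r.runs <= 0) continue;
    const acc = totals.get(r.profile) ?? { weighted: 0, runs: 0 };
    acc.weighted += r.avgLatencyMs * r.runs; // weight each day by its run count
    acc.runs += r.runs;
    totals.set(r.profile, acc);
  }

  const out: Partial<Record<BudgetProfile, number>> = {};
  totals.forEach((acc, profile) => {
    out[profile] = acc.weighted / acc.runs;
  });
  return out;
}
```

Weighting by runs keeps a quiet day with a single slow query from dominating the weekly average.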

Remaining acceptance gaps:

Deep-quality superiority gate is not yet met.

Status

Proposed (pending deep-quality evidence gate).