ADR-0203 (shipped)

Compaction Recovery Pipeline

Context and Problem Statement

Compaction is the single largest source of context loss in joelclaw sessions. When pi compacts, the LLM summarizes the conversation into ~2K tokens, discarding conversational nuance, debugging insights, decision rationale, and task intent. The agent continues with a lossy summary and whatever file lists pi tracks.

The canonical memory spec explicitly names session compaction as an observation trigger: “when a long session hits a context limit and compacts, the compacted content is observed before it’s discarded.” The write pipeline exists — session-lifecycle fires memory/session.compaction.pending, the observe-session Inngest function extracts observations via Haiku, applies write gates (ADR-0094: allow/hold/discard), dedupes them, and writes to Typesense.

But the spec’s pipeline diagram has a gap: nothing reads memory back after compaction. The memory-rag extension only injects on turn 2 of a fresh session — it has no compaction awareness. The feedback loop is broken at the retrieval-injection step.

Current flow (broken feedback loop)

session_before_compact
  → flush file ops to daily log (session-lifecycle)
  → fire memory/session.compaction.pending Inngest event
  
pi built-in compaction
  → LLM summarizes conversation → lossy summary replaces context
 
post-compaction
  → NOTHING. Agent resumes with summary only.
  → Inngest observe pipeline runs async (may complete minutes later)
  → Observations land in Typesense but are never re-injected

What gets lost

| Signal type | Survives compaction? | Example |
| --- | --- | --- |
| File paths read/modified | ✅ Pi tracks these | `apps/web/src/app/page.tsx` |
| Current task description | ⚠️ Sometimes in summary | "Building skill-tracker extension" |
| Debugging insights | ❌ Lost | "The type is ExtensionAPI not PiExtensionContext" |
| Decision rationale | ❌ Lost | "Chose tool_call hook over tool_result because we need the path before execution" |
| Failed approaches | ❌ Lost | "Tried bun build but it can't resolve pi imports at compile time" |
| User preferences expressed | ❌ Lost | "Joel wants data-driven auditing, not vibes-based pruning" |
| Task checkpoint (what's done, what's next) | ❌ Lost | "3 of 5 sub-tasks complete, blocked on X" |

Measured impact

  • 42 compactions per session (average, from context budget analysis 2026-03-04)
  • ~35K token system prompt means compaction triggers at ~65K tokens of conversation
  • Each compaction discards ~63K tokens of raw context, replacing with ~2K summary
  • The grind-mode bug (fixed today) was also triggering compaction at 50% in ALL sessions with forced continuation turns, but that’s now resolved

Key Constraint: Post-Compaction Context Is 36-55% Full

Pi’s compaction settings:

  • reserveTokens: 16384 — summary target size
  • keepRecentTokens: 20000 — recent conversation preserved verbatim

After compaction on a 200K context window:

  • ~35K system prompt (skills, AGENTS.md, identity)
  • ~16K compaction summary
  • ~20K kept recent conversation
  • = ~71K consumed = ~36% immediately

On a 128K window it’s worse: ~55% consumed post-compaction.
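The arithmetic generalizes to any window size. A minimal sketch, using the ADR's own estimates as constants (the exact figures vary per session):

```typescript
// Post-compaction context floor, per the estimates above.
const SYSTEM_PROMPT = 35_000; // skills, AGENTS.md, identity
const SUMMARY = 16_000;       // reserveTokens summary target
const KEPT_RECENT = 20_000;   // keepRecentTokens, preserved verbatim

function postCompactionPercent(contextWindow: number): number {
  return Math.round(((SYSTEM_PROMPT + SUMMARY + KEPT_RECENT) / contextWindow) * 100);
}
```

This yields ~36% on a 200K window and ~55% on 128K, matching the figures above.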

This means any post-compaction injection must be pointers and references, not content. Dumping recall results as text burns tokens on stuff the agent can look up on demand. The injection should be: “here’s what you were doing, here are the recall queries to run if you need more.”

Decision

Implement a three-stage compaction recovery pipeline in the session-lifecycle extension, using Typesense hybrid search (keyword + vector) as the retrieval backbone — not regex pattern matching.

Stage 1: Typesense recall (pre-compaction, continuous)

Hook: turn_end — runs after EVERY turn, not just at compaction time.

The critical insight: session_before_compact fires milliseconds before the compaction LLM call — too late for meaningful extraction without adding latency. Instead, query Typesense incrementally every turn when context usage crosses a threshold.

Mechanism:

  1. On every turn_end, check ctx.getContextUsage().
  2. When context crosses 40% (“warm zone”), fire async recall queries against Typesense using recent user messages as search terms. Cache results in-memory.
  3. When context crosses 60% (“hot zone”), write task context to durable memory via joelclaw memory write and re-query if user messages changed.
  4. Continue querying each turn in the hot zone — stale-hash check prevents redundant queries.

This means by the time compaction fires (at ~80-90%), we’ve already retrieved relevant memories from Typesense, with zero impact on the compaction critical path.

Why Typesense recall, not regex extraction:

  • Local Typesense is fast (~370ms with lean budget, no query rewrite)
  • Hybrid search (keyword + vector) finds semantically relevant memories that regex can’t
  • The async Inngest observe pipeline already handles semantic extraction from transcripts via LLM — this pipeline focuses on retrieval of existing knowledge, not extraction of new signal
  • Regex was removed entirely — it was lossy, crude, and under-leveraged the vector capabilities we already have

From user messages (highest signal — per the spec: “user messages are the signal”):

  • Last 3 user messages → currentTask field (user intent is the most reliable task signal)
  • Last 2 messages joined → recall query (what Typesense searches for)

Recall deduplication: A hash of recent user messages prevents re-querying Typesense when the task hasn’t changed. A recallInFlight flag prevents concurrent recall spawns.

File tracking: tool_execution_start hook captures edit/write file paths into an in-memory Set. These are included in the checkpoint and pointer message.
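The file tracking amounts to very little code. A sketch, assuming a handler shape and a `path` argument name that may not match pi's actual hook API:

```typescript
// In-memory set of modified files; a Set dedupes repeated edits to one path.
const filesModified = new Set<string>();

function onToolExecutionStart(toolName: string, args: Record<string, unknown>): void {
  if (toolName !== "edit" && toolName !== "write") return; // only mutating tools
  const path = args["path"];
  if (typeof path === "string") filesModified.add(path);
}
```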

Memory writes: Task context (user messages + modified files) is written via joelclaw memory write with --category ops --tags compaction-extract,session-task. This lands in Typesense and becomes available for recall in future sessions.

OTEL: Emit compaction.extract.warm (40% threshold crossed) and compaction.extract.hot (each hot-zone flush).

Stage 2: In-memory task checkpoint

Hook: turn_end (at 60% threshold) + session_before_compact (final flush)

Maintain an in-memory TaskCheckpoint built from tracked state. No Redis needed — the checkpoint data flows within the same pi process (turn_end → session_compact).

Checkpoint structure:

```typescript
interface TaskCheckpoint {
  currentTask: string;           // last 3 user msgs joined, ≤500 chars
  filesModified: string[];       // most recent 10
  recallHits: string[];          // actual observations from Typesense (max 5)
  recallQueries: string[];       // validated queries that returned results (max 3)
  compactionCount: number;
  contextPercentAtCapture: number;
}
```

Storage: In-memory variables in the extension factory closure. Same process lifecycle as the pi session — no cross-process coordination needed. Redis was considered but rejected as unnecessary complexity for same-process data flow.

The recallHits and recallQueries fields are key. These contain real observations and validated queries from Typesense, not string-munged guesses. The recall cache accumulates across turns in the warm/hot zone. If Typesense returned results, those queries are known-good. Fallback: derive a simple query from task text if the recall cache is empty.

Stage 3: Post-compaction pointer injection

Hook: session_compact (fires after compaction completes)

Inject a pointer message with real memories from Typesense — validated observations and queries that are known to return results.

Injection content (~150-300 tokens):

```markdown
## Session Recovery
**Task:** {currentTask, 200 chars max}
**Modified:** {top 5 files}
**Related memories:**
- {actual observation from Typesense, 150 chars}
- {actual observation from Typesense, 150 chars}
- {actual observation from Typesense, 150 chars}
**Deeper context:** `recall "{validatedQuery1}"` or `recall "{validatedQuery2}"`
```

This is a signpost backed by real data. The “Related memories” section contains actual observations from Typesense that scored well against the current task. The recall queries are validated — they returned results when we ran them during the warm/hot zone.
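Rendering the pointer from a checkpoint is pure string assembly. A sketch (the `TaskCheckpoint` shape is the ADR's; the truncation limits follow the template above; `clip` is a hypothetical helper):

```typescript
interface TaskCheckpoint {
  currentTask: string;
  filesModified: string[];
  recallHits: string[];
  recallQueries: string[];
  compactionCount: number;
  contextPercentAtCapture: number;
}

// Truncate to n chars, marking the cut with an ellipsis.
const clip = (s: string, n: number): string =>
  s.length > n ? s.slice(0, n - 1) + "…" : s;

function buildRecoveryMessage(cp: TaskCheckpoint): string {
  const lines = [
    "## Session Recovery",
    `**Task:** ${clip(cp.currentTask, 200)}`,
    `**Modified:** ${cp.filesModified.slice(0, 5).join(", ")}`,
  ];
  if (cp.recallHits.length > 0) {
    lines.push("**Related memories:**");
    for (const hit of cp.recallHits.slice(0, 3)) lines.push(`- ${clip(hit, 150)}`);
  }
  const queries = cp.recallQueries.slice(0, 2).map((q) => `\`recall "${q}"\``);
  if (queries.length > 0) lines.push(`**Deeper context:** ${queries.join(" or ")}`);
  return lines.join("\n");
}
```

Sections with no data are simply omitted, which keeps the injection small when the recall cache is sparse.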

Injection: Hidden sendMessage with display: false. NO triggerTurn: true — passive context only.

Post-injection reset: Zone flags (warmZoneEntered, hotZoneEntered) reset after each compaction. Context drops back to 36-55% post-compaction, so the warm/hot zone detection restarts for the next cycle. The recall cache persists across compactions for continuity.

OTEL: Emit compaction.inject with checkpoint metadata (file count, recall hits count, queries count).

Flow after implementation

turn_end (context at 40%)
  → [Stage 1] fire async Typesense recall query (lean budget, ~370ms)
  → cache results in-memory
 
turn_end (context at 60%)
  → [Stage 1] re-query Typesense if user messages changed
  → [Stage 2] write task context to durable memory (joelclaw memory write)
 
turn_end (context at 70%, 75%, 80%...)
  → [Stage 1+2] continue querying + accumulating recall cache
 
session_before_compact (context at ~85%)
  → [EXISTING] flush file ops to daily log
  → [EXISTING] fire memory/session.compaction.pending
  → [Stage 2] final task context write + OTEL checkpoint
 
pi built-in compaction
  → LLM summarizes → lossy summary (unchanged)
 
session_compact
  → [Stage 3] build pointer from in-memory checkpoint + Typesense recall cache
  → inject recovery message (display: false)
  → reset warm/hot zone flags for next cycle
 
(async, minutes later)
  → [EXISTING] Inngest observe pipeline extracts deeper observations via LLM

Consequences

Positive

  • Compaction recovery is no longer zero. The agent resumes with summary + pointer message (task, files, real memories, validated recall queries).
  • Leverages existing Typesense infrastructure. Local hybrid search (~370ms) replaces regex pattern matching. Vector + keyword search finds semantically relevant memories that regex can’t.
  • Retrieval happens gradually, not in the critical path. Async recall queries fire during normal turns. By compaction time, the cache is warm.
  • Validated queries over guesses. Recall queries in the pointer message are known-good — they returned results from Typesense during the warm/hot zone.
  • No new infrastructure. Typesense and joelclaw recall already exist. ctx.getContextUsage() is a built-in pi API. In-memory state needs no external storage.
  • Grind mode benefits most. Long autonomous sessions hit compaction repeatedly. Each recovery gets richer as more recall results accumulate.
  • Observable. OTEL events at each threshold: compaction.extract.warm, compaction.extract.hot, compaction.checkpoint, compaction.inject.
  • Two complementary loops. This pipeline does real-time retrieval. The Inngest observe pipeline does deep LLM-based extraction. They serve different purposes and reinforce each other.

Negative

  • turn_end handler runs every turn. The context % check is cheap (one function call). Recall queries only fire when task hash changes — not every turn. Async and non-blocking.
  • Recall quality depends on existing memory corpus. If the memory system has few observations, recall returns sparse results. The pipeline degrades gracefully — falls back to task-derived queries.
  • Recall cache is in-memory only. If pi crashes (not compacts — crashes), the cache is lost. Acceptable: crash recovery is a different problem than compaction recovery.
  • Pointer message token budget varies. With 3 recall hits (150 chars each) + task + files + queries, the injection is ~150-300 tokens. Larger than the original 100-token target but still well within budget given 36-55% post-compaction headroom.

Neutral

  • Does not change pi’s built-in compaction. The LLM summary is unmodified. We supplement it.
  • Does not change the async observe pipeline. Inngest still fires memory/session.compaction.pending. This pipeline is additive.

Alternatives Considered

A: Custom compaction via session_before_compact return

Pi allows extensions to return a custom CompactionResult, bypassing the built-in LLM summary entirely.

Rejected: Too risky. Pi’s compaction handles edge cases (split turns, file ops, token budgets). Replacing it means maintaining a parallel summarizer. Supplementing is safer.

B: LLM-based pre-compaction extraction (synchronous)

Run an LLM call in session_before_compact to extract decisions and insights.

Rejected: Too slow for the compaction critical path. The async Inngest pipeline already does LLM extraction. The continuous retrieval approach (Stage 1) avoids the critical path entirely.

C: Regex-based signal extraction from assistant messages

Pattern-match decision/failure markers (“decided to”, “the fix was”, “turns out”) from assistant text.

Rejected during implementation. Regex is crude and under-leverages the Typesense infrastructure we already have. Local Typesense with hybrid search (keyword + vector) returns semantically relevant real memories in ~370ms. The async Inngest observe pipeline handles nuanced semantic extraction via LLM — duplicating that with regex is the worst of both worlds.

D: Redis checkpoint with TTL

Write task checkpoint to Redis key session:checkpoint:{sessionId} with 4h TTL.

Rejected during implementation. The checkpoint data flows within a single pi process (turn_end → session_compact). No cross-process coordination needed. In-memory state is simpler, faster, and has no connection/TTL failure modes.

E: Extract only at session_before_compact

Wait for the compaction event to do all extraction.

Rejected: session_before_compact fires in the compaction critical path. Any work there adds latency. Worse, the hook fires once — if it fails or times out, there’s no recovery. Continuous retrieval across turns is resilient (each turn is independent) and zero-latency on the compaction itself.

Implementation

Files modified

| File | Change |
| --- | --- |
| pi/extensions/session-lifecycle/index.ts | All three stages implemented inline. Added tool_execution_start handler for file tracking, turn_end handler for recall + checkpoint, enhanced session_before_compact with final flush, added session_compact handler for pointer injection. |

All code is inline in index.ts

All pi extensions in the repo are single index.ts files. No multi-file extensions exist. The recall helper (runRecall), CLI helpers (spawnJoelclaw, emitOtel, writeMemoryObs), and types (TaskCheckpoint, RecallHit, RecallResult) are all defined in the same file.

External interactions via joelclaw CLI only

All interactions with external services go through the joelclaw CLI:

  • joelclaw recall — Typesense hybrid search (spawned async via runJoelclawJsonCommand)
  • joelclaw memory write — write observations to Typesense (spawned fire-and-forget via spawnJoelclaw)
  • joelclaw otel emit — telemetry (spawned fire-and-forget via spawnJoelclaw)

No direct Redis calls, no direct Typesense HTTP calls, no redis-cli, no ioredis.
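A fire-and-forget spawn helper in this style might look like the following. The `joelclaw` subcommands are from this ADR, but the helper body and the exact argument shape passed to `otel emit` are assumptions, not the shipped code:

```typescript
import { spawn } from "node:child_process";

// Fire-and-forget: telemetry and memory writes must never block or crash
// the pi session. Errors are swallowed by design.
function spawnJoelclaw(args: string[]): void {
  const child = spawn("joelclaw", args, { stdio: "ignore", detached: true });
  child.on("error", () => {}); // missing binary / spawn failure: non-fatal
  child.unref();               // don't keep the pi process alive for the child
}

function emitOtel(event: string, attrs: Record<string, unknown>): void {
  // Argument shape is illustrative only.
  spawnJoelclaw(["otel", "emit", event, JSON.stringify(attrs)]);
}
```

`unref()` plus `stdio: "ignore"` is what makes these calls genuinely fire-and-forget: the parent never waits on the child, and the child can outlive a session exit.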

Verification

  1. Start a session with the extension loaded, do substantive work (10+ turns to hit warm zone)
  2. Check OTEL: joelclaw otel search "compaction.extract" --hours 1 — should show warm and hot events
  3. Wait for compaction (or work until ~85% context)
  4. After compaction, check OTEL: joelclaw otel search "compaction.inject" --hours 1 — should show inject event
  5. Check the session JSONL for customType: "compaction-recovery" message with real Typesense observations
  6. Verify the agent can follow the recall query pointers to get deeper context

Rollback

All three stages are additive — they inject supplementary context but don’t modify pi’s compaction or the existing observe pipeline. To rollback: remove the turn_end, tool_execution_start, and session_compact handlers from session-lifecycle. The extension continues to function with its existing file-op flush and Inngest event emission.

Known Issues & Fixes

Double-compaction cascade (fixed 2026-03-04)

Symptom: Two auto-compactions fire in rapid succession. The first is correct (compacts substantial context). The second fires seconds later and barely compacts anything.

Root cause chain (traced through pi-mono source):

  1. Turn completes → agent_end_checkCompaction() → context exceeds contextWindow - reserveTokens → first compaction fires ✅
  2. Pi generates the compaction summary, saves it, fires session_compact event to extensions
  3. Both extension handlers (session-lifecycle + gateway) call pi.sendMessage() to inject recovery pointers as queued messages
  4. Back in pi’s _runAutoCompaction(), after the extension emit:
    } else if (this.agent.hasQueuedMessages()) {
        setTimeout(() => { this.agent.continue().catch(() => {}); }, 100);
    }
  5. The sendMessage() calls made hasQueuedMessages() return true → continue() fires
  6. Model processes the recovery pointer messages, produces a response
  7. That response triggers agent_end_checkCompaction() again
  8. The compaction summary is large (~16K tokens) + ~35K system prompt + ~20K kept messages + recovery pointers + model response → context exceeds threshold again
  9. Second compaction fires — but there’s only the recovery pointers + one model response to summarize

Key insight: pi.sendMessage() inside session_compact handlers queues messages that trigger continue(), which gets a model response, which triggers another compaction check. The compaction summary + system prompt already consumes 36-55% of the context window (see “Key Constraint” section above), so any additional content can push past the threshold.

Fix: Added a 60-second cooldown guard in both session_compact handlers:

```typescript
const now = Date.now();
const isRapidRecompaction = lastCompactionTs > 0 && (now - lastCompactionTs) < COMPACTION_COOLDOWN_MS;
if (isRapidRecompaction) {
  // Skip injection — first compaction's pointers are still in context
  emitOtel("compaction.inject.skipped", { compactionCount, reason: "rapid-recompaction" });
  return;
}
```

First compaction gets the full recovery pointers. The cascading second compaction (if it still fires due to summary size) skips injection entirely — no new messages queued, no continue() trigger, no cascade.
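The bookkeeping around that guard is small. A sketch (variable names match the snippet above; the design choice of recording the timestamp even on skipped injections keeps a sustained cascade suppressed rather than re-arming after 60s):

```typescript
const COMPACTION_COOLDOWN_MS = 60_000;
let lastCompactionTs = 0;

// Returns true if this compaction should get the full recovery injection.
function shouldInject(now: number): boolean {
  const isRapidRecompaction =
    lastCompactionTs > 0 && now - lastCompactionTs < COMPACTION_COOLDOWN_MS;
  lastCompactionTs = now; // record every compaction, injected or skipped
  return !isRapidRecompaction;
}
```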

OTEL observability for tuning:

| Event | When | Key fields |
| --- | --- | --- |
| compaction.before | session_before_compact | compactionCount, contextPercent, contextTokens, contextLimit, timeSinceLastCompactionMs |
| compaction.inject | session_compact (first) | compactionCount, contentChars, estimatedTokens, file/recall counts |
| compaction.inject.skipped | session_compact (rapid) | compactionCount, reason: "rapid-recompaction" |
| gateway.compaction.inject | gateway session_compact (first) | gwCompactionCount, contentChars, estimatedTokens |
| gateway.compaction.inject.skipped | gateway session_compact (rapid) | gwCompactionCount, reason: "rapid-recompaction" |

Tuning queries:

```bash
# Check if double-compaction is still occurring
joelclaw otel search "compaction.inject.skipped" --hours 24

# See the timing between compactions
joelclaw otel search "compaction.before" --hours 24

# Compare injection sizes to see if pointers are too large
joelclaw otel search "compaction.inject" --hours 24
```

Future considerations:

  • If the 60s cooldown is too aggressive (blocks legitimate rapid compactions on small context windows), reduce to 30s or make it relative to context window size.
  • If the compaction summary itself is too large (pushing past threshold even without our injections), consider returning a custom compaction via session_before_compact that caps summary size.
  • The ideal fix would be in pi itself: don’t call continue() for display-false custom messages, or add a triggerContinue: false option to sendMessage(). Filed as a potential upstream PR.