User Messages Are the Signal — Agent Session Recall Without LLM Extraction

repo · ai · memory · recall · agent-loops · claude-code · postgresql · search · typescript

Directly comparable to the joelclaw observe pipeline — raw message preservation vs LLM extraction; worth benchmarking retrieval quality against Typesense semantic search.

Alex Hillman built Kuato to solve the amnesia problem that hits anyone running Claude Code seriously: the model forgets everything between sessions. His answer is to mine the JSONL session transcripts Claude Code writes to ~/.claude/projects locally, extracting the parts that actually matter. The key insight: user messages are the signal. You don’t need the full 50k-token transcript. The user’s requests, decisions, and corrections — combined with files_touched and tools_used — reconstruct what happened cleanly and cheaply.
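As a minimal sketch of that extraction step: filter the JSONL transcript down to user entries and keep only their text. The field names below (`type`, `message.content`) are assumptions about Claude Code's transcript format, not a documented schema, and the real Kuato implementation may differ.

```typescript
// Hypothetical shape of one JSONL transcript entry (assumed, not documented).
interface SessionEntry {
  type?: string;
  message?: { role?: string; content?: string };
}

// Keep only the user's messages — the signal — and drop model output.
function extractUserMessages(jsonl: string): string[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as SessionEntry)
    .filter((e) => e.type === "user" && typeof e.message?.content === "string")
    .map((e) => e.message!.content!);
}
```

Run over a session file, this yields the handful of requests, decisions, and corrections that reconstruct the session — a few hundred tokens instead of 50k.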

The name comes from the mutant in Total Recall who helps Quaid remember who he is. The quote from the README earns its place: “You are what you do. A man is defined by his actions, not his memory.” In session terms: what you asked for is more durable than what the model generated. “Let’s build an email filtering system” → “yes, use that approach” → “actually make it async” → “commit this” — that’s the whole session in four user messages.

Two modes ship: file-based (zero dependencies, Bun only, substring search on JSONL, works in minutes) and PostgreSQL (tsvector full-text with weighted ranking — user messages weighted B, tools weighted C, working directory weighted D, GIN index for sub-100ms lookup). The tiered approach is smart. Start with zero setup, upgrade to Postgres when session volume makes scanning slow. File paths get tokenized by splitting on /, -, _, . so septa-holiday-bus/App.jsx becomes searchable tokens — search “septa” and find sessions that touched that directory.

This sits directly adjacent to the joelclaw recall and observe pipeline but makes a different bet. The joelclaw observe pipeline uses LLM extraction to distill facts from sessions into semantic memory — structured compression, clean assertions, searchable via Typesense. Kuato preserves raw user messages as-is: cheaper (no LLM call at ingest), preserves original phrasing, no extraction loss. The tradeoff is real: LLM extraction compresses but loses nuance; raw messages preserve intent but are noisier at retrieval time. Worth running both and comparing what you actually find when you search.

Key Ideas

  • User messages are the semantic core of a session — requests, decisions, corrections — not model responses. The model’s output is derivable from the user’s input; the reverse isn’t true
  • Two-tier architecture: file-based (zero-dep, instant) → PostgreSQL (tsvector, GIN index, weighted ranking) — start simple, graduate when volume demands it
  • tsvector weighting maps to signal strength: user messages (B) > tools used (C) > working directory (D) — higher weight for higher signal
  • File path tokenization: paths split on /, -, _, . make directory names like septa-holiday-bus searchable without special handling
  • Named after the Total Recall mutant — “you are what you do, not what you remember” — the session identity is in the actions
  • Tradeoff vs LLM extraction: raw messages = zero cost, preserves intent, noisier; LLM extraction = compressed, structured, lossy — explicit design choice, not an oversight
  • No instrumentation required — works directly on ~/.claude/projects JSONL files Claude Code already writes
  • Token tracking baked into the PostgreSQL version, including per-model breakdowns — useful for understanding session cost distribution
  • The skill template in shared/claude-skill.md shows how to teach Claude to query its own history — recursive session awareness
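The B/C/D tsvector weighting from the PostgreSQL tier could be wired up roughly like this — table and column names here are illustrative, not Kuato's actual schema:

```sql
-- Weighted search document: user messages (B) > tools (C) > working dir (D).
ALTER TABLE sessions ADD COLUMN search tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(user_messages, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(tools_used, '')),    'C') ||
    setweight(to_tsvector('english', coalesce(working_dir, '')),   'D')
  ) STORED;

-- GIN index keeps lookups fast as session volume grows.
CREATE INDEX sessions_search_idx ON sessions USING GIN (search);

-- Rank so that matches in user messages dominate matches in metadata.
SELECT id, ts_rank(search, query) AS rank
FROM sessions, websearch_to_tsquery('english', 'septa') AS query
WHERE search @@ query
ORDER BY rank DESC;
```

ts_rank's default weight array scores B above C above D, so a query term found in a user message outranks the same term found only in a tool name or path.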