User Messages Are the Signal — Agent Session Recall Without LLM Extraction
directly comparable to joelclaw observe pipeline — raw message preservation vs LLM extraction, worth benchmarking retrieval quality against Typesense semantic search
Alex Hillman built Kuato to solve the amnesia problem that hits anyone running Claude Code seriously: the model forgets everything between sessions. His answer is to mine the JSONL session transcripts Claude Code already writes to `~/.claude/projects`, extracting the parts that actually matter. The key insight: user messages are the signal. You don’t need the full 50k-token transcript. The user’s requests, decisions, and corrections — combined with files_touched and tools_used — reconstruct what happened cleanly and cheaply.
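A minimal sketch of that extraction step, assuming a transcript schema where each JSONL line has a `type` field and a `message` object with string `content` — the field names here are my guess at the general shape of Claude Code transcripts, not Kuato's actual parsing code:

```typescript
// Pull user messages out of a Claude Code-style JSONL session transcript.
// Field names ("type", "message", "content") are assumptions for illustration.
interface TranscriptLine {
  type?: string;
  message?: { role?: string; content?: unknown };
}

export function extractUserMessages(jsonl: string): string[] {
  const messages: string[] = [];
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    let entry: TranscriptLine;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines rather than failing the whole file
    }
    if (entry.type !== "user") continue;
    const content = entry.message?.content;
    if (typeof content === "string") messages.push(content);
  }
  return messages;
}
```

The point of the design: the extractor ignores assistant output entirely, so the stored record stays small while keeping every request, decision, and correction verbatim.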
The name comes from the mutant in Total Recall who helps Quaid remember who he is. The quote from the README earns its place: “You are what you do. A man is defined by his actions, not his memory.” In session terms: what you asked for is more durable than what the model generated. “Let’s build an email filtering system” → “yes, use that approach” → “actually make it async” → “commit this” — that’s the whole session in four user messages.
Two modes ship: file-based (zero dependencies, Bun only, substring search on JSONL, works in minutes) and PostgreSQL (tsvector full-text with weighted ranking — user messages weighted B, tools weighted C, working directory weighted D, GIN index for sub-100ms lookup). The tiered approach is smart. Start with zero setup, upgrade to Postgres when session volume makes scanning slow. File paths get tokenized by splitting on `/`, `-`, `_`, and `.`, so `septa-holiday-bus/App.jsx` becomes searchable tokens — search “septa” and find sessions that touched that directory.
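The tokenization rule is simple enough to sketch in a few lines. This is my reconstruction of the behavior described above, not Kuato's source; the function name is hypothetical:

```typescript
// Split a file path on "/", "-", "_", and "." so directory and file
// names become individual search tokens (lowercased, empties dropped).
export function tokenizePath(path: string): string[] {
  return path
    .split(/[/\-_.]/)
    .map((token) => token.toLowerCase())
    .filter((token) => token.length > 0);
}

// tokenizePath("septa-holiday-bus/App.jsx")
// → ["septa", "holiday", "bus", "app", "jsx"]
```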
This sits directly adjacent to the joelclaw recall and observe pipeline but makes a different bet. The joelclaw observe pipeline uses LLM extraction to distill facts from sessions into semantic memory — structured compression, clean assertions, searchable via Typesense. Kuato preserves raw user messages as-is: cheaper (no LLM call at ingest), preserves original phrasing, no extraction loss. The tradeoff is real: LLM extraction compresses but loses nuance; raw messages preserve intent but are noisier at retrieval time. Worth running both and comparing what you actually find when you search.
Key Ideas
- User messages are the semantic core of a session — requests, decisions, corrections — not model responses. The model’s output is derivable from the user’s input; the reverse isn’t true
- Two-tier architecture: file-based (zero-dep, instant) → PostgreSQL (tsvector, GIN index, weighted ranking) — start simple, graduate when volume demands it
- tsvector weighting maps to signal strength: user messages (B) > tools used (C) > working directory (D) — higher weight for higher signal
- File path tokenization: paths split on `/`, `-`, `_`, `.` make directory names like `septa-holiday-bus` searchable without special handling
- Named after the Total Recall mutant — “you are what you do, not what you remember” — the session identity is in the actions
- Tradeoff vs LLM extraction: raw messages = zero cost, preserves intent, noisier; LLM extraction = compressed, structured, lossy — explicit design choice, not an oversight
- No instrumentation required — works directly on the `~/.claude/projects` JSONL files Claude Code already writes
- Token tracking baked into the PostgreSQL version, including per-model breakdowns — useful for understanding session cost distribution
- The skill template in `shared/claude-skill.md` shows how to teach Claude to query its own history — recursive session awareness
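The weighted-ranking idea maps directly onto standard PostgreSQL full-text primitives (`setweight`, `to_tsvector`, `ts_rank`, GIN). A sketch of what that schema could look like, expressed as SQL strings — the table and column names are assumptions for illustration, not Kuato's actual schema:

```typescript
// Hypothetical schema: a generated tsvector column combining the three
// signal tiers at weights B/C/D, plus a GIN index over it.
export const createIndexSql = `
  ALTER TABLE sessions ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
      setweight(to_tsvector('english', coalesce(user_messages, '')), 'B') ||
      setweight(to_tsvector('english', coalesce(tools_used, '')), 'C') ||
      setweight(to_tsvector('english', coalesce(working_directory, '')), 'D')
    ) STORED;
  CREATE INDEX sessions_search_idx ON sessions USING GIN (search_vector);
`;

// Ranked lookup: ts_rank folds the per-tier weights into the score, so a
// hit in a user message outranks the same term in a working directory.
export const querySql = `
  SELECT id, ts_rank(search_vector, query) AS rank
  FROM sessions, websearch_to_tsquery('english', $1) AS query
  WHERE search_vector @@ query
  ORDER BY rank DESC
  LIMIT 10;
`;
```

With the GIN index in place, the `@@` match is an index scan rather than a sequential one, which is what makes the sub-100ms claim plausible at volume.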
Links
- alexknowshtml/kuato — the repo
- Alex Hillman — builder, co-founder of Stacking the Bricks
- PostgreSQL Full-Text Search — tsvector/tsquery
- GIN Indexes — what makes the Postgres mode fast
- Total Recall (1990) — the naming source
- Claude Code — the tool whose session files Kuato parses