The memory system that watches itself
Alex Hillman posted something simple:
My session table has a column with the full jsonl transcript, then specific columns for parsed out meta for easy and fast recall.
Columns for:
- user messages
- agent messages
- tool calls
- files touched
- entities mentioned/resolved
- skills invoked
I don’t know that you even need anything beyond that 🤔
I’d been wanting to do this for weeks and couldn’t see the shape. Alex saw the shape.
Two days later the memory system is live, capturing every claude-code and pi turn across two of my machines into a hybrid-searchable archive. It’s also capturing the session I’m using to write this article. The tree view shows the recursion in real time.
Here’s how it went.
I used a skill to slow myself down
The temptation was to start writing code. Instead I loaded Matt Pocock’s domain-model skill (MIT, vendored into skills/domain-model/) and let it interview me.
The skill is a grilling pattern. One question at a time, with its recommended answer, waiting for me to confirm or redirect. No plans, no dumps — just Socratic drilling down the design tree until every branch resolves. You can’t skip ahead; the next question doesn’t form until you commit on this one.
Twelve questions later I had a CONTEXT.md at the repo root with 13 terms and 21 architectural rules. Things that would have bitten me mid-build got named and settled:
- “Session” is already booked. pi calls a conversation a session, claude-code writes `SESSION.md` files, the gateway’s always-on daemon is “the session.” I needed a new word for “one captured agent thing”: Run. Atomic unit = one invocation. Runs form trees via `parent_run_id`.
- Ingestion is Central. Other machines ship raw jsonl + identity. Central does the chunking, embedding, and indexing. Machines never run embedding models. Non-technical family members’ devices need to work with one CLI installed and nothing else.
- Private by default, sharing is explicit. Kids’ data is the forcing case. Default-private scales as trust evolves; default-open does not.
- NAS is authoritative, the search index is rebuildable. Invert the usual DB+search pattern. Each Run is a jsonl blob + metadata.json on NAS. Typesense is a derived index that can be rebuilt from NAS at any time. Schema changes, embedding upgrades, chunking shifts — all “re-walk NAS and rebuild” operations. Not database migrations.
- Identity is PDS. I already run an AT Protocol Personal Data Server. Every family member gets a DID. Every Machine gets an AT Proto App Password. For once the nerd-appeal option was also the right call — portable identity I don’t own the keys to hoard, revocable per-device.
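As a sketch, the Run vocabulary above might map to types like these. The field names are my illustration, not the actual `@joelclaw/memory` schema:

```typescript
// Hypothetical sketch of a Run record and its tree linkage via parent_run_id.
// Field names are illustrative, not the real @joelclaw/memory types.
interface Run {
  run_id: string;
  parent_run_id: string | null; // null for a root Run
  user_id: string;              // DID of the owning family member
  machine_id: string;
  captured_at: string;          // ISO timestamp
}

// Group Runs into a forest keyed by parent_run_id.
function buildForest(runs: Run[]): Map<string | null, Run[]> {
  const children = new Map<string | null, Run[]>();
  for (const run of runs) {
    const siblings = children.get(run.parent_run_id) ?? [];
    siblings.push(run);
    children.set(run.parent_run_id, siblings);
  }
  return children; // children.get(null) are the root Runs
}
```

Everything hangs off `parent_run_id`, which is what makes the tree view later in this post a pure traversal problem.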
The grill also produced the thing I’m most grateful for: explicit non-goals. Federation with external DIDs, signed-request envelopes, PDS audit records, invite-link self-serve, entity linking to canonical contacts — all listed, each with a designed insertion point. When a maintainer comes along (me, tomorrow) those explicit “not now” items prevent the slow slide into scope creep.
I should have been using domain-model on everything for the last month.
The spike before the spec
Before accepting ADR-0243, I wanted to know the embedding pipeline actually worked on my data. A two-hour vertical slice:
- Pull `qwen3-embedding:8b` via Ollama. It’s currently #1 on the MTEB multilingual leaderboard at 70.58. It supports Matryoshka dimension truncation: embed once at 4096-dim, serve at any dimension from 32 to 4096 with no re-embedding. I picked 768 for storage to match nomic’s RAM footprint while keeping the upgrade lever.
- Ingest a real 1,247-line claude-code session directly into a Typesense spike collection.
- Query for things I remembered happening in that session.
708 chunks. “why did the cluster fail” semantically retrieved the connection-refused errors with vector distance 0.28. The qwen3 quality on mixed English+code+tool-output was better than I expected. The architectural choices looked right. I accepted the ADR.
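The Matryoshka claim above boils down to a tiny operation. This is the generic recipe (truncate, then re-normalize), not qwen3-specific code; it only preserves quality for models trained for Matryoshka representation:

```typescript
// Matryoshka-style serving: keep the first k dimensions of the full
// embedding and re-normalize to unit length. Embed once, serve at any k.
function truncateEmbedding(full: number[], k: number): number[] {
  const head = full.slice(0, k);
  const norm = Math.sqrt(head.reduce((s, x) => s + x * x, 0));
  return head.map((x) => x / (norm || 1)); // guard against a zero vector
}
```

This is the “upgrade lever”: the stored 4096-dim vectors never change, only the `k` you serve at.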
The 25x win I almost missed
Running the bulk ingest, I saw this:
```
query latency (idle): ~220ms
query latency (bulk running): 8000-10000ms
```

Query embedding during a bulk backfill took 8-10 seconds. Unusable for interactive search. The fix I’d sketched in the ADR (“use Inngest priority to schedule embeds”) was wrong. Inngest priority only matters at the HTTP layer. Ollama serializes embed calls internally. Pooling concurrent HTTP requests doesn’t help; they just queue inside Ollama.
The real fix had to be before the HTTP call. An in-process priority queue with three lanes:
- `query`: interactive search, never starved
- `ingest-realtime`: live captures, normal priority
- `ingest-bulk`: reindex and backfill, lowest priority; drops out when anything else arrives
`PriorityEmbedClient` in `@joelclaw/inference-router/embeddings.ts`. Single-writer semaphore. When a query arrives, it jumps to the head of the queue and goes as soon as the currently-in-flight embed finishes.
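A minimal sketch of that single-writer, three-lane idea (illustrative, not the actual `@joelclaw/inference-router` code):

```typescript
type Lane = "query" | "ingest-realtime" | "ingest-bulk";
const LANE_ORDER: Lane[] = ["query", "ingest-realtime", "ingest-bulk"];

interface Job {
  work: () => Promise<unknown>;
  resolve: (v: unknown) => void;
  reject: (e: unknown) => void;
}

class PriorityEmbedQueue {
  private lanes: Record<Lane, Job[]> = {
    "query": [],
    "ingest-realtime": [],
    "ingest-bulk": [],
  };
  private busy = false;

  submit<T>(lane: Lane, work: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.lanes[lane].push({ work, resolve: resolve as (v: unknown) => void, reject });
      void this.drain();
    });
  }

  private async drain(): Promise<void> {
    if (this.busy) return;           // single writer: one embed in flight
    let job: Job | undefined;
    for (const lane of LANE_ORDER) { // highest-priority non-empty lane wins
      job = this.lanes[lane].shift();
      if (job) break;
    }
    if (!job) return;
    this.busy = true;
    try {
      job.resolve(await job.work());
    } catch (e) {
      job.reject(e);
    } finally {
      this.busy = false;
      void this.drain();             // queries that arrived mid-flight go first now
    }
  }
}
```

The point is where the prioritization lives: before the HTTP call, so a query only ever waits for the one embed already in flight, never for the bulk backlog.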
First test failed at 2354ms under bun test. Long enough to make me think the whole premise was wrong.
Turns out the repo uses vitest, not bun:test; the slow first run came from a stale assumption about the test runner. Re-ran under vitest:
```
query_total_ms=338 queued_ms=155 compute_ms=183
depth_before_submit={"query":0,"ingest-realtime":0,"ingest-bulk":98}
```

338ms with 98 bulk embeds queued ahead. A 25x improvement over the 8-10s baseline. Rule 9a validated, committed as a verification checkbox in the ADR.
I updated the ADR with a deviation note: the original Rule 20 said “validate bearer via PDS createSession on every request, cached 60s.” When I got there I realized sha256-hash-the-token-and-look-it-up-in-Typesense was cheaper, safer, and didn’t create server-side session state at the PDS. The deviation is documented alongside the accepted checkbox. The spec is a living document, not a contract.
Phase 1: substrate live
With the spike validated, I scaffolded the package structure:
- `packages/memory/`: types, Typesense schemas, NAS path helpers (user-partitioned: `<base>/<user_id>/<yyyy-mm>/<run-id>.{jsonl,metadata.json}`), and a format-aware chunker that handles both claude-code and pi jsonl shapes.
- `POST /api/runs`: accepts jsonl + metadata, writes the blob to NAS-equivalent storage, fires a `memory/run.captured` Inngest event. Returns a 202 HATEOAS envelope with `run_id` and `_links`.
- `memory/run.captured` Inngest function: loads the jsonl, chunks per-turn, embeds each chunk at `ingest-realtime` priority through the router, writes chunks to Typesense, writes the Run row, emits OTEL. Retries on idempotent steps.
- `POST /api/runs/search`: hybrid BM25 + vector. Auto-applied privacy filter (`readable_by:=<caller>`) derived from the bearer token, never from the request body. No way to spoof whose Runs you see.
- `GET /api/runs/:id`, `:id/jsonl`, `:id/descendants`, `/api/runs/forest`: traversal endpoints.
- Claude Code `Stop` hook that delta-captures each turn into `~/.joelclaw/session-state.json`.
- pi extension on `turn_end` + `session_shutdown` with the same delta-capture pattern.
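The search route’s privacy rule can be sketched as a builder whose filter comes only from the authenticated caller. The parameter names follow Typesense’s search API; the collection and field names are my assumptions:

```typescript
// Sketch of the server-side privacy rule: filter_by is derived from the
// bearer-authenticated identity, never from anything in the request body.
// Collection/field names ("run_chunks", "text", "embedding") are illustrative.
function buildRunSearch(callerUserId: string, q: string, queryVector: number[]) {
  return {
    collection: "run_chunks",
    q,
    query_by: "text,embedding", // hybrid: BM25 over text + vector over embedding
    vector_query: `embedding:([${queryVector.join(",")}], k:50)`,
    // Non-negotiable line: the caller cannot supply or override this filter.
    filter_by: `readable_by:=${callerUserId}`,
  };
}
```

Because the request body never reaches `filter_by`, a client cannot widen its own visibility, which is the whole point of “no way to spoof whose Runs you see.”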
First E2E: a 169KB claude-code fixture → 82 chunks indexed in milliseconds, search returned them in 202ms. The architecture worked end-to-end on real data before I touched auth.
One visible failure along the way: my first deploy of the system-bus-worker crashloop’d in k8s because the Dockerfile’s runtime stage has an explicit COPY list per package. I’d added @joelclaw/memory to the workspace but not to that list. Five restarts before I looked at kubectl logs and saw:
```
error: ENOENT reading "/app/packages/system-bus/node_modules/@joelclaw/memory"
```

One-line fix in the Dockerfile. The lesson was exactly the one in ADR-0243’s operational failure modes: runtime assumptions that feel obvious in dev can fail in k8s in a way that looks like “the hook isn’t firing” when it’s actually “the worker binary doesn’t contain the code yet.”
Phase 3: real identity
The dev bearer was a placeholder. Phase 3 replaced it with real AT Proto App Passwords.
- `joelclaw-machine-register --name <n> --user <u>` calls `com.atproto.server.createAppPassword` on my existing DID session. Returns a plaintext App Password once (never stored on Central in plaintext).
- The script hashes it (sha256), upserts a row in the `machines_dev` Typesense collection, and writes the plaintext to `~/.joelclaw/auth.json` (0600 permissions).
- `authenticateMemoryRequest` middleware in the Next.js API routes: read bearer, hash, look up machine. Get back `(user_id, machine_id, did)`. No per-request PDS call.
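A sketch of that hash-and-look-up step (illustrative, not the actual `authenticateMemoryRequest` code), with a plain `Map` standing in for the Typesense collection:

```typescript
import { createHash } from "node:crypto";

// The bearer is never stored or compared in plaintext: its sha256 hex digest
// is the lookup key for the Machine row. Row shape is my assumption.
interface MachineRow {
  token_sha256: string;
  user_id: string;
  machine_id: string;
  did: string;
  revoked_at: string | null;
}

function sha256Hex(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

function authenticate(
  bearer: string,
  machines: Map<string, MachineRow>, // keyed by token_sha256
): { user_id: string; machine_id: string; did: string } | null {
  const row = machines.get(sha256Hex(bearer));
  if (!row || row.revoked_at !== null) return null; // unknown or revoked
  const { user_id, machine_id, did } = row;
  return { user_id, machine_id, did };
}
```

Note that revocation falls out for free: marking `revoked_at` on the row makes the very next request fail, no PDS round-trip required.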
Revocation is a two-step: call PDS `revokeAppPassword` and mark the Machine row `revoked_at`. The next POST with that bearer fails cleanly.
Registered my central Mac Mini as the first Machine. `POST /api/runs` with the new bearer returned 202. Search with the new bearer returned hits. Dev bearer fallback is still in the middleware for graceful transition; in prod you set `MEMORY_DEV_BEARER_TOKENS={}` and real App Passwords take over entirely.
The laptop joined the Network in twelve minutes
Phase 3 done on the central node, I wanted to prove the second-Machine story. `ssh joel@<laptop>`:
- Install bun via the one-liner (`curl -fsSL https://bun.sh/install | bash`).
- `rsync` three scripts to `~/.joelclaw/bin/` and symlink into `~/.bun/bin/`: capture-session, runs-search, runs-tree.
- From the central node: `joelclaw-machine-register --name <laptop> --user joel --no-write-auth`. Prints a new App Password once. Writes the Machine row to Typesense.
- `ssh` the plaintext into the laptop’s `~/.joelclaw/auth.json` (0600).
- Add a `Stop` hook to the laptop’s `~/.claude/settings.json` with `JOELCLAW_CENTRAL_URL` pointing at the central node via Tailscale MagicDNS.
That was it. Twelve minutes, most of it waiting for bun to download.
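For reference, the `Stop` hook wiring from that last step might look roughly like this in `~/.claude/settings.json`. The hostname and script path are placeholders, not my actual values, and the exact shape follows Claude Code’s documented hooks config:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "JOELCLAW_CENTRAL_URL=https://central.<tailnet>.ts.net $HOME/.joelclaw/bin/capture-session"
          }
        ]
      }
    ]
  }
}
```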
Seven minutes after that, the laptop was already capturing. The log showed:
```
[17:00:33Z] captured run_id=e62bb02... session=b7af7966-...
            delta_bytes=4429883 turns=648
[17:09:36Z] captured run_id=7f3805c... session=b7af7966-...
            delta_bytes=35028 turns=12
```

First fire captured the day’s backlog (4.4MB, 648 turns). Second fire, nine minutes later, a 35KB / 12-turn delta. The incremental byte-offset tracking meant I didn’t re-embed anything I’d already indexed. Tree view now shows both Machines, color-badged so I can see which one produced which Run at a glance.
```
├─ claude-code @overlook 17m ago 19t #captured
│   "seems like an excessive amount of time..."
│   └─ claude-code @overlook 5m ago 13t #captured
│       "is that how books always were..."
│       └─ claude-code @overlook 1m ago 22t #captured
│           "<task-notification>..."
│           └─ claude-code @overlook 0m ago 5t #captured
│               "this is a critical job of joelclaw..."
│
├─ claude-code @flagg 15m ago 2t #laptop-hello
│   "hello from laptop via tailnet"
└─ claude-code @flagg 5m ago 18t #captured
    "..."
```

The system is watching itself
This article is being drafted in a claude-code session on the central node. When I hit submit and this response finishes generating, the Stop hook will fire. It’ll read ~/.joelclaw/session-state.json, see that the last capture ended at byte offset N, read from N to current EOF, and POST the delta to /api/runs.
The Inngest function will chunk the delta, embed each turn through the priority queue at ingest-realtime, write chunks to Typesense, write a Run row with parent_run_id linking to the previous turn of this same session.
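The byte-offset delta logic can be sketched like this (illustrative, not the actual Stop hook; the state-file shape is my assumption):

```typescript
import { statSync, openSync, readSync, closeSync } from "node:fs";

// Remember the byte offset of the last capture per session file, then read
// only the new tail of the jsonl. That delta is what gets POSTed to /api/runs.
interface SessionState {
  [sessionPath: string]: { offset: number };
}

function readDelta(state: SessionState, sessionPath: string): Buffer {
  const last = state[sessionPath]?.offset ?? 0;
  const size = statSync(sessionPath).size;
  if (size <= last) return Buffer.alloc(0); // nothing new since last capture
  const fd = openSync(sessionPath, "r");
  try {
    const delta = Buffer.alloc(size - last);
    readSync(fd, delta, 0, delta.length, last); // read from last offset to EOF
    state[sessionPath] = { offset: size };      // advance the watermark
    return delta;
  } finally {
    closeSync(fd);
  }
}
```

This is why the second laptop fire shipped 35KB instead of 4.4MB: nothing before the watermark is ever re-read, re-shipped, or re-embedded.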
Thirty seconds from now, the memory system will contain the article about the memory system. joelclaw-runs-search "memory that watches itself" will surface this text as chunks I can retrieve into any future agent’s context.
That’s not a metaphor. That’s the architecture delivered.
Credits
- Alex Hillman for the column structure that started this. The shape was hiding in plain sight until he posted it.
- Matt Pocock for the `domain-model` skill. MIT-licensed Socratic grill that produced more clarity in an hour than I would have gotten from a week of solo design doc writing.
- pi’s `/tree` (from badlogic/pi-mono) for the session-tree navigation pattern I ported to work globally across runtimes.
- Every prior memory ADR in joelclaw (ADR-0021, ADR-0077, ADR-0082, ADR-0190, ADR-0195) that built the curated-notes layer this Run archive sits beneath.
The next agent to read this file — whether it’s me tomorrow or claude-code on the laptop next week — will get it from the memory system itself. Which was the point.