[CONTEXT FOR AI ASSISTANTS]
You're reading the agent markdown version of a post from JoelClaw (https://joelclaw.com).
Author: Joel Hooks — software engineer building a personal AI operating system.

How to use implementation details in this document:
- Treat commands, code blocks, and config as reference implementations.
- Preserve ordering and architecture, but adapt hostnames, paths, versions, and credentials.
- Prefer details in <AgentOnly> blocks when present; they contain operational context removed from the human page.
- Verify assumptions against the current repo/system state before executing changes.

This is a narrative article from a real build. Technical details reflect Joel's setup and should be adapted to your environment.

If you cite this, link to the original: https://joelclaw.com/memory-that-watches-itself
If you quote Joel, attribute him by name. Don't paraphrase opinions as facts.

Site index: https://joelclaw.com/sitemap.md
Machine-readable: https://joelclaw.com/llms.txt

Other posts on this site:
- [The healer was the killer](https://joelclaw.com/healer-was-the-killer.md)
- [Agentic AI Optimization: Implementation Checklist](https://joelclaw.com/aaio-implementation-checklist.md)
- [Redis, Dkron, Restate, and Sandboxes](https://joelclaw.com/redis-dkron-restate-and-sandboxes.md)
- [Dogfooding Story 4: the queue observer earns dry-run, not enforce](https://joelclaw.com/dogfooding-story-4-queue-observer.md)
- [Contributing to pi-mono with a public maintainer corpus](https://joelclaw.com/contributing-to-pi-mono-with-a-public-maintainer-corpus.md)
- [AI Job Scheduling on Mac as Local-First Video Infrastructure](https://joelclaw.com/ai-job-scheduling-macos-launchd.md)
- [Breakable Toys in the Wild: Apprenticeship Patterns and the joelclaw Experiment](https://joelclaw.com/breakable-toys-joelclaw.md)
- [Utah and joelclaw: Convergent Architecture](https://joelclaw.com/utah-joelclaw-convergent-architecture.md)
- [The Harness Is a Framework](https://joelclaw.com/the-harness-is-a-framework.md)
- [The Agent Memory System](https://joelclaw.com/the-memory-system.md)
- [JoelClaw is a Claw-like Organism](https://joelclaw.com/joelclaw-is-a-claw-like-organism.md)
- [The Agent Writing Loop](https://joelclaw.com/the-writing-loop.md)
- [Talon: the watchdog that finally bites](https://joelclaw.com/talon-watchdog-that-finally-bites.md)
- [The Knowledge Adventure Club Graph](https://joelclaw.com/knowledge-adventure-club-graph.md)
- [MineClaw](https://joelclaw.com/mineclaw.md)
- [Build a Voice Agent That Answers the Phone](https://joelclaw.com/build-a-voice-agent-that-answers-the-phone.md)
- [Plan 9 from Bell Labs: What Rob Pike Built After Unix](https://joelclaw.com/plan-9-pike-everything-is-a-file.md)
- [Propositions as Sessions: What Armstrong Built and Wadler Proved](https://joelclaw.com/propositions-as-sessions-armstrong-wadler.md)
- [Cache Components Patterns Skill for Next.js 16+ Applications](https://joelclaw.com/cache-components-patterns-skill-for-nextjs.md)
- [Karpathy Says We're Building "Claws"](https://joelclaw.com/karpathy-claws-as-category.md)
- [Voice Agent: A Rough Edge Experiment](https://joelclaw.com/voice-agent-deployment-deep-dive.md)
- [Extending Pi Coding Agent with Custom Tools and Widgets](https://joelclaw.com/extending-pi-with-custom-tools.md)
- [The Soul of Erlang Made Me Question Everything](https://joelclaw.com/soul-of-erlang-beam-evaluation.md)
- [CLI Design for AI Agents](https://joelclaw.com/cli-design-for-ai-agents.md)
- [Building a Gateway for Your AI Agent](https://joelclaw.com/building-a-gateway-for-your-ai-agent.md)
- [Self-Hosting Inngest: A Background Task Manager for AI Agents](https://joelclaw.com/self-hosting-inngest-background-tasks.md)
- [The One Where Joel Deploys Kubernetes... Again](https://joelclaw.com/joel-deploys-k8s.md)
- [How I Built an Observation Pipeline So My AI Remembers Yesterday](https://joelclaw.com/observation-pipeline-persistent-ai-memory.md)
- [Riding the Token Wave: Sean Grove at Everything NYC](https://joelclaw.com/riding-the-token-wave-sean-grove.md)
- [Playing with AT Protocol as a Data Layer](https://joelclaw.com/at-protocol-as-bedrock.md)
- [Building My Own OpenClaw on a Mac Mini](https://joelclaw.com/building-my-own-openclaw.md)
- [Inngest is the Nervous System](https://joelclaw.com/inngest-is-the-nervous-system.md)
- [OpenClaw: Peter Steinberger on Lex Fridman](https://joelclaw.com/openclaw-peter-steinberger-lex-fridman.md)
[END CONTEXT]

---
# The memory system that watches itself

> Two days from an Alex Hillman tweet to a hybrid-searchable, cross-Machine, PDS-authenticated Run archive capturing its own construction. How domain-model grilling, priority-lane embeds, and one SSH session onto the laptop got the whole thing live.

By Joel Hooks · 2026-04-20T17:28:25.696Z
Original: https://joelclaw.com/memory-that-watches-itself
Mode: agent

---
[Alex Hillman posted something simple](https://x.com/alexhillman/status/1913041728888963395):

> My session table has a column with the full jsonl transcript, then specific columns for parsed out meta for easy and fast recall.
>
> Columns for:
>
> * user messages
> * agent messages
> * tool calls
> * files touched
> * entities mentioned/resolved
> * skills invoked
>
> I don't know that you even need anything beyond that 🤔

I'd been wanting to do this for weeks and couldn't see the shape. Alex saw the shape.

Two days later the memory system is live, capturing every claude-code and pi turn across two of my machines into a hybrid-searchable archive. It's also capturing the session I'm using to write this article. The tree view shows the recursion in real time.

Here's how it went.

## I used a skill to slow myself down

The temptation was to start writing code. Instead I loaded [Matt Pocock's `domain-model` skill](https://github.com/mattpocock/skills/tree/main/domain-model) (MIT, vendored into `skills/domain-model/`) and let it interview me.

The skill is a grilling pattern. One question at a time, with its recommended answer, waiting for me to confirm or redirect. No plans, no dumps — just Socratic drilling down the design tree until every branch resolves. You can't skip ahead; the next question doesn't form until you commit on this one.

Twelve questions later I had a `CONTEXT.md` at the repo root with **13 terms** and **21 architectural rules**. Things that would have bitten me mid-build got named and settled:

* **"Session" is already booked.** pi calls a conversation a session, claude-code writes `SESSION.md` files, the gateway's always-on daemon is "the session." I needed a new word for "one captured agent thing." **Run**. Atomic unit = one invocation. Runs form trees via `parent_run_id`.
* **Ingestion happens on Central.** Other machines ship raw jsonl + identity; Central does the chunking, embedding, and indexing. Machines never run embedding models. Non-technical family members' devices need to work with **one CLI installed and nothing else**.
* **Private by default, sharing is explicit.** Kids' data is the forcing case. Default-private scales as trust evolves; default-open does not.
* **NAS is authoritative, the search index is rebuildable.** Invert the usual DB+search pattern. Each Run is a jsonl blob + metadata.json on NAS. Typesense is a derived index that can be rebuilt from NAS at any time. Schema changes, embedding upgrades, chunking shifts — all "re-walk NAS and rebuild" operations. Not database migrations.
* **Identity is PDS.** I already run an AT Protocol Personal Data Server. Every family member gets a DID. Every Machine gets an AT Proto App Password. For once the nerd-appeal option was also the right call — portable identity I don't own the keys to hoard, revocable per-device.
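The Run decision above reduces to a small data shape. A minimal sketch — only `parent_run_id` is named in the post; the other field names and the forest helper are illustrative assumptions:

```typescript
// Minimal sketch of the Run unit. Only parent_run_id comes from the post;
// the remaining field names are illustrative assumptions.
interface Run {
  run_id: string;
  parent_run_id: string | null; // null for a root Run
  user_id: string;
  machine_id: string;
}

// Runs form trees: group children under their parent to walk a forest.
function buildForest(runs: Run[]): Map<string | null, Run[]> {
  const children = new Map<string | null, Run[]>();
  for (const run of runs) {
    const siblings = children.get(run.parent_run_id) ?? [];
    siblings.push(run);
    children.set(run.parent_run_id, siblings);
  }
  return children;
}
```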

The grill also produced the thing I'm most grateful for: explicit **non-goals**. Federation with external DIDs, signed-request envelopes, PDS audit records, invite-link self-serve, entity linking to canonical contacts — all listed, each with a designed insertion point. When a maintainer comes along (me, tomorrow) those explicit "not now" items prevent the slow slide into scope creep.

I should have been using `domain-model` on everything for the last month.

## The spike before the spec

Before accepting ADR-0243, I wanted to know that the embedding pipeline actually worked on my data. A two-hour vertical slice:

* Pull `qwen3-embedding:8b` via Ollama. It's currently [#1 on the MTEB multilingual leaderboard](https://huggingface.co/spaces/mteb/leaderboard) at 70.58. It supports **Matryoshka dimension truncation** — embed once at 4096-dim, serve at any dimension from 32 to 4096 with no re-embedding. I picked 768 for storage to match nomic's RAM footprint while keeping the upgrade lever.
* Ingest a real 1,247-line claude-code session directly into a Typesense spike collection.
* Query for things I remembered happening in that session.
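Matryoshka truncation in practice is just prefix-then-renormalize: keep the first N components of the full vector and re-normalize before cosine search. A generic sketch (not the actual router code):

```typescript
// Matryoshka truncation: embed once at full dimension (e.g. 4096), serve any
// prefix (e.g. 768). The prefix must be L2-renormalized before vector search.
function truncateEmbedding(full: number[], dim: number): number[] {
  const prefix = full.slice(0, dim);
  const norm = Math.hypot(...prefix);
  return norm === 0 ? prefix : prefix.map((x) => x / norm);
}
```

The storage lever the post describes falls out of this: re-serving at a larger dimension later means re-slicing stored full-width vectors, not re-embedding.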

**708 chunks. "why did the cluster fail" semantically retrieved the connection-refused errors with vector distance 0.28.** The qwen3 quality on mixed English+code+tool-output was better than I expected. The architectural choices looked right. I accepted the ADR.

## The 25x win I almost missed

Running the bulk ingest, I saw this:

```
query latency (idle):       ~220ms
query latency (bulk running): 8000-10000ms
```

Query embedding during a bulk backfill was **8-10 seconds**. Unusable for interactive search. The fix I'd sketched in the ADR — "use Inngest priority to schedule embeds" — was wrong. Inngest priority only matters at the HTTP layer. **Ollama serializes embed calls internally.** Pooling concurrent HTTP requests doesn't help; they just queue inside Ollama.

The real fix had to be before the HTTP call. An in-process priority queue with three lanes:

* `query` — interactive search, never starved
* `ingest-realtime` — live captures, normal priority
* `ingest-bulk` — reindex and backfill, lowest, drops out when anything else arrives

`PriorityEmbedClient` in `@joelclaw/inference-router/embeddings.ts`. Single-writer semaphore. When a `query` arrives, it jumps to the head of the queue and goes as soon as the currently-in-flight embed finishes.
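The lane mechanics can be sketched in a few lines. Lane names are from the post; the class shape and method names are assumptions, not the actual `@joelclaw/inference-router` implementation:

```typescript
// Three-lane, single-writer embed queue: one embed in flight at a time,
// and the highest-priority non-empty lane always dequeues next.
type Lane = "query" | "ingest-realtime" | "ingest-bulk";
const LANE_ORDER: Lane[] = ["query", "ingest-realtime", "ingest-bulk"];

type Job = { run: () => Promise<any>; resolve: (v: any) => void; reject: (e: unknown) => void };

class PriorityEmbedQueue {
  private lanes: Record<Lane, Job[]> = { "query": [], "ingest-realtime": [], "ingest-bulk": [] };
  private busy = false; // single-writer semaphore

  submit<T>(lane: Lane, run: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.lanes[lane].push({ run, resolve, reject });
      void this.drain();
    });
  }

  // Pop from the highest-priority non-empty lane.
  private next(): Job | undefined {
    for (const lane of LANE_ORDER) {
      const job = this.lanes[lane].shift();
      if (job) return job;
    }
  }

  private async drain(): Promise<void> {
    if (this.busy) return; // the in-flight embed finishes first
    this.busy = true;
    try {
      for (let job = this.next(); job; job = this.next()) {
        try { job.resolve(await job.run()); } catch (e) { job.reject(e); }
      }
    } finally {
      this.busy = false;
    }
  }
}
```

A `query` submitted while bulk work is queued runs as soon as the current embed completes, which is exactly the "jumps to the head" behavior described above.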

First test failed at **2354ms** under `bun test`. Long enough to make me think the whole premise was wrong.

Turns out the repo uses vitest, not bun:test — I'd benchmarked under the wrong runner on a stale assumption. Re-ran under vitest:

```
query_total_ms=338  queued_ms=155  compute_ms=183
depth_before_submit={"query":0,"ingest-realtime":0,"ingest-bulk":98}
```

**338ms with 98 bulk embeds queued ahead.** A 25x improvement over the 8-10s baseline. Rule 9a validated, committed as a verification checkbox in the ADR.

I updated the ADR with a deviation note: the original Rule 20 said "validate bearer via PDS createSession on every request, cached 60s." When I got there I realized sha256-hash-the-token-and-look-it-up-in-Typesense was cheaper, safer, and didn't create server-side session state at the PDS. The deviation is documented alongside the accepted checkbox. The spec is a living document, not a contract.

## Phase 1: substrate live

With the spike validated, I scaffolded the package structure:

* `packages/memory/` — types, Typesense schemas, NAS path helpers (user-partitioned: `<base>/<user_id>/<yyyy-mm>/<run-id>.{jsonl,metadata.json}`), format-aware chunker that handles both claude-code and pi jsonl shapes.
* `POST /api/runs` — accepts jsonl + metadata, writes the blob to NAS-equivalent storage, fires a `memory/run.captured` Inngest event. Returns a 202 HATEOAS envelope with `run_id` and `_links`.
* `memory/run.captured` Inngest function — loads jsonl, chunks per-turn, embeds each chunk at `ingest-realtime` priority through the router, writes chunks to Typesense, writes the Run row, emits OTEL. Retries on idempotent steps.
* `POST /api/runs/search` — hybrid BM25 + vector. **Auto-applied privacy filter** (`readable_by:=<caller>`) derived from the bearer token, never from the request body. No way to spoof whose Runs you see.
* `GET /api/runs/:id`, `:id/jsonl`, `:id/descendants`, `/api/runs/forest` — traversal endpoints.
* Claude Code `Stop` hook that delta-captures each turn into `~/.joelclaw/session-state.json`.
* pi extension on `turn_end` + `session_shutdown` with the same delta-capture pattern.
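The user-partitioned NAS layout from the first bullet can be sketched as a tiny path helper — the function name is an assumption, the layout is from the list above:

```typescript
// NAS layout: <base>/<user_id>/<yyyy-mm>/<run-id>.{jsonl,metadata.json}
// (helper name is illustrative; the layout itself is from the post)
function runPaths(base: string, userId: string, runId: string, capturedAt: Date) {
  const yyyyMm = capturedAt.toISOString().slice(0, 7); // e.g. "2026-04"
  const stem = `${base}/${userId}/${yyyyMm}/${runId}`;
  return { jsonl: `${stem}.jsonl`, metadata: `${stem}.metadata.json` };
}
```

Because Typesense is a derived index, everything the rebuild needs is recoverable by walking these paths.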

First E2E: a 169KB claude-code fixture → **82 chunks indexed in milliseconds**, search returned them in 202ms. The architecture worked end-to-end on real data before I touched auth.

**One visible failure along the way**: my first deploy of the system-bus-worker crashloop'd in k8s because the Dockerfile's runtime stage has an explicit COPY list per package. I'd added `@joelclaw/memory` to the workspace but not to that list. Five restarts before I looked at `kubectl logs` and saw:

```
error: ENOENT reading "/app/packages/system-bus/node_modules/@joelclaw/memory"
```

One-line fix in the Dockerfile. The lesson was exactly the one in ADR-0243's operational failure modes: runtime assumptions that feel obvious in dev can fail in k8s in a way that looks like "the hook isn't firing" when it's actually "the worker binary doesn't contain the code yet."

## Phase 3: real identity

The dev bearer was a placeholder. Phase 3 replaced it with real AT Proto App Passwords.

* `joelclaw-machine-register --name <n> --user <u>` calls `com.atproto.server.createAppPassword` on my existing DID session. Returns a plaintext App Password once (never stored on Central in plaintext).
* The script hashes it (sha256), upserts a row in the `machines_dev` Typesense collection, writes the plaintext to `~/.joelclaw/auth.json` (0600 permissions).
* `authenticateMemoryRequest` middleware in the Next.js API routes: read bearer, hash, look up machine. Get back `(user_id, machine_id, did)`. No per-request PDS call.

Revocation is a two-step: call PDS `revokeAppPassword` + mark the Machine row `revoked_at`. The next POST with that bearer fails cleanly.
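A minimal sketch of that hash-and-look-up flow, using an in-memory `Map` where the real middleware queries the `machines_dev` Typesense collection (names beyond the ones in the post are assumptions):

```typescript
import { createHash } from "node:crypto";

// Bearer check sketch: store only the sha256 of the App Password, never the
// plaintext. Revoked machines fail cleanly on the next request.
type MachineRow = { user_id: string; machine_id: string; revoked_at: string | null };
const machinesByHash = new Map<string, MachineRow>();

function sha256(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

function registerMachine(token: string, row: { user_id: string; machine_id: string }) {
  machinesByHash.set(sha256(token), { ...row, revoked_at: null });
}

function authenticate(bearer: string): { user_id: string; machine_id: string } | null {
  const row = machinesByHash.get(sha256(bearer));
  if (!row || row.revoked_at) return null;
  return { user_id: row.user_id, machine_id: row.machine_id };
}
```

No per-request PDS call, no server-side session state — the hash lookup is the whole hot path.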

Registered my central Mac Mini as the first Machine. POST /api/runs with the new bearer returned 202. Search with the new bearer returned hits. Dev bearer fallback is still in the middleware for graceful transition; in prod you set `MEMORY_DEV_BEARER_TOKENS={}` and real App Passwords take over entirely.

## The laptop joined the Network in twelve minutes

With Phase 3 done on the central node, I wanted to prove the second-Machine story. `ssh joel@<laptop>`:

1. Install bun via the one-liner (`curl -fsSL https://bun.sh/install | bash`).
2. `rsync` three scripts to `~/.joelclaw/bin/` + symlink into `~/.bun/bin/`: capture-session, runs-search, runs-tree.
3. From the central node: `joelclaw-machine-register --name <laptop> --user joel --no-write-auth`. Prints a new App Password once. Writes the Machine row to Typesense.
4. `ssh` the plaintext into the laptop's `~/.joelclaw/auth.json` (0600).
5. Add a `Stop` hook to the laptop's `~/.claude/settings.json` with `JOELCLAW_CENTRAL_URL` pointing at the central node via Tailscale MagicDNS.
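For reference, the hook wiring in step 5 looks roughly like this in `~/.claude/settings.json` — the script path, env var usage, and hostname are placeholders from my description, and the exact hooks schema should be verified against current Claude Code documentation:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "JOELCLAW_CENTRAL_URL=https://central.tailnet-name.ts.net ~/.joelclaw/bin/capture-session"
          }
        ]
      }
    ]
  }
}
```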

That was it. Twelve minutes, most of it waiting for bun to download.

Seven minutes after that, the laptop was already capturing. The log showed:

```
[17:00:33Z] captured run_id=e62bb02... session=b7af7966-...
            delta_bytes=4429883 turns=648
[17:09:36Z] captured run_id=7f3805c... session=b7af7966-...
            delta_bytes=35028 turns=12
```

First fire captured the day's backlog (4.4MB, 648 turns). Second fire, nine minutes later, a 35KB / 12-turn delta. The incremental byte-offset tracking meant I didn't re-embed anything I'd already indexed. **Tree view now shows both Machines, color-badged so I can see which one produced which Run at a glance.**

```
├─ claude-code @overlook 17m ago  19t #captured
│     "seems like an excessive amount of time..."
│  └─ claude-code @overlook 5m ago   13t #captured
│        "is that how books always were..."
│     └─ claude-code @overlook 1m ago   22t #captured
│           "<task-notification>..."
│        └─ claude-code @overlook 0m ago    5t #captured
│              "this is a critical job of joelclaw..."
│
├─ claude-code @flagg 15m ago 2t  #laptop-hello
│     "hello from laptop via tailnet"
└─ claude-code @flagg 5m ago  18t #captured
      "..."
```

## The system is watching itself

This article is being drafted in a claude-code session on the central node. When I hit submit and this response finishes generating, the `Stop` hook will fire. It'll read `~/.joelclaw/session-state.json`, see that the last capture ended at byte offset N, read from N to current EOF, and POST the delta to `/api/runs`.
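The delta read the hook performs can be sketched like this — state-file handling is omitted, and the function name is an assumption:

```typescript
import { openSync, readSync, fstatSync, closeSync } from "node:fs";

// Byte-offset delta capture: read only the bytes appended since the last
// capture, so already-indexed turns are never re-embedded.
function readDelta(jsonlPath: string, lastOffset: number): { delta: string; newOffset: number } {
  const fd = openSync(jsonlPath, "r");
  try {
    const size = fstatSync(fd).size;
    if (size <= lastOffset) return { delta: "", newOffset: lastOffset };
    const buf = Buffer.alloc(size - lastOffset);
    readSync(fd, buf, 0, buf.length, lastOffset);
    return { delta: buf.toString("utf8"), newOffset: size };
  } finally {
    closeSync(fd);
  }
}
```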

The Inngest function will chunk the delta, embed each turn through the priority queue at `ingest-realtime`, write chunks to Typesense, write a Run row with `parent_run_id` linking to the previous turn of this same session.

Thirty seconds from now, the memory system will contain the article about the memory system. `joelclaw-runs-search "memory that watches itself"` will surface this text as chunks I can retrieve into any future agent's context.

That's not a metaphor. That's the architecture delivered.

## Credits

* **Alex Hillman** for the [column structure that started this](https://x.com/alexhillman/status/1913041728888963395). The shape was hiding in plain sight until he posted it.
* **Matt Pocock** for the [`domain-model` skill](https://github.com/mattpocock/skills/tree/main/domain-model). MIT-licensed Socratic grill that produced more clarity in an hour than I would have gotten from a week of solo design doc writing.
* **pi's `/tree`** (from [badlogic/pi-mono](https://github.com/badlogic/pi-mono)) for the session-tree navigation pattern I ported to work globally across runtimes.
* Every prior memory ADR in joelclaw — ADR-0021, ADR-0077, ADR-0082, ADR-0190, ADR-0195 — that built the curated-notes layer this Run archive sits beneath.

The next agent to read this file — whether it's me tomorrow or claude-code on the laptop next week — will get it from the memory system itself. Which was the point.
