Runs-Based Memory Capture Architecture
Status
accepted — 2026-04-19. Phase 1 build-out began with a priority-lane validation slice (Rule 9a).
Context
Problem
Joelclaw is evolving from a personal AI infrastructure into a central service for a distributed network of user machines belonging to Joel and his family (wife, kids). Agents across these machines — pi, claude-code, codex, workload-rig stages, gateway reply flows, loops — produce jsonl transcripts of every invocation, but there is no single archive:
- Transcripts are scattered across per-tool directories (`~/.claude/projects/`, `~/.pi/agent/sessions/`) and never make it off the Machine that produced them.
- There is no cross-Machine or cross-User semantic search over what agents have done.
- Existing memory infrastructure (ADR-0021, 0077, 0082) stores curated, distilled memory notes written by agents as summaries of significant observations. It does not store the raw source material those observations were drawn from.
- Without raw Runs, the system has no rebuildable ground truth — if the curated memory is corrupted, embeddings are upgraded, or a schema changes, there is nothing to re-derive from.
- Agents on one Machine cannot discover what agents on another Machine learned, even when the User is the same person.
Existing memory lineage (complementary, not replaced)
- ADR-0021 — agent memory system foundation (curated notes, Typesense-backed)
- ADR-0077 — memory system next phase (Typesense consolidation, observability)
- ADR-0082 — Typesense as unified search layer
- ADR-0190 — memory yield contract (quality + cost discipline)
- ADR-0195 — mandatory memory participation contract (hooks enforce agent participation)
These cover the distillate layer: curated memory notes that agents consciously write. This ADR introduces the source layer beneath them: raw Run capture, turn-level chunks, hybrid search. The two layers compose — a Run captured here may later generate a memory note there; a memory note here may link back to the Run chunks that supported it.
Forcing function
The Mac Studio is being brought online as a dedicated inference node; the spike validates qwen3-embedding:8b on real agent transcripts at 768-dim Matryoshka truncation; the Ollama + Typesense + Inngest stack is already running on Panda. The architectural questions have been grilled to resolution via the domain-model skill (see CONTEXT.md at repo root). Every decision below has been confirmed.
Why the NRC score is do-now
- Need 5/5: multi-Machine memory is blocked without this; every day without capture is a day of irreversible signal loss from agent Runs.
- Readiness 5/5: design is locked in
CONTEXT.md; spike inscripts/memory-spike/proves qwen3-embedding + Typesense hybrid search works end-to-end on real data. - Confidence 4/5: implementation will surface edge cases (capture-hook coverage, throughput under burst, Share Grant consistency window) — none of them structural.
- Novelty 4/5: new pillar, not a repeat of prior memory work; extends existing Typesense + Inngest + inference-router patterns.
Decision
Adopt a Runs-based memory capture architecture that ingests every agent invocation across the joelclaw Network into a centrally-hosted, rebuildable hybrid search index. Full specification — 13 terms, 21 architectural rules, complete API surface — lives at ~/Code/joelhooks/joelclaw/CONTEXT.md; this ADR summarizes the binding decisions.
1 — Topology: central service + thin Machines
Rule 1. Ingestion is Central. Machines ship raw jsonl + identity metadata to /api/runs. Chunking, embedding, indexing, denormalization, and re-indexing all happen on the Central worker. Machines never run embedding models, never write to Typesense, never touch NAS directly. This is the “KISS the Machines” rule — non-technical family members’ devices must work with zero crypto concepts and one CLI installed.
Rule 2. Embedding is an interface, not an implementation. The Central worker calls embeddings through @joelclaw/inference-router (extending ADR-0140). Local Ollama today on Panda (localhost:11434); Mac Studio Ollama tomorrow via Tailscale MagicDNS. Caller code unchanged — only a config URL swap.
Rule 5. Design for horizontal migration, not RAM optimization. Panda (64GB) runs everything today. Mac Studio (128GB unified memory) is the upgrade target for RAM-bound services (Typesense). Current ceiling is ~270K Runs hot in RAM on Panda — several years of headroom at realistic family rates. Services must move across Mac-class nodes over Tailscale without a refactor: stable typed HTTP interfaces, persistent state on NAS or PVC, no colocation assumptions.
Rule 7. Ingress is Tailnet-only. /api/runs/* and /api/memory/* are not reachable from the public internet. Public joelclaw.com stays marketing/content; memory endpoints route through Tailscale-bound ingress. Defense in depth beneath the bearer-token layer.
2 — Data model: Runs (trees), Chunks (turn-level), Share Grants (tag-primary)
Rule 3. Every Run carries User + Machine identity at capture time. Ownership is not inferred downstream. A Run is one agent invocation — the atomic unit of capture. A single pi -p call, one claude-code turn, one codex call, one loop iteration, one gateway reply generation. Runs form trees via parent_run_id + root_run_id (workload-rig stages, nested agent calls). Conversations are a lightweight conversation_id label linking sibling Runs (e.g. turns of one claude-code session). Not a first-class entity.
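The identity and lineage envelope above can be sketched in a few lines. This is an illustrative shape, not the shipped `packages/memory` types — field names come from Rules 3 and 13, the `linkRun` helper is hypothetical:

```typescript
// Hypothetical sketch of the Run identity + lineage envelope.
interface RunRecord {
  run_id: string;
  owner_user_id: string;          // captured at ingest, never inferred (Rule 3)
  machine_id: string;
  parent_run_id: string | null;   // tree linkage, best-effort via env vars
  root_run_id: string;            // equals run_id for a root Run
  conversation_id: string | null; // lightweight sibling label, not an entity
}

// Derive lineage for a new Run: a child inherits its parent's root;
// an orphan (no parent in sight) becomes its own root.
function linkRun(
  run_id: string,
  owner_user_id: string,
  machine_id: string,
  parent?: RunRecord,
  conversation_id: string | null = null,
): RunRecord {
  return {
    run_id,
    owner_user_id,
    machine_id,
    parent_run_id: parent?.run_id ?? null,
    root_run_id: parent?.root_run_id ?? run_id,
    conversation_id,
  };
}
```

The orphan-Run tolerance of Rule 8 falls out naturally: when the parent is unknown, the Run simply roots its own tree.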
Rule 17. Parsed metadata is inline-deterministic; entity extraction is async-LLM. During ingest, the memory/run.captured Inngest function populates: turn_count, user_turn_count, assistant_turn_count, tool_turn_count, token_total, tool_call_count, files_touched (from structured tool calls), skills_invoked (string match against skills/ dir), intent (first 500 chars of first user message), status. A separate memory/run.enrich.requested function fires fire-and-forget: one local pi -p call per Run with a strict JSON schema extracting five entity kinds — people, projects, tools, concepts, resources. Stored as a flat prefix-kinded string[] (e.g. people:Kristina, tools:typesense) on the Run row. Runs become searchable immediately; entities_mentioned populates within minutes. Entity linking (resolving to canonical Contacts/Projects) is a Path 2 enhancement, not v1.
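The inline-deterministic half of Rule 17 needs no model call — every column is a fold over the raw turns. A minimal sketch, assuming a simplified turn shape (real transcripts vary per runtime):

```typescript
// Illustrative deterministic-metadata pass (Rule 17). Turn shape is an
// assumption for the example, not a transcript spec.
interface Turn { role: "user" | "assistant" | "tool"; text: string; tokens?: number }

function deterministicMetadata(turns: Turn[]) {
  const firstUser = turns.find((t) => t.role === "user");
  return {
    turn_count: turns.length,
    user_turn_count: turns.filter((t) => t.role === "user").length,
    assistant_turn_count: turns.filter((t) => t.role === "assistant").length,
    tool_turn_count: turns.filter((t) => t.role === "tool").length,
    token_total: turns.reduce((sum, t) => sum + (t.tokens ?? 0), 0),
    // intent = first 500 chars of the first user message
    intent: firstUser ? firstUser.text.slice(0, 500) : "",
  };
}
```

Because this pass is pure and cheap, Runs are searchable the moment ingest completes; only `entities_mentioned` waits on the async LLM call.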
3 — Privacy: private by default, tag-primary Share Grants
Rule 4. Runs are private by default; sharing is explicit. Queries filter to owner_user_id or a readable_by grant. No Network-wide pool.
Rule 18. Share Grants are their own Typesense collection. POST /api/share-grants { grantee_user_id, scope: "tag:<tag>" | "run:<id>", expires_at? } creates a row and fires memory/share-grant.created → fanout update of readable_by on affected chunks. Revoke fires memory/share-grant.revoked. Nightly Inngest cron expires time-bounded grants. GET /api/share-grants returns grants given + received for the caller.
Tag-primary is the default because workloads and topics cluster by tag far more cleanly than by individual Run id (e.g. “share everything I tag household:travel” is common; “share this single Run” is rare). Per-Run scope remains available as scope: run:<id> and grants access to the Run plus all its descendants in the tree.
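Scope evaluation is small enough to sketch. The scope strings are from Rule 18; `grantCovers` is a hypothetical helper, and the descendant check is simplified — it uses the `root_run_id` match the ADR describes rather than a full parent-link walk:

```typescript
// Sketch of Share Grant scope evaluation (Rule 18). Helper name and the
// simplified descendant check are assumptions for illustration.
interface Grant { grantee_user_id: string; scope: string; expires_at?: number }
interface RunRef { run_id: string; root_run_id: string; tags: string[] }

function grantCovers(grant: Grant, run: RunRef, now: number): boolean {
  if (grant.expires_at !== undefined && grant.expires_at <= now) return false;
  if (grant.scope.startsWith("tag:")) {
    return run.tags.includes(grant.scope.slice(4));
  }
  if (grant.scope.startsWith("run:")) {
    const id = grant.scope.slice(4);
    // per-Run scope covers the Run plus its tree; a real implementation
    // resolves descendants of mid-tree nodes, here approximated by root id
    return run.run_id === id || run.root_run_id === id;
  }
  return false;
}
```

Note that this logic runs at fanout time to compute `readable_by`; queries never evaluate grants directly, which is what makes the consistency window (seconds to minutes) an acceptable trade.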
4 — Storage: NAS authoritative, Typesense rebuildable
Rule 10. NAS is authoritative; Typesense is rebuildable. Each Run writes <run-id>.jsonl + <run-id>.metadata.json to NAS as the source of truth. Typesense is a derived index. Schema changes, embedding-model upgrades, chunk-strategy shifts, and service migrations are all “re-walk NAS and rebuild the collection” — a safe bulk operation, not a database migration. Typesense corruption or loss is recoverable. This inverts the usual DB+search pattern but matches the “observability data” framing: Runs are append-only trace data, not transactional state.
Rule 11. NAS path convention is user-partitioned. /nas/memory/runs/<user_id>/<yyyy-mm>/<run-id>.{jsonl,metadata.json}. User-first partitioning makes per-User export, deletion, and privacy audits trivial filesystem operations (rm -rf /nas/memory/runs/kristina/ if ever needed).
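The path convention is mechanical enough to pin down as a helper. A minimal sketch, assuming the month bucket derives from the Run's capture timestamp in UTC (the helper name is illustrative; the real one lives in `packages/memory`):

```typescript
// Sketch of the Rule 11 user-partitioned NAS path convention.
const NAS_ROOT = "/nas/memory/runs";

function nasPaths(userId: string, runId: string, capturedAt: Date) {
  const yyyy = capturedAt.getUTCFullYear();
  const mm = String(capturedAt.getUTCMonth() + 1).padStart(2, "0");
  const dir = `${NAS_ROOT}/${userId}/${yyyy}-${mm}`;
  return {
    jsonl: `${dir}/${runId}.jsonl`,
    metadata: `${dir}/${runId}.metadata.json`,
  };
}
```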
This builds on ADR-0088 (NAS-backed storage tiering). The Runs archive is a new tenant on the existing NAS infrastructure.
5 — Identity: PDS DIDs + AT Proto App Passwords + bearer wire
Rule 6. Identity is PDS; the wire is a bearer token. Every User has a DID in the joelclaw PDS. Every Machine has an AT Protocol App Password scoped to its User’s DID. Machines present the App Password (as a bearer token in v1) to authenticate Run POSTs.
Rule 20. PDS integration: createAppPassword + bearer + 60s session cache. User creation calls the PDS admin API to mint a did:plc:... + handle. Machine registration calls com.atproto.server.createAppPassword on behalf of the User’s DID; the app password is returned to the CLI once and written to ~/.joelclaw/auth.json (0600). On every POST, Central validates the bearer token via com.atproto.server.createSession (cached 60s), extracts the DID, maps to user_id. Revocation calls com.atproto.server.revokeAppPassword. Full AT Proto signed-request envelopes, dev.joelclaw.run.captured audit records, and federation with external DIDs are reserved upgrades — not v1.
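The 60s cache is the load-bearing detail of Rule 20: a burst of Run POSTs from one Machine costs one PDS round trip, and short PDS blips are absorbed. A sketch under stated assumptions — the `validate` callback stands in for `com.atproto.server.createSession`, and the factory name is hypothetical:

```typescript
// Sketch of the bearer-validation cache (Rule 20): memoize token -> DID
// for the TTL so repeat POSTs skip the PDS round trip.
type Validate = (token: string) => Promise<string>; // resolves to a DID

function makeSessionCache(validate: Validate, ttlMs = 60_000) {
  const cache = new Map<string, { did: string; expires: number }>();
  return async (token: string, now = Date.now()): Promise<string> => {
    const hit = cache.get(token);
    if (hit && hit.expires > now) return hit.did; // cache hit, no PDS call
    const did = await validate(token);            // PDS round trip on miss
    cache.set(token, { did, expires: now + ttlMs });
    return did;
  };
}
```

The same structure explains the PDS-outage failure mode below: a warm cache keeps ingest alive for up to 60s of PDS downtime per Machine.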
Rule 19. Admin = a DID in the ADMIN_DIDS env var on Central. No separate admin token. /api/admin/* endpoints check caller’s resolved DID against ADMIN_DIDS; non-members get 403. V1 list is Joel’s DID. KISS extends all the way through authorization.
6 — Capture: native hooks + file Outbox
Rule 8. Capture uses native runtime hooks; wrappers are the fallback. Pi extension (extending packages/pi-extensions), claude-code Stop hook in ~/.claude/settings.json, codex hook where supported — each invokes joelclaw capture-stdin which enriches jsonl with identity + lineage and POSTs. Explicit joelclaw capture -- <cmd> only for tools with no hook surface. Machines get one CLI installed and nothing else. Parent linkage propagates via JOELCLAW_PARENT_RUN_ID + JOELCLAW_CONVERSATION_ID env vars — best-effort; orphan Runs are acceptable. Failed POSTs go to the Outbox (~/.joelclaw/outbox/*.jsonl) and are drained by any joelclaw CLI invocation plus a launchd/systemd timer every 5 minutes.
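The Outbox drain loop is the part worth pinning down: delete-on-acknowledge is what makes "any CLI invocation plus a timer" a safe retry policy. A sketch with file I/O elided — the `post` callback stands in for the authenticated `POST /api/runs` call, and the in-memory map stands in for `~/.joelclaw/outbox/*.jsonl`:

```typescript
// Sketch of the Machine-side Outbox drain (Rule 8): retry each pending
// payload; remove it only once Central acknowledges.
async function drainOutbox(
  entries: Map<string, string>,                 // filename -> jsonl payload
  post: (payload: string) => Promise<boolean>,  // true = accepted by Central
): Promise<number> {
  let drained = 0;
  for (const [name, payload] of [...entries]) {
    if (await post(payload)) {
      entries.delete(name); // acknowledged — safe to remove from disk
      drained++;
    }
    // on failure the entry stays for the next drain (CLI run or timer)
  }
  return drained;
}
```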
7 — Search: hybrid-by-default with auto-applied privacy filters
Rule 13. Search API shape is D — one hybrid search + convenience traversal endpoints. Primary call is POST /api/runs/search with hybrid-by-default mode, AND-semantics tag filters, and auto-applied user_id + readable_by filters from the bearer token (never from the request body — no way to spoof privacy from client). Traversal endpoints (GET /api/runs/:id, :id/jsonl, :id/descendants) are separate. Mutation endpoints (POST /api/runs/:id/tags) are owner-gated.
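The no-spoofing property comes from where the filter string is built, not from validation. A minimal sketch, assuming standard Typesense `filter_by` syntax (`:=` exact match, `&&`/`||` combinators); the helper name is hypothetical:

```typescript
// Sketch of Rule 13's server-side privacy filter: visibility derives from
// the authenticated user id, never from anything in the request body.
function privacyFilter(userId: string, tagFilters: string[] = []): string {
  const clauses = [
    // owner match OR an explicit readable_by grant
    `(owner_user_id:=${userId} || readable_by:=${userId})`,
    // AND-semantics tag filters
    ...tagFilters.map((t) => `tags:=${t}`),
  ];
  return clauses.join(" && ");
}
```

Because the client-supplied body can only ever append AND clauses to this string, a malicious filter can narrow results but never widen them.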
Rule 12. Agent-first; humans are a vestigial afterthought. Every API, response shape, error, and pagination choice is optimized for agents consuming them. Stable typed JSON envelopes, machine-readable error codes, idempotency keys on mutating POSTs, cursor-based pagination, rich _links and next_actions, deterministic result ordering. No dashboard, no web UI, no visual manual-operations surface in v1 — humans use the CLI, which is itself an agent-shaped thin wrapper over the same endpoints.
8 — Embeddings: qwen3-embedding:8b, Matryoshka 768-dim
Rule 9. Embeddings: qwen3-embedding:8b via Ollama, Matryoshka-truncated to 768-dim. Chunking is per-turn (40K-token context window makes sub-turn splits rare). Every Chunk carries its Embedding Model Tag (qwen3-embedding-8b@768). Dimension is a query-time/deployment knob, not a data commitment — full 4096-dim can be re-computed at zero marginal cost since the same model produces it. Ingest path calls the model through @joelclaw/inference-router; swap via config.
Rationale: qwen3-embedding:8b scores 70.58 on the MTEB multilingual leaderboard (June 2025) vs nomic-embed-text’s 62.39 — 8-point gap is meaningful for targeted family-scale retrieval. The 40K-token context window handles long agent turns (including claude-code turns with large code blocks) without sub-turn splitting. Matryoshka truncation provides deployment flexibility (storage vs RAM tradeoff) without re-embedding on dimension changes.
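Matryoshka truncation itself is two lines: keep the leading dimensions, then re-normalize so cosine distances stay meaningful. A sketch (this works because the model is trained with Matryoshka representation learning; plain truncation of a non-Matryoshka embedding would not preserve quality):

```typescript
// Sketch of Matryoshka truncation: keep the first k dims, re-normalize.
function truncateEmbedding(full: number[], dims = 768): number[] {
  const head = full.slice(0, dims);
  const norm = Math.sqrt(head.reduce((s, x) => s + x * x, 0)) || 1;
  return head.map((x) => x / norm);
}
```

This is why dimension is "a query-time/deployment knob, not a data commitment": the 4096-dim vector can always be re-requested from the same model and re-truncated.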
Embed concurrency is an Inngest-managed knob with priority lanes (Rule 9a in CONTEXT.md). Ollama serializes embed calls internally, so naive HTTP concurrency doesn’t help — what matters is which caller waits. Every embed routes through Inngest with one of three priorities: query (interactive search — never starved), ingest-realtime (live Run captures — normal), ingest-bulk (reindex, backfill — lowest, drops out when anything else arrives). Implementation: memory/embed.requested event with a priority field; Inngest priority.run expression gates scheduling. This is the remediation for the query-starvation failure mode enumerated in Operational Failure Modes.
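The lane discipline can be sketched as a payload plus a priority mapping. Everything here is illustrative — the boost values are not tuned, and the real gating happens in an Inngest `priority.run` expression on the Central worker:

```typescript
// Sketch of the Rule 9a lane mapping for memory/embed.requested events.
// Boost values are assumptions, not the deployed configuration.
type EmbedPriority = "query" | "ingest-realtime" | "ingest-bulk";

const PRIORITY_BOOST: Record<EmbedPriority, number> = {
  query: 120,            // interactive search — never starved
  "ingest-realtime": 0,  // live Run captures — normal
  "ingest-bulk": -120,   // reindex/backfill — yields to everything else
};

function embedEvent(priority: EmbedPriority, text: string) {
  return { name: "memory/embed.requested", data: { priority, text } };
}
```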
Spike validation (2026-04-19) on a 1247-line claude-code session:
- 708 chunks embedded in 572s (1.2 ch/s sequential, ~2.5 ch/s concurrent-8)
- Query latency: ~420ms end-to-end for semantic, ~20ms for keyword, ~250ms hybrid
- Retrieval quality on real queries surfaced the actual root-cause chunks (e.g. “why did the cluster fail” → connection-refused tool_results at vec_distance 0.28)
9 — Retention + deletion
Rule 14. Retention is keep-forever. No TTLs, no rolling windows, no auto-expiration. Storage is not the constraint; the value of agent memory compounds across years. Explicit deletion is the privacy lever.
Rule 15. Deletion is owner-only, hard, cascade-by-default, durable via Inngest. DELETE /api/runs/:id fires memory/run.delete.requested → remove Typesense chunks → remove Run row → remove NAS jsonl + metadata. Idempotent at every step; safe to retry. Descendant Runs cascade-delete (root_run_id match). Bulk delete is always filter-scoped and owner-scoped; no wildcard. DR via nightly NAS snapshots. Optional dev.joelclaw.run.deleted PDS record available per-User but off by default in v1.
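Cascade selection is the only non-obvious step, and the ADR's own rule (descendants match on `root_run_id`) makes it a filter rather than a tree walk. A sketch with a hypothetical store shape:

```typescript
// Sketch of Rule 15 cascade selection: the target Run plus every Run that
// roots to it. Row shape is illustrative; the ordering of the subsequent
// delete steps (chunks -> Run rows -> NAS blobs) is what makes retry safe.
interface RunRow { run_id: string; root_run_id: string }

function selectCascade(all: RunRow[], target: RunRow): string[] {
  return all
    .filter((r) => r.run_id === target.run_id || r.root_run_id === target.run_id)
    .map((r) => r.run_id);
}
```

Each downstream step (remove chunks, remove rows, remove NAS files) is a no-op when its target is already gone, which is what "idempotent at every step; safe to retry" means operationally.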
10 — Re-indexing: three distinct paths
Rule 16. Re-indexing is three distinct paths, each an Inngest function.
- Embedding/chunking rebuild — admin-triggered, fans out from NAS (not Typesense), writes to a new collection `run_chunks_v2`, atomic alias swap on completion, throttled to Ollama throughput, resumable. Preferred over in-place mutation: rollback is an alias swap, a failed rebuild never corrupts live data, and the cost is 2× Typesense disk during the window.
- Metadata enrichment — updates Run rows only, no chunk work. Used for async entity extraction and future field additions.
- Share-Grant fanout — updates `readable_by` on affected chunks only. Used on Share Grant create/revoke.
NAS is always the source of truth for “what to reindex.” Delivers Rule 10’s promise as a concrete operational capability, not a hope.
11 — Package home
Rule 21. packages/memory is the new canonical home. Types (Run, Chunk, ShareGrant, User, Machine, RunStatus, AgentRuntime, Role), Typesense collection schemas, NAS path helpers, the per-turn chunker (claude-code + pi format detection), and interface definitions (RunStore, ChunkStore, ShareGrantStore) all live there, mirroring the @joelclaw/telemetry pattern (ADR-0144 hexagonal style). Heavy logic (chunking, embedding, indexing) lives in packages/memory and is consumed by packages/system-bus/src/inngest/functions/memory/*. Route handlers in apps/web/app/api/runs/* and apps/web/app/api/share-grants/* are thin composition roots — they authenticate, enforce Rule 4, delegate to memory functions, and return HATEOAS envelopes. Embeddings lane lives at packages/inference-router/src/embeddings.ts (extending ADR-0140).
Consequences
Positive
- Unified archive across the Network. Any agent on any Machine produces a Run that lands in one searchable place. Cross-Machine retrieval for a single User becomes a solved problem in v1.
- Rebuildable search index. Embedding model upgrades, chunking strategy changes, schema additions are all “re-walk NAS and rebuild” operations. No painful migrations.
- Privacy is enforced at the schema layer, not the query layer. Denormalized `readable_by` means a malformed query cannot accidentally leak across Users — the index literally doesn’t return rows the caller isn’t authorized for.
- Agent-first API shape matches existing joelclaw CLI conventions (HATEOAS, `_links`, `next_actions`). Agents consuming it already understand the pattern.
- Family-ready. Setup for non-technical Users: install Tailscale, then `joelclaw register`. Done. No crypto, no key management, no dashboard to learn.
- Complements the existing memory system (ADR-0021, 0077, 0195). Curated memory notes and raw Runs can cross-reference — a note that cites a Run, a Run that generates a note.
Negative / costs
- New infrastructure footprint: Ollama pod, `packages/memory`, new Typesense collections, new Inngest functions, new CLI commands, new pi + claude-code hooks, new PDS integration code. Real implementation effort.
- Capture hook coverage is uneven — the claude-code `Stop` hook is well-documented; pi extensions exist; the codex hook surface is less certain. The wrapper fallback (`joelclaw capture -- <cmd>`) will fire more than we’d like until we close gaps per runtime.
- Share Grant fanout has an eventual-consistency window of seconds to minutes between grant creation and chunks becoming visible to the grantee. Must be documented in the CLI and tolerated by agents.
- Embedding throughput is the ingest-path bottleneck. On Panda at 1.2 ch/s sequential / ~2.5 ch/s with concurrency, bursty Run production by many agents will back up. Mac Studio migration is the fix.
- Family-scale does not stress-test multi-User federation. If the Network grows beyond family, the `ADMIN_DIDS` env-var authorization model and the bearer-token wire protocol will both need to harden (toward full AT Proto signed requests). That’s a known upgrade path, not a current blocker.
Operational failure modes (enumerated)
Known failure modes and their intended behavior. Each must have an OTEL event and a remediation path documented in the relevant runbook before build completion:
- Ollama pod down → the `memory/run.captured` step retries; `memory/run.enrich.requested` defers; ingest continues buffering jsonl + metadata to NAS while embeddings queue. On restore, Inngest drains the queue. Raw Runs remain searchable by BM25 on the `text` field even while `embedding` is null for unprocessed chunks. Invariant: no Run is dropped due to an Ollama outage.
- NAS unmounted on Central → `POST /api/runs` returns 503 with `{"error":"nas_unavailable","retryable":true}`. The Machine-side Outbox retains the jsonl and retries on its next drain. Invariant: no Run is acknowledged before the NAS write succeeds.
- Typesense full or unreachable → ingest returns 202 after the NAS write; embedding + indexing are async via Inngest. Search returns 503 until Typesense recovers. Rebuild from NAS (Rule 10, Path 1) is the recovery procedure.
- PDS unreachable during auth → the 60s session cache absorbs short outages; on cache miss + PDS outage, `POST /api/runs` returns 503 (not 401 — we don’t want Machines to treat a PDS blip as a credential problem and re-register). The Machine-side Outbox retries.
- App Password leaked from a Machine → Admin revokes via `joelclaw machine revoke <id>`. The next `createSession` call fails; the Machine falls back to the Outbox until re-registered. Other Users and Machines are unaffected (Rule 4).
- Ollama throughput ceiling exceeded (bursty Run production) → Inngest throttle on `memory/run.captured` queues chunks; ingest latency increases but nothing is dropped. Dashboard alert when queue depth exceeds 10 minutes of embedding budget. Mac Studio migration is the remediation.
- Query embeds starved by bulk embeds → observed during the 2026-04-19 spike: query-time embedding went from ~220 ms idle to 8–10 s while bulk ingest saturated Ollama. Ollama serializes internally, so raw HTTP concurrency is a fake optimization. Fix: every embed call routes through Inngest with one of three priorities — `query` (interactive, never starved), `ingest-realtime` (live Run captures, normal), `ingest-bulk` (reindex/backfill, lowest). `@joelclaw/inference-router` sets the priority based on the caller; an Inngest `priority.run` expression gates scheduling. Background ingest must never steal query latency. Mac Studio migration helps but does not substitute for the priority discipline.
- Share Grant fanout lag → grants take seconds to minutes to propagate across chunks. CLI + API docs must state “grants may take up to 2 minutes to take effect.” Grantee search queries during the window return a correct-but-incomplete result set.
- Capture hook scrubs env vars mid-subprocess → orphan Runs (no `parent_run_id` despite being nested). Accepted as a known limit per Rule 8. Ingest still succeeds; tree linkage is best-effort.
Explicitly deferred (v1 non-goals)
- Full AT Proto signed-request envelope on every POST
- `dev.joelclaw.run.captured` / `dev.joelclaw.run.deleted` PDS audit records (schema slot reserved, write path deferred)
- Federation with external DIDs (brother’s self-hosted PDS, etc.)
- Invite-link self-serve User creation
- Entity linking (resolving surface strings to canonical Contacts/Projects)
- Archive tier (`status=archived` — NAS retained, Typesense chunks dropped)
- Per-chunk redaction without full Run deletion
- User lifecycle transitions (kid reaches 18 → ownership change)
- Derived retrieval endpoint `POST /api/memory/retrieve` (composed context injection across Runs)
- Web UI / dashboard
- Per-User opt-out of enrichment
- Search rerank via full 4096-dim Matryoshka (the 768-dim hybrid is sufficient for v1)
Each has a designed insertion point per CONTEXT.md; none requires structural change.
Implementation Plan
Required skills (load before implementation starts)
- `inngest-durable-functions` — all memory/run/share-grant lifecycle runs through Inngest; must follow step/flow conventions.
- `inngest-steps` — idempotent step patterns for chunking, embedding, indexing, cascade delete.
- `inngest-events` — event naming + contracts (`memory/run.captured`, `memory/run.enrich.requested`, `memory/run.delete.requested`, `memory/share-grant.created`, `memory/share-grant.revoked`, `memory/reindex.requested`).
- `inngest-flow-control` — throttle + concurrency tuning for embedding throughput and reindex walks.
- `system-bus` — repo conventions for adding new functions under `packages/system-bus/src/inngest/functions/memory/`.
- `next-best-practices` — route handler patterns (auth middleware, streaming responses, cookies).
- `next-cache-components` — response caching where safe (metadata GETs are cacheable, search is not).
- `nextjs-static-shells` — `apps/web/app/api/` conventions.
- `pds` — `createAppPassword`, `createSession`, `revokeAppPassword` flows; PDS admin user creation.
- `k8s` — Ollama pod deployment, Tailscale exposure, PVC sizing.
- `system-architecture` — cross-cutting integration with gateway, workload-rig, loops.
- `adr-skill` — ADR lifecycle management, including post-acceptance sync to `system_knowledge`.
Affected paths
- New packages: `packages/memory/`
- Extended packages: `packages/inference-router/` (embeddings lane), `packages/cli/` (runs/user/machine/admin commands, extended recall), `packages/pi-extensions/` (capture extension), `packages/system-bus/src/inngest/functions/memory/` (new functions)
- Extended apps: `apps/web/app/api/runs/`, `apps/web/app/api/share-grants/`, `apps/web/app/api/admin/`
- Deferred apps: `apps/web/app/api/memory/retrieve` (stub only; body in a later ADR)
- k8s: `k8s/ollama-deployment.yaml`, `k8s/ollama-service.yaml`
- Client-side: `~/.claude/settings.json` hook entry on register, `~/.joelclaw/auth.json`, `~/.joelclaw/outbox/`, `~/.joelclaw/memory-spike-ingested.jsonl` (spike only; removed after build-out)
- Docs: `~/Code/joelhooks/joelclaw/CONTEXT.md` (canonical; do not duplicate into this ADR)
Testing discipline for each build step
Every build step below includes three test obligations, not just the implementation:
- Unit test in the same package (Rule 21 boundaries) for any pure logic (chunking, embedding interface, NAS path helpers, Share Grant scope evaluation).
- Integration test for any Inngest function that writes to Typesense or NAS — real Typesense, real NAS mount, fixtures committed under `packages/memory/__tests__/fixtures/`.
- Privacy enforcement test for every route handler: an explicit unit/integration test that a second User’s bearer token cannot retrieve the first User’s data under any combination of filter spoofing, direct-by-id access, or descendant traversal.
Privacy tests are first-class Rule 4 enforcement and must exist before a route is merged.
Build order (sequenced to compound signal early)
1. Graduate `packages/memory/` from spike quality to production quality — promote from `scripts/memory-spike/` patterns. Types, Typesense schemas, NAS path helpers, per-turn chunker (with fixed tool-result role detection for claude-code), interface exports.
2. Typesense collection bootstrap script — idempotent create for `runs`, `run_chunks`, `share_grants`, `users`, `machines` with alias `run_chunks_current`.
3. `@joelclaw/inference-router` embeddings lane (`packages/inference-router/src/embeddings.ts`) — catalog entry for `qwen3-embedding:8b`, Ollama provider, Matryoshka dimension parameter, tracing integration.
4. Ollama k8s pod running `qwen3-embedding:8b`, exposed to `system-bus-worker` via Tailscale MagicDNS.
5. `memory/run.captured` Inngest function — receives the event, chunks jsonl, calls embeddings via the router, writes to NAS + Typesense, populates deterministic metadata columns.
6. `apps/web/app/api/runs/route.ts` (POST) — auth via bearer → PDS createSession, persist jsonl to NAS, fire `memory/run.captured`.
7. `apps/web/app/api/runs/search/route.ts` — hybrid Typesense query with auto-applied privacy filters.
8. `joelclaw runs search` CLI command.
9. `joelclaw user create` + `joelclaw machine register` + PDS admin wiring.
10. Pi capture extension in `packages/pi-extensions/` + claude-code `Stop` hook installed by `joelclaw register`.
11. Gateway integration — `packages/gateway/src/channels/*` fire server-side `captureRun()` for replies; extends ADR-0144.
12. Share Grants endpoints + Path 3 reindex (`memory/share-grant.created|revoked`).
13. Delete endpoints + cascade + bulk delete.
14. `memory/run.enrich.requested` — async entity extraction via local `pi -p` with the 5-kind schema.
15. Path 1 reindex (embedding/chunking rebuild) — not critical until the first model swap; build late.
Non-goals call-out (prevents scope creep)
Implementation MUST NOT include anything listed under “Explicitly deferred” above. If a gap there begins to bite, open a new narrow ADR referencing this one.
Verification criteria
- `packages/memory/` typechecks via `bunx tsc --noEmit -p packages/memory/tsconfig.json`; biome-clean. (2026-04-19)
- Typesense collections `run_chunks_dev`, `runs_dev` exist with schemas in `packages/memory/src/schemas/`. `share_grants`, `users`, `machines` deferred to their respective phases. (2026-04-19)
- `@joelclaw/inference-router` exports an `embeddings` lane with a priority queue; tests confirm Matryoshka truncation to 768-dim round-trip. Full catalog integration deferred (the current impl bypasses `MODEL_CATALOG` for directness). (2026-04-19)
- Ollama pod is running in the joelclaw namespace with qwen3-embedding:8b loaded; `joelclaw status` shows it healthy. Current state: Ollama on Panda localhost, not yet in k8s. Deferred pending Mac Studio migration per Rule 5.
- `joelclaw-machine-register --name <n> --user <u>` issues a real PDS App Password (via `com.atproto.server.createAppPassword`), upserts a `machines_dev` row with the sha256 of the plaintext, and writes `~/.joelclaw/auth.json` (0600). Shipped 2026-04-20: Panda registered (did:plc:5w6ably…). Auth middleware at `apps/web/lib/memory-auth.ts` does a hash lookup per request (NOT PDS createSession — see the deviation note below). Multi-user `joelclaw user create` deferred to Phase 3.5. Deviation from Rule 20: hash-based lookup instead of per-request PDS validation — cheaper + safer + no PDS-side session state. (2026-04-20)
- `POST /api/runs` with a valid bearer token accepts a jsonl payload, writes to NAS at the user-partitioned path, fires `memory/run.captured`, and returns a HATEOAS envelope with `run_id` and `_links`. (2026-04-19)
- `POST /api/runs/search` returns hybrid results with the `readable_by` filter enforced server-side. Privacy-spoofing integration test deferred to the multi-user phase. (2026-04-19)
- `DELETE /api/runs/:id` cascade-deletes descendants from Typesense and removes NAS blobs; re-running the same DELETE is idempotent. Phase 5.
- `memory/run.enrich.requested` populates `entities_mentioned` within a minute of ingest with the 5-kind prefix taxonomy. Phase 6.
- Share Grant creation updates `readable_by` on affected chunks within ~1 minute (window documented in CLI help). Phase 5.
- Path 1 reindex smoke test: change an unused config (e.g. a tag), trigger `memory/reindex.requested` with a narrow filter, observe the alias swap to `run_chunks_v2`, confirm old + new Runs searchable throughout. Build late; not needed until the first model swap.
- An installed capture hook captures a real claude-code turn end-to-end into the live index, searchable within ~30 seconds of turn completion. Ambient capture via the `~/.claude/settings.json` Stop hook → `joelclaw-capture-session` with incremental delta + per-session byte-offset state. Pi equivalent via the `@joelclaw/pi-extensions/memory-capture` extension on `turn_end` + `session_shutdown`. The `joelclaw register` CLI command itself is still Phase 3. (2026-04-19)
- `joelclaw recall` (from the ADR-0195 era) now includes the Run archive as a source; fan-out works without regression. Phase 4 continuation.
- OTEL events emitted for: `memory.run.captured`, `memory.run.enriched`, `memory.run.deleted`, `memory.share_grant.created`, `memory.reindex.completed`. Partially done — `memory.embed.completed` is emitted; others pending their respective functions.
- Embed priority lanes work under contention: with a deliberately saturated `ingest-bulk` workload running, a `query`-priority embed returns within p99 < 1 s. Measured 338 ms with 98 bulk embeds queued (2026-04-19). Test at `packages/inference-router/__tests__/embeddings-priority.test.ts`.
- System-knowledge sync fires on ADR acceptance: `joelclaw send system/adr.sync.requested -d '{"source":"adr-skill"}'`. Note: at proposal time (2026-04-19) colima was down and Inngest unreachable, so the sync has been queued for execution when the cluster is restored. Record the sync timestamp here when it runs.
- Spike collection `run_chunks_spike` dropped from Typesense after the production `run_chunks_current` alias is serving live traffic: `curl -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" http://localhost:8108/collections/run_chunks_spike`.
Visual artifacts
Optional. Defer /generate-web-diagram until build Phase 5 (first route handler) — the architecture will benefit from a visual after the code makes it concrete. Path: docs/decisions/diagrams/0243-runs-capture-architecture.html.
More Information
- Canonical design spec: `~/Code/joelhooks/joelclaw/CONTEXT.md` (21 rules, 13 terms, full API shape, example dialogue). This ADR is the decision record; CONTEXT.md is the binding spec. On divergence, update CONTEXT.md first, then revisit this ADR.
- End-to-end validation spike: `~/Code/joelhooks/joelclaw/scripts/memory-spike/` with findings documented in its `README.md`.
- Ingested data from spike: Typesense collection `run_chunks_spike` (cleanly deletable after build-out completes).
Domain-model grilling lineage
This ADR is the product of a 12-question Socratic session via the domain-model skill (mattpocock/skills). Every decision above was confirmed individually before being committed to CONTEXT.md. Questions and resolutions:
- Where does ingestion run? → Central (Rule 1)
- Ownership/isolation model? → Private-by-default with explicit Share Grants (Rule 4, 18)
- Run = ? → One agent invocation; tree-shaped; flat conversation_id label; `/api/runs/*` (Rule 3; Rule 13’s API partitioning)
- How does a Machine authenticate? → Tailnet + PDS + App Password bearer (Rules 6, 7, 19, 20)
- How is a Run produced? → Native runtime hooks with wrapper fallback + file Outbox (Rule 8)
- Chunking + embedding? → Per-turn with sub-turn fallback + qwen3-embedding:8b @ 768-dim Matryoshka (Rule 9)
- Source of truth + NAS path? → NAS-authoritative + user-partitioned (Rules 10, 11)
- Search API shape? → D: one hybrid search + convenience traversal (Rule 13); agent-first API principle (Rule 12)
- Retention + deletion? → Keep-forever + hard-delete cascade (Rules 14, 15)
- Re-indexing orchestration? → Three distinct paths, new-collection-swap for embedding rebuilds (Rule 16)
- Parsed metadata columns? → Inline deterministic + async entity extraction (5-kind taxonomy) (Rule 17)
- Mechanical closeouts (Share Grants, admin, PDS, package structure) → Rules 18, 19, 20, 21
Spike confidence update
Pre-spike design confidence: 8/10. Post-spike confidence: 9/10 — qwen3-embedding:8b quality on real agent Run data is validated with measurements. Remaining 1/10 risk: capture-hook coverage, throughput-under-burst, Share Grant consistency window. All operational, none structural.