Runs-Based Memory Capture Architecture
Status
accepted — 2026-04-19. Phase 1 build-out began with a priority-lane validation slice (Rule 9a).
Context
Problem
Joelclaw is evolving from a personal AI infrastructure into a central service for a distributed network of user machines belonging to Joel and his family (wife, kids). Agents across these machines — pi, claude-code, codex, workload-rig stages, gateway reply flows, loops — produce jsonl transcripts of every invocation, but there is no single archive:
- Transcripts are scattered across per-tool directories (`~/.claude/projects/`, `~/.pi/agent/sessions/`) and never make it off the Machine that produced them.
- There is no cross-Machine or cross-User semantic search over what agents have done.
- Existing memory infrastructure (ADR-0021, 0077, 0082) stores curated, distilled memory notes written by agents as summaries of significant observations. It does not store the raw source material those observations were drawn from.
- Without raw Runs, the system has no rebuildable ground truth — if the curated memory is corrupted, embeddings are upgraded, or a schema changes, there is nothing to re-derive from.
- Agents on one Machine cannot discover what agents on another Machine learned, even when the User is the same person.
Existing memory lineage (complementary, not replaced)
- ADR-0021 — agent memory system foundation (curated notes, Typesense-backed)
- ADR-0077 — memory system next phase (Typesense consolidation, observability)
- ADR-0082 — Typesense as unified search layer
- ADR-0190 — memory yield contract (quality + cost discipline)
- ADR-0195 — mandatory memory participation contract (hooks enforce agent participation)
These cover the distillate layer: curated memory notes that agents consciously write. This ADR introduces the source layer beneath them: raw Run capture, turn-level chunks, hybrid search. The two layers compose — a Run captured here may later generate a memory note there; a memory note here may link back to the Run chunks that supported it.
Forcing function
The Mac Studio is being brought online as a dedicated inference node; the spike validates qwen3-embedding:8b on real agent transcripts at 768-dim Matryoshka truncation; the Ollama + Typesense + Inngest stack is already running on Panda. The architectural questions have been grilled to resolution via the domain-model skill (see CONTEXT.md at repo root). Every decision below has been confirmed.
Why the NRC score is do-now
- Need 5/5: multi-Machine memory is blocked without this; every day without capture is a day of irreversible signal loss from agent Runs.
- Readiness 5/5: design is locked in
CONTEXT.md; spike inscripts/memory-spike/proves qwen3-embedding + Typesense hybrid search works end-to-end on real data. - Confidence 4/5: implementation will surface edge cases (capture-hook coverage, throughput under burst, Share Grant consistency window) — none of them structural.
- Novelty 4/5: new pillar, not a repeat of prior memory work; extends existing Typesense + Inngest + inference-router patterns.
Decision
Adopt a Runs-based memory capture architecture that ingests every agent invocation across the joelclaw Network into a centrally-hosted, rebuildable hybrid search index. Full specification — 13 terms, 21 architectural rules, complete API surface — lives at ~/Code/joelhooks/joelclaw/CONTEXT.md; this ADR summarizes the binding decisions.
1 — Topology: central service + thin Machines
Rule 1. Ingestion is Central. Machines ship raw jsonl + identity metadata to /api/runs. Chunking, embedding, indexing, denormalization, and re-indexing all happen on the Central worker. Machines never run embedding models, never write to Typesense, never touch NAS directly. This is the “KISS the Machines” rule — non-technical family members’ devices must work with zero crypto concepts and one CLI installed.
Rule 2. Embedding is an interface, not an implementation. The Central worker calls embeddings through @joelclaw/inference-router (extending ADR-0140). Local Ollama today on Panda (localhost:11434); Mac Studio Ollama tomorrow via Tailscale MagicDNS. Caller code unchanged — only a config URL swap.
Rule 5. Design for horizontal migration, not RAM optimization. Panda (64GB) runs everything today. Mac Studio (128GB unified memory) is the upgrade target for RAM-bound services (Typesense). Current ceiling is ~270K Runs hot in RAM on Panda — several years of headroom at realistic family rates. Services must move across Mac-class nodes over Tailscale without a refactor: stable typed HTTP interfaces, persistent state on NAS or PVC, no colocation assumptions.
Rule 7. Ingress is Tailnet-only. /api/runs/* and /api/memory/* are not reachable from the public internet. Public joelclaw.com stays marketing/content; memory endpoints route through Tailscale-bound ingress. Defense in depth beneath the bearer-token layer.
2 — Data model: Runs (trees), Chunks (turn-level), Share Grants (tag-primary)
Rule 3. Every Run carries User + Machine identity at capture time. Ownership is not inferred downstream. A Run is one agent invocation — the atomic unit of capture. A single pi -p call, one claude-code turn, one codex call, one loop iteration, one gateway reply generation. Runs form trees via parent_run_id + root_run_id (workload-rig stages, nested agent calls). Conversations are a lightweight conversation_id label linking sibling Runs (e.g. turns of one claude-code session). Not a first-class entity.
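The identity and lineage envelope above can be sketched in a few lines. This is an illustrative shape, not the shipped `packages/memory` types — field names come from Rules 3 and 13, the `linkRun` helper is hypothetical:

```typescript
// Hypothetical sketch of the Run identity + lineage envelope.
interface RunRecord {
  run_id: string;
  owner_user_id: string;          // captured at ingest, never inferred (Rule 3)
  machine_id: string;
  parent_run_id: string | null;   // tree linkage, best-effort via env vars
  root_run_id: string;            // equals run_id for a root Run
  conversation_id: string | null; // lightweight sibling label, not an entity
}

// Derive lineage for a new Run: a child inherits its parent's root;
// an orphan (no parent in sight) becomes its own root.
function linkRun(
  run_id: string,
  owner_user_id: string,
  machine_id: string,
  parent?: RunRecord,
  conversation_id: string | null = null,
): RunRecord {
  return {
    run_id,
    owner_user_id,
    machine_id,
    parent_run_id: parent?.run_id ?? null,
    root_run_id: parent?.root_run_id ?? run_id,
    conversation_id,
  };
}
```

The orphan-Run tolerance of Rule 8 falls out naturally: when the parent is unknown, the Run simply roots its own tree.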
Rule 17. Parsed metadata is inline-deterministic; entity extraction is async-LLM. During ingest, the memory/run.captured Inngest function populates: turn_count, user_turn_count, assistant_turn_count, tool_turn_count, token_total, tool_call_count, files_touched (from structured tool calls), skills_invoked (string match against skills/ dir), intent (first 500 chars of first user message), status. A separate memory/run.enrich.requested function fires fire-and-forget: one local pi -p call per Run with a strict JSON schema extracting five entity kinds — people, projects, tools, concepts, resources. Stored as a flat prefix-kinded string[] (e.g. people:Kristina, tools:typesense) on the Run row. Runs become searchable immediately; entities_mentioned populates within minutes. Entity linking (resolving to canonical Contacts/Projects) is a Path 2 enhancement, not v1.
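The inline-deterministic half of Rule 17 needs no model call — every column is a fold over the raw turns. A minimal sketch, assuming a simplified turn shape (real transcripts vary per runtime):

```typescript
// Illustrative deterministic-metadata pass (Rule 17). Turn shape is an
// assumption for the example, not a transcript spec.
interface Turn { role: "user" | "assistant" | "tool"; text: string; tokens?: number }

function deterministicMetadata(turns: Turn[]) {
  const firstUser = turns.find((t) => t.role === "user");
  return {
    turn_count: turns.length,
    user_turn_count: turns.filter((t) => t.role === "user").length,
    assistant_turn_count: turns.filter((t) => t.role === "assistant").length,
    tool_turn_count: turns.filter((t) => t.role === "tool").length,
    token_total: turns.reduce((sum, t) => sum + (t.tokens ?? 0), 0),
    // intent = first 500 chars of the first user message
    intent: firstUser ? firstUser.text.slice(0, 500) : "",
  };
}
```

Because this pass is pure and cheap, Runs are searchable the moment ingest completes; only `entities_mentioned` waits on the async LLM call.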
3 — Privacy: private by default, tag-primary Share Grants
Rule 4. Runs are private by default; sharing is explicit. Queries filter to owner_user_id or a readable_by grant. No Network-wide pool.
Rule 18. Share Grants are their own Typesense collection. POST /api/share-grants { grantee_user_id, scope: "tag:<tag>" | "run:<id>", expires_at? } creates a row and fires memory/share-grant.created → fanout update of readable_by on affected chunks. Revoke fires memory/share-grant.revoked. Nightly Inngest cron expires time-bounded grants. GET /api/share-grants returns grants given + received for the caller.
Tag-primary is the default because workloads and topics cluster by tag far more cleanly than by individual Run id (e.g. “share everything I tag household:travel” is common; “share this single Run” is rare). Per-Run scope remains available as scope: run:<id> and grants access to the Run plus all its descendants in the tree.
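Scope evaluation is small enough to sketch. The scope strings are from Rule 18; `grantCovers` is a hypothetical helper, and the descendant check is simplified — it uses the `root_run_id` match the ADR describes rather than a full parent-link walk:

```typescript
// Sketch of Share Grant scope evaluation (Rule 18). Helper name and the
// simplified descendant check are assumptions for illustration.
interface Grant { grantee_user_id: string; scope: string; expires_at?: number }
interface RunRef { run_id: string; root_run_id: string; tags: string[] }

function grantCovers(grant: Grant, run: RunRef, now: number): boolean {
  if (grant.expires_at !== undefined && grant.expires_at <= now) return false;
  if (grant.scope.startsWith("tag:")) {
    return run.tags.includes(grant.scope.slice(4));
  }
  if (grant.scope.startsWith("run:")) {
    const id = grant.scope.slice(4);
    // per-Run scope covers the Run plus its tree; a real implementation
    // resolves descendants of mid-tree nodes, here approximated by root id
    return run.run_id === id || run.root_run_id === id;
  }
  return false;
}
```

Note that this logic runs at fanout time to compute `readable_by`; queries never evaluate grants directly, which is what makes the consistency window (seconds to minutes) an acceptable trade.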
4 — Storage: NAS authoritative, Typesense rebuildable
Rule 10. NAS is authoritative; Typesense is rebuildable. Each Run writes <run-id>.jsonl + <run-id>.metadata.json to NAS as the source of truth. Typesense is a derived index. Schema changes, embedding-model upgrades, chunk-strategy shifts, and service migrations are all “re-walk NAS and rebuild the collection” — a safe bulk operation, not a database migration. Typesense corruption or loss is recoverable. This inverts the usual DB+search pattern but matches the “observability data” framing: Runs are append-only trace data, not transactional state.
Rule 11. NAS path convention is user-partitioned. /nas/memory/runs/<user_id>/<yyyy-mm>/<run-id>.{jsonl,metadata.json}. User-first partitioning makes per-User export, deletion, and privacy audits trivial filesystem operations (rm -rf /nas/memory/runs/kristina/ if ever needed).
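The path convention is mechanical enough to pin down as a helper. A minimal sketch, assuming the month bucket derives from the Run's capture timestamp in UTC (the helper name is illustrative; the real one lives in `packages/memory`):

```typescript
// Sketch of the Rule 11 user-partitioned NAS path convention.
const NAS_ROOT = "/nas/memory/runs";

function nasPaths(userId: string, runId: string, capturedAt: Date) {
  const yyyy = capturedAt.getUTCFullYear();
  const mm = String(capturedAt.getUTCMonth() + 1).padStart(2, "0");
  const dir = `${NAS_ROOT}/${userId}/${yyyy}-${mm}`;
  return {
    jsonl: `${dir}/${runId}.jsonl`,
    metadata: `${dir}/${runId}.metadata.json`,
  };
}
```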
This builds on ADR-0088 (NAS-backed storage tiering). The Runs archive is a new tenant on the existing NAS infrastructure.
5 — Identity: PDS DIDs + AT Proto App Passwords + bearer wire
Rule 6. Identity is PDS; the wire is a bearer token. Every User has a DID in the joelclaw PDS. Every Machine has an AT Protocol App Password scoped to its User’s DID. Machines present the App Password (as a bearer token in v1) to authenticate Run POSTs.
Rule 20. PDS integration: createAppPassword + bearer + 60s session cache. User creation calls the PDS admin API to mint a did:plc:... + handle. Machine registration calls com.atproto.server.createAppPassword on behalf of the User’s DID; the app password is returned to the CLI once and written to ~/.joelclaw/auth.json (0600). On every POST, Central validates the bearer token via com.atproto.server.createSession (cached 60s), extracts the DID, maps to user_id. Revocation calls com.atproto.server.revokeAppPassword. Full AT Proto signed-request envelopes, dev.joelclaw.run.captured audit records, and federation with external DIDs are reserved upgrades — not v1.
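The 60s cache is the load-bearing detail of Rule 20: a burst of Run POSTs from one Machine costs one PDS round trip, and short PDS blips are absorbed. A sketch under stated assumptions — the `validate` callback stands in for `com.atproto.server.createSession`, and the factory name is hypothetical:

```typescript
// Sketch of the bearer-validation cache (Rule 20): memoize token -> DID
// for the TTL so repeat POSTs skip the PDS round trip.
type Validate = (token: string) => Promise<string>; // resolves to a DID

function makeSessionCache(validate: Validate, ttlMs = 60_000) {
  const cache = new Map<string, { did: string; expires: number }>();
  return async (token: string, now = Date.now()): Promise<string> => {
    const hit = cache.get(token);
    if (hit && hit.expires > now) return hit.did; // cache hit, no PDS call
    const did = await validate(token);            // PDS round trip on miss
    cache.set(token, { did, expires: now + ttlMs });
    return did;
  };
}
```

The same structure explains the PDS-outage failure mode below: a warm cache keeps ingest alive for up to 60s of PDS downtime per Machine.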
Rule 19. Admin = a DID in the ADMIN_DIDS env var on Central. No separate admin token. /api/admin/* endpoints check caller’s resolved DID against ADMIN_DIDS; non-members get 403. V1 list is Joel’s DID. KISS extends all the way through authorization.
6 — Capture: native hooks + file Outbox
Rule 8. Capture uses native runtime hooks; wrappers are the fallback. Pi extension (extending packages/pi-extensions), claude-code Stop hook in ~/.claude/settings.json, codex hook where supported — each invokes joelclaw capture-stdin which enriches jsonl with identity + lineage and POSTs. Explicit joelclaw capture -- <cmd> only for tools with no hook surface. Machines get one CLI installed and nothing else. Parent linkage propagates via JOELCLAW_PARENT_RUN_ID + JOELCLAW_CONVERSATION_ID env vars — best-effort; orphan Runs are acceptable. Failed POSTs go to the Outbox (~/.joelclaw/outbox/*.jsonl) and are drained by any joelclaw CLI invocation plus a launchd/systemd timer every 5 minutes.
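The Outbox drain loop is the part worth pinning down: delete-on-acknowledge is what makes "any CLI invocation plus a timer" a safe retry policy. A sketch with file I/O elided — the `post` callback stands in for the authenticated `POST /api/runs` call, and the in-memory map stands in for `~/.joelclaw/outbox/*.jsonl`:

```typescript
// Sketch of the Machine-side Outbox drain (Rule 8): retry each pending
// payload; remove it only once Central acknowledges.
async function drainOutbox(
  entries: Map<string, string>,                 // filename -> jsonl payload
  post: (payload: string) => Promise<boolean>,  // true = accepted by Central
): Promise<number> {
  let drained = 0;
  for (const [name, payload] of [...entries]) {
    if (await post(payload)) {
      entries.delete(name); // acknowledged — safe to remove from disk
      drained++;
    }
    // on failure the entry stays for the next drain (CLI run or timer)
  }
  return drained;
}
```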
7 — Search: hybrid-by-default with auto-applied privacy filters
Rule 13. Search API shape is D — one hybrid search + convenience traversal endpoints. Primary call is POST /api/runs/search with hybrid-by-default mode, AND-semantics tag filters, and auto-applied user_id + readable_by filters from the bearer token (never from the request body — no way to spoof privacy from client). Traversal endpoints (GET /api/runs/:id, :id/jsonl, :id/descendants) are separate. Mutation endpoints (POST /api/runs/:id/tags) are owner-gated.
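The no-spoofing property comes from where the filter string is built, not from validation. A minimal sketch, assuming standard Typesense `filter_by` syntax (`:=` exact match, `&&`/`||` combinators); the helper name is hypothetical:

```typescript
// Sketch of Rule 13's server-side privacy filter: visibility derives from
// the authenticated user id, never from anything in the request body.
function privacyFilter(userId: string, tagFilters: string[] = []): string {
  const clauses = [
    // owner match OR an explicit readable_by grant
    `(owner_user_id:=${userId} || readable_by:=${userId})`,
    // AND-semantics tag filters
    ...tagFilters.map((t) => `tags:=${t}`),
  ];
  return clauses.join(" && ");
}
```

Because the client-supplied body can only ever append AND clauses to this string, a malicious filter can narrow results but never widen them.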
Rule 12. Agent-first; humans are a vestigial afterthought. Every API, response shape, error, and pagination choice is optimized for agents consuming them. Stable typed JSON envelopes, machine-readable error codes, idempotency keys on mutating POSTs, cursor-based pagination, rich _links and next_actions, deterministic result ordering. No dashboard, no web UI, no visual manual-operations surface in v1 — humans use the CLI, which is itself an agent-shaped thin wrapper over the same endpoints.
8 — Embeddings: qwen3-embedding:8b, Matryoshka 768-dim
Rule 9. Embeddings: qwen3-embedding:8b via Ollama, Matryoshka-truncated to 768-dim. Chunking is per-turn (40K-token context window makes sub-turn splits rare). Every Chunk carries its Embedding Model Tag (qwen3-embedding-8b@768). Dimension is a query-time/deployment knob, not a data commitment — full 4096-dim can be re-computed at zero marginal cost since the same model produces it. Ingest path calls the model through @joelclaw/inference-router; swap via config.
Rationale: qwen3-embedding:8b scores 70.58 on the MTEB multilingual leaderboard (June 2025) vs nomic-embed-text’s 62.39 — 8-point gap is meaningful for targeted family-scale retrieval. The 40K-token context window handles long agent turns (including claude-code turns with large code blocks) without sub-turn splitting. Matryoshka truncation provides deployment flexibility (storage vs RAM tradeoff) without re-embedding on dimension changes.
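Matryoshka truncation itself is two lines: keep the leading dimensions, then re-normalize so cosine distances stay meaningful. A sketch (this works because the model is trained with Matryoshka representation learning; plain truncation of a non-Matryoshka embedding would not preserve quality):

```typescript
// Sketch of Matryoshka truncation: keep the first k dims, re-normalize.
function truncateEmbedding(full: number[], dims = 768): number[] {
  const head = full.slice(0, dims);
  const norm = Math.sqrt(head.reduce((s, x) => s + x * x, 0)) || 1;
  return head.map((x) => x / norm);
}
```

This is why dimension is "a query-time/deployment knob, not a data commitment": the 4096-dim vector can always be re-requested from the same model and re-truncated.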
Embed concurrency is an Inngest-managed knob with priority lanes (Rule 9a in CONTEXT.md). Ollama serializes embed calls internally, so naive HTTP concurrency doesn’t help — what matters is which caller waits. Every embed routes through Inngest with one of three priorities: query (interactive search — never starved), ingest-realtime (live Run captures — normal), ingest-bulk (reindex, backfill — lowest, drops out when anything else arrives). Implementation: memory/embed.requested event with a priority field; Inngest priority.run expression gates scheduling. This is the remediation for the query-starvation failure mode enumerated in Operational Failure Modes.
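The lane discipline can be sketched as a payload plus a priority mapping. Everything here is illustrative — the boost values are not tuned, and the real gating happens in an Inngest `priority.run` expression on the Central worker:

```typescript
// Sketch of the Rule 9a lane mapping for memory/embed.requested events.
// Boost values are assumptions, not the deployed configuration.
type EmbedPriority = "query" | "ingest-realtime" | "ingest-bulk";

const PRIORITY_BOOST: Record<EmbedPriority, number> = {
  query: 120,            // interactive search — never starved
  "ingest-realtime": 0,  // live Run captures — normal
  "ingest-bulk": -120,   // reindex/backfill — yields to everything else
};

function embedEvent(priority: EmbedPriority, text: string) {
  return { name: "memory/embed.requested", data: { priority, text } };
}
```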
Spike validation (2026-04-19) on a 1247-line claude-code session:
- 708 chunks embedded in 572s (1.2 ch/s sequential, ~2.5 ch/s concurrent-8)
- Query latency: ~420ms end-to-end for semantic, ~20ms for keyword, ~250ms hybrid
- Retrieval quality on real queries surfaced the actual root-cause chunks (e.g. “why did the cluster fail” → connection-refused tool_results at vec_distance 0.28)
9 — Retention + deletion
Rule 14. Retention is keep-forever. No TTLs, no rolling windows, no auto-expiration. Storage is not the constraint; the value of agent memory compounds across years. Explicit deletion is the privacy lever.
Rule 15. Deletion is owner-only, hard, cascade-by-default, durable via Inngest. DELETE /api/runs/:id fires memory/run.delete.requested → remove Typesense chunks → remove Run row → remove NAS jsonl + metadata. Idempotent at every step; safe to retry. Descendant Runs cascade-delete (root_run_id match). Bulk delete is always filter-scoped and owner-scoped; no wildcard. DR via nightly NAS snapshots. Optional dev.joelclaw.run.deleted PDS record available per-User but off by default in v1.
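Cascade selection is the only non-obvious step, and the ADR's own rule (descendants match on `root_run_id`) makes it a filter rather than a tree walk. A sketch with a hypothetical store shape:

```typescript
// Sketch of Rule 15 cascade selection: the target Run plus every Run that
// roots to it. Row shape is illustrative; the ordering of the subsequent
// delete steps (chunks -> Run rows -> NAS blobs) is what makes retry safe.
interface RunRow { run_id: string; root_run_id: string }

function selectCascade(all: RunRow[], target: RunRow): string[] {
  return all
    .filter((r) => r.run_id === target.run_id || r.root_run_id === target.run_id)
    .map((r) => r.run_id);
}
```

Each downstream step (remove chunks, remove rows, remove NAS files) is a no-op when its target is already gone, which is what "idempotent at every step; safe to retry" means operationally.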
10 — Re-indexing: three distinct paths
Rule 16. Re-indexing is three distinct paths, each an Inngest function.
- Embedding/chunking rebuild — admin-triggered, fans out from NAS (not Typesense), writes to a new collection `run_chunks_v2`, atomic alias swap on completion, throttled to Ollama throughput, resumable. Preferred over in-place mutation: rollback is an alias swap, a failed rebuild never corrupts live data, and the cost is 2× Typesense disk during the window.
- Metadata enrichment — updates Run rows only, no chunk work. Used for async entity extraction and future field additions.
- Share-Grant fanout — updates `readable_by` on affected chunks only. Used on Share Grant create/revoke.
NAS is always the source of truth for “what to reindex.” Delivers Rule 10’s promise as a concrete operational capability, not a hope.
11 — Package home
Rule 21. packages/memory is the new canonical home. Types (Run, Chunk, ShareGrant, User, Machine, RunStatus, AgentRuntime, Role), Typesense collection schemas, NAS path helpers, the per-turn chunker (claude-code + pi format detection), and interface definitions (RunStore, ChunkStore, ShareGrantStore) all live there, mirroring the @joelclaw/telemetry pattern (ADR-0144 hexagonal style). Heavy logic (chunking, embedding, indexing) lives in packages/memory and is consumed by packages/system-bus/src/inngest/functions/memory/*. Route handlers in apps/web/app/api/runs/* and apps/web/app/api/share-grants/* are thin composition roots — they authenticate, enforce Rule 4, delegate to memory functions, and return HATEOAS envelopes. Embeddings lane lives at packages/inference-router/src/embeddings.ts (extending ADR-0140).
Consequences
Positive
- Unified archive across the Network. Any agent on any Machine produces a Run that lands in one searchable place. Cross-Machine retrieval for a single User becomes a solved problem in v1.
- Rebuildable search index. Embedding model upgrades, chunking strategy changes, schema additions are all “re-walk NAS and rebuild” operations. No painful migrations.
- Privacy is enforced at the schema layer, not the query layer. Denormalized `readable_by` means a malformed query cannot accidentally leak across Users — the index literally doesn’t return rows the caller isn’t authorized for.
- Agent-first API shape matches existing joelclaw CLI conventions (HATEOAS, `_links`, `next_actions`). Agents consuming it already understand the pattern.
- Family-ready. Setup for non-technical Users: install Tailscale, then `joelclaw register`. Done. No crypto, no key management, no dashboard to learn.
- Complements the existing memory system (ADR-0021, 0077, 0195). Curated memory notes and raw Runs can cross-reference — a note that cites a Run, a Run that generates a note.
Negative / costs
- New infrastructure footprint: Ollama pod, `packages/memory`, new Typesense collections, new Inngest functions, new CLI commands, new pi + claude-code hooks, new PDS integration code. Real implementation effort.
- Capture hook coverage is uneven — the claude-code `Stop` hook is well-documented; pi extensions exist; the codex hook surface is less certain. The wrapper fallback (`joelclaw capture -- <cmd>`) will fire more than we’d like until we close gaps per runtime.
- Share Grant fanout has an eventual-consistency window of seconds to minutes between grant creation and chunks becoming visible to the grantee. Must be documented in the CLI and tolerated by agents.
- Embedding throughput is the ingest-path bottleneck. On Panda at 1.2 ch/s sequential / ~2.5 ch/s with concurrency, bursty Run production by many agents will back up. Mac Studio migration is the fix.
- Family-scale does not stress-test multi-User federation. If the Network grows beyond family, the `ADMIN_DIDS` env-var authorization model and the bearer-token wire protocol will both need to harden (toward full AT Proto signed requests). That’s a known upgrade path, not a current blocker.
Operational failure modes (enumerated)
Known failure modes and their intended behavior. Each must have an OTEL event and a remediation path documented in the relevant runbook before build completion:
- Ollama pod down → the `memory/run.captured` step retries; `memory/run.enrich.requested` defers; ingest continues buffering jsonl + metadata to NAS while embeddings queue. On restore, Inngest drains the queue. Raw Runs remain searchable by BM25 on the `text` field even while `embedding` is null for unprocessed chunks. Invariant: no Run is dropped due to an Ollama outage.
- NAS unmounted on Central → `POST /api/runs` returns 503 with `{"error":"nas_unavailable","retryable":true}`. The Machine-side Outbox retains the jsonl and retries on its next drain. Invariant: no Run is acknowledged before the NAS write succeeds.
- Typesense full or unreachable → ingest returns 202 after the NAS write; embedding + indexing are async via Inngest. Search returns 503 until Typesense recovers. Rebuild from NAS (Rule 10, Path 1) is the recovery procedure.
- PDS unreachable during auth → the 60s session cache absorbs short outages; on cache miss + PDS outage, `POST /api/runs` returns 503 (not 401 — we don’t want Machines to treat a PDS blip as a credential problem and re-register). The Machine-side Outbox retries.
- App Password leaked from a Machine → Admin revokes via `joelclaw machine revoke <id>`. The next `createSession` call fails; the Machine falls back to the Outbox until re-registered. Other Users and Machines are unaffected (Rule 4).
- Ollama throughput ceiling exceeded (bursty Run production) → Inngest throttle on `memory/run.captured` queues chunks; ingest latency increases but nothing is dropped. Dashboard alert when queue depth exceeds 10 minutes of embedding budget. Mac Studio migration is the remediation.
- Query embeds starved by bulk embeds → observed during the 2026-04-19 spike: query-time embedding went from ~220 ms idle to 8–10 s while bulk ingest saturated Ollama. Ollama serializes internally, so raw HTTP concurrency is a fake optimization. Fix: every embed call routes through Inngest with one of three priorities — `query` (interactive, never starved), `ingest-realtime` (live Run captures, normal), `ingest-bulk` (reindex/backfill, lowest). `@joelclaw/inference-router` sets the priority based on the caller; an Inngest `priority.run` expression gates scheduling. Background ingest must never steal query latency. Mac Studio migration helps but does not substitute for the priority discipline.
- Share Grant fanout lag → grants take seconds to minutes to propagate across chunks. CLI + API docs must state “grants may take up to 2 minutes to take effect.” Grantee search queries during the window return a correct-but-incomplete result set.
- Capture hook scrubs env vars mid-subprocess → orphan Runs (no `parent_run_id` despite being nested). Accepted as a known limit per Rule 8. Ingest still succeeds; tree linkage is best-effort.
Explicitly deferred (v1 non-goals)
- Full AT Proto signed-request envelope on every POST
- `dev.joelclaw.run.captured` / `dev.joelclaw.run.deleted` PDS audit records (schema slot reserved, write path deferred)
- Federation with external DIDs (brother’s self-hosted PDS, etc.)
- Invite-link self-serve User creation
- Entity linking (resolving surface strings to canonical Contacts/Projects)
- Archive tier (`status=archived` — NAS retained, Typesense chunks dropped)
- Per-chunk redaction without full Run deletion
- User lifecycle transitions (kid reaches 18 → ownership change)
- Derived retrieval endpoint `POST /api/memory/retrieve` (composed context injection across Runs)
- Web UI / dashboard
- Per-User opt-out of enrichment
- Search rerank via full 4096-dim Matryoshka (the 768-dim hybrid is sufficient for v1)
Each has a designed insertion point per CONTEXT.md; none requires structural change.
Implementation Plan
Required skills (load before implementation starts)
- `inngest-durable-functions` — all memory/run/share-grant lifecycle runs through Inngest; must follow step/flow conventions.
- `inngest-steps` — idempotent step patterns for chunking, embedding, indexing, cascade delete.
- `inngest-events` — event naming + contracts (`memory/run.captured`, `memory/run.enrich.requested`, `memory/run.delete.requested`, `memory/share-grant.created`, `memory/share-grant.revoked`, `memory/reindex.requested`).
- `inngest-flow-control` — throttle + concurrency tuning for embedding throughput and reindex walks.
- `system-bus` — repo conventions for adding new functions under `packages/system-bus/src/inngest/functions/memory/`.
- `next-best-practices` — route handler patterns (auth middleware, streaming responses, cookies).
- `next-cache-components` — response caching where safe (metadata GETs are cacheable, search is not).
- `nextjs-static-shells` — `apps/web/app/api/` conventions.
- `pds` — `createAppPassword`, `createSession`, `revokeAppPassword` flows; PDS admin user creation.
- `k8s` — Ollama pod deployment, Tailscale exposure, PVC sizing.
- `system-architecture` — cross-cutting integration with gateway, workload-rig, loops.
- `adr-skill` — ADR lifecycle management, including post-acceptance sync to `system_knowledge`.
Affected paths
- New packages: `packages/memory/`
- Extended packages: `packages/inference-router/` (embeddings lane), `packages/cli/` (runs/user/machine/admin commands, extended recall), `packages/pi-extensions/` (capture extension), `packages/system-bus/src/inngest/functions/memory/` (new functions)
- Extended apps: `apps/web/app/api/runs/`, `apps/web/app/api/share-grants/`, `apps/web/app/api/admin/`
- Deferred apps: `apps/web/app/api/memory/retrieve` (stub only; body in a later ADR)
- k8s: `k8s/ollama-deployment.yaml`, `k8s/ollama-service.yaml`
- Client-side: `~/.claude/settings.json` hook entry on register, `~/.joelclaw/auth.json`, `~/.joelclaw/outbox/`, `~/.joelclaw/memory-spike-ingested.jsonl` (spike only; removed after build-out)
- Docs: `~/Code/joelhooks/joelclaw/CONTEXT.md` (canonical; do not duplicate into this ADR)
Testing discipline for each build step
Every build step below includes three test obligations, not just the implementation:
- Unit test in the same package (Rule 21 boundaries) for any pure logic (chunking, embedding interface, NAS path helpers, Share Grant scope evaluation).
- Integration test for any Inngest function that writes to Typesense or NAS — real Typesense, real NAS mount, fixtures committed under `packages/memory/__tests__/fixtures/`.
- Privacy enforcement test for every route handler: an explicit unit/integration test that a second User’s bearer token cannot retrieve the first User’s data under any combination of filter spoofing, direct-by-id access, or descendant traversal.
Privacy tests are first-class Rule 4 enforcement and must exist before a route is merged.
Build order (sequenced to compound signal early)
1. Graduate `packages/memory/` from spike quality to production quality — promote from `scripts/memory-spike/` patterns. Types, Typesense schemas, NAS path helpers, per-turn chunker (with fixed tool-result role detection for claude-code), interface exports.
2. Typesense collection bootstrap script — idempotent create for `runs`, `run_chunks`, `share_grants`, `users`, `machines` with alias `run_chunks_current`.
3. `@joelclaw/inference-router` embeddings lane (`packages/inference-router/src/embeddings.ts`) — catalog entry for `qwen3-embedding:8b`, Ollama provider, Matryoshka dimension parameter, tracing integration.
4. Ollama k8s pod running `qwen3-embedding:8b`, exposed to `system-bus-worker` via Tailscale MagicDNS.
5. `memory/run.captured` Inngest function — receives the event, chunks jsonl, calls embeddings via the router, writes to NAS + Typesense, populates deterministic metadata columns.
6. `apps/web/app/api/runs/route.ts` (POST) — auth via bearer → PDS createSession, persist jsonl to NAS, fire `memory/run.captured`.
7. `apps/web/app/api/runs/search/route.ts` — hybrid Typesense query with auto-applied privacy filters.
8. `joelclaw runs search` CLI command.
9. `joelclaw user create` + `joelclaw machine register` + PDS admin wiring.
10. Pi capture extension in `packages/pi-extensions/` + claude-code `Stop` hook installed by `joelclaw register`.
11. Gateway integration — `packages/gateway/src/channels/*` fire server-side `captureRun()` for replies; extends ADR-0144.
12. Share Grants endpoints + Path 3 reindex (`memory/share-grant.created|revoked`).
13. Delete endpoints + cascade + bulk delete.
14. `memory/run.enrich.requested` — async entity extraction via local `pi -p` with the 5-kind schema.
15. Path 1 reindex (embedding/chunking rebuild) — not critical until the first model swap; build late.
Non-goals call-out (prevents scope creep)
Implementation MUST NOT include anything listed under “Explicitly deferred” above. If a gap there begins to bite, open a new narrow ADR referencing this one.
Verification criteria
- `packages/memory/` typechecks via `bunx tsc --noEmit -p packages/memory/tsconfig.json`; biome-clean. (2026-04-19)
- Typesense collections `run_chunks_dev`, `runs_dev` exist with schemas in `packages/memory/src/schemas/`. `share_grants`, `users`, `machines` deferred to their respective phases. (2026-04-19)
- `@joelclaw/inference-router` exports an `embeddings` lane with a priority queue; tests confirm Matryoshka truncation to 768-dim round-trip. Full catalog integration deferred (the current impl bypasses `MODEL_CATALOG` for directness). (2026-04-19)
- Ollama pod is running in the joelclaw namespace with qwen3-embedding:8b loaded; `joelclaw status` shows it healthy. Current state: Ollama on Panda localhost, not yet in k8s. Deferred pending Mac Studio migration per Rule 5.
- `joelclaw-machine-register --name <n> --user <u>` issues a real PDS App Password (via `com.atproto.server.createAppPassword`), upserts a `machines_dev` row with the sha256 of the plaintext, and writes `~/.joelclaw/auth.json` (0600). Shipped 2026-04-20: Panda registered (did:plc:5w6ably…). Auth middleware at `apps/web/lib/memory-auth.ts` does a hash lookup per request (NOT PDS createSession — see the deviation note below). Multi-user `joelclaw user create` deferred to Phase 3.5. Deviation from Rule 20: hash-based lookup instead of per-request PDS validation — cheaper + safer + no PDS-side session state. (2026-04-20)
- `POST /api/runs` with a valid bearer token accepts a jsonl payload, writes to NAS at the user-partitioned path, fires `memory/run.captured`, and returns a HATEOAS envelope with `run_id` and `_links`. (2026-04-19)
- `POST /api/runs/search` returns hybrid results with the `readable_by` filter enforced server-side. Privacy-spoofing integration test deferred to the multi-user phase. (2026-04-19)
- `DELETE /api/runs/:id` cascade-deletes descendants from Typesense and removes NAS blobs; re-running the same DELETE is idempotent. Phase 5.
- `memory/run.enrich.requested` populates `entities_mentioned` within a minute of ingest with the 5-kind prefix taxonomy. Phase 6.
- Share Grant creation updates `readable_by` on affected chunks within ~1 minute (window documented in CLI help). Phase 5.
- Path 1 reindex smoke test: change an unused config (e.g. a tag), trigger `memory/reindex.requested` with a narrow filter, observe the alias swap to `run_chunks_v2`, confirm old + new Runs searchable throughout. Build late; not needed until the first model swap.
- An installed capture hook captures a real claude-code turn end-to-end into the live index, searchable within ~30 seconds of turn completion. Ambient capture via the `~/.claude/settings.json` Stop hook → `joelclaw-capture-session` with incremental delta + per-session byte-offset state. Pi equivalent via the `@joelclaw/pi-extensions/memory-capture` extension on `turn_end` + `session_shutdown`. The `joelclaw register` CLI command itself is still Phase 3. (2026-04-19)
- `joelclaw recall` (from the ADR-0195 era) now includes the Run archive as a source; fan-out works without regression. Phase 4 continuation.
- OTEL events emitted for: `memory.run.captured`, `memory.run.enriched`, `memory.run.deleted`, `memory.share_grant.created`, `memory.reindex.completed`. Partially done — `memory.embed.completed` is emitted; others pending their respective functions.
- Embed priority lanes work under contention: with a deliberately saturated `ingest-bulk` workload running, a `query`-priority embed returns within p99 < 1 s. Measured 338 ms with 98 bulk embeds queued (2026-04-19). Test at `packages/inference-router/__tests__/embeddings-priority.test.ts`.
- System-knowledge sync fires on ADR acceptance: `joelclaw send system/adr.sync.requested -d '{"source":"adr-skill"}'`. Note: at proposal time (2026-04-19) colima was down and Inngest unreachable, so the sync has been queued for execution when the cluster is restored. Record the sync timestamp here when it runs.
- Spike collection `run_chunks_spike` dropped from Typesense after the production `run_chunks_current` alias is serving live traffic: `curl -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" http://localhost:8108/collections/run_chunks_spike`.
Visual artifacts
Optional. Defer /generate-web-diagram until build Phase 5 (first route handler) — the architecture will benefit from a visual after the code makes it concrete. Path: docs/decisions/diagrams/0243-runs-capture-architecture.html.
More Information
- Canonical design spec: `~/Code/joelhooks/joelclaw/CONTEXT.md` (21 rules, 13 terms, full API shape, example dialogue). This ADR is the decision record; CONTEXT.md is the binding spec. On divergence, update CONTEXT.md first, then revisit this ADR.
- End-to-end validation spike: `~/Code/joelhooks/joelclaw/scripts/memory-spike/` with findings documented in its `README.md`.
- Ingested data from spike: Typesense collection `run_chunks_spike` (cleanly deletable after build-out completes).
Domain-model grilling lineage
This ADR is the product of a 12-question Socratic session via the domain-model skill (mattpocock/skills). Every decision above was confirmed individually before being committed to CONTEXT.md. Questions and resolutions:
- Where does ingestion run? → Central (Rule 1)
- Ownership/isolation model? → Private-by-default with explicit Share Grants (Rule 4, 18)
- Run = ? → One agent invocation; tree-shaped; flat conversation_id label; `/api/runs/*` (Rule 3; Rule 13’s API partitioning)
- How does a Machine authenticate? → Tailnet + PDS + App Password bearer (Rules 6, 7, 19, 20)
- How is a Run produced? → Native runtime hooks with wrapper fallback + file Outbox (Rule 8)
- Chunking + embedding? → Per-turn with sub-turn fallback + qwen3-embedding:8b @ 768-dim Matryoshka (Rule 9)
- Source of truth + NAS path? → NAS-authoritative + user-partitioned (Rules 10, 11)
- Search API shape? → D: one hybrid search + convenience traversal (Rule 13); agent-first API principle (Rule 12)
- Retention + deletion? → Keep-forever + hard-delete cascade (Rules 14, 15)
- Re-indexing orchestration? → Three distinct paths, new-collection-swap for embedding rebuilds (Rule 16)
- Parsed metadata columns? → Inline deterministic + async entity extraction (5-kind taxonomy) (Rule 17)
- Mechanical closeouts (Share Grants, admin, PDS, package structure) → Rules 18, 19, 20, 21
Spike confidence update
Pre-spike design confidence: 8/10. Post-spike confidence: 9/10 — qwen3-embedding:8b quality on real agent Run data is validated with measurements. Remaining 1/10 risk: capture-hook coverage, throughput-under-burst, Share Grant consistency window. All operational, none structural.