Private-First PDS with Bento Bridge
Context
ADR-0004 established AT Protocol as the bedrock of joelclaw — every record lives on a PDS, Inngest orchestrates, clients are AppViews. That ADR is accepted but had zero implementation.
Meanwhile, the system has been building the layers around the PDS without calling them that:
- Media pipeline (ADR-0041) produces structured data (descriptions, transcripts, MIME types) — a lexicon waiting to be formalized
- NAS video archive on three-body — content-addressed storage, the future PDS data plane
- slog produces structured log entries — direct mapping to
dev.joelclaw.system.logrecords - Inngest event bus already describes data flows — maps to PDS firehose patterns
- Agent identity (bot app, DID, Telegram bot) — identity layer already exists
The question is how to get from “fully designed, zero code” to “running PDS with real data flowing through it” without solving the hard problems (DNS, TLS, public exposure, federation) upfront.
Decision
Deploy a private-first PDS to the existing k8s cluster, accessible only via Tailscale. Use Bento as a declarative stream processor to bridge PDS ↔ Inngest. Federation comes later.
What “private-first” means
- PDS runs in
joelclawk8s namespace alongside inngest-0, qdrant-0, redis-0 - Reachable only via Tailscale (port-forward or Caddy proxy on tailnet)
- No internet exposure, no public DNS, no TLS cert from a public CA
- DID registered with plc.directory (outbound call works, inbound resolution fails for external parties — that’s fine)
- No crawlers, no relay subscription, no Bluesky network participation
- Custom
dev.joelclaw.*records written/read via XRPC from Inngest functions and CLI
What Bento does
Bento (WarpStream fork of Benthos) is a Go stream processor deployed as a k8s pod. It replaces what would otherwise be a custom Node.js firehose subscriber and XRPC write-back service.
Two stream pipelines:
- PDS → Inngest: WebSocket input subscribing to PDS firehose (
com.atproto.sync.subscribeRepos) → filter fordev.joelclaw.*collections → transform to Inngest event format → HTTP POST to Inngest event API - Inngest → PDS: Redis pub/sub input (existing event bridge) → transform to XRPC
createRecordpayloads → HTTP POST to PDS
Bento runs in streams mode — multiple pipeline configs in a single ConfigMap. Declarative, not application code. Built-in retries, backpressure, reconnection, Prometheus metrics.
Cluster topology after deployment
joelclaw namespace:
inngest-0 — event bus + function orchestration
qdrant-0 — vector search (indexes PDS records + Vault)
redis-0 — cache + pub/sub + ephemeral state
pds-0 — AT Proto personal data server
bento-0 — stream processor (PDS ↔ Inngest bridge)Implementation
Phase 1: PDS (done)
- Generate secrets (JWT, admin password, PLC rotation key) → stored in agent-secrets
- Helm install
nerkho/bluesky-pdsv0.4.2 intojoelclawnamespace - Create Joel’s account — DID
did:plc:7vyfh3gnwfjniddpp5sws4mq, handlejoel.pds.panda.tail7af24.ts.net - Write test records:
dev.joelclaw.system.log,dev.joelclaw.media.processed - Verify XRPC read/write via port-forward on 2583
- Values file at
infra/pds/values.yaml(secrets referenceexistingSecret, not plaintext)
Phase 1.5: Operational tooling (done)
-
joelclaw pdsCLI subcommand tree: health, describe, collections, records, write, delete, session - PDS Effect service (
~/Code/joelhooks/joelclaw-cli/src/pds.ts) — XRPC client with auto-session management - Auto-lease credentials from agent-secrets (no manual env var setup needed)
- Session caching at
~/.joelclaw/pds-session.jsonwith auto-refresh - HATEOAS JSON output on all commands
- PDS skill at
~/.agents/skills/pds/— operational guide, symlinked to pi + claude
Phase 2: Lexicon definitions
- Define
dev.joelclaw.system.loglexicon JSON (mirrors slog schema) - Define
dev.joelclaw.media.processedlexicon JSON (mirrors ADR-0041 output) - Define
dev.joelclaw.memory.observationlexicon JSON (mirrors ADR-0020 shape) - Store lexicon files in
packages/lexicons/orinfra/pds/lexicons/ - Optional: generate TypeScript types from lexicons via
@atproto/lex-cli
Phase 3: Bento bridge
- Helm install
warpstreamlabs/bentointojoelclawnamespace - Stream config: PDS firehose (WebSocket) → filter
dev.joelclaw.*→ Inngest HTTP POST - Stream config: Redis pub/sub
pds:writechannel → XRPCcreateRecord - Auth: PDS session token managed by Bento (refresh on expiry)
- ConfigMap for stream definitions, values file at
infra/bento/values.yaml
Phase 4: Wire existing pipelines
-
media-processInngest function writesdev.joelclaw.media.processedrecord to PDS (via Redis → Bento → XRPC, or direct XRPC call from step) - slog writes
dev.joelclaw.system.logrecord to PDS alongside JSONL - Memory observation pipeline writes
dev.joelclaw.memory.observationto PDS - Qdrant indexes PDS records (via Bento → Inngest → embedding function)
Phase 5: Persistent access
- Add PDS to Caddy config for Tailscale HTTPS (
pds.panda.tail7af24.ts.net→svc/bluesky-pds:3000) - Or: launchd plist for persistent
kubectl port-forward - Add PDS health to
joelclaw-system-checkskill
Future (not this ADR)
- Public DNS + TLS for federation (ADR-0004 Phase 0 step 3+)
- Family PDS instances
- Relay deployment
- iPhone app connecting via XRPC
- NAS-backed storage (replace local-path PVC with NFS/three-body volume)
Consequences
Enables
- Immediate value: system data (logs, media, memory) gets a structured, typed, queryable home beyond JSONL and Redis
- Incremental migration: existing pipelines don’t break — PDS writes are additive alongside current storage
- Schema-first: lexicon definitions formalize what’s been implicit in Inngest event schemas
- Bridge pattern: Bento is swappable — if Jetstream or a custom subscriber is better later, only the ConfigMap changes
- Federation-ready: when public access is added, the PDS already has data and the DID is registered
Risks
- PDS is a new stateful service — adds operational burden (backups, upgrades, storage management)
- Bento is a new dependency — another thing to monitor, though it’s a single Go binary with minimal resource usage
- Token management: Bento needs a valid PDS session token; refresh logic must be reliable
- local-path storage: PVC on Colima is not durable across cluster rebuilds — NAS-backed storage should come before real data accumulates
- PDS version drift: Bluesky ships PDS updates; Helm chart may lag behind
Non-goals
- Bluesky social features (posts, follows, likes) — this is a personal data server, not a social media account
- Multi-account PDS — Joel only for now, family PDS instances are future work
- Custom AppView — no web UI for browsing PDS records yet
- Lexicon publishing —
dev.joelclaw.*schemas are private, not registered withlexicon.community
Affected Paths
infra/pds/values.yaml— PDS Helm values (created)infra/bento/values.yaml— Bento Helm values (Phase 3)infra/bento/streams/— Bento stream config YAML files (Phase 3)packages/lexicons/— Lexicon JSON definitions (Phase 2)packages/system-bus/src/inngest/functions/media-process.ts— add PDS write step (Phase 4)~/Code/joelhooks/slog/— add PDS write alongside JSONL (Phase 4)
Verification
- PDS pod running in
joelclawnamespace - XRPC health check returns
{"version": "0.4.204"} - Custom
dev.joelclaw.*records can be created and read back -
describeReposhows custom collections - Bento pod running, stream pipelines connected
-
media-processfunction output appears as PDS record - slog entry appears as PDS record
- PDS accessible via Tailscale HTTPS (Caddy)
Notes
- PDS default port is 2583 — used for port-forward:
kubectl port-forward -n joelclaw svc/bluesky-pds 2583:3000 - PDS image:
ghcr.io/bluesky-social/pds:0.4.204(100MB) - Bento image:
ghcr.io/warpstreamlabs/bento:1.2.0(~50MB) validationStatus: "unknown"on custom records is expected — PDS is lexicon-agnostic- Helm chart source: Nerkho/helm-charts
- Bento Helm chart: warpstreamlabs/bento-helm-chart
- Credit: dame.is for the “personal lexicon” concept; Groundmist project for “AT Proto as legibility layer for local-first software”
Credits
- Bluesky / AT Protocol team — PDS implementation, lexicon-agnostic record storage
- Nerkho — PDS Helm chart for Kubernetes
- WarpStream Labs — Bento stream processor (Benthos fork)
- dame.is — personal lexicon concept (
is.dame.*records as custom “file formats”) - Groundmist / grjte — AT Proto as legibility layer for local-first software
- Kyle Wilson (kyledev.co) — PDS on Kubernetes walkthrough