ADR-0114 (superseded)

Elixir/BEAM/Jido Migration — Full Architecture Evaluation

Context

joelclaw is a personal AI operating system built in TypeScript. After 100+ ADRs and ~40k LOC of system-bus functions, a recurring theme has emerged: a significant fraction of the infrastructure exists to compensate for problems the BEAM VM solves natively.

Specific pain points that BEAM eliminates by design:

| Current Pain | Root Cause | BEAM Native Solution |
|---|---|---|
| launchd daemons (5+) for process management | Node.js is one-process-per-service | OTP supervisors — unlimited lightweight processes in one VM |
| Worker clone sync workflow (push → fetch → reset → kickstart → PUT sync) | Can’t hot-reload a running Bun process | Hot code reload — deploy without restart |
| Redis pub/sub dual-client hack | ioredis subscription blocks the client | Process mailboxes — every process has built-in messaging |
| Gateway session file persistence (~/.joelclaw/gateway.session) | pi sessions are stateful and fragile | GenServer state — survives crashes via supervisor restart |
| Inngest as external orchestrator (k8s pod, function registry sync, stale registration) | No built-in durable execution in Node.js | OTP — Task, GenServer, Supervisor provide durable execution natively |
| Priority queue in Redis sorted sets (ADR-0104) | No native priority mailbox | receive with pattern matching on priority tuples |
| Concurrency guards via Redis keys + TTLs | No process-level isolation | Process-per-resource — each has isolated state, no shared memory |
| Notification dedup via Redis cooldowns | Shared-nothing requires external coordination | Process state — cooldown timers live in the process, no Redis round-trip |
| Tripwire/watchdog for gateway health | Process crashes are silent and unrecoverable | Supervisors with restart strategies — crash recovery is the default |
| Extension reload requires session kill + daemon restart | pi loads extensions once at session start | Hot code reload — update modules in the running VM |

The Jido framework provides an Elixir-native agent architecture with:

  • Agents as 25KB processes (GenServer-backed, supervisor-managed)
  • Signal-based communication (CloudEvents-compliant, type-routed)
  • Directive system (Emit, Spawn, Schedule — pure functional core, effectful runtime)
  • Plugin architecture (composable capability bundles with state, actions, routes)
  • Built-in observability (telemetry events, OTEL tracer behavior, debug mode)
  • FSM strategies (state machine workflows — maps to gateway state management)
  • Worker pools (pre-warmed agent pools — maps to webhook/event processing)
  • Memory and identity plugins (built-in, maps to current memory system)

Current System Inventory

What exists today

| Component | Tech | LOC | Files | Complexity |
|---|---|---|---|---|
| Event bus functions | Inngest + TypeScript | ~34k | 63 | High — durable steps, retries, cron, fan-out |
| Gateway daemon | pi session + launchd | ~3k | 8 | High — Telegram, Redis bridge, session mgmt |
| CLI | Effect-TS + Bun | ~8k | 25 | Medium — 30+ commands, HATEOAS output |
| Web app | Next.js + Convex | ~5k | 40 | Medium — MDX, auth, real-time dashboard |
| Memory system | Typesense + Redis | ~2k | 6 | Medium — observe, reflect, promote, triage |
| Webhook server | Express + providers | ~1.5k | 10 | Low — signature verify, dispatch |
| Cache layer (ADR-0112) | Redis + file | ~350 | 1 | Low — just shipped |
| Infrastructure glue | launchd, k8s, bash | ~500 | 15 | High — fragile, stateful, manual |

External dependencies that complicate migration

| Dependency | Current Role | Migration Difficulty |
|---|---|---|
| Inngest | Event orchestration, durable execution, cron, retry | Hard — 74 functions, deeply integrated step patterns |
| Convex | Real-time DB for web app (schema, queries, auth) | Hard — tight coupling with Next.js frontend |
| pi | Agent coding harness (tool use, sessions, extensions) | Very Hard — no Elixir equivalent exists |
| Vercel | CDN, ISR, edge functions for Next.js | Medium — Phoenix deployment is different |
| Telegram Bot API | Primary human interface | Easy — HTTP API, language-agnostic |
| Front API | Email integration | Easy — HTTP API |
| Todoist API | Task management | Easy — HTTP API |
| Typesense | Search + vector store | Easy — HTTP API |
| Redis | Cache, pub/sub, state | Partially eliminated — keep for cache, drop pub/sub |

What BEAM/Jido Replaces Naturally

Tier 1: Direct replacements (high confidence, clear win)

Inngest → OTP Supervision Tree

  • Each of 74 functions becomes a GenServer or Task under a supervisor
  • Cron schedules via Quantum (mature, production-proven)
  • Retries via supervisor restart strategies + exponential backoff in process
  • Step functions → sequential with chains or Jido Strategy FSM states
  • Fan-out → Task.Supervisor.async_stream_nolink/3
  • Concurrency limits → process-per-resource with GenServer call serialization
  • Eliminates: Inngest k8s pod, function registry sync, PUT sync, stale registration bugs
  • Risk: Lose Inngest dashboard (run traces, step visualization). Must build equivalent.
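A minimal sketch of the shape one migrated function could take, using only core OTP. Module name, retry limit, and backoff policy are all hypothetical, not a prescribed design:

```elixir
defmodule Joelclaw.Functions.DailyDigest do
  # Hypothetical sketch: one former Inngest function as a GenServer with
  # in-process retry and exponential backoff. The supervisor above it
  # handles crashes; this handles expected, retryable failures.
  use GenServer

  @max_attempts 5

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts), do: {:ok, %{attempt: 0}}

  @impl true
  def handle_cast({:run, payload}, state) do
    case run_step(payload) do
      :ok ->
        {:noreply, %{state | attempt: 0}}

      {:error, _reason} when state.attempt < @max_attempts ->
        # Exponential backoff: 1s, 2s, 4s, ...
        Process.send_after(self(), {:retry, payload}, Integer.pow(2, state.attempt) * 1_000)
        {:noreply, %{state | attempt: state.attempt + 1}}

      {:error, reason} ->
        # Out of retries: stop and let the supervisor's restart strategy decide.
        {:stop, {:max_retries, reason}, state}
    end
  end

  @impl true
  def handle_info({:retry, payload}, state), do: handle_cast({:run, payload}, state)

  # Placeholder for the real step logic.
  defp run_step(_payload), do: :ok
end
```

Fan-out steps would layer Task.Supervisor.async_stream_nolink over the same payloads rather than living inside this process.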

Redis pub/sub → Phoenix.PubSub / process messaging

  • Gateway event bridge becomes PubSub topic subscription
  • No more dual-client ioredis hack
  • No more LPUSH/LRANGE drain pattern
  • Eliminates: Redis subscription client, gateway drain logic
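The primitive underneath is worth seeing concretely. In production this would be Phoenix.PubSub, but a dependency-free sketch of mailbox messaging shows why no second Redis client is needed: "subscribe" is registering a pid, "publish" is send/2.

```elixir
# Minimal sketch of BEAM process messaging replacing the dual-client
# pub/sub pattern. Module name is illustrative; Phoenix.PubSub builds
# a registry, sharding, and distribution on top of exactly this.
defmodule Bridge do
  def start do
    spawn(fn -> loop([]) end)
  end

  defp loop(subscribers) do
    receive do
      {:subscribe, pid} ->
        loop([pid | subscribers])

      {:publish, event} ->
        # Delivery is just a message into each subscriber's mailbox.
        Enum.each(subscribers, &send(&1, {:event, event}))
        loop(subscribers)
    end
  end
end
```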

launchd daemons → OTP Application

  • system-bus-worker, gateway, gateway-tripwire, content-sync-watcher, vault-log-sync, typesense-portforward — all become child processes in one supervision tree
  • Single mix release produces one deployable artifact
  • Eliminates: 5+ launchd plists, PID management, session files, kickstart commands
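As a structural sketch, the supervision tree replacing those plists is a single Application module. The child module names below mirror the current daemons and are hypothetical:

```elixir
# Hedged sketch: the launchd daemons as children of one OTP application.
defmodule Joelclaw.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      Joelclaw.SystemBus.Worker,       # was: system-bus-worker plist
      Joelclaw.Gateway,                # was: gateway daemon + tripwire
      Joelclaw.ContentSync.Watcher,    # was: content-sync-watcher
      Joelclaw.VaultLog.Sync,          # was: vault-log-sync
      Joelclaw.Typesense.PortForward   # was: typesense-portforward
    ]

    # One crash restarts one child; the tripwire/watchdog machinery goes away.
    Supervisor.start_link(children, strategy: :one_for_one, name: Joelclaw.Supervisor)
  end
end
```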

Priority queue (ADR-0104) → process mailbox

```elixir
def handle_info({:message, priority, payload}, state) do
  # Push into an in-process priority queue, then drain highest-priority
  # first. PriorityQueue is illustrative (e.g. a gb_trees-backed structure);
  # drain/1 pops and processes queued messages in priority order.
  state = %{state | queue: PriorityQueue.push(state.queue, priority, payload)}
  {:noreply, drain(state)}
end
```
  • Starvation prevention, dedup, coalescing — all in-process state
  • Eliminates: Redis sorted set, SHA-256 dedup window, aging promotion logic

Webhook server → Phoenix/Plug

  • Plug pipeline for signature verification
  • Pattern-match dispatch to handlers
  • Each provider becomes a Plug module
  • Eliminates: Express server, provider registry
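The verification core each provider Plug would wrap can be sketched with only Erlang's :crypto. Module and function names are illustrative; the HMAC-SHA256-over-raw-body scheme is the common provider pattern, not a specific provider's spec:

```elixir
defmodule Webhooks.Signature do
  # Sketch of provider signature verification: HMAC the raw request body
  # and compare against the hex signature header in constant time.
  def valid?(secret, raw_body, signature_hex) do
    expected =
      :crypto.mac(:hmac, :sha256, secret, raw_body)
      |> Base.encode16(case: :lower)

    secure_compare(expected, signature_hex)
  end

  # Constant-time comparison to avoid timing side channels from
  # early-exit equality checks.
  defp secure_compare(a, b) when byte_size(a) == byte_size(b) do
    :crypto.exor(a, b)
    |> :binary.bin_to_list()
    |> Enum.all?(&(&1 == 0))
  end

  defp secure_compare(_, _), do: false
end
```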

Tier 2: Good fit but requires design work

Gateway agent → Jido Agent with Telegram plugin

  • Gateway becomes a long-lived Jido Agent process
  • Telegram channel as a Jido Signal adapter (inbound) + Action (outbound)
  • MCQ flows as FSM Strategy states
  • Session state in GenServer, not Redis + files
  • Risk: No pi integration. How does the agent write code?

Memory system → Jido Memory plugin + Typesense

  • Observe/reflect/promote pipeline maps to Jido Agent with memory plugin
  • Typesense stays as search backend (HTTP API, language-agnostic)
  • Proposal triage becomes a Jido Strategy FSM
  • Risk: Current memory pipeline is deeply intertwined with Inngest step patterns

Cache layer (ADR-0112) → ETS / Cachex

  • Cachex provides TTL, warm/cold tiers, stats
  • ETS for hot cache (faster than Redis for local reads)
  • File cache stays for warm tier
  • Eliminates: Redis cache keys, cache.ts module
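A dependency-free sketch of the hot tier on ETS. Cachex adds TTL janitors, stats, and tiering on top of this same primitive; table and module names here are illustrative:

```elixir
defmodule HotCache do
  # Sketch: local TTL cache on ETS. Reads are in-memory lookups with no
  # network round-trip; expiry is checked lazily on read.
  @table :joelclaw_hot_cache

  def init do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
  end

  def put(key, value, ttl_ms) do
    expires_at = System.monotonic_time(:millisecond) + ttl_ms
    :ets.insert(@table, {key, value, expires_at})
    :ok
  end

  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value, expires_at}] ->
        if System.monotonic_time(:millisecond) < expires_at do
          {:ok, value}
        else
          # Expired: evict lazily and report a miss.
          :ets.delete(@table, key)
          :miss
        end

      [] ->
        :miss
    end
  end
end
```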

Tier 3: Difficult / unclear benefit

Web app (Next.js → Phoenix LiveView)

  • Full rewrite of apps/web (~5k LOC, 40 files)
  • MDX pipeline → Earmark + custom transformers (thinner plugin ecosystem)
  • Convex real-time → LiveView sockets (comparable but different DX)
  • Vercel CDN/ISR → self-hosted Phoenix or Fly.io
  • Auth (Better Auth) → Phoenix auth generators or custom
  • Risk: High effort, uncertain payoff. The web layer works fine.

Agent coding interface (pi/codex → ???)

  • pi is the coding agent harness — tool use, file I/O, sessions, extensions
  • Jido Shell exists (virtual workspace, command execution) but is early
  • No equivalent of codex exec in Elixir ecosystem
  • Could use Jido AI + Jido Shell + LLM Actions for a custom agent loop
  • Risk: Very high. This is the creative heart of the system. Building a coding agent from scratch is a multi-month project.

CLI (Effect-TS → Mix tasks or Burrito)

  • Burrito for standalone CLI binary
  • Mix tasks for development
  • Effect’s structured error handling → with chains + tagged tuples
  • HATEOAS output → Jason encoding of response structs
  • Risk: Medium. CLI is well-tested and stable. Rewriting is friction without clear gain.
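The Effect-to-Elixir error-handling translation is mechanical enough to sketch. Command and step names below are hypothetical; the point is the shape — each step returns a tagged tuple, and the first error short-circuits the chain:

```elixir
defmodule CLI.Deploy do
  # Sketch of Effect-style structured errors as a `with` chain over
  # tagged tuples.
  def run(args) do
    with {:ok, opts} <- parse(args),
         {:ok, config} <- load_config(opts),
         {:ok, result} <- execute(config) do
      {:ok, result}
    else
      # Tagged errors fall through with their context intact, much like
      # Effect's typed error channel.
      {:error, reason} -> {:error, reason}
    end
  end

  defp parse(["--env", env]), do: {:ok, %{env: env}}
  defp parse(_), do: {:error, :invalid_args}

  defp load_config(%{env: "prod"}), do: {:ok, %{target: :prod}}
  defp load_config(_), do: {:error, :unknown_env}

  defp execute(%{target: target}), do: {:ok, {:deployed, target}}
end
```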

Migration Strategies

Strategy A: Full rewrite (12-18 months)

Replace everything. Single Elixir umbrella app.

| Phase | Duration | Scope |
|---|---|---|
| 0. Proof of concept | 2 weeks | One Jido Agent running heartbeat, connected to Redis |
| 1. Event bus | 6 weeks | All 74 functions as OTP processes, kill Inngest |
| 2. Gateway | 4 weeks | Jido Agent with Telegram, kill launchd daemons |
| 3. Memory | 3 weeks | Observe/reflect/promote as Jido pipeline |
| 4. Web | 8 weeks | Phoenix LiveView, kill Next.js/Vercel/Convex |
| 5. Agent coding | 8+ weeks | Custom coding agent on Jido, kill pi dependency |
| 6. CLI | 3 weeks | Mix tasks + Burrito binary |

Total: ~34 weeks. Highly ambitious. Every subsystem is disrupted at some point, and there is no stable fallback while each piece is mid-migration.

Strategy B: Hybrid — BEAM backend, keep JS frontend (6-8 months)

Replace the infrastructure layer where BEAM wins. Keep the web frontend.

| Phase | Duration | Scope |
|---|---|---|
| 0. Proof of concept | 2 weeks | Jido Agent running heartbeat |
| 1. Event bus | 6 weeks | 74 functions → OTP, kill Inngest |
| 2. Gateway | 4 weeks | Jido Agent with Telegram adapter |
| 3. Memory | 3 weeks | Pipeline as Jido Agents |
| 4. API layer | 2 weeks | Phoenix API serving Next.js frontend |
| 5. CLI bridge | 2 weeks | Elixir CLI or keep TS CLI calling Phoenix API |

Total: ~19 weeks. Next.js stays on Vercel. Convex stays. pi/codex stay for coding tasks. BEAM handles all backend orchestration.

Strategy C: Incremental strangler — one function at a time (ongoing)

Run Elixir alongside TypeScript. Migrate functions incrementally.

| Step | Scope |
|---|---|
| 1. Elixir app with Phoenix.PubSub subscribing to Redis events | Bridge |
| 2. Migrate simplest functions first (heartbeat, system-logger, daily-digest) | 3-5 functions |
| 3. Gradually move more functions, one at a time | Months |
| 4. When >50% migrated, evaluate killing Inngest | Decision point |

Total: open-ended. Low risk, but carries the cost of running two runtimes indefinitely. Operational complexity increases before it decreases.
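Step 1's bridge is small. A hedged sketch, assuming the Redix library for the Redis subscription (its PubSub API as commonly documented; channel name, message shapes, and dispatch are illustrative and should be verified against the Redix version used):

```elixir
# Strangler bridge: subscribe to the existing Redis event channel and
# hand migrated events to Elixir-side handlers. Unmigrated events are
# ignored here and keep flowing through Inngest unchanged.
defmodule Bridge.RedisSubscriber do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    {:ok, pubsub} = Redix.PubSub.start_link()
    {:ok, _ref} = Redix.PubSub.subscribe(pubsub, "joelclaw:events", self())
    {:ok, %{pubsub: pubsub}}
  end

  @impl true
  def handle_info({:redix_pubsub, _pid, _ref, :message, %{payload: payload}}, state) do
    dispatch(payload)
    {:noreply, state}
  end

  def handle_info({:redix_pubsub, _pid, _ref, :subscribed, _meta}, state), do: {:noreply, state}

  # Route to whichever Elixir-side function owns this event type.
  defp dispatch(_payload), do: :ok
end
```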

What You Gain

  1. Single runtime — one BEAM VM replaces Inngest pod + Bun worker + gateway daemon + 5 launchd services
  2. True fault tolerance — supervisor trees replace tripwire/watchdog/heartbeat recovery machinery
  3. Hot code reload — deploy without restart, no stale function registry, no session kill
  4. Process isolation — each function, each webhook, each agent session — isolated heap, crash one ≠ crash all
  5. Native concurrency — no Redis pub/sub hack, no dual-client, no LPUSH/LRANGE drain
  6. Dramatic infra simplification — kill: Inngest k8s pod, 5+ launchd plists, worker clone sync, gateway session files, Redis pub/sub layer
  7. Jido-native agent patterns — signal routing, directive execution, plugin architecture, FSM strategies all align with joelclaw’s signal→action→effect pattern
  8. LiveBook for exploration — interactive notebooks for system debugging (Elixir’s Jupyter equivalent)

What You Lose

  1. Inngest dashboard — best-in-class run visualization. Must build equivalent or use Grafana/custom telemetry.
  2. TypeScript ecosystem — npm is vast. Hex.pm is smaller. Some integrations (Convex SDK, next-mdx-remote, shiki) have no Elixir equivalent.
  3. Vercel CDN + ISR — Phoenix can be fast but no automatic edge network. Fly.io is the closest equivalent.
  4. pi/codex integration — the entire coding-agent loop is TypeScript-native. No Elixir equivalent at comparable maturity.
  5. Momentum — 74 functions, 40k LOC, 100+ ADRs of decisions encoded in the current stack. Migration cost is real and the current system works.
  6. Convex real-time — LiveView is comparable but losing Convex means losing its conflict resolution, transactional guarantees, and managed hosting.
  7. Effect-TS patterns — structured error handling, dependency injection, schema validation. Elixir has equivalents (with, application config, NimbleOptions) but different idioms.
  8. Team familiarity — Joel and all current agent tooling are TypeScript-native.

Jido Framework Assessment

Strengths for joelclaw

  • Agent-as-process model perfectly maps to gateway, memory pipeline, and function workers
  • Signal routing (CloudEvents-based) directly replaces Inngest event dispatch
  • Directive system (Emit, Spawn, Schedule) maps to current event fan-out patterns
  • Plugin architecture allows incremental capability addition (memory, identity, chat)
  • Built-in telemetry with OTEL tracer behavior — no custom o11y-logging skill needed
  • FSM Strategy maps to gateway state management (idle → processing → responding → idle)
  • Worker pools for high-throughput webhook/event processing
  • 25KB per agent — could run hundreds of specialized agents (one per integration) in one VM

Gaps / Risks

  • Maturity: Jido is ~1 year old, single maintainer (Mike Hostetler). GitHub stars ~250. Production usage unclear beyond demos.
  • jido_ai: LLM integration exists but unclear depth — tool calling, streaming, multi-provider support?
  • jido_memory: Exists but documentation thin. May need to build memory pipeline from scratch on top of Jido primitives.
  • jido_shell: Virtual filesystem shell — interesting for sandboxed code execution but not a pi replacement.
  • No coding agent: Jido has no equivalent of pi/codex for autonomous code generation. This would be a ground-up build.
  • Community: Small. If Mike stops maintaining it, we’re on our own. Compare to Inngest (funded company, active development).

Cost/Benefit Summary

| Factor | Full Rewrite | Hybrid | Strangler |
|---|---|---|---|
| Infra simplification | ★★★★★ | ★★★★ | ★★ |
| Risk | ★★★★★ | ★★ | ★ |
| Calendar time | 12-18 months | 6-8 months | Ongoing |
| Operational complexity during migration | High | Medium | High (two runtimes) |
| Web layer disruption | Total | None | None |
| Agent coding disruption | Total | None | None |
| Inngest dashboard loss | Immediate | Immediate | Gradual |

Open Questions

  1. Is Jido production-ready? Single maintainer, ~1 year old. What happens if it’s abandoned?
  2. Can Jido AI handle the LLM integration depth we need? Tool calling, streaming, multi-provider, structured output?
  3. What’s the coding agent story? Without pi/codex, how do agent loops write code? Jido Shell + LLM Actions?
  4. Is the web layer worth migrating? Next.js + Vercel works. Phoenix LiveView is good but rewriting the site is pure cost.
  5. What about Convex? Keep it for the web frontend? Replace with Ecto + Postgres? Both have trade-offs.
  6. Can we justify 6-8 months of migration for a personal system? The current system works. Is the maintenance burden high enough to warrant this?
  7. Would starting from Elixir primitives (without Jido) be simpler? OTP + Phoenix + Oban (for job processing) is a mature, proven stack. Jido adds agent abstractions but also adds a dependency risk.

Alternatives Considered

Option A: Jido (full agent framework)

  • Agent-as-process, signals, directives, plugins, FSM strategies
  • ~1 year old, single maintainer, ~250 stars
  • Ecosystem: jido_ai, jido_memory, jido_shell, jido_chat
  • Pro: Strongest agent abstractions, signal routing maps to Inngest events
  • Con: Young, small community, unclear production usage. Dependency risk.

Option B: OTP primitives + Oban (no agent framework)

  • Oban — mature, funded, production-proven job processor. Direct Inngest replacement with durable jobs, cron, retries, priorities, workflows (fan-out/fan-in), queue isolation, telemetry. Oban Web provides dashboard (replaces Inngest dashboard).
  • LangChain Elixir — LLM integration (OpenAI, Anthropic, Google, Bumblebee). Maintained by Mark Ericksen (Fly.io). 2+ years, production-used.
  • Raw OTP GenServers for gateway agent, memory pipeline, long-lived processes
  • Phoenix LiveView for web (or keep Next.js, Phoenix serves API)
  • Cachex/ETS replaces Redis cache. Phoenix.PubSub replaces Redis pub/sub.
  • Pro: Battle-tested stack, no dependency on young frameworks. Oban is 6+ years old with thousands of production deployments.
  • Con: No pre-built agent patterns. ~500 lines of custom GenServer code to build signal routing / directive execution.
  • Assessment: AppUnite and George Guimarães both conclude the Elixir community doesn’t need agent frameworks — OTP primitives are the agent framework. “Effective agent-based systems rely on software architecture… teams should retain control over system design.”

Option C: SwarmEx

  • Lightweight agent orchestration, early stage, author acknowledges bugs
  • Pro: Minimal. Con: Not production-ready. Skip.

Option D: Stay TypeScript, reduce infrastructure

  • Replace Inngest with Trigger.dev or self-hosted Temporal
  • Simplify launchd to fewer services
  • Accept the maintenance burden as cost of TypeScript ecosystem benefits
  • Pro: No migration. Incremental improvements.
  • Con: Fundamental process model limitations remain.

Option E: Deno + TypeScript

  • Better process model than Node/Bun
  • Still single-threaded per isolate
  • Doesn’t solve the fundamental “processes as first-class” problem
  • Pro: Stay in TypeScript. Con: Marginal improvement.

Consequences

This ADR is in researching status. No decision has been made.

Next steps if pursuing further:

  1. Build a proof-of-concept: single Jido Agent running heartbeat, connected to existing Redis
  2. Evaluate Jido AI for LLM tool calling depth
  3. Benchmark: one Inngest function vs equivalent Jido Agent (latency, memory, observability)
  4. Talk to Mike Hostetler about Jido roadmap and production usage
  5. Prototype Telegram adapter as Jido Signal plugin