Elixir/BEAM/Jido Migration — Full Architecture Evaluation
Context
joelclaw is a personal AI operating system built in TypeScript. After 100+ ADRs and ~40k LOC of system-bus functions, a recurring theme has emerged: a significant fraction of the infrastructure exists to compensate for problems the BEAM VM solves natively.
Specific pain points that BEAM eliminates by design:
| Current Pain | Root Cause | BEAM Native Solution |
|---|---|---|
| launchd daemons (5+) for process management | Node.js is one-process-per-service | OTP supervisors — unlimited lightweight processes in one VM |
| Worker clone sync workflow (push → fetch → reset → kickstart → PUT sync) | Can’t hot-reload a running Bun process | Hot code reload — deploy without restart |
| Redis pub/sub dual-client hack | ioredis subscription blocks the client | Process mailboxes — every process has built-in messaging |
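| Gateway session file persistence (~/.joelclaw/gateway.session) | pi sessions are stateful and fragile | GenServer state: held in process memory, with the process restarted by its supervisor after a crash; state that must outlive a crash still needs a checkpoint (ETS, disk) |
| Inngest as external orchestrator (k8s pod, function registry sync, stale registration) | No built-in durable execution in Node.js | OTP: Task, GenServer, Supervisor give supervised execution and retries natively; durability across VM restarts still needs a persistent job store such as Oban |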
| Priority queue in Redis sorted sets (ADR-0104) | No native priority mailbox | receive with pattern matching on priority tuples |
| Concurrency guards via Redis keys + TTLs | No process-level isolation | Process-per-resource — each has isolated state, no shared memory |
| Notification dedup via Redis cooldowns | Shared-nothing requires external coordination | Process state — cooldown timers live in the process, no Redis round-trip |
| Tripwire/watchdog for gateway health | Process crashes are silent and unrecoverable | Supervisors with restart strategies — crash recovery is the default |
| Extension reload requires session kill + daemon restart | pi loads extensions once at session start | Hot code reload — update modules in the running VM |
The Jido framework provides an Elixir-native agent architecture with:
- Agents as 25KB processes (GenServer-backed, supervisor-managed)
- Signal-based communication (CloudEvents-compliant, type-routed)
- Directive system (Emit, Spawn, Schedule — pure functional core, effectful runtime)
- Plugin architecture (composable capability bundles with state, actions, routes)
- Built-in observability (telemetry events, OTEL tracer behavior, debug mode)
- FSM strategies (state machine workflows — maps to gateway state management)
- Worker pools (pre-warmed agent pools — maps to webhook/event processing)
- Memory and identity plugins (built-in, maps to current memory system)
Current System Inventory
What exists today
| Component | Tech | LOC | Files | Complexity |
|---|---|---|---|---|
| Event bus functions | Inngest + TypeScript | ~34k | 63 | High — durable steps, retries, cron, fan-out |
| Gateway daemon | pi session + launchd | ~3k | 8 | High — Telegram, Redis bridge, session mgmt |
| CLI | Effect-TS + Bun | ~8k | 25 | Medium — 30+ commands, HATEOAS output |
| Web app | Next.js + Convex | ~5k | 40 | Medium — MDX, auth, real-time dashboard |
| Memory system | Typesense + Redis | ~2k | 6 | Medium — observe, reflect, promote, triage |
| Webhook server | Express + providers | ~1.5k | 10 | Low — signature verify, dispatch |
| Cache layer (ADR-0112) | Redis + file | ~350 | 1 | Low — just shipped |
| Infrastructure glue | launchd, k8s, bash | ~500 | 15 | High — fragile, stateful, manual |
External dependencies that complicate migration
| Dependency | Current Role | Migration Difficulty |
|---|---|---|
| Inngest | Event orchestration, durable execution, cron, retry | Hard — 74 functions, deeply integrated step patterns |
| Convex | Real-time DB for web app (schema, queries, auth) | Hard — tight coupling with Next.js frontend |
| pi | Agent coding harness (tool use, sessions, extensions) | Very Hard — no Elixir equivalent exists |
| Vercel | CDN, ISR, edge functions for Next.js | Medium — Phoenix deployment is different |
| Telegram Bot API | Primary human interface | Easy — HTTP API, language-agnostic |
| Front API | Email integration | Easy — HTTP API |
| Todoist API | Task management | Easy — HTTP API |
| Typesense | Search + vector store | Easy — HTTP API |
| Redis | Cache, pub/sub, state | Partially eliminated — keep for cache, drop pub/sub |
What BEAM/Jido Replaces Naturally
Tier 1: Direct replacements (high confidence, clear win)
Inngest → OTP Supervision Tree
- Each of 74 functions becomes a GenServer or Task under a supervisor
- Cron schedules via Quantum (mature, production-proven)
- Retries via supervisor restart strategies + exponential backoff in process
- Step functions → sequential `with` chains or Jido Strategy FSM states
- Fan-out → `Task.Supervisor.async_stream_nolink/3`
- Concurrency limits → process-per-resource with GenServer call serialization
- Eliminates: Inngest k8s pod, function registry sync, PUT sync, stale registration bugs
- Risk: Lose Inngest dashboard (run traces, step visualization). Must build equivalent.
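The retry and fan-out mapping above can be sketched with the standard library alone. This is a minimal illustration, not a port of any existing function: `FanOutSketch`, `with_retry/3`, `fan_out/3`, the concurrency limit, and the timeout are all hypothetical names and values chosen for the example.

```elixir
defmodule FanOutSketch do
  # Hypothetical helper: call `fun` up to `attempts` times, sleeping
  # base_ms * 2^(n - 1) between attempt n and attempt n + 1.
  def with_retry(fun, attempts \\ 3, base_ms \\ 10) do
    do_retry(fun, attempts, base_ms, 1)
  end

  defp do_retry(fun, attempts, base_ms, n) do
    case fun.() do
      {:ok, value} ->
        {:ok, value}

      {:error, _reason} when n < attempts ->
        Process.sleep(base_ms * Integer.pow(2, n - 1))
        do_retry(fun, attempts, base_ms, n + 1)

      {:error, reason} ->
        {:error, reason}
    end
  end

  # Run `fun` over `items` in supervised tasks that are not linked to the
  # caller, so a crashing task shows up as an error entry instead of
  # taking the caller down with it.
  def fan_out(supervisor, items, fun) do
    supervisor
    |> Task.Supervisor.async_stream_nolink(items, fun, max_concurrency: 4, timeout: 5_000)
    |> Enum.map(fn
      {:ok, result} -> result
      {:exit, reason} -> {:error, reason}
    end)
  end
end

{:ok, sup} = Task.Supervisor.start_link()

# The task for item 2 raises; the other two results still come back.
results =
  FanOutSketch.fan_out(sup, [1, 2, 3], fn n ->
    if n == 2, do: raise("boom"), else: {:ok, n * 10}
  end)

IO.inspect(results, label: "fan-out results")
```

Note that nothing here is persisted: if the VM stops mid fan-out, the remaining work is gone, which is the gap a durable job store would have to fill.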
Redis pub/sub → Phoenix.PubSub / process messaging
- Gateway event bridge becomes PubSub topic subscription
- No more dual-client ioredis hack
- No more LPUSH/LRANGE drain pattern
- Eliminates: Redis subscription client, gateway drain logic
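To make the "topic subscription instead of a second Redis client" idea concrete, here is a small single-node sketch. It uses the standard library `Registry` with duplicate keys rather than Phoenix.PubSub itself, so it runs without dependencies; `BusSketch`, the topic name, and the message shape are assumptions made for illustration.

```elixir
defmodule BusSketch do
  # Hypothetical in-VM event bus: a Registry with duplicate keys lets many
  # processes register under the same topic.
  @registry BusSketch.Registry

  def start_link, do: Registry.start_link(keys: :duplicate, name: @registry)

  # Subscribe the calling process to `topic`.
  def subscribe(topic) do
    {:ok, _owner} = Registry.register(@registry, topic, [])
    :ok
  end

  # Send `event` to the mailbox of every process subscribed to `topic`.
  def publish(topic, event) do
    Registry.dispatch(@registry, topic, fn entries ->
      for {pid, _value} <- entries, do: send(pid, {:event, topic, event})
    end)
  end
end

{:ok, _registry} = BusSketch.start_link()
:ok = BusSketch.subscribe("gateway:inbound")
:ok = BusSketch.publish("gateway:inbound", %{text: "hello"})

receive do
  {:event, topic, event} -> IO.inspect({topic, event}, label: "received")
after
  100 -> IO.puts("no event received")
end
```

The subscriber is an ordinary process reading its own mailbox, so there is no separate subscription connection to manage and no drain loop.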
launchd daemons → OTP Application
- system-bus-worker, gateway, gateway-tripwire, content-sync-watcher, vault-log-sync, typesense-portforward — all become child processes in one supervision tree
- Single `mix release` produces one deployable artifact
- Eliminates: 5+ launchd plists, PID management, session files, kickstart commands
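A rough sketch of what "daemons as children of one supervision tree" looks like in plain OTP follows. `ServiceSketch` is a hypothetical stand-in for a real service, and the child names are borrowed from the daemon list above purely as labels. The example also shows that a restarted child comes back with fresh state, so anything worth keeping across crashes has to be checkpointed elsewhere.

```elixir
defmodule ServiceSketch do
  use GenServer

  # Hypothetical service: counts how many times it has been ticked.
  def start_link(name), do: GenServer.start_link(__MODULE__, 0, name: name)
  def tick(name), do: GenServer.call(name, :tick)

  # Poll until the supervisor has registered a replacement process.
  def await_restart(name, old_pid) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(5)
        await_restart(name, old_pid)
    end
  end

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_call(:tick, _from, count), do: {:reply, count + 1, count + 1}
end

# One child spec per former daemon, all under a single supervisor.
children =
  for name <- [:gateway, :content_sync_watcher, :vault_log_sync] do
    Supervisor.child_spec({ServiceSketch, name}, id: name)
  end

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

ServiceSketch.tick(:gateway)
ServiceSketch.tick(:gateway)

# Simulate a crash: only the killed child is restarted (:one_for_one).
old_pid = Process.whereis(:gateway)
Process.exit(old_pid, :kill)
ServiceSketch.await_restart(:gateway, old_pid)

# The counter starts again from zero after the restart.
IO.inspect(ServiceSketch.tick(:gateway), label: "tick after restart")
IO.inspect(Supervisor.count_children(sup).active, label: "active children")
```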
Priority queue (ADR-0104) → process mailbox
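```elixir
# Sketch: PriorityQueue and drain/1 are assumed helpers, not shown here.
def handle_info({:message, priority, payload}, state) do
  # Enqueue by priority, then drain whatever is ready in priority order.
  state = %{state | queue: PriorityQueue.push(state.queue, priority, payload)}
  {:noreply, drain(state)}
end
```
- Starvation prevention, dedup, coalescing: all in-process state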
- Eliminates: Redis sorted set, SHA-256 dedup window, aging promotion logic
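For a version of the in-process priority queue that runs on its own, the sketch below keeps the queue in an Erlang `:gb_sets` ordered set. It is one possible shape under stated assumptions: `PriorityInbox` is a hypothetical module, lower numbers mean higher priority, and items are popped on demand instead of being drained on arrival as in the handler above. Dedup, coalescing, and aging are left out.

```elixir
defmodule PriorityInbox do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Producers just send a message; ordering happens inside the process.
  def push(pid, priority, payload), do: send(pid, {:message, priority, payload})
  def pop(pid), do: GenServer.call(pid, :pop)

  @impl true
  def init(:ok), do: {:ok, %{queue: :gb_sets.empty(), seq: 0}}

  @impl true
  def handle_info({:message, priority, payload}, state) do
    # Entries sort by {priority, seq}; seq preserves arrival order
    # among entries that share a priority.
    queue = :gb_sets.add({priority, state.seq, payload}, state.queue)
    {:noreply, %{state | queue: queue, seq: state.seq + 1}}
  end

  @impl true
  def handle_call(:pop, _from, state) do
    if :gb_sets.is_empty(state.queue) do
      {:reply, :empty, state}
    else
      {{_priority, _seq, payload}, rest} = :gb_sets.take_smallest(state.queue)
      {:reply, {:ok, payload}, %{state | queue: rest}}
    end
  end
end

{:ok, inbox} = PriorityInbox.start_link()
PriorityInbox.push(inbox, 2, "daily digest")
PriorityInbox.push(inbox, 0, "human message")
PriorityInbox.push(inbox, 1, "webhook")

IO.inspect(PriorityInbox.pop(inbox), label: "first pop")
```

Here the first pop returns `{:ok, "human message"}`, because priority 0 sorts ahead of the two entries that arrived around it.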
Webhook server → Phoenix/Plug
- Plug pipeline for signature verification
- Pattern-match dispatch to handlers
- Each provider becomes a Plug module
- Eliminates: Express server, provider registry
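The verify-then-dispatch flow can be illustrated without Plug, using only `:crypto` and function-clause pattern matching. Everything specific here is an assumption for the example: the `WebhookSketch` module, the hex-encoded HMAC-SHA256 scheme, and the provider atoms. Real providers each define their own signature format, and in a Phoenix app this logic would live in a Plug.

```elixir
defmodule WebhookSketch do
  # Assumed scheme: signature = lowercase hex HMAC-SHA256 of the raw body.
  def sign(secret, body) do
    :crypto.mac(:hmac, :sha256, secret, body) |> Base.encode16(case: :lower)
  end

  # Verify first, then route on the provider with pattern matching.
  def handle(provider, secret, body, signature) do
    if secure_compare(sign(secret, body), signature) do
      dispatch(provider, body)
    else
      {:error, :invalid_signature}
    end
  end

  # One clause per provider takes the place of a provider registry.
  defp dispatch(:todoist, body), do: {:ok, {:todoist_event, body}}
  defp dispatch(:front, body), do: {:ok, {:front_event, body}}
  defp dispatch(other, _body), do: {:error, {:unknown_provider, other}}

  # Compare every byte instead of stopping at the first mismatch.
  defp secure_compare(a, b) when byte_size(a) == byte_size(b) do
    a
    |> :binary.bin_to_list()
    |> Enum.zip(:binary.bin_to_list(b))
    |> Enum.reduce(0, fn {x, y}, acc -> Bitwise.bor(acc, Bitwise.bxor(x, y)) end)
    |> Kernel.==(0)
  end

  defp secure_compare(_a, _b), do: false
end

body = ~s({"event":"item:added"})
signature = WebhookSketch.sign("s3cret", body)

IO.inspect(WebhookSketch.handle(:todoist, "s3cret", body, signature), label: "valid")
IO.inspect(WebhookSketch.handle(:todoist, "s3cret", body, "tampered"), label: "invalid")
```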
Tier 2: Good fit but requires design work
Gateway agent → Jido Agent with Telegram plugin
- Gateway becomes a long-lived Jido Agent process
- Telegram channel as a Jido Signal adapter (inbound) + Action (outbound)
- MCQ flows as FSM Strategy states
- Session state in GenServer, not Redis + files
- Risk: No pi integration. How does the agent write code?
Memory system → Jido Memory plugin + Typesense
- Observe/reflect/promote pipeline maps to Jido Agent with memory plugin
- Typesense stays as search backend (HTTP API, language-agnostic)
- Proposal triage becomes a Jido Strategy FSM
- Risk: Current memory pipeline is deeply intertwined with Inngest step patterns
Cache layer (ADR-0112) → ETS / Cachex
- Cachex provides TTL, warm/cold tiers, stats
- ETS for hot cache (faster than Redis for local reads)
- File cache stays for warm tier
- Eliminates: Redis cache keys, cache.ts module
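As a sense check on the hot tier, a TTL cache over a raw ETS table fits in a few lines. This is a sketch under assumptions, not the Cachex API: `EtsCacheSketch`, the table name, and lazy expiry on read are choices made for the example, and there is no background sweeper or file-backed warm tier.

```elixir
defmodule EtsCacheSketch do
  @table :ets_cache_sketch

  # Create the table; it is owned by the calling process.
  def init do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
    :ok
  end

  # Store `value` with an absolute expiry on the monotonic clock.
  def put(key, value, ttl_ms) do
    expires_at = System.monotonic_time(:millisecond) + ttl_ms
    true = :ets.insert(@table, {key, value, expires_at})
    :ok
  end

  # Return {:ok, value} while fresh; drop the entry lazily once expired.
  def get(key) do
    now = System.monotonic_time(:millisecond)

    case :ets.lookup(@table, key) do
      [{^key, value, expires_at}] when expires_at > now ->
        {:ok, value}

      [{^key, _value, _expires_at}] ->
        :ets.delete(@table, key)
        :miss

      [] ->
        :miss
    end
  end
end

:ok = EtsCacheSketch.init()
:ok = EtsCacheSketch.put("todoist:projects", ["inbox", "joelclaw"], 60_000)

IO.inspect(EtsCacheSketch.get("todoist:projects"), label: "fresh")
IO.inspect(EtsCacheSketch.get("unknown"), label: "absent")
```

Reads go straight to the table from any process in the VM, which is where the latency advantage over a Redis round-trip comes from.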
Tier 3: Difficult / unclear benefit
Web app (Next.js → Phoenix LiveView)
- Full rewrite of apps/web (~5k LOC, 40 files)
- MDX pipeline → Earmark + custom transformers (thinner plugin ecosystem)
- Convex real-time → LiveView sockets (comparable but different DX)
- Vercel CDN/ISR → self-hosted Phoenix or Fly.io
- Auth (Better Auth) → Phoenix auth generators or custom
- Risk: High effort, uncertain payoff. The web layer works fine.
Agent coding interface (pi/codex → ???)
- pi is the coding agent harness — tool use, file I/O, sessions, extensions
- Jido Shell exists (virtual workspace, command execution) but is early
- No equivalent of codex exec in Elixir ecosystem
- Could use Jido AI + Jido Shell + LLM Actions for a custom agent loop
- Risk: Very high. This is the creative heart of the system. Building a coding agent from scratch is a multi-month project.
CLI (Effect-TS → Mix tasks or Burrito)
- Burrito for standalone CLI binary
- Mix tasks for development
- Effect’s structured error handling → `with` chains + tagged tuples
- HATEOAS output → Jason encoding of response structs
- Risk: Medium. CLI is well-tested and stable. Rewriting is friction without clear gain.
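To show what the `with` plus tagged-tuple idiom looks like for a CLI command, here is a minimal sketch. `CliSketch`, the stubbed `fetch/1`, and the `next_actions` field are hypothetical; the response is left as a plain map, which a real CLI would then hand to a JSON encoder such as Jason.

```elixir
defmodule CliSketch do
  # Each step returns {:ok, value} or {:error, reason}; `with` stops at
  # the first error and hands it to the `else` block.
  def run(args) do
    with {:ok, id} <- parse_id(args),
         {:ok, record} <- fetch(id) do
      {:ok, %{ok: true, result: record, next_actions: ["show #{id}"]}}
    else
      {:error, reason} -> {:error, %{ok: false, error: reason}}
    end
  end

  defp parse_id([raw]) do
    case Integer.parse(raw) do
      {id, ""} -> {:ok, id}
      _ -> {:error, :invalid_id}
    end
  end

  defp parse_id(_args), do: {:error, :usage}

  # Stub lookup standing in for a real data source.
  defp fetch(1), do: {:ok, %{id: 1, name: "heartbeat"}}
  defp fetch(_id), do: {:error, :not_found}
end

IO.inspect(CliSketch.run(["1"]), label: "found")
IO.inspect(CliSketch.run(["abc"]), label: "bad input")
```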
Migration Strategies
Strategy A: Full rewrite (12-18 months)
Replace everything. Single Elixir umbrella app.
| Phase | Duration | Scope |
|---|---|---|
| 0. Proof of concept | 2 weeks | One Jido Agent running heartbeat, connected to Redis |
| 1. Event bus | 6 weeks | All 74 functions as OTP processes, kill Inngest |
| 2. Gateway | 4 weeks | Jido Agent with Telegram, kill launchd daemons |
| 3. Memory | 3 weeks | Observe/reflect/promote as Jido pipeline |
| 4. Web | 8 weeks | Phoenix LiveView, kill Next.js/Vercel/Convex |
| 5. Agent coding | 8+ weeks | Custom coding agent on Jido, kill pi dependency |
| 6. CLI | 3 weeks | Mix tasks + Burrito binary |
Total: ~34 weeks. Highly ambitious. No part of the system can be taken offline during the migration.
Strategy B: Hybrid — BEAM backend, keep JS frontend (6-8 months)
Replace the infrastructure layer where BEAM wins. Keep the web frontend.
| Phase | Duration | Scope |
|---|---|---|
| 0. Proof of concept | 2 weeks | Jido Agent running heartbeat |
| 1. Event bus | 6 weeks | 74 functions → OTP, kill Inngest |
| 2. Gateway | 4 weeks | Jido Agent with Telegram adapter |
| 3. Memory | 3 weeks | Pipeline as Jido Agents |
| 4. API layer | 2 weeks | Phoenix API serving Next.js frontend |
| 5. CLI bridge | 2 weeks | Elixir CLI or keep TS CLI calling Phoenix API |
Total: ~19 weeks. Next.js stays on Vercel. Convex stays. pi/codex stay for coding tasks. BEAM handles all backend orchestration.
Strategy C: Incremental strangler — one function at a time (ongoing)
Run Elixir alongside TypeScript. Migrate functions incrementally.
| Step | Scope |
|---|---|
| 1. Elixir app with Phoenix.PubSub subscribing to Redis events | Bridge |
| 2. Migrate simplest functions first (heartbeat, system-logger, daily-digest) | 3-5 functions |
| 3. Gradually move more functions, one at a time | Months |
| 4. When >50% migrated, evaluate killing Inngest | Decision point |
Total: open-ended. Low risk, but carries the cost of running two runtimes indefinitely. Operational complexity increases before it decreases.
What You Gain
- Single runtime — one BEAM VM replaces Inngest pod + Bun worker + gateway daemon + 5 launchd services
- True fault tolerance — supervisor trees replace tripwire/watchdog/heartbeat recovery machinery
- Hot code reload — deploy without restart, no stale function registry, no session kill
- Process isolation — each function, each webhook, each agent session — isolated heap, crash one ≠ crash all
- Native concurrency — no Redis pub/sub hack, no dual-client, no LPUSH/LRANGE drain
- Dramatic infra simplification — kill: Inngest k8s pod, 5+ launchd plists, worker clone sync, gateway session files, Redis pub/sub layer
- Jido-native agent patterns — signal routing, directive execution, plugin architecture, FSM strategies all align with joelclaw’s signal→action→effect pattern
- Livebook for exploration: interactive notebooks for system debugging (Elixir’s Jupyter equivalent)
What You Lose
- Inngest dashboard — best-in-class run visualization. Must build equivalent or use Grafana/custom telemetry.
- TypeScript ecosystem — npm is vast. Hex.pm is smaller. Some integrations (Convex SDK, next-mdx-remote, shiki) have no Elixir equivalent.
- Vercel CDN + ISR — Phoenix can be fast but no automatic edge network. Fly.io is the closest equivalent.
- pi/codex integration — the entire coding-agent loop is TypeScript-native. No Elixir equivalent at comparable maturity.
- Momentum — 74 functions, 40k LOC, 100+ ADRs of decisions encoded in the current stack. Migration cost is real and the current system works.
- Convex real-time — LiveView is comparable but losing Convex means losing its conflict resolution, transactional guarantees, and managed hosting.
- Effect-TS patterns: structured error handling, dependency injection, schema validation. Elixir has equivalents (`with`, application config, NimbleOptions) but different idioms.
- Team familiarity: Joel and all current agent tooling are TypeScript-native.
Jido Framework Assessment
Strengths for joelclaw
- Agent-as-process model perfectly maps to gateway, memory pipeline, and function workers
- Signal routing (CloudEvents-based) directly replaces Inngest event dispatch
- Directive system (Emit, Spawn, Schedule) maps to current event fan-out patterns
- Plugin architecture allows incremental capability addition (memory, identity, chat)
- Built-in telemetry with OTEL tracer behavior — no custom o11y-logging skill needed
- FSM Strategy maps to gateway state management (idle → processing → responding → idle)
- Worker pools for high-throughput webhook/event processing
- 25KB per agent — could run hundreds of specialized agents (one per integration) in one VM
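The gateway cycle named above (idle → processing → responding → idle) is small enough to sketch as a pure transition table. This is plain Elixir, not the Jido Strategy API, whose exact shape this document does not confirm; `GatewayFsmSketch` and the event names are assumptions made for the example.

```elixir
defmodule GatewayFsmSketch do
  # Allowed transitions, keyed by {current_state, event}.
  @transitions %{
    {:idle, :message_received} => :processing,
    {:processing, :reply_ready} => :responding,
    {:responding, :reply_sent} => :idle
  }

  # Pure function: no process, no side effects, easy to test.
  def transition(state, event) do
    case Map.fetch(@transitions, {state, event}) do
      {:ok, next} -> {:ok, next}
      :error -> {:error, {:invalid_transition, state, event}}
    end
  end

  # Fold a list of events over a starting state, stopping at the first
  # transition that is not allowed.
  def run(state, events) do
    Enum.reduce_while(events, {:ok, state}, fn event, {:ok, current} ->
      case transition(current, event) do
        {:ok, next} -> {:cont, {:ok, next}}
        error -> {:halt, error}
      end
    end)
  end
end

cycle = [:message_received, :reply_ready, :reply_sent]
IO.inspect(GatewayFsmSketch.run(:idle, cycle), label: "full cycle")
IO.inspect(GatewayFsmSketch.transition(:idle, :reply_sent), label: "out of order")
```

Keeping the table pure and wrapping it in a process later mirrors the "pure functional core, effectful runtime" split the directive system describes.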
Gaps / Risks
- Maturity: Jido is ~1 year old, single maintainer (Mike Hostetler). GitHub stars ~250. Production usage unclear beyond demos.
- jido_ai: LLM integration exists but unclear depth — tool calling, streaming, multi-provider support?
- jido_memory: Exists but documentation thin. May need to build memory pipeline from scratch on top of Jido primitives.
- jido_shell: Virtual filesystem shell — interesting for sandboxed code execution but not a pi replacement.
- No coding agent: Jido has no equivalent of pi/codex for autonomous code generation. This would be a ground-up build.
- Community: Small. If Mike stops maintaining it, we’re on our own. Compare to Inngest (funded company, active development).
Cost/Benefit Summary
| Factor | Full Rewrite | Hybrid | Strangler |
|---|---|---|---|
| Infra simplification | ★★★★★ | ★★★★ | ★★ |
| Risk (more ★ = higher risk) | ★★★★★ | ★★★ | ★ |
| Calendar time | 12-18 months | 6-8 months | Ongoing |
| Operational complexity during migration | High | Medium | High (two runtimes) |
| Web layer disruption | Total | None | None |
| Agent coding disruption | Total | None | None |
| Inngest dashboard loss | Immediate | Immediate | Gradual |
Open Questions
- Is Jido production-ready? Single maintainer, ~1 year old. What happens if it’s abandoned?
- Can Jido AI handle the LLM integration depth we need? Tool calling, streaming, multi-provider, structured output?
- What’s the coding agent story? Without pi/codex, how do agent loops write code? Jido Shell + LLM Actions?
- Is the web layer worth migrating? Next.js + Vercel works. Phoenix LiveView is good but rewriting the site is pure cost.
- What about Convex? Keep it for the web frontend? Replace with Ecto + Postgres? Both have trade-offs.
- Can we justify 6-8 months of migration for a personal system? The current system works. Is the maintenance burden high enough to warrant this?
- Would starting from Elixir primitives (without Jido) be simpler? OTP + Phoenix + Oban (for job processing) is a mature, proven stack. Jido adds agent abstractions but also adds a dependency risk.
Alternatives Considered
Option A: Jido (full agent framework)
- Agent-as-process, signals, directives, plugins, FSM strategies
- ~1 year old, single maintainer, ~250 stars
- Ecosystem: jido_ai, jido_memory, jido_shell, jido_chat
- Pro: Strongest agent abstractions, signal routing maps to Inngest events
- Con: Young, small community, unclear production usage. Dependency risk.
Option B: OTP + Oban + Phoenix (recommended if migrating)
- Oban — mature, funded, production-proven job processor. Direct Inngest replacement with durable jobs, cron, retries, priorities, workflows (fan-out/fan-in), queue isolation, telemetry. Oban Web provides dashboard (replaces Inngest dashboard).
- LangChain Elixir — LLM integration (OpenAI, Anthropic, Google, Bumblebee). Maintained by Mark Ericksen (Fly.io). 2+ years, production-used.
- Raw OTP GenServers for gateway agent, memory pipeline, long-lived processes
- Phoenix LiveView for web (or keep Next.js, Phoenix serves API)
- Cachex/ETS replaces Redis cache. Phoenix.PubSub replaces Redis pub/sub.
- Pro: Battle-tested stack, no dependency on young frameworks. Oban is 6+ years old with thousands of production deployments.
- Con: No pre-built agent patterns. ~500 lines of custom GenServer code to build signal routing / directive execution.
- Assessment: AppUnite and George Guimarães both conclude the Elixir community doesn’t need agent frameworks — OTP primitives are the agent framework. “Effective agent-based systems rely on software architecture… teams should retain control over system design.”
Option C: SwarmEx
- Lightweight agent orchestration, early stage, author acknowledges bugs
- Pro: Minimal. Con: Not production-ready. Skip.
Option D: Stay TypeScript, reduce infrastructure
- Replace Inngest with Trigger.dev or self-hosted Temporal
- Simplify launchd to fewer services
- Accept the maintenance burden as cost of TypeScript ecosystem benefits
- Pro: No migration. Incremental improvements.
- Con: Fundamental process model limitations remain.
Option E: Deno + TypeScript
- Better process model than Node/Bun
- Still single-threaded per isolate
- Doesn’t solve the fundamental “processes as first-class” problem
- Pro: Stay in TypeScript. Con: Marginal improvement.
Consequences
This ADR is in researching status. No decision has been made.
Next steps if pursuing further:
- Build a proof-of-concept: single Jido Agent running heartbeat, connected to existing Redis
- Evaluate Jido AI for LLM tool calling depth
- Benchmark: one Inngest function vs equivalent Jido Agent (latency, memory, observability)
- Talk to Mike Hostetler about Jido roadmap and production usage
- Prototype Telegram adapter as Jido Signal plugin