Utah and joelclaw: Convergent Architecture
Dan Farrelly open-sourced Utah (Universally Triggered Agent Harness), and it’s basically a clean-room implementation of the same architecture joelclaw landed on independently. I tore it apart to understand the design decisions and found the convergence is almost eerie.
What Utah Is
Utah is a personal AI agent that connects to Telegram and Slack, runs an LLM think/act/observe loop, and uses Inngest for every durable operation. It builds on pi-ai (Mario Zechner’s unified LLM interface) and pi-coding-agent’s battle-tested tools (read, edit, write, bash, grep, find, ls).
The entire codebase is 51 files. No monorepo. No framework layer. Just TypeScript, Inngest, and pi.
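To make the loop shape concrete, here is a minimal sketch of a think/act/observe loop where each iteration runs inside a durable step. This is not Utah's code. The `Step` interface is a stand-in for Inngest's `step.run`, and `Decision`, `think`, `act`, and `maxIterations` are hypothetical names I made up for illustration.

```typescript
// Stand-in for a durable step runner (assumption: the real step.run also
// persists each result so a retried function does not redo finished steps).
interface Step {
  run<T>(id: string, fn: () => Promise<T>): Promise<T>;
}

// Hypothetical model output: either a final reply or a tool call.
type Decision =
  | { kind: "reply"; text: string }
  | { kind: "tool"; name: string; input: string };

type Turn = { done: true; reply: string } | { done: false; observation: string };

async function agentLoop(
  step: Step,
  think: (history: string[]) => Promise<Decision>,
  act: (name: string, input: string) => Promise<string>,
  userMessage: string,
  maxIterations = 10,
): Promise<string> {
  const history = [`user: ${userMessage}`];
  let i = 0;
  while (i < maxIterations) {
    // One durable step per iteration: think, then act if a tool was requested.
    const turn = await step.run(`iteration-${i}`, async (): Promise<Turn> => {
      const decision = await think(history);
      if (decision.kind === "reply") return { done: true, reply: decision.text };
      const observation = await act(decision.name, decision.input);
      return { done: false, observation: `${decision.name}: ${observation}` };
    });
    if (turn.done) return turn.reply;
    // Observe: feed the tool result back into the next iteration.
    history.push(turn.observation);
    i++;
  }
  return "Stopped: iteration limit reached.";
}
```

The iteration cap is my own addition. Without some bound, a model that keeps requesting tools would loop forever.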
The Architecture Map
Here’s where it gets interesting. Almost every Utah component maps 1:1 to a joelclaw subsystem:
| Concept | Utah | joelclaw |
|---|---|---|
| Agent loop | agent-loop.ts, while loop where each iteration is step.run | Gateway daemon pi session + system-bus Inngest functions |
| Channels | channels/, Telegram and Slack with ChannelHandler interface | packages/gateway/src/channels/types.ts, same interface pattern |
| Message handling | agent.message.received → singleton per chat | gateway/message.received → priority queue per channel |
| Reply dispatch | agent.reply.ready → channel-agnostic send | gateway/reply.ready → channel formatter → delivery |
| Acknowledgment | Separate function, typing indicator, best effort | Separate step, same pattern with typing + reactions |
| Memory | MEMORY.md + daily logs + 30-min heartbeat distillation | Observation pipeline + vector store + extension enforcement |
| Session persistence | JSONL files in workspace/sessions/ | pi’s append-only JSONL with tree structure |
| Compaction | LLM summarization at 80% of 150K token budget | pi’s built-in compaction + branch summaries |
| Context pruning | Two-tier: soft trim (head+tail) + hard clear at 50K | Similar, pi handles this in prompt composition |
| System prompt | SOUL.md + USER.md + memory injection | SOUL.md + IDENTITY.md + ROLE.md + USER.md + skill injection |
| Tools | pi-coding-agent tools + remember + web_fetch + delegate_task | 52 skills + pi tools + MCP + custom tools via extension API |
| Sub-agents | delegate_task → step.invoke() → isolated loop, no recursion | Codex delegation → separate process, sandboxed |
| Failure handling | Global handler on inngest/function.failed → user notification | Same pattern, function.failed → gateway notification |
| Heartbeat | Cron every 30min, check if distillation needed and skip if not | Cron every 15min, health checks that only act if needed |
| Worker | connect(), WebSocket to Inngest Cloud with no server needed | k8s worker serves HTTP to self-hosted Inngest |
Where They Diverge
The convergence tells you what’s necessary for a personal agent. The divergence tells you what’s optional, or at least what reflects different constraints.
Scale of ambition. Utah is 51 files; joelclaw is 110+ Inngest functions across a monorepo. Utah is a reference implementation showing the pattern. joelclaw is a lived-in system that’s been accumulating capability fast over a couple of weeks: CLI, ADR tracking, video pipeline, content publishing, email processing, contact enrichment, and calendar integration.
Memory architecture. Utah’s memory is file-based: MEMORY.md (curated by LLM distillation) + daily logs (append-only). Simple, effective, and great for a reference implementation. joelclaw’s is multi-layered: observation pipeline with write gates, vector embeddings in PGlite, semantic recall, and a mandatory participation contract enforced by a pi extension. The complexity is earned because the system already needs structured recall.
Identity model. Utah has SOUL.md (4 lines) and USER.md. joelclaw splits identity across four files: SOUL, IDENTITY, ROLE, and USER. Different agent surfaces (gateway daemon, codex workers, interactive sessions) need different role definitions with the same core identity. The organism metaphor matters when you have multiple appendages.
Hosting. Utah uses connect(), a WebSocket connection to Inngest Cloud. No server, no public endpoint, no ngrok. Brilliant for getting started. joelclaw self-hosts Inngest on a k8s cluster because I’m a control freak who wants to own every byte. Both are valid. Utah’s approach is objectively easier.
Framework relationship. Utah uses pi-ai (the LLM library) and pi-coding-agent (the tool implementations), but not pi-the-framework. joelclaw uses pi as a full framework: sessions, extensions, skills, themes. This is exactly the distinction drawn in The Harness Is a Framework: Utah is genuinely harness-shaped, while joelclaw embraced the framework and built on top of it.
The Patterns That Converged
When two systems built independently arrive at the same design, pay attention. These patterns are likely necessary, not incidental:
Event-driven message flow. Both normalize inbound messages into typed events and dispatch via Inngest. Neither does synchronous request/response. The agent is async by nature, and fighting that creates fragile systems.
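A rough sketch of what that normalization step can look like. The event name comes from Utah's side of the table above. Everything else is an assumption for illustration: the field names in `data`, the `fromTelegram` and `fromSlack` helpers, and the trimmed-down raw payload types.

```typescript
type Channel = "telegram" | "slack";

// Hypothetical normalized event; field names are illustrative.
interface MessageReceived {
  name: "agent.message.received";
  data: { channel: Channel; chatId: string; text: string; receivedAt: number };
}

// Assumed minimal subsets of what each platform delivers.
interface TelegramUpdate {
  message: { chat: { id: number }; text: string; date: number }; // date: Unix seconds
}
interface SlackEvent {
  channel: string;
  text: string;
  ts: string; // e.g. "1700000000.500000"
}

function fromTelegram(update: TelegramUpdate): MessageReceived {
  return {
    name: "agent.message.received",
    data: {
      channel: "telegram",
      chatId: String(update.message.chat.id),
      text: update.message.text,
      receivedAt: update.message.date * 1000,
    },
  };
}

function fromSlack(event: SlackEvent): MessageReceived {
  return {
    name: "agent.message.received",
    data: {
      channel: "slack",
      chatId: event.channel,
      text: event.text,
      receivedAt: Math.round(parseFloat(event.ts) * 1000),
    },
  };
}
```

Once both adapters emit the same shape, everything downstream can ignore where the message came from.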
Channel abstraction. Both define a ChannelHandler interface with sendReply and acknowledge. Both keep the agent loop channel-agnostic. The shape of this interface is almost identical.
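Here is a minimal sketch of that interface shape. Only the two method names, `sendReply` and `acknowledge`, come from the systems themselves. The signatures, the `InMemoryChannel` class, the `channels` registry, and `dispatchReply` are my assumptions, not code from either repo.

```typescript
interface ChannelHandler {
  acknowledge(chatId: string): Promise<void>;
  sendReply(chatId: string, text: string): Promise<void>;
}

// Hypothetical handler that records calls instead of hitting a real API.
class InMemoryChannel implements ChannelHandler {
  readonly log: string[] = [];
  private readonly label: string;

  constructor(label: string) {
    this.label = label;
  }

  async acknowledge(chatId: string): Promise<void> {
    this.log.push(`${this.label}:${chatId}:typing`);
  }

  async sendReply(chatId: string, text: string): Promise<void> {
    this.log.push(`${this.label}:${chatId}:${text}`);
  }
}

// Registry keyed by channel name; the agent loop only ever sees this map.
const channels = new Map<string, ChannelHandler>();

async function dispatchReply(channel: string, chatId: string, text: string): Promise<void> {
  const handler = channels.get(channel);
  if (!handler) throw new Error(`No handler registered for channel "${channel}"`);
  await handler.sendReply(chatId, text);
}
```

Adding a new channel then means writing one more handler and registering it. The loop itself never changes.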
Singleton concurrency. One conversation at a time per chat. Utah uses Inngest’s singleton config. joelclaw uses Redis-backed priority queues. Same goal: prevent race conditions when the human sends faster than the agent thinks.
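To show the goal rather than either mechanism, here is an in-process stand-in that serializes work per chat. It reproduces neither Inngest's singleton config nor joelclaw's Redis queues, and the real mechanisms may skip or prioritize runs rather than simply queue them. `PerChatQueue` and `enqueue` are hypothetical names.

```typescript
class PerChatQueue {
  // Tail of the promise chain for each chat.
  private tails = new Map<string, Promise<void>>();

  enqueue(chatId: string, task: () => Promise<void>): Promise<void> {
    const previous = this.tails.get(chatId) ?? Promise.resolve();
    // Start this task only after the previous one for the same chat settles.
    const next = previous.then(() => task());
    // Store a non-rejecting tail so one failure does not block later tasks.
    this.tails.set(chatId, next.catch(() => undefined));
    return next;
  }
}
```

Tasks for different chats still run concurrently. Only messages within the same chat wait for each other, which is the property that prevents the race.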
Memory as system prompt injection. Both load memory into the system prompt, not as tool results. Memory is context, not conversation.
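A small sketch of the difference, assuming the file contents have already been read into strings. `buildSystemPrompt`, `buildRequest`, the `ChatMessage` type, and the "## Memory" heading are all hypothetical. Neither system necessarily formats its prompt this way.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Compose identity files and memory into one system prompt.
function buildSystemPrompt(soul: string, user: string, memory: string): string {
  const sections = [soul.trim(), user.trim()];
  if (memory.trim().length > 0) {
    sections.push(`## Memory\n${memory.trim()}`);
  }
  return sections.filter((section) => section.length > 0).join("\n\n");
}

// Memory lives in the system message; the conversation stays untouched.
function buildRequest(systemPrompt: string, conversation: ChatMessage[]): ChatMessage[] {
  return [{ role: "system", content: systemPrompt }, ...conversation];
}
```

Because memory never appears as a conversation turn, compacting or pruning the conversation later cannot accidentally drop it.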
Separate acknowledgment. Both fire a separate function to show “typing” immediately, independent of the agent loop. The user needs to know they were heard before the agent thinks.
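One way to picture this is a fan-out where the acknowledgment and the agent run both subscribe to the same event, and a failed acknowledgment never touches the agent run. This is an in-process illustration under my own assumptions, not how either system wires it up. `fanOut` and `Handler` are hypothetical.

```typescript
type Handler = (chatId: string) => Promise<void>;

// Deliver the same event to every subscriber independently.
// allSettled means one subscriber failing cannot fail the others.
async function fanOut(
  chatId: string,
  handlers: Handler[],
): Promise<PromiseSettledResult<void>[]> {
  return Promise.allSettled(
    // Wrapping in a promise chain also captures synchronous throws.
    handlers.map((handler) => Promise.resolve().then(() => handler(chatId))),
  );
}
```

The acknowledgment is best effort by construction: if the typing indicator fails, the reply still arrives.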
Failure notification. Both catch inngest/function.failed and notify the user. Silent failures in agent systems are deadly. The user thinks the agent is ignoring them.
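As a sketch, the handler's core job is turning a failure event into a message for the right chat. The payload type below is an assumed subset of the `inngest/function.failed` event, so check the Inngest docs for the real shape. The `channel` and `chatId` fields on the original event are hypothetical, as are `Notice` and `toFailureNotice`.

```typescript
// Assumed subset of the failure event payload.
interface FunctionFailed {
  name: "inngest/function.failed";
  data: {
    function_id: string;
    error: { message: string };
    // The event that triggered the failed run (hypothetical data fields).
    event: { name: string; data: { channel?: string; chatId?: string } };
  };
}

interface Notice {
  channel: string;
  chatId: string;
  text: string;
}

// Returns null when the failed run was not tied to a chat (e.g. a cron job).
function toFailureNotice(failed: FunctionFailed): Notice | null {
  const { channel, chatId } = failed.data.event.data;
  if (!channel || !chatId) return null;
  return {
    channel,
    chatId,
    text: `Something went wrong in ${failed.data.function_id}: ${failed.data.error.message}`,
  };
}
```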
Context management. Both prune old tool results and compact long conversations. Both estimate tokens with chars/4. Context management isn’t a nice-to-have. Without it, agents degrade over hours.
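A minimal sketch of those mechanics. The chars/4 estimate and the 80% of 150K compaction trigger come from the table above (Utah's numbers). The head and tail sizes, the function names, and treating the 50K hard-clear limit as a character count are my assumptions.

```typescript
const TOKEN_BUDGET = 150_000;
// 80% of the budget, computed with integers to avoid float rounding.
const COMPACTION_THRESHOLD = (TOKEN_BUDGET * 8) / 10;

// Cheap token estimate: roughly four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function needsCompaction(messages: string[]): boolean {
  const total = messages.reduce((sum, message) => sum + estimateTokens(message), 0);
  return total >= COMPACTION_THRESHOLD;
}

// Soft trim: keep the head and tail of a long tool result, drop the middle.
function softTrim(text: string, head = 500, tail = 500): string {
  if (text.length <= head + tail) return text;
  const dropped = text.length - head - tail;
  return `${text.slice(0, head)}\n[... ${dropped} chars trimmed ...]\n${text.slice(text.length - tail)}`;
}

// Two-tier pruning for an old tool result: hard clear if huge, else soft trim.
// Assumption: hardLimit is measured in characters.
function pruneToolResult(text: string, hardLimit = 50_000): string {
  if (text.length > hardLimit) return "[tool result cleared]";
  return softTrim(text);
}
```

The estimate is crude, but it only has to be good enough to trigger compaction before the real limit is hit, and it costs nothing to compute.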
What This Means
Dan built Utah as a reference implementation for the ideas in his article about agents needing infrastructure over frameworks. The fact that it maps so closely to joelclaw, which was built organically over a couple weeks of daily use, is the strongest possible validation of those ideas.
The patterns aren’t opinions. They’re convergent evolution. Two systems solving the same problem arrived at the same shape because that’s the shape the problem demands.
If you’re building a personal agent, start with Utah. It’s clean, minimal, and gets the architecture right. When you outgrow it, when you need 52 skills instead of 3 tools, when you need a CLI, and when you need vector memory instead of file-based memory, you’ll know exactly which layers to add because Utah already taught you where the seams are.