Utah and joelclaw: Convergent Architecture

Dan Farrelly open-sourced Utah (Universally Triggered Agent Harness), and it’s basically a clean-room implementation of the same architecture joelclaw landed on independently. I tore it apart to understand the design decisions and found the convergence is almost eerie.

What Utah Is

Utah is a personal AI agent that connects to Telegram and Slack, runs an LLM think/act/observe loop, and uses Inngest for every durable operation. It builds on pi-ai (Mario Zechner’s unified LLM interface) and pi-coding-agent’s battle-tested tools (read, edit, write, bash, grep, find, ls).
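
A minimal sketch of that think/act/observe loop, with each iteration wrapped in a durable step. The `makeStep` helper here is a hypothetical in-memory stand-in for a step runner in the style of Inngest's `step.run`, and `think`, `act`, and the `Thought` type are names I made up for illustration; none of this is Utah's actual code.

```typescript
// Hypothetical stand-in for a durable step runner: each named step executes
// once, and its result is replayed from the journal on later attempts.
type Journal = Map<string, unknown>;

function makeStep(journal: Journal) {
  return {
    async run<T>(id: string, fn: () => Promise<T>): Promise<T> {
      if (journal.has(id)) return journal.get(id) as T; // replay, do not re-execute
      const result = await fn();
      journal.set(id, result);
      return result;
    },
  };
}

// What the model decided to do on one iteration (assumed shape).
type Thought =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "final"; text: string };

interface AgentDeps {
  think(history: string[]): Promise<Thought>; // LLM call (hypothetical)
  act(tool: string, input: string): Promise<string>; // tool execution (hypothetical)
}

async function agentLoop(
  step: ReturnType<typeof makeStep>,
  deps: AgentDeps,
  userMessage: string,
  maxIterations = 10,
): Promise<string> {
  const history: string[] = [`user: ${userMessage}`];
  for (let i = 0; i < maxIterations; i++) {
    // Think: ask the model what to do next.
    const thought = await step.run(`think-${i}`, () => deps.think(history));
    if (thought.kind === "final") return thought.text;
    // Act, then observe: run the tool and feed the result back into history.
    const observation = await step.run(`act-${i}`, () =>
      deps.act(thought.tool, thought.input),
    );
    history.push(`tool(${thought.tool}): ${observation}`);
  }
  return "Stopped: iteration limit reached.";
}
```

The point of naming every step is that a retry after a crash replays finished steps from the journal instead of paying for the same LLM call twice.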

The entire codebase is 51 files. No monorepo. No framework layer. Just TypeScript, Inngest, and pi.

The Architecture Map

Here’s where it gets interesting. Almost every Utah component maps 1:1 to a joelclaw subsystem:

| Concept | Utah | joelclaw |
| --- | --- | --- |
| Agent loop | agent-loop.ts, while loop where each iteration is step.run | Gateway daemon pi session + system-bus Inngest functions |
| Channels | channels/, Telegram and Slack with ChannelHandler interface | packages/gateway/src/channels/types.ts, same interface pattern |
| Message handling | agent.message.received → singleton per chat | gateway/message.received → priority queue per channel |
| Reply dispatch | agent.reply.ready → channel-agnostic send | gateway/reply.ready → channel formatter → delivery |
| Acknowledgment | Separate function, typing indicator, best effort | Separate step, same pattern with typing + reactions |
| Memory | MEMORY.md + daily logs + 30-min heartbeat distillation | Observation pipeline + vector store + extension enforcement |
| Session persistence | JSONL files in workspace/sessions/ | pi’s append-only JSONL with tree structure |
| Compaction | LLM summarization at 80% of 150K token budget | pi’s built-in compaction + branch summaries |
| Context pruning | Two-tier: soft trim (head+tail) + hard clear at 50K | Similar, pi handles this in prompt composition |
| System prompt | SOUL.md + USER.md + memory injection | SOUL.md + IDENTITY.md + ROLE.md + USER.md + skill injection |
| Tools | pi-coding-agent tools + remember + web_fetch + delegate_task | 52 skills + pi tools + MCP + custom tools via extension API |
| Sub-agents | delegate_task via step.invoke() → isolated loop, no recursion | Codex delegation → separate process, sandboxed |
| Failure handling | Global handler on inngest/function.failed → user notification | Same pattern, function.failed → gateway notification |
| Heartbeat | Cron every 30min, check if distillation needed and skip if not | Cron every 15min, health checks that only act if needed |
| Worker | connect(), WebSocket to Inngest Cloud with no server needed | k8s worker serves HTTP to self-hosted Inngest |

Where They Diverge

The convergence tells you what’s necessary for a personal agent. The divergence tells you what’s optional, or at least what reflects different constraints.

Scale of ambition. Utah is 51 files; joelclaw is 110+ Inngest functions across a monorepo. Utah is a reference implementation showing the pattern. joelclaw is a lived-in system that has accumulated capability fast over a couple of weeks: CLI, ADR tracking, video pipeline, content publishing, email processing, contact enrichment, and calendar integration.

Memory architecture. Utah’s memory is file-based: MEMORY.md (curated by LLM distillation) + daily logs (append-only). Simple, effective, and great for a reference implementation. joelclaw’s is multi-layered: observation pipeline with write gates, vector embeddings in PGlite, semantic recall, and a mandatory participation contract enforced by pi extension. The complexity is earned because the system already needs structured recall.

Identity model. Utah has SOUL.md (4 lines) and USER.md. joelclaw splits identity across four files: SOUL, IDENTITY, ROLE, and USER. Different agent surfaces (gateway daemon, codex workers, interactive sessions) need different role definitions with the same core identity. The organism metaphor matters when you have multiple appendages.

Hosting. Utah uses connect(), a WebSocket connection to Inngest Cloud. No server, no public endpoint, no ngrok. Brilliant for getting started. joelclaw self-hosts Inngest on a k8s cluster because I’m a control freak who wants to own every byte. Both are valid. Utah’s approach is objectively easier.

Framework relationship. Utah uses pi-ai (the LLM library) and pi-coding-agent (the tool implementations), but not pi-the-framework. joelclaw uses pi as a full framework: sessions, extensions, skills, themes. This is exactly the distinction drawn in The Harness Is a Framework: Utah is genuinely harness-shaped, while joelclaw embraced the framework and built on top of it.

The Patterns That Converged

When two systems built independently arrive at the same design, pay attention. These patterns are likely necessary, not incidental:

Event-driven message flow. Both normalize inbound messages into typed events and dispatch via Inngest. Neither does synchronous request/response. The agent is async by nature, and fighting that creates fragile systems.
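
Here is a rough sketch of what "normalize into typed events" can look like. The event name `agent.message.received` is the one Utah uses; everything else (the `data` fields, the function names, and the trimmed-down Telegram and Slack payload types) is an assumption for illustration.

```typescript
// Simplified subsets of the platform payloads (assumed, not complete).
interface TelegramUpdate {
  message: { message_id: number; chat: { id: number }; text?: string };
}

interface SlackMessageEvent {
  channel: string;
  ts: string;
  text?: string;
}

// One event shape for every channel. The data fields are hypothetical.
interface MessageReceived {
  name: "agent.message.received";
  data: {
    channel: "telegram" | "slack";
    chatId: string;
    messageId: string;
    text: string;
  };
}

function fromTelegram(update: TelegramUpdate): MessageReceived {
  return {
    name: "agent.message.received",
    data: {
      channel: "telegram",
      chatId: String(update.message.chat.id),
      messageId: String(update.message.message_id),
      text: update.message.text ?? "",
    },
  };
}

function fromSlack(event: SlackMessageEvent): MessageReceived {
  return {
    name: "agent.message.received",
    data: {
      channel: "slack",
      chatId: event.channel,
      messageId: event.ts,
      text: event.text ?? "",
    },
  };
}
```

Once every inbound message has this shape, the webhook handler's only job is to build the event and send it; nothing downstream needs to know which platform it came from.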

Channel abstraction. Both define a ChannelHandler interface with sendReply and acknowledge. Both keep the agent loop channel-agnostic. The shape of this interface is almost identical.
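
A sketch of that interface shape. The method names `sendReply` and `acknowledge` come from the description above; the parameter types, `ReplyTarget`, `RecordingChannel`, and `dispatchReply` are hypothetical and only there to show how the agent loop stays channel-agnostic.

```typescript
interface ReplyTarget {
  chatId: string;
}

// Method names are from the article; the signatures are assumptions.
interface ChannelHandler {
  sendReply(target: ReplyTarget, text: string): Promise<void>;
  acknowledge(target: ReplyTarget): Promise<void>;
}

// In-memory handler for illustration; a real one would call the platform API.
class RecordingChannel implements ChannelHandler {
  readonly log: string[] = [];
  private readonly label: string;

  constructor(label: string) {
    this.label = label;
  }

  async sendReply(target: ReplyTarget, text: string): Promise<void> {
    this.log.push(`${this.label}:reply:${target.chatId}:${text}`);
  }

  async acknowledge(target: ReplyTarget): Promise<void> {
    this.log.push(`${this.label}:ack:${target.chatId}`);
  }
}

type ChannelName = "telegram" | "slack";

// The caller only knows the channel name; the handler hides the platform.
async function dispatchReply(
  handlers: Record<ChannelName, ChannelHandler>,
  channel: ChannelName,
  target: ReplyTarget,
  text: string,
): Promise<void> {
  await handlers[channel].sendReply(target, text);
}
```

Adding a third channel under this design means writing one more handler and one more key in the registry, with no changes to the loop itself.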

Singleton concurrency. One conversation at a time per chat. Utah uses Inngest’s singleton config. joelclaw uses Redis-backed priority queues. Same goal: prevent race conditions when the human sends faster than the agent thinks.
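
To make the goal concrete, here is an in-process stand-in that serializes work per chat. It is neither Inngest's singleton config nor a Redis-backed queue, and `PerChatQueue` is a name I invented; it only illustrates the "one conversation at a time per chat" guarantee both systems are after.

```typescript
// Tasks for the same chat run one at a time, in arrival order.
// Tasks for different chats do not block each other.
class PerChatQueue {
  private tails = new Map<string, Promise<void>>();

  enqueue<T>(chatId: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(chatId) ?? Promise.resolve();
    // Start this task only after the previous one for this chat has settled.
    const result = previous.then(task);
    // The tail swallows errors so one failed task does not poison the chain,
    // then removes itself if no newer task has been queued behind it.
    const tail: Promise<void> = result
      .then(() => {}, () => {})
      .then(() => {
        if (this.tails.get(chatId) === tail) this.tails.delete(chatId);
      });
    this.tails.set(chatId, tail);
    return result;
  }
}
```

This version queues every message. A durable system also has to decide what happens to messages that arrive mid-run (skip, cancel, or queue), which is where the two projects make different choices.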

Memory as system prompt injection. Both load memory into the system prompt, not as tool results. Memory is context, not conversation.
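
A small sketch of the idea, assuming the file contents have already been read into strings. The section headings, `PromptParts`, `buildSystemPrompt`, and `buildRequest` are all hypothetical; the only thing taken from either system is that memory lands in the system prompt rather than in the message history.

```typescript
interface PromptParts {
  soul: string; // contents of SOUL.md
  user: string; // contents of USER.md
  memory: string; // contents of MEMORY.md
}

// Assemble one system prompt, skipping any section that is empty.
function buildSystemPrompt(parts: PromptParts): string {
  const sections: Array<[string, string]> = [
    ["Identity", parts.soul],
    ["User", parts.user],
    ["Memory", parts.memory],
  ];
  return sections
    .filter(([, body]) => body.trim().length > 0)
    .map(([title, body]) => `## ${title}\n${body.trim()}`)
    .join("\n\n");
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Memory rides in the system message; the history stays pure conversation.
function buildRequest(
  parts: PromptParts,
  history: ChatMessage[],
  userText: string,
): ChatMessage[] {
  return [
    { role: "system", content: buildSystemPrompt(parts) },
    ...history,
    { role: "user", content: userText },
  ];
}
```

Because the prompt is rebuilt on every request, a memory update takes effect on the next turn without rewriting any stored conversation.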

Separate acknowledgment. Both fire a separate function to show “typing” immediately, independent of the agent loop. The user needs to know they were heard before the agent thinks.

Failure notification. Both catch inngest/function.failed and notify the user. Silent failures in agent systems are deadly. The user thinks the agent is ignoring them.
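
A sketch of the routing step inside such a handler. The event name `inngest/function.failed` is from the description above; the payload shape below is my assumption about what the handler receives, and `toUserNotification` and the `channel` and `chatId` fields are hypothetical.

```typescript
// Assumed shape of the failure event; only the event name is from the article.
interface FunctionFailedEvent {
  name: "inngest/function.failed";
  data: {
    function_id: string;
    error: { message: string };
    // The event that triggered the failed run (hypothetical fields).
    event: { data: { channel?: string; chatId?: string } };
  };
}

interface UserNotification {
  channel: string;
  chatId: string;
  text: string;
}

// Turn a failure event into a message for the chat that triggered it.
// Returns null when the original event carries no chat to route to.
function toUserNotification(failed: FunctionFailedEvent): UserNotification | null {
  const { channel, chatId } = failed.data.event.data;
  if (!channel || !chatId) return null;
  // Keep the error short; full details belong in logs, not in the chat.
  const reason = failed.data.error.message.slice(0, 200);
  return {
    channel,
    chatId,
    text: `Something went wrong while handling your message (${failed.data.function_id}): ${reason}`,
  };
}
```

The null branch matters: cron-triggered functions fail too, and those have no chat to notify.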

Context management. Both prune old tool results and compact long conversations. Both estimate tokens with chars/4. Context management isn’t a nice-to-have. Without it, agents degrade over hours.
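
The numbers from the comparison table can be sketched as a few small functions. The chars/4 estimate, the 80% of 150K threshold, and the head+tail trim are from the article; reading "50K" as characters, the 1,000-character head and tail sizes, and all function names are my assumptions, and this is one possible reading rather than either project's code.

```typescript
const TOKEN_BUDGET = 150000; // token budget from the comparison table
const COMPACT_AT_TOKENS = (TOKEN_BUDGET * 80) / 100; // compact at 80% of budget
const HARD_CLEAR_CHARS = 50000; // assumption: the "50K" limit counts characters

// Rough token estimate: about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Soft trim: keep the head and the tail, drop the middle.
function softTrim(text: string, keepHead = 1000, keepTail = 1000): string {
  if (text.length <= keepHead + keepTail) return text;
  const omitted = text.length - keepHead - keepTail;
  const head = text.slice(0, keepHead);
  const tail = text.slice(text.length - keepTail);
  return `${head}\n[... ${omitted} chars trimmed ...]\n${tail}`;
}

// Two-tier pruning for an old tool result: clear it entirely if it is huge,
// otherwise soft-trim it.
function pruneToolResult(text: string): string {
  if (text.length > HARD_CLEAR_CHARS) return "[tool result cleared]";
  return softTrim(text);
}

// Decide whether the conversation is due for LLM summarization.
function needsCompaction(messages: string[]): boolean {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  return total >= COMPACT_AT_TOKENS;
}
```

The chars/4 estimate is crude, but it is cheap enough to run on every turn, and the 20% headroom absorbs most of the error.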

What This Means

Dan built Utah as a reference implementation for the ideas in his article about agents needing infrastructure over frameworks. The fact that it maps so closely to joelclaw, which was built organically over a couple weeks of daily use, is the strongest possible validation of those ideas.

The patterns aren’t opinions. They’re convergent evolution. Two systems solving the same problem arrived at the same shape because that’s the shape the problem demands.

If you’re building a personal agent, start with Utah. It’s clean, minimal, and gets the architecture right. When you outgrow it (when you need 52 skills instead of 3 tools, a CLI, or vector memory instead of file-based memory), you’ll know exactly which layers to add, because Utah already taught you where the seams are.

Utah on GitHub