# Embed pi as a library in a joelclaw gateway daemon
## Context and Problem Statement
The current central gateway session (ADR-0036) runs pi inside tmux, managed by launchd. A gateway extension injects events via `sendUserMessage()`. This works for Redis-based notifications but has fundamental limitations:
- **No mobile access** — Joel currently SSH’s into the Mac Mini from Termius on his phone to interact with pi. This works but fights the medium (tiny keyboard, no streaming, terminal rendering issues).
- **No multi-channel routing** — Replies stay inside the pi TUI. There’s no way to route a response to Telegram, Slack, or a native app. The agent can receive events but can’t talk back through the channel that asked.
- **No streaming** — `sendUserMessage()` is fire-and-forget. The extension can’t stream LLM deltas to external clients.
- **tmux PTY hack** — pi is a TUI app that needs a terminal. The tmux wrapper adds complexity and an extra process layer. OpenClaw solved this by embedding pi as a library — no terminal needed.
- **Extension limitations** — The gateway extension can inject prompts and drain events, but it can’t control the session lifecycle, model selection, compaction, or routing.
## What We Want
Talk to the agent from anywhere:

- **Telegram** — Send a message from your phone, get a response
- **Native iOS/macOS app** — Purpose-built UI (future, on roadmap)
- **WebSocket** — Attach from any terminal (like `openclaw tui`)
- **Redis bridge** — Inngest events still flow in (existing infrastructure)
- **All inputs serialize through one session** — Same conversation, same memory, same context
## How OpenClaw Does It
OpenClaw’s gateway daemon (`src/macos/gateway-daemon.ts`) is a standalone Node.js process that:

- Embeds pi via `createAgentSession()` from `@mariozechner/pi-coding-agent`
- Serializes all inputs through a `CommandQueue` with lanes — TUI, heartbeat, Telegram, Discord, etc. all go through one queue into one pi session
- Routes replies back through channel-specific outbound adapters (Telegram HTML chunks, Discord markdown, WhatsApp formatting, etc.)
- Streams deltas to connected WebSocket clients (TUI, mobile app)
- Runs as a launchd daemon — `KeepAlive: true`, no terminal needed
- Manages channels via a plugin system — each channel implements `ChannelPlugin` (config, gateway lifecycle, outbound adapter, status probes)

The TUI (`openclaw tui`) is a WebSocket client that connects to the running daemon — it doesn’t run pi directly.
## Decision
Build a joelclaw gateway daemon that embeds pi as a library, replacing the current tmux + extension approach. Start with Telegram as the first external channel.
### Architecture

```
launchd (com.joel.gateway)
  → joelclaw-gateway daemon (Node.js)
      ├── createAgentSession() — owns the LLM conversation
      ├── CommandQueue — serializes all inputs
      ├── HeartbeatRunner — periodic checklist (setInterval)
      ├── Channels:
      │     ├── Redis — Inngest event bridge (existing)
      │     ├── Telegram — grammY bot (first external channel)
      │     ├── WebSocket — TUI attach + future native app
      │     └── (future: Discord, Slack, iMessage, web)
      ├── OutboundRouter — route replies to source channel
      └── Watchdog — heartbeat staleness detection (ADR-0037)
```

### Session Ownership
The daemon owns the pi session via `createAgentSession()`. This gives us:

- Full control over model, thinking level, compaction
- `session.prompt()` for synchronous prompt/response
- `session.subscribe()` for streaming deltas to channels
- `session.sendUserMessage()` with `followUp` for async injection
- Same extensions, skills, tools as interactive pi (auto-discovered from `~/.pi/agent/`)
- Persistent session file (conversation survives restart)
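A minimal bootstrap sketch of the session-owning side of `daemon.ts`. Only the function names (`createAgentSession`, `getModel`, `session.subscribe`) come from the sections above; the option names, event shape, and environment variables are illustrative assumptions, not the pi SDK's documented API.

```typescript
// daemon.ts (sketch): option names and event shapes below are assumptions;
// check @mariozechner/pi-coding-agent for the real createAgentSession() signature.
import { createAgentSession } from "@mariozechner/pi-coding-agent";
import { getModel } from "@mariozechner/pi-ai";

export async function startSession(onDelta: (text: string) => void) {
  const session = await createAgentSession({
    model: getModel(/* provider + model id per pi-ai docs */),
    sessionFile: process.env.GATEWAY_SESSION_FILE, // hypothetical: persistent conversation file
    agentDir: `${process.env.HOME}/.pi/agent`,     // hypothetical: extension/skill auto-discovery
  });

  // Stream deltas to whichever channel asked; wiring to the OutboundRouter comes later.
  session.subscribe((event: { type: string; text?: string }) => {
    if (event.type === "delta" && event.text) onDelta(event.text);
  });

  return session;
}
```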
### Command Queue

All inputs serialize through one queue (adapted from OpenClaw’s `CommandLane`):

```typescript
type QueueEntry = {
  source: ChannelId;                   // "telegram:12345", "redis", "ws:abc", "heartbeat"
  prompt: string;
  replyTo?: string;                    // Channel-specific reply target
  metadata?: Record<string, unknown>;
};
```

The queue drains sequentially — one prompt at a time. While the LLM is responding, new messages queue up (OpenClaw calls this the “main lane”).
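A sketch of that drain behaviour, reusing the `QueueEntry` type above. The `ChannelId` alias and the handler callback are stand-ins for however the daemon actually calls `session.prompt()` and routes the reply.

```typescript
// Sequential drain: one prompt at a time; new entries queue up while the LLM
// is still responding to the current one.
type ChannelId = string; // e.g. "telegram:12345", "redis", "ws:abc", "heartbeat"

class CommandQueue {
  private entries: QueueEntry[] = [];
  private draining = false;

  // handle() is whatever talks to the pi session and dispatches the reply.
  constructor(private handle: (entry: QueueEntry) => Promise<void>) {}

  push(entry: QueueEntry): void {
    this.entries.push(entry);
    void this.drain();
  }

  private async drain(): Promise<void> {
    if (this.draining) return; // a drain loop is already running
    this.draining = true;
    try {
      while (this.entries.length > 0) {
        const entry = this.entries.shift()!;
        await this.handle(entry);
      }
    } finally {
      this.draining = false;
    }
  }
}
```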
### Outbound Routing

When the LLM responds, the reply routes back to the channel that sent the prompt:

- **Telegram** → Format as Telegram HTML, send via grammY
- **Redis** → Push to `joelclaw:events:{sessionId}` (satellite notification)
- **WebSocket** → Stream deltas as JSON frames
- **Heartbeat** → Filter `HEARTBEAT_OK` (suppress), deliver non-OK to notification channel
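A sketch of the dispatch step, assuming adapters register under a channel prefix and the queue entry's `source` (e.g. `telegram:12345`) selects one. The `OutboundAdapter` interface is illustrative, not OpenClaw's `ChannelPlugin`.

```typescript
// Each channel registers an outbound adapter keyed by its prefix
// ("telegram", "redis", "ws", "heartbeat").
interface OutboundAdapter {
  send(reply: string, replyTo?: string): Promise<void>;
}

class OutboundRouter {
  private adapters = new Map<string, OutboundAdapter>();

  register(kind: string, adapter: OutboundAdapter): void {
    this.adapters.set(kind, adapter);
  }

  async dispatch(source: ChannelId, reply: string, replyTo?: string): Promise<void> {
    const kind = source.split(":")[0]; // "telegram:12345" → "telegram"

    // Suppress HEARTBEAT_OK; non-OK heartbeat replies go to whatever adapter
    // is registered under "heartbeat" (e.g. the notification channel).
    if (kind === "heartbeat" && reply.includes("HEARTBEAT_OK")) return;

    const adapter = this.adapters.get(kind);
    if (!adapter) {
      console.warn(`no outbound adapter registered for ${source}`);
      return;
    }
    await adapter.send(reply, replyTo);
  }
}
```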
### Telegram Channel (First Implementation)

```
Phone (Telegram) → Bot API → grammY handler → CommandQueue → pi session
                                                                 ↓
Phone (Telegram) ← Bot API ← Telegram outbound ← OutboundRouter ←─┘
```

- grammY bot with long polling (no webhook needed — runs on the tailnet)
- Allowlist: Joel’s Telegram user ID only
- Message types: text, photos (as image attachments), voice (future: Whisper transcription)
- Reply formatting: Markdown → Telegram HTML with chunk splitting (4000 char limit)
- Typing indicator while LLM is working
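A sketch of the inbound grammY handler, assuming the `CommandQueue` sketch above. The environment variable names are placeholders; grammY's `Bot`, filter queries, and `sendChatAction` are real API, but the wiring is illustrative.

```typescript
// channels/telegram.ts (sketch): allowlisted text messages go onto the queue.
import { Bot } from "grammy";

const ALLOWED_USER_ID = Number(process.env.TELEGRAM_ALLOWED_USER_ID); // placeholder env var

export function startTelegramChannel(queue: CommandQueue): Bot {
  const bot = new Bot(process.env.TELEGRAM_BOT_TOKEN!); // leased from agent-secrets at startup

  bot.on("message:text", async (ctx) => {
    if (ctx.from?.id !== ALLOWED_USER_ID) return; // allowlist: Joel only

    // Typing indicator while the LLM works (Telegram expires it after a few
    // seconds, so a real implementation re-sends it until the reply lands).
    await ctx.api.sendChatAction(ctx.chat.id, "typing");

    queue.push({
      source: `telegram:${ctx.chat.id}`,
      prompt: ctx.message.text,
      replyTo: String(ctx.chat.id),
    });
  });

  void bot.start(); // long polling; no webhook needed on the tailnet
  return bot;
}
```

The outbound adapter for this channel would convert Markdown to Telegram HTML, split replies into ≤4000-character chunks, and send each chunk with `bot.api.sendMessage(chatId, chunk, { parse_mode: "HTML" })`.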
### WebSocket Channel (TUI Attach)

```bash
# Attach to the running daemon from any terminal
joelclaw tui

# Or from Termius on the phone
ssh joel@mac-mini "joelclaw tui"
```

Protocol: JSON frames over WebSocket (simplified from OpenClaw’s protocol):

- `{type: "prompt", text: "..."}` — send a message
- `{type: "delta", text: "..."}` — streaming response chunk
- `{type: "done", fullText: "..."}` — response complete
- `{type: "status", ...}` — model, usage, session info
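A sketch of the server side of that protocol using `ws`, assuming the `CommandQueue` sketch above. The bind address and port are placeholders.

```typescript
// channels/websocket.ts (sketch): accepts prompt frames, fans deltas out to
// every attached client (TUI, future native app).
import { WebSocketServer, WebSocket } from "ws";

export function startWebSocketChannel(queue: CommandQueue) {
  const wss = new WebSocketServer({ host: "127.0.0.1", port: 8787 }); // placeholder port
  const clients = new Set<WebSocket>();

  wss.on("connection", (socket) => {
    clients.add(socket);
    socket.on("close", () => clients.delete(socket));

    socket.on("message", (raw) => {
      const frame = JSON.parse(raw.toString());
      if (frame.type === "prompt" && typeof frame.text === "string") {
        queue.push({ source: "ws", prompt: frame.text });
      }
    });
  });

  // Called by the outbound side: broadcast({ type: "delta", text: chunk }),
  // then { type: "done", fullText } when the response completes.
  function broadcast(frame: Record<string, unknown>): void {
    const data = JSON.stringify(frame);
    for (const client of clients) client.send(data);
  }

  return { wss, broadcast };
}
```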
## Build Plan

### Phase 1: Daemon + Redis (replace current extension) ✅

- Create `packages/gateway/` in monorepo
- `daemon.ts` — entry point, `createAgentSession()`, launchd lifecycle
- `command-queue.ts` — sequential input serialization
- `channels/redis.ts` — port existing Redis bridge from extension (sketched after this list)
- `heartbeat.ts` — `setInterval` runner, reads HEARTBEAT.md, watchdog (30 min threshold), tripwire file
- Update `com.joel.gateway` plist to run daemon directly (no tmux)
- Verify: Redis events flow through pi session, responses logged
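A sketch of the inbound Redis bridge with `ioredis`, assuming the `CommandQueue` sketch above. The channel name and event shape are placeholders; the real key layout comes from the existing extension being ported.

```typescript
// channels/redis.ts (sketch): subscribes to the Inngest event bridge and
// enqueues each event as a prompt for the pi session.
import Redis from "ioredis";

export function startRedisChannel(queue: CommandQueue): Redis {
  const sub = new Redis(); // defaults to localhost:6379; configure as needed

  void sub.subscribe("joelclaw:gateway:inbox"); // placeholder channel name
  sub.on("message", (_channel, payload) => {
    const event = JSON.parse(payload); // event shape is illustrative
    queue.push({
      source: "redis",
      prompt: event.prompt ?? payload,
      metadata: { event },
    });
  });

  return sub;
}
```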
### Phase 2: Telegram ✅

- `channels/telegram.ts` — grammY bot, user allowlist, text/photo/voice handlers
- Outbound: markdown → Telegram HTML conversion, 4000 char chunking, typing indicator
- Response routing via `session.subscribe()` delta collection → source channel dispatch
- Bot token in `agent-secrets` (leased at startup via `gateway-start.sh`)
- Created @JoelClawPandaBot via @BotFather
- Verified: full round-trip — phone → Telegram → pi session → Telegram → phone
### Phase 3: WebSocket + TUI

- `channels/websocket.ts` — WS server on localhost (Tailscale accessible)
- `joelclaw tui` CLI command — connects to daemon WS, renders in terminal
- Stream deltas to connected clients
- Auth: Tailscale identity or simple token
### Phase 4: Native App Foundation
- WebSocket protocol stabilized
- Session info endpoint (model, usage, messages)
- Consider React Native or SwiftUI for iOS
- Consider whether to port OpenClaw’s mobile node protocol
## Considered Options
### Option 1: Telegram bot on current extension (rejected as long-term)
Quick win (~1 hour) but doesn’t solve the fundamental limitations. The extension can’t control session lifecycle, can’t stream, can’t properly route replies. Would need to be rewritten anyway.
### Option 2: OpenClaw deployment (rejected — ADR-0003)
OpenClaw has everything we want, but it’s a different system with different opinions about configuration, channel management, and multi-agent orchestration. We’ve already diverged significantly (Inngest over job queues, Qdrant over SQLite, k8s over localhost). Embedding pi directly gives us the session management without the rest.
### Option 3: Embedded pi daemon (chosen)
Best of both worlds: OpenClaw’s proven architecture pattern (embedded pi, command queue, channel plugins) with joelclaw’s infrastructure (Inngest, Redis, k8s, Tailscale). We own the daemon code, control the channel implementations, and can evolve at our own pace.
## Consequences

### Positive
- Talk to the agent from Telegram (phone), WebSocket (any terminal), and future native app
- Streaming responses to all channels
- No tmux PTY hack — pure headless Node.js daemon
- Same session, skills, extensions, tools as interactive pi
- Foundation for native iOS/macOS app
- Outbound delivery: agent can proactively message Joel on Telegram (not just respond)
### Negative

- More code to maintain (daemon + channels vs. extension)
- `pi` command no longer used for central session (it’s embedded in the daemon)
- Need to build TUI attach for terminal access (or use Termius → `joelclaw tui`)
- Telegram bot token is a new secret to manage
- Channel-specific formatting (Telegram HTML, Discord markdown) is ongoing work
### Non-goals (for now)
- Multi-agent: one daemon = one pi session. Subagents are future work.
- Voice: Telegram voice messages → Whisper transcription is Phase 2+.
- Group chats: Bot responds only in DMs with Joel.
- End-to-end encryption: Tailscale provides transport security.
## Implementation
### Affected Paths
| Path | Change |
|---|---|
| `packages/gateway/` | New package — daemon, channels, outbound, heartbeat |
| `~/Library/LaunchAgents/com.joel.gateway.plist` | Updated: runs daemon directly, no tmux |
| `~/.joelclaw/scripts/gateway-start.sh` | Simplified: just exec the daemon |
| `~/.pi/agent/extensions/gateway/` | Deprecated: functionality moves into daemon |
| `packages/cli/src/commands/gateway.ts` | Add `tui` subcommand for WebSocket attach |
### Dependencies
| Package | Purpose |
|---|---|
| `@mariozechner/pi-coding-agent` | Pi SDK — `createAgentSession`, tools, extensions |
| `@mariozechner/pi-ai` | Model selection (`getModel`) |
| `grammy` | Telegram Bot API |
| `ioredis` | Redis pub/sub bridge |
| `ws` | WebSocket server |
## Verification

- `createAgentSession()` works headless (no TUI, no terminal)
- Extensions and skills auto-discovered from `~/.pi/agent/`
- AGENTS.md loaded as system prompt context
- Heartbeat fires every 15 min, `HEARTBEAT_OK` filtered
- Redis events from Inngest flow through to session
- Telegram message → LLM response → Telegram reply (round-trip)
- WebSocket streaming deltas to connected client
- launchd restart on crash (`KeepAlive`)
- Session file persists across daemon restarts
- Satellite pi sessions still get targeted notifications