Gateway TUI via WebSocket
Status
Accepted
Context
The joelclaw gateway daemon (com.joel.gateway) runs a headless pi session via createAgentSession() from the pi SDK. It accepts messages through Redis pub/sub and Telegram channels, but there is no way to interact with it from a terminal.
This creates several problems:
- No visibility: when the gateway hangs (e.g., a bash tool call running find for 12+ minutes with no timeout), there’s no way to see what it’s doing without reading raw log files and process tables.
- Health checks lie: joelclaw gateway status reported “healthy” while the session was stuck mid-stream on a hung tool call. It only checks Redis connectivity and process existence, not session state.
- No attach workflow: unlike OpenClaw, which has openclaw tui connecting to a gateway via WebSocket, joelclaw requires finding terminal tabs or reading session JSONL files manually.
- Bash commands hang forever: the pi bash tool accepts an optional timeout parameter, but the LLM doesn’t always specify one. In a headless daemon with no human watching, a single hung command blocks all subsequent messages.
OpenClaw Reference
OpenClaw solves this with a clean client-server split (credit: openclaw/openclaw src/tui/gateway-chat.ts):
- Gateway exposes a WebSocket server
- openclaw tui connects as a GatewayChatClient
- Protocol supports: chat.send, chat.abort, chat.history, sessions.list, sessions.patch, status
- TUI is just another channel; the gateway remains the single session owner
Decision
1. WebSocket Server in Gateway Daemon
Add a WebSocket server to the gateway daemon that exposes the pi session for TUI attachment. The server:
- Listens on a configurable port (default: 3018, stored in /tmp/joelclaw/gateway.ws.port)
- Streams session events (text deltas, tool calls, tool results, turn boundaries) to connected clients
- Accepts prompts from connected TUI clients (routed through the existing command queue)
- Exposes session state: isStreaming, currentToolCalls, sessionId, model, uptime
- Supports abort to cancel the current generation
- Single-writer: only one TUI client can send prompts at a time (but multiple can observe)
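The single-writer rule above can be sketched as a small piece of bookkeeping, independent of any WebSocket library. This is a minimal sketch, not the gateway's actual code: the WriterLock class, its method names, and the string client ids are all hypothetical, and the choice to release the writer slot on disconnect is an assumption.

```typescript
// Hypothetical sketch: WriterLock and ClientId are illustrative names,
// not part of the gateway or the pi SDK.
type ClientId = string;

class WriterLock {
  private writer: ClientId | null = null;
  private clients = new Set<ClientId>();

  // Every connection starts as an observer.
  connect(id: ClientId): void {
    this.clients.add(id);
  }

  // Try to become the single writer. Returns false if the client is
  // unknown or another client already holds the writer slot.
  acquire(id: ClientId): boolean {
    if (!this.clients.has(id)) return false;
    if (this.writer !== null && this.writer !== id) return false;
    this.writer = id;
    return true;
  }

  // Only the current writer may send prompts.
  canSend(id: ClientId): boolean {
    return this.writer === id;
  }

  // Assumption: a disconnecting writer frees the slot for the next client.
  disconnect(id: ClientId): void {
    this.clients.delete(id);
    if (this.writer === id) this.writer = null;
  }

  clientCount(): number {
    return this.clients.size;
  }
}
```

With this shape, the server would check canSend() before routing a prompt into the command queue, while event streaming goes to every connected client regardless of writer status.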
2. joelclaw tui CLI Command
New subcommand that launches a terminal UI connected to the gateway via WebSocket:
joelclaw tui # connect to local gateway
joelclaw tui --url ws://... # connect to remote gateway
joelclaw tui --observe # read-only mode (watch without sending)
The TUI:
- Shows live streaming responses (text deltas as they arrive)
- Shows tool call execution (command, output, timing)
- Allows sending prompts via text input
- Shows session metadata in a status bar (model, uptime, queue depth, streaming state)
- Supports /abort to cancel current generation
- Supports /status to show detailed gateway health
- Gracefully reconnects on disconnect
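One way the TUI's text input could map onto the client messages listed under Implementation Notes is sketched below. The function name parseTuiInput and the ParseResult type are hypothetical. Treating /abort as blocked in --observe mode while still allowing /status is an assumption; the ADR only says that observe mode watches without sending.

```typescript
// Client-to-server messages, mirroring the minimal protocol in this ADR.
type ClientMessage =
  | { type: "prompt"; text: string; source: "tui" }
  | { type: "abort" }
  | { type: "status" };

// Hypothetical result type: either a message to send or a reason to refuse.
type ParseResult =
  | { ok: true; message: ClientMessage }
  | { ok: false; reason: string };

function parseTuiInput(line: string, observeOnly: boolean): ParseResult {
  const text = line.trim();
  if (text === "") return { ok: false, reason: "empty input" };

  // /status only reads state, so it is allowed in observe mode (assumption).
  if (text === "/status") return { ok: true, message: { type: "status" } };

  // Everything below changes session state, so observe mode refuses it.
  if (observeOnly) return { ok: false, reason: "read-only (--observe) mode" };

  if (text === "/abort") return { ok: true, message: { type: "abort" } };
  return { ok: true, message: { type: "prompt", text, source: "tui" } };
}
```

The resulting message would be serialized with JSON.stringify and written to the socket; keeping the parsing pure makes it testable without a running gateway.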
3. Default Bash Timeout Extension
Add a pi-tools extension (bash-timeout/index.ts) that intercepts tool_call events for the bash tool and injects a default timeout when the LLM doesn’t specify one:
- Default: 120 seconds (configurable via PI_BASH_DEFAULT_TIMEOUT env var)
- Only applies when event.input.timeout is undefined
- Mutates event.input.timeout directly (same object reference passed to tool execute)
- Logs when a default timeout is applied
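The injection logic described above is small enough to sketch as two pure functions. This is an illustration under stated assumptions, not the extension itself: the ToolCallEvent shape (in particular the toolName field) and both function names are hypothetical, and the registration of the handler with pi's extension API is deliberately left out because its exact signature is not given here.

```typescript
// Hypothetical event shape; only event.input.timeout is named in this ADR.
interface ToolCallEvent {
  toolName: string;
  input: { command?: string; timeout?: number };
}

const FALLBACK_TIMEOUT_SECONDS = 120;

// Parse the env var value; fall back to 120s when unset or not a positive number.
function resolveDefaultTimeout(raw: string | undefined): number {
  const parsed = Number(raw);
  const valid = raw !== undefined && Number.isFinite(parsed) && parsed > 0;
  return valid ? parsed : FALLBACK_TIMEOUT_SECONDS;
}

// Mutates event.input in place so the tool's execute step sees the timeout.
// Returns true when a default was injected.
function applyDefaultBashTimeout(
  event: ToolCallEvent,
  defaultTimeout: number,
  log: (msg: string) => void = console.log,
): boolean {
  if (event.toolName !== "bash") return false;
  if (event.input.timeout !== undefined) return false; // LLM chose its own
  event.input.timeout = defaultTimeout;
  log(`bash-timeout: applied default timeout of ${defaultTimeout}s`);
  return true;
}
```

In the extension, resolveDefaultTimeout would be called once with the value of PI_BASH_DEFAULT_TIMEOUT, and applyDefaultBashTimeout would run inside the tool_call handler.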
4. Deep Health Checks
Enhance joelclaw gateway status and the gateway-debug skill to check:
- Process existence (PID file + kill -0)
- Redis connectivity
- Session streaming state — is the agent mid-stream? For how long?
- Stuck tool calls — are there child processes of the gateway PID that have been running too long?
- Command queue depth — are messages piling up?
- Last successful response — when did the gateway last complete a turn?
- Error rate — count of “Agent is already processing” errors in gateway.err
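The checks above could be combined into a single verdict roughly as follows. This is a sketch under assumptions: the GatewaySnapshot fields, the threshold values, and the three-level status are all hypothetical, and collecting the snapshot itself (PID file, Redis ping, child process scan, log grep) is out of scope here.

```typescript
// Hypothetical snapshot of the signals listed above, gathered elsewhere.
interface GatewaySnapshot {
  processAlive: boolean;
  redisConnected: boolean;
  streamingForMs: number | null; // null when the agent is not mid-stream
  longestChildProcessMs: number; // 0 when there are no child processes
  queueDepth: number;
  msSinceLastTurn: number | null; // null when no turn has completed yet
  alreadyProcessingErrors: number; // count from gateway.err
}

// Illustrative thresholds only; real values would need tuning.
interface HealthThresholds {
  maxStreamingMs: number;
  maxChildProcessMs: number;
  maxQueueDepth: number;
  maxAlreadyProcessingErrors: number;
}

const DEFAULT_THRESHOLDS: HealthThresholds = {
  maxStreamingMs: 5 * 60_000,
  maxChildProcessMs: 2 * 60_000,
  maxQueueDepth: 10,
  maxAlreadyProcessingErrors: 5,
};

interface HealthReport {
  status: "healthy" | "degraded" | "down";
  reasons: string[];
}

function evaluateHealth(
  s: GatewaySnapshot,
  t: HealthThresholds = DEFAULT_THRESHOLDS,
): HealthReport {
  // Hard failures: nothing else is meaningful without these.
  if (!s.processAlive) return { status: "down", reasons: ["gateway process is not running"] };
  if (!s.redisConnected) return { status: "down", reasons: ["Redis is unreachable"] };

  const reasons: string[] = [];
  if (s.streamingForMs !== null && s.streamingForMs > t.maxStreamingMs) {
    reasons.push(`streaming for ${s.streamingForMs}ms without finishing the turn`);
  }
  if (s.longestChildProcessMs > t.maxChildProcessMs) {
    reasons.push(`child process running for ${s.longestChildProcessMs}ms`);
  }
  if (s.queueDepth > t.maxQueueDepth) {
    reasons.push(`command queue depth is ${s.queueDepth}`);
  }
  if (s.alreadyProcessingErrors > t.maxAlreadyProcessingErrors) {
    reasons.push(`${s.alreadyProcessingErrors} "Agent is already processing" errors`);
  }
  return { status: reasons.length === 0 ? "healthy" : "degraded", reasons };
}
```

Returning reasons alongside the status lets joelclaw gateway status print why a live process is still considered unhealthy, which is the failure mode the old check missed.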
Consequences
- Gateway daemon grows a WebSocket dependency (lightweight: ws package or Bun native WebSocket)
- TUI is “just another channel”: same pattern as Telegram, no special session ownership
- Bash timeout extension prevents the entire class of “hung forever” bugs that caused today’s outage
- Health checks become meaningful — they detect the actual failure mode (stuck streaming) not just “is process alive”
- Future: the WS server could support multiple named sessions, remote access via Tailscale
Implementation Notes
WebSocket Protocol (Minimal)
Client → Server:
{ type: "prompt", text: "...", source: "tui" }
{ type: "abort" }
{ type: "status" }
Server → Client:
{ type: "text_delta", delta: "..." }
{ type: "tool_call", id: "...", name: "bash", input: {...} }
{ type: "tool_result", id: "...", content: [...] }
{ type: "turn_end" }
{ type: "status", data: { streaming: bool, model: "...", uptime: N, ... } }
{ type: "error", message: "..." }
Gateway Session File
The gateway should use a stable, predictable session file so the TUI can also load history on connect:
~/.pi/agent/sessions/--Users-joel--/gateway-session.jsonl
Currently it creates a new session file on each restart. Pin it to a stable path via SessionManager.open().
Related
- ADR-0038: Gateway daemon architecture
- OpenClaw src/tui/gateway-chat.ts: reference implementation (credit: openclaw/openclaw)
- pi SDK createAgentSession(): headless session API
- pi extensions tool_call event: input mutation for timeout injection