CLI Design for AI Agents

Every agent harness can run a shell command and read stdout. Pi, Claude Code, Codex — doesn’t matter. That’s the universal interface. If your tool returns something an agent can parse, the agent can use it. If it returns a pretty table with ANSI colors, the agent is flying blind.

npx skills add joelhooks/joelclaw --skill cli-design

My system runs 35 Inngest functions, an always-on gateway, video transcription, email triage, meeting analysis. The agent operates all of it through one CLI: joelclaw. Not a REST API. Not an SDK. A CLI that returns JSON.

Design CLIs for agents first, and humans get a perfectly usable tool for free — pipe through jq. Design for humans first, and agents get nothing.

Principle 1: JSON always

No plain text. No tables. No color codes. No --json flag to opt into structured output. JSON is the default and only format.

joelclaw status
{
  "ok": true,
  "command": "joelclaw status",
  "result": {
    "server": { "ok": true, "url": "http://localhost:8288" },
    "worker": { "ok": true, "functions": 35 }
  },
  "next_actions": [
    { "command": "joelclaw functions", "description": "View registered functions" },
    { "command": "joelclaw runs --count 5", "description": "Recent runs" }
  ]
}

Every command. Every time. The agent never has to guess what format it’s getting.
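One way to make “JSON always” hard to violate is to route every command through a single serializer. Here is a TypeScript sketch that assumes commands return values instead of printing. The helper names `ok`, `fail`, and `render` are hypothetical, not joelclaw’s actual code.

```typescript
// Hypothetical helpers: every command returns an Envelope, and only render()
// turns it into text, so no code path can print a table or colored string.
type NextAction = { command: string; description: string };

type Envelope =
  | { ok: true; command: string; result: unknown; next_actions: NextAction[] }
  | {
      ok: false;
      command: string;
      error: { message: string; code: string };
      fix: string;
      next_actions: NextAction[];
    };

function ok(command: string, result: unknown, next_actions: NextAction[] = []): Envelope {
  return { ok: true, command, result, next_actions };
}

function fail(
  command: string,
  code: string,
  message: string,
  fix: string,
  next_actions: NextAction[] = [],
): Envelope {
  return { ok: false, command, error: { message, code }, fix, next_actions };
}

// Exactly one JSON document per invocation.
function render(envelope: Envelope): string {
  return JSON.stringify(envelope, null, 2);
}

const status = ok(
  "joelclaw status",
  { server: { ok: true, url: "http://localhost:8288" }, worker: { ok: true, functions: 35 } },
  [{ command: "joelclaw functions", description: "View registered functions" }],
);
console.log(render(status));
```

Because handlers return data rather than writing to stdout, the output format is decided in one place.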

Principle 2: HATEOAS — tell the agent what to do next

This is the one that changes everything.

Every response includes next_actions, the commands the agent can run next. These are not just literal examples to copy and paste: they can be templates with typed placeholders the agent fills in.

Standard POSIX/docopt syntax: <required> for positional args, [--flag <value>] for optional flags. When a params object is present, the command is a template. When it’s absent, the command is literal. The agent doesn’t need to know your CLI’s flag syntax — the template tells it everything.
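A consumer might read that convention like this. The sketch assumes flat, non-nested optional groups like the ones in this post. `<name>` outside brackets is required, and any `<name>` inside `[...]` is optional. The function names are hypothetical. A full docopt parser also handles alternatives, repetition, and nesting.

```typescript
// params present => template; params absent => literal command.
type NextAction = { command: string; description: string; params?: Record<string, unknown> };

function isTemplate(action: NextAction): boolean {
  return action.params !== undefined;
}

// Split a template into required and optional placeholder names.
// Assumption: optional groups are flat "[...]" with no nesting.
function placeholders(template: string): { required: string[]; optional: string[] } {
  const optional: string[] = [];
  // Remove each [...] group, collecting the <names> it contains as optional.
  const outside = template.replace(/\[([^\]]*)\]/g, (_group, inner: string) => {
    for (const m of inner.matchAll(/<([^>]+)>/g)) optional.push(m[1]);
    return "";
  });
  // Any <names> left outside brackets are required positionals.
  const required = [...outside.matchAll(/<([^>]+)>/g)].map((m) => m[1]);
  return { required, optional };
}
```

For example, `joelclaw send <event> [-d <json>] [--follow]` yields `event` as required and `json` as optional. The bare `[--follow]` group contributes no placeholder.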

{
  "ok": false,
  "command": "joelclaw send pipeline/video.download",
  "error": {
    "message": "Inngest server not responding",
    "code": "SERVER_UNREACHABLE"
  },
  "fix": "Start the Inngest server pod: kubectl rollout restart statefulset/inngest -n joelclaw",
  "next_actions": [
    { "command": "joelclaw status", "description": "Re-check after fix" },
    {
      "command": "kubectl get pods [--namespace <ns>]",
      "description": "Check pod status",
      "params": { "ns": { "default": "joelclaw" } }
    }
  ]
}

The next_actions are contextual — they change based on what just happened. A failed command suggests different templates than a successful one. An error includes a fix field in plain language. The agent has everything it needs to self-recover.
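Here is a minimal sketch of how a CLI could keep every failure path on that same envelope, assuming errors carry a machine-readable code. The `CliError` class, the `RECOVERY` table, and the generic fallback are hypothetical. The SERVER_UNREACHABLE entry mirrors the example above.

```typescript
// Hypothetical: every failure, expected or not, becomes the same error envelope.
type NextAction = {
  command: string;
  description: string;
  params?: Record<string, { default?: string | number }>;
};

class CliError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

// Known codes map to a plain-language fix plus contextual recovery actions.
const RECOVERY: Record<string, { fix: string; next_actions: NextAction[] }> = {
  SERVER_UNREACHABLE: {
    fix: "Start the Inngest server pod: kubectl rollout restart statefulset/inngest -n joelclaw",
    next_actions: [
      { command: "joelclaw status", description: "Re-check after fix" },
      {
        command: "kubectl get pods [--namespace <ns>]",
        description: "Check pod status",
        params: { ns: { default: "joelclaw" } },
      },
    ],
  },
};

function errorEnvelope(command: string, err: unknown) {
  const code = err instanceof CliError ? err.code : "UNKNOWN";
  const message = err instanceof Error ? err.message : String(err);
  // Unknown codes still get a fix and a way forward instead of a stack trace.
  const recovery = RECOVERY[code] ?? {
    fix: "Run joelclaw with no arguments to see available commands",
    next_actions: [{ command: "joelclaw", description: "List commands" }],
  };
  return { ok: false as const, command, error: { message, code }, ...recovery };
}
```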

The params object carries metadata the agent uses to fill templates intelligently:

  • value — pre-filled from the current response context (e.g., a run ID just returned)
  • default — what happens if the agent omits this flag entirely
  • enum — valid choices (the agent picks from a closed set instead of guessing)
  • description — what this parameter means
{
  "ok": true,
  "command": "joelclaw send video/download",
  "result": { "event_id": "01KHF98SKZ7RE6HC2BH8PW2HB2", "status": "accepted" },
  "next_actions": [
    {
      "command": "joelclaw run <run-id>",
      "description": "Inspect the triggered run",
      "params": {
        "run-id": { "value": "01KHF98SKZ7RE6HC2BH8PW2HB2", "description": "Run ID (ULID)" }
      }
    },
    {
      "command": "joelclaw runs [--status <status>] [--count <count>]",
      "description": "List recent runs",
      "params": {
        "status": { "enum": ["COMPLETED", "FAILED", "RUNNING", "QUEUED", "CANCELLED"] },
        "count": { "default": 10 }
      }
    }
  ]
}

The agent sees params.run-id.value → it knows the exact ID to use. It sees params.status.enum → it picks from the list instead of hallucinating a filter value. It sees params.count.default → it can omit the flag or adjust it.
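On the agent side, filling a template could look like the following sketch. It assumes these resolution rules: the agent’s explicit choice wins, then `value`, then `default`. An optional `[...]` group with nothing to fill is dropped, and a choice outside `enum` is rejected. `fillTemplate` is a hypothetical name, and the sketch does not shell-quote values.

```typescript
type Param = {
  value?: string | number;
  default?: string | number;
  enum?: string[];
  description?: string;
};
type NextAction = { command: string; description: string; params?: Record<string, Param> };

// Hypothetical agent-side helper: turn a next_action into a runnable command string.
// Resolution order per placeholder: explicit choice, then params.value, then params.default.
function fillTemplate(action: NextAction, choices: Record<string, string | number> = {}): string {
  const params = action.params ?? {};

  const resolve = (name: string): string | undefined => {
    const p = params[name];
    const v = choices[name] ?? p?.value ?? p?.default;
    if (v === undefined) return undefined;
    // Closed set: refuse anything the CLI did not advertise.
    if (p?.enum && !p.enum.includes(String(v))) {
      throw new Error(`${name} must be one of ${p.enum.join(", ")}, got ${v}`);
    }
    return String(v);
  };

  // Optional [...] groups: keep the group (without brackets) only if all its placeholders resolve.
  const afterOptional = action.command.replace(/\[([^\]]*)\]/g, (_group, inner: string) => {
    const names = [...inner.matchAll(/<([^>]+)>/g)].map((m) => m[1]);
    if (names.length === 0 || names.some((n) => resolve(n) === undefined)) return "";
    return inner.replace(/<([^>]+)>/g, (_ph, n: string) => resolve(n) as string);
  });

  // Remaining <placeholders> are required positionals.
  const filled = afterOptional.replace(/<([^>]+)>/g, (_ph, n: string) => {
    const v = resolve(n);
    if (v === undefined) throw new Error(`missing required parameter: ${n}`);
    return v;
  });

  // Collapse the gaps left by dropped groups.
  return filled.replace(/\s+/g, " ").trim();
}
```

Passing `default` explicitly, rather than dropping the flag, keeps the resulting command unambiguous. For the kubectl template earlier, it yields `--namespace joelclaw` instead of falling back to whatever namespace the current kube context uses. Bare boolean groups such as `[--follow]` are always dropped in this sketch.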

This is Roy Fielding’s HATEOAS constraint from REST, applied to CLIs. But where a typical REST API gives you links, this gives you forms: hypermedia controls with typed inputs. The application state is navigable and parameterizable from the response itself. No out-of-band knowledge required.

Principle 3: Self-documenting command tree

The root command (no arguments) returns the full command tree:

joelclaw
{
  "ok": true,
  "command": "joelclaw",
  "result": {
    "description": "joelclaw — Personal AI system CLI",
    "commands": [
      { "name": "send", "usage": "joelclaw send <event> [-d <json>] [--follow]" },
      { "name": "status", "usage": "joelclaw status" },
      { "name": "watch", "usage": "joelclaw watch [<loop-id>]" },
      { "name": "gateway stream", "usage": "joelclaw gateway stream" }
    ]
  },
  "next_actions": [...]
}

One call and the agent knows everything available. No --help parsing. No man pages. No guessing.
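This sketch shows a root command that cannot drift from the implementation, assuming a simple in-process registry. The `Command` type, the registry entries, and `dispatch` are hypothetical. joelclaw itself uses Effect CLI, so this illustrates the idea rather than its code.

```typescript
// Hypothetical registry: each command declares its usage once, and the root
// command builds the tree from the same list.
type Command = {
  name: string; // may contain spaces for subcommands, e.g. "gateway stream"
  usage: string;
  run: (args: string[]) => unknown;
};

const commands: Command[] = [
  { name: "status", usage: "joelclaw status", run: () => ({ server: { ok: true } }) },
  { name: "watch", usage: "joelclaw watch [<loop-id>]", run: (args) => ({ loop: args[0] ?? null }) },
  { name: "gateway stream", usage: "joelclaw gateway stream", run: () => ({ streaming: true }) },
];

function dispatch(argv: string[]) {
  if (argv.length === 0) {
    // No arguments: return the whole tree instead of an error or --help text.
    return {
      ok: true,
      command: "joelclaw",
      result: {
        description: "Personal AI system CLI",
        commands: commands.map(({ name, usage }) => ({ name, usage })),
      },
      // Only placeholder-free usages are offered, since they are literal commands.
      next_actions: commands
        .filter((c) => !/[<\[]/.test(c.usage))
        .map((c) => ({ command: c.usage, description: `Run ${c.name}` })),
    };
  }
  // Longest matching name wins, so a multi-word subcommand beats a shorter prefix.
  const match = commands
    .filter((c) => c.name.split(" ").every((part, i) => argv[i] === part))
    .sort((a, b) => b.name.length - a.name.length)[0];
  const command = `joelclaw ${argv.join(" ")}`;
  if (!match) {
    return {
      ok: false,
      command,
      error: { message: `Unknown command: ${argv[0]}`, code: "UNKNOWN_COMMAND" },
      fix: "Run joelclaw with no arguments to list commands",
      next_actions: [{ command: "joelclaw", description: "List commands" }],
    };
  }
  const args = argv.slice(match.name.split(" ").length);
  return { ok: true, command, result: match.run(args), next_actions: [] };
}
```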

Principle 4: Protect context

Agents have finite context windows. A CLI that dumps 10,000 log lines into stdout can burn a large share of the agent’s working memory in a single call, or overflow it outright.

Rules:

  • Truncate by default — show last 30 lines, not all of them
  • When truncated, point to the full output — include a file path
  • Auto-limit lists — cap at a reasonable default, offer --count to adjust
{
  "result": {
    "showing": 30,
    "total": 4582,
    "truncated": true,
    "full_output": "/tmp/joelclaw-logs-abc123.log",
    "lines": ["...last 30 lines..."]
  },
  "next_actions": [
    {
      "command": "joelclaw logs [--lines <count>]",
      "description": "Show more",
      "params": { "count": { "default": 30, "description": "Number of lines" } }
    }
  ]
}
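Here is a sketch of those rules applied to log output, assuming Node’s `fs`, `os`, and `path` modules are available (Bun provides them as well). `truncateTail` and the temp-file naming are hypothetical. The returned fields match the example above.

```typescript
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: keep only the last `limit` lines in the response and
// write the complete output to a temp file the agent can read on demand.
function truncateTail(lines: string[], limit = 30) {
  if (lines.length <= limit) {
    return { showing: lines.length, total: lines.length, truncated: false, lines };
  }
  const dir = mkdtempSync(join(tmpdir(), "joelclaw-logs-"));
  const full_output = join(dir, "full.log");
  writeFileSync(full_output, lines.join("\n") + "\n");
  return {
    showing: limit,
    total: lines.length,
    truncated: true,
    full_output,
    lines: lines.slice(-limit), // tail, not head: the newest lines matter most
  };
}
```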

The temporal gap

Those four principles cover spatial queries — what’s the state right now? But my system is temporal. Events fire. Pipelines run. Loops iterate through stories. The gateway routes messages. All of that happens over time.

With request-response only, the agent is stuck polling:

joelclaw send video/download -d '{"url":"..."}'   → event sent
joelclaw runs --count 3                            → still running
joelclaw runs --count 3                            → still running
joelclaw runs --count 3                            → still running
joelclaw run 01KHF98SKZ7RE6HC2BH8PW2HB2           → completed

Five tool calls to follow one pipeline. Each one burns context. Each one has up to 15 seconds of latency if you’re polling on an interval.

My watch command tried to solve this with a polling loop inside the CLI — but it had to break the “JSON always” principle to do it, outputting formatted text because the envelope format had no streaming semantics.

Principle 5: NDJSON for the temporal dimension

NDJSON (Newline-Delimited JSON): one JSON object per line. It’s the same streaming idea as docker events --format '{{json .}}', which prints one JSON object per event, and kubectl get pods -w -o json, which streams an object per change. Pipe-native. Grep-able. jq-friendly.

The protocol: each line has a type discriminator. The last line is always the standard HATEOAS envelope. Tools that don’t understand streaming just read the last line.

joelclaw send video/download --follow -d '{"url":"..."}'
{"type":"start","command":"joelclaw send video/download --follow","ts":"..."}
{"type":"step","name":"download","status":"started","ts":"..."}
{"type":"progress","name":"download","percent":45,"ts":"..."}
{"type":"step","name":"download","status":"completed","duration_ms":3200,"ts":"..."}
{"type":"step","name":"transcribe","status":"started","ts":"..."}
{"type":"step","name":"transcribe","status":"completed","duration_ms":45000,"ts":"..."}
{"type":"result","ok":true,"command":"...","result":{...},"next_actions":[...]}

One command. The agent sees every step as it happens. No polling. No wasted calls. And because the stream terminates with the standard envelope, the agent knows exactly what to do next.
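Here is a minimal writer for that protocol, sketched with an injected `write` function. In a real CLI you would pass `(t) => process.stdout.write(t)`. The `createStream` name and its close-after-terminal rule are assumptions. The point is that nothing can be written after the terminal envelope.

```typescript
// Hypothetical NDJSON writer: one JSON object per line, each tagged with `type`.
// After a terminal line (result or error) it refuses further writes.
type NextAction = { command: string; description: string };

function createStream(command: string, write: (text: string) => void) {
  let closed = false;
  const now = () => new Date().toISOString();
  const line = (obj: Record<string, unknown>) => {
    if (closed) throw new Error("stream already terminated");
    // JSON.stringify without indentation escapes newlines, so one object = one line.
    write(JSON.stringify(obj) + "\n");
  };

  line({ type: "start", command, ts: now() });

  return {
    // Non-terminal events.
    emit(type: "step" | "progress" | "log" | "event", fields: Record<string, unknown>) {
      line({ type, ...fields, ts: now() });
    },
    // Terminal success: the standard envelope, tagged "result".
    result(result: unknown, next_actions: NextAction[] = []) {
      line({ type: "result", ok: true, command, result, next_actions });
      closed = true;
    },
    // Terminal failure: the standard error envelope, tagged "error".
    error(code: string, message: string, fix: string, next_actions: NextAction[] = []) {
      line({ type: "error", ok: false, command, error: { message, code }, fix, next_actions });
      closed = true;
    },
  };
}
```

If the process can exit without calling `result` or `error`, a consumer sees no terminal line. Wrapping the command body in try/catch and calling `error` on failure closes that gap.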

The event types:

Type     | Meaning                                   | Terminal?
-------- | ----------------------------------------- | ---------
start    | Stream begun                              | No
step     | Pipeline step lifecycle                   | No
progress | Progress update                           | No
log      | Diagnostic message                        | No
event    | An event was emitted (fan-out visibility) | No
result   | HATEOAS success envelope                  | Yes
error    | HATEOAS error envelope                    | Yes
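Here is a sketch of the consuming side, assuming output arrives in arbitrary chunks as it does from a pipe. The reader buffers until a newline, parses each complete line, and remembers the terminal `result` or `error` event. `createNdjsonReader` is a hypothetical name.

```typescript
// Terminal types from the table: the stream's final, actionable envelope.
const TERMINAL = new Set(["result", "error"]);

type StreamEvent = { type: string; [key: string]: unknown };

// Hypothetical incremental reader: feed raw stdout chunks of any size.
function createNdjsonReader(onEvent: (e: StreamEvent) => void) {
  let buffer = "";
  let terminal: StreamEvent | undefined;
  return {
    push(chunk: string) {
      buffer += chunk;
      let newline: number;
      // A chunk may end mid-line; only parse up to each complete newline.
      while ((newline = buffer.indexOf("\n")) !== -1) {
        const text = buffer.slice(0, newline).trim();
        buffer = buffer.slice(newline + 1);
        if (text === "") continue;
        const event = JSON.parse(text) as StreamEvent;
        onEvent(event);
        if (TERMINAL.has(event.type)) terminal = event;
      }
    },
    // The envelope to act on, or undefined if the stream ended early.
    get terminal() {
      return terminal;
    },
  };
}
```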

What this unlocks

send --follow — send an event and watch the pipeline run. The agent can react mid-stream. If a step fails, it can cancel, retry, or escalate without waiting for the whole thing to finish.

watch as real-time push — subscribe to Redis pub/sub for loop state changes instead of polling every 15 seconds. Story completions arrive the instant they happen.

gateway stream — tap into the gateway event bridge from any terminal. See every event flowing through the system.

logs --follow — structured tail -f. Each line is typed JSON with a level field. The agent can filter for errors without regex.

Composable pipes:

# Only step completions
joelclaw watch | jq --unbuffered 'select(.type == "step" and .status == "completed")'
 
# Only errors and failed steps
joelclaw send pipeline/run --follow | jq --unbuffered 'select(.type == "error" or .status == "failed")'

The response envelope

For reference — the exact shape every command uses.

Success

type NextAction = {
  command: string;          // template (POSIX syntax) or literal command
  description: string;      // what it does
  params?: Record<string, { // presence = command is a template
    value?: string | number;   // pre-filled from context
    default?: string | number; // value if omitted
    enum?: string[];           // valid choices
    description?: string;      // what this param means
  }>;
};

type SuccessEnvelope = {
  ok: true;
  command: string;          // the command that was run
  result: object;           // command-specific payload
  next_actions: NextAction[];
};

Error

type ErrorEnvelope = {
  ok: false;
  command: string;
  error: {
    message: string;        // what went wrong
    code: string;           // machine-readable error code
  };
  fix: string;              // plain-language suggested fix
  next_actions: NextAction[]; // same schema as success
};

Stream event

type StreamEvent =
  | { type: "start"; command: string; ts: string }
  | { type: "step"; name: string; status: "started" | "completed" | "failed"; ... }
  | { type: "progress"; name: string; percent?: number; message?: string; ts: string }
  | { type: "log"; level: "info" | "warn" | "error"; message: string; ts: string }
  | { type: "event"; name: string; data: unknown; ts: string }
  | { type: "result"; ok: true; command: string; result: unknown; next_actions: NextAction[] }
  | { type: "error"; ok: false; command: string; error: {...}; fix: string; next_actions: NextAction[] }

Implementation notes

The joelclaw CLI uses Effect CLI (@effect/cli) with Bun. The streaming infrastructure subscribes to the same Redis pub/sub channels that the gateway extension uses — pushGatewayEvent() middleware in every Inngest function is the emission point, and the CLI is just another subscriber.

Inngest function step completes
  → pushGatewayEvent() writes to Redis pub/sub
    → gateway extension receives it (session injection)
    → CLI --follow receives it (NDJSON on stdout)

No new infrastructure. The event bridge was already there. Streaming just gave the CLI a way to tap into it.
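The fan-out shape can be illustrated without Redis. In the sketch below, Node’s in-process `EventEmitter` stands in for the pub/sub channel so it runs standalone. The channel name, `publishStepEvent`, and the event name are hypothetical, and none of this is joelclaw’s code.

```typescript
import { EventEmitter } from "node:events";

// Illustration only: an in-process EventEmitter stands in for Redis pub/sub.
const bus = new EventEmitter();
const CHANNEL = "gateway:events"; // hypothetical channel name

// Emission point, playing the role pushGatewayEvent() plays in each Inngest function.
function publishStepEvent(name: string, data: unknown): void {
  bus.emit(CHANNEL, JSON.stringify({ name, data }));
}

// Subscriber 1: the gateway extension (here it only records event names).
const gatewayReceived: string[] = [];
bus.on(CHANNEL, (message: string) => {
  gatewayReceived.push(JSON.parse(message).name);
});

// Subscriber 2: the CLI in --follow mode, re-emitting each message as one NDJSON line.
const cliLines: string[] = [];
bus.on(CHANNEL, (message: string) => {
  const { name, data } = JSON.parse(message);
  cliLines.push(JSON.stringify({ type: "event", name, data, ts: new Date().toISOString() }));
});

publishStepEvent("video/download.completed", { duration_ms: 3200 }); // hypothetical event
```

With a real Redis client, the publisher and each subscriber need separate connections. Under the classic RESP2 protocol, a connection in subscribe mode can only run pub/sub commands.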

The anti-patterns

Don’t                          | Do
------------------------------ | -------------------------------------------------------
Plain text output              | JSON envelope
--json flag                    | JSON is the only format
Dump unbounded output          | Truncate + file pointer
Static --help text             | Self-documenting root command
Error: something went wrong    | { ok: false, error: {...}, fix: "..." }
Hardcoded literal next_actions | Templates with params (<placeholder>, [--flag <value>])
Poll for temporal data         | Stream NDJSON
ANSI colors                    | JSON fields

Try it

The cli-design skill contains the full pattern reference — envelope shape, streaming protocol, naming conventions, implementation checklist. Install it and your agent has the complete playbook:

npx skills add joelhooks/joelclaw --skill cli-design --yes --global

The design decisions are recorded as architecture decision records, from ADR-0009 (CLI identity) through ADR-0058 (streaming protocol).