ADR-0180 · shipped

Configurable Sub-Agent Roster

Context

joelclaw delegates work to sub-agents (codex for coding, Inngest infer() for LLM calls in functions). But:

  1. Codex only supports OpenAI models — design tasks need Opus/Sonnet (taste, not speed)
  2. No roster — agent selection is ad-hoc, hardcoded per call site
  3. Role.md exists but isn’t actionable — we have role definitions but no mechanism to spawn a sub-agent with a specific role, model, and tool set

Pi’s native subagent tool (from nicobailon/pi-subagents) already provides:

  • Agent definitions (YAML frontmatter + markdown body) with discovery by scope (builtin/user/project)
  • Model + thinking level per agent, tool/extension sandboxing
  • Chain execution with {previous}, {task}, {chain_dir} template vars
  • Parallel dispatch, skill injection
  • 6 builtin agents: context-builder, planner, researcher, reviewer, scout, worker

Key insight: Pi-subagents handles agent definitions and spawning. joelclaw needs to make infer() resolve pi agent definitions and wrap execution in Inngest steps for durability. Best of both worlds, minimal new code.

Decision

Adopt pi-subagents as the definition and discovery layer, with Inngest as the execution backbone for durable agent dispatch from system-bus functions.

Gaps identified during review (addressed by this revision)

  • The original ADR did not define how chain/parallel semantics map to Inngest step return values ({previous} was underspecified).
  • Agent definition parsing/validation rules were implicit; invalid frontmatter, unknown models, and unsafe paths were not explicitly rejected.
  • infer() compatibility risk was unaddressed: agent currently maps to inference-router profiles (classifier, triage, reflector) in packages/system-bus/src/lib/inference.ts.
  • Extension sandboxing policy was not translated from pi-subagents to joelclaw’s production worker threat model.

Agent Definition Format

Markdown files in ~/.joelclaw/agents/{name}.md (user scope) and .joelclaw/agents/{name}.md (project scope):

---
name: designer
description: Frontend design with taste — UI components, layouts, visual polish
model: claude-opus-4-6
thinking: high
tools: read, bash, edit, write
skill: frontend-design, ui-animation, emilkowal-animations
---
 
You are a design-focused agent. You create distinctive, production-grade
frontend interfaces. Use the StatusPulseDot and StatusLed components from
@repo/ui/status-badge for activity indicators...

Roster Configuration

.joelclaw/config.toml:

[agents]
# Default agent for unspecified tasks
default = "coder"
 
# Route by task type
[agents.routing]
design = "designer"      # $frontend-design tagged tasks
code = "coder"            # Default coding tasks  
research = "researcher"   # Web research, repo autopsy
review = "reviewer"       # Code review, PR review
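
A minimal sketch of how a task type could resolve to an agent name from this config. The `RosterConfig` shape and the `resolveAgentName` helper are hypothetical; the ADR does not specify the loader API.

```typescript
// Hypothetical in-memory shape of the [agents] table in .joelclaw/config.toml.
interface RosterConfig {
  default: string;
  routing: Record<string, string>;
}

// Values copied from the config above.
const rosterConfig: RosterConfig = {
  default: "coder",
  routing: {
    design: "designer",
    code: "coder",
    research: "researcher",
    review: "reviewer",
  },
};

// Pick the routed agent for a task type, falling back to the default agent.
function resolveAgentName(config: RosterConfig, taskType?: string): string {
  const routed = taskType === undefined ? undefined : config.routing[taskType];
  return typeof routed === "string" ? routed : config.default;
}
```

Unknown or missing task types fall through to `default`, so adding a route is a config change rather than a code change.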

Execution Backbone: Inngest

All sub-agent execution runs through Inngest — not raw subprocess spawning. This gives durability, retries, observability, and session streaming for free.

// Single agent dispatch
const result = await step.run("designer", async () => {
  return infer(task, { agent: "designer" });
});
 
// Chain execution — each step memoized
const recon = await step.run("scout-recon", () => infer(task, { agent: "scout" }));
const plan = await step.run("planner", () => infer(recon, { agent: "planner" }));
const impl = await step.run("worker", () => infer(plan, { agent: "worker" }));
 
// Parallel execution
const [a, b] = await Promise.all([
  step.run("task-a", () => infer(taskA, { agent: "designer" })),
  step.run("task-b", () => infer(taskB, { agent: "coder" })),
]);

Why Inngest over raw subprocess:

  • Step memoization — crash mid-chain, resume from last completed step
  • Retries — model 503, timeout → automatic retry with backoff
  • Observability — every step in Inngest dashboard + OTEL
  • Concurrency control — throttle parallel agents, prevent resource contention
  • Timeouts: timeouts.finish kills runaway agent calls
  • Cancellation — cancel running chains by event
  • Session streaming — results stream back into gateway/interactive sessions via Redis event bridge

infer() extension:

// infer() resolves agent definition → pi flags
await infer("redesign this component", { agent: "designer" });
// Resolves to: pi -p --no-session --models claude-opus-4-6:high --tools read,bash,edit,write
// With agent's system prompt injected via --append-system-prompt
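
The flag derivation described above could look roughly like the sketch below. The `AgentDefinition` shape and `buildPiArgs` are hypothetical names; only the flags themselves (`-p`, `--no-session`, `--models`, `--tools`, `--append-system-prompt`) come from this ADR.

```typescript
// Hypothetical subset of a parsed agent definition.
interface AgentDefinition {
  name: string;
  model: string;
  thinking?: string;
  tools?: string[];
  systemPrompt: string;
}

// Derive pi CLI arguments from an agent definition, using only flags named in this ADR.
function buildPiArgs(def: AgentDefinition): string[] {
  const args = ["-p", "--no-session"];
  // Model and thinking level travel together as MODEL:THINKING.
  args.push("--models", def.thinking ? `${def.model}:${def.thinking}` : def.model);
  if (def.tools && def.tools.length > 0) {
    args.push("--tools", def.tools.join(","));
  }
  // The markdown body of the definition becomes the appended system prompt.
  if (def.systemPrompt.trim() !== "") {
    args.push("--append-system-prompt", def.systemPrompt);
  }
  return args;
}

const designerDefinition: AgentDefinition = {
  name: "designer",
  model: "claude-opus-4-6",
  thinking: "high",
  tools: ["read", "bash", "edit", "write"],
  systemPrompt: "You are a design-focused agent.",
};

const designerArgs = buildPiArgs(designerDefinition);
```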

Inngest execution model parity (mapped from pi-subagents internals)

pi-subagents currently implements:

  • ~/.repo-autopsy/nicobailon/pi-subagents/execution.ts: runSync() derives pi flags from agent definitions (--models, tool splitting, extension policy, skill injection, MCP_DIRECT_TOOLS).
  • ~/.repo-autopsy/nicobailon/pi-subagents/chain-execution.ts + settings.ts: sequential/parallel chains with {task}, {previous}, {chain_dir} substitution and per-step behavior overrides.
  • ~/.repo-autopsy/nicobailon/pi-subagents/async-execution.ts + subagent-runner.ts: detached background runner with status.json, events.jsonl, and filesystem polling.

In joelclaw, we replace subprocess orchestration with Inngest primitives:

  1. Single agent: one step.run("agent:{name}") wrapping infer(...).
  2. Chain: one Inngest function with deterministic step IDs (chain:{index}:{agent}) for memoized replay/resume.
  3. Parallel step: parallel step.run calls with explicit concurrency and optional failFast behavior.
  4. Async/background: no detached worker needed; step.sendEvent + Inngest run state replaces FS watcher polling.
  5. Progress streaming: reuse the existing gateway.progress()/gateway.notify() pattern inside step.run blocks (same replay-safe pattern used in packages/system-bus/src/inngest/functions/story-pipeline.ts).
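
To illustrate why deterministic step IDs matter for replay, here is a sketch that uses a small in-memory stand-in for step memoization. `StepLike`, `createMemoizedStep`, and `runChain` are hypothetical; the real behavior comes from Inngest's `step.run`, which this fake only approximates.

```typescript
// Hypothetical stand-in for Inngest step tooling: run(id, fn) returns the
// stored result when a step with the same id has already completed.
interface StepLike {
  run<T>(id: string, fn: () => Promise<T>): Promise<T>;
}

function createMemoizedStep(store: Map<string, unknown>): StepLike {
  return {
    async run<T>(id: string, fn: () => Promise<T>): Promise<T> {
      if (store.has(id)) return store.get(id) as T;
      const result = await fn();
      store.set(id, result);
      return result;
    },
  };
}

type InferFn = (prompt: string, agent: string) => Promise<string>;

// Run agents sequentially; each step ID follows chain:{index}:{agent}.
async function runChain(
  step: StepLike,
  agents: string[],
  task: string,
  inferFn: InferFn,
): Promise<string> {
  let previous = task;
  let index = 0;
  for (const agent of agents) {
    const input = previous; // capture the value for this step's closure
    previous = await step.run(`chain:${index}:${agent}`, () => inferFn(input, agent));
    index += 1;
  }
  return previous;
}
```

Because the IDs depend only on position and agent name, a second invocation against the same store skips every completed step instead of calling the model again.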

{previous} template semantics in Inngest chains

{previous} must map to structured step outputs, not only plain text. Define chain step returns as:

type AgentStepResult = {
  agent: string;
  text: string;
  model?: string;
  provider?: string;
  usage?: LlmUsage;
  artifacts?: Record<string, string>;
  exitCode: number;
};

Template context per step:

  • {task}: original top-level request
  • {previous}: previous step text (or aggregated parallel text)
  • {previous_json}: JSON-serialized previous AgentStepResult (or array for parallel)
  • {chain_dir}: durable artifact directory path

For parallel steps, aggregate outputs with stable headers (pi-subagents style: === Parallel Task N (agent) ===) so downstream prompts remain parseable and deterministic.
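
A sketch of the template context and the parallel aggregation described above. `StepResult` is a trimmed version of `AgentStepResult`; `aggregateParallel` and `renderTemplate` are hypothetical helper names, and the 1-based task numbering in the header is an assumption.

```typescript
// Trimmed version of the AgentStepResult type defined above.
interface StepResult {
  agent: string;
  text: string;
  exitCode: number;
}

// Join parallel outputs under stable, numbered headers so ordering is deterministic.
function aggregateParallel(results: StepResult[]): string {
  return results
    .map((r, i) => `=== Parallel Task ${i + 1} (${r.agent}) ===\n${r.text}`)
    .join("\n\n");
}

interface TemplateContext {
  task: string;
  previous: StepResult | StepResult[];
  chainDir: string;
}

// Substitute {task}, {previous}, {previous_json}, and {chain_dir} in a step prompt.
function renderTemplate(template: string, ctx: TemplateContext): string {
  const previousText = Array.isArray(ctx.previous)
    ? aggregateParallel(ctx.previous)
    : ctx.previous.text;
  const values: Record<string, string> = {
    task: ctx.task,
    previous: previousText,
    previous_json: JSON.stringify(ctx.previous),
    chain_dir: ctx.chainDir,
  };
  // Single pass, so substituted text is never re-scanned for placeholders.
  return template.replace(
    /\{(task|previous|previous_json|chain_dir)\}/g,
    (_match: string, key: string) => values[key] ?? "",
  );
}
```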

Integration Points

  1. Gateway interactive: $frontend-design tag in user message dispatches agent/task.run Inngest event → result streams back via Redis
  2. Inngest functions: infer() gains agent option that resolves from roster
  3. CLI: joelclaw agent list, joelclaw agent run <name> <prompt> (fires Inngest event, streams result)
  4. Codex delegation — unchanged for OpenAI tasks, designer agent for Anthropic tasks
  5. Session feedback — Inngest step results emit agent/task.complete events, gateway picks up via Redis subscription and streams into active session

Existing joelclaw plumbing to reuse:

  • packages/system-bus/src/inngest/middleware/gateway.ts already carries originSession and exposes gateway.progress()/notify()/alert() helpers.
  • packages/system-bus/src/inngest/functions/agent-loop/utils.ts#pushGatewayEvent() already fans events to gateway + originSession targets.
  • packages/gateway/src/channels/redis.ts already prefers originSession routing and packages/gateway/src/daemon.ts already routes responses by active source.

Event contracts to standardize:

  • agent/task.run: { taskId, agent, task, originSession?, cwd?, timeoutMs?, metadata? }
  • agent/task.progress: { taskId, step, message, originSession? } (optional for long chains)
  • agent/task.complete: { taskId, agent, status, output, usage?, artifacts?, originSession? }
  • agent/task.failed: { taskId, agent, error, retryable, attempt, originSession? }
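
The contracts above could be written as a discriminated union along these lines. Field names come from the list; the field types are assumptions wherever the ADR gives only a name, and `describeEvent` is a hypothetical helper that shows the narrowing.

```typescript
// Payload shapes transcribed from the event contracts above.
interface AgentTaskRun {
  taskId: string;
  agent: string;
  task: string;
  originSession?: string;
  cwd?: string;
  timeoutMs?: number;
  metadata?: Record<string, unknown>;
}

interface AgentTaskProgress {
  taskId: string;
  step: string;
  message: string;
  originSession?: string;
}

interface AgentTaskComplete {
  taskId: string;
  agent: string;
  status: string;
  output: string;
  usage?: unknown;
  artifacts?: Record<string, string>;
  originSession?: string;
}

interface AgentTaskFailed {
  taskId: string;
  agent: string;
  error: string;
  retryable: boolean;
  attempt: number;
  originSession?: string;
}

type AgentTaskEvent =
  | { name: "agent/task.run"; data: AgentTaskRun }
  | { name: "agent/task.progress"; data: AgentTaskProgress }
  | { name: "agent/task.complete"; data: AgentTaskComplete }
  | { name: "agent/task.failed"; data: AgentTaskFailed };

// Narrow on the event name to reach the matching payload.
function describeEvent(event: AgentTaskEvent): string {
  switch (event.name) {
    case "agent/task.run":
      return `run ${event.data.taskId} on ${event.data.agent}`;
    case "agent/task.progress":
      return `progress ${event.data.taskId}: ${event.data.message}`;
    case "agent/task.complete":
      return `complete ${event.data.taskId} (${event.data.status})`;
    case "agent/task.failed":
      return `failed ${event.data.taskId}, attempt ${event.data.attempt}`;
  }
}
```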

Discovery Priority

  1. Project: .joelclaw/agents/ (highest)
  2. User: ~/.joelclaw/agents/ (medium)
  3. Builtin: joelclaw/agents/ in repo (lowest, git-tracked)
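
One way to apply this precedence is to merge discovered agents from lowest to highest priority so that later scopes overwrite earlier ones. `DiscoveredAgent` and `mergeRoster` are hypothetical names used only for this sketch.

```typescript
type Scope = "builtin" | "user" | "project";

interface DiscoveredAgent {
  name: string;
  scope: Scope;
  filePath: string;
}

// Lowest priority first, so later scopes overwrite earlier ones.
const SCOPE_ORDER: Scope[] = ["builtin", "user", "project"];

function mergeRoster(found: DiscoveredAgent[]): Map<string, DiscoveredAgent> {
  const roster = new Map<string, DiscoveredAgent>();
  for (const scope of SCOPE_ORDER) {
    for (const agent of found) {
      if (agent.scope === scope) roster.set(agent.name, agent);
    }
  }
  return roster;
}
```

A project-scoped `coder.md` therefore shadows a builtin `coder.md` regardless of the order in which files were discovered.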

Patterns Adopted from pi-subagents

Agent definition format — YAML frontmatter + markdown body. Fields: name, description, model, thinking (off/minimal/low/medium/high/xhigh), tools, skill, extensions, output, defaultReads, defaultProgress, interactive.

Extension sandboxing: the extensions: field controls which pi extensions load in the sub-agent:

  • Absent → all extensions load (default)
  • Empty → --no-extensions
  • List → --no-extensions --extension a --extension b
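
The three cases above map to flags as in this sketch. `extensionFlags` is a hypothetical helper; `undefined` stands for an absent field.

```typescript
// Map the extensions field to pi flags, following the three cases listed above.
function extensionFlags(extensions: string[] | undefined): string[] {
  // Absent field: pass nothing, so all extensions load.
  if (extensions === undefined) return [];
  // Empty list: only --no-extensions. Explicit list: re-enable each entry.
  const flags = ["--no-extensions"];
  for (const ext of extensions) {
    flags.push("--extension", ext);
  }
  return flags;
}
```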

Three execution modes — Single (one agent, one task), Chain (sequential with {previous} template var + shared {chain_dir}), Parallel (concurrent with max concurrency).

Spawn mechanism: pi -p --mode json --no-session, with --models, --tools, --extension, and --append-system-prompt flags derived from the agent definition. Captures stdout as JSONL and tracks usage/tokens/duration.

Async mode — Background execution via worker process. FSWatcher on results directory detects completion. Widget polls progress.

Skill injection: the skill: field resolves skill files and injects their content into the system prompt before spawn.

Key Differences from pi-subagents

  • Chain execution via Inngest steps — durable, retryable, observable vs raw subprocess chains
  • Inngest-native — long-running agent tasks dispatched as Inngest steps with memoization
  • Role.md integration — agent definitions can reference roles/*.md for shared context
  • No TUI clarify step — joelclaw agents are headless; confirmation happens via gateway/CLI
  • Discovery adds repo scope — builtin agents live in joelclaw/agents/ (git-tracked, lowest priority)

Agent Definition Validation (required)

Use strict runtime validation at load-time before any dispatch:

  1. Frontmatter parsing: use a real YAML parser (not regex-only key/value parsing) so arrays/booleans are typed reliably.
    • Do not rely on permissive tool schemas alone (Type.Any usage in pi-subagents/schemas.ts); enforce strict server-side validation in joelclaw.
  2. Identity: name + description required; name must match file basename (designer.md → name: designer).
  3. Model: must resolve in @joelclaw/inference-router catalog (support bare IDs and provider/model IDs).
  4. Thinking level: enum off|minimal|low|medium|high|xhigh.
  5. Tools: each entry must be either an allowed builtin tool or an approved extension path.
  6. Skills: each skill in skill/skills must resolve from canonical skill loading paths; missing skills are validation errors (not warnings) in strict mode.
  7. Path safety: output, defaultReads, and chain file paths must reject traversal (..) outside configured workspace unless explicitly absolute-allowlisted.
  8. Extensions policy: absent vs empty vs explicit list semantics must be preserved:
    • absent: inherit platform default policy
    • empty list: disable extensions (--no-extensions)
    • explicit list: allowlist only
  9. Role composition: optional role: roles/<name>.md must resolve to existing repo role file before run.
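
A partial sketch of load-time validation covering rules 2, 4, and 7 only. Model, tool, and skill checks need catalogs that are outside this example. `RawDefinition`, `isUnsafePath`, and `validateDefinition` are hypothetical names, and the input is assumed to come from a real YAML parser as rule 1 requires.

```typescript
const THINKING_LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh"];

// Hypothetical shape of frontmatter after YAML parsing, before validation.
interface RawDefinition {
  name?: unknown;
  description?: unknown;
  thinking?: unknown;
  output?: unknown;
}

// Reject absolute paths and relative paths that climb out of the workspace via "..".
function isUnsafePath(path: string): boolean {
  if (path.startsWith("/")) return true;
  let depth = 0;
  for (const part of path.split("/")) {
    if (part === "" || part === ".") continue;
    depth += part === ".." ? -1 : 1;
    if (depth < 0) return true;
  }
  return false;
}

// Return a list of validation errors; an empty list means the definition passed.
function validateDefinition(fileName: string, def: RawDefinition): string[] {
  const errors: string[] = [];
  if (typeof def.name !== "string" || def.name === "") {
    errors.push("name is required");
  } else if (`${def.name}.md` !== fileName) {
    errors.push(`name "${def.name}" does not match file ${fileName}`);
  }
  if (typeof def.description !== "string" || def.description === "") {
    errors.push("description is required");
  }
  if (
    def.thinking !== undefined &&
    (typeof def.thinking !== "string" || !THINKING_LEVELS.includes(def.thinking))
  ) {
    errors.push(`thinking must be one of ${THINKING_LEVELS.join("|")}`);
  }
  if (typeof def.output === "string" && isUnsafePath(def.output)) {
    errors.push("output path escapes the workspace");
  }
  return errors;
}
```

Returning every error at once, rather than throwing on the first, lets a CLI report all problems in a definition in a single pass.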

infer() compatibility contract (critical)

packages/system-bus/src/lib/inference.ts currently maps agent to inference-router profiles via resolveProfile() (packages/inference-router/src/profiles.ts) and existing production callers rely on this (reflect.ts, task-triage.ts, email-cleanup.ts).

Adoption rules:

  1. Resolution order: roster agent definition → legacy inference profile → explicit options.
  2. Preserve legacy behavior for classifier, reflector, triage until migrated.
  3. Add an explicit profile option long-term; keep agent backward compatible during migration window.
  4. runPiAttempt() currently hardcodes --no-extensions and --model; roster mode must support full flag derivation (--models, tools, extensions, appended prompt) while keeping locked-down defaults for non-roster calls.
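
The resolution order in rule 1 can be sketched as a lookup that prefers the roster and falls back to legacy profiles. `resolveAgentSource` and its error message are hypothetical; the profile names are the ones this ADR lists. Throwing for unknown names follows the shipped order noted in Phase 0+1 (roster → profile → throw).

```typescript
type AgentSource = "roster" | "profile";

interface Resolution {
  source: AgentSource;
  name: string;
}

// Roster definitions win; legacy inference profiles keep working; anything else throws.
function resolveAgentSource(
  name: string,
  rosterNames: Set<string>,
  legacyProfiles: Set<string>,
): Resolution {
  if (rosterNames.has(name)) return { source: "roster", name };
  if (legacyProfiles.has(name)) return { source: "profile", name };
  throw new Error(`Unknown agent or profile: ${name}`);
}

const exampleRoster = new Set(["designer", "coder", "ops"]);
const exampleProfiles = new Set(["classifier", "triage", "reflector"]);
```

Existing callers that pass `agent: "triage"` keep resolving to a profile as long as no roster file named `triage.md` is added, which is the naming-collision risk listed under Risks.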

Extension sandboxing in joelclaw

pi-subagents allows extension paths from agent definitions. In the joelclaw worker context this is a security boundary, so defaults must remain deny-by-default:

  • system-bus runs load no extensions by default; an extension loads only when explicitly allowlisted.
  • Add agents.extension_allowlist in .joelclaw/config.toml; reject non-allowlisted extension paths at validation time.
  • Record effective extension set in OTEL metadata for every run.
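
A sketch of the deny-by-default check. `effectiveExtensions` is a hypothetical helper, and `allowlist` stands for the proposed `agents.extension_allowlist` config value.

```typescript
// Resolve the effective extension set under a deny-by-default policy.
// Returns the extensions to load, or throws when a requested path is not allowlisted.
function effectiveExtensions(requested: string[] | undefined, allowlist: string[]): string[] {
  // Absent field: platform default, which for system-bus means no extensions.
  if (requested === undefined) return [];
  const rejected = requested.filter((ext) => !allowlist.includes(ext));
  if (rejected.length > 0) {
    throw new Error(`Extensions not in agents.extension_allowlist: ${rejected.join(", ")}`);
  }
  return requested;
}
```

The returned list is the value that would be recorded as the effective extension set in OTEL metadata.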

Consequences

Positive

  • Design tasks route to Opus automatically
  • Agent selection is explicit and configurable
  • New agent types added without code changes
  • Model/tool/skill combos are named and reusable

Negative

  • Another config surface to maintain
  • Agent definitions can drift from actual capabilities
  • pi-subagents is a third-party dependency (or we steal patterns)

Risks

  • Over-engineering if we only need 2-3 agents
  • Chain execution complexity if adopted later
  • Agent/profile naming collision (agent currently means inference profile in infer())
  • Path/extension injection risk from unvalidated agent markdown
  • Parallel {previous} aggregation ambiguity can create non-deterministic downstream prompts
  • Gateway replay duplicates if progress/notify emits happen outside step.run

Resolved Questions

  1. Chain artifact directories → Durable per-session workspace with retention policy. Must survive crashes for Inngest replay.
  2. Chain topology → Full DAG support from day one. Not just linear + parallel groups.
  3. JSON output contracts → Per-agent configurable. Some agents (coder, reviewer) require strict output schemas; others (designer, researcher) are freeform text. Add outputSchema field to agent definition — when present, output is validated; when absent, plain text passthrough.
  4. Agent definition storage → Filesystem as source of truth (git-tracked), mirrored to Typesense for search + version pinning. Enables hot reload without git pull and searchable agent catalog.

Phase Plan

Phase 0+1: Loader + infer() Integration ✅ SHIPPED (2026-02-28)

  • packages/system-bus/src/lib/agent-roster.ts: loads pi-subagent-format .md files from project (.pi/agents/) and user (~/.pi/agent/agents/) scopes with module-level cache
  • infer() resolution order: roster → profile → throw (backward compatible with classifier/triage/reflector)
  • Roster agents derive full pi flags: --models MODEL:THINKING, --tools, --append-system-prompt, conditional --no-extensions
  • OTEL metadata includes agentSource (roster/profile/direct), agentName, agentDefinitionPath
  • 3 project-scoped agents committed: agents/{designer,coder,ops}.md (symlinked to .pi/agents/)
  • 6/6 unit tests: project load, user load, project-overrides-user, cache hit, missing agent, malformed frontmatter
  • Commit: a709622
  • Deferred: strict schema validation (model catalog check, skill resolution, path safety), role composition, extension allowlist. These become Phase 2 prerequisites.

Phase 2: Inngest Functions + CLI + Gateway Routing ✅ SHIPPED (2026-02-28)

  • agent/task.run, agent/task.complete, agent/task.progress event types added to Inngest client
  • agent-task-run Inngest function: validate → execute via infer() → emit complete/failed
  • Concurrency: 3 per agent type, 2 retries, 5m timeout
  • Gateway progress notification before execution, OTEL on start/complete/fail
  • originSession carried through all events (gateway middleware passthrough)
  • joelclaw agent list — discover agents from all scopes
  • joelclaw agent show <name> — display full definition + system prompt
  • joelclaw agent run <name> <task> — fire agent/task.run event, return taskId
  • HATEOAS JSON responses with next_actions throughout
  • Commits: f922842 (CLI), 5348b55 (Inngest function + events)
  • Deferred: Gateway $frontend-design tag routing (gateway pi session can already dispatch via joelclaw agent run or inngest_send)
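
For orientation, the limits listed above could be expressed as function options roughly like this. The object is a hypothetical sketch, not the shipped code: the field names are assumed to follow the Inngest TypeScript SDK's function configuration and should be checked against the SDK version in use, and the concurrency key expression is an assumption.

```typescript
// Hypothetical options mirroring "3 per agent type, 2 retries, 5m timeout".
const agentTaskRunOptions = {
  id: "agent-task-run",
  // Assumed key expression: limits concurrency separately for each agent name.
  concurrency: { limit: 3, key: "event.data.agent" },
  retries: 2,
  // timeouts.finish bounds how long a started run may execute.
  timeouts: { finish: "5m" },
};

// Trigger for the function: the agent/task.run event added in this phase.
const agentTaskRunTrigger = { event: "agent/task.run" };
```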

Phase 3: Chain Execution ✅ SHIPPED (2026-02-28)

  • agent/chain.run, agent/chain.complete event types added to Inngest client
  • agent-chain-run Inngest function: sequential steps with {task}/{previous} template substitution
  • Parallel groups via Promise.allSettled with === Parallel Task N (agent) === aggregation headers
  • failFast option (default false — continue on step failure, collect partial results)
  • Concurrency: 2 chains, 1 retry, 15m timeout
  • OTEL per step + chain completion/failure; gateway progress per step (replay-safe)
  • CLI: joelclaw agent chain scout,planner+reviewer,coder --task "..." (+ = parallel, , = sequential)
  • 5 unit tests: template substitution, parallel aggregation, sequential passing, error handling
  • Commit: ab1b885
  • Deferred: output artifact validation (warning-first, fail-on-strict mode), DAG topology beyond linear+parallel
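
The chain spec syntax used by the CLI (`,` for sequential, `+` for parallel) can be parsed into stages as in this sketch. `parseChainSpec` is a hypothetical name and not necessarily how the shipped CLI implements it.

```typescript
// Parse "scout,planner+reviewer,coder" into stages.
// Each stage is a list of agents; more than one agent means the stage runs in parallel.
function parseChainSpec(spec: string): string[][] {
  const stages = spec.split(",").map((stage) =>
    stage
      .split("+")
      .map((agent) => agent.trim())
      .filter((agent) => agent !== ""),
  );
  // An empty stage (for example from "a,,b" or an empty spec) is rejected.
  if (stages.some((stage) => stage.length === 0)) {
    throw new Error(`Invalid chain spec: "${spec}"`);
  }
  return stages;
}
```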

Runtime proof + recovery timeline (2026-02-28)

Attempt 1 — blocked by local runtime reachability

  • joelclaw agent list and joelclaw agent show coder succeeded.
  • joelclaw agent run ... failed while local Inngest API was unreachable (localhost:8288).
  • No reliable event→run trace could be captured in that attempt.

Attempt 2 — ingress restored, roster drift surfaced

  • Event send path recovered.
  • agent/task.run reached Agent Task Run, but failed with Unknown agent roster entry: coder.
  • This proved ingress was healthy while worker runtime resolution was stale.

Remediation applied

  • Patched roster resolution to search ancestor directories for builtin agents/ when worker CWD is nested:
    • commit a3e013a
    • file: packages/system-bus/src/lib/agent-roster.ts
    • tests: packages/system-bus/src/lib/__tests__/agent-roster.test.ts
  • Published system-bus-worker image with this fix and rolled k8s deployment.
  • Restarted host worker process (the active executor for Agent Task Run in this environment).
  • Recovered local control plane after transient outage (Colima/Talos restart + taint cleanup + pod recycle).

Final runtime proof — PASS

  • bun run packages/cli/src/cli.ts agent run coder "reply with OK" --timeout 20
    • event ID: 01KJK9JJ1C5P54ZH4F200XYWBD
  • bun run packages/cli/src/cli.ts event 01KJK9JJ1C5P54ZH4F200XYWBD
    • run ID: 01KJK9JJEX3A6NW55WQSZXKWNY
    • function: Agent Task Run
    • status: COMPLETED
    • output includes {"status":"completed", ... "text":"OK"}

Conclusion: ADR-0180 runtime contract is now validated end-to-end (list/show/run/chain/watch paths + truthful event navigation + durable execution).

Validation smoke test — ts=1772321467 ✅ DEEP PROOF

Second full end-to-end proof with production binary against live k8s worker, capturing full OTEL metadata:

Roster

  • joelclaw agent list → ok: true, total: 3 (coder/designer/ops, all source: builtin)
  • joelclaw agent show coder → filePath, systemPrompt, model, tools, skills all present

Dispatch

  • joelclaw agent run coder "ADR-0180 smoke test ts=1772321467 — echo the string 'SMOKE_OK' and exit"
  • Event 01KJK9MT0X00WXREWX3KZW6F2X accepted · taskId at-1772321662985-gv2oj9

Run

  • Run 01KJK9MT3H0N5AGJH8F1PYJ6Z2 · COMPLETED · 3,759 ms
  • Output: { status: "completed", text: "SMOKE_OK", model: "anthropic/claude-sonnet-4-6", provider: "anthropic" }

Step trace (7 steps, all COMPLETED): emit-started-otel → validate → agent-task-progress-execute → execute (2,404 ms) → agent-task-complete → emit-completed-otel → Finalization

OTEL (5 events)

  • agent.task.started — taskId, agent, originSession, cwd, timeoutMs
  • model_router.request: agentSource: "roster", agentName: "coder", agentDefinitionPath, resolvedModel
  • model_router.route — policy version, resolved model
  • model_router.result — 2,140 ms, fallbackUsed, usage
  • agent.task.completed — model, provider, durationMs

agentSource: "roster" in OTEL confirms builtin scope resolution is healthy end-to-end. Historical OTEL also shows the pre-deploy failure arc: 5 agent.task.failed events with "Unknown agent roster entry: coder" (22:50–23:22 UTC), followed by clean completions post-deploy — observable failure→fix→recovery captured in Typesense.

Phase 4: Live streaming + async UX ✅ SHIPPED (2026-02-28)

  • joelclaw agent watch <taskId|chainId> — NDJSON streaming watcher
  • Redis pub/sub subscription to joelclaw:notify:gateway for real-time progress events
  • Inngest API polling fallback when Redis is degraded or task completed before watch started
  • Auto-detects task (at-*) vs chain (ac-*) IDs, adjusts timeout (300s vs 900s)
  • Graceful degradation documented in-code: Redis down → polling only, pre-completed → immediate result
  • --timeout option, SIGINT/SIGTERM cleanup, HATEOAS next_actions in terminal events
  • Commit: 9ab8c6d
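
The ID auto-detection described above can be sketched as a prefix check with per-kind default timeouts. `detectWatchTarget` is a hypothetical name; the prefixes and the 300s/900s defaults come from the list above.

```typescript
type WatchKind = "task" | "chain";

interface WatchTarget {
  kind: WatchKind;
  timeoutSeconds: number;
}

// Classify an ID by prefix and choose its default timeout unless one is supplied.
function detectWatchTarget(id: string, timeoutOverride?: number): WatchTarget {
  let kind: WatchKind;
  if (id.startsWith("at-")) {
    kind = "task";
  } else if (id.startsWith("ac-")) {
    kind = "chain";
  } else {
    throw new Error(`Unrecognized ID prefix: ${id}`);
  }
  const defaultTimeout = kind === "task" ? 300 : 900;
  return { kind, timeoutSeconds: timeoutOverride ?? defaultTimeout };
}
```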

References

  • nicobailon/pi-subagents — pi extension for subagent delegation
    • execution.ts, chain-execution.ts, async-execution.ts, agents.ts, skills.ts, types.ts, schemas.ts, agents/*.md
  • packages/system-bus/src/lib/inference.ts — current infer implementation and agent profile resolution path
  • packages/inference-router/src/profiles.ts — legacy classifier/triage/reflector profiles
  • packages/system-bus/src/inngest/functions/story-pipeline.ts — replay-safe gateway signaling + contract-first stage execution
  • packages/system-bus/src/inngest/middleware/gateway.ts: originSession routing helpers
  • packages/system-bus/src/inngest/functions/agent-loop/utils.ts (pushGatewayEvent)
  • packages/gateway/src/channels/redis.ts + packages/gateway/src/daemon.ts — source-aware response routing and Redis event bridge
  • ADR-0170: Agent Role System
  • ADR-0163: Adaptive Prompt Architecture