# pi-tools Enhancement Cycle — Commit Tool, Web Extractors, MCQ

## Context and Problem Statement
After evaluating oh-my-pi (can1357/oh-my-pi) as a potential replacement for stock pi, we decided to stay on stock pi and cherry-pick patterns into pi-tools (see ADR-0059 for LSP, ADR-0060 for swarm/DAG). Three additional capabilities are worth adopting:
- Commit tool — oh-my-pi has a 43-file agentic commit system with hunk-level staging, split commits, changelog generation, and conventional commit validation. pi-tools has nothing for commits.
- Web scraper site-specific handlers — oh-my-pi's `src/web/scrapers/` has specialized extractors for npm, PyPI, crates.io, arxiv, NVD, GitHub, and more. pi-tools `web-search` does generic extraction — developer sources deserve targeted parsing.
- MCQ improvements — oh-my-pi's ask tool supports multi-select, per-option descriptions, and multi-part question chaining. pi-tools MCQ is single-select with flat text options.
All three should be evaluated through the Inngest lens — joelclaw’s killer feature is durable event-driven execution. Where a capability benefits from durability, scheduling, or event triggers, it should have an Inngest-backed mode alongside the interactive pi extension.
## Decision

### 1. Commit Tool — Hybrid Interactive + Event-Driven

#### Interactive Mode: `/commit` slash command
A pi-tools slash command (or extension-registered command) that:
- Runs `git diff --cached` (or `git diff` if nothing staged) to collect changes
- Groups changes by concern — detects unrelated changes that should be separate commits
- Proposes conventional commit messages with:
  - Type/scope detection from file paths and diff content
  - Filler-word and meta-phrase rejection ("various improvements", "update files", "fix stuff")
  - Conventional commit format validation (`type(scope): description`)
- Supports hunk-level staging — when changes span multiple concerns, stage individual hunks
- Optionally proposes changelog entries for affected packages
- User confirms or edits, then commits (and optionally pushes)
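To illustrate the type/scope detection step, here is one way it might be sketched. The heuristics and the returned shape are assumptions for this sketch, not oh-my-pi's actual rules:

```typescript
// Hypothetical heuristic for type/scope detection from changed file paths.
// The mapping rules below are illustrative assumptions.
function detectTypeAndScope(paths: string[]): { type: string; scope?: string } {
  // All docs-only changes: type "docs"
  if (paths.every((p) => p.endsWith(".md") || p.startsWith("docs/"))) {
    return { type: "docs" };
  }
  // Any test file touched: type "test"
  if (paths.some((p) => /\.(test|spec)\.\w+$/.test(p))) {
    return { type: "test" };
  }
  // Derive scope from a shared top-level package directory, e.g. packages/system-bus/...
  const pkgs = new Set<string>();
  for (const p of paths) {
    const m = p.match(/^packages\/([^/]+)\//);
    if (m) pkgs.add(m[1]);
  }
  return { type: "feat", scope: pkgs.size === 1 ? [...pkgs][0] : undefined };
}
```

In practice the diff content would feed into this as well; paths alone are a first approximation.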
oh-my-pi reference: `packages/coding-agent/src/commit/` — 43 files covering agentic analysis, map-reduce for large diffs, conventional commit validation, changelog generation, and git operations.
What to port: The analysis patterns (scope detection, validation rules, filler-word detection, conventional commit format). NOT the full 43-file system — adapt the ideas to a focused pi extension.
What to skip for v1: Map-reduce for massive diffs (solve later if needed), Cursor provider integration, multi-credential model selection.
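The validation rules to port (conventional format plus filler-word rejection) could be a pure function. The regex and filler list below are illustrative assumptions, not copied from oh-my-pi:

```typescript
// Sketch only: the accepted types and filler phrases are assumptions.
const CONVENTIONAL =
  /^(feat|fix|docs|style|refactor|perf|test|build|ci|chore)(\([a-z0-9-]+\))?: .+/;
const FILLER_PHRASES = ["various improvements", "update files", "fix stuff", "misc changes"];

function validateCommitMessage(message: string): { ok: boolean; reason?: string } {
  const subject = message.split("\n")[0];
  if (!CONVENTIONAL.test(subject)) {
    return { ok: false, reason: "expected conventional format: type(scope): description" };
  }
  const lower = subject.toLowerCase();
  const filler = FILLER_PHRASES.find((phrase) => lower.includes(phrase));
  if (filler) {
    return { ok: false, reason: `filler phrase rejected: "${filler}"` };
  }
  return { ok: true };
}
```

Keeping this pure makes it trivially shareable between the interactive slash command and the event-driven Inngest function.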
#### Event-Driven Mode: `commit/requested` Inngest event
For automated workflows where commits happen without interactive input:
```typescript
// After loop story completes
await step.sendEvent("commit/requested", {
  data: {
    repo: "/path/to/repo",
    context: "Implemented story: Add user authentication",
    push: true,
    changelog: true,
  },
});

// Scheduled repo grooming
// cron: "0 2 * * *" — nightly at 2am
// Scan repos for uncommitted changes, auto-commit with conventional format
```

The `commit/requested` Inngest function:

- `step.run("analyze")` — run the same analysis logic as interactive mode
- `step.run("validate")` — ensure conventional format, no filler words
- `step.run("commit")` — `git commit` with generated message
- `step.run("push")` — optional `git push`
- `step.run("notify")` — gateway notification with commit summary
Shares the core analysis/validation logic with the interactive slash command. The trigger and confirmation flow differ — interactive asks the user, event-driven commits autonomously.
Use cases:

- Post-loop auto-commit: loop `complete.ts` emits `commit/requested` after story passes
- Scheduled grooming: nightly cron scans repos with dirty working trees
- Webhook-triggered: GitHub PR event → analyze changes → suggest commit message improvements
- Swarm agent output: swarm agents commit their work at the end of each task
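All of these use cases funnel through the same step sequence. A self-contained sketch of that orchestration, using a minimal stand-in for Inngest's `step.run` so it reads in isolation (the payload shape and placeholder logic are assumptions; the real implementation would live in `inngest.createFunction`):

```typescript
// Minimal stand-in for Inngest's step API, for illustration only.
type Step = { run<T>(name: string, fn: () => T | Promise<T>): Promise<T> };

interface CommitRequest {
  repo: string;
  context: string;
  push?: boolean;
}

async function handleCommitRequested(step: Step, data: CommitRequest): Promise<string> {
  // analyze: same analysis logic as the interactive /commit command (stubbed)
  const message = await step.run("analyze", () => `feat: ${data.context}`);
  // validate: conventional format, no filler words (simplified check)
  await step.run("validate", () => {
    if (!/^\w+(\([\w-]+\))?: .+/.test(message)) throw new Error("invalid commit format");
  });
  // commit: would run `git commit` with the generated message
  await step.run("commit", () => undefined);
  // push: optional `git push`
  if (data.push) await step.run("push", () => undefined);
  // notify: gateway notification with commit summary
  await step.run("notify", () => undefined);
  return message;
}
```

Each `step.run` boundary is what makes the flow durable: a crash between steps resumes at the failed step rather than re-analyzing the diff.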
### 2. Web Extractors — Shared Library for pi-tools + Inngest Pipelines

#### The Problem

pi-tools `web-search` fetches URLs and extracts content generically. But developer-facing sources have structured data that generic extraction misses:
- npm: package name, version, dependencies, weekly downloads, README
- PyPI: package metadata, classifiers, requirements
- GitHub: repo stats, file tree, README, recent commits
- crates.io: crate metadata, features, dependencies
- arxiv: title, authors, abstract, PDF link, categories
- NVD/CVE: vulnerability ID, severity, affected versions, references
- MDN: API signatures, browser compat tables, examples
- Stack Overflow: question, accepted answer, vote counts
#### The Solution

A shared extractor library (`packages/system-bus/src/extractors/` or `pi-tools/extractors/`) with site-specific handlers:
```typescript
interface SiteExtractor {
  /** URL patterns this extractor handles */
  patterns: RegExp[];
  /** Extract structured content from the page */
  extract(url: string, html: string): ExtractedContent;
}

interface ExtractedContent {
  title: string;
  type: string; // "npm-package" | "github-repo" | "arxiv-paper" | etc.
  structured: Record<string, unknown>; // site-specific structured data
  markdown: string; // clean markdown for LLM consumption
}
```

Used in two contexts:
- pi-tools `web-search` extension (synchronous, in-session) — when the agent searches and a result URL matches a known pattern, use the site-specific extractor for richer output
- Inngest pipelines (durable, event-driven) — enrichment steps in existing and future pipelines:
  - `discovery/noted` → fetch URL → site-specific extraction → vault storage with structured metadata
  - `research/requested` → fan-out search → per-result extraction → synthesis
  - `meeting/noted` → extract URLs mentioned in transcript → enrich each
  - Future swarm agents doing web research
oh-my-pi reference: `packages/coding-agent/src/web/scrapers/` — site-specific handlers for package registries, code hosts, research sources, forums, docs, security databases.
What to port: The URL pattern matching and extraction logic for the highest-value sources (npm, GitHub, PyPI, arxiv, MDN). NOT the full multi-provider search system (we have our own via pi-tools web-search + Brave API).
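The URL-pattern routing itself can be a first-match lookup over registered handlers. A sketch, with a simplified interface and a stub npm handler to show the shape (real handlers would parse the fetched HTML or registry JSON):

```typescript
// Simplified interface and stub handler; both are illustrative assumptions.
interface Extractor {
  patterns: RegExp[];
  extract(url: string, html: string): { title: string; type: string };
}

const npmExtractor: Extractor = {
  patterns: [/^https:\/\/www\.npmjs\.com\/package\/.+$/],
  extract(url, _html) {
    // A real handler would parse metadata; here we just derive the name
    const name = url.split("/package/")[1];
    return { title: name, type: "npm-package" };
  },
};

const extractors: Extractor[] = [npmExtractor];

// First extractor whose pattern matches wins; undefined means
// "fall back to generic (defuddle) extraction".
function findExtractor(url: string): Extractor | undefined {
  return extractors.find((e) => e.patterns.some((p) => p.test(url)));
}
```

Returning `undefined` for unknown URLs keeps the fallback path explicit rather than hiding it inside each handler.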
#### Inngest Integration
New utility functions usable from any Inngest step:
```typescript
// In any Inngest function
const content = await step.run("extract-url", async () => {
  const html = await fetch(url).then((r) => r.text());
  // Returns structured ExtractedContent if a site handler matches,
  // falls back to generic defuddle extraction
  return extractWithSiteHandler(url, html);
});
```

The extractors are a shared library, not an Inngest function themselves. They're called from within steps in existing pipelines (discovery, meeting analysis, future research pipelines).
### 3. MCQ Improvements — Pi Extension Only
No Inngest angle — MCQ is inherently interactive (user presses 1-4 in a terminal session).
Improvements from oh-my-pi’s ask tool:
#### Multi-Select
Current MCQ: single-select only (pick one option). Add: multi-select mode where multiple options can be chosen.
```typescript
// New parameter
multiSelect?: boolean; // default: false
// When true, user can toggle multiple options before confirming
// Return value changes from single string to string[]
```

#### Per-Option Descriptions
Current MCQ: flat text options. Add: optional description per option shown as dimmed secondary text.
```typescript
// Options can be string OR { label, description }
options: [
  { label: "PostgreSQL", description: "Best for relational data, ACID compliance" },
  { label: "Redis", description: "In-memory, fastest for key-value + pub/sub" },
  { label: "SQLite", description: "Zero-config, embedded, good for single-node" },
]
```

#### Adaptive Multi-Part Flow
Current MCQ: batch questions, answers come back all at once. Improvement: the MCQ tool description already says “prefer 1-2 questions at a time” — reinforce this in the extension by making follow-up calls feel seamless (no visible re-render jank between question batches).
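Supporting both option shapes is a small normalization concern. A sketch (the rendering format is an assumption; a real extension would dim the description with terminal styling rather than appending it inline):

```typescript
// Options may be plain strings or { label, description } objects.
type McqOption = string | { label: string; description?: string };

function normalizeOption(opt: McqOption): { label: string; description?: string } {
  return typeof opt === "string" ? { label: opt } : opt;
}

// Render one numbered option line for the terminal prompt.
function renderOption(opt: McqOption, index: number): string {
  const { label, description } = normalizeOption(opt);
  return `${index + 1}. ${label}${description ? ` — ${description}` : ""}`;
}
```

Normalizing at the boundary means the selection and rendering logic never branches on option shape.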
oh-my-pi reference: `packages/coding-agent/src/tools/ask.ts` — structured ask tool with typed options and multi-select.
## Consequences

### Positive
- Conventional commits become automatic — both interactive and event-driven
- Web search gets dramatically smarter for developer sources
- MCQ becomes more expressive for complex decision gathering
- Inngest integration means commit analysis and web extraction are reusable across all pipelines
- Post-loop auto-commits eliminate the manual “commit the loop’s work” step
- Site-specific extractors improve vault quality for discovery/noted pipeline
### Negative
- Commit tool adds complexity — must handle edge cases (merge commits, empty diffs, binary files)
- Site-specific extractors are maintenance burden — sites change their HTML structure
- MCQ multi-select changes the tool’s return type — models need to handle array responses
## Follow-up Tasks
Commit Tool:
- Create `pi-tools/commit/` extension with `/commit` command
- Implement diff analysis (scope detection, concern grouping)
- Implement conventional commit validation (format, filler-word rejection)
- Implement hunk-level staging for split commits
- Create `commit/requested` Inngest function in system-bus
- Add `commit/requested` event to Inngest client schema
- Wire post-loop auto-commit in `complete.ts`
- Optional: changelog generation for monorepo packages
Web Extractors:
- Create shared extractor library (location TBD: pi-tools or system-bus)
- Implement extractors for: npm, GitHub, PyPI, arxiv, MDN (v1 set)
- Integrate into pi-tools `web-search` extension (synchronous path)
- Integrate into `discovery/noted` Inngest pipeline (enrichment step)
- Add extractors for crates.io, NVD, Stack Overflow (v2)
MCQ:
- Add multi-select mode to MCQ extension
- Add per-option descriptions rendering
- Test multi-select with Claude’s tool use (verify model handles array return)
## Implementation Plan

### Affected Paths
Commit tool:
- `pi-tools/commit/index.ts` — new extension + `/commit` slash command
- `pi-tools/commit/analysis.ts` — diff analysis, scope detection, concern grouping
- `pi-tools/commit/validation.ts` — conventional commit validation, filler rejection
- `pi-tools/commit/git.ts` — git operations (diff, stage hunks, commit, push)
- `packages/system-bus/src/inngest/functions/commit-requested.ts` — new Inngest function
- `packages/system-bus/src/inngest/events.ts` — add `commit/requested` event
- `packages/system-bus/src/inngest/functions/complete.ts` — wire auto-commit post-loop
Web extractors:
- `packages/system-bus/src/extractors/` — new shared library
  - `index.ts` — router (URL pattern → extractor)
  - `npm.ts`, `github.ts`, `pypi.ts`, `arxiv.ts`, `mdn.ts` — site handlers
  - `types.ts` — ExtractedContent interface
- `pi-tools/web-search/web-search.ts` — integrate extractor calls for known URLs
- `packages/system-bus/src/inngest/functions/discovery-noted.ts` — add extraction step
MCQ:
- `pi-tools/mcq/index.ts` — add multi-select mode + per-option descriptions
### Patterns to Follow
- Commit validation rules from oh-my-pi `packages/coding-agent/src/commit/analysis/` — adapt patterns, don't copy code
- Web extractors pattern-match URLs then parse HTML — use defuddle as fallback for unmatched URLs
- MCQ multi-select follows oh-my-pi ask tool's `multiSelect` parameter pattern
- Inngest functions follow existing joelclaw conventions (step naming, Redis state, gateway notifications)
- Shared libraries between pi-tools and system-bus: put in system-bus if Inngest functions need them, import from pi-tools via shared package or duplicated module
### What to Avoid
- Don’t build the full 43-file commit system from oh-my-pi — extract the patterns, build lean
- Don’t scrape sites that have APIs — use npm registry API, GitHub API, PyPI JSON API where available
- Don’t change MCQ’s default behavior — multi-select is opt-in, single-select remains default
- Don’t make web extractors block the search response — extract in background, return generic result immediately if extraction is slow
## Verification
Commit tool:
- `/commit` with staged changes → proposes conventional commit message with correct type/scope
- `/commit` with mixed concerns → suggests split into multiple commits
- Filler message "update stuff" → rejected with explanation
- `commit/requested` Inngest event → auto-commits with conventional format, pushes, notifies gateway
- Post-loop story completion → auto-commit fires
Web extractors:
- Search for “express npm” → result includes package version, weekly downloads, deps
- Search for “CVE-2024-1234” → result includes severity, affected versions
- `discovery/noted` with npm URL → vault note has structured package metadata
- Unknown URL → falls back to generic defuddle extraction (no error)
MCQ:
- `multiSelect: true` → user can toggle multiple options, returns array
- Options with descriptions → descriptions render as dimmed secondary text
- Default (no multiSelect) → behavior unchanged from current
## More Information

### Reference Implementation
oh-my-pi (can1357/oh-my-pi):
- `packages/coding-agent/src/commit/` — 43-file agentic commit system
- `packages/coding-agent/src/commit/analysis/conventional.ts` — conventional commit validation
- `packages/coding-agent/src/commit/analysis/validation.ts` — filler-word detection
- `packages/coding-agent/src/web/scrapers/` — site-specific extractors
- `packages/coding-agent/src/tools/ask.ts` — structured ask tool with multi-select
Credit: Can Boluk (@can1357) for the patterns. MIT licensed.
### Inngest Integration Summary
| Capability | Interactive (pi extension) | Event-driven (Inngest) |
|---|---|---|
| Commit tool | `/commit` slash command | `commit/requested` function — post-loop, scheduled, webhook-triggered |
| Web extractors | Inline in `web-search` tool | Shared library called from discovery, meeting, research pipelines |
| MCQ | In-session tool only | N/A — inherently interactive |
### Related ADRs
- ADR-0059 — Multi-Language LSP Extension (same enhancement cycle, different capability)
- ADR-0060 — Inngest-Backed Swarm/DAG (same enhancement cycle, Inngest-first)
- ADR-0005 — Durable multi-agent coding loops (Inngest foundation)
- ADR-0053 — Event-emitter prompts and Agency triage (commit auto-trigger pattern)