pi-mono Artifacts Corpus via Restate + Typesense
Status
Accepted
Context and Problem Statement
We keep relearning the same upstream lessons about badlogic/pi-mono: maintainer voice, review heuristics, the issue-first contributor gate, common rejection patterns, release velocity, and package-boundary expectations.
Right now that knowledge is fragmented across:
- local repo clones
- GitHub issue threads
- PR review comments
- commit history
- one-off agent research notes
- human memory
That is brittle. It means every future upstream contribution risks redoing the same archaeology and sending low-signal issue/PR text back at Mario.
The useful corpus is not just prose docs. It is the full repo surface:
- root docs and package READMEs
- issue templates and GitHub workflow gates
- issues and issue comments
- pull requests and review comments
- commits and releases
- materialized maintainer guidance distilled from those artifacts
This is an appropriate Restate workload because the sync problem is:
- paginated
- idempotent
- resumable
- rate-limit sensitive
- artifact-shaped (issues, comments, PRs, commits, releases)
- worth re-running incrementally over time
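A minimal sketch of how those properties can fit together, assuming a page-numbered source and a keyed store. Every name here (`Artifact`, `Page`, `Checkpoint`, `applyPage`, `runSync`) is hypothetical and not taken from the actual sync module; the point is only that keyed writes make replays harmless and a persisted checkpoint makes runs resumable.

```typescript
// Hypothetical shapes for illustration; none of these names come from the real module.
interface Artifact { id: string; updatedAt: string } // updatedAt: ISO-8601 UTC string
interface Page { items: Artifact[]; hasNext: boolean }
interface Checkpoint { nextPage: number; lastUpdatedAt: string | null; done: boolean }

// Pure step: write one page into the store and return the advanced checkpoint.
function applyPage(store: Map<string, Artifact>, cp: Checkpoint, page: Page): Checkpoint {
  let lastUpdatedAt = cp.lastUpdatedAt;
  for (const item of page.items) {
    // Keyed write: replaying a page overwrites entries instead of duplicating them.
    store.set(item.id, item);
    // ISO-8601 UTC strings in one format sort lexicographically by time.
    if (lastUpdatedAt === null || item.updatedAt > lastUpdatedAt) lastUpdatedAt = item.updatedAt;
  }
  return { nextPage: cp.nextPage + 1, lastUpdatedAt, done: !page.hasNext };
}

// Driver: fetch at most maxPages pages per run, then hand back the checkpoint to persist.
async function runSync(
  fetchPage: (page: number) => Promise<Page>,
  store: Map<string, Artifact>,
  start: Checkpoint,
  maxPages: number,
): Promise<Checkpoint> {
  let cp = start;
  for (let fetched = 0; fetched < maxPages && !cp.done; fetched++) {
    cp = applyPage(store, cp, await fetchPage(cp.nextPage));
  }
  return cp;
}
```

Capping each run at `maxPages` is one way to stay inside a rate-limit budget: a run that stops early returns a checkpoint with `done: false`, and the next run picks up at `nextPage`.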
Decision
Create a dedicated Typesense collection named pi_mono_artifacts and populate it via a Restate DAG pipeline backed by the existing host-side direct task runner pattern.
Collection contract
One denormalized collection holds:
- repo_doc
- issue
- issue_comment
- pull_request
- pull_request_review_comment
- commit
- release
- maintainer_profile
- sync_state
Each document carries enough structure for retrieval and filtering:
- repo
- kind
- title
- content
- author
- author_role
- maintainer_signal
- package_scopes
- labels
- decision_tags
- path
- sha
- tag
- thread_key
- created_at
- updated_at
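For orientation, the document shape might be typed roughly as follows. The ADR names the fields but not their types, so every type below is an assumption (which fields are arrays, which are optional, whether `maintainer_signal` is a boolean, whether timestamps are Unix seconds). The `id` field and both helpers (`artifactId`, `toImportBody`) are hypothetical; they illustrate how a stable key lets a re-import upsert in place, using the newline-delimited JSON body that the Typesense import endpoint accepts.

```typescript
type ArtifactKind =
  | "repo_doc" | "issue" | "issue_comment" | "pull_request"
  | "pull_request_review_comment" | "commit" | "release"
  | "maintainer_profile" | "sync_state";

// Assumed field types; only the field names come from the ADR.
interface PiMonoArtifactDoc {
  id: string; // hypothetical stable key so re-imports overwrite instead of duplicating
  repo: string;
  kind: ArtifactKind;
  title: string;
  content: string;
  author?: string;
  author_role?: string;
  maintainer_signal?: boolean;
  package_scopes?: string[];
  labels?: string[];
  decision_tags?: string[];
  path?: string;
  sha?: string;
  tag?: string;
  thread_key?: string;
  created_at: number; // assumed Unix seconds
  updated_at: number; // assumed Unix seconds
}

// Build a deterministic, URL-safe id from the artifact's natural key.
function artifactId(repo: string, kind: ArtifactKind, key: string): string {
  return `${repo}__${kind}__${key}`.replace(/[^A-Za-z0-9_.-]/g, "_");
}

// Serialize documents as JSONL, one JSON object per line.
function toImportBody(docs: PiMonoArtifactDoc[]): string {
  return docs.map((doc) => JSON.stringify(doc)).join("\n");
}
```

Because the id is derived from the repo, kind, and natural key, syncing the same issue twice produces the same document id, which is what makes a bulk upsert safe to repeat.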
Runtime shape
Use the existing Restate DAG runtime, not a parallel ingestion subsystem.
Implementation path:
- packages/restate/src/pi-mono-artifacts.ts
  - normalize GitHub + local repo artifacts
  - ensure/create pi_mono_artifacts
  - bulk-upsert documents into Typesense
  - materialize maintainer profile docs
  - write a sync_state checkpoint for later incremental runs
- scripts/restate/run-tier1-task.ts
  - add task pi-mono-artifacts-sync
  - reuse the host-runner pattern already used by ADR-0216 tier-1 Restate jobs
- packages/restate/src/pipelines.ts
  - add buildPiMonoArtifactsSyncPipeline()
  - first node: real sync shell task
  - second node: infer-based operator summary for readable run output
- packages/restate/src/trigger-dag.ts
  - expose pipeline pi-mono-sync
  - support --repo, --full-backfill, --max-pages, --per-page, --local-clone
- packages/cli/src/commands/restate.ts
  - add joelclaw restate pi-mono-sync
- packages/cli/src/commands/search.ts
  - add pi_mono_artifacts as a first-class searchable collection
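As an illustration of the trigger surface, the flags listed for the pi-mono-sync pipeline could be parsed along these lines. This is a sketch, not the code in trigger-dag.ts: `SyncOptions` and `parseSyncArgs` are hypothetical names, the defaults are invented for the example, and it assumes `--local-clone` takes a path while `--full-backfill` is a bare switch. The clamp on `--per-page` reflects the GitHub REST API limit of 100 items per page.

```typescript
interface SyncOptions {
  repo: string;
  fullBackfill: boolean;
  maxPages: number;
  perPage: number;
  localClone?: string;
}

const VALUE_FLAGS = new Set(["--repo", "--max-pages", "--per-page", "--local-clone"]);

function toPositiveInt(flag: string, raw: string): number {
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) throw new Error(`${flag} expects a positive integer, got "${raw}"`);
  return n;
}

function parseSyncArgs(argv: string[]): SyncOptions {
  // Defaults are assumptions for this sketch, not documented behavior.
  const opts: SyncOptions = { repo: "badlogic/pi-mono", fullBackfill: false, maxPages: 10, perPage: 100 };
  const queue = [...argv];
  while (queue.length > 0) {
    const flag = queue.shift()!;
    if (flag === "--full-backfill") {
      opts.fullBackfill = true;
      continue;
    }
    if (!VALUE_FLAGS.has(flag)) throw new Error(`unknown flag: ${flag}`);
    const value = queue.shift();
    if (value === undefined || value.startsWith("--")) throw new Error(`missing value for ${flag}`);
    if (flag === "--repo") opts.repo = value;
    else if (flag === "--local-clone") opts.localClone = value;
    else if (flag === "--max-pages") opts.maxPages = toPositiveInt(flag, value);
    else opts.perPage = Math.min(toPositiveInt(flag, value), 100); // --per-page
  }
  return opts;
}
```

Failing loudly on unknown or malformed flags keeps a mistyped backfill from silently running with defaults.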
Scheduling policy
Do not add a Dkron cron by default yet.
This corpus is useful immediately as a manual/operator-triggered sync and backfill. After it proves useful, a scheduled incremental refresh can be added as a separate ADR update or follow-up task.
Public surface and extension split
Keep the corpus, sync runtime, and public search surface in joelclaw.
Why:
- joelclaw already owns Typesense, Upstash rate limiting, and joelclaw.com API discovery
- the public operator surface belongs on joelclaw.com
- the contributor-facing extension should evolve separately from the backend corpus/indexing runtime
Therefore:
- joelclaw owns pi_mono_artifacts, joelclaw restate pi-mono-sync, joelclaw search --collection pi_mono_artifacts, and the public API discovery/search surface on joelclaw.com
- the public extension/installer repo lives at joelhooks/contributing-to-pi-mono
- the public API should include current install instructions for both the public skill and the public extension
Why this shape
One denormalized collection first
We care about retrieval and operator leverage more than perfect normalization.
The real operator questions are:
- show me examples where Mario rejected config bloat
- find comments that say a feature belongs in an extension
- what does “Breaks TUI” usually mean in practice
- compare an accepted proposal with a rejected one
- what package boundaries show up in review comments for a given PR
One collection with strong facets answers those now.
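To make that concrete, an operator question such as "find comments that say a feature belongs in an extension" could map onto a single faceted query. The sketch below only builds the parameter object; `q`, `query_by`, `filter_by`, `facet_by`, and `per_page` are Typesense search parameters, while `buildArtifactSearch`, the choice of facet fields, and tag values like `belongs_in_extension` are assumptions made for illustration.

```typescript
interface ArtifactSearchParams {
  q: string;
  query_by: string;
  filter_by?: string;
  facet_by: string;
  per_page: number;
}

interface ArtifactFilters {
  kinds?: string[];
  decisionTags?: string[];
  packageScopes?: string[];
}

// Render one exact-match clause; assumes values contain no commas, brackets, or backticks.
function inList(field: string, values: string[] | undefined): string[] {
  return values && values.length > 0 ? [`${field}:=[${values.join(",")}]`] : [];
}

function buildArtifactSearch(q: string, filters: ArtifactFilters = {}): ArtifactSearchParams {
  const clauses = [
    ...inList("kind", filters.kinds),
    ...inList("decision_tags", filters.decisionTags),
    ...inList("package_scopes", filters.packageScopes),
  ];
  const params: ArtifactSearchParams = {
    q,
    query_by: "title,content",
    // Assumed facet set; the real schema decides which fields are facetable.
    facet_by: "kind,author_role,decision_tags,package_scopes",
    per_page: 10,
  };
  if (clauses.length > 0) params.filter_by = clauses.join(" && ");
  return params;
}

// Usage: comments of either kind, narrowed to a hypothetical decision tag.
const extensionComments = buildArtifactSearch("belongs in an extension", {
  kinds: ["issue_comment", "pull_request_review_comment"],
  decisionTags: ["belongs_in_extension"],
});
```

With normalized collections the same question would need a join across comments, pull requests, and tags; here it is one filter expression over one collection.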
Host-runner pattern over a second executor
ADR-0216 already proved the Restate shell-node → host direct task runner pattern. Reusing that is the right first move because:
- it keeps the runtime consistent
- it avoids inventing a second ingestion stack
- it gives us real durable orchestration today
- it keeps Typesense writes and GitHub auth on the operator host where they already work
Consequences
Positive
- pi-mono contribution research becomes durable system knowledge, not session vapor
- future issue/PR drafting can search real maintainer patterns instead of guessing
- maintainer profile and sync checkpoint docs are materialized automatically
- Restate gets a research/indexing workload that actually fits its durability model
- CLI-first search remains intact because joelclaw search can query the new collection
Negative
- another corpus to keep fresh
- GitHub API pagination and rate limits must be handled honestly
- initial implementation still relies on host-side auth/tooling rather than a fully isolated runner
Risks
- public unauthenticated GitHub requests will be rate-limited; use token/env/gh auth token fallback
- Typesense collection growth could get noisy if we ingest too much low-value repo documentation; keep the doc set curated
- maintainer-profile materialization can drift into bullshit if it becomes too generative; keep it evidence-backed and heuristic-heavy for now
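The token/env/gh auth token fallback from the first risk can be sketched as a small resolver. This is an assumption-laden illustration: `resolveGitHubToken` and the injected `RunCommand` type are hypothetical, and the environment variable names `GITHUB_TOKEN` and `GH_TOKEN` are conventional guesses rather than names confirmed by this ADR. `gh auth token` is the GitHub CLI subcommand that prints the stored token.

```typescript
// Hypothetical injected runner so the fallback can be exercised without spawning a process.
type RunCommand = (cmd: string, args: string[]) => string;

function resolveGitHubToken(
  explicit: string | undefined,
  env: Record<string, string | undefined>,
  run: RunCommand,
): string | undefined {
  // 1. An explicitly passed token wins.
  const fromArg = explicit?.trim();
  if (fromArg) return fromArg;

  // 2. Then environment variables (names are assumptions for this sketch).
  for (const name of ["GITHUB_TOKEN", "GH_TOKEN"]) {
    const value = env[name]?.trim();
    if (value) return value;
  }

  // 3. Finally ask the GitHub CLI; a missing or logged-out gh yields no token.
  try {
    const out = run("gh", ["auth", "token"]).trim();
    return out.length > 0 ? out : undefined;
  } catch {
    return undefined;
  }
}
```

On the operator host the runner could wrap `execFileSync` from `node:child_process` with `{ encoding: "utf8" }`. Returning `undefined` rather than throwing lets the caller decide whether to proceed unauthenticated at the lower rate limit or abort the run.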
Required Skills Preflight
Load before implementing or extending this ADR:
- system-architecture: understand the Restate/CLI/Typesense/operator wiring
- typesense: collection schema, import, and search behavior
- adr-skill: keep the ADR executable and updated
Current gap:
- there is no canonical Restate skill yet. Until one exists, implementers must read packages/restate/README.md, ADR-0207, and ADR-0216 directly.
Implementation Plan
Code
- add packages/restate/src/pi-mono-artifacts.ts
- extend scripts/restate/run-tier1-task.ts
- extend packages/restate/src/pipelines.ts
- extend packages/restate/src/trigger-dag.ts
- extend packages/cli/src/commands/restate.ts
- extend packages/cli/src/commands/search.ts
- add targeted tests for normalization/tagging heuristics
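For the tagging heuristics, one shape that stays evidence-backed is a rule table where every emitted tag carries the exact text that triggered it. The rules, tag names, and `tagDecisionSignals` function below are hypothetical; the phrases are borrowed from the operator questions earlier in this ADR, not from the shipped heuristics.

```typescript
interface TagRule { tag: string; pattern: RegExp }
interface TagHit { tag: string; evidence: string }

// Hypothetical rules; patterns deliberately omit the "g" flag so exec() keeps no state.
const DECISION_RULES: TagRule[] = [
  { tag: "belongs_in_extension", pattern: /\b(?:belongs in|should be) an extension\b/i },
  { tag: "breaks_tui", pattern: /\bbreaks tui\b/i },
  { tag: "config_bloat", pattern: /\bconfig(?:uration)? bloat\b/i },
];

// Return one hit per matching rule, in rule order, with the matched text as evidence.
function tagDecisionSignals(content: string, rules: TagRule[] = DECISION_RULES): TagHit[] {
  const hits: TagHit[] = [];
  for (const rule of rules) {
    const match = rule.pattern.exec(content);
    if (match) hits.push({ tag: rule.tag, evidence: match[0] });
  }
  return hits;
}
```

A targeted test can then assert on both halves: that "This Breaks TUI and really belongs in an extension." yields the tags `belongs_in_extension` and `breaks_tui`, and that each hit quotes the phrase it matched. Keeping the evidence alongside the tag is what lets a maintainer profile cite its sources instead of summarizing from nothing.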
Documentation
- update packages/restate/README.md
- update docs/cli.md
- update docs/architecture.md
- update docs/inngest-functions.md
- update ADR index ~/Vault/docs/decisions/README.md
Backfill
- run a real sync against badlogic/pi-mono
- verify documents landed in pi_mono_artifacts
- verify maintainer_profile and sync_state were written
- verify joelclaw search --collection pi_mono_artifacts works
Implementation Progress (2026-03-07)
Phase 1 shipped in the repo:
- packages/restate/src/pi-mono-artifacts.ts
  - collection schema
  - GitHub + local-doc normalization
  - heuristic decision_tags
  - maintainer_profile materialization
  - sync_state checkpoint writes
- scripts/restate/run-tier1-task.ts
  - added pi-mono-artifacts-sync
- packages/restate/src/pipelines.ts
  - added buildPiMonoArtifactsSyncPipeline()
- packages/restate/src/trigger-dag.ts
  - added pipeline trigger surface for pi-mono-sync
- packages/cli/src/commands/restate.ts
  - added joelclaw restate pi-mono-sync
- packages/cli/src/commands/search.ts
  - added pi_mono_artifacts collection support
Verification
- bun test packages/restate/src/pi-mono-artifacts.test.ts
- bunx tsc --noEmit
- pnpm biome check packages/ apps/ (repo-wide pre-existing failures; use targeted checks on touched files)
- joelclaw restate pi-mono-sync --repo badlogic/pi-mono --sync
- joelclaw search "badlogic" --collection pi_mono_artifacts
- joelclaw search "Breaks TUI" --collection pi_mono_artifacts
- joelclaw search "maintainer profile" --collection pi_mono_artifacts
- curl -sS "https://joelclaw.com/api/search?q=which+provider%2Fmodel+triggered+this&collection=pi_mono_artifacts" (route implemented locally; deploy verification still required after push)
- curl -sS "https://joelclaw.com/api/pi-mono" (route implemented locally; deploy verification still required after push)
Follow-up
If the corpus proves useful, add a scheduled incremental refresh via Dkron as a separate step. Do not conflate “collection exists” with “cron should exist.”