ADR-0186

Persisted Q&A Context Resources and Human-Gated Improvement Loop

  • Status: proposed
  • Date: 2026-03-01
  • Deciders: Joel, Panda
  • Relates to: ADR-0106, ADR-0123, ADR-0168, ADR-0174, ADR-0183

Context

The ADR ranking rubric exists (ADR-0183), but daily review is still mostly manual.

Current gaps:

  1. MCQ/Q&A exchanges are useful context, but they are not persisted as first-class resources across channels.
  2. ADRs missing rubric compliance are not automatically enriched with targeted questions.
  3. New ADRs do not trigger immediate rerank/evidence refresh against the rubric.
  4. There is no unified operator AM review queue that ties unanswered questions to ADR decisions.
  5. Approvals are scattered in chat logs instead of explicit decision records linked to ADRs.
  6. A web UI exists and should be the first venue, but mobile steering/dashboard workflows are becoming high-value (including future VOIP control loops).

Goal: build a durable, channel-agnostic Q&A decision substrate that improves ADR quality daily while keeping execution gated by explicit human approval.

Decision

1) Q&A becomes a core joelclaw resource type

Persist every meaningful question/answer exchange as canonical resources in Convex, then project to Typesense.

Core entities:

  • Question (mcq, approval, clarification, risk, policy)
  • Answer (selected option + freeform rationale + provenance)
  • Decision Link (binds Q&A to ADR/workload/event)

Required linkage fields:

  • subjectType (adr, story, incident, task, run)
  • subjectId (e.g. 0183, run-id, task-id)
  • channel (web, telegram, slack, discord, imessage, mobile)
  • sessionId (if applicable)
  • traceId / correlation IDs when available
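The entity and linkage fields above can be sketched as TypeScript shapes. Field names mirror the lists in this ADR; the exact types, optionality, and the sample values are assumptions, not the production Convex schema.

```typescript
// Sketch of the canonical Q&A entity shapes; types/optionality are assumptions.
type QuestionKind = "mcq" | "approval" | "clarification" | "risk" | "policy";
type SubjectType = "adr" | "story" | "incident" | "task" | "run";
type Channel = "web" | "telegram" | "slack" | "discord" | "imessage" | "mobile";

interface Question {
  id: string;
  kind: QuestionKind;
  prompt: string;
  options?: string[];      // present for mcq questions
  subjectType: SubjectType;
  subjectId: string;       // e.g. "0183", a run id, a task id
  channel: Channel;
  sessionId?: string;
  traceId?: string;
  askedAt: number;         // epoch ms
}

interface Answer {
  id: string;
  questionId: string;
  selectedOption?: string; // chosen MCQ option, if any
  rationale: string;       // freeform rationale
  answeredBy: string;      // provenance: operator identity
  answeredAt: number;
}

interface DecisionLink {
  questionId: string;
  answerId: string;
  subjectType: SubjectType;
  subjectId: string;       // binds the Q&A to an ADR/workload/event
}

// Hypothetical record demonstrating the required linkage fields.
const exampleQuestion: Question = {
  id: "q-001",
  kind: "mcq",
  prompt: "Accept the proposed rubric drift fix?",
  options: ["yes", "no", "defer"],
  subjectType: "adr",
  subjectId: "0183",
  channel: "telegram",
  traceId: "trace-abc",
  askedAt: Date.now(),
};
```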

2) Daily ADR review becomes a durable Inngest workflow

Add a daily function that:

  1. ranks open ADRs (proposed, accepted) using ADR-0183 rubric,
  2. identifies non-compliance/drift/missing rationale,
  3. generates targeted enrichment questions,
  4. queues AM operator review items,
  5. refreshes unanswered items with fresh context snapshots,
  6. writes all outputs as persisted Q&A resources.

Unanswered items are not dropped; they are re-evaluated and re-queued with updated evidence.

3) ADR lifecycle events trigger immediate rerank

A second workflow runs on ADR lifecycle changes:

  • adr.created
  • adr.updated
  • adr.status.changed
  • question.answered (when answer affects rubric axes or gates)

Behavior:

  1. re-score the affected ADR immediately,
  2. persist an assessment snapshot with trigger metadata,
  3. enqueue operator review only when score/band/gate state changes.

The daily run remains the global backstop; event rerank handles fresh decisions in near real time.
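The "only on meaningful change" rule in step 3 can be expressed as a pure predicate over consecutive assessment snapshots. The `AssessmentSummary` shape and function name are illustrative assumptions, not an existing joelclaw API.

```typescript
// Enqueue operator review only when score, band, or gate state changed.
interface AssessmentSummary {
  score: number;
  band: string;           // e.g. "do-now"
  gateFailures: string[]; // deterministic gate-failure codes
}

function shouldEnqueueReview(
  prev: AssessmentSummary | null,
  next: AssessmentSummary,
): boolean {
  if (prev === null) return true; // a first assessment always surfaces
  const gatesChanged =
    prev.gateFailures.length !== next.gateFailures.length ||
    prev.gateFailures.some((code, i) => code !== next.gateFailures[i]);
  return prev.score !== next.score || prev.band !== next.band || gatesChanged;
}
```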

4) Persist rubric reasoning as first-class evidence

Every scoring pass writes a durable RubricAssessment record.

Required fields:

  • adrId, assessedAt, assessedBy, trigger, traceId
  • axes: need, readiness, confidence, novelty, agentReady
  • derived: score, band, drift, compliance
  • gates: autoEligible, approvalState, freshnessState, gateFailures
  • reasoning: summary, assumptions, counterarguments, risks
  • evidence refs: linked incidents/runs/commits/questions/notes used for scoring

ADR frontmatter remains the compact execution summary (priority-* fields). Rich reasoning stays in Convex and is indexed in Typesense.
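One possible shape for the durable RubricAssessment record follows. Field names come from the required-fields list above; the nested grouping and sample values are assumptions.

```typescript
// Sketch of a RubricAssessment snapshot; grouping/types are assumptions.
interface RubricAssessment {
  adrId: string;
  assessedAt: number;
  assessedBy: string;
  trigger: string;   // e.g. "daily", "adr.updated"
  traceId?: string;
  axes: {
    need: number;
    readiness: number;
    confidence: number;
    novelty: number;
    agentReady: number;
  };
  derived: { score: number; band: string; drift: boolean; compliance: boolean };
  gates: {
    autoEligible: boolean;
    approvalState: string;
    freshnessState: "fresh" | "stale";
    gateFailures: string[];
  };
  reasoning: {
    summary: string;
    assumptions: string[];
    counterarguments: string[];
    risks: string[];
  };
  evidenceRefs: string[]; // linked incidents/runs/commits/questions/notes
}

// Hypothetical snapshot as the daily workflow might persist it.
const sample: RubricAssessment = {
  adrId: "0183",
  assessedAt: 0,
  assessedBy: "daily-review",
  trigger: "daily",
  axes: { need: 4, readiness: 3, confidence: 4, novelty: 2, agentReady: 4 },
  derived: { score: 3.5, band: "do-now", drift: false, compliance: true },
  gates: { autoEligible: false, approvalState: "pending", freshnessState: "fresh", gateFailures: [] },
  reasoning: { summary: "Meets rubric; awaiting approval.", assumptions: [], counterarguments: [], risks: [] },
  evidenceRefs: ["question:q-001"],
};
```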

5) Human approval remains a hard execution gate

System-improvement work stays gated by explicit approval decisions.

Auto-eligibility predicate:

priority-band == do-now
AND priority-confidence >= 4
AND priority-agent-ready >= 4
AND approval-state == approved
AND freshness-state == fresh

priority-agent-ready is gate-only (not part of the ranking score).
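The predicate above can be written as a pure function, which is the form a scheduler gate would likely take. Field names mirror the frontmatter fields; this is a sketch, not the actual scheduler code.

```typescript
// Auto-eligibility gate from the predicate above; shape names are assumptions.
interface GateInputs {
  priorityBand: string;
  priorityConfidence: number; // 1–5
  priorityAgentReady: number; // 1–5, gate-only: excluded from the ranking score
  approvalState: "approved" | "pending" | "rejected";
  freshnessState: "fresh" | "stale";
}

function isAutoEligible(g: GateInputs): boolean {
  return (
    g.priorityBand === "do-now" &&
    g.priorityConfidence >= 4 &&
    g.priorityAgentReady >= 4 &&
    g.approvalState === "approved" &&
    g.freshnessState === "fresh"
  );
}
```

Note that a single failing conjunct (for example a stale freshness state) is enough to block autonomous execution, which matches the hard-gate semantics in section 10.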

6) Channel persistence is mandatory

All MCQ/Q&A flows across all channels write to the same canonical store.

Implication:

  • A decision made in Telegram is visible in web/mobile review queues.
  • A web decision can unblock queued work in system-bus.
  • Channel is a delivery adapter, not a data silo.

7) Web first, mobile explicit Phase 2 adapter

  • Phase 1 venue: authenticated web UI for outstanding questions, approvals, and AM review queue.
  • Phase 2 venue: native mobile adapter (same backend/contracts) for MCQ/dashboard/review.
  • VOIP steering: Phase 3, only after MCQ approval loops are stable.

8) Treat MCQ as a survey/quiz-class primitive

Adopt persisted Q&A as a first-class product primitive in joelclaw (conceptually aligned with survey/quiz structures used in course-builder/ai-hero), but with joelclaw-specific governance fields:

  • approval category
  • ADR/workload linkage
  • operator/audit provenance
  • automation gate state

9) Curated steering-input pack becomes mandatory

Autonomous improvement proposals must be grounded in curated steering inputs, not single-source vibes.

Minimum steering categories:

  1. Outcome signals — incidents, failures, regressions, successful recoveries.
  2. Execution signals — loop throughput, retries, skipped work, test stability.
  3. Human signals — approvals, rejections, overrides, comments.
  4. Knowledge signals — ADR supersessions, discoveries, dependency shifts.
  5. Risk signals — blast-radius estimates, reversibility, rollback quality.

Each steering input must include: provenance, freshness timestamp, confidence weight, and conflict markers.
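The minimum steering-input record implied above might look like this. The concrete field names for confidence weight and conflict markers (`confidenceWeight`, `conflictsWith`) and the sample values are assumptions.

```typescript
// Sketch of a curated SteeringInput record; field names are assumptions.
interface SteeringInput {
  category: "outcome" | "execution" | "human" | "knowledge" | "risk";
  provenance: string;       // where the signal came from
  capturedAt: number;       // freshness timestamp (epoch ms)
  confidenceWeight: number; // 0–1
  conflictsWith: string[];  // conflict markers: ids of contradicting inputs
  payload: unknown;         // category-specific detail
}

// Hypothetical outcome signal sourced from an incident record.
const input: SteeringInput = {
  category: "outcome",
  provenance: "incident:inc-042",
  capturedAt: Date.now(),
  confidenceWeight: 0.8,
  conflictsWith: [],
  payload: { kind: "regression", service: "system-bus" },
};
```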

10) Freshness SLA contract is a hard autonomy gate

Steering inputs age out. When evidence is stale, autonomy stops.

Default freshness windows by category:

  Category            Max age
  -----------------   -------
  Outcome signals     7 days
  Execution signals   7 days
  Risk signals        7 days
  Human signals       14 days
  Knowledge signals   30 days

Contract:

  • every SteeringSignal stores capturedAt and computed age at assessment time,
  • every rerank computes freshnessState (fresh | stale) plus per-category freshness failures,
  • if any required category is stale or missing, freshness gate fails.

Failure behavior:

  • set autoEligible = false,
  • append deterministic gate failures (for example: freshness.outcome.stale, freshness.human.missing),
  • enqueue targeted refresh questions/tasks for stale categories,
  • continue scoring/banding, but scheduler must refuse autonomous execution until freshness is restored.
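The contract and failure behavior above can be sketched as a single evaluation function: given captured signals and the per-category windows from the table, it computes `freshnessState` plus deterministic gate-failure codes. The names are illustrative assumptions; real windows would come from per-environment config.

```typescript
// Freshness gate sketch: per-category staleness/missing checks.
type SignalCategory = "outcome" | "execution" | "risk" | "human" | "knowledge";

interface SteeringSignal {
  category: SignalCategory;
  capturedAt: number; // epoch ms
}

// Default windows from the table above (configurable per environment).
const MAX_AGE_DAYS: Record<SignalCategory, number> = {
  outcome: 7,
  execution: 7,
  risk: 7,
  human: 14,
  knowledge: 30,
};

const DAY_MS = 24 * 60 * 60 * 1000;

function evaluateFreshness(
  signals: SteeringSignal[],
  now: number,
): { freshnessState: "fresh" | "stale"; gateFailures: string[] } {
  const failures: string[] = [];
  for (const category of Object.keys(MAX_AGE_DAYS) as SignalCategory[]) {
    const inCategory = signals.filter((s) => s.category === category);
    if (inCategory.length === 0) {
      failures.push(`freshness.${category}.missing`); // required but absent
      continue;
    }
    const newest = Math.max(...inCategory.map((s) => s.capturedAt));
    if (now - newest > MAX_AGE_DAYS[category] * DAY_MS) {
      failures.push(`freshness.${category}.stale`); // newest evidence too old
    }
  }
  return {
    freshnessState: failures.length === 0 ? "fresh" : "stale",
    gateFailures: failures,
  };
}
```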

Monitoring + calibration:

  • freshness windows are configurable per environment and persisted with each RubricAssessment,
  • emit OTEL events for freshness failures and blocked-execution reasons,
  • calibrate thresholds from observed false-positive/false-negative blocks.

Consequences

Good

  • Decisions become queryable, replayable, and auditable.
  • Daily ADR review can continuously improve data quality.
  • Human gate is explicit and machine-checkable.
  • Web and mobile share one control-plane model.
  • Typesense retrieval can include both ADR text and decision history.

Tradeoffs

  • More schema + indexing complexity.
  • Requires strict event correlation discipline.
  • Poor question design can create operator fatigue.

Risks

  • Over-questioning can stall momentum.
  • Approval gate could become bottleneck if queue quality is bad.

Mitigation: keep MCQ packs short, dedupe similar questions, and enforce freshness-driven batching.

Implementation Plan (vector clock)

  1. Contract + schema

    • Add canonical Q&A/decision schema in Convex.
    • Add RubricAssessment schema for score + reasoning snapshots (including freshness-state + gate failures).
    • Add SteeringSignal schema for curated input records with signal category + freshness timestamps.
    • Persist active freshness-SLA values with each assessment snapshot for auditability.
  2. Event contracts

    • Define question.asked, question.answered, decision.recorded, decision.applied.
    • Add adr.created, adr.updated, adr.status.changed, adr.rubric.reranked.
  3. Rerank workflow (event-triggered)

    • New Inngest function reranks affected ADRs on lifecycle events.
    • Persist trigger metadata and reasoning snapshot every run.
    • Emit operator-review items only on meaningful state changes.
  4. Daily review workflow (batch backstop)

    • New Inngest function for full open-ADR ranking + enrichment + AM queue generation.
    • Refresh unanswered items with current context before re-queueing.
  5. Web review surface (Phase 1)

    • Authenticated dashboard for outstanding/unanswered questions.
    • Approval actions that emit durable decision events.
  6. Cross-channel capture

    • Ensure all channel MCQ interactions persist through shared contracts.
    • Link every answer to ADR/workload targets.
  7. Typesense projection

    • Index Q&A/decision/rubric reasoning artifacts with ADR linkage for semantic retrieval.
  8. Workload scheduler integration

    • Only schedule autonomous improvement workloads when gate predicate is satisfied.
    • Enforce per-category freshness SLA; stale or missing required categories hard-block autonomous execution.
  9. Mobile adapter (Phase 2)

    • Native app surface for queue review and MCQ decisions on same backend contracts.
  10. VOIP steering (Phase 3)

    • Add voice control/review loops after dashboard + MCQ flow proves stable.

Verification

  • Every MCQ asked in any channel creates a canonical Question record.
  • Every answer creates a canonical Answer record linked to ADR/workload.
  • New ADR create/update/status-change events trigger rerank for affected ADRs.
  • Every rerank writes a RubricAssessment record with trigger + reasoning + evidence refs.
  • Daily ADR review run produces enrichment queue with deterministic IDs.
  • Unanswered queue items are refreshed (not dropped) with new context.
  • autoEligible is false unless approval + confidence + agent-ready + freshness gates pass.
  • Freshness failures are persisted with deterministic gate-failure codes and surfaced in review queues.
  • Scheduler refuses autonomous improvement work when required steering-signal freshness SLAs fail.
  • Web review UI can approve/reject and writes durable decision events.
  • Typesense queries can retrieve ADRs with linked Q&A/decision/rubric reasoning context.
  • Mobile adapter can consume and answer the same queue without schema forks.

Out of Scope (for this ADR)

  • Final UI aesthetics and component-level design choices.
  • Provider-specific VOIP transport implementation details.
  • Replacing existing ADR ranking policy (this ADR extends ADR-0183; it does not replace it).