Persisted Q&A Context Resources and Human-Gated Improvement Loop
- Status: proposed
- Date: 2026-03-01
- Deciders: Joel, Panda
- Relates to: ADR-0106, ADR-0123, ADR-0168, ADR-0174, ADR-0183
Context
The ADR ranking rubric exists (ADR-0183), but daily review is still mostly manual.
Current gaps:
- MCQ/Q&A exchanges are useful context, but not persisted as first-class resources across channels.
- ADRs missing rubric compliance are not automatically enriched with targeted questions.
- New ADRs do not trigger immediate rerank/evidence refresh against the rubric.
- There is no unified operator AM review queue that ties unanswered questions to ADR decisions.
- Approvals are scattered in chat logs instead of explicit decision records linked to ADRs.
- A web UI exists and should be the first venue, but mobile steering/dashboard workflows are becoming high-value (including future VOIP control loops).
Goal: build a durable, channel-agnostic Q&A decision substrate that improves ADR quality daily while keeping execution gated by explicit human approval.
Decision
1) Q&A becomes a core joelclaw resource type
Persist every meaningful question/answer exchange as canonical resources in Convex, then project to Typesense.
Core entities:
- Question (`mcq`, `approval`, `clarification`, `risk`, `policy`)
- Answer (selected option + freeform rationale + provenance)
- Decision Link (binds Q&A to an ADR/workload/event)
Required linkage fields:
- `subjectType` (`adr`, `story`, `incident`, `task`, `run`)
- `subjectId` (e.g. `0183`, a run-id, a task-id)
- `channel` (`web`, `telegram`, `slack`, `discord`, `imessage`, `mobile`)
- `sessionId` (if applicable)
- `traceId` / correlation IDs when available
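The canonical entities and linkage fields above might be sketched as TypeScript shapes. All names here are illustrative assumptions, not the final Convex schema:

```typescript
// Hypothetical shapes for the canonical Q&A entities (field names are
// illustrative, not the actual Convex table definitions).
type QuestionKind = "mcq" | "approval" | "clarification" | "risk" | "policy";
type SubjectType = "adr" | "story" | "incident" | "task" | "run";
type Channel = "web" | "telegram" | "slack" | "discord" | "imessage" | "mobile";

interface DecisionLinkage {
  subjectType: SubjectType;
  subjectId: string; // e.g. "0183", a run-id, a task-id
  channel: Channel;
  sessionId?: string;
  traceId?: string;
}

interface Question extends DecisionLinkage {
  id: string;
  kind: QuestionKind;
  prompt: string;
  options?: string[]; // present for MCQs
}

interface Answer extends DecisionLinkage {
  questionId: string;
  selectedOption?: string;
  rationale: string; // freeform rationale
  answeredBy: string; // operator/audit provenance
}

// Reject records that are missing required linkage before persisting.
function hasRequiredLinkage(link: DecisionLinkage): boolean {
  return Boolean(link.subjectType && link.subjectId && link.channel);
}
```

A channel adapter would build the `DecisionLinkage` once and attach it to both the `Question` and its eventual `Answer`, which is what makes cross-channel queues (section 6) possible.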
2) Daily ADR review becomes a durable Inngest workflow
Add a daily function that:
- ranks open ADRs (`proposed`, `accepted`) using the ADR-0183 rubric,
- identifies non-compliance/drift/missing rationale,
- generates targeted enrichment questions,
- queues AM operator review items,
- refreshes unanswered items with fresh context snapshots,
- writes all outputs as persisted Q&A resources.
Unanswered items are not dropped; they are re-evaluated and re-queued with updated evidence.
3) ADR lifecycle events trigger immediate rerank
A second workflow runs on ADR lifecycle changes:
- `adr.created`
- `adr.updated`
- `adr.status.changed`
- `question.answered` (when the answer affects rubric axes or gates)
Behavior:
- re-score the affected ADR immediately,
- persist an assessment snapshot with trigger metadata,
- enqueue operator review only when score/band/gate state changes.
The daily run remains the global backstop; event rerank handles fresh decisions in near real time.
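The "enqueue operator review only on meaningful change" rule can be made concrete as a pure comparison between the previous and freshly computed assessment snapshots. The snapshot shape and function name are assumptions for illustration:

```typescript
// Minimal snapshot of the state the rerank workflow compares.
// Field names are assumptions mirroring the assessment record in section 4.
interface AssessmentSnapshot {
  score: number;
  band: string; // e.g. "do-now"
  gateFailures: string[]; // deterministic gate-failure codes, stable order
}

// Queue operator review only when score, band, or gate state changed
// between the previous assessment and the fresh one.
function shouldEnqueueReview(
  prev: AssessmentSnapshot | null,
  next: AssessmentSnapshot,
): boolean {
  if (prev === null) return true; // first assessment is always reviewable
  const gatesChanged =
    prev.gateFailures.length !== next.gateFailures.length ||
    prev.gateFailures.some((code, i) => code !== next.gateFailures[i]);
  return prev.score !== next.score || prev.band !== next.band || gatesChanged;
}
```

Keeping this predicate pure makes the "no meaningful change, no review item" behavior trivially testable, independent of the Inngest wiring.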
4) Persist rubric reasoning as first-class evidence
Every scoring pass writes a durable RubricAssessment record.
Required fields:
- `adrId`, `assessedAt`, `assessedBy`, `trigger`, `traceId`
- axes: `need`, `readiness`, `confidence`, `novelty`, `agentReady`
- derived: `score`, `band`, `drift`, `compliance`
- gates: `autoEligible`, `approvalState`, `freshnessState`, `gateFailures`
- reasoning: `summary`, `assumptions`, `counterarguments`, `risks`
- evidence refs: linked incidents/runs/commits/questions/notes used for scoring
ADR frontmatter remains the compact execution summary (priority-* fields). Rich reasoning stays in Convex and is indexed in Typesense.
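One way to keep the frontmatter and the rich record in lockstep is to derive the compact priority-* fields from the durable assessment rather than maintaining them by hand. The record shape and the projection function below are a sketch under that assumption:

```typescript
// Possible shape for the durable RubricAssessment record (field names
// follow the lists above; the exact schema is illustrative).
interface RubricAssessment {
  adrId: string;
  assessedAt: string; // ISO timestamp
  assessedBy: string;
  trigger: string; // e.g. "adr.updated", "daily-review"
  traceId?: string;
  axes: { need: number; readiness: number; confidence: number; novelty: number; agentReady: number };
  derived: { score: number; band: string; drift: boolean; compliance: boolean };
  gates: { autoEligible: boolean; approvalState: string; freshnessState: "fresh" | "stale"; gateFailures: string[] };
  reasoning: { summary: string; assumptions: string[]; counterarguments: string[]; risks: string[] };
  evidenceRefs: string[]; // linked incidents/runs/commits/questions/notes
}

// Project the rich record down to the compact priority-* frontmatter summary.
// The field names match the gate predicate in section 5.
function toFrontmatter(a: RubricAssessment): Record<string, string | number> {
  return {
    "priority-band": a.derived.band,
    "priority-confidence": a.axes.confidence,
    "priority-agent-ready": a.axes.agentReady,
    "approval-state": a.gates.approvalState,
    "freshness-state": a.gates.freshnessState,
  };
}
```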
5) Human approval remains a hard execution gate
System-improvement work stays gated by explicit approval decisions.
Auto-eligibility predicate:
priority-band == do-now
AND priority-confidence >= 4
AND priority-agent-ready >= 4
AND approval-state == approved
AND freshness-state == fresh
`priority-agent-ready` is gate-only (not part of the ranking score).
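The predicate above transcribes directly into a scheduler check. The helper below is a sketch, not the scheduler's actual code; the frontmatter field names come from the predicate itself:

```typescript
// The priority-* frontmatter fields the gate predicate reads.
interface PriorityFrontmatter {
  "priority-band": string;
  "priority-confidence": number;
  "priority-agent-ready": number;
  "approval-state": string;
  "freshness-state": string;
}

// Literal transcription of the auto-eligibility predicate: every clause
// must hold, otherwise the workload stays human-gated.
function isAutoEligible(fm: PriorityFrontmatter): boolean {
  return (
    fm["priority-band"] === "do-now" &&
    fm["priority-confidence"] >= 4 &&
    fm["priority-agent-ready"] >= 4 &&
    fm["approval-state"] === "approved" &&
    fm["freshness-state"] === "fresh"
  );
}
```

Because the predicate is a pure conjunction over frontmatter, it stays machine-checkable without consulting the rich Convex record.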
6) Channel persistence is mandatory
All MCQ/Q&A flows across all channels write to the same canonical store.
Implication:
- A decision made in Telegram is visible in web/mobile review queues.
- A web decision can unblock queued work in system-bus.
- Channel is a delivery adapter, not a data silo.
7) Web first, mobile explicit Phase 2 adapter
- Phase 1 venue: authenticated web UI for outstanding questions, approvals, and AM review queue.
- Phase 2 venue: native mobile adapter (same backend/contracts) for MCQ/dashboard/review.
- VOIP steering: Phase 3, only after MCQ approval loops are stable.
8) Treat MCQ as a survey/quiz-class primitive
Adopt persisted Q&A as a first-class product primitive in joelclaw (conceptually aligned with survey/quiz structures used in course-builder/ai-hero), but with joelclaw-specific governance fields:
- approval category
- ADR/workload linkage
- operator/audit provenance
- automation gate state
9) Curated steering-input pack becomes mandatory
Autonomous improvement proposals must be grounded in curated steering inputs, not single-source vibes.
Minimum steering categories:
- Outcome signals — incidents, failures, regressions, successful recoveries.
- Execution signals — loop throughput, retries, skipped work, test stability.
- Human signals — approvals, rejections, overrides, comments.
- Knowledge signals — ADR supersessions, discoveries, dependency shifts.
- Risk signals — blast-radius estimates, reversibility, rollback quality.
Each steering input must include: provenance, freshness timestamp, confidence weight, and conflict markers.
10) Freshness SLA contract is a hard autonomy gate
Steering inputs age out. When evidence is stale, autonomy stops.
Default freshness windows are 7/7/7/14/30 days:
| Category | Max age |
|---|---|
| Outcome signals | 7 days |
| Execution signals | 7 days |
| Risk signals | 7 days |
| Human signals | 14 days |
| Knowledge signals | 30 days |
Contract:
- every `SteeringSignal` stores `capturedAt` and computed age at assessment time,
- every rerank computes `freshnessState` (`fresh` | `stale`) plus per-category freshness failures,
- if any required category is stale or missing, the freshness gate fails.
Failure behavior:
- set `autoEligible = false`,
- append deterministic gate failures (for example: `freshness.outcome.stale`, `freshness.human.missing`),
- enqueue targeted refresh questions/tasks for stale categories,
- continue scoring/banding, but the scheduler must refuse autonomous execution until freshness is restored.
Monitoring + calibration:
- freshness windows are configurable per environment and persisted with each `RubricAssessment`,
- emit OTEL events for freshness failures and blocked-execution reasons,
- calibrate thresholds from observed false-positive/false-negative blocks.
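The freshness contract above can be evaluated as a pure function over the steering pack. Window values are the defaults from the table; the category names and failure-code format follow the examples in the failure-behavior list, while the function and type names are assumptions:

```typescript
// Steering-signal categories and their default max-age windows (days),
// matching the freshness table in section 10.
type SignalCategory = "outcome" | "execution" | "risk" | "human" | "knowledge";

const DEFAULT_MAX_AGE_DAYS: Record<SignalCategory, number> = {
  outcome: 7,
  execution: 7,
  risk: 7,
  human: 14,
  knowledge: 30,
};

interface SteeringSignal {
  category: SignalCategory;
  capturedAt: Date;
}

// Returns deterministic gate-failure codes such as "freshness.outcome.stale"
// or "freshness.human.missing"; an empty array means the freshness gate passes.
function freshnessFailures(
  signals: SteeringSignal[],
  now: Date,
  maxAgeDays: Record<SignalCategory, number> = DEFAULT_MAX_AGE_DAYS,
): string[] {
  const failures: string[] = [];
  const dayMs = 24 * 60 * 60 * 1000;
  for (const category of Object.keys(maxAgeDays) as SignalCategory[]) {
    const inCategory = signals.filter((s) => s.category === category);
    if (inCategory.length === 0) {
      failures.push(`freshness.${category}.missing`);
      continue;
    }
    // Only the newest signal per category has to be within the window.
    const newest = Math.max(...inCategory.map((s) => s.capturedAt.getTime()));
    if ((now.getTime() - newest) / dayMs > maxAgeDays[category]) {
      failures.push(`freshness.${category}.stale`);
    }
  }
  return failures;
}
```

A rerank would persist the returned codes in `gateFailures`, set `freshnessState` to `stale` whenever the array is non-empty, and pass the window map it actually used into the assessment snapshot for auditability.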
Consequences
Good
- Decisions become queryable, replayable, and auditable.
- Daily ADR review can continuously improve data quality.
- Human gate is explicit and machine-checkable.
- Web and mobile share one control-plane model.
- Typesense retrieval can include both ADR text and decision history.
Tradeoffs
- More schema + indexing complexity.
- Requires strict event correlation discipline.
- Poor question design can create operator fatigue.
Risks
- Over-questioning can stall momentum.
- The approval gate could become a bottleneck if queue quality is bad.
Mitigation: keep MCQ packs short, dedupe similar questions, and enforce freshness-driven batching.
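The dedupe half of that mitigation can be sketched as a keep-first pass keyed on subject plus normalized prompt text. The normalization rule here is an assumption; real matching might be semantic rather than textual:

```typescript
interface QueuedQuestion {
  subjectType: string;
  subjectId: string;
  prompt: string;
}

// Collapse near-identical queued questions: same subject and same prompt
// after trimming, lowercasing, and whitespace-collapsing, keeping the first.
function dedupeQuestions(queue: QueuedQuestion[]): QueuedQuestion[] {
  const seen = new Set<string>();
  const result: QueuedQuestion[] = [];
  for (const q of queue) {
    const normalized = q.prompt.trim().toLowerCase().replace(/\s+/g, " ");
    const key = `${q.subjectType}:${q.subjectId}:${normalized}`;
    if (seen.has(key)) continue;
    seen.add(key);
    result.push(q);
  }
  return result;
}
```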
Implementation Plan (vector clock)
- Contract + schema
  - Add canonical Q&A/decision schema in Convex.
  - Add `RubricAssessment` schema for score + reasoning snapshots (including freshness state + gate failures).
  - Add `SteeringSignal` schema for curated input records with signal category + freshness timestamps.
  - Persist active freshness-SLA values with each assessment snapshot for auditability.
- Event contracts
  - Define `question.asked`, `question.answered`, `decision.recorded`, `decision.applied`.
  - Add `adr.created`, `adr.updated`, `adr.status.changed`, `adr.rubric.reranked`.
- Rerank workflow (event-triggered)
  - New Inngest function reranks affected ADRs on lifecycle events.
  - Persist trigger metadata and a reasoning snapshot every run.
  - Emit operator-review items only on meaningful state changes.
- Daily review workflow (batch backstop)
  - New Inngest function for full open-ADR ranking + enrichment + AM queue generation.
  - Refresh unanswered items with current context before re-queueing.
- Web review surface (Phase 1)
  - Authenticated dashboard for outstanding/unanswered questions.
  - Approval actions that emit durable decision events.
- Cross-channel capture
  - Ensure all channel MCQ interactions persist through shared contracts.
  - Link every answer to ADR/workload targets.
- Typesense projection
  - Index Q&A/decision/rubric reasoning artifacts with ADR linkage for semantic retrieval.
- Workload scheduler integration
  - Only schedule autonomous improvement workloads when the gate predicate is satisfied.
  - Enforce per-category freshness SLAs; stale or missing required categories hard-block autonomous execution.
- Mobile adapter (Phase 2)
  - Native app surface for queue review and MCQ decisions on the same backend contracts.
- VOIP steering (Phase 3)
  - Add voice control/review loops after the dashboard + MCQ flow proves stable.
Verification
- Every MCQ asked in any channel creates a canonical `Question` record.
- Every answer creates a canonical `Answer` record linked to an ADR/workload.
- New ADR create/update/status-change events trigger a rerank for affected ADRs.
- Every rerank writes a `RubricAssessment` record with trigger + reasoning + evidence refs.
- The daily ADR review run produces an enrichment queue with deterministic IDs.
- Unanswered queue items are refreshed (not dropped) with new context.
- `autoEligible` is false unless the approval + confidence + agent-ready + freshness gates all pass.
- Freshness failures are persisted with deterministic gate-failure codes and surfaced in review queues.
- The scheduler refuses autonomous improvement work when required steering-signal freshness SLAs fail.
- The web review UI can approve/reject and writes durable decision events.
- Typesense queries can retrieve ADRs with linked Q&A/decision/rubric reasoning context.
- The mobile adapter can consume and answer the same queue without schema forks.
Out of Scope (for this ADR)
- Final UI aesthetics and component-level design choices.
- Provider-specific VOIP transport implementation details.
- Replacing existing ADR ranking policy (this ADR extends ADR-0183; it does not replace it).