Persisted Q&A Context Resources and Human-Gated Improvement Loop
- Status: proposed
- Date: 2026-03-01
- Deciders: Joel, Panda
- Relates to: ADR-0106, ADR-0123, ADR-0168, ADR-0174, ADR-0183
Context
The ADR ranking rubric exists (ADR-0183), but daily review is still mostly manual.
Current gaps:
- MCQ/Q&A exchanges are useful context, but not persisted as first-class resources across channels.
- ADRs missing rubric compliance are not automatically enriched with targeted questions.
- New ADRs do not trigger immediate rerank/evidence refresh against the rubric.
- There is no unified operator AM review queue that ties unanswered questions to ADR decisions.
- Approvals are scattered in chat logs instead of explicit decision records linked to ADRs.
- A web UI exists and should be the first venue, but mobile steering/dashboard workflows are becoming high-value (including future VOIP control loops).
Goal: build a durable, channel-agnostic Q&A decision substrate that improves ADR quality daily while keeping execution gated by explicit human approval.
Decision
1) Q&A becomes a core joelclaw resource type
Persist every meaningful question/answer exchange as canonical resources in Convex, then project to Typesense.
Core entities:
- Question (`mcq`, `approval`, `clarification`, `risk`, `policy`)
- Answer (selected option + freeform rationale + provenance)
- Decision Link (binds Q&A to an ADR/workload/event)
Required linkage fields:
- `subjectType` (`adr`, `story`, `incident`, `task`, `run`)
- `subjectId` (e.g. `0183`, a run-id, a task-id)
- `channel` (`web`, `telegram`, `slack`, `discord`, `imessage`, `mobile`)
- `sessionId` (if applicable)
- `traceId` / correlation IDs when available
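The canonical entities and linkage fields above might be sketched as TypeScript shapes. All names here are illustrative assumptions, not the final Convex schema:

```typescript
// Hypothetical shapes for the canonical Q&A entities (field names are
// illustrative, not the actual Convex table definitions).
type QuestionKind = "mcq" | "approval" | "clarification" | "risk" | "policy";
type SubjectType = "adr" | "story" | "incident" | "task" | "run";
type Channel = "web" | "telegram" | "slack" | "discord" | "imessage" | "mobile";

interface DecisionLinkage {
  subjectType: SubjectType;
  subjectId: string; // e.g. "0183", a run-id, a task-id
  channel: Channel;
  sessionId?: string;
  traceId?: string;
}

interface Question extends DecisionLinkage {
  id: string;
  kind: QuestionKind;
  prompt: string;
  options?: string[]; // present for MCQs
}

interface Answer extends DecisionLinkage {
  questionId: string;
  selectedOption?: string;
  rationale: string; // freeform rationale
  answeredBy: string; // operator/audit provenance
}

// Reject records that are missing required linkage before persisting.
function hasRequiredLinkage(link: DecisionLinkage): boolean {
  return Boolean(link.subjectType && link.subjectId && link.channel);
}
```

A channel adapter would build the `DecisionLinkage` once and attach it to both the `Question` and its eventual `Answer`, which is what makes cross-channel queues (section 6) possible.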
2) Daily ADR review becomes a durable Inngest workflow
Add a daily function that:
- ranks open ADRs (`proposed`, `accepted`) using the ADR-0183 rubric,
- identifies non-compliance/drift/missing rationale,
- generates targeted enrichment questions,
- queues AM operator review items,
- refreshes unanswered items with fresh context snapshots,
- writes all outputs as persisted Q&A resources.
Unanswered items are not dropped; they are re-evaluated and re-queued with updated evidence.
3) ADR lifecycle events trigger immediate rerank
A second workflow runs on ADR lifecycle changes:
- `adr.created`
- `adr.updated`
- `adr.status.changed`
- `question.answered` (when the answer affects rubric axes or gates)
Behavior:
- re-score the affected ADR immediately,
- persist an assessment snapshot with trigger metadata,
- enqueue operator review only when score/band/gate state changes.
The daily run remains the global backstop; event rerank handles fresh decisions in near real time.
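The "enqueue operator review only on meaningful change" rule can be made concrete as a pure comparison between the previous and freshly computed assessment snapshots. The snapshot shape and function name are assumptions for illustration:

```typescript
// Minimal snapshot of the state the rerank workflow compares.
// Field names are assumptions mirroring the assessment record in section 4.
interface AssessmentSnapshot {
  score: number;
  band: string; // e.g. "do-now"
  gateFailures: string[]; // deterministic gate-failure codes, stable order
}

// Queue operator review only when score, band, or gate state changed
// between the previous assessment and the fresh one.
function shouldEnqueueReview(
  prev: AssessmentSnapshot | null,
  next: AssessmentSnapshot,
): boolean {
  if (prev === null) return true; // first assessment is always reviewable
  const gatesChanged =
    prev.gateFailures.length !== next.gateFailures.length ||
    prev.gateFailures.some((code, i) => code !== next.gateFailures[i]);
  return prev.score !== next.score || prev.band !== next.band || gatesChanged;
}
```

Keeping this predicate pure makes the "no meaningful change, no review item" behavior trivially testable, independent of the Inngest wiring.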
4) Persist rubric reasoning as first-class evidence
Every scoring pass writes a durable RubricAssessment record.
Required fields:
- `adrId`, `assessedAt`, `assessedBy`, `trigger`, `traceId`
- axes: `need`, `readiness`, `confidence`, `novelty`, `agentReady`
- derived: `score`, `band`, `drift`, `compliance`
- gates: `autoEligible`, `approvalState`, `freshnessState`, `gateFailures`
- reasoning: `summary`, `assumptions`, `counterarguments`, `risks`
- evidence refs: linked incidents/runs/commits/questions/notes used for scoring
ADR frontmatter remains the compact execution summary (priority-* fields). Rich reasoning stays in Convex and is indexed in Typesense.
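One way to keep the frontmatter and the rich record in lockstep is to derive the compact priority-* fields from the durable assessment rather than maintaining them by hand. The record shape and the projection function below are a sketch under that assumption:

```typescript
// Possible shape for the durable RubricAssessment record (field names
// follow the lists above; the exact schema is illustrative).
interface RubricAssessment {
  adrId: string;
  assessedAt: string; // ISO timestamp
  assessedBy: string;
  trigger: string; // e.g. "adr.updated", "daily-review"
  traceId?: string;
  axes: { need: number; readiness: number; confidence: number; novelty: number; agentReady: number };
  derived: { score: number; band: string; drift: boolean; compliance: boolean };
  gates: { autoEligible: boolean; approvalState: string; freshnessState: "fresh" | "stale"; gateFailures: string[] };
  reasoning: { summary: string; assumptions: string[]; counterarguments: string[]; risks: string[] };
  evidenceRefs: string[]; // linked incidents/runs/commits/questions/notes
}

// Project the rich record down to the compact priority-* frontmatter summary.
// The field names match the gate predicate in section 5.
function toFrontmatter(a: RubricAssessment): Record<string, string | number> {
  return {
    "priority-band": a.derived.band,
    "priority-confidence": a.axes.confidence,
    "priority-agent-ready": a.axes.agentReady,
    "approval-state": a.gates.approvalState,
    "freshness-state": a.gates.freshnessState,
  };
}
```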
5) Human approval remains a hard execution gate
System-improvement work stays gated by explicit approval decisions.
Auto-eligibility predicate:
priority-band == do-now
AND priority-confidence >= 4
AND priority-agent-ready >= 4
AND approval-state == approved
AND freshness-state == fresh
`priority-agent-ready` is gate-only (not part of the ranking score).
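The predicate above transcribes directly into a scheduler check. The helper below is a sketch, not the scheduler's actual code; the frontmatter field names come from the predicate itself:

```typescript
// The priority-* frontmatter fields the gate predicate reads.
interface PriorityFrontmatter {
  "priority-band": string;
  "priority-confidence": number;
  "priority-agent-ready": number;
  "approval-state": string;
  "freshness-state": string;
}

// Literal transcription of the auto-eligibility predicate: every clause
// must hold, otherwise the workload stays human-gated.
function isAutoEligible(fm: PriorityFrontmatter): boolean {
  return (
    fm["priority-band"] === "do-now" &&
    fm["priority-confidence"] >= 4 &&
    fm["priority-agent-ready"] >= 4 &&
    fm["approval-state"] === "approved" &&
    fm["freshness-state"] === "fresh"
  );
}
```

Because the predicate is a pure conjunction over frontmatter, it stays machine-checkable without consulting the rich Convex record.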
6) Channel persistence is mandatory
All MCQ/Q&A flows across all channels write to the same canonical store.
Implication:
- A decision made in Telegram is visible in web/mobile review queues.
- A web decision can unblock queued work in system-bus.
- Channel is a delivery adapter, not a data silo.
7) Web first, mobile explicit Phase 2 adapter
- Phase 1 venue: authenticated web UI for outstanding questions, approvals, and AM review queue.
- Phase 2 venue: native mobile adapter (same backend/contracts) for MCQ/dashboard/review.
- VOIP steering: Phase 3, only after MCQ approval loops are stable.
8) Treat MCQ as a survey/quiz-class primitive
Adopt persisted Q&A as a first-class product primitive in joelclaw (conceptually aligned with survey/quiz structures used in course-builder/ai-hero), but with joelclaw-specific governance fields:
- approval category
- ADR/workload linkage
- operator/audit provenance
- automation gate state
9) Curated steering-input pack becomes mandatory
Autonomous improvement proposals must be grounded in curated steering inputs, not single-source vibes.
Minimum steering categories:
- Outcome signals — incidents, failures, regressions, successful recoveries.
- Execution signals — loop throughput, retries, skipped work, test stability.
- Human signals — approvals, rejections, overrides, comments.
- Knowledge signals — ADR supersessions, discoveries, dependency shifts.
- Risk signals — blast-radius estimates, reversibility, rollback quality.
Each steering input must include: provenance, freshness timestamp, confidence weight, and conflict markers.
10) Freshness SLA contract is a hard autonomy gate
Steering inputs age out. When evidence is stale, autonomy stops.
Default freshness windows are 7/7/7/14/30 days:
| Category | Max age |
|---|---|
| Outcome signals | 7 days |
| Execution signals | 7 days |
| Risk signals | 7 days |
| Human signals | 14 days |
| Knowledge signals | 30 days |
Contract:
- every `SteeringSignal` stores `capturedAt` and computed age at assessment time,
- every rerank computes `freshnessState` (`fresh` | `stale`) plus per-category freshness failures,
- if any required category is stale or missing, the freshness gate fails.
Failure behavior:
- set `autoEligible = false`,
- append deterministic gate failures (for example: `freshness.outcome.stale`, `freshness.human.missing`),
- enqueue targeted refresh questions/tasks for stale categories,
- continue scoring/banding, but the scheduler must refuse autonomous execution until freshness is restored.
Monitoring + calibration:
- freshness windows are configurable per environment and persisted with each `RubricAssessment`,
- emit OTEL events for freshness failures and blocked-execution reasons,
- calibrate thresholds from observed false-positive/false-negative blocks.
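The freshness contract above can be evaluated as a pure function over the steering pack. Window values are the defaults from the table; the category names and failure-code format follow the examples in the failure-behavior list, while the function and type names are assumptions:

```typescript
// Steering-signal categories and their default max-age windows (days),
// matching the freshness table in section 10.
type SignalCategory = "outcome" | "execution" | "risk" | "human" | "knowledge";

const DEFAULT_MAX_AGE_DAYS: Record<SignalCategory, number> = {
  outcome: 7,
  execution: 7,
  risk: 7,
  human: 14,
  knowledge: 30,
};

interface SteeringSignal {
  category: SignalCategory;
  capturedAt: Date;
}

// Returns deterministic gate-failure codes such as "freshness.outcome.stale"
// or "freshness.human.missing"; an empty array means the freshness gate passes.
function freshnessFailures(
  signals: SteeringSignal[],
  now: Date,
  maxAgeDays: Record<SignalCategory, number> = DEFAULT_MAX_AGE_DAYS,
): string[] {
  const failures: string[] = [];
  const dayMs = 24 * 60 * 60 * 1000;
  for (const category of Object.keys(maxAgeDays) as SignalCategory[]) {
    const inCategory = signals.filter((s) => s.category === category);
    if (inCategory.length === 0) {
      failures.push(`freshness.${category}.missing`);
      continue;
    }
    // Only the newest signal per category has to be within the window.
    const newest = Math.max(...inCategory.map((s) => s.capturedAt.getTime()));
    if ((now.getTime() - newest) / dayMs > maxAgeDays[category]) {
      failures.push(`freshness.${category}.stale`);
    }
  }
  return failures;
}
```

A rerank would persist the returned codes in `gateFailures`, set `freshnessState` to `stale` whenever the array is non-empty, and pass the window map it actually used into the assessment snapshot for auditability.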
Consequences
Good
- Decisions become queryable, replayable, and auditable.
- Daily ADR review can continuously improve data quality.
- Human gate is explicit and machine-checkable.
- Web and mobile share one control-plane model.
- Typesense retrieval can include both ADR text and decision history.
Tradeoffs
- More schema + indexing complexity.
- Requires strict event correlation discipline.
- Poor question design can create operator fatigue.
Risks
- Over-questioning can stall momentum.
- The approval gate could become a bottleneck if queue quality is bad.
Mitigation: keep MCQ packs short, dedupe similar questions, and enforce freshness-driven batching.
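The dedupe half of that mitigation can be sketched as a keep-first pass keyed on subject plus normalized prompt text. The normalization rule here is an assumption; real matching might be semantic rather than textual:

```typescript
interface QueuedQuestion {
  subjectType: string;
  subjectId: string;
  prompt: string;
}

// Collapse near-identical queued questions: same subject and same prompt
// after trimming, lowercasing, and whitespace-collapsing, keeping the first.
function dedupeQuestions(queue: QueuedQuestion[]): QueuedQuestion[] {
  const seen = new Set<string>();
  const result: QueuedQuestion[] = [];
  for (const q of queue) {
    const normalized = q.prompt.trim().toLowerCase().replace(/\s+/g, " ");
    const key = `${q.subjectType}:${q.subjectId}:${normalized}`;
    if (seen.has(key)) continue;
    seen.add(key);
    result.push(q);
  }
  return result;
}
```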
Implementation Plan (vector clock)
- Contract + schema
  - Add canonical Q&A/decision schema in Convex.
  - Add `RubricAssessment` schema for score + reasoning snapshots (including freshness state + gate failures).
  - Add `SteeringSignal` schema for curated input records with signal category + freshness timestamps.
  - Persist active freshness-SLA values with each assessment snapshot for auditability.
- Event contracts
  - Define `question.asked`, `question.answered`, `decision.recorded`, `decision.applied`.
  - Add `adr.created`, `adr.updated`, `adr.status.changed`, `adr.rubric.reranked`.
- Rerank workflow (event-triggered)
  - New Inngest function reranks affected ADRs on lifecycle events.
  - Persist trigger metadata and a reasoning snapshot every run.
  - Emit operator-review items only on meaningful state changes.
- Daily review workflow (batch backstop)
  - New Inngest function for full open-ADR ranking + enrichment + AM queue generation.
  - Refresh unanswered items with current context before re-queueing.
- Web review surface (Phase 1)
  - Authenticated dashboard for outstanding/unanswered questions.
  - Approval actions that emit durable decision events.
- Cross-channel capture
  - Ensure all channel MCQ interactions persist through shared contracts.
  - Link every answer to ADR/workload targets.
- Typesense projection
  - Index Q&A/decision/rubric reasoning artifacts with ADR linkage for semantic retrieval.
- Workload scheduler integration
  - Only schedule autonomous improvement workloads when the gate predicate is satisfied.
  - Enforce per-category freshness SLAs; stale or missing required categories hard-block autonomous execution.
- Mobile adapter (Phase 2)
  - Native app surface for queue review and MCQ decisions on the same backend contracts.
- VOIP steering (Phase 3)
  - Add voice control/review loops after the dashboard + MCQ flow proves stable.
Verification
- Every MCQ asked in any channel creates a canonical `Question` record.
- Every answer creates a canonical `Answer` record linked to an ADR/workload.
- New ADR create/update/status-change events trigger a rerank for affected ADRs.
- Every rerank writes a `RubricAssessment` record with trigger + reasoning + evidence refs.
- The daily ADR review run produces an enrichment queue with deterministic IDs.
- Unanswered queue items are refreshed (not dropped) with new context.
- `autoEligible` is false unless the approval + confidence + agent-ready + freshness gates all pass.
- Freshness failures are persisted with deterministic gate-failure codes and surfaced in review queues.
- The scheduler refuses autonomous improvement work when required steering-signal freshness SLAs fail.
- The web review UI can approve/reject and writes durable decision events.
- Typesense queries can retrieve ADRs with linked Q&A/decision/rubric reasoning context.
- The mobile adapter can consume and answer the same queue without schema forks.
Out of Scope (for this ADR)
- Final UI aesthetics and component-level design choices.
- Provider-specific VOIP transport implementation details.
- Replacing existing ADR ranking policy (this ADR extends ADR-0183; it does not replace it).