ADR-0247 (accepted)

Session Signal Mining

Status

accepted — 2026-05-21. Start with a deterministic CLI slice; do not build the full pipeline until the signal quality earns it.

Context

joelclaw sessions search is useful for recovery, but it is not yet a pain radar. It can find session paths, chunks, and raw matches, but keyword search alone returns noisy hits:

  • copied source material that contains the searched phrase
  • current-session echo
  • old tweet/archive text
  • irrelevant keyword collisions
  • preview-only hits with no useful matches
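
Two of these noise classes are visible from hit metadata alone, before anyone reads content. The sketch below drops them. It assumes a hypothetical hit shape with sessionId, path, and matches fields; the real joelclaw sessions search JSON may differ. Copied source material, archive text, and keyword collisions need content-level analysis and are deliberately left alone.

```typescript
// Hypothetical hit shape; the real `joelclaw sessions search` output may differ.
interface SearchHit {
  sessionId: string;
  path: string;
  matches: string[]; // matched lines; empty for preview-only hits
}

// Drops current-session echo and preview-only hits. Content-level noise
// (copied sources, archive text, keyword collisions) is not handled here.
function dropMetadataNoise(hits: SearchHit[], currentSessionId: string): SearchHit[] {
  return hits.filter(
    (hit) =>
      hit.sessionId !== currentSessionId && // current-session echo
      hit.matches.length > 0, // preview-only hit
  );
}

const hits: SearchHit[] = [
  { sessionId: "s-now", path: "a.jsonl", matches: ["stop doing that"] },
  { sessionId: "s-old", path: "b.jsonl", matches: [] },
  { sessionId: "s-old", path: "c.jsonl", matches: ["stop doing that"] },
];
const kept = dropMetadataNoise(hits, "s-now");
```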

Joel wants to mine session logs for recurring usage patterns, corrections, frustrations, preferences, and decisions across Machines and users. Friction is one important class, but the broader concept is a session signal.

Decision

Add a session signal mining layer over raw session transcripts.

A session signal is a transcript turn worth extracting because it reveals a preference, correction, decision, failure, workflow pattern, approval, repair request, or other durable operating signal.
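
A type sketch can make the definition concrete. The kind slugs below are hypothetical names derived from the signal classes this ADR lists, and isSignalKind is an illustrative helper for validating a --kind value before any transcript is read; none of these names are part of an existing joelclaw API.

```typescript
// Hypothetical kind slugs, taken from the signal classes named in this ADR.
const SIGNAL_KINDS = [
  "friction",
  "preference",
  "correction",
  "decision",
  "failure",
  "workflow",
  "approval",
  "repair",
  "praise",
] as const;

type SignalKind = (typeof SIGNAL_KINDS)[number];

// Rejects unknown --kind values up front.
function isSignalKind(value: string): value is SignalKind {
  return (SIGNAL_KINDS as readonly string[]).indexOf(value) !== -1;
}

// One extracted signal: a user turn plus bounded evidence,
// addressed by exact transcript path and line.
interface SessionSignal {
  kind: SignalKind;
  path: string;
  line: number; // 1-based line of the user turn in the raw transcript
  text: string;
  context: { line: number; role: "assistant" | "tool"; text: string }[];
}

// Illustrative value only; the path is made up.
const example: SessionSignal = {
  kind: "correction",
  path: "sessions/example.jsonl",
  line: 42,
  text: "no, use the raw transcript",
  context: [],
};
```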

Signal mining starts as CLI-first deterministic analysis:

joelclaw sessions signals --kind friction --source local --since 14d
joelclaw sessions friction # alias for --kind friction
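
The alias can stay narrow by expanding it into the general command before parsing, so only one code path exists. This is a hand-rolled sketch; the CLI framework joelclaw actually uses is not specified here, and sinceToMs, expandAlias, and parseSignalsArgs are hypothetical names. Rejecting a second --kind keeps `sessions friction` from being quietly widened into another kind.

```typescript
// Hypothetical parsed form of `sessions signals` flags.
interface SignalsQuery {
  kind?: string;
  source?: string;
  since?: string;
}

// Converts a --since value such as "14d", "12h", or "30m" into milliseconds.
function sinceToMs(value: string): number {
  const match = /^(\d+)([dhm])$/.exec(value);
  if (!match) throw new Error(`unsupported --since value: ${value}`);
  const unitMs = match[2] === "d" ? 86400000 : match[2] === "h" ? 3600000 : 60000;
  return Number(match[1]) * unitMs;
}

// `sessions friction` is only an alias for `sessions signals --kind friction`.
function expandAlias(argv: string[]): string[] {
  if (argv[0] === "sessions" && argv[1] === "friction") {
    return ["sessions", "signals", "--kind", "friction", ...argv.slice(2)];
  }
  return argv;
}

function parseSignalsArgs(argv: string[]): SignalsQuery {
  const args = expandAlias(argv);
  if (args[0] !== "sessions" || args[1] !== "signals") {
    throw new Error("expected `sessions signals` or `sessions friction`");
  }
  const query: SignalsQuery = {};
  for (let i = 2; i < args.length; i += 2) {
    const flag = args[i];
    const value = args[i + 1];
    if (value === undefined) throw new Error(`missing value for ${flag}`);
    if (flag === "--kind") {
      // A second --kind would silently widen the friction alias.
      if (query.kind !== undefined) throw new Error("--kind given more than once");
      query.kind = value;
    } else if (flag === "--source") {
      query.source = value;
    } else if (flag === "--since") {
      sinceToMs(value); // validate early, keep the raw string
      query.since = value;
    } else {
      throw new Error(`unknown flag: ${flag}`);
    }
  }
  return query;
}
```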

The first implementation must:

  1. Prefer raw transcripts over Typesense snippets for signal extraction.
  2. Treat user turns as the primary signal source in v1.
  3. Include assistant/tool turns only as bounded evidence context.
  4. Filter by signal kind (friction, preference, decision, praise, etc.).
  5. Emit JSON envelopes with exact transcript paths and line numbers.
  6. Keep deterministic clusters and evidence before any LLM synthesis.
  7. Preserve sessions friction as a narrow alias over the broader sessions signals abstraction.
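
Requirements 1 through 5 fit in one deterministic pass over a raw transcript. The sketch assumes a hypothetical JSONL format with one {role, text} object per line; real session files will need their own adapter. Only user turns can become signals, nearby assistant and tool lines are attached as truncated evidence, and every envelope carries the exact path and 1-based line number. The radius and truncation limits are illustrative defaults, not tuned values.

```typescript
type Role = "user" | "assistant" | "tool";

// Hypothetical transcript record: one JSON object per line.
interface Turn {
  role: Role;
  text: string;
}

interface Evidence {
  line: number;
  role: Role;
  text: string;
}

interface SignalEnvelope {
  kind: string;
  path: string;
  line: number; // 1-based line number in the raw transcript file
  text: string; // the user turn, which is the signal itself
  context: Evidence[]; // bounded assistant/tool evidence only
}

function extractSignals(
  path: string,
  transcript: string,
  kind: string,
  isSignal: (userText: string) => boolean,
  radius = 2, // hypothetical: lines scanned on each side for context
  maxContextChars = 200, // hypothetical: truncation keeps evidence bounded
): SignalEnvelope[] {
  // A malformed line throws here; a fuller version would report it instead.
  const turns: (Turn | null)[] = transcript
    .split("\n")
    .map((line) => (line.trim() ? (JSON.parse(line) as Turn) : null));
  const envelopes: SignalEnvelope[] = [];
  turns.forEach((turn, i) => {
    // v1 reads signals from user turns only.
    if (!turn || turn.role !== "user" || !isSignal(turn.text)) return;
    const context: Evidence[] = [];
    const last = Math.min(turns.length - 1, i + radius);
    for (let j = Math.max(0, i - radius); j <= last; j++) {
      const nearby = turns[j];
      // Skips blank lines and user turns, including the signal turn itself.
      if (!nearby || nearby.role === "user") continue;
      context.push({ line: j + 1, role: nearby.role, text: nearby.text.slice(0, maxContextChars) });
    }
    envelopes.push({ kind, path, line: i + 1, text: turn.text, context });
  });
  return envelopes;
}

const transcript = [
  JSON.stringify({ role: "assistant", text: "Here is a generic summary." }),
  JSON.stringify({ role: "user", text: "no, that is wrong, read the raw log" }),
  JSON.stringify({ role: "tool", text: "read sessions/raw.jsonl" }),
].join("\n");

const signals = extractSignals("sessions/example.jsonl", transcript, "friction", (t) => /\bwrong\b/i.test(t));
console.log(JSON.stringify(signals, null, 2));
```

Because the output is a plain array of envelopes, clustering and any later LLM synthesis can consume it without re-reading transcripts.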

Signal language

Joel’s language uses profanity as meaning-bearing emphasis, not noise.

Rules:

  • fuck, fucking, and fuckin are strong emphasis signals.
  • They are not automatically anger.
  • When near correction or critique language, they boost friction severity.
  • When near praise or approval (fuck yeah, fuckin love), they indicate positive preference.
  • ShitRat is agent identity, not friction.
  • Insults aimed at output or process (generic bullshit, dogshit, trash, sludge) are high-confidence friction.
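
These rules can be applied deterministically to a single user turn. In the sketch below, "near" is approximated as "in the same turn", and the critique and praise cue lists are small hypothetical placeholders rather than a vetted lexicon. Readings are categorical (normal or high, normal or strong) because numeric confidence is a v1 non-goal. Removing ShitRat first guarantees that no pattern, current or future, can fire on the agent's name.

```typescript
// Categorical readings only; no numeric confidence.
type Reading =
  | { kind: "friction"; severity: "normal" | "high" }
  | { kind: "preference"; strength: "normal" | "strong" }
  | { kind: "none" };

const EMPHASIS = /\bfuck(?:ing|in)?\b/i;
// Insults aimed at output or process, from the rules above.
const INSULT = /\b(?:bullshit|dogshit|trash|sludge)\b/i;
// Hypothetical, deliberately small cue lists.
const CRITIQUE = /\b(?:wrong|broken|stop|again|don't)\b/i;
const PRAISE = /\b(?:yeah|love|perfect|nice)\b/i;

function readTurn(text: string): Reading {
  // ShitRat is agent identity; strip it before any pattern sees the turn.
  const turn = text.replace(/\bshitrat\b/gi, " ");
  if (INSULT.test(turn)) return { kind: "friction", severity: "high" };
  const emphasized = EMPHASIS.test(turn);
  // Emphasis near critique boosts friction severity.
  if (CRITIQUE.test(turn)) return { kind: "friction", severity: emphasized ? "high" : "normal" };
  // Emphasis near praise or approval marks a strong positive preference.
  if (PRAISE.test(turn)) return { kind: "preference", strength: emphasized ? "strong" : "normal" };
  // Emphasis with no critique or praise cue is not read as anger.
  return { kind: "none" };
}
```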

Memory promotion

Raw signal hits must not automatically become memory.

Memory promotion is allowed only after analysis produces derived reusable guidance. Future --remember behavior should stage proposals by default and write directly only for high-severity, clean-evidence patterns.

A memory-worthy analysis must include:

  1. a recurring cluster or one severe correction,
  2. clear actor/source attribution,
  3. evidence line pointers,
  4. reusable guidance rather than raw transcript dumps,
  5. no secrets or excessive copied transcript text.
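
The criteria above, together with the stage-by-default policy for --remember, reduce to a small gate. This is a sketch under stated assumptions: attribution is read as requiring both an actor and a source, "clean evidence" is interpreted as every clustered signal having a line pointer, a direct write is reserved for recurring high-severity clusters, and the secret pattern and copied-text limit are hypothetical placeholders.

```typescript
interface MemoryCandidate {
  clusterSize: number; // signals sharing this deterministic cluster
  severity: "normal" | "high";
  actor?: string; // who produced the signals
  source?: string; // machine or log they came from
  evidence: { path: string; line: number }[];
  guidance: string; // derived, reusable rule text
  copiedChars: number; // raw transcript characters carried into the guidance
}

type Promotion = "reject" | "stage" | "write";

// Hypothetical limits and patterns; tune against real data.
const MAX_COPIED_CHARS = 280;
const SECRET_LIKE = /(?:api[_-]?key|secret|token|password)\s*[:=]|\bsk-[A-Za-z0-9]{20,}/i;

function promotionFor(c: MemoryCandidate): Promotion {
  const recurringOrSevere = c.clusterSize >= 2 || c.severity === "high"; // criterion 1
  const attributed = Boolean(c.actor) && Boolean(c.source); // criterion 2
  const pointed = c.evidence.length > 0; // criterion 3
  const reusable = c.guidance.trim().length > 0 && c.copiedChars <= MAX_COPIED_CHARS; // criteria 4, 5
  const clean = !SECRET_LIKE.test(c.guidance); // criterion 5
  if (!(recurringOrSevere && attributed && pointed && reusable && clean)) return "reject";
  // Stage by default; write directly only for a recurring high-severity
  // pattern where every clustered signal has an evidence pointer.
  const cleanEvidence = c.evidence.length >= c.clusterSize;
  return c.severity === "high" && c.clusterSize >= 2 && cleanEvidence ? "write" : "stage";
}

// Illustrative candidate; the guidance text is made up.
const base: MemoryCandidate = {
  clusterSize: 3,
  severity: "high",
  actor: "joel",
  source: "local",
  evidence: [
    { path: "sessions/a.jsonl", line: 12 },
    { path: "sessions/b.jsonl", line: 40 },
    { path: "sessions/c.jsonl", line: 7 },
  ],
  guidance: "Do not pad answers with generic summaries.",
  copiedChars: 40,
};
```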

Consequences

  • Session search remains a flashlight.
  • Session signals become the pain/preference radar.
  • The system gains a safer path from raw Runs to operational learning.
  • Signal quality can be improved incrementally without poisoning memory.

Non-goals for v1

  • No autonomous daily pipeline yet.
  • No raw transcript memory dumps.
  • No Typesense-only signal classification.
  • No numeric fake-confidence scoring.
  • No cross-user identity certainty unless transcript metadata provides it.