Session Signal Mining
Status
accepted — 2026-05-21. Start with a deterministic CLI slice; do not build the full pipeline until the signal quality earns it.
Context
joelclaw sessions search is useful for recovery, but it is not yet a pain radar. It can find session paths, chunks, and raw matches, but keyword search alone returns noisy hits:
- copied source material that contains the searched phrase
- current-session echo
- old tweet/archive text
- irrelevant keyword collisions
- preview-only hits with no useful matches
Joel wants to mine session logs for recurring usage patterns, corrections, frustrations, preferences, and decisions across Machines and users. Friction is one important class, but the broader concept is a session signal.
Decision
Add a session signal mining layer over raw session transcripts.
A session signal is a transcript turn worth extracting because it reveals a preference, correction, decision, failure, workflow pattern, approval, repair request, or other durable operating signal.
Signal mining starts as CLI-first deterministic analysis:
joelclaw sessions signals --kind friction --source local --since 14d
joelclaw sessions friction # alias for --kind frictionThe first implementation must:
- Prefer raw transcripts over Typesense snippets for signal extraction.
- Treat user turns as the primary signal source in v1.
- Include assistant/tool turns only as bounded evidence context.
- Filter by signal kind (
friction,preference,decision,praise, etc.). - Emit JSON envelopes with exact transcript paths and line numbers.
- Keep deterministic clusters and evidence before any LLM synthesis.
- Preserve
sessions frictionas a narrow alias over the broadersessions signalsabstraction.
Signal language
Joel’s language uses profanity as meaning-bearing emphasis, not noise.
Rules:
fuck,fucking, andfuckinare strong emphasis signals.- They are not automatically anger.
- When near correction or critique language, they boost friction severity.
- When near praise or approval (
fuck yeah,fuckin love), they indicate positive preference. ShitRatis agent identity, not friction.- Insults aimed at output or process (
generic bullshit,dogshit,trash,sludge) are high-confidence friction.
Memory promotion
Raw signal hits must not automatically become memory.
Memory promotion is allowed only after analysis produces derived reusable guidance. Future --remember behavior should stage proposals by default and write directly only for high-severity, clean-evidence patterns.
A memory-worthy analysis must include:
- a recurring cluster or one severe correction,
- clear actor/source attribution,
- evidence line pointers,
- reusable guidance rather than raw transcript dumps,
- no secrets or excessive copied transcript text.
Consequences
- Session search remains a flashlight.
- Session signals become the pain/preference radar.
- The system gains a safer path from raw Runs to operational learning.
- Signal quality can be improved incrementally without poisoning memory.
Non-goals for v1
- No autonomous daily pipeline yet.
- No raw transcript memory dumps.
- No Typesense-only signal classification.
- No numeric fake-confidence scoring.
- No cross-user identity certainty unless transcript metadata provides it.