ADR-0188 (shipped)

Gateway Channel Muting and Known Issues

Status: Accepted
Date: 2026-03-01
Relates to: ADR-0062 (Heartbeat fan-out), ADR-0090 (O11y triage with streak suppression)

Context

The gateway health monitor probes all channels (Telegram, Discord, iMessage, Slack) every cycle. When a channel has a known issue, such as iMessage requiring a manual Full Disk Access (FDA) re-grant, the monitor generates repeated degradation alerts that add noise without actionable information. The streak/cooldown system reduces how often these alerts fire but does not eliminate them for issues the operator has already acknowledged.

Decision

Add a muted channels concept to the gateway health check:

  1. Redis state: gateway:health:muted-channels (JSON array of channel IDs) and gateway:health:mute-reasons (JSON object mapping channel ID → reason string).
  2. Health check behavior: Muted channels are still probed and logged in OTEL for observability, but are excluded from the alert path. No Telegram notifications for muted channels.
  3. CLI interface: joelclaw gateway mute <channel> [--reason "..."], joelclaw gateway unmute <channel>, joelclaw gateway known-issues — all returning HATEOAS JSON envelopes.
  4. No TTL on mutes: Mutes persist until explicitly unmuted. The known-issues command surfaces what’s muted so nothing is forgotten.
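
The state model and alert filter described above could be sketched roughly as follows. This is a minimal illustration, not the shipped code: only the two Redis key names come from this ADR, while the type names (RawMuteState, MuteState, ProbeResult) and functions (parseMuteState, serializeMuteState, mute, unmute, selectAlertable) are hypothetical. Redis I/O is deliberately left out, so the functions operate on the raw string values a caller would read from or write to the two keys.

```typescript
// Key names are taken from the ADR; everything else here is illustrative.
const MUTED_CHANNELS_KEY = "gateway:health:muted-channels";
const MUTE_REASONS_KEY = "gateway:health:mute-reasons";

// Raw values as read from the two Redis keys (null if a key was never written).
interface RawMuteState {
  mutedChannels: string | null; // JSON array of channel IDs
  muteReasons: string | null; // JSON object mapping channel ID to reason
}

interface MuteState {
  muted: string[];
  reasons: Record<string, string>;
}

// Hypothetical probe result shape; the real monitor's type is not shown in the ADR.
interface ProbeResult {
  channel: string;
  healthy: boolean;
}

function parseMuteState(raw: RawMuteState): MuteState {
  return {
    muted: raw.mutedChannels ? (JSON.parse(raw.mutedChannels) as string[]) : [],
    reasons: raw.muteReasons
      ? (JSON.parse(raw.muteReasons) as Record<string, string>)
      : {},
  };
}

// Produces the key/value pairs a caller would write back to Redis.
// No TTL is involved: mutes persist until explicitly removed.
function serializeMuteState(state: MuteState): Record<string, string> {
  return {
    [MUTED_CHANNELS_KEY]: JSON.stringify(state.muted),
    [MUTE_REASONS_KEY]: JSON.stringify(state.reasons),
  };
}

// Muting is idempotent; an omitted reason leaves any existing reason in place.
function mute(state: MuteState, channel: string, reason?: string): MuteState {
  const muted = state.muted.includes(channel)
    ? state.muted
    : [...state.muted, channel];
  const reasons = { ...state.reasons };
  if (reason !== undefined) reasons[channel] = reason;
  return { muted, reasons };
}

// Unmuting removes both the channel ID and its stored reason.
function unmute(state: MuteState, channel: string): MuteState {
  const reasons = { ...state.reasons };
  delete reasons[channel];
  return { muted: state.muted.filter((c) => c !== channel), reasons };
}

// All channels are still probed and logged elsewhere; this only decides
// which unhealthy results are allowed onto the alert path.
function selectAlertable(results: ProbeResult[], state: MuteState): ProbeResult[] {
  const muted = new Set(state.muted);
  return results.filter((r) => !r.healthy && !muted.has(r.channel));
}
```

Keeping the filter separate from probing mirrors the intent of item 2: the probe loop and OTEL logging see every channel, and only the notification step consults the mute list.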

Consequences

  • Operators can acknowledge known issues without disabling monitoring entirely.
  • OTEL data remains complete — muting is a notification filter, not a probe filter.
  • The known-issues list doubles as a lightweight incident tracker for degraded subsystems.
  • Risk: a muted channel could stay muted indefinitely. Mitigated by known-issues being visible in gateway status output.
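
To illustrate how a known-issues listing can keep muted channels from being forgotten, here is a rough sketch of building such a response. The envelope shape is an assumption: this ADR only says the commands return HATEOAS JSON envelopes, so the field names (ok, command, result, next_actions) and the function buildKnownIssues are hypothetical. Only the command strings themselves come from the ADR.

```typescript
// One entry per muted channel; reason is null when none was recorded.
interface KnownIssue {
  channel: string;
  reason: string | null;
}

// Hypothetical envelope shape; the real joelclaw envelope may differ.
interface Envelope<T> {
  ok: boolean;
  command: string;
  result: T;
  next_actions: { command: string; description: string }[];
}

function buildKnownIssues(
  muted: string[],
  reasons: Record<string, string>,
): Envelope<{ issues: KnownIssue[] }> {
  const issues: KnownIssue[] = muted.map((channel) => ({
    channel,
    reason: reasons[channel] ?? null,
  }));
  return {
    ok: true,
    command: "joelclaw gateway known-issues",
    result: { issues },
    // Each muted channel links to the command that would resume its alerts,
    // so the listing itself shows the way out of a stale mute.
    next_actions: issues.map((issue) => ({
      command: `joelclaw gateway unmute ${issue.channel}`,
      description: `Resume alerts for ${issue.channel}`,
    })),
  };
}
```

Attaching an unmute action to every listed issue is one way to make the "muted indefinitely" risk visible whenever an operator reviews the list.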

2026-03-04 Reality Check

Status updated to shipped.

  • packages/cli/src/commands/gateway.ts implements gateway mute, gateway unmute, and gateway known-issues, using the Redis keys gateway:health:muted-channels and gateway:health:mute-reasons exactly as specified.
  • All three commands return HATEOAS JSON envelopes.
  • All three are listed in CLI help as joelclaw gateway {status|events|push|drain|test|restart|stream|diagnose|review|known-issues|mute|unmute}.