Local SQLite as Discord's Missing Search Layer

repo go discord sqlite fts5 local-first search archive memory sovereignty

Discord is a joelclaw gateway channel, and this pattern of mirroring it into local SQLite maps directly onto the vault/memory system's approach: own your searchable history rather than depend on platform search.

Peter Steinberger’s discrawl mirrors a Discord guild into local SQLite with FTS5 full-text search. Bot token only — no user-token hacks, so it’s ToS-compliant. The whole archive lives on your machine.

Discord’s built-in search is notoriously bad. No time-range queries, threads are hard to search, and the whole thing evaporates if you lose server access. discrawl sidesteps all of it. Run sync --full once to backfill history, then tail to stay current with live Gateway events. After that, search "panic: nil pointer" just works — locally, instantly, offline.
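To make the local search idea concrete, here is a minimal sketch of a phrase search combined with a time window, using Python's standard sqlite3 module. The table and column names (messages, messages_fts, created_at) and the search helper are hypothetical and chosen for illustration; discrawl's real schema and query path may differ. It also assumes the local SQLite build includes FTS5, which is common but not guaranteed.

```python
import sqlite3

# Hypothetical schema for illustration only; not discrawl's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id         INTEGER PRIMARY KEY,
        channel_id TEXT NOT NULL,
        author     TEXT NOT NULL,
        created_at TEXT NOT NULL,   -- ISO 8601 UTC, so string order = time order
        content    TEXT NOT NULL
    );
    -- Full-text index; rowid is kept equal to messages.id.
    CREATE VIRTUAL TABLE messages_fts USING fts5(content);
""")

sample = [
    (1, "c1", "alice", "2024-01-10T09:00:00Z",
     "deploy failed with panic: nil pointer dereference"),
    (2, "c1", "bob", "2024-03-02T14:30:00Z",
     "seeing panic: nil pointer again in the worker"),
    (3, "c2", "carol", "2024-03-05T08:15:00Z",
     "pointer to the docs, no panic"),
]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", sample)
conn.executemany(
    "INSERT INTO messages_fts (rowid, content) VALUES (?, ?)",
    [(row[0], row[4]) for row in sample],
)


def search(conn, phrase, since, until):
    """Return (created_at, author, content) rows matching phrase in [since, until)."""
    # Wrap in double quotes so FTS5 treats the input as one phrase; an unquoted
    # colon would otherwise be parsed as a column filter.
    fts_query = '"' + phrase.replace('"', '""') + '"'
    return conn.execute(
        """
        SELECT m.created_at, m.author, m.content
        FROM messages_fts
        JOIN messages AS m ON m.id = messages_fts.rowid
        WHERE messages_fts MATCH ?
          AND m.created_at >= ? AND m.created_at < ?
        ORDER BY m.created_at
        """,
        (fts_query, since, until),
    ).fetchall()


results = search(conn, "panic: nil pointer",
                 "2024-03-01T00:00:00Z", "2024-04-01T00:00:00Z")
for created_at, author, content in results:
    print(created_at, author, content)
```

Only the March message from bob comes back: the January message matches the phrase but falls outside the window, and the third message contains the words but not the phrase.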

The sql command is the sleeper feature. Arbitrary read-only SQL against the message store: select guild_id, count(*) from messages group by guild_id, mention forensics, anything. FTS5 covers the text search layer, but the full relational model is there when you need it. Channels, members, threads, roles, attachments — all normalized and queryable.
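As a rough illustration of what read-only SQL against a local archive means, the sketch below builds a throwaway database, reopens it in SQLite's read-only mode, and runs the per-guild count from above. The messages table here is a hypothetical stand-in, and opening with mode=ro is just one way to enforce read-only access; how discrawl enforces it is not stated in the source.

```python
import os
import pathlib
import sqlite3
import tempfile

# Build a throwaway archive with a hypothetical messages table.
db_path = os.path.join(tempfile.mkdtemp(), "archive.db")
rw = sqlite3.connect(db_path)
rw.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, guild_id TEXT, content TEXT)"
)
rw.executemany(
    "INSERT INTO messages (guild_id, content) VALUES (?, ?)",
    [("g1", "hello"), ("g1", "world"), ("g2", "hi")],
)
rw.commit()
rw.close()

# Reopen through a file: URI with mode=ro so SQLite itself rejects writes.
ro_uri = pathlib.Path(db_path).as_uri() + "?mode=ro"
ro = sqlite3.connect(ro_uri, uri=True)

counts = ro.execute(
    "SELECT guild_id, count(*) FROM messages GROUP BY guild_id ORDER BY guild_id"
).fetchall()
print(counts)

# Any write attempt on the read-only handle raises OperationalError.
try:
    ro.execute("DELETE FROM messages")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
print("write blocked:", write_blocked)
```

Enforcing read-only at the connection level means an ad hoc query cannot damage the archive, whatever SQL is typed.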

Steinberger (steipete) also wrote OpenClaw, a personal AI assistant that can connect to Discord as a bot, and discrawl can pick up its configuration (init --from-openclaw). The two tools share token config, which is a clean ergonomic detail: one bot credential, two tools. He’s shipping fast on this; the Go codebase is tidy and the doctor command makes setup verification a one-liner.

Key Ideas

  • Bot token, not user token — legit API access, no ToS gray area; requires Server Members Intent and Message Content Intent enabled in the Discord developer portal
  • FTS5 + SQLite — SQLite is the same embedded database that local-first tools like Litestream and cr-sqlite build on, and FTS5 is its built-in full-text search extension; proven, fast, portable
  • sync --full then tail — backfill once, then live Gateway events keep it current; periodic repair syncs close any gaps
  • sql command — direct read-only SQL against the full schema; no ORM, no abstraction layer between you and the data
  • Multi-guild ready — schema handles multiple guilds; search fans out across all of them by default
  • Attachment text extraction — small text-like attachments get pulled into the FTS index, so code snippets and pastes are searchable too
  • OpenClaw integration — if you already run OpenClaw, the token config is shared automatically; zero additional credential management
  • Parallel channel workers — sync uses min(32, max(8, GOMAXPROCS*2)) workers by default; --concurrency to override
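The default worker count in the last bullet is a clamp: twice GOMAXPROCS, but never fewer than 8 and never more than 32. A small sketch of the arithmetic follows; default_workers is a hypothetical helper written for illustration, not a function from discrawl.

```python
def default_workers(gomaxprocs: int) -> int:
    """Evaluate the stated default: min(32, max(8, GOMAXPROCS*2))."""
    return min(32, max(8, gomaxprocs * 2))


# Small machines are raised to the floor of 8; large ones are capped at 32.
for procs in (2, 4, 8, 16, 32):
    print(f"GOMAXPROCS={procs:>2} -> {default_workers(procs)} workers")
```

With GOMAXPROCS at 2 or 4 the result is 8, at 8 it is 16, and from 16 upward it stays at 32.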