claude-context-mode

Tags: mcp, context-management, claude-code, sqlite-fts5, bm25, agent-tooling

MCP server that sits between Claude Code and tool outputs, preventing raw data from flooding the 200K context window. Instead of dumping large outputs directly into context, it sandboxes execution and returns only relevant snippets.

Headline number: 315KB → 5.4KB (98% reduction). Extends usable session time from ~30min to ~3hrs.

How It Works

  1. Sandboxed execution — runs code in subprocesses via PolyglotExecutor (JS, TS, Python, Shell, Ruby, Go, Rust, PHP, Perl, R). Only stdout enters context.
  2. FTS5 knowledge base — indexes large outputs into SQLite and ranks matches with BM25. Search falls back through three layers: Porter stemming, then trigram matching, then fuzzy correction by Levenshtein distance.
  3. Intent-driven filtering — when output >5KB and intent param is provided, returns only matching section titles + previews instead of full content.
  4. PreToolUse hooks — intercepts curl, wget, WebFetch, Read (large files), and Grep, and redirects them to sandbox equivalents. Sandbox instructions are auto-injected into subagent Task prompts.
  5. Batch execution — batch_execute(commands, queries) replaces 30+ individual execute() + search() calls in one round trip.
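
Steps 2 and 3 can be sketched with nothing but Python's built-in sqlite3 module. This is a rough illustration, not the project's code: the real store is TypeScript on better-sqlite3, and the names build_index, search, filter_by_intent and the chunks table are made up here. It assumes the local SQLite build includes FTS5, and it only shows the Porter-stemming layer, not the trigram or Levenshtein fallbacks.

```python
import sqlite3

INTENT_THRESHOLD_BYTES = 5 * 1024  # the ">5KB" cutoff from step 3


def build_index(sections):
    """Index (title, body) pairs into an in-memory FTS5 table with Porter stemming."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE VIRTUAL TABLE chunks USING fts5(title, body, tokenize='porter unicode61')"
    )
    db.executemany("INSERT INTO chunks (title, body) VALUES (?, ?)", sections)
    return db


def search(db, query, limit=3):
    """Return (title, preview) pairs, best BM25 match first."""
    # bm25() yields smaller values for better matches, so ascending order ranks best first.
    # The query is passed through as FTS5 query syntax; punctuation would need escaping.
    rows = db.execute(
        "SELECT title, snippet(chunks, 1, '[', ']', '...', 12) "
        "FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


def filter_by_intent(sections, intent=None):
    """Return raw text for small outputs, else only matching titles and previews."""
    raw = "\n\n".join(f"{title}\n{body}" for title, body in sections)
    if not intent or len(raw.encode("utf-8")) <= INTENT_THRESHOLD_BYTES:
        return raw
    db = build_index(sections)  # ephemeral :memory: database, discarded after ranking
    try:
        hits = search(db, intent)
    finally:
        db.close()
    return "\n".join(f"{title}: {preview}" for title, preview in hits)
```

Because both the indexed text and the query go through the same stemmer, an intent like "timer connection" matches a section that says "timers" and "connections". Only that section's title and a short preview come back.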

Interesting Design Choices

  • Smart truncation: 60% head / 40% tail split — errors are usually at the end, don’t lose them
  • Progressive search throttling: After 3 calls → 1 result/query. After 8 → blocks entirely, demands batching
  • Subagent upgrade: Bash subagents auto-upgraded to general-purpose for MCP access
  • Network tracking: Wraps fetch inside JS sandboxes to measure bytes consumed without entering context
  • Vocabulary extraction: Returns distinctive terms from indexed content as search hints for the LLM
  • Ephemeral + persistent stores: Intent search uses ephemeral :memory: DB for ranking while also indexing into persistent store for later search() calls
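
The 60/40 truncation idea is easy to show in a few lines. A minimal sketch, assuming the budget is counted in characters (the project may count bytes or lines instead); truncate_head_tail and the marker text are hypothetical names chosen for illustration.

```python
def truncate_head_tail(text: str, max_chars: int, head_ratio: float = 0.6) -> str:
    """Keep the first 60% and last 40% of the budget, dropping the middle."""
    if len(text) <= max_chars:
        return text
    head_len = int(max_chars * head_ratio)
    tail_len = max_chars - head_len
    omitted = len(text) - max_chars
    marker = f"\n... [{omitted} characters omitted] ...\n"
    # Slice from an explicit index so tail_len == 0 yields an empty tail.
    # The budget covers kept content only; the marker adds a little on top.
    return text[:head_len] + marker + text[len(text) - tail_len:]
```

With a long log that ends in a stack trace, the tail slice keeps the trace in context while the repetitive middle is dropped.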

Benchmark Highlights

Data Type                                 Raw Size   Context   Savings
Playwright page snapshot                  56.2 KB    299 B     99%
GitHub issues (facebook/react)            58.9 KB    1.1 KB    98%
Analytics CSV (500 rows)                  85.5 KB    222 B     100%
Git log (150+ commits)                    11.6 KB    107 B     99%
Next.js App Router docs (index+search)    6.5 KB     3.3 KB    50%

Knowledge retrieval (index+search) has lower savings (50-93%) because it returns exact code blocks, not summaries. This is by design — a useEffect cleanup pattern comes back with the full code intact.
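
One way to get that "code comes back intact" behaviour is to chunk markdown by heading while never splitting inside a fenced block. The sketch below is a guess at the general approach, not the project's chunker; chunk_markdown and the "(preamble)" title are hypothetical.

```python
import re

FENCE = "`" * 3  # a markdown code fence, built here to avoid a literal fence in the source
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_markdown(text):
    """Split markdown into (title, body) chunks at headings, keeping code fences whole."""
    chunks = []
    title, body, in_fence = "(preamble)", [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines starting with '#' inside a fence are code or comments, not headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            content = "\n".join(body).strip()
            if content:
                chunks.append((title, content))
            title, body = match.group(1).strip(), []
        else:
            body.append(line)
    content = "\n".join(body).strip()
    if content:
        chunks.append((title, content))
    return chunks
```

Each chunk carries its full code block, so a search hit returns more bytes than a summary would, which matches the lower savings reported for index+search.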

Architecture

  • src/server.ts — MCP server with 7 tools: execute, execute_file, index, search, fetch_and_index, batch_execute, stats
  • src/store.ts — ContentStore class wrapping better-sqlite3 with FTS5, markdown chunking, plain text chunking
  • src/executor.ts — PolyglotExecutor with safe env, compile-and-run for Rust, file content injection
  • src/runtime.ts — runtime detection (Bun preferred for 3-5x faster JS/TS)
  • hooks/pretooluse.sh — Claude Code PreToolUse hook for automatic tool interception
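
The executor's core idea (run in a subprocess with a stripped-down environment, let only stdout back in) looks roughly like this for a single language. This is a sketch under assumptions: run_sandboxed is a hypothetical name, it handles Python only, and a plain subprocess is process isolation, not a security boundary.

```python
import os
import subprocess
import sys
import tempfile


def run_sandboxed(code: str, timeout: float = 10.0) -> str:
    """Run Python code in a child process and return only what it printed to stdout."""
    # Pass through only what the interpreter needs; secrets in the parent env stay out.
    safe_env = {k: os.environ[k] for k in ("PATH", "SYSTEMROOT") if k in os.environ}
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "snippet.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write(code)
        result = subprocess.run(
            [sys.executable, script],
            capture_output=True,
            text=True,
            timeout=timeout,  # raises subprocess.TimeoutExpired on runaway code
            cwd=workdir,
            env=safe_env,
        )
    # stderr and the exit code are captured but deliberately not returned.
    return result.stdout
```

The script can read a 300 KB file and print a two-line summary; only the summary is returned to the caller.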

Relevance to joelclaw

We handle this differently — pi has its own context management, Inngest functions are server-side (no context window pressure), gateway uses message queues. But:

  • The FTS5 indexing pattern for large outputs is interesting for agent loop output compression
  • The intent-driven filtering concept could apply to how we surface OTEL events or Inngest run traces
  • The progressive throttling pattern (degrade gracefully → block → demand batching) is a good general anti-spam pattern for any tool surface
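
As a general pattern, the throttle is just a call counter with two thresholds. A minimal sketch, assuming "after 3" means the 4th call onward and "after 8" means the 9th onward; SearchThrottle and allowed_results are hypothetical names, and how the counter resets (per session, per time window) is left out.

```python
class SearchThrottle:
    """Full results for early calls, one result after that, then a hard stop."""

    def __init__(self, degrade_after: int = 3, block_after: int = 8):
        self.degrade_after = degrade_after
        self.block_after = block_after
        self.calls = 0

    def allowed_results(self, requested: int) -> int:
        """Count this call and return how many results it may receive."""
        self.calls += 1
        if self.calls > self.block_after:
            # Blocking with an instruction nudges the caller toward batching.
            raise RuntimeError("search limit reached; combine queries into one batched call")
        if self.calls > self.degrade_after:
            return min(requested, 1)
        return requested
```

The error message matters as much as the limit: it tells the agent what to do instead, which is what turns a rate limit into a steering mechanism.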