# claude-context-mode

Tags: mcp · context-management · claude-code · sqlite-fts5 · bm25 · agent-tooling
MCP server that sits between Claude Code and tool outputs, preventing raw data from flooding the 200K context window. Instead of dumping large outputs directly into context, it sandboxes execution and returns only relevant snippets.
Headline number: 315KB → 5.4KB (98% reduction). Extends usable session time from ~30min to ~3hrs.
## How It Works

- Sandboxed execution — runs code in subprocesses via `PolyglotExecutor` (JS, TS, Python, Shell, Ruby, Go, Rust, PHP, Perl, R). Only stdout enters context.
- FTS5 knowledge base — indexes large outputs into SQLite with BM25 ranking, Porter stemming, trigram search, and fuzzy correction (Levenshtein distance) as a 3-layer fallback.
- Intent-driven filtering — when output exceeds 5KB and an `intent` param is provided, returns only matching section titles + previews instead of full content.
- PreToolUse hooks — intercepts `curl`, `wget`, `WebFetch`, `Read` (large files), and `Grep`, redirecting them to sandbox equivalents. Subagent `Task` prompts are auto-injected with sandbox instructions.
- Batch execution — `batch_execute(commands, queries)` replaces 30+ individual `execute()` + `search()` calls with one round trip.
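The intent-driven filtering step can be sketched as a pure function: split a large markdown output into sections, keep only those matching the intent terms, and return titles plus short previews. This is a minimal illustration, not the server's actual implementation; the names `filterByIntent` and `Section` are hypothetical.

```typescript
interface Section {
  title: string;
  preview: string;
}

// Hypothetical sketch: return only matching section titles + previews
// for a large output, instead of the full content.
function filterByIntent(markdown: string, intent: string, previewChars = 120): Section[] {
  const sections: Section[] = [];
  // Split on markdown headings; each chunk begins with its title line.
  const parts = markdown.split(/^#{1,6}\s+/m).filter((p) => p.trim().length > 0);
  const terms = intent.toLowerCase().split(/\s+/);
  for (const part of parts) {
    const newline = part.indexOf("\n");
    const title = (newline === -1 ? part : part.slice(0, newline)).trim();
    const body = newline === -1 ? "" : part.slice(newline + 1).trim();
    const haystack = (title + " " + body).toLowerCase();
    // Keep the section if any intent term appears in title or body.
    if (terms.some((t) => haystack.includes(t))) {
      sections.push({ title, preview: body.slice(0, previewChars) });
    }
  }
  return sections;
}
```

A real version would also apply the 5KB threshold before filtering and fall back to full output below it.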
## Interesting Design Choices
- Smart truncation: 60% head / 40% tail split — errors are usually at the end, don’t lose them
- Progressive search throttling: After 3 calls → 1 result/query. After 8 → blocks entirely, demands batching
- Subagent upgrade: Bash subagents auto-upgraded to general-purpose for MCP access
- Network tracking: Wraps `fetch` inside JS sandboxes to measure bytes consumed without entering context
- Vocabulary extraction: Returns distinctive terms from indexed content as search hints for the LLM
- Ephemeral + persistent stores: Intent search uses an ephemeral `:memory:` DB for ranking while also indexing into the persistent store for later `search()` calls
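The 60% head / 40% tail truncation above is simple to sketch. A minimal version, assuming a character budget (the real implementation may count bytes) and a hypothetical function name:

```typescript
// Sketch of smart truncation: keep 60% of the budget from the head and
// 40% from the tail, since errors usually appear at the end of output.
function smartTruncate(text: string, limit: number, marker = "\n…[truncated]…\n"): string {
  if (text.length <= limit) return text;
  const headLen = Math.floor(limit * 0.6); // start of the output
  const tailLen = limit - headLen;         // end, where errors live
  return text.slice(0, headLen) + marker + text.slice(text.length - tailLen);
}
```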
## Benchmark Highlights
| Data Type | Raw Size | Context | Savings |
|---|---|---|---|
| Playwright page snapshot | 56.2 KB | 299 B | 99% |
| GitHub issues (facebook/react) | 58.9 KB | 1.1 KB | 98% |
| Analytics CSV (500 rows) | 85.5 KB | 222 B | 100% |
| Git log (150+ commits) | 11.6 KB | 107 B | 99% |
| Next.js App Router docs (index+search) | 6.5 KB | 3.3 KB | 50% |
Knowledge retrieval (index+search) has lower savings (50-93%) because it returns exact code blocks, not summaries. This is by design — a `useEffect` cleanup pattern comes back with the full code intact.
## Architecture

- `src/server.ts` — MCP server with 7 tools: `execute`, `execute_file`, `index`, `search`, `fetch_and_index`, `batch_execute`, `stats`
- `src/store.ts` — `ContentStore` class wrapping better-sqlite3 with FTS5, markdown chunking, plain text chunking
- `src/executor.ts` — `PolyglotExecutor` with safe env, compile-and-run for Rust, file content injection
- `src/runtime.ts` — runtime detection (Bun preferred for 3-5x faster JS/TS)
- `hooks/pretooluse.sh` — Claude Code PreToolUse hook for automatic tool interception
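The markdown chunking that `ContentStore` performs before indexing can be sketched as one chunk per heading, so an FTS5 hit maps back to a coherent section. A minimal illustration only — the function name and chunk shape are assumptions, not the store's real API:

```typescript
// Hypothetical sketch of heading-based markdown chunking prior to indexing.
function chunkMarkdown(markdown: string): { heading: string; text: string }[] {
  const chunks: { heading: string; text: string }[] = [];
  let current = { heading: "", text: "" };
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line)) {
      // A new heading closes the previous chunk (if it had any content).
      if (current.text.trim() || current.heading) chunks.push(current);
      current = { heading: line.replace(/^#{1,6}\s+/, ""), text: "" };
    } else {
      current.text += line + "\n";
    }
  }
  if (current.text.trim() || current.heading) chunks.push(current);
  return chunks;
}
```

Each chunk would then be inserted as one row into the FTS5 virtual table, with the heading stored alongside for display in search results.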
## Relevance to joelclaw
We handle this differently — pi has its own context management, Inngest functions are server-side (no context window pressure), gateway uses message queues. But:
- The FTS5 indexing pattern for large outputs is interesting for agent loop output compression
- The intent-driven filtering concept could apply to how we surface OTEL events or Inngest run traces
- The progressive throttling pattern (degrade gracefully → block → demand batching) is a good general anti-spam pattern for any tool surface
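The progressive throttling pattern generalizes well, so here is a minimal sketch of the degrade → block → demand-batching escalation. The thresholds (3 and 8) mirror the notes above; the class name and return shape are illustrative, not from the project:

```typescript
// Sketch of progressive search throttling: full results at first,
// then one result per query, then a hard block that demands batching.
class SearchThrottle {
  private calls = 0;

  next(): { allowed: boolean; maxResults: number; hint?: string } {
    this.calls++;
    if (this.calls > 8) {
      // Hard stop: force the caller to switch to batched queries.
      return { allowed: false, maxResults: 0, hint: "use batch_execute" };
    }
    if (this.calls > 3) {
      return { allowed: true, maxResults: 1 }; // degraded mode
    }
    return { allowed: true, maxResults: 10 }; // normal mode
  }
}
```

The same shape would apply to any tool surface: returning a `hint` with the block tells the agent how to recover instead of just failing it.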