claude-context-mode

mcp, context-management, claude-code, sqlite-fts5, bm25, agent-tooling

MCP server that sits between Claude Code and tool outputs, preventing raw data from flooding the 200K context window. Instead of dumping large outputs directly into context, it sandboxes execution and returns only relevant snippets.

Headline number: 315 KB of raw output → 5.4 KB in context (98% reduction). Extends usable session time from ~30 min to ~3 hrs.

How It Works

Sandboxed execution: runs code in subprocesses via PolyglotExecutor (JS, TS, Python, Shell, Ruby, Go, Rust, PHP, Perl, R). Only stdout enters context.
FTS5 knowledge base: indexes large outputs into SQLite FTS5 with BM25 ranking. Search falls back through three layers: Porter stemming, trigram search, then fuzzy correction (Levenshtein distance).
Intent-driven filtering: when output exceeds 5 KB and an intent parameter is provided, returns only matching section titles and previews instead of the full content.
PreToolUse hooks: intercept curl, wget, WebFetch, Read (large files), and Grep calls and redirect them to sandbox equivalents. Subagent Task prompts are auto-injected with sandbox instructions.
Batch execution: batch_execute(commands, queries) replaces 30+ individual execute() and search() calls with one round trip.
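
To make the FTS5 knowledge base idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the project's better-sqlite3 store. The table name, column names, and the build_index and search helpers are hypothetical, and only the Porter stemming layer is shown (no trigram or fuzzy fallback). It assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3


def build_index(chunks):
    """Index (title, body) pairs into an in-memory FTS5 table with Porter stemming."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE VIRTUAL TABLE chunks USING fts5(title, body, tokenize = 'porter unicode61')"
    )
    db.executemany("INSERT INTO chunks (title, body) VALUES (?, ?)", chunks)
    return db


def search(db, query, limit=3):
    """Return (title, snippet) rows, best BM25 match first.

    FTS5's bm25() gives better matches a lower score, so ascending order is correct.
    """
    rows = db.execute(
        "SELECT title, snippet(chunks, 1, '[', ']', '...', 12) "
        "FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


# Hypothetical sample content standing in for a large tool output.
db = build_index([
    ("useEffect cleanup", "Return a function from useEffect to clean up subscriptions."),
    ("Routing", "The App Router uses nested folders to define routes."),
    ("Caching", "Fetch requests are cached by default."),
])
for title, preview in search(db, "subscription"):
    print(title, "->", preview)
```

Only the short title and snippet rows would enter context here. The full bodies stay in the database for later queries.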

Interesting Design Choices

Smart truncation: 60% head / 40% tail split, so errors, which usually appear at the end, are not lost
Progressive search throttling: After 3 calls, each query returns 1 result. After 8, search is blocked entirely and the tool demands batching
Subagent upgrade: Bash subagents auto-upgraded to general-purpose for MCP access
Network tracking: Wraps fetch inside JS sandboxes to measure bytes consumed without the fetched data entering context
Vocabulary extraction: Returns distinctive terms from indexed content as search hints for the LLM
Ephemeral + persistent stores: Intent search uses an ephemeral :memory: DB for ranking while also indexing into the persistent store for later search() calls
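
The 60/40 head/tail truncation can be sketched in a few lines. This is an illustration of the idea, not the project's code: the smart_truncate name, the byte-based budget, and the marker text are assumptions.

```python
def smart_truncate(text: str, max_bytes: int, head_ratio: float = 0.6) -> str:
    """Keep the first 60% and last 40% of the byte budget, dropping the middle."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    head_len = int(max_bytes * head_ratio)
    tail_len = max_bytes - head_len
    # errors="ignore" drops any multi-byte character cut in half at a boundary.
    head = data[:head_len].decode("utf-8", errors="ignore")
    tail = data[len(data) - tail_len:].decode("utf-8", errors="ignore")
    omitted = len(data) - head_len - tail_len
    return f"{head}\n... [{omitted} bytes truncated] ...\n{tail}"


# A long log whose only interesting line is the last one.
log = "a" * 600 + "b" * 400 + "ERROR: boom"
print(smart_truncate(log, 100))
```

Because the tail is always kept, the trailing error line survives even when most of the output is dropped.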

Benchmark Highlights

Data Type	Raw Size	Context	Savings
Playwright page snapshot	56.2 KB	299 B	99%
GitHub issues (facebook/react)	58.9 KB	1.1 KB	98%
Analytics CSV (500 rows)	85.5 KB	222 B	100%
Git log (150+ commits)	11.6 KB	107 B	99%
Next.js App Router docs (index+search)	6.5 KB	3.3 KB	50%

Knowledge retrieval (index+search) has lower savings (50-93%) because it returns exact code blocks, not summaries. This is by design — a useEffect cleanup pattern comes back with the full code intact.
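
One way to get exact code blocks back is to chunk markdown at headings while never splitting inside a code fence. The sketch below shows that idea under assumptions: chunk_markdown is a hypothetical helper, and the actual chunking rules in src/store.ts are not described in this note.

```python
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example


def chunk_markdown(md: str):
    """Split markdown into (title, body) chunks at headings, never inside a code fence."""
    chunks, title, lines, in_fence = [], "(preamble)", [], False
    for line in md.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines starting with '#' inside a fence are code comments, not headings.
        heading = None if in_fence else re.match(r"^#{1,6}\s+(.*)", line)
        if heading:
            if any(l.strip() for l in lines):
                chunks.append((title, "\n".join(lines).strip()))
            title, lines = heading.group(1).strip(), []
        else:
            lines.append(line)
    if any(l.strip() for l in lines):
        chunks.append((title, "\n".join(lines).strip()))
    return chunks


doc = "\n".join([
    "# Effects",
    "Intro text.",
    "",
    "## Cleanup",
    FENCE + "js",
    "# not a heading",
    "useEffect(() => () => sub.unsubscribe(), []);",
    FENCE,
])
for chunk_title, body in chunk_markdown(doc):
    print(chunk_title, len(body))
```

Each chunk carries its whole code block, which is why retrieval of this kind saves less context than a summary would.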

Architecture

src/server.ts — MCP server with 7 tools: execute, execute_file, index, search, fetch_and_index, batch_execute, stats
src/store.ts — ContentStore class wrapping better-sqlite3 with FTS5, markdown chunking, plain text chunking
src/executor.ts — PolyglotExecutor with safe env, compile-and-run for Rust, file content injection
src/runtime.ts — runtime detection (Bun preferred for 3-5x faster JS/TS)
hooks/pretooluse.sh — Claude Code PreToolUse hook for automatic tool interception
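
As a rough illustration of what "subprocess with a safe env, only stdout enters context" can look like, here is a minimal single-language sketch. run_sandboxed is a hypothetical name, the environment allowlist (PATH only) is an assumption, and this gives process isolation with a stripped environment, not a security sandbox. It is not a reproduction of PolyglotExecutor.

```python
import os
import subprocess
import sys
import tempfile


def run_sandboxed(code: str, timeout: float = 10.0) -> str:
    """Run Python code in a child process and return only its stdout."""
    # Pass through PATH only, so secrets in the parent environment are not inherited.
    safe_env = {"PATH": os.environ.get("PATH", "")}
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "script.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write(code)
        result = subprocess.run(
            [sys.executable, script],
            capture_output=True,
            text=True,
            env=safe_env,
            cwd=workdir,
            timeout=timeout,
        )
    # stderr and the exit code are captured but deliberately not returned.
    return result.stdout


print(run_sandboxed("print(sum(range(1000)))"))
```

The caller sees one short line of stdout no matter how much data the script processed internally.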

Relevance to joelclaw

We handle this differently — pi has its own context management, Inngest functions are server-side (no context window pressure), gateway uses message queues. But:

The FTS5 indexing pattern for large outputs is interesting for agent loop output compression
The intent-driven filtering concept could apply to how we surface OTEL events or Inngest run traces
The progressive throttling pattern (degrade gracefully → block → demand batching) is a good general anti-spam pattern for any tool surface
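
The throttling pattern is simple enough to sketch as a small counter. The thresholds come from the description above (reduce after 3 calls, block after 8). The SearchThrottle class, its method name, the full result limit of 5, and the exact boundary handling are assumptions.

```python
class SearchThrottle:
    """Degrade gracefully, then block: full results, then 1 per query, then an error."""

    def __init__(self, reduce_after: int = 3, block_after: int = 8, full_limit: int = 5):
        self.reduce_after = reduce_after
        self.block_after = block_after
        self.full_limit = full_limit  # hypothetical default result count
        self.calls = 0

    def next_limit(self) -> int:
        """Record one search call and return how many results it may return."""
        self.calls += 1
        if self.calls > self.block_after:
            raise RuntimeError("search limit reached; batch the remaining queries instead")
        if self.calls > self.reduce_after:
            return 1
        return self.full_limit


throttle = SearchThrottle()
limits = [throttle.next_limit() for _ in range(8)]
print(limits)
```

The error message doubles as an instruction, so a caller that hits the block learns what to do next instead of just retrying.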
