Agent Execution as Its Own Training Signal
Every Inngest function run in the system-bus emits OTEL telemetry — that trace data is exactly the flywheel input that could close the loop on agent self-improvement.
The flywheel metaphor comes from Jim Collins — each push builds momentum until the wheel spins under its own momentum. The Agent Flywheel applies that frame directly to agentic AI systems: every time an agent runs, it produces output, feedback, and trace data that can feed the next generation of runs. The system improves by operating.
This is a different frame than the usual “eval loop” thinking. It’s not about running benchmarks offline and tuning prompts. It’s about designing the system so that production execution is the training signal. The agent does the work. The work generates structured feedback. The feedback shapes what happens next. No separate eval pipeline — the flywheel is the pipeline.
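One turn of that loop can be sketched in code. Everything here is illustrative — `RunResult`, `Feedback`, and the `improve` step are hypothetical shapes, not the actual system-bus or Inngest API:

```typescript
// Hypothetical shapes — illustrative, not the actual system-bus types.
interface RunResult { output: string; latencyMs: number; success: boolean }
interface Feedback { promptVersion: number; successRate: number }

// Stand-in for a real agent invocation (e.g. an Inngest function run).
function runAgent(promptVersion: number): RunResult {
  return { output: "…", latencyMs: 120, success: promptVersion > 0 };
}

// The work generates structured feedback — not raw logs, typed signals.
function extractFeedback(results: RunResult[], promptVersion: number): Feedback {
  const successRate = results.filter((r) => r.success).length / results.length;
  return { promptVersion, successRate };
}

// The feedback shapes what happens next: promote the prompt version only
// when the structured signal clears a threshold.
function improve(fb: Feedback): number {
  return fb.successRate >= 0.9 ? fb.promptVersion + 1 : fb.promptVersion;
}

// The flywheel: production execution IS the training signal.
let promptVersion = 1;
for (let cycle = 0; cycle < 3; cycle++) {
  const results = Array.from({ length: 10 }, () => runAgent(promptVersion));
  promptVersion = improve(extractFeedback(results, promptVersion));
}
```

The point of the sketch is the shape, not the logic: run, extract, improve, repeat — no separate eval pipeline anywhere in the cycle.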
What makes this compelling in practice is the compounding effect. A system that gets 1% smarter every 100 runs will dramatically outperform a system optimized once at launch and left alone. The work Joel has been doing with OTEL telemetry into Typesense, the memory system, and 110+ Inngest functions in the system-bus already generates an enormous amount of structured trace data per day. Whether that trace data actually closes the loop — improves prompts, refines routing decisions, updates the inference router — is the open question the flywheel pattern is asking.
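The compounding claim is easy to quantify. Taking the 1%-per-100-runs figure at face value, 10,000 runs means 100 improvement cycles, and the gain is multiplicative:

```typescript
// Compounding: 1% improvement per 100-run cycle vs. a static baseline.
const staticQuality = 1.0;   // tuned once at launch, never updated
let flywheelQuality = 1.0;   // same starting point
const cycles = 100;          // 10,000 runs at 100 runs per cycle

for (let i = 0; i < cycles; i++) {
  flywheelQuality *= 1.01;
}

// 1.01^100 ≈ 2.70 — roughly 2.7x the static system after 10k runs.
console.log(flywheelQuality / staticQuality);
```

The 1% figure itself is a stand-in, but the shape of the curve is the argument: any per-cycle improvement rate above zero eventually dominates a one-time optimization.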
The “complete guide” framing suggests agent-flywheel.com is documenting the full implementation pattern, not just the concept. The space is thin on actual working implementations — most writing on agent improvement stays abstract. Worth tracking.
Key Ideas
- Flywheel mechanics for agents: execution → feedback → improvement → better execution, cycling continuously rather than requiring offline eval rounds
- Production traces as training data: every Inngest run, every Restate workflow step, every OTEL span is a labeled example of what the agent did under what conditions
- Self-improving vs. static systems: agents tuned once at launch decay in value; flywheel-designed agents compound — their edge grows with usage
- Structured feedback as the key constraint: the flywheel only spins if the feedback is structured enough to act on — raw logs don’t close the loop, but typed OTEL events indexed in Typesense might
- Connection to memory systems: the joelclaw memory pipeline already does a version of this for episodic memory — the flywheel pattern extends it to skill and routing improvement
- Observation quality gates: what gets written back into the loop matters as much as what gets captured — garbage feedback = spinning in place
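The quality-gate idea in the last bullet can be sketched as a filter over typed feedback events before anything is written back into the loop. The event shape and thresholds below are hypothetical — not the real OTEL span schema or Typesense index:

```typescript
// Hypothetical typed feedback event — not the actual OTEL/Typesense schema.
interface FeedbackEvent {
  runId: string;
  outcome: "success" | "failure";
  hasUserSignal: boolean; // explicit feedback, or just a log line?
  confidence: number;     // 0..1 — how trustworthy the label is
}

// Quality gate: only structured, trustworthy feedback re-enters the loop.
// Raw logs (no signal, low confidence) are dropped — garbage feedback
// just spins the wheel in place.
function qualityGate(events: FeedbackEvent[], minConfidence = 0.7): FeedbackEvent[] {
  return events.filter((e) => e.hasUserSignal && e.confidence >= minConfidence);
}

const events: FeedbackEvent[] = [
  { runId: "a", outcome: "success", hasUserSignal: true, confidence: 0.95 },
  { runId: "b", outcome: "failure", hasUserSignal: false, confidence: 0.9 }, // raw log
  { runId: "c", outcome: "failure", hasUserSignal: true, confidence: 0.4 },  // noisy label
];

const admitted = qualityGate(events); // only "a" clears the gate
```

The gate is deliberately boring — the design point is where it sits: between capture and write-back, so the loop only ever trains on feedback structured enough to act on.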