The Reviewer Should Be Trying to Break It
Maps directly to the reviewer step in the agent loop: an adversarial stance versus a neutral audit changes output quality dramatically.
From poteto’s noodle collection on skills.sh — a lesson on adversarial review as a deliberate prompting stance for AI-assisted code review.
The standard pattern for code review with AI is to ask “does this look good?” That’s a rubber stamp waiting to happen. The adversarial version flips the prompt: “your job is to find a reason this shouldn’t ship.” It’s the difference between a collaborator and a red team. The model’s prior is confirmation — you have to explicitly override it to get genuine skepticism.
The adversarial stance isn’t about being harsh; it’s about changing the prior. A neutral reviewer implicitly wants to say yes. An adversarial reviewer is scored on finding problems. That framing change, applied at the system prompt level, pulls out failure modes that neutral review walks right past. The same model and the same code produce dramatically different output depending on which role the model is playing.
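A minimal sketch of the framing difference. The prompt wording below is illustrative (the source doesn’t give exact prompts); the point is that the diff and the message structure are held fixed and only the system prompt changes:

```python
# Hypothetical sketch: neutral vs. adversarial reviewer differ ONLY in the
# system prompt. The exact wording here is an assumption, not from the source.
NEUTRAL_REVIEWER = (
    "You are a code reviewer. Evaluate the following diff and say "
    "whether it looks good to merge."
)

ADVERSARIAL_REVIEWER = (
    "You are an adversarial code reviewer. Your job is to find a concrete "
    "reason this diff should NOT ship: bugs, missing tests, unhandled edge "
    "cases, unsafe assumptions. Approve only if you genuinely cannot break it."
)

def build_review_messages(diff: str, adversarial: bool = True) -> list[dict]:
    """Assemble chat-style messages for one review pass over a diff."""
    system = ADVERSARIAL_REVIEWER if adversarial else NEUTRAL_REVIEWER
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Review this diff:\n\n{diff}"},
    ]
```

Everything downstream (the model, the diff, the user message) stays identical; the override lives entirely in the system role.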
This connects directly to the reviewer step in the agent loop architecture. The loop already separates roles: Planner, Test Writer, Implementor, Reviewer, Judge. The reviewer is supposed to evaluate with four questions, but “evaluate” can mean “give it a pass” just as easily as “find the flaw.” Making the reviewer role explicitly adversarial is a lever on that step without touching the surrounding architecture.
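One way to picture “a lever on that step without touching the architecture”: the loop’s roles live in a prompt registry, and flipping the reviewer adversarial is a one-key override. The role names come from the text; the prompt strings and the registry shape are assumptions for illustration:

```python
# Assumed role registry for the Planner / Test Writer / Implementor /
# Reviewer / Judge loop named in the text. Prompt strings are illustrative.
ROLES = {
    "planner": "Break the task into ordered, verifiable steps.",
    "test_writer": "Write failing tests that pin down intended behavior.",
    "implementor": "Write the minimal code that makes the tests pass.",
    "reviewer": "Evaluate the implementation and report whether it is acceptable.",
    "judge": "Weigh the reviewer's critique against the implementation and decide.",
}

def with_adversarial_reviewer(roles: dict[str, str]) -> dict[str, str]:
    """Copy the role map, overriding only the reviewer's stance."""
    updated = dict(roles)
    updated["reviewer"] = (
        "You are scored on finding problems. Find a concrete reason this "
        "implementation should not ship; approve only if you cannot."
    )
    return updated
```

Every other role, and the loop that drives them, is untouched; only the reviewer’s prior changes.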
Key Ideas
- Adversarial review = explicitly prompting the model to find reasons to reject, not confirm
- Changes the model’s implicit prior from “looking for approval” to “looking for failure modes”
- Same model, same code — different framing yields fundamentally different review quality
- Pairs naturally with a separate Judge step that weighs the adversarial critique against the implementation
- Related to red-teaming as a discipline — make someone’s job to break the thing before it ships
- The “four questions” reviewer pattern in the agent loop could apply adversarial stance via system prompt without structural changes
- Noodle appears to be poteto’s collection of AI coding workflow patterns
Links
- Source — skills.sh/poteto/noodle/adversarial-review
- poteto on skills.sh
- Noodle collection
- Agent loop ADR — the reviewer step this pattern plugs into
- Red team on Wikipedia — adversarial review’s intellectual ancestor
- Constitutional AI (Anthropic) — related use of AI critiquing AI output