AI Product Building · Coding Tools · AI Agents

Multi-model code review creates adversarial robustness — each model catches what others miss

Using 3 different LLMs to review the same PR exploits the fact that models have different failure modes, creating emergent coverage no single model achieves

@elvissun (Elvis Sun) — OpenClaw Agent Swarm · 8 connections

Elvis’s agent swarm doesn’t just use multiple models for generation — it uses three different models to review every PR before human approval. The effect is adversarial: requiring all three models to pass before merging means each model’s blindspots are covered by the others.
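The gate described above is a unanimous AND over independent reviewers. A minimal sketch of that logic, assuming hypothetical model names and a stand-in `review_diff()` where a real implementation would call each LLM (the source describes the pattern, not its code):

```python
# Hedged sketch: "all three reviewers must approve" merge gate.
# Model names and review_diff() are illustrative stand-ins, not
# the actual OpenClaw implementation.
from dataclasses import dataclass, field

@dataclass
class Review:
    model: str
    approved: bool
    findings: list[str] = field(default_factory=list)

def review_diff(model: str, diff: str) -> Review:
    # Placeholder for a real LLM review call. Each simulated model
    # has a different blindspot, mirroring the insight: one flags
    # injection risks, one unfinished work, one sloppy error handling.
    checks = {
        "model-a": lambda d: "eval(" in d,     # catches code-injection risk
        "model-b": lambda d: "TODO" in d,      # catches unfinished work
        "model-c": lambda d: "except:" in d,   # catches bare excepts
    }
    flagged = checks[model](diff)
    return Review(model, approved=not flagged,
                  findings=[f"{model} flagged the diff"] if flagged else [])

def merge_gate(diff: str, models=("model-a", "model-b", "model-c")) -> bool:
    """Merge only if every reviewer approves (unanimous AND)."""
    return all(review_diff(m, diff).approved for m in models)

# A defect only model-b catches still blocks the merge:
print(merge_gate("def f():\n    return 1  # TODO finish"))  # False
print(merge_gate("def f():\n    return 1"))                 # True
```

The key design choice is the AND: a single dissenting model blocks the merge, so coverage is the union of the models' failure modes rather than any one model's.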

This multiplies the insight from Verification is the single highest-leverage practice for agent-assisted coding: if verification alone can 2-3x quality, verification by diverse verifiers compounds that gain further. It’s the code review equivalent of Treat AI like a distributed team, not a single assistant — not just parallel work, but parallel judgment. Combined with Evaluate agent tools with real multi-step tasks, not toy single-call examples, the implication is that evaluation itself benefits from model diversity, not just task execution. A more deliberate version of this pattern is Weaponize sycophancy with adversarial agent ensembles instead of fighting it, where competing scoring incentives (bug-finder vs. adversary vs. referee) exploit each agent’s eagerness to please rather than suppressing it.