AI Product Building Coding Tools
Verification is the single highest-leverage practice for agent-assisted coding
Giving an agent a way to verify its own work 2-3x the quality of output — without verification, you're shipping blind
Boris Cherny + Anthropic Official Best Practices · · 24 connections
Connected Insights
References (8)
→ Declarative beats imperative when working with agents → Compound engineering makes each unit of work improve all future work → Multi-model code review creates adversarial robustness — each model catches what others miss → Property-based testing explores agent input spaces that example-based tests miss → The 80/99 gap is where AI products die — demo accuracy and production reliability are infinitely far apart → A mediocre agent inside a strong harness outperforms a stronger agent inside a messy one → Rollback safety nets enable autonomous iteration — not model intelligence → Verification is a Red Queen race — optimizing against a fixed eval contaminates it
Referenced by (16)
← Autonomous coding loops need small stories and fast feedback to work ← Production agents route routine cases through decision trees, reserving humans for complexity ← Evaluate agent tools with real multi-step tasks, not toy single-call examples ← Multi-model code review creates adversarial robustness — each model catches what others miss ← Harness engineering — humans steer, agents execute, documentation is the system of record ← Excessive self-regard makes fixable failures persist — people excuse poor performance instead of correcting it ← Weaponize sycophancy with adversarial agent ensembles instead of fighting it ← Property-based testing explores agent input spaces that example-based tests miss ← The 80/99 gap is where AI products die — demo accuracy and production reliability are infinitely far apart ← A mediocre agent inside a strong harness outperforms a stronger agent inside a messy one ← Rollback safety nets enable autonomous iteration — not model intelligence ← Every optimization has a shadow regression — guard commands make the shadow visible ← Verification is a Red Queen race — optimizing against a fixed eval contaminates it ← Adversarial branch-walking beats review for planning — walk every design branch until resolved ← Auto-generated narrow monitors beat handwritten broad checks — a tight mesh over the exact shape of the code ← Speed without feedback amplifies errors — agents lack the self-correction mechanism that constrains human mistakes