The intuitive assumption is that better models will eventually solve the verification problem. The reality is the opposite: more capable models expand the deployment surface and raise the stakes of the verification gap, making the infrastructure more valuable, not less. Verification infrastructure sits permanently between AI and production and is not displaced when the next model drops. Unlike the harness thesis (a mediocre agent inside a strong harness outperforms a stronger agent inside a messy one), which focuses on the harness around a single agent, verification infrastructure operates across the entire deployment boundary.
The generation problem got solved faster than anyone expected. The verification problem is structurally harder: permanently adversarial and almost entirely unbuilt. This asymmetry is where the next important companies get built. Verification is a Red Queen race: optimizing against a fixed eval contaminates it, so the valuable infrastructure is not the eval suite itself but the system that keeps it running, generating adversarial scenarios faster than agents can optimize against them.
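The contamination dynamic can be made concrete with a toy sketch. Everything here is hypothetical (the scenario generator, the scoring rule, the agents): the point is only that an eval which regenerates its cases each round defeats memorization, while a frozen eval rewards it.

```python
import random

def generate_scenarios(seed: int, n: int = 5):
    """Hypothetical adversarial scenario generator: each round draws
    fresh cases instead of reusing a fixed eval set."""
    rng = random.Random(seed)
    return [{"input": rng.randint(0, 10**6)} for _ in range(n)]

def verify(agent, round_id: int) -> float:
    """Score an agent on scenarios keyed to the current round; a fixed
    eval would be contaminated once agents optimize against it."""
    scenarios = generate_scenarios(seed=round_id)
    passed = sum(agent(s["input"]) == s["input"] * 2 for s in scenarios)
    return passed / len(scenarios)

# An agent that genuinely doubles its input passes every fresh round.
honest_agent = lambda x: x * 2
print(verify(honest_agent, round_id=1))  # 1.0

# An agent that memorized round 1's answers fails once scenarios rotate.
memorized = {s["input"]: s["input"] * 2 for s in generate_scenarios(seed=1)}
overfit_agent = lambda x: memorized.get(x, 0)
print(verify(overfit_agent, round_id=2))
```

The design choice worth noticing is that the generator, not the static test set, is the asset: the overfit agent scores perfectly on round 1 and collapses on round 2, which is the Red Queen race in miniature.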