Meta-agents that autonomously optimize task agents beat hand-engineered harnesses on production benchmarks
AutoAgent's meta-agent hit #1 on SpreadsheetBench (96.5%) and TerminalBench (55.1%) by autonomously iterating on a task agent's harness for 24+ hours — every other leaderboard entry was hand-engineered
@kevingu (Kevin Gu) — AutoAgent: First Open Source Library for Self-Optimizing Agents · 9 connections
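As a rough illustration of what "autonomously iterating on a task agent's harness" can look like, here is a minimal propose, evaluate, keep-if-better loop. It is a sketch under stated assumptions, not AutoAgent's actual implementation: the `Harness` fields, the `evaluate` scoring function, and the random `propose` step are all hypothetical stand-ins. A real meta-agent would use an LLM to edit harness code and a real benchmark run to score it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Harness:
    """Hypothetical harness config: the knobs a meta-agent may edit."""
    max_retries: int = 0
    use_planner: bool = False


def evaluate(harness: Harness) -> float:
    """Toy stand-in for a benchmark run; returns a score in [0, 1]."""
    score = 0.5 + 0.1 * min(harness.max_retries, 3)
    if harness.use_planner:
        score += 0.15
    return round(min(score, 1.0), 2)


def propose(harness: Harness, rng: random.Random) -> Harness:
    """Toy stand-in for the meta-agent: change one knob at random."""
    if rng.random() < 0.5:
        return Harness(harness.max_retries + 1, harness.use_planner)
    return Harness(harness.max_retries, not harness.use_planner)


def optimize(steps: int, seed: int = 0) -> tuple[Harness, float, list[float]]:
    """Keep a candidate harness only if its eval score strictly improves."""
    rng = random.Random(seed)
    best = Harness()
    best_score = evaluate(best)
    history = [best_score]
    for _ in range(steps):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history


if __name__ == "__main__":
    best, best_score, history = optimize(steps=50)
    print(best, best_score)
```

The eval score is the only signal the loop uses, so the quality of the benchmark bounds the quality of the harness it finds.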
Connected Insights
References (6)
→ Autonomous coding loops need small stories and fast feedback to work
→ Evals are the gradient signal for harness engineering — the same data quality rigor from ML training applies
→ Evolved harnesses transfer across models — a single optimized harness improves five different LLMs
→ Full trace filesystems beat compressed summaries for harness optimization — 10M tokens of context outperforms 26K
→ A mediocre agent inside a strong harness outperforms a stronger agent inside a messy one
→ Tool design is continuous observation — see like an agent
Referenced by (3)
← Agents learn at three distinct layers — model weights, harness code, and context configuration
← A mediocre agent inside a strong harness outperforms a stronger agent inside a messy one
← Full trace filesystems beat compressed summaries for harness optimization — 10M tokens of context outperforms 26K