Coding Tools
AI Product Building: 36 insights in this topic
Compound engineering makes each unit of work improve all future work
The 80/20 ratio (80% plan+review, 20% work+compound) ensures learning compounds across iterations, not just code
Verification is the single highest-leverage practice for agent-assisted coding
Giving an agent a way to verify its own work improves output quality 2-3x — without verification, you're shipping blind
The context window is the fundamental constraint — everything else follows
Every best practice in AI coding (subagents, /clear, focused tasks, specs files) traces back to managing a single scarce resource: context
Autonomous coding loops need small stories and fast feedback to work
The Ralph pattern ships 13 user stories in 1 hour by decomposing into context-window-sized tasks with explicit acceptance criteria and test-based feedback
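A Ralph-style loop can be sketched in a few lines: each story is small enough to fit one context window, carries explicit acceptance criteria, and is gated by tests before it counts as shipped. `run_agent` and `run_tests` below are hypothetical stubs standing in for a real agent session and a real test suite.

```python
# Minimal sketch of an autonomous coding loop with small stories and
# fast feedback. `run_agent` and `run_tests` are hypothetical stubs.

from dataclasses import dataclass

@dataclass
class Story:
    title: str
    acceptance: str   # explicit acceptance criteria fed to the agent
    done: bool = False

def run_agent(story: Story) -> str:
    """Stub: one fresh, context-window-sized agent session per story."""
    return f"patch for {story.title!r}"

def run_tests(patch: str) -> bool:
    """Stub: fast test-based feedback on the candidate patch."""
    return "patch" in patch  # stand-in for running the acceptance tests

def ralph_loop(backlog: list[Story], max_attempts: int = 3) -> list[Story]:
    for story in backlog:
        for _ in range(max_attempts):
            patch = run_agent(story)   # fresh context each attempt
            if run_tests(patch):       # verification gates the "done" flag
                story.done = True
                break
    return backlog

shipped = ralph_loop([Story("add login", "user can log in"),
                      Story("add logout", "user can log out")])
```

The key design choice is that the loop, not the human, decides when a story is done — and it decides by running tests, not by trusting the agent's own report.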
Treat AI like a distributed team, not a single assistant
Running 15 parallel Claude streams with specialized roles (writer, reviewer, architect) produces better results than one perfect conversation
Declarative beats imperative when working with agents
Give agents success criteria and watch them go — don't tell them what to do step by step
Harness engineering — humans steer, agents execute, documentation is the system of record
OpenAI built a million-line production codebase with zero manually-written code in 5 months. The discipline shifted from writing code to designing the harness: architecture constraints, documentation, tooling, and feedback loops that make agents reliable at scale.
An orchestrator agent that manages other agents solves the parallel coordination problem without human bottleneck
Instead of humans managing AI agents, a meta-agent spawns specialized agents, routes tasks by model strength, and monitors progress — turning agent swarms into autonomous dev teams
Spec files are external memory that survives context resets
A structured specs/ folder (design.md, implementation.md, decisions.md) bridges human intent and agent execution across sessions
Agentic search beats RAG for live codebases
Claude Code abandoned RAG and vector databases in favor of letting the agent grep, glob, and read files — reasoning about where to look outperforms pre-indexed similarity search for code
Evaluate agent tools with real multi-step tasks, not toy single-call examples
Weak evaluation tasks hide tool design flaws — strong tasks require chained calls, ambiguity resolution, and verifiable outcomes
Tool design is continuous observation — see like an agent
Designing effective agent tools requires iterating by watching actual model behavior, not specifying upfront; tools that helped weaker models may constrain stronger ones
Multi-model code review creates adversarial robustness — each model catches what others miss
Using 3 different LLMs to review the same PR exploits the fact that models have different failure modes, creating emergent coverage no single model achieves
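The fan-out-and-union shape of multi-model review can be sketched as follows. The three reviewer functions are hypothetical stubs; in practice each would call a different LLM with the same diff.

```python
# Sketch of multi-model PR review: send the same diff to several
# reviewers and union their findings. Each reviewer below is a
# hypothetical stub with a deliberately different "failure mode".

def reviewer_a(diff: str) -> set[str]:
    return {"off-by-one in loop"} if "range(" in diff else set()

def reviewer_b(diff: str) -> set[str]:
    return {"missing null check"} if "user." in diff else set()

def reviewer_c(diff: str) -> set[str]:
    return {"unclosed file handle"} if "open(" in diff else set()

def ensemble_review(diff: str) -> set[str]:
    # Different models miss different things; the union of their
    # findings is coverage no single reviewer achieves.
    findings: set[str] = set()
    for review in (reviewer_a, reviewer_b, reviewer_c):
        findings |= review(diff)
    return findings

issues = ensemble_review("for i in range(n): f = open(user.path)")
```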
CLAUDE.md should be a routing table, not a knowledge base
Treat CLAUDE.md as a minimal IF-ELSE directory pointing to context files — not a 26,000-line monolith that bloats every session
Frontier companies absorb every useful agentic pattern into base products
If a workaround truly extends agent capabilities, OpenAI and Anthropic — the biggest power users of their own models — will build it in, making external dependencies temporary
Parallel agents create a management problem, not a coding problem
When AI agents can work on multiple projects simultaneously, the bottleneck shifts from writing code to coordinating parallel workstreams
Tools are a new kind of software — contracts between deterministic systems and non-deterministic agents
Agent tools must be designed for how agents think (context-limited, non-deterministic, description-dependent), not how programmers think
Boring tech wins for AI-native startups — simpler stack means faster AI-assisted shipping
React + Node + TypeScript + Postgres + Redis scales to $1M ARR with 3 engineers; monorepo is a superpower for AI coding assistants
Rollback safety nets enable autonomous iteration — not model intelligence
The minimum viable safety net for autonomy is a quantifiable metric, atomic changes, and automatic rollback — these make cheap failure possible, which makes aggressive exploration safe
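That minimum viable safety net can be sketched as a three-part loop: measure a baseline, apply one atomic change to a copy, and keep it only if the metric doesn't regress. `measure` and `propose_change` are hypothetical stubs for a real benchmark and a real agent edit.

```python
# Sketch of a rollback safety net: quantifiable metric + atomic change
# + automatic rollback. The metric and the change are stubbed.

import copy

def measure(state: dict) -> float:
    """Stub metric: here, just the stored score."""
    return state["score"]

def propose_change(state: dict, delta: float) -> dict:
    """Stub for one atomic agent-made change (mutates a copy only)."""
    candidate = copy.deepcopy(state)
    candidate["score"] += delta
    return candidate

def try_change(state: dict, delta: float) -> dict:
    baseline = measure(state)
    candidate = propose_change(state, delta)
    if measure(candidate) >= baseline:
        return candidate   # keep the improvement
    return state           # automatic rollback: failure is cheap

state = {"score": 1.0}
state = try_change(state, +0.5)   # improvement, kept
state = try_change(state, -2.0)   # regression, rolled back
```

Because a failed change costs nothing but one measurement, the agent can explore aggressively; the safety comes from the loop, not from the model being smart.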
Session capture turns ephemeral AI conversations into a compounding knowledge base
shadcn's /done pattern — dumping key decisions, questions, and follow-ups to markdown after each Claude session — applies file-based memory architecture to development workflow
Build for the model six months from now, not the model of today
AI product builders should target the capability frontier the model hasn't reached yet, because today's PMF gets leapfrogged when the next model ships
Prompt caching makes long context economically viable
Prefix-matching cache enables 80%+ cost reduction for multi-turn conversations, making rich context systems affordable at scale
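The arithmetic behind that claim is easy to check. The rates below are illustrative assumptions, not any provider's actual pricing; the one structural assumption is that cached prefix tokens are billed at roughly 10% of the normal input rate.

```python
# Back-of-envelope sketch of prefix-caching economics. Prices are
# illustrative assumptions: $3 per million input tokens, cached
# prefix tokens billed at 10% of that rate.

def turn_cost(prefix_tokens: int, new_tokens: int,
              rate: float = 3.0, cache_discount: float = 0.10) -> float:
    """Dollar cost of one turn with a cached prefix."""
    cached = prefix_tokens * rate * cache_discount / 1_000_000
    fresh = new_tokens * rate / 1_000_000
    return cached + fresh

# A 100k-token context plus ~1k new tokens per turn, over 20 turns:
without_cache = 20 * turn_cost(0, 101_000)
with_cache = turn_cost(0, 101_000) + 19 * turn_cost(100_000, 1_000)
savings = 1 - with_cache / without_cache   # roughly 0.85
```

Under these assumptions the 20-turn conversation costs about 85% less with caching — which is why rich, long-lived context systems only became affordable once prefix caching shipped.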
Property-based testing explores agent input spaces that example-based tests miss
Generative tests that produce random or adversarial inputs discover edge cases in agent behavior that hand-written examples never cover — verification over testing means proving properties, not checking cases
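A minimal version of this idea needs only the stdlib (real projects would reach for a library like Hypothesis). Instead of a few hand-picked examples, we generate many random inputs and assert a property that must hold for all of them — here, that a toy input sanitizer is idempotent. Both `sanitize` and the property are illustrative assumptions.

```python
# Stdlib sketch of property-based testing: generate random inputs and
# prove a property over all of them, rather than checking a few cases.

import random
import string

def sanitize(text: str) -> str:
    """Toy function under test: drop non-printable chars, collapse whitespace."""
    cleaned = "".join(ch for ch in text if ch.isprintable())
    return " ".join(cleaned.split())

def random_input(rng: random.Random, max_len: int = 50) -> str:
    alphabet = string.printable  # includes whitespace and punctuation
    return "".join(rng.choice(alphabet) for _ in range(rng.randrange(max_len)))

def check_idempotence(trials: int = 500, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        s = random_input(rng)
        once = sanitize(s)
        if sanitize(once) != once:   # the property: sanitizing twice == once
            return False
    return True
```

Five hundred adversarially random strings exercise corners (control characters, runs of mixed whitespace) that a hand-written example table almost never includes.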
Scaffolding is tech debt against the next model — the bitter lesson applied to product building
Code built to extend model capability 10-20% becomes worthless when the next model ships, making most product scaffolding an ephemeral trade-off rather than a lasting investment
Technical knowledge can become a liability when working with AI
Experts get stuck on implementation details while novices describe outcomes and ship faster
Adversarial branch-walking beats review for planning — walk every design branch until resolved
The most effective planning intervention is not post-hoc review or divergent brainstorming but convergent, exhaustive questioning that traverses each branch of the decision tree with recommended answers
Building real projects teaches AI skills faster than following structured curricula
A non-technical user who built a production WhatsApp bot reached 'Operator' level that a 30-day AI mastery roadmap targets — through building, not studying
Meta-agents that autonomously optimize task agents beat hand-engineered harnesses on production benchmarks
AutoAgent's meta-agent hit #1 on SpreadsheetBench (96.5%) and TerminalBench (55.1%) by autonomously iterating on a task agent's harness for 24+ hours — every other leaderboard entry was hand-engineered
One session per contract beats long-running agent sessions
Fresh context per task contract outperforms 24-hour agent sessions because cross-contract context bloat degrades performance by construction
Every role codes when implementation cost drops to zero — the generalist builder replaces the specialist engineer
When AI handles implementation, the title 'software engineer' gives way to generalist builders who code, write specs, design, and talk to users
Inference capability lowers input fidelity requirements — smart listeners make imprecise input work
When the consumer of input has strong inference ability, the quality bar for that input drops — voice works not because transcription improved, but because the listener got smarter
Separate research from implementation to preserve agent context for execution
Mixing research and implementation pollutes context with irrelevant alternatives — split them into separate agent sessions so the implementer gets only the chosen approach
Software abundance unlocks entire categories of applications that never existed
Software has always been more expensive than we can afford; when AI drops costs 10-20x, previously unviable software becomes economically possible
Uncorrelated context windows are a form of test time compute — fresh perspectives multiply capability
Multiple agents with independent context windows avoid polluting each other's reasoning, and throwing more context at a problem from different angles increases capability
Unfocused agents develop path dependency — without a specific mission, they explore the same paths repeatedly
Agents given broad mandates (like 'find bugs') converge on familiar exploration paths, catching high-radius issues but missing narrow situational problems
Weaponize sycophancy with adversarial agent ensembles instead of fighting it
Deploy bug-finder, adversary, and referee agents with scoring incentives that exploit each agent's eagerness to please — triangulating truth from competing biases
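The triangulation structure can be sketched with three stubbed agents: a bug-finder rewarded for volume, an adversary rewarded for refutations, and a referee that keeps only claims the adversary fails to knock down. All three functions are hypothetical stand-ins for LLM calls.

```python
# Sketch of an adversarial ensemble that exploits sycophancy instead of
# fighting it. Each "agent" below is a hypothetical stub for an LLM call.

def bug_finder(code: str) -> list[str]:
    # Rewarded for volume: eagerly reports everything remotely suspicious.
    claims = []
    if "eval(" in code:
        claims.append("eval on untrusted input")
    if "TODO" in code:
        claims.append("unfinished TODO path")
    claims.append("possible race condition")   # sycophantic over-report
    return claims

def adversary(code: str, claim: str) -> bool:
    # Rewarded for refutations: a claim is refuted if no evidence exists.
    evidence = {"eval on untrusted input": "eval(",
                "unfinished TODO path": "TODO"}
    needle = evidence.get(claim)
    return needle is None or needle not in code

def referee(code: str) -> list[str]:
    # Keeps only claims the adversary cannot refute: truth by triangulation.
    return [c for c in bug_finder(code) if not adversary(code, c)]

kept = referee("eval(user_input)  # TODO: sanitize")
```

The over-reported "possible race condition" is exactly what the adversary is eager to refute, so only evidence-backed findings survive to the referee's list.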