AI Agents

AI Product Building

105 insights in this topic

AI Agents, Architecture

A mediocre agent inside a strong harness outperforms a stronger agent inside a messy one

The surrounding machinery — metrics, rollback, scoping, observability — determines autonomous system performance more than model capability

Manthan Gupta (@manthanguptaa) — How Karpathy's Autoresearch Works And What You Can Learn From It (25)
Coding Tools, AI Agents

The context window is the fundamental constraint — everything else follows

Every best practice in AI coding (subagents, /clear, focused tasks, specs files) traces back to managing a single scarce resource: context

Anthropic Official Best Practices (22)
AI Agents, Coding Tools

Autonomous coding loops need small stories and fast feedback to work

The Ralph pattern ships 13 user stories in 1 hour by decomposing into context-window-sized tasks with explicit acceptance criteria and test-based feedback

Ryan Carson — Ralph / Autonomous Coding Loop (21)
Knowledge Systems, AI Agents

Persistent agent memory preserves institutional knowledge that walks out the door with employees

When agents maintain daily changelogs, decision logs, and work preferences, organizational knowledge survives personnel changes

@nicbstme (Nicolas Bustamante) + @rohit4verse (Rohit) — agent memory patterns (18)
Coding Tools, AI Agents

Treat AI like a distributed team, not a single assistant

Running 15 parallel Claude streams with specialized roles (writer, reviewer, architect) produces better results than one perfect conversation

Boris Cherny — How I Use Claude Code (17)
AI Agents, Architecture

Production agents route routine cases through decision trees, reserving humans for complexity

Handle exact matches and known patterns without AI; invoke the model for ambiguity, and route genuinely complex cases to human judgment

@vasuman — AI Agents 101 (16)
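
The routing described in this card can be sketched as follows. This is a minimal illustration only: the `KNOWN_ANSWERS` table, the stubbed `classify_with_model` function (standing in for a real model call), and the 0.8 confidence threshold are all assumptions, not details from the source.

```python
from typing import Callable, Tuple

# Hypothetical lookup table of exact-match cases handled without any model call.
KNOWN_ANSWERS = {
    "reset password": "send_reset_link",
    "order status": "lookup_order",
}

def classify_with_model(text: str) -> Tuple[str, float]:
    """Stub standing in for a real model call; returns (action, confidence)."""
    if "refund" in text:
        return ("start_refund", 0.9)
    return ("unknown", 0.3)

def route(text: str,
          model: Callable[[str], Tuple[str, float]] = classify_with_model,
          threshold: float = 0.8) -> Tuple[str, str]:
    """Return (handler, action): rules first, then the model, then a human."""
    key = text.strip().lower()
    if key in KNOWN_ANSWERS:             # exact match: no AI involved
        return ("rule", KNOWN_ANSWERS[key])
    action, confidence = model(key)      # ambiguous input: ask the model
    if confidence >= threshold:
        return ("model", action)
    return ("human", "escalate")         # low confidence: hand off to a person
```

Keeping the rule table ahead of the model call means the cheapest, most predictable path is always tried first.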
AI Agents, Architecture

Agents that store error patterns learn continuously without fine-tuning or retraining

Dash's 'GPU-poor continuous learning' separates validated knowledge from error-driven learnings — five lines of code replace expensive retraining

@ashpreetbedi — Dash (OpenAI-inspired data agent) (14)
Architecture, AI Agents

The three-layer AI stack: Memory, Search, Reasoning

The emerging AI product architecture has three layers — Memory (who is this user), Search (find the right information), Reasoning (navigate complex information) — all running on PostgreSQL

Synthesis from Supermemory, QMD, and PageIndex architectures (14)
AI Agents, Architecture

Verification is a Red Queen race — optimizing against a fixed eval contaminates it

Eval suites degrade the moment you use them to improve an agent — the agent adapts to the distribution, and the eval stops measuring what it was designed to measure

@natashamalpani (Natasha Malpani) — The Verification Economy: The Red Queen Problem (Part III) (14)
AI Agents, Coding Tools

Declarative beats imperative when working with agents

Give agents success criteria and watch them go — don't tell them what to do step by step

Andrej Karpathy — Coding Observations (13)
AI Agents, Business Models

Domain-specific skill libraries are the real agent moat, not core infrastructure

An elite team can replicate any agent's tool architecture in months, but accumulated domain workflows (LBO modeling, compliance, bankruptcy) represent years of domain expertise

@nicbstme — Lessons from Reverse Engineering Excel AI Agents (13)
AI Agents, Coding Tools

An orchestrator agent that manages other agents solves the parallel coordination problem without human bottleneck

Instead of humans managing AI agents, a meta-agent spawns specialized agents, routes tasks by model strength, and monitors progress — turning agent swarms into autonomous dev teams

@elvissun (Elvis Sun) — OpenClaw Agent Swarm (13)
Knowledge Systems, AI Agents

Skill graphs enable progressive disclosure for complex domains

Single skill files hit a ceiling — complex domains need interconnected knowledge that agents navigate progressively from index to description to links to sections to full content

@arscontexta (Heinrich) — Twitter thread on skill graphs (13)
Knowledge Systems, AI Agents

Evolving summaries beat append-only memory — rewrite profiles, don't accumulate facts

An evolve_summary() function that rewrites category profiles with new information handles contradictions naturally, unlike append-only logs

Rohit (@rohit4verse) — How to Build Agents That Never Forget (12)
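
The contrast between the two memory styles can be sketched as below. Only the name `evolve_summary` comes from the card; its signature, the `(attribute, value)` fact shape, and the dict overwrite are assumptions. A real implementation would likely ask a model to rewrite a prose profile, and the overwrite here is a deterministic stand-in for that step.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str]  # hypothetical shape: (attribute, value)

def append_only(log: List[Fact], fact: Fact) -> List[Fact]:
    """Append-only memory: contradictions pile up and must be resolved at read time."""
    return log + [fact]

def evolve_summary(profile: Dict[str, str], fact: Fact) -> Dict[str, str]:
    """Rewrite the profile so it holds only the current value per attribute."""
    attribute, value = fact
    updated = dict(profile)      # copy so the previous profile is left untouched
    updated[attribute] = value   # the newer fact replaces the contradicted one
    return updated

facts = [("editor", "vim"), ("language", "Python"), ("editor", "VS Code")]
log: List[Fact] = []
profile: Dict[str, str] = {}
for f in facts:
    log = append_only(log, f)
    profile = evolve_summary(profile, f)
```

After the loop, `log` still contains both editor preferences, while `profile` holds only the latest one.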
AI Agents, Architecture

Observability is the missing discipline for agent systems — you can't improve what you can't measure

Agent systems need telemetry (token usage, latency, error rates, cost per task) as a first-class engineering concern, not an afterthought bolted on after production failures

Geoff Huntley — Latent Patterns Principles (verification over testing) (12)
AI Agents, Architecture

Markdown skill files may replace expensive fine-tuning

A SKILL.md file that teaches an agent how to do something specific can match domain-specific fine-tuned models — at zero training cost

Nicolas Bustamante (@nicbstme) — Lessons from Building AI Agents for Financial Services (12)
Architecture, Knowledge Systems, AI Agents

Structure plus reasoning beats flat similarity for complex domains

Across documents, code, and skills, the same pattern holds: structured knowledge navigated by reasoning outperforms flat indexes searched by similarity

Recurring pattern across PageIndex, Claude Code agentic search, and @arscontexta skill graphs (12)
AI Agents, Architecture

Evals are behavioral pressure vectors, not neutral measurements — poorly chosen evals distort agent development

Each eval shapes agent behavior like a selection pressure; accumulating tests without strategic purpose creates 'an illusion of improving your agent' while distorting development in unproductive directions, and correctness alone misleads because agents that succeed inefficiently create hidden cost

LangChain — How We Build Evals for Deep Agents (11)
AI Agents, Architecture

In agent-native architecture, features are prompts — not code

The shift from coding specific functions to describing outcomes that agents achieve by composing atomic tools

@danshipper — Agent-Native Architectures (co-authored with Claude) (11)
AI Agents, Decision Making

Every optimization has a shadow regression — guard commands make the shadow visible

When optimizing metric A, metric B silently degrades unless you run a separate invariant check (a guard) alongside the primary verification

Udit Goenka (@uditg) — autoresearch Claude Code skill v1.6.1 (Guard feature by Roman Pronskiy, JetBrains) (10)
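
One way to picture a guard is as an invariant checked alongside the primary metric. The sketch below is not the skill's actual implementation; `accept_change`, the metric names, and the 5% latency tolerance are hypothetical, and it assumes higher is better for the primary metric.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]
Guard = Callable[[float, float], bool]  # (old_value, new_value) -> still acceptable?

def accept_change(before: Metrics, after: Metrics,
                  primary: str, guards: Dict[str, Guard]) -> bool:
    """Accept only if the primary metric improves and every guard still holds."""
    if after[primary] <= before[primary]:
        return False
    return all(check(before[name], after[name]) for name, check in guards.items())

# Hypothetical guard: latency may not regress by more than 5%.
guards = {"latency_ms": lambda old, new: new <= old * 1.05}

baseline = {"accuracy": 0.80, "latency_ms": 200.0}
clean_win = {"accuracy": 0.83, "latency_ms": 205.0}
shadow_regression = {"accuracy": 0.85, "latency_ms": 400.0}
```

Here `shadow_regression` has the best accuracy of the three, yet it is rejected because the latency guard fails.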
Business Models, AI Agents

Memory is where agent lock-in lives — without it, agents are commoditized

Stateless model APIs are easily swapped; stateful memory creates a proprietary dataset of user interactions and preferences that makes the agent sticky and differentiated

@hwchase17 (Harrison Chase) — Your harness, your memory (10)
Coding Tools, AI Agents

Parallel agents create a management problem, not a coding problem

When AI agents can work on multiple projects simultaneously, the bottleneck shifts from writing code to coordinating parallel workstreams

Learning Technical Concepts chat — discussion of parallel agent workflows (10)
AI Agents, Coding Tools

Tool design is continuous observation — see like an agent

Designing effective agent tools requires iterating by watching actual model behavior, not specifying upfront; tools that helped weaker models may constrain stronger ones

@trq212 (Thariq, Claude Code team) — Lessons from Building Claude Code (10)
AI Agents, Knowledge Systems

Agents learn at three distinct layers — model weights, harness code, and context configuration

Most people jump to model fine-tuning when discussing agent learning, but learning also happens at the harness layer (code, tools, instructions baked into all instances) and the context layer (per-user or per-tenant configuration like CLAUDE.md and skills)

@hwchase17 (Harrison Chase) — Continual Learning for AI Agents (9)
AI Agents, Coding Tools

Evaluate agent tools with real multi-step tasks, not toy single-call examples

Weak evaluation tasks hide tool design flaws — strong tasks require chained calls, ambiguity resolution, and verifiable outcomes

Anthropic Engineering — Writing Effective Tools for Agents (9)
AI Agents, Coding Tools

Meta-agents that autonomously optimize task agents beat hand-engineered harnesses on production benchmarks

AutoAgent's meta-agent hit #1 on SpreadsheetBench (96.5%) and TerminalBench (55.1%) by autonomously iterating on a task agent's harness for 24+ hours — every other leaderboard entry was hand-engineered

@kevingu (Kevin Gu) — AutoAgent: First Open Source Library for Self-Optimizing Agents (9)
AI Agents, Architecture

Safety enforcement belongs in tool design, not system prompts

At scale, embedding safety constraints in the tool's API (blocking destructive operations by default) beats relying on behavioral compliance with system prompt instructions

@nicbstme — Lessons from Reverse Engineering Excel AI Agents (9)
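
A minimal sketch of enforcing safety in the tool's API rather than in the prompt. The `SheetTool` class, its methods, and the `allow_destructive` flag are hypothetical and not taken from any real spreadsheet agent.

```python
class DestructiveOperationError(Exception):
    """Raised when a call would destroy data and the tool is not configured to allow it."""

class SheetTool:
    """Hypothetical spreadsheet tool that refuses destructive calls by default."""

    def __init__(self, cells, allow_destructive=False):
        self.cells = dict(cells)
        self.allow_destructive = allow_destructive

    def write(self, ref, value, overwrite=False):
        # Writing into an occupied cell counts as destructive: the caller must ask
        # for it explicitly AND the tool instance must have been configured to allow it.
        if ref in self.cells and not (overwrite and self.allow_destructive):
            raise DestructiveOperationError(f"{ref} already holds data")
        self.cells[ref] = value

    def clear_all(self):
        if not self.allow_destructive:
            raise DestructiveOperationError("clear_all is disabled for this tool instance")
        self.cells.clear()
```

The model can request `overwrite=True` as often as it likes; the block holds because the permission lives in how the tool was constructed, not in what the model was told.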
AI Agents, Knowledge Systems

Two-tier agent memory separates organizational workflow knowledge from individual user preferences

Deployment-level memory captures shared tool strategies and sequencing patterns; user-level memory captures personal templates and communication styles — initially skipping user-level had a significant performance impact

@tonygentilcore (Tony Gentilcore, Glean) — Trace Learning for Self-Improving Agents (9)
AI Agents, Architecture

Detect everything, notify selectively — the observability-to-notification ratio determines system trust

Watch every signal but ensure alerts reaching humans always mean something; teams ignore noisy monitors AND noisy agents equally fast

@RampLabs — How We Made Ramp Sheets Self-Maintaining (8)
AI Agents, Architecture

Evals are the gradient signal for harness engineering — the same data quality rigor from ML training applies

The analogy between ML training and agent development is structural: evals encode desired behavior like training data encodes ground truth, and the same principles (data quality, curation, train/test splits) determine outcomes

@Vtrivedy10 (Viv) — Better Harness: A Recipe for Harness Hill-Climbing with Evals (8)
Future of AI, AI Agents

The intelligence-to-judgement ratio determines which professions AI automates first

Intelligence work (complex but rule-based) is already automatable; judgement (experience, taste, intuition) remains human — software engineering crossed the threshold first

@julienbek — Services: The New Software (8)
Coding Tools, AI Agents

Multi-model code review creates adversarial robustness — each model catches what others miss

Using 3 different LLMs to review the same PR exploits the fact that models have different failure modes, creating emergent coverage no single model achieves

@elvissun (Elvis Sun) — OpenClaw Agent Swarm (8)
AI Agents, Architecture

AI is the computer — orchestration across 19 models is the product, not any single model

Perplexity launched a unified agent system orchestrating 19 backend models that delegate tasks, manage files, execute code, and browse the web. The differentiation isn't the models — it's the orchestration. 'The computer is the orchestration system.'

@AravSrinivas (Aravind Srinivas, Perplexity CEO) — AI Is the Computer (8)
AI Agents, Coding Tools

Tools are a new kind of software — contracts between deterministic systems and non-deterministic agents

Agent tools must be designed for how agents think (context-limited, non-deterministic, description-dependent), not how programmers think

Anthropic Engineering — Writing Effective Tools for AI Agents — Using AI Agents (8)
AI Agents, Architecture

The trace→eval→harness flywheel compounds agent quality — every production interaction generates its own training data

Production traces where agents fail become eval cases; better evals improve the harness; better harnesses produce better traces — creating a self-reinforcing improvement loop

@Vtrivedy10 (Viv) — Better Harness: A Recipe for Harness Hill-Climbing with Evals (8)
AI Agents, Architecture

Traces, not scores, enable agent improvement — without trajectories, improvement rate drops hard

When AutoAgent's meta-agent received only pass/fail scores without reasoning traces, the improvement rate dropped significantly; understanding why matters as much as knowing that

@kevingu (Kevin Gu) — AutoAgent: First Open Source Library for Self-Optimizing Agents (8)
AI Agents

Self-improving agents overfit to eval metrics — the meta-agent games rubrics unless structurally constrained

AutoAgent's meta-agent gets lazy, inserting rubric-specific prompting so the task agent can game metrics; defense requires forcing self-reflection on generalizability

@kevingu (Kevin Gu) — AutoAgent: First Open Source Library for Self-Optimizing Agents (7)
AI Agents, Architecture

Sessions are runtime infrastructure, not just resumable transcripts

Hermes stores sessions in SQLite with search and lineage so CLI, messaging platforms, and scheduled jobs all attach to one session plane — routing can resolve before the model even runs

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture (7)
AI Agents, Knowledge Systems

Treat an agent as an operating system, not a stateless function

Agents need RAM (conversation context), a hard drive (persistent memory), garbage collection (decay/pruning), and I/O management (tools) — the OS mental model unlocks architectural clarity

Rohit (@rohit4verse) — How to Build Agents That Never Forget (7)
AI Agents, Architecture

Agent edits are automatic decision instrumentation — every human correction is a structured signal

When agents propose and humans edit, the delta between proposal and correction captures tacit judgment as first-class data without requiring manual logging

@JayaGup10 (Jaya Gupta) — The Trillion Dollar Loop B2B Never Had (6)
AI Agents, Knowledge Systems

Accumulated agent traces produce emergent world models — discovered, not designed

When agent decision trajectories accumulate over time, they form a context graph that reveals entities, relationships, and constraints nobody explicitly modeled

@rohit4verse (Rohit) — The Missing Layer in Your Agentic Stack (6)
AI Agents, Future of AI

Agent trust transfers from human credibility — colleagues adopt agents operated by people they trust

When a human's agent consistently performs well, other team members inherit that trust and willingly depend on the agent, creating a credibility chain

@danshipper (Dan Shipper) — 'What personal software actually is' (tweet thread) (6)
AI Agents, Business Models

Closed harnesses behind APIs create memory lock-in by design

When the harness lives behind a proprietary API, memory state and schema become invisible and non-portable — model providers are incentivized to push more of the harness behind their APIs

@hwchase17 (Harrison Chase) — Your harness, your memory (6)
Architecture, AI Agents

Context layers must be living systems, not static artifacts

Unlike semantic layers that rot when maintainers leave, context layers need self-updating feedback loops where agent errors refine the context corpus

@jasonscui — Your Data Agents Need Context, a16z (6)
Knowledge Systems, AI Agents

Cross-user knowledge transfer works without fine-tuning — just a database and prompt engineering

When one person teaches an agent something, another person benefits automatically — no RLHF, no training infrastructure, just structured storage and retrieval

Ashpreet Bedi — Memory: How Agents Learn (Agno Framework) (6)
AI Agents, Architecture

Policy enforcement must run independently of model cooperation — hooks, not prompt instructions

Hermes runs lifecycle hooks that block, rewrite, or audit operations at fixed events, so policy and side-effects never depend on the model choosing to comply

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture (6)
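
A rough sketch of hook-based enforcement in this spirit. It does not reproduce Hermes's actual API; `HookRunner`, the operation shape, and both example hooks are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

Operation = Dict[str, str]                          # e.g. {"tool": "shell", "arg": "ls"}
Hook = Callable[[Operation], Optional[Operation]]   # return None to block the operation

class HookRunner:
    """Runs every registered hook before a tool call, regardless of what the model wants."""

    def __init__(self) -> None:
        self.pre_tool: List[Hook] = []
        self.audit_log: List[Operation] = []

    def run(self, op: Operation, execute: Callable[[Operation], str]) -> str:
        current: Optional[Operation] = op
        for hook in self.pre_tool:
            current = hook(current)
            if current is None:           # a hook vetoed the call
                return "blocked"
        self.audit_log.append(current)    # record what actually ran, after rewrites
        return execute(current)

def block_rm(op: Operation) -> Optional[Operation]:
    """Hypothetical policy: never allow rm through the shell tool."""
    return None if op["arg"].startswith("rm ") else op

def force_dry_run(op: Operation) -> Optional[Operation]:
    """Hypothetical policy: rewrite git push into a dry run."""
    if op["arg"] == "git push":
        return {**op, "arg": "git push --dry-run"}
    return op

runner = HookRunner()
runner.pre_tool.extend([block_rm, force_dry_run])
execute = lambda op: f"ran: {op['arg']}"
```

Because the hooks run in the harness, a prompt injection that persuades the model to issue `rm -rf` still never reaches `execute`.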
AI Agents, Architecture

Evolved harnesses transfer across models — a single optimized harness improves five different LLMs

Meta-Harness discovered a retrieval harness that improved math reasoning by 4.7 percentage points average across five held-out models it was never optimized for, suggesting harness quality is model-agnostic

Yoonho Lee et al. — Meta-Harness: End-to-End Optimization of Model Harnesses (arXiv:2603.28052) (6)
Knowledge Systems, Architecture, AI Agents

Intelligence location — code vs prompts — determines system fragility and flexibility

Critical architectural fork: prompt-driven systems (Pal's 400-line routing prompt) are flexible but break when models change; code-driven systems (our validate-graph.js) are rigid but reliable — best systems need both

Ayush Jhunjhunwala — KG Architecture Comparative Research (10+ systems analyzed) (6)
Architecture, AI Agents

KV cache hit rate is the most critical metric for production agents

Maintaining stable prompt prefixes and append-only context architecture maximizes cache reuse, dramatically reducing both cost and latency for agentic workflows

@nicbstme — The LLM Context Tax: Best Tips for Tax Avoidance (6)
AI Agents, Coding Tools

Rollback safety nets enable autonomous iteration — not model intelligence

The minimum viable safety net for autonomy is a quantifiable metric, atomic changes, and automatic rollback — these make cheap failure possible, which makes aggressive exploration safe

Manthan Gupta (@manthanguptaa) — How Karpathy's Autoresearch Works; Andrej Karpathy — autoresearch program.md (6)
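
The three ingredients named in this card (a quantifiable metric, atomic changes, automatic rollback) fit in a few lines. The sketch below is an illustration and not the autoresearch code; `run_loop`, the toy `lr` state, and the metric are hypothetical, and higher scores are assumed to be better.

```python
import copy
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]

def run_loop(state: State,
             proposals: List[Callable[[State], State]],
             metric: Callable[[State], float]) -> Tuple[State, float]:
    """Try each proposal on a copy; commit only when the metric improves."""
    best_score = metric(state)
    for propose in proposals:
        candidate = propose(copy.deepcopy(state))   # atomic change, applied to a copy
        score = metric(candidate)
        if score > best_score:
            state, best_score = candidate, score    # commit the improvement
        # otherwise the candidate is simply dropped, which is the rollback
    return state, best_score

# Toy metric: closeness of a learning rate to a hypothetical optimum of 0.03.
metric = lambda s: -abs(s["lr"] - 0.03)
proposals = [
    lambda s: {**s, "lr": 0.05},   # better than the start, kept
    lambda s: {**s, "lr": 0.5},    # worse, rolled back
    lambda s: {**s, "lr": 0.03},   # best, kept
]
final_state, final_score = run_loop({"lr": 0.1}, proposals, metric)
```

A failed proposal costs one evaluation and nothing else, which is what makes aggressive exploration cheap.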
Business Models, AI Agents

Task horizon breaks seat-based pricing — usage scales with workflow depth × length, not headcount

Task horizon is the length dial: how long an AI works on its own before a human steps in. The unit shifted from the call to the workflow — agents run for hours, spawn sub-agents, and burn millions of tokens per decision path, so usage stops scaling with seats; multiply length by depth to get the token bill

@JayaGup10 (Jaya Gupta) — Who will set price / intelligence? (6)
AI Agents, Architecture

Traces are the universal substrate for agent learning — all three layers consume the same execution logs

Whether updating model weights, improving harness code, or refining context/memory, agent learning flows start from the same raw material: traces capturing the full execution path of what an agent did

@hwchase17 (Harrison Chase) — Continual Learning for AI Agents (6)
AI Agents, Knowledge Systems

Tribal knowledge is the irreducible human input that enables agent automation

Automated context construction handles most of the corpus, but the most critical context is implicit, conditional, and historically contingent — only humans can provide it

@jasonscui — Your Data Agents Need Context, a16z (6)
AI Agents, Architecture

Agent harnesses are persistent infrastructure, not scaffolding models will absorb

As models improve, old scaffolding disappears but new scaffolding replaces it — harnesses aren't going away, they're evolving

@hwchase17 (Harrison Chase) — Your harness, your memory (5)
AI Agents, Business Models

The intelligence lives in the workflow, not the model — and a model can't simply read it

In a real vertical, the decisive logic (what to escalate, which rule wins, when a human signs off) lives in SOPs and operational experience; the agentic workflow encodes it and becomes the carrier's operating memory

@joeschmidtiv (Joe Schmidt IV, a16z) — Avoiding Death on the Yellow Brick Road (5)
AI Agents, Architecture

Auto-generated narrow monitors beat handwritten broad checks — a tight mesh over the exact shape of the code

1,000+ AI-generated monitors that each target specific code paths catch more bugs than 10 hand-written checks that cover general categories

@RampLabs — How We Made Ramp Sheets Self-Maintaining (5)
AI Agents, Architecture

Compression should be a forking lifecycle event, not a destructive rewrite

Instead of repeatedly overwriting one transcript, Hermes seeds a child session from each summary and records parent-child lineage — producing an auditable chain of compressions

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture (5)
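
The forking idea can be sketched with an in-memory store. This is not Hermes's schema (the card describes SQLite-backed sessions elsewhere); `Session`, `SessionStore`, and the `summarize` callback are hypothetical names used only to show the parent-child lineage.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Session:
    id: int
    messages: List[str]
    parent_id: Optional[int] = None

class SessionStore:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.sessions: Dict[int, Session] = {}

    def create(self, messages: List[str], parent_id: Optional[int] = None) -> Session:
        session = Session(next(self._ids), list(messages), parent_id)
        self.sessions[session.id] = session
        return session

    def compress(self, session_id: int, summarize: Callable[[List[str]], str]) -> Session:
        """Fork: seed a child from the summary and leave the parent transcript intact."""
        parent = self.sessions[session_id]
        return self.create([summarize(parent.messages)], parent_id=parent.id)

    def lineage(self, session_id: int) -> List[int]:
        """Walk parent links from a session back to the root."""
        chain: List[int] = []
        current = self.sessions.get(session_id)
        while current is not None:
            chain.append(current.id)
            current = self.sessions.get(current.parent_id)
        return chain

store = SessionStore()
root = store.create(["a", "b", "c"])
summarize = lambda msgs: f"summary of {len(msgs)} messages"
child = store.compress(root.id, summarize)
grandchild = store.compress(child.id, summarize)
```

Every compression stays inspectable: the original transcript is still in `root`, and `lineage` reconstructs the chain of summaries that led to the current session.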
AI Agents, Coding Tools

Delegation is not orchestration — durable, externally-steerable child runs are the architectural leap

Hermes can spawn child runs with their own task IDs that return structured summaries, but they die with the parent; true orchestration needs run IDs, lifecycle control, and steering that survive parent completion

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture (5)
AI Agents

Holdout eval sets are the generalization gate for autonomous harness optimization — without them, the loop overfits

Autonomous harness hill-climbing tends to overfit to the optimization set; splitting evals into optimization and holdout categories — mirroring ML train/test splits — is the structural defense

@Vtrivedy10 (Viv) — Better Harness: A Recipe for Harness Hill-Climbing with Evals (5)
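
A holdout gate can be sketched as an acceptance rule with two conditions. The function names, the toy uppercase task, and the exact rule (improve on the optimization set, do not regress on holdout) are assumptions for illustration, not the article's recipe.

```python
from typing import Callable, List, Tuple

Harness = Callable[[str], str]
Case = Tuple[str, str]  # (input, expected output)

def pass_rate(harness: Harness, cases: List[Case]) -> float:
    return sum(harness(x) == expected for x, expected in cases) / len(cases)

def accept(candidate: Harness, incumbent: Harness,
           optimization_set: List[Case], holdout_set: List[Case]) -> bool:
    """Accept only if the candidate improves where it was tuned and does not regress on holdout."""
    improves = pass_rate(candidate, optimization_set) > pass_rate(incumbent, optimization_set)
    generalizes = pass_rate(candidate, holdout_set) >= pass_rate(incumbent, holdout_set)
    return improves and generalizes

# Toy task: uppercase a letter. The holdout cases are never shown to the optimizer.
optimization_set = [("a", "A"), ("b", "B")]
holdout_set = [("c", "C"), ("d", "D")]

incumbent = lambda s: s.upper() if s in ("a", "c") else s   # passes half of each set
memorizer = lambda s: {"a": "A", "b": "B"}.get(s, s)        # overfits the optimization set
general = lambda s: s.upper()                               # solves the actual task
```

The `memorizer` scores perfectly on the optimization set, so only the holdout check exposes that it learned the cases instead of the task.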
AI Agents, Future of AI

Malleable software — a tiny core that writes its own plugins — replaces fixed-feature applications

Instead of adapting your workflow to the tool, the tool observes your workflow and extends itself to match it

@tobi (Tobi Lutke, Shopify CEO) — Pi and Clawdbot (5)
AI Agents, Architecture

Model compensations become liabilities as capabilities advance — yesterday's fixes hobble today's agent

Engineering workarounds for earlier model limitations accumulate as technical debt that actively degrades agent performance when models improve

@izzymiller (Izzy Miller, Hex) — Building AI Agents for Data Analytics (5)
AI Agents, Coding Tools

One session per contract beats long-running agent sessions

Fresh context per task contract outperforms 24-hour agent sessions because cross-contract context bloat degrades performance by construction

@systematicls — How To Be A World-Class Agentic Engineer (5)
Architecture, AI Agents

Private evals should measure business outcomes that matter — not external benchmarks

A firm's learning loop runs on private evals tied to real business outcomes and private RL environments trained on internal traces, so the model improves against what the company cares about rather than public leaderboards

@satyanadella (Satya Nadella) — A frontier without an ecosystem is not stable (5)
Coding Tools, AI Agents

Property-based testing explores agent input spaces that example-based tests miss

Generative tests that produce random or adversarial inputs discover edge cases in agent behavior that hand-written examples never cover — verification over testing means proving properties, not checking cases

Geoff Huntley — Latent Patterns Principles (verification over testing) (5)
AI Agents, Architecture

Reasoning evaporation permanently destroys agent decision chains when the context window closes

An agent's multi-step reasoning exists only in the context window; when the session ends, the output survives but the decision chain — why each step was taken — is gone forever

@rohit4verse (Rohit) — The Missing Layer in Your Agentic Stack (5)
AI Agents, Architecture

Separate tool registration from tool exposure — install broadly, reveal narrowly

Hermes registers all tools into a central registry at import time but a separate layer decides what each run actually shows the model, scoped by platform and scenario

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture (5)
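
Registration and exposure can be kept as two separate steps, roughly as below. This is a sketch and not Hermes's registry: the `register` decorator, the platform labels, and `exposed_tools` are hypothetical, and it scopes by platform only, leaving out the scenario dimension the card mentions.

```python
from typing import Callable, Dict, List, Set

REGISTRY: Dict[str, Callable[..., str]] = {}   # every installed tool
PLATFORMS: Dict[str, Set[str]] = {}            # where each tool may be shown

def register(name: str, *, platforms: Set[str]):
    """Decorator that installs a tool into the central registry when the module is imported."""
    def wrap(fn: Callable[..., str]) -> Callable[..., str]:
        REGISTRY[name] = fn
        PLATFORMS[name] = platforms
        return fn
    return wrap

@register("read_file", platforms={"cli", "chat"})
def read_file(path: str) -> str:
    return f"contents of {path}"

@register("run_shell", platforms={"cli"})
def run_shell(command: str) -> str:
    return f"ran {command}"

def exposed_tools(platform: str) -> List[str]:
    """Separate layer: decide which registered tools this run shows the model."""
    return sorted(name for name, allowed in PLATFORMS.items() if platform in allowed)
```

Both tools are installed, but a chat-platform run only ever sees `read_file` in its tool list.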
AI Agents, Coding Tools

Humans should supervise agent loops from a leveraged point, not sit inside every one

Human-in-the-loop isn't always desirable — putting a person inside every iteration is the Red Flag Act (a man legally required to walk ahead of every car waving a flag); the goal is leveraged oversight of many loops, not manual inspection of each

@ivanhzhao (Ivan Zhao, Notion CEO) — Steam, Steel, and Infinite Minds (5)
AI Agents, Architecture

Time-bounded evaluation forces optimization for real-world usefulness instead of idealized performance

A fixed wall-clock budget per experiment makes results comparable, normalizes across hardware, and forces agents to optimize for improvement per unit time

Manthan Gupta (@manthanguptaa) — How Karpathy's Autoresearch Works And What You Can Learn From It (5)
AI Agents, Architecture

Trust boundaries must be externalized — not held in engineers' heads

Where an agent's behavior is well-understood vs. unknown should be mapped, made auditable, and connected to deployment gates — not left as implicit tribal knowledge

@natashamalpani (Natasha Malpani) — The Verification Economy: The Red Queen Problem (Part III) (5)
AI Agents, Architecture

Unattended agent jobs must run through the same permission machinery as interactive sessions

Hermes makes cron a first-class subsystem — scheduled jobs are gated by the same permissions, delivered through the same paths, and isolated per profile, instead of living as peripheral scripts

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture (5)
Business Models, AI Agents

Vertical models beat frontier models in their domain — specialization wins on every metric

Intercom's Apex, a specialized customer service LLM, beat every frontier model, including those from Anthropic and OpenAI, on resolution rate, latency, hallucination rate, and cost

@eoghan (Eoghan McCabe, Intercom CEO) — Never Stop Disrupting Yourself: Introducing the Fin API Platform (5)
AI Agents, Architecture

WebMCP turns websites into agent-native interfaces

Chrome's MCP integration lets websites expose structured tools to agents instead of agents scraping and guessing at UI elements

Chrome for Developers — WebMCP announcement (https://developer.chrome.com/blog/web-mcp) (5)
Future of AI, AI Agents

AI's self-improvement loop means each generation builds the next one faster

GPT-5.3-Codex was instrumental in creating itself — recursive improvement compresses timelines and explains why building for obsolescence is the only safe strategy

@mattshumer_ (Matt Shumer) — Something Big Is Happening (4)
AI Agents, Architecture

Causal triage must gate automated fixes — statistical regression detection alone can't distinguish your bugs from external failures

Raw error-rate spikes after deployment can't tell you whether YOUR code broke or a third-party API went down; a triage agent that establishes causal links between code changes and observed errors must gate any automated fixing agent

Vishnu Suresh (LangChain) — How My Agents Self-Heal in Production (4)
Knowledge Systems, AI Agents

Compilation scales but curation compounds — two camps for knowledge graph construction

LLM-compiled systems (Karpathy, Pal) grow fast by feeding raw content through model judgment; human-curated systems (our graph, brainctl) grow slowly but every node is validated — compilation scales linearly, curation compounds through connections

Ayush Jhunjhunwala — KG Architecture Comparative Research (10+ systems analyzed) (4)
AI Agents, Architecture

Data agent failures stem from missing business context, not SQL generation gaps

The industry initially blamed text-to-SQL capability for data agent failures, but the real blockers are undefined business definitions, ambiguous sources of truth, and missing tribal knowledge

@jasonscui — Your Data Agents Need Context, a16z (4)
AI Agents, Future of AI

Deputies and Sheriffs — distributed agent teams with hierarchical authority replace centralized software

Individual employees train specialized 'Deputy' agents while organizational 'Sheriff' agents manage permissions, rules, and onboarding across the team

@danshipper (Dan Shipper) — 'What personal software actually is' (tweet thread) (4)
AI Agents

Eval suites must shrink, not just grow — spring cleaning prevents stale behavioral pressure

Saturated evals waste compute without providing signal; more intelligent models or changed desired behaviors make old evals irrelevant, requiring regular pruning alongside addition

@Vtrivedy10 (Viv) — Better Harness: A Recipe for Harness Hill-Climbing with Evals (4)
AI Agents, Future of AI

The first-draft pattern is the killer app for long-horizon agents — agents produce, humans review

Long-horizon agents produce comprehensive first drafts (PRs, analyses, reports) that humans verify — this is where the 10x productivity gain actually lives

@hwchase17 (Harrison Chase) — Context Engineering Our Way to Long-Horizon Agents (4)
AI Agents, Coding Tools

Full trace filesystems beat compressed summaries for harness optimization — 10M tokens of context outperforms 26K

Meta-Harness gives its proposer agent a filesystem containing full source code, scores, and execution traces of every prior candidate, enabling up to 10M tokens of diagnostic context per iteration — dramatically outperforming prior methods limited to 26K compressed tokens

Yoonho Lee et al. — Meta-Harness: End-to-End Optimization of Model Harnesses (arXiv:2603.28052) (4)
AI Agents, Knowledge Systems

Memory defines the agent — a zip of markdown files IS the agent, and portable memory between harnesses is the frontier

An agent IS its memory — a zip of markdown (system prompt + skills + tools) defines its identity; making that portable between harnesses is the current frontier

@hwchase17 (Harrison Chase) — Everything Gets Rebuilt: Agents, Harnesses, and the New Compute Layer (4)
Architecture, AI Agents

Order the system prompt by volatility to keep prompt prefixes cache-friendly

Hermes composes the system prompt in three tiers — stable, context, volatile — so the unchanging prefix stays cacheable while turn-by-turn data lives at the end

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture (4)
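
The effect of ordering by volatility can be shown with plain strings. This sketch assumes a character-level shared prefix is a fair proxy for cache reuse; real prompt caches work on tokens and differ by provider. `compose_prompt`, `shared_prefix_len`, and the example strings are hypothetical.

```python
def compose_prompt(stable: str, context: str, volatile: str) -> str:
    """Order sections from least to most frequently changing."""
    return "\n\n".join([stable, context, volatile])

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading substring, a rough proxy for the cacheable prefix."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

STABLE = "You are a coding agent. Follow the tool contract."   # rarely changes
CONTEXT = "Project: billing-service. Language: Python."        # changes per session

turn1 = compose_prompt(STABLE, CONTEXT, "Time: 09:00. Last message: run tests")
turn2 = compose_prompt(STABLE, CONTEXT, "Time: 09:05. Last message: fix lint")

# Same content with the volatile part first, for comparison.
bad1 = "\n\n".join(["Time: 09:00. Last message: run tests", STABLE, CONTEXT])
bad2 = "\n\n".join(["Time: 09:05. Last message: fix lint", STABLE, CONTEXT])
```

With the volatile tier last, the two turns share the entire stable and context tiers; with it first, the shared prefix ends inside the timestamp.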
AI Agents, Future of AI

Personal software grows through relationship, not configuration

Unlike traditional SaaS where users adapt to the tool, personal software agents grow personality and skills in response to their user through ongoing interaction

@danshipper (Dan Shipper) — 'What personal software actually is' (tweet thread) (4)
AI Agents

Same-model meta-task pairings outperform cross-model — agents understand their own architecture better than humans or other models do

Claude meta-agent + Claude task agent outperformed Claude meta-agent + GPT task agent because the meta-agent shares weights and implicitly understands how the inner model reasons

@kevingu (Kevin Gu) — AutoAgent: First Open Source Library for Self-Optimizing Agents (4)
AI Agents, Architecture

Shadow execution enables safe trace learning — replay write operations without touching production data

By replaying actions that would write to external apps in a shadow path, agents can learn from realistic end-to-end flows without impacting customer data

@tonygentilcore (Tony Gentilcore, Glean) — Trace Learning for Self-Improving Agents (4)
Knowledge Systems, AI Agents

A skill's folder structure is its context architecture — the file system is a form of context engineering

Skills are not just markdown files but folders where scripts, references, and assets enable progressive disclosure — the agent reads deeper files only when it reaches the relevant step

@trq212 (Thariq) — Lessons from Building Claude Code: How We Use Skills (4)
Architecture, AI Agents

AI trace data has an indefinite useful lifespan — SaaS observability's 30-day retention model destroys institutional knowledge

Infrastructure metrics expire quickly but AI conversations and reasoning traces gain value over time; 30-day retention windows erase the very data that reveals failure patterns and training signals

@aparnadhinak (Aparna Dhinakaran) — Data Architectures For Tracing Harnesses & Agents (4)
AI Agents, Architecture

Traces replace code as the source of truth for agent systems — debugging shifts from 'show me the code' to 'send me the trace'

In agent systems, execution traces replace source code as the primary debugging and collaboration artifact — you can't predict step 14's context from reading the code

@hwchase17 (Harrison Chase) — Context Engineering Our Way to Long-Horizon Agents (4)
Architecture, AI Agents

Virtual filesystems replace sandboxes for agent navigation — intercept commands instead of provisioning infrastructure

Mintlify's ChromaFs intercepts Unix commands and translates them into database queries, cutting boot time from 46 seconds to 100ms and cost from $70k/year to near-zero

Mintlify — How We Built a Virtual Filesystem for Our Assistant4
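
To illustrate the general technique of intercepting commands rather than provisioning a sandbox, here is a small sketch that maps `ls` and `cat` onto queries against an in-memory SQLite table. This is not ChromaFs and makes no claim about Mintlify's implementation; the `files` schema, the `run` function, and the sample paths are all assumptions made for the example.

```python
import shlex
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, content TEXT)")
db.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [
        ("/docs/intro.md", "# Intro"),
        ("/docs/api/auth.md", "# Auth"),
        ("/docs/api/errors.md", "# Errors"),
    ],
)


def run(command: str) -> str:
    """Translate a small subset of Unix commands into queries against `files`."""
    argv = shlex.split(command)
    if len(argv) != 2:
        raise ValueError(f"unsupported command: {command}")
    program, target = argv
    if program == "cat":
        row = db.execute(
            "SELECT content FROM files WHERE path = ?", (target,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(target)
        return row[0]
    if program == "ls":
        prefix = target.rstrip("/") + "/"
        rows = db.execute(
            "SELECT path FROM files WHERE substr(path, 1, ?) = ?",
            (len(prefix), prefix),
        ).fetchall()
        # Keep only the first path component below the directory, as `ls` would.
        names = sorted({path[len(prefix):].split("/")[0] for (path,) in rows})
        return "\n".join(names)
    raise ValueError(f"unsupported command: {command}")


listing = run("ls /docs")
page = run("cat /docs/api/auth.md")
```

The agent still issues familiar shell-style commands, but nothing is booted or mounted: each call is a single database lookup.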
AI AgentsArchitecture

Agentic UX (AUX) is a distinct design problem — agents don't want to use software the way humans do

AUX (Agentic User Experience) is neither human UX adapted for agents nor raw APIs — it's a third design discipline for how agents want to consume software

@dharmesh (Dharmesh Shah, HubSpot CTO)3
Future of AIAI Agents

Agents running the platform vs. agents on the platform — the operator shift changes what software must be

The shift from agents as features inside your product to agents as operators running your product — passengers become pilots

@dharmesh (Dharmesh Shah, HubSpot CTO)3
AI AgentsKnowledge Systems

Context learning spans agent, tenant, and org levels — and you can mix all three

Agent-level context updates the agent's own configuration over time; tenant-level context (user, org, or team) gives each tenant its own evolving context; production systems mix multiple levels simultaneously

@hwchase17 (Harrison Chase) — Continual Learning for AI Agents3
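
One way to picture mixing levels is a layered lookup in which the most specific level wins. The sketch below uses `collections.ChainMap` for that; the precedence order (user over org over agent), the `resolve_context` helper, and the sample keys are assumptions for illustration, not something the source prescribes.

```python
from collections import ChainMap

# Hypothetical context learned at each level.
agent_context = {"style": "concise", "tools": "sql,search"}
org_context = {"style": "formal", "fiscal_year_start": "February"}
user_context = {"style": "bullet points"}


def resolve_context(*layers: dict[str, str]) -> dict[str, str]:
    """Merge context layers; earlier layers take precedence over later ones."""
    return dict(ChainMap(*layers))


# Most specific first: user, then org, then the agent's own defaults.
context = resolve_context(user_context, org_context, agent_context)
```

Here `context["style"]` resolves to the user's preference, while the org-level fiscal year and the agent-level tool list are inherited unchanged.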
AI AgentsFuture of AI

Enterprise agents need deterministic structure while startups need autonomous loops — same models, different harnesses

Enterprises need deterministic graph-based harnesses for predictability; startups benefit from autonomous loop agents — the harness choice, not the model, determines reliability

@hwchase17 (Harrison Chase) — Everything Gets Rebuilt: Agents, Harnesses, and the New Compute Layer3
AI AgentsKnowledge Systems

Hot-path and offline learning are two temporal modes for agent context updates — each with different tradeoffs

Agents can update their context in the hot path (during task execution, such as saving to memory while working) or offline (batch-processing recent traces after the fact, like OpenClaw's 'dreaming'); a separate dimension is whether memory updates are explicit or implicit

@hwchase17 (Harrison Chase) — Continual Learning for AI Agents3
AI AgentsArchitecture

LLM-as-judge must be calibrated against human judgment — uncalibrated judges are worse than no judges

An LLM judge without human-labeled calibration data produces false confidence; the bridge is humans labeling traces, then training the judge to replicate those labels

@hwchase17 (Harrison Chase) — Context Engineering Our Way to Long-Horizon Agents3
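
A simple calibration check is to measure chance-corrected agreement between the judge and human labels on the same traces. The sketch below computes Cohen's kappa by hand; the choice of kappa as the metric, the `cohens_kappa` helper, and the sample labels are assumptions for illustration, not part of the source.

```python
from collections import Counter


def cohens_kappa(human: list[str], judge: list[str]) -> float:
    """Chance-corrected agreement between two label sequences."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(human)
    observed = sum(h == j for h, j in zip(human, judge)) / n
    human_counts, judge_counts = Counter(human), Counter(judge)
    # Agreement expected if both raters labeled at random with their own base rates.
    expected = sum(
        human_counts[label] * judge_counts[label] for label in human_counts
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label
    return (observed - expected) / (1 - expected)


# Made-up labels for eight traces, for illustration only.
human_labels = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "fail"]
judge_labels = ["pass", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
kappa = cohens_kappa(human_labels, judge_labels)
```

With these labels the raw agreement is 75%, but kappa is only 0.5 because a judge that leans toward "pass" would agree half the time by chance. That gap is the false confidence an uncalibrated judge hides.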
AI AgentsArchitecture

Long-horizon evals test compounding behavior, not point-in-time accuracy

Hex's Metric City benchmark simulates 90 days of agent use with evolving data to measure whether the agent gets smarter over time — Day 0: 4%, Day 90: 24%

@izzymiller (Izzy Miller, Hex) — Building AI Agents for Data Analytics3
AI AgentsKnowledge Systems

Procedural memory is the highest-impact type of agent memory — it determines what the agent actually does

Of three memory types (semantic/episodic/procedural), procedural — instructions, skills, and tools — has the highest impact because it changes what the agent actually does

@hwchase17 (Harrison Chase) — Everything Gets Rebuilt: Agents, Harnesses, and the New Compute Layer3
Coding ToolsAI Agents

Separate research from implementation to preserve agent context for execution

Mixing research and implementation pollutes context with irrelevant alternatives — split them into separate agent sessions so the implementer gets only the chosen approach

@systematicls — How To Be A World-Class Agentic Engineer3
AI AgentsArchitecture

Teacher-student trace distillation with consensus validation beats single-oracle learning

A single high-reasoning teacher trace isn't reliable enough for enterprise learning; comparing multiple student traces under production constraints with consensus validation produces trustworthy strategies

@tonygentilcore (Tony Gentilcore, Glean) — Trace Learning for Self-Improving Agents3
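
The consensus step alone can be sketched as a quorum vote over the strategies extracted from several student traces. The `consensus_strategy` helper, the 0.6 quorum, and the strategy names are hypothetical; how strategies are extracted from traces and how the teacher trace is used are outside this sketch.

```python
from collections import Counter
from typing import Optional


def consensus_strategy(
    student_strategies: list[str], quorum: float = 0.6
) -> Optional[str]:
    """Return the most common strategy if enough student traces agree, else None."""
    if not student_strategies:
        return None
    strategy, votes = Counter(student_strategies).most_common(1)[0]
    return strategy if votes / len(student_strategies) >= quorum else None


# Four of five student runs converged on the same approach.
agreed = consensus_strategy(
    ["search_then_filter"] * 4 + ["filter_then_search"]
)

# No approach reaches the quorum, so nothing is promoted.
split = consensus_strategy(
    ["search_then_filter", "search_then_filter",
     "filter_then_search", "filter_then_search", "scan_all"]
)
```

Returning `None` on a split vote keeps a strategy that only one or two runs happened to find from being stored as learned behavior.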
AI AgentsCoding Tools

Uncorrelated context windows are a form of test-time compute: fresh perspectives multiply capability

Multiple agents with independent context windows avoid polluting each other's reasoning, and throwing more context at a problem from different angles increases capability

Boris Cherny (@bcherny) — Inside Claude Code With Its Creator, Y Combinator Light Cone podcast3
AI AgentsCoding Tools

Unfocused agents develop path dependency — without a specific mission, they explore the same paths repeatedly

Agents given broad mandates (like 'find bugs') converge on familiar exploration paths, catching high-radius issues but missing narrow situational problems

@RampLabs — How We Made Ramp Sheets Self-Maintaining3
AI AgentsCoding Tools

Weaponize sycophancy with adversarial agent ensembles instead of fighting it

Deploy bug-finder, adversary, and referee agents with scoring incentives that exploit each agent's eagerness to please — triangulating truth from competing biases

@systematicls — How To Be A World-Class Agentic Engineer3
AI AgentsKnowledge Systems

Agents need workflow-level tool strategies, not individual tool instructions — the hard part is how tools combine

In enterprise environments, the challenge isn't finding the right tool but understanding how tools work together; intentionally narrow strategies that capture workflow patterns generalize better than broad abstractions

@tonygentilcore (Tony Gentilcore, Glean) — Trace Learning for Self-Improving Agents3
Knowledge SystemsAI Agents

Knowledge is not memory — ingesting documents is solved, learning from interactions is not

Knowledge (ingesting documents into RAG) is largely solved; memory (learning from task execution to improve future behavior) remains unsolved after 2+ years of industry effort

@hwchase17 (Harrison Chase) — Everything Gets Rebuilt: Agents, Harnesses, and the New Compute Layer2
AI AgentsArchitecture

Conflicting context causes agent collapse, not graceful degradation

When an LLM encounters contradictory information in its context, it enters extended deliberation loops rather than choosing one interpretation — production finding from Hex

@izzymiller (Izzy Miller, Hex) — Building AI Agents for Data Analytics1