Knowledge Graph

@nicbstme (Nicolas Bustamante) — Lessons from Building AI Agents for Financial Services (38)

Context is the product, not the model

Anyone can call the API — differentiation comes from the data you access, skills you build, UX you design, and domain knowledge you encode

Dan Shipper & Kieran Klaassen (Every) — Compound Engineering (33)

Compound engineering makes each unit of work improve all future work

The 80/20 ratio (80% plan+review, 20% work+compound) ensures learning compounds across iterations, not just code

AI Product Building, Coding Tools

Verification is the single highest-leverage practice for agent-assisted coding

Giving an agent a way to verify its own work improves output quality 2-3x; without verification, you're shipping blind

Boris Cherny + Anthropic Official Best Practices (28)

Jaya Gupta & Ashu Garg — Foundation Capital, Context Graphs (26)

Decision traces are the missing data layer — a trillion-dollar gap

Systems store what happened but not why; capturing the reasoning behind decisions creates searchable precedent and a new system of record

Manthan Gupta (@manthanguptaa) — How Karpathy's Autoresearch Works And What You Can Learn From It (25)

A mediocre agent inside a strong harness outperforms a stronger agent inside a messy one

The surrounding machinery — metrics, rollback, scoping, observability — determines autonomous system performance more than model capability

Anthropic Official Best Practices (22)

The context window is the fundamental constraint — everything else follows

Every best practice in AI coding (subagents, /clear, focused tasks, specs files) traces back to managing a single scarce resource: context

Ryan Carson — Ralph / Autonomous Coding Loop (21)

Autonomous coding loops need small stories and fast feedback to work

The Ralph pattern ships 13 user stories in 1 hour by decomposing into context-window-sized tasks with explicit acceptance criteria and test-based feedback

@nicbstme (Nicolas Bustamante) + @rohit4verse (Rohit) — agent memory patterns (18)

Persistent agent memory preserves institutional knowledge that walks out the door with employees

When agents maintain daily changelogs, decision logs, and work preferences, organizational knowledge survives personnel changes

@danshipper + @nicbstme — Agent-Native Architectures + Fintool (17)

Files are the universal interface between humans and agents

Markdown and YAML files on disk beat databases because agents already know file operations and humans can inspect everything

Boris Cherny — How I Use Claude Code (17)

Treat AI like a distributed team, not a single assistant

Running 15 parallel Claude streams with specialized roles (writer, reviewer, architect) produces better results than one perfect conversation

@nicbstme — 10 Years Building Vertical Software: My Perspective on the Selloff (16)

LLMs selectively destroy vertical software moats — 5 fall, 5 hold

Learned interfaces, custom workflows, public data access, talent scarcity, and bundling collapse under LLMs, while proprietary data, regulatory lock-in, network effects, transaction embedding, and system-of-record status remain defensible

@vasuman — AI Agents 101 (16)

Production agents route routine cases through decision trees, reserving humans for complexity

Handle exact matches and known patterns without AI; invoke the model for ambiguity, and route genuinely complex cases to human judgment
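The three tiers described above can be sketched as a single routing function. This is a minimal illustration, not the author's implementation: the lookup table, the `confidence_floor` threshold, and the `fake_model` stand-in are all assumptions introduced here.

```python
from typing import Callable, Optional

# Hypothetical table of known patterns; a real system would load this from config.
KNOWN_ANSWERS = {
    "reset password": "Send the password-reset link.",
    "refund status": "Look up the refund in the billing system.",
}


def route(request: str,
          model: Callable[[str], tuple[str, float]],
          confidence_floor: float = 0.8) -> tuple[str, Optional[str]]:
    """Return (handler, answer) where handler is 'rules', 'model', or 'human'."""
    key = request.strip().lower()
    # Tier 1: exact matches and known patterns never touch the model.
    if key in KNOWN_ANSWERS:
        return "rules", KNOWN_ANSWERS[key]
    # Tier 2: ambiguous input goes to the model, which reports a confidence.
    answer, confidence = model(key)
    if confidence >= confidence_floor:
        return "model", answer
    # Tier 3: anything the model is unsure about escalates to a person.
    return "human", None


def fake_model(text: str) -> tuple[str, float]:
    """Stand-in for a real model call; confident only on short inputs."""
    return f"Drafted reply to: {text}", 0.9 if len(text) < 40 else 0.3
```

The ordering matters more than the specific threshold: the cheapest deterministic check runs first, and the human tier is reached only when both earlier tiers decline.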

Nikunj Kothari — Revealed Preferences (16)

Proprietary feedback loops create moats that widen with every interaction

When usage generates data that competitors cannot replicate — correction patterns, preference signals, domain-specific edge cases — the product improves faster than any new entrant can catch up

@clarashih (Clara Shih) — Head of Business AI at Meta (former CEO of Salesforce AI), How to Survive the SaaS Reckoning (15)

SaaS survives as the governance and coordination layer — determinism still rules

When non-deterministic AI feeds into deterministic systems (databases, approvals, audit trails), the deterministic system governs; SaaS is that system

AI Product Building, Coding Tools, Architecture

Scaffolding is tech debt against the next model — the bitter lesson applied to product building

Code built to extend model capability 10-20% becomes worthless when the next model ships, making most product scaffolding an ephemeral trade-off rather than a lasting investment

Boris Cherny (@bcherny) — Inside Claude Code With Its Creator, Y Combinator Light Cone podcast (15)

@nicbstme (Nicolas Bustamante) — Every SaaS Is Now an API (14)

B2B becomes B2A — agents become the buyer

Software is increasingly consumed by agents, not humans; the agent recommends, the human approves

@ashpreetbedi — Dash (OpenAI-inspired data agent) (14)

Agents that store error patterns learn continuously without fine-tuning or retraining

Dash's 'GPU-poor continuous learning' separates validated knowledge from error-driven learnings; five lines of code replace expensive retraining

AI Product Building, Future of AI, Economics

Organizational shape is the emerging moat in AI — what AI cannot copy is the institution underneath

When models improve fast, interfaces converge, and product velocity becomes cheap, the durable advantage moves to how a company attracts exceptional people, distributes authority, and compounds judgment over time

@JayaGup10 (Jaya Gupta) — The next biggest moat in AI (14)

@joeschmidtiv (Joe Schmidt IV, a16z) — Avoiding Death on the Yellow Brick Road (14)

The system of work is the moat, not the model — the model is fungible underneath

A vertical app company wins by owning the surface where a company's work actually executes — data capture, workflow system of action, and governance — while each new model generation flows through underneath

Synthesis from Supermemory, QMD, and PageIndex architectures (14)

The three-layer AI stack: Memory, Search, Reasoning

The emerging AI product architecture has three layers — Memory (who is this user), Search (find the right information), Reasoning (navigate complex information) — all running on PostgreSQL

@natashamalpani (Natasha Malpani) — The Verification Economy: The Red Queen Problem (Part III) (14)

Verification is a Red Queen race — optimizing against a fixed eval contaminates it

Eval suites degrade the moment you use them to improve an agent — the agent adapts to the distribution, and the eval stops measuring what it was designed to measure

Andrej Karpathy — Coding Observations (13)

Declarative beats imperative when working with agents

Give agents success criteria and watch them go — don't tell them what to do step by step

AI Product Building, AI Agents, Business Models

Domain-specific skill libraries are the real agent moat, not core infrastructure

An elite team can replicate any agent's tool architecture in months, but accumulated domain workflows (LBO modeling, compliance, bankruptcy) represent years of domain expertise

@nicbstme — Lessons from Reverse Engineering Excel AI Agents (13)

@elvissun (Elvis Sun) — OpenClaw Agent Swarm (13)

An orchestrator agent that manages other agents solves the parallel coordination problem without human bottleneck

Instead of humans managing AI agents, a meta-agent spawns specialized agents, routes tasks by model strength, and monitors progress — turning agent swarms into autonomous dev teams

@julienbek — Services: The New Software (13)

Sell the work, not the tool — model improvements compound for services, against software

If you sell the tool, you race the model; if you sell the outcome, every model improvement makes your service faster, cheaper, and harder to compete with

@arscontexta (Heinrich) — Twitter thread on skill graphs (13)

Skill graphs enable progressive disclosure for complex domains

Single skill files hit a ceiling — complex domains need interconnected knowledge that agents navigate progressively from index to description to links to sections to full content

@izzymiller (Izzy Miller, Hex) — Building AI Agents for Data Analytics (12)

The context flywheel is a Day 90 moat — Day 0 comparisons are misleading

Point-in-time capability benchmarks miss the compounding advantage: on Day 0 a raw model matches your product, but by Day 90 accumulated context creates an unbridgeable gap

Rohit (@rohit4verse) — How to Build Agents That Never Forget (12)

Evolving summaries beat append-only memory — rewrite profiles, don't accumulate facts

An evolve_summary() function that rewrites category profiles with new information handles contradictions naturally, unlike append-only logs
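A minimal sketch of the rewrite-versus-append contrast follows. The source names `evolve_summary()` but not its signature, so the parameters here are assumptions, and `naive_rewrite` is a stand-in for what would in practice be a model call that merges old and new information.

```python
from typing import Callable

Profile = dict[str, str]  # category name -> current summary text


def evolve_summary(profiles: Profile, category: str, new_info: str,
                   rewrite: Callable[[str, str], str]) -> Profile:
    """Rewrite one category's summary instead of appending to it."""
    current = profiles.get(category, "")
    updated = dict(profiles)  # leave the caller's copy untouched
    updated[category] = rewrite(current, new_info)
    return updated


def naive_rewrite(current: str, new_info: str) -> str:
    """Stand-in for a model call: the newest statement replaces the old one."""
    return new_info


profiles = {"employer": "Works at Acme."}
profiles = evolve_summary(profiles, "employer", "Works at Globex.", naive_rewrite)

# The append-only alternative keeps both statements and leaves the
# contradiction for whoever reads the log later.
append_only = ["Works at Acme.", "Works at Globex."]
```

After the update the profile holds one current statement per category, while the append-only log holds two that disagree.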

Mental Models, Decision Making, Philosophy

A latticework of mental models beats isolated facts for real understanding

You can't know anything useful by remembering isolated facts — they must hang on a latticework of theory from multiple disciplines, with 80-90 key models carrying 90% of the freight

Charlie Munger — Poor Charlie's Almanack, Talk 2: Elementary Worldly Wisdom (pp. 164-170) (12)

Geoff Huntley — Latent Patterns Principles (verification over testing) (12)

Observability is the missing discipline for agent systems — you can't improve what you can't measure

Agent systems need telemetry (token usage, latency, error rates, cost per task) as a first-class engineering concern, not an afterthought bolted on after production failures
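The four signals listed above (token usage, latency, error rate, cost per task) can be collected with a small wrapper around each agent step. This is a sketch under stated assumptions: the class name, field names, and the flat per-1k-token price are illustrative, not taken from the source.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TaskTelemetry:
    """Accumulates per-task metrics; field names are illustrative."""
    price_per_1k_tokens: float
    tokens: int = 0
    calls: int = 0
    errors: int = 0
    latencies: list[float] = field(default_factory=list)

    def record(self, fn, *args, tokens_used: int = 0):
        """Run one agent step, timing it and counting tokens and failures."""
        start = time.perf_counter()
        self.calls += 1
        try:
            return fn(*args)
        except Exception:
            self.errors += 1
            raise
        finally:
            # Runs on success and on failure, so failed steps are still measured.
            self.latencies.append(time.perf_counter() - start)
            self.tokens += tokens_used

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0

    @property
    def cost(self) -> float:
        return self.tokens / 1000 * self.price_per_1k_tokens
```

Wrapping the call site rather than the agent internals keeps the measurement independent of which model or tool actually ran.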

Mental Models, Decision Making, Philosophy

Reasoning by analogy has a ceiling — you can never get beyond what already exists by copying what already exists

Analogy is faster, easier, and less mentally taxing — fine for most decisions — but it forecloses any solution outside the existing solution set; first-principles reasoning is the only path that can produce non-incremental answers

@jaynitx — first principles thinking: how to see what everyone else misses (12)

Nicolas Bustamante (@nicbstme) — Lessons from Building AI Agents for Financial Services (12)

Markdown skill files may replace expensive fine-tuning

A SKILL.md file that teaches an agent how to do something specific can match domain-specific fine-tuned models — at zero training cost

AI Product Building, Architecture, Knowledge Systems, AI Agents

Structure plus reasoning beats flat similarity for complex domains

Across documents, code, and skills, the same pattern holds: structured knowledge navigated by reasoning outperforms flat indexes searched by similarity

Recurring pattern across PageIndex, Claude Code agentic search, and @arscontexta skill graphs (12)

AI Product Building, Coding Tools, Future of AI

Build for the model six months from now, not the model of today

AI product builders should target the capability frontier the model hasn't reached yet, because today's PMF gets leapfrogged when the next model ships

Boris Cherny (@bcherny) — Inside Claude Code With Its Creator, Y Combinator Light Cone podcast (11)

Mental Models, Decision Making, Economics

When production constraints dissolve, the bottleneck shifts from execution to judgment

Hiring was hard, code was slow, shipping took months — AI dissolves all three, revealing judgment as the binding constraint that was always there

Alfred Lin (@Alfred_Lin) — AI Adoption vs. AI Advantage (11)

LangChain — How We Build Evals for Deep Agents (11)

Evals are behavioral pressure vectors, not neutral measurements — poorly chosen evals distort agent development

Each eval shapes agent behavior like a selection pressure; accumulating tests without strategic purpose creates 'an illusion of improving your agent' while distorting development in unproductive directions, and correctness alone misleads because agents that succeed inefficiently create hidden cost

@danshipper — Agent-Native Architectures (co-authored with Claude) (11)

In agent-native architecture, features are prompts — not code

Development shifts from coding specific functions to describing outcomes, which agents achieve by composing atomic tools

Charlie Munger — Poor Charlie's Almanack, Talk 11: The Psychology of Human Misjudgment (pp. 523-527) (11)

First conclusions become nearly permanent — the brain resists its own updates

Inconsistency-Avoidance Tendency means early-formed habits and first conclusions are maintained even against strong disconfirming evidence

AI Product Building, Business Models, Architecture

Revealed preferences trump stated preferences — track what users do, not what they say

Users' actual behavior (what they click, skip, edit, redo) is the ground truth for product decisions; stated preferences in surveys and interviews systematically mislead

Nikunj Kothari — Revealed Preferences (11)

AI Product Building, Knowledge Systems, Coding Tools

Spec files are external memory that survives context resets

A structured specs/ folder (design.md, implementation.md, decisions.md) bridges human intent and agent execution across sessions

Community pattern — spec-first development (implementations by AWS Kiro, GitHub spec-kit, and multiple Claude Code workflows) (11)

Mental Models, Psychology, Engineering

Systems that prevent bad behavior beat moral appeals — design the cash register, not the sermon

People who create mechanisms making dishonest behavior hard to accomplish are more effective than those who preach against dishonesty

Charlie Munger — Poor Charlie's Almanack, Talk 11: The Psychology of Human Misjudgment (pp. 500-511) (11)

AI Product Building, Future of AI

Technology transitions create more of the 'dying' thing, not less

Every predicted death — mainframes, physical retail, traditional media — resulted in growth of both old and new; AI will create more software, not less

@stevesi (Steven Sinofsky) — Death of Software? (11)

Boris Cherny (@bcherny, Claude Code team) — Twitter reply to @EthanLipnik (10)

Agentic search beats RAG for live codebases

Claude Code abandoned RAG and vector DB in favor of letting the agent grep/glob/read — reasoning about where to look outperforms pre-indexed similarity search for code

AI Product Building, Future of AI, Decision Making

AI strategy is a self-rewriting equation — solving one constraint changes which constraint matters next

SaaS metrics were downstream of just two forces (distribution cost + switching cost); AI has many coupled variables — capability, cost, latency, deployment, regulation, talent — each decomposing into sub-curves, so the equation rewrites itself faster than any fixed playbook can track

@JayaGup10 (Jaya Gupta) — Who will set price / intelligence? (10)

AI Product Building, AI Agents, Decision Making

Every optimization has a shadow regression — guard commands make the shadow visible

When optimizing metric A, metric B silently degrades unless you run a separate invariant check (a guard) alongside the primary verification

Udit Goenka (@uditg) — autoresearch Claude Code skill v1.6.1 (Guard feature by Roman Pronskiy, JetBrains) (10)
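The guard idea in this card can be sketched as an acceptance check that requires both an improved primary metric and intact invariants. This is not the skill's actual interface: `accept_change`, the metric dictionaries, and the latency threshold are hypothetical, and the guards here are Python callables rather than commands.

```python
from typing import Callable


def accept_change(candidate: dict,
                  baseline: dict,
                  primary: Callable[[dict], float],
                  guards: list[Callable[[dict], bool]]) -> bool:
    """Accept only if the primary metric improves and every guard still holds."""
    improved = primary(candidate) > primary(baseline)
    guards_hold = all(guard(candidate) for guard in guards)
    return improved and guards_hold


# Hypothetical metrics for a service being optimized for throughput.
baseline = {"throughput": 100, "p99_latency_ms": 200}
faster_but_laggy = {"throughput": 140, "p99_latency_ms": 900}
faster_and_stable = {"throughput": 120, "p99_latency_ms": 210}


def throughput(metrics: dict) -> float:
    return metrics["throughput"]


def latency_guard(metrics: dict) -> bool:
    # Invariant that must hold regardless of what is being optimized.
    return metrics["p99_latency_ms"] <= 250
```

With an empty guard list, `faster_but_laggy` is accepted because throughput went up; adding `latency_guard` rejects it, which is the shadow regression becoming visible.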

AI Product Building, Future of AI, Coding Tools

Frontier companies absorb every useful agentic pattern into base products

If a workaround truly extends agent capabilities, OpenAI and Anthropic — the biggest power users of their own models — will build it in, making external dependencies temporary

@systematicls — How To Be A World-Class Agentic Engineer (10)

OpenAI Codex Team — Harness Engineering: Leveraging Codex in an Agent-First World (10)

Harness engineering — humans steer, agents execute, documentation is the system of record

OpenAI built a million-line production codebase with zero manually-written code in 5 months. The discipline shifted from writing code to designing the harness: architecture constraints, documentation, tooling, and feedback loops that make agents reliable at scale.

Mental Models, Decision Making, Mathematics

Invert, always invert — many problems are best solved backward

Thinking in reverse is one of the most powerful problem-solving techniques: instead of asking what you want, ask what you want to avoid, then don't do that

Charlie Munger — Poor Charlie's Almanack, Talk 4: Practical Thought About Practical Thought (pp. 299-305) (10)

AI Product Building, Business Models, AI Agents

Memory is where agent lock-in lives — without it, agents are commoditized

Stateless model APIs are easily swapped; stateful memory creates a proprietary dataset of user interactions and preferences that makes the agent sticky and differentiated

@hwchase17 (Harrison Chase) — Your harness, your memory (10)

Learning Technical Concepts chat — discussion of parallel agent workflows (10)

Parallel agents create a management problem, not a coding problem

When AI agents can work on multiple projects simultaneously, the bottleneck shifts from writing code to coordinating parallel workstreams

@eoghan (Eoghan McCabe, Intercom CEO) — Never Stop Disrupting Yourself: Introducing the Fin API Platform (10)

Self-disruption follows the value chain downward — software companies must eat their own agent layer before someone else does

Intercom deliberately disrupted their software business with agents, and now disrupts their agent business with AI models, because value accrues to the model layer

PageIndex by VectifyAI — https://github.com/VectifyAI/PageIndex (10)

Similarity is not relevance — relevance requires reasoning

Vector search finds semantically similar content, but what users need is relevant content, and determining relevance requires LLM reasoning, not just pattern matching

@trq212 (Thariq, Claude Code team) — Lessons from Building Claude Code (10)

Tool design is continuous observation — see like an agent

Designing effective agent tools requires iterating by watching actual model behavior, not specifying upfront; tools that helped weaker models may constrain stronger ones

@hwchase17 (Harrison Chase) — Continual Learning for AI Agents (9)

Agents learn at three distinct layers — model weights, harness code, and context configuration

Most people jump to model fine-tuning when discussing agent learning, but learning also happens at the harness layer (code, tools, instructions baked into all instances) and the context layer (per-user or per-tenant configuration like CLAUDE.md and skills)

Mental Models, Decision Making

AI compresses the distance between idea and execution but not between good and bad judgment

When everyone can build anything, the differentiator stops being speed and starts being judgment — what to build, what to say no to, when to change course

Alfred Lin (@Alfred_Lin) — AI Adoption vs. AI Advantage (9)

Analysis of Machina (@EXM7777) — 30-Day AI Mastery Roadmap (9)

Building real projects teaches AI skills faster than following structured curricula

A non-technical user who built a production WhatsApp bot reached 'Operator' level that a 30-day AI mastery roadmap targets — through building, not studying

Anthropic Engineering — Writing Effective Tools for Agents (9)

Evaluate agent tools with real multi-step tasks, not toy single-call examples

Weak evaluation tasks hide tool design flaws — strong tasks require chained calls, ambiguity resolution, and verifiable outcomes

@kevingu (Kevin Gu) — AutoAgent: First Open Source Library for Self-Optimizing Agents (9)

Meta-agents that autonomously optimize task agents beat hand-engineered harnesses on production benchmarks

AutoAgent's meta-agent hit #1 on SpreadsheetBench (96.5%) and TerminalBench (55.1%) by autonomously iterating on a task agent's harness for 24+ hours — every other leaderboard entry was hand-engineered

@nicbstme — Lessons from Reverse Engineering Excel AI Agents (9)

Safety enforcement belongs in tool design, not system prompts

At scale, embedding safety constraints in the tool's API (blocking destructive operations by default) beats relying on behavioral compliance with system prompt instructions
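As a sketch of what "blocking destructive operations by default" might look like, the tool below refuses to overwrite existing data unless the caller opts in explicitly. `SheetTool`, `allow_overwrite`, and the exception name are hypothetical and introduced only for illustration.

```python
class DestructiveOperationBlocked(Exception):
    """Raised when a call would destroy data without an explicit opt-in."""


class SheetTool:
    """Hypothetical spreadsheet tool exposed to an agent."""

    def __init__(self, cells: dict[str, str]):
        self.cells = dict(cells)

    def write(self, cell: str, value: str, allow_overwrite: bool = False) -> None:
        # The safe path is the default: writing over existing data needs an opt-in.
        if cell in self.cells and not allow_overwrite:
            raise DestructiveOperationBlocked(
                f"{cell} already holds {self.cells[cell]!r}; "
                "pass allow_overwrite=True to replace it"
            )
        self.cells[cell] = value
```

The constraint holds even if the system prompt is ignored or missing, because the check lives in the code path every write must take. The error message also tells the agent how to proceed deliberately.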

shadcn (via X/Twitter) — /done skill pattern (9)

Session capture turns ephemeral AI conversations into a compounding knowledge base

shadcn's /done pattern — dumping key decisions, questions, and follow-ups to markdown after each Claude session — applies file-based memory architecture to development workflow

@tonygentilcore (Tony Gentilcore, Glean) — Trace Learning for Self-Improving Agents (9)

Two-tier agent memory separates organizational workflow knowledge from individual user preferences

Deployment-level memory captures shared tool strategies and sequencing patterns; user-level memory captures personal templates and communication styles — initially skipping user-level had a significant performance impact
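One simple way to layer the two tiers is a lookup where user-level entries shadow deployment-level ones. The sketch below uses the standard library's `ChainMap`; the keys, values, and the `memory_for` helper are assumptions for illustration, not Glean's design.

```python
from collections import ChainMap

# Deployment-level memory: shared by everyone in the organization (illustrative keys).
deployment_memory = {
    "ticket_lookup": "Search the tracker before asking the user.",
    "tone": "neutral",
}

# User-level memory: personal preferences, keyed by a hypothetical user id.
user_memory = {
    "alice": {"tone": "brief and informal", "template": "weekly-status-v2"},
}


def memory_for(user_id: str) -> ChainMap:
    """User entries shadow deployment entries; missing keys fall through."""
    return ChainMap(user_memory.get(user_id, {}), deployment_memory)
```

A user with no personal entries still gets every shared strategy, and a user with preferences overrides only the keys they have set.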

@systematicls — How To Be A World-Class Agentic Engineer (8)

CLAUDE.md should be a routing table, not a knowledge base

Treat CLAUDE.md as a minimal IF-ELSE directory pointing to context files — not a 26,000-line monolith that bloats every session

@RampLabs — How We Made Ramp Sheets Self-Maintaining (8)

Detect everything, notify selectively — the observability-to-notification ratio determines system trust

Watch every signal but ensure alerts reaching humans always mean something; teams ignore noisy monitors AND noisy agents equally fast

@Vtrivedy10 (Viv) — Better Harness: A Recipe for Harness Hill-Climbing with Evals (8)

Evals are the gradient signal for harness engineering — the same data quality rigor from ML training applies

The analogy between ML training and agent development is structural: evals encode desired behavior like training data encodes ground truth, and the same principles (data quality, curation, train/test splits) determine outcomes

@JayaGup10 (Jaya Gupta) — Who will set price / intelligence? (8)

Inference-time compute makes cost-per-outcome a choice — and that's the application layer's counterattack on the labs

No prior software had a dial where 10x more compute buys a better answer; a 10-second and a 10-minute query on the same model are different products at different prices. Margin depends on the system's judgment of where to spend tokens, not on model pricing — the lab wants to expand usage, the application wants to spend only where the outcome is worth it

AI Product Building, Future of AI, AI Agents

The intelligence-to-judgement ratio determines which professions AI automates first

Intelligence work (complex but rule-based) is already automatable; judgement (experience, taste, intuition) remains human — software engineering crossed the threshold first

@julienbek — Services: The New Software (8)

@nicbstme (Nicolas Bustamante) — Model-Market Fit (8)

Model-market fit comes before product-market fit — without it, no amount of product excellence drives adoption

AI startups need a prerequisite layer beneath PMF: the capability threshold where models can actually satisfy market demands. Legal AI crossed it at 87% accuracy; finance AI at 56% hasn't — same demand, opposite outcomes.

@elvissun (Elvis Sun) — OpenClaw Agent Swarm (8)

Multi-model code review creates adversarial robustness — each model catches what others miss

Using 3 different LLMs to review the same PR exploits the fact that models have different failure modes, creating emergent coverage no single model achieves
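The coverage argument can be illustrated by merging findings from several reviewers and recording who reported what. The three lambdas below are toy stand-ins for real model calls, each with a deliberately different blind spot; the names and the string-matching heuristics are assumptions.

```python
from typing import Callable

Reviewer = Callable[[str], set[str]]  # takes a diff, returns a set of findings


def review(diff: str, reviewers: dict[str, Reviewer]) -> dict[str, set[str]]:
    """Map each finding to the set of reviewers that reported it."""
    findings: dict[str, set[str]] = {}
    for name, reviewer in reviewers.items():
        for issue in reviewer(diff):
            findings.setdefault(issue, set()).add(name)
    return findings


# Stand-ins for three different models, each with a different blind spot.
reviewers = {
    "model_a": lambda d: {"sql injection"} if "execute(" in d else set(),
    "model_b": lambda d: {"missing null check"} if ".get(" not in d else set(),
    "model_c": lambda d: {"sql injection", "unused import"} if "import" in d else set(),
}

diff = 'import os\ncursor.execute("SELECT * FROM t WHERE id=" + user_id)'
report = review(diff, reviewers)
```

Findings reported by more than one reviewer are likely real, while findings reported by exactly one are the coverage that a single-model review would have missed.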

@AravSrinivas (Aravind Srinivas, Perplexity CEO) — AI Is the Computer (8)

AI is the computer — orchestration across 19 models is the product, not any single model

Perplexity launched a unified agent system orchestrating 19 backend models that delegate tasks, manage files, execute code, and browse the web. The differentiation isn't the models — it's the orchestration. 'The computer is the orchestration system.'

@hwchase17 (Harrison Chase) — Your harness, your memory (8)

Open harnesses with customer-owned databases are the antidote to model-provider lock-in

An open, model-agnostic harness that stores memory in a database you control (Postgres, Mongo, Redis) keeps both model choice and memory portable

@nicbstme (Nicolas Bustamante) — Model-Market Fit (8)

The 80/99 gap is where AI products die — demo accuracy and production reliability are infinitely far apart

Getting an AI system from 80% demo accuracy to 99% production reliability requires fundamentally different engineering than the first 80% — most teams underestimate this gap by orders of magnitude

@DavidGeorge83 (David George, a16z) — There Are Only Two Paths Left for Software (8)

Cap headcount, not compute — token spend per engineer replaces headcount as the scaling unit

At $1,000/month per engineer as table stakes, top engineers manage 20-30 agents simultaneously; R&D scales through compute investment, not hiring

Anthropic Engineering — Writing Effective Tools for AI Agents — Using AI Agents (8)

Tools are a new kind of software — contracts between deterministic systems and non-deterministic agents

Agent tools must be designed for how agents think (context-limited, non-deterministic, description-dependent), not how programmers think

@Vtrivedy10 (Viv) — Better Harness: A Recipe for Harness Hill-Climbing with Evals (8)

The trace→eval→harness flywheel compounds agent quality — every production interaction generates its own training data

Production traces where agents fail become eval cases; better evals improve the harness; better harnesses produce better traces — creating a self-reinforcing improvement loop
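One turn of the loop described above can be sketched in two small functions: failed traces are harvested into the eval set, and a harness is then scored against it. The `Trace` fields and the idea of a harness as a callable returning pass or fail are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """One recorded production interaction (fields are illustrative)."""
    task: str
    output: str
    succeeded: bool


def harvest_eval_cases(traces: list[Trace], existing: set[str]) -> set[str]:
    """Add every failed task to the eval set; successes add nothing new."""
    return existing | {t.task for t in traces if not t.succeeded}


def pass_rate(harness, eval_cases: set[str]) -> float:
    """Fraction of eval cases the harness handles; harness is any callable -> bool."""
    if not eval_cases:
        return 1.0
    return sum(bool(harness(case)) for case in eval_cases) / len(eval_cases)
```

A harness that looked perfect on the old eval set will score lower once production failures are folded in, and that drop is the signal that drives the next harness change.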

@kevingu (Kevin Gu) — AutoAgent: First Open Source Library for Self-Optimizing Agents (8)

Traces not scores enable agent improvement — without trajectories, improvement rate drops hard

When AutoAgent's meta-agent received only pass/fail scores without reasoning traces, the improvement rate dropped significantly; understanding why matters as much as knowing that

Mental Models, Decision Making

Amplification widens the judgment gap — AI magnifies clear thinking into compounding advantage and confused thinking into accelerating waste

Same tools, divergent outcomes — strong teams with clear strategies get faster and more focused, weak teams with vague strategies get noisier and more distracted

Alfred Lin (@Alfred_Lin) — AI Adoption vs. AI Advantage (7)

@julienbek — Services: The New Software (7)

Autopilots capture the work budget — six dollars in services for every one in software

Copilots sell tools to professionals; autopilots sell outcomes to end customers and access the vastly larger services TAM from day one

Kushal Byatnal — Extend ($1M+ ARR, 3 engineers) (7)

Boring tech wins for AI-native startups — simpler stack means faster AI-assisted shipping

React + Node + TypeScript + Postgres + Redis scales to $1M ARR with 3 engineers; monorepo is a superpower for AI coding assistants

AI Product Building, Future of AI

AI automation amplifies demand for expert human judgment rather than replacing it

Pre-labeling cuts costs 100,000x for simple tasks, but projects that needed 500 contributors now need 100 doing far higher-value work at up to $200/hour

@GowriShankarNag — #MarketMapMondays: The Label Paradox of AI, Antler India (7)

Charlie Munger — Poor Charlie's Almanack, Talk 11: The Psychology of Human Misjudgment (pp. 556-563) (7)

Excessive self-regard makes fixable failures persist — people excuse poor performance instead of correcting it

The Tolstoy effect causes people to rationalize fixable shortcomings rather than address them, requiring meritocratic culture and objective evaluation as antidotes

@izzymiller (Izzy Miller, Hex) relaying Barry McCardel (Hex CEO) — Building AI Agents for Data Analytics (7)

Sand vs Stone — if models double in capability tomorrow, what washes away and what remains?

Framework for evaluating AI product durability: context flywheels and domain expertise are stone; model workarounds and clever engineering are sand

@kevingu (Kevin Gu) — AutoAgent: First Open Source Library for Self-Optimizing Agents (7)

Self-improving agents overfit to eval metrics — the meta-agent games rubrics unless structurally constrained

AutoAgent's meta-agent gets lazy, inserting rubric-specific prompting so the task agent can game metrics; defense requires forcing self-reflection on generalizability

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture (7)

Sessions are runtime infrastructure, not just resumable transcripts

Hermes stores sessions in SQLite with search and lineage so CLI, messaging platforms, and scheduled jobs all attach to one session plane — routing can resolve before the model even runs

Mental Models, Engineering, Decision Making

Speed without feedback amplifies errors — agents lack the self-correction mechanism that constrains human mistakes

Humans serve as natural bottlenecks who self-correct after repeated mistakes; agents perpetuate identical errors indefinitely at unsustainable rates

Mario Zechner — Thoughts on Slowing the Fuck Down (7)

@satyanadella (Satya Nadella) — A frontier without an ecosystem is not stable (7)

The learning loop becomes the firm's new IP — a hill-climbing machine that compounds unlike any other asset

Every improved workflow generates better training signal, which accelerates the accumulation of tacit knowledge unique to the firm; companies that build this loop early gain an advantage that's hard to replicate regardless of any new model capability

Rohit (@rohit4verse) — How to Build Agents That Never Forget (7)

Treat an agent as an operating system, not a stateless function

Agents need RAM (conversation context), a hard drive (persistent memory), garbage collection (decay/pruning), and I/O management (tools) — the OS mental model unlocks architectural clarity

AI Product Building, Business Models, Architecture

The UI moat collapses — API quality becomes the purchasing criterion

When agents are the primary users of software, beautiful dashboards stop mattering and API design becomes the competitive surface

@chrysb (Chrys Bader) + @nicbstme (Nicolas Bustamante) — Apps Are Dead + Every SaaS Is Now an API (7)

@JayaGup10 (Jaya Gupta) — The Trillion Dollar Loop B2B Never Had (6)

Agent edits are automatic decision instrumentation — every human correction is a structured signal

When agents propose and humans edit, the delta between proposal and correction captures tacit judgment as first-class data without requiring manual logging
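The delta between proposal and correction can be captured with a word-level diff. The sketch below uses the standard library's `difflib.SequenceMatcher`; the record shape (`op`, `proposed`, `corrected`) is an assumption chosen for illustration.

```python
import difflib


def edit_signal(proposal: str, correction: str) -> list[dict]:
    """Return one record per changed span between the agent's draft and the human's edit."""
    a, b = proposal.split(), correction.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        {"op": op, "proposed": " ".join(a[i1:i2]), "corrected": " ".join(b[j1:j2])}
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"  # unchanged spans carry no judgment signal
    ]


signal = edit_signal(
    "Payment is due in 30 days",
    "Payment is due in 45 days",
)
# signal == [{"op": "replace", "proposed": "30", "corrected": "45"}]
```

Nobody had to log that the payment term should be 45 days; the edit itself is the structured record, and an unedited proposal yields an empty list.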

@rohit4verse (Rohit) — The Missing Layer in Your Agentic Stack (6)

Accumulated agent traces produce emergent world models — discovered, not designed

When agent decision trajectories accumulate over time, they form a context graph that reveals entities, relationships, and constraints nobody explicitly modeled

@danshipper (Dan Shipper) — 'What personal software actually is' (tweet thread) (6)

Agent trust transfers from human credibility — colleagues adopt agents operated by people they trust

When a human's agent consistently performs well, other team members inherit that trust and willingly depend on the agent, creating a credibility chain

@ivanhzhao (Ivan Zhao, Notion CEO) — Steam, Steel, and Infinite Minds (6)

AI is steel for organizations — when software carries the context, human communication stops being the load-bearing wall

Before steel, buildings capped at six or seven floors because iron buckled under its own weight; AI that maintains context across workflows removes human communication (meetings, messages) as the structure that caps how far an org can scale before it degrades

AI Product Building, Decision Making, Future of AI

Building in AI is running a trading book — you're long some curves, short others, and exposed to correlations that break when they matter

Value in AI is never captured once and defended; it's continuously repriced and relocated. Durable companies know which assumptions they're long and which they're short, choose which variables to bet on, know which can kill them, and build to recover faster than a wrong bet can compound

@JayaGup10 (Jaya Gupta) — Who will set price / intelligence? (6)

Charlie Munger — Poor Charlie's Almanack, Talk 2: Elementary Worldly Wisdom (pp. 196-200) (6)

Circle of competence determines where you can win

Every person has a circle of competence — playing inside it with discipline compounds advantage, playing outside it guarantees loss, and it's very hard to enlarge

AI Product Building, AI Agents, Business Models

Closed harnesses behind APIs create memory lock-in by design

When the harness lives behind a proprietary API, memory state and schema become invisible and non-portable — model providers are incentivized to push more of the harness behind their APIs

@hwchase17 (Harrison Chase) — Your harness, your memory (6)

@nicbstme — The LLM Context Tax: Best Tips for Tax Avoidance (6)

Context inefficiency compounds three penalties: cost, latency, and quality degradation

Every wasted token in an LLM context window doesn't just cost money — it slows responses and degrades output quality, creating a triple tax on production agents

@jasonscui — Your Data Agents Need Context, a16z6

Context layers must be living systems, not static artifacts

Unlike semantic layers that rot when maintainers leave, context layers need self-updating feedback loops where agent errors refine the context corpus

Ashpreet Bedi — Memory: How Agents Learn (Agno Framework)6

Cross-user knowledge transfer works without fine-tuning — just a database and prompt engineering

When one person teaches an agent something, another person benefits automatically — no RLHF, no training infrastructure, just structured storage and retrieval

AI Product BuildingFuture of AIKnowledge Systems

Don't be the discriminator — be the patron, not the judge

Taste (selecting from AI output) is the function that gets automated first; participating in creation through friction and will is what endures

@WillManidis (Will Manidis) — Against Taste6

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture6

Policy enforcement must run independently of model cooperation — hooks, not prompt instructions

Hermes runs lifecycle hooks that block, rewrite, or audit operations at fixed events, so policy and side-effects never depend on the model choosing to comply
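
A minimal sketch of the hook idea above, assuming a hypothetical `ToolCall` type and a hypothetical `run_pre_tool_hooks` dispatcher (neither is taken from Hermes). Each hook may pass, rewrite, or block a call, and the outcome is audited whatever the model intended:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


# A hook returns the (possibly rewritten) call, or None to block it.
Hook = Callable[[ToolCall], Optional[ToolCall]]


def block_destructive_shell(call: ToolCall) -> Optional[ToolCall]:
    # Block: refuse obviously destructive shell commands.
    if call.name == "shell" and "rm -rf" in call.args.get("cmd", ""):
        return None
    return call


def redact_secrets(call: ToolCall) -> Optional[ToolCall]:
    # Rewrite: strip a secret before the call is executed or logged.
    args = {k: ("[REDACTED]" if k == "api_key" else v) for k, v in call.args.items()}
    return ToolCall(call.name, args)


def run_pre_tool_hooks(call: ToolCall, hooks: list[Hook], audit_log: list[str]) -> Optional[ToolCall]:
    # Runs at a fixed event (before every tool call), outside the prompt.
    for hook in hooks:
        result = hook(call)
        if result is None:
            audit_log.append(f"blocked {call.name} by {hook.__name__}")
            return None
        call = result
    audit_log.append(f"allowed {call.name}")
    return call
```

The point of the design is that the policy lives in code that always executes, so a model that ignores an instruction still cannot get the call through.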

@JayaGup10 (Jaya Gupta) — The next biggest moat in AI6

Emotional promises must be structural promises — if the structure doesn't back the pitch, the promise is fake

Each cultural claim — ownership, customer proximity, speed, talent density — is a structural commitment about decision rights, status hierarchy, and authority allocation; misalignment between the two reads as fake even when candidates can't articulate it

AI Product BuildingCoding ToolsFuture of AI

Engineering is no longer the junior partner — at the frontier, research and engineering have fused

The researcher who can build the harness, the eval, and the data pipeline is the one whose hypotheses actually get tested; everyone else waits in a queue. The split between 'has ideas' and 'can run them' has collapsed into one role

@itsreallyvivek (vivek) — how to be good at research6

Yoonho Lee et al. — Meta-Harness: End-to-End Optimization of Model Harnesses (arXiv:2603.28052)6

Evolved harnesses transfer across models — a single optimized harness improves five different LLMs

Meta-Harness discovered a retrieval harness that improved math reasoning by 4.7 percentage points average across five held-out models it was never optimized for, suggesting harness quality is model-agnostic

@jaynitx — first principles thinking: how to see what everyone else misses6

First-principles thinking is uncomfortable because it transfers responsibility — analogy outsources blame to 'best practices'

When you reason by analogy you have a defense ('I did what everyone said'); when you reason from first principles you own the outcome. The discomfort most people feel about first-principles thinking is responsibility, not difficulty

Mental ModelsPsychologyEconomics

Incentive-caused bias makes good people rationalize harmful behavior

People don't consciously choose to be unethical — incentive structures cause them to drift into immoral behavior and then rationalize it as virtuous

Charlie Munger — Poor Charlie's Almanack, Talk 11: The Psychology of Human Misjudgment (pp. 505-514)6

AI Product BuildingKnowledge SystemsArchitectureAI Agents

Intelligence location — code vs prompts — determines system fragility and flexibility

Critical architectural fork: prompt-driven systems (Pal's 400-line routing prompt) are flexible but break when models change; code-driven systems (our validate-graph.js) are rigid but reliable — best systems need both

Ayush Jhunjhunwala — KG Architecture Comparative Research (10+ systems analyzed)6

Ayush Jhunjhunwala — KG Architecture Comparative Research (10+ systems analyzed)6

Knowledge evolution is the biggest unsolved problem across all graph architectures

Almost nobody has solved how knowledge graphs grow without rotting — most are append-only, auto-decay is too aggressive, and even the best systems only add links without pruning, merging, or detecting contradictions

@nicbstme — The LLM Context Tax: Best Tips for Tax Avoidance6

KV cache hit rate is the most critical metric for production agents

Maintaining stable prompt prefixes and append-only context architecture maximizes cache reuse, dramatically reducing both cost and latency for agentic workflows
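
To make the prefix argument concrete, here is a small sketch. `AppendOnlyContext` and `shared_prefix_len` are hypothetical names, and comparing whole blocks is a simplification (real caches match on tokens):

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading blocks two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class AppendOnlyContext:
    """Context that only ever grows at the end, so earlier blocks never change."""

    def __init__(self, system_prompt: str):
        self._blocks = [system_prompt]

    def append(self, block: str) -> None:
        self._blocks.append(block)

    def render(self) -> list[str]:
        return list(self._blocks)


ctx = AppendOnlyContext("system: you are a helpful agent")
ctx.append("user: list open invoices")
turn_1 = ctx.render()
ctx.append("tool: 3 invoices found")
turn_2 = ctx.render()

# Append-only: every block of turn_1 is a prefix of turn_2 and can be reused.
reused = shared_prefix_len(turn_1, turn_2)

# Anti-pattern: a changing value at the top invalidates the whole prefix.
stamped_1 = ["time: 09:00"] + turn_1
stamped_2 = ["time: 09:05"] + turn_2
reused_with_timestamp = shared_prefix_len(stamped_1, stamped_2)
```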

@JayaGup10 (Jaya Gupta) — Who will set price / intelligence?6

In AI the threat is layer migration, not a competitor — work relocates across layers when any variable moves

In SaaS a rival company killed you; the one exception was platform dependency, where a pricing or terms change could wipe you out. AI makes that exception the default — work migrates into the model, an open-weight alternative, the customer's data platform, an agent runtime, or the device itself

@joeschmidtiv (Joe Schmidt IV, a16z) — Avoiding Death on the Yellow Brick Road6

Routing across the whole model market — and absorbing every migration — is a defense the labs can't copy

A vertical company picks the best model per sub-task across all vendors, absorbs eval/migration work on every upgrade, and sells the lowest cost for the exact intelligence each step needs

@JayaGup10 (Jaya Gupta) — The Trillion Dollar Loop B2B Never Had6

Permissioned inference is harder than permissioned retrieval — enterprise context graphs need reasoning-level access control

Controlling who sees data is solved; controlling whose history shapes reasoning for others is the unsolved trust layer enterprise context graphs require

@JayaGup10 (Jaya Gupta) — Who will set price / intelligence?6

The price of intelligence is the new organizing axis — labs, applications, and countries are all fighting to set it

The cost of intelligence is no longer an input to software but the axis around which companies, markets, and geopolitics reorganize; labs want usage routed through them, applications want to allocate intelligence better than labs, countries want it cheap enough to be national infrastructure

Anthropic documentation — Prompt Caching6

Prompt caching makes long context economically viable

Prefix-matching cache enables 80%+ cost reduction for multi-turn conversations, making rich context systems affordable at scale

@ivanhzhao (Ivan Zhao, Notion CEO) — Steam, Steel, and Infinite Minds6

The gains come from redesigning work around AI, not bolting AI onto human workflows

Like factory owners who first swapped waterwheels for steam engines and changed nothing else (modest gains), today's orgs bolt chatbots onto human-designed workflows — the explosion comes only when the work is redesigned around agents

AI Product BuildingCoding ToolsEngineering

Research speed is mostly the speed at which you discover you're wrong — which makes tooling a first-class research activity

The edge isn't a stroke of genius but volume: more runs per day, more wrong ideas discarded per week, a faster-updating model of reality. That makes one-command runs, config-reproducible experiments, and seconds-not-archaeology run comparison core research work, not chores

@itsreallyvivek (vivek) — how to be good at research6

Manthan Gupta (@manthanguptaa) — How Karpathy's Autoresearch Works; Andrej Karpathy — autoresearch program.md6

Rollback safety nets enable autonomous iteration — not model intelligence

The minimum viable safety net for autonomy is a quantifiable metric, atomic changes, and automatic rollback — these make cheap failure possible, which makes aggressive exploration safe
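
The three ingredients named above (a quantifiable metric, atomic changes, automatic rollback) can be sketched as a toy loop. The names `hill_climb`, `metric`, and `propose` are illustrative assumptions, not code from autoresearch:

```python
import random


def hill_climb(params: dict, metric, propose, steps: int, seed: int = 0):
    rng = random.Random(seed)
    best = metric(params)
    history = []
    for _ in range(steps):
        snapshot = dict(params)      # cheap checkpoint before the change
        propose(params, rng)         # one atomic change
        score = metric(params)
        if score > best:
            best = score             # keep the improvement
            history.append(("kept", score))
        else:
            params.clear()           # automatic rollback to the checkpoint
            params.update(snapshot)
            history.append(("rolled_back", score))
    return best, history


def metric(p: dict) -> float:
    # Toy objective with its maximum at x == 3.
    return -(p["x"] - 3.0) ** 2


def propose(p: dict, rng: random.Random) -> None:
    p["x"] += rng.uniform(-1.0, 1.0)


params = {"x": 0.0}
best, history = hill_climb(params, metric, propose, steps=200)
```

Because a bad step costs only one evaluation, the proposer can be as aggressive as it likes; the state never ends up worse than the best score seen so far.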

@DavidGeorge83 (David George, a16z) — There Are Only Two Paths Left for Software6

The comfortable middle is over — software companies must either accelerate AI growth or rebuild for 40%+ margins

Growth-path companies ship AI-native products in 4-person pods with token-based pricing; margin-path companies flatten management, raise prices, and let low-value customers churn — anything in between faces multiple compression

@itsreallyvivek (vivek) — how to be good at research6

A loss curve is reassurance, not analysis — pull a hundred failures and read every one

Experiments throw off far more information than you consume — transcripts, failure cases, the strange tail — and most of it dies unread. Most ML bugs live in the data and fail silently; Andrew Ng's move is to pull 100 failures, sort them into piles, and attack the biggest pile

AI Product BuildingBusiness ModelsAI Agents

Task horizon breaks seat-based pricing — usage scales with workflow depth × length, not headcount

Task horizon is the length dial: how long an AI works on its own before a human steps in. The unit shifted from the call to the workflow — agents run for hours, spawn sub-agents, and burn millions of tokens per decision path, so usage stops scaling with seats; multiply length by depth to get the token bill

@JayaGup10 (Jaya Gupta) — Who will set price / intelligence?6

Mental ModelsEconomicsEngineering

Technology helps moat businesses but kills commodity businesses

In commodity businesses, productivity improvements flow entirely to customers; in businesses with competitive advantages, the same improvements go to the bottom line — most people fail to do this second step of analysis

Charlie Munger — Poor Charlie's Almanack, Talk 2: Elementary Worldly Wisdom (pp. 192-198)6

@hwchase17 (Harrison Chase) — Continual Learning for AI Agents6

Traces are the universal substrate for agent learning — all three layers consume the same execution logs

Whether updating model weights, improving harness code, or refining context/memory, agent learning flows start from the same raw material: traces capturing the full execution path of what an agent did

@jasonscui — Your Data Agents Need Context, a16z6

Tribal knowledge is the irreducible human input that enables agent automation

Automated context construction handles most of the corpus, but the most critical context is implicit, conditional, and historically contingent — only humans can provide it

@hwchase17 (Harrison Chase) — Your harness, your memory5

Agent harnesses are persistent infrastructure, not scaffolding models will absorb

As models improve, old scaffolding disappears but new scaffolding replaces it — harnesses aren't going away, they're evolving

AI Product BuildingAI AgentsBusiness Models

The intelligence lives in the workflow, not the model — and a model can't simply read it

In a real vertical, the decisive logic (what to escalate, which rule wins, when a human signs off) lives in SOPs and operational experience; the agentic workflow encodes it and becomes the carrier's operating memory

@joeschmidtiv (Joe Schmidt IV, a16z) — Avoiding Death on the Yellow Brick Road5

@RampLabs — How We Made Ramp Sheets Self-Maintaining5

Auto-generated narrow monitors beat handwritten broad checks — a tight mesh over the exact shape of the code

1,000+ AI-generated monitors that each target specific code paths catch more bugs than 10 hand-written checks that cover general categories

Mental ModelsPsychologyEconomics

Being chosen vs being seen — emotional validation captures founder-level intensity at employee-level structure

Being chosen is emotional: you are special, we believe in you, you belong here. Being seen is structural: scope, authority, economic participation, decision rights — the trap is paying in identity what you don't want to pay in structure

@JayaGup10 (Jaya Gupta) — The next biggest moat in AI5

@joeschmidtiv (Joe Schmidt IV, a16z) — Avoiding Death on the Yellow Brick Road5

If a problem improves directly with raw model capability, the labs will take it

The Yellow Brick Road test — work that gets better with every pre/post-training dollar (code, writing, images) belongs to the labs; work whose value comes from scaffolding is defensible

AI Product BuildingDecision MakingFuture of AI

Reason backward from an outcome you want to exist — it manufactures originality that absorbed problems can't

Absorbed problems hand you the conclusion without the reasoning, on a crowded racetrack; choosing an outcome you genuinely want and reasoning backward to the experiments drags you into territory no survey paper covers

@itsreallyvivek (vivek) — how to be good at research5

@GowriShankarNag — #MarketMapMondays: The Label Paradox of AI, Antler India5

Commodity work's terminal value is zero but structured expert judgment compounds indefinitely

Appen collapsed from $4.5B to $140M as LLMs displaced commodity annotation, while Scale AI reached $29B by owning expert alignment infrastructure — the market is bifurcating

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture5

Compression should be a forking lifecycle event, not a destructive rewrite

Instead of repeatedly overwriting one transcript, Hermes seeds a child session from each summary and records parent-child lineage — producing an auditable chain of compressions
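
A rough sketch of compression as a fork. The `Session` shape, the `summarize` stub, and the in-memory `store` are all hypothetical stand-ins, not Hermes internals:

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

_ids = itertools.count(1)


@dataclass
class Session:
    messages: list[str]
    parent_id: Optional[int] = None
    id: int = field(default_factory=lambda: next(_ids))


def summarize(messages: list[str]) -> str:
    # Placeholder for an LLM summarization call.
    return f"[summary of {len(messages)} messages]"


def fork_on_compress(session: Session, store: dict[int, Session]) -> Session:
    # The parent transcript is left untouched; the child starts from its summary.
    child = Session(messages=[summarize(session.messages)], parent_id=session.id)
    store[child.id] = child
    return child


def lineage(session: Session, store: dict[int, Session]) -> list[int]:
    # Walk parent links back to the root to get the auditable chain.
    chain = [session.id]
    while session.parent_id is not None:
        session = store[session.parent_id]
        chain.append(session.id)
    return chain
```

Since no transcript is ever overwritten, any summary can be checked against the exact messages it was built from.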

Charlie Munger — Poor Charlie's Almanack, Talk 11: The Psychology of Human Misjudgment (pp. 599-604)5

Confluence of tendencies produces extreme outcomes — lollapalooza effects emerge when multiple psychological biases push the same direction

When several psychological tendencies combine toward the same outcome, the result is not additive but explosive — Munger's checklist method diagnoses these compound failures

@businessbarista (Alex Lieberman) quoting @da_fant (David Fant)5

Context centralization is why coding AI works — git is a solved context repository, knowledge work has no equivalent

Engineering AI leads because git centralizes all context in one versioned repository; knowledge work fails on three axes: distributed, unstructured, unverifiable

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture5

Delegation is not orchestration — durable, externally-steerable child runs are the architectural leap

Hermes can spawn child runs with their own task IDs that return structured summaries, but they die with the parent; true orchestration needs run IDs, lifecycle control, and steering that survive parent completion

AI Product BuildingDecision MakingKnowledge Systems

Shared inputs produce shared conclusions worth nothing — old and cross-disciplinary material is criminally underpriced

If your information diet is trending arxiv plus the group chat, you reach the same conclusions as everyone else at the same time, which makes them worthless. Old material (MoE 1991, LSTMs 1997, the bitter lesson) and cross-disciplinary range are underpriced sources of differentiated ideas

@itsreallyvivek (vivek) — how to be good at research5

@joeschmidtiv (Joe Schmidt IV, a16z) — Avoiding Death on the Yellow Brick Road5

Guardrails aren't just safety — they're what the customer is paying for

Per-use-case, per-customer, continuously-audited governance is the product in a regulated vertical; becoming the compliance control plane is a moat a horizontal player can't credibly hold

@Vtrivedy10 (Viv) — Better Harness: A Recipe for Harness Hill-Climbing with Evals5

Holdout eval sets are the generalization gate for autonomous harness optimization — without them, the loop overfits

Autonomous harness hill-climbing tends to overfit to the optimization set; splitting evals into optimization and holdout categories — mirroring ML train/test splits — is the structural defense
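
One way to picture the holdout gate, as a sketch under simple assumptions: a harness is a plain function, an eval case is an `(input, expected)` pair, and `accept_candidate` is a hypothetical helper rather than anything from the cited recipe:

```python
def pass_rate(harness, cases) -> float:
    return sum(harness(x) == y for x, y in cases) / len(cases)


def accept_candidate(candidate, baseline, optimization_set, holdout_set) -> bool:
    # The loop may only climb on the optimization set...
    improves = pass_rate(candidate, optimization_set) > pass_rate(baseline, optimization_set)
    # ...but a candidate is kept only if it does not regress on unseen cases.
    generalizes = pass_rate(candidate, holdout_set) >= pass_rate(baseline, holdout_set)
    return improves and generalizes


# Toy task: double the input.
optimization_set = [(1, 2), (2, 4), (3, 6)]
holdout_set = [(4, 8), (5, 10)]


def baseline(x):
    return 0 if x == 3 else 2 * x           # one known failure


def memorizer(x):
    return {1: 2, 2: 4, 3: 6}.get(x, 0)     # overfits the optimization set


def general_fix(x):
    return 2 * x
```

Here `memorizer` scores perfectly on the optimization set and still fails the gate, which is the failure mode the train/test-style split is meant to catch.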

QMD by Tobi Lütke, pg_textsearch by Timescale, TigerData BM25 article5

Hybrid search is the default, not the exception

Neither keyword nor semantic search alone is complete — combining BM25 and vector search with reranking is the baseline for production systems
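
As an illustration of combining the two result lists, the sketch below uses reciprocal rank fusion, a common fusion method chosen here for brevity; the sources above do not say which fusion or reranking step they use. The document IDs and the constant `k=60` are illustrative:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores 1 / (k + rank) per list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["a", "b", "c"]      # keyword ranking (assumed input)
vector_hits = ["c", "a", "d"]    # semantic ranking (assumed input)
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Documents that both retrievers agree on rise to the top, while a document found by only one retriever still survives for a later reranking stage.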

Ayush Jhunjhunwala — KG Architecture Comparative Research (10+ systems analyzed)5

Knowledge systems need dual-layer storage — narrative depth and structured queries can't share a format

Every system beyond 'markdown files in a folder' discovers that narrative depth (rich prose, context, reasoning) and structured querying (filter, aggregate, cross-reference) need different storage layers with a routing mechanism between them

@nicbstme — 10 Years Building Vertical Software: My Perspective on the Selloff5

LLM competition fragments markets from 3 incumbents to 300

When LLMs lower the cost of building vertical software, competition doesn't add one new entrant — it explodes combinatorially, explaining market repricing before revenue loss

@tobi (Tobi Lütke, Shopify CEO) — Pi and Clawdbot5

Malleable software — a tiny core that writes its own plugins — replaces fixed-feature applications

Instead of adapting your workflow to the tool, the tool observes your workflow and extends itself to match it

@hwchase17 (Harrison Chase) — Your harness, your memory (citing Sarah Wooders, Letta)5

Memory is a harness responsibility, not a pluggable component

Managing context — what enters, what survives compaction, what's queryable — is a core capability of the harness itself, not an add-on service

@izzymiller (Izzy Miller, Hex) — Building AI Agents for Data Analytics5

Model compensations become liabilities as capabilities advance — yesterday's fixes hobble today's agent

Engineering workarounds for earlier model limitations accumulate as technical debt that actively degrades agent performance when models improve

@trq212 (Thariq) — Lessons from Building Claude Code: How We Use Skills5

Metadata consumed by LLMs needs trigger specifications, not human summaries

When an LLM scans metadata to decide what to invoke, the description should specify when to activate — not summarize what the thing does — because LLMs are a fundamentally different consumer than humans

@satyanadella (Satya Nadella) — A frontier without an ecosystem is not stable5

The sovereignty test — can you swap out a generalist model without losing your 'company veteran' expertise?

A firm controls its IP only if it can switch the underlying generalist model while keeping the company-veteran expertise built into its learning system; that portability is the test of control and sovereignty in the AI era

AI Product BuildingFuture of AIDecision Making

New technology first imitates the medium it replaces — the transition form hides the final form

Early phone calls were telegram-terse, early movies were filmed stage plays, and today's AI is a chatbot mimicking a search box; McLuhan's 'driving into the future via the rearview mirror' is why we mistake the imitation phase for the destination

@ivanhzhao (Ivan Zhao, Notion CEO) — Steam, Steel, and Infinite Minds5

AI Product BuildingFuture of AIKnowledge Systems

You can offload a task, or even a job, but you can never offload your learning

The real opportunity isn't picking the best model — it's building a learning loop on top of models where the firm's accumulated learning, the one thing it can't outsource, compounds across people and AI

@satyanadella (Satya Nadella) — A frontier without an ecosystem is not stable5

@systematicls — How To Be A World-Class Agentic Engineer5

One session per contract beats long-running agent sessions

Fresh context per task contract outperforms 24-hour agent sessions because cross-contract context bloat degrades performance by construction

@GowriShankarNag — #MarketMapMondays: The Label Paradox of AI, Antler India5

Platform economics beat labor arbitrage — margins fund flywheels that body shops cannot

Scale AI's 50%+ gross margins fund ML pre-labeling and workflow optimization, creating a flywheel; Indian BPOs at 10-15% margins cannot invest in R&D and remain trapped competing on price

@satyanadella (Satya Nadella) — A frontier without an ecosystem is not stable5

Private evals should measure business outcomes that matter — not external benchmarks

A firm's learning loop runs on private evals tied to real business outcomes and private RL environments trained on internal traces, so the model improves against what the company cares about rather than public leaderboards

Geoff Huntley — Latent Patterns Principles (verification over testing)5

Property-based testing explores agent input spaces that example-based tests miss

Generative tests that produce random or adversarial inputs discover edge cases in agent behavior that hand-written examples never cover — verification over testing means proving properties, not checking cases
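
A small sketch of the idea using only the standard library's `random` module (a dedicated property-based testing library would add input shrinking and smarter generation). `truncate_context` is a hypothetical harness helper invented for the example; the properties, not the specific cases, are what get checked:

```python
import random


def truncate_context(messages: list[str], budget: int) -> list[str]:
    """Keep the first (system) message, then as many of the newest messages as fit."""
    system, rest = messages[0], messages[1:]
    kept, used = [], len(system)
    for msg in reversed(rest):
        if used + len(msg) > budget:
            break
        kept.append(msg)
        used += len(msg)
    return [system] + kept[::-1]


def check_properties(trials: int = 500, seed: int = 0) -> int:
    rng = random.Random(seed)
    for _ in range(trials):
        # Random inputs, including empty messages and very tight budgets.
        messages = ["x" * rng.randint(0, 20) for _ in range(rng.randint(1, 10))]
        budget = rng.randint(len(messages[0]), 100)
        out = truncate_context(messages, budget)
        # Property 1: the system message always survives.
        assert out[0] == messages[0]
        # Property 2: the result never exceeds the budget.
        assert sum(map(len, out)) <= budget
        # Property 3: what is kept is a contiguous run of the newest messages.
        assert out[1:] == messages[len(messages) - len(out) + 1:]
    return trials
```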

@rohit4verse (Rohit) — The Missing Layer in Your Agentic Stack5

Reasoning evaporation permanently destroys agent decision chains when the context window closes

An agent's multi-step reasoning exists only in the context window; when the session ends, the output survives but the decision chain — why each step was taken — is gone forever

@zain_hoda (Zain Hoda, Vanna AI) — The Agent Will Eat Your System of Record5

Agents eat your system of record — the rigid app was the constraint, not the schema

When agents can clone your entire CRM in seconds and become the real interface, the SaaS product becomes a dumb write endpoint. Data moats evaporate because agents eliminate the rigid app that demanded rigid schemas.

@Konstantine (Konstantine Buhler, Sequoia Capital) — The Great SaaS Consolidation + AI Ascent 20255

AI won't destroy SaaS moats — it'll make the biggest ones even bigger

Enterprise SaaS consolidates rather than fragments: we could see 5-10 individual trillion-dollar SaaS companies. Moats are people, relationships, and enterprise integrations — not code. Cheaper AI-built software doesn't overcome distribution advantages.

Mental ModelsEconomicsDecision Making

Scale advantages cascade toward dominance until bureaucracy kills them

Advantages of scale — cost curves, social proof, informational edge, advertising reach — compound toward winner-take-all, but large organizations breed bureaucracy and territoriality that can undo every advantage

Charlie Munger — Poor Charlie's Almanack, Talk 2: Elementary Worldly Wisdom (pp. 174-192)5

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture5

Separate tool registration from tool exposure — install broadly, reveal narrowly

Hermes registers all tools into a central registry at import time but a separate layer decides what each run actually shows the model, scoped by platform and scenario
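
A sketch of the split between registration and exposure. The `register` decorator, the `REGISTRY` dict, and the platform names are assumptions made for illustration, not Hermes' actual API:

```python
REGISTRY: dict[str, dict] = {}


def register(name: str, platforms: set[str]):
    """Registration: runs at import time and records every tool centrally."""
    def decorator(fn):
        REGISTRY[name] = {"fn": fn, "platforms": set(platforms)}
        return fn
    return decorator


@register("read_file", platforms={"cli", "web"})
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()


@register("run_shell", platforms={"cli"})
def run_shell(cmd: str) -> str:
    raise NotImplementedError("stub for illustration")


def exposed_tools(platform: str, allow=None) -> list[str]:
    """Exposure: a separate layer decides what a given run shows the model."""
    names = [n for n, tool in REGISTRY.items() if platform in tool["platforms"]]
    if allow is not None:
        names = [n for n in names if n in allow]   # per-scenario narrowing
    return sorted(names)
```

For example, `exposed_tools("web")` would offer only `read_file`, even though both tools are installed.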

@ivanhzhao (Ivan Zhao, Notion CEO) — Steam, Steel, and Infinite Minds5

Humans should supervise agent loops from a leveraged point, not sit inside every one

Human-in-the-loop isn't always desirable — putting a person inside every iteration is the Red Flag Act (a man legally required to walk ahead of every car waving a flag); the goal is leveraged oversight of many loops, not manual inspection of each

AI Product BuildingDecision MakingKnowledge Systems

Taste is a muscle, not a gift — train it by forecasting every result before you see it

Predict the outcome of every experiment before running it, guess a paper's numbers from the method alone, call which releases will matter in two years and check your hit rate; a forecast plus a correction, repeated a few hundred times, trains the model in your head the way it trains any other model

@itsreallyvivek (vivek) — how to be good at research5

AI Product BuildingCoding ToolsFuture of AI

Technical knowledge can become a liability when working with AI

Experts get stuck on implementation details while novices describe outcomes and ship faster

Editorial synthesis — informed by @WorkflowWhisper (Alton Syn) automation tweets + Karpathy coding observations5

Shekhar Kirani (Accel India) — LinkedIn post on a conversation with Jason Graefe (Microsoft) about getting AI working inside enterprises5

The first enterprise-AI sale is a trust sale — buyers judge de-risking, not capability

Early AI deployments are bought on reliability, control, and whether the system can be trusted in real workflows; founders systematically over-index on demonstrating capability and under-index on de-risking the buyer

Manthan Gupta (@manthanguptaa) — How Karpathy's Autoresearch Works And What You Can Learn From It5

Time-bounded evaluation forces optimization for real-world usefulness instead of idealized performance

A fixed wall-clock budget per experiment makes results comparable, normalizes across hardware, and forces agents to optimize for improvement per unit time
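
A minimal sketch of a wall-clock budget, assuming a hypothetical `step` callable that runs one unit of work and returns a score. Whatever finishes inside the budget counts, and nothing else does:

```python
import time


def run_with_budget(step, budget_seconds: float):
    """Call step(i) repeatedly until the wall-clock budget is spent."""
    deadline = time.monotonic() + budget_seconds
    best, iterations = None, 0
    while time.monotonic() < deadline:
        score = step(iterations)
        best = score if best is None else max(best, score)
        iterations += 1
    return best, iterations
```

Two candidates given the same `budget_seconds` are compared on the best score each reached in that time, so a slower but nominally stronger approach gets no extra credit.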

@natashamalpani (Natasha Malpani) — The Verification Economy: The Red Queen Problem (Part III)5

Trust boundaries must be externalized — not held in engineers' heads

Where an agent's behavior is well-understood vs. unknown should be mapped, made auditable, and connected to deployment gates — not left as implicit tribal knowledge

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture5

Unattended agent jobs must run through the same permission machinery as interactive sessions

Hermes makes cron a first-class subsystem — scheduled jobs are gated by the same permissions, delivered through the same paths, and isolated per profile, instead of living as peripheral scripts

AI Product BuildingBusiness ModelsAI Agents

Vertical models beat frontier models in their domain — specialization wins on every metric

Intercom's Apex, a specialized customer service LLM, beat every frontier model, including those from Anthropic and OpenAI, on resolution rate, latency, hallucination rate, and cost

@eoghan (Eoghan McCabe, Intercom CEO) — Never Stop Disrupting Yourself: Introducing the Fin API Platform5

@joeschmidtiv (Joe Schmidt IV, a16z) — Avoiding Death on the Yellow Brick Road5

The data flywheel is a UX problem — only vertical workflow surfaces can capture the knowledge

Two stacked flywheels (across-customer pattern recognition + within-customer tacit rules) accrue only through workflow-specific capture surfaces that horizontal tools structurally cannot shape

Chrome for Developers — WebMCP announcement (https://developer.chrome.com/blog/web-mcp)5

WebMCP turns websites into agent-native interfaces

Chrome's MCP integration lets websites expose structured tools to agents instead of agents scraping and guessing at UI elements

@satyanadella (Satya Nadella) — A frontier without an ecosystem is not stable4

A frontier without an ecosystem is not stable — if a few models capture all value, the political economy won't tolerate it

The priority must be building a frontier ecosystem, not just a frontier model, so value flows broadly across companies and industries; concentrating all returns in a few models that hollow out industries has no societal permission and is not a stable equilibrium

AI Product BuildingCoding ToolsDecision Making

Adversarial branch-walking beats review for planning — walk every design branch until resolved

The most effective planning intervention is not post-hoc review or divergent brainstorming but convergent, exhaustive questioning that traverses each branch of the decision tree with recommended answers

@mattpocockuk (Matt Pocock) — grill-me skill (mattpocock/skills, 9.5K stars, 151K views)4

AI Product BuildingFuture of AIAI Agents

AI's self-improvement loop means each generation builds the next one faster

GPT-5.3-Codex was instrumental in creating itself — recursive improvement compresses timelines and explains why building for obsolescence is the only safe strategy

@mattshumer_ (Matt Shumer) — Something Big Is Happening4

@vxanand (Varun Anand, Clay co-founder) — Clay's Operating Principles4

Ask for 'no' not 'yes' — default-proceed framing accelerates organizational decisions

Framing proposals as 'I will do X unless you object' rather than 'Can I do X?' shifts the decision burden, maintains momentum, and shows ownership while preserving space for input

Vishnu Suresh (LangChain) — How My Agents Self-Heal in Production4

Causal triage must gate automated fixes — statistical regression detection alone can't distinguish your bugs from external failures

Raw error-rate spikes after deployment can't tell you whether YOUR code broke or a third-party API went down; a triage agent that establishes causal links between code changes and observed errors must gate any automated fixing agent

AI Product BuildingKnowledge SystemsFuture of AI

A clear public explanation is a genuine contribution and an unfakeable credential

Fields choke on undigested ideas, so distilling something hard into a clear explanation is real work, not a service job — and a body of public writing doubles as the strongest credential you can hold, because it's an unfakeable sample of how you think

@itsreallyvivek (vivek) — how to be good at research4

Ayush Jhunjhunwala — KG Architecture Comparative Research (10+ systems analyzed)4

Compilation scales but curation compounds — two camps for knowledge graph construction

LLM-compiled systems (Karpathy, Pal) grow fast by feeding raw content through model judgment; human-curated systems (our graph, brainctl) grow slowly but every node is validated — compilation scales linearly, curation compounds through connections

@jasonscui — Your Data Agents Need Context, a16z4

Context layers supersede semantic layers for agent autonomy

Traditional semantic layers handle metric definitions but agents need a superset: canonical entities, identity resolution, tribal knowledge instructions, and governance guidance

@jasonscui — Your Data Agents Need Context, a16z4

Data agent failures stem from missing business context, not SQL generation gaps

The industry initially blamed text-to-SQL capability for data agent failures, but the real blockers are undefined business definitions, ambiguous sources of truth, and missing tribal knowledge

@danshipper (Dan Shipper) — 'What personal software actually is' (tweet thread)4

Deputies and Sheriffs — distributed agent teams with hierarchical authority replace centralized software

Individual employees train specialized 'Deputy' agents while organizational 'Sheriff' agents manage permissions, rules, and onboarding across the team

Rohit (@rohit4verse) — How to Build Agents That Never Forget4

Embeddings measure similarity, not truth — vector databases have a temporal blind spot

Vector search can't resolve contradictions or understand time; 'I love my job' and 'I'm quitting' retrieve with equal confidence

@Vtrivedy10 (Viv) — Better Harness: A Recipe for Harness Hill-Climbing with Evals4

Eval suites must shrink, not just grow — spring cleaning prevents stale behavioral pressure

Saturated evals waste compute without providing signal; more capable models or changes in desired behavior make old evals irrelevant, requiring regular pruning alongside addition

AI Product BuildingFuture of AICoding Tools

Every role codes when implementation cost drops to zero — the generalist builder replaces the specialist engineer

When AI handles implementation, the title 'software engineer' gives way to generalist builders who code, write specs, design, and talk to users

Boris Cherny (@bcherny) — Inside Claude Code With Its Creator, Y Combinator Light Cone podcast4

@hwchase17 (Harrison Chase) — Context Engineering Our Way to Long-Horizon Agents4

The first-draft pattern is the killer app for long-horizon agents — agents produce, humans review

Long-horizon agents produce comprehensive first drafts (PRs, analyses, reports) that humans verify — this is where the 10x productivity gain actually lives

Yoonho Lee et al. — Meta-Harness: End-to-End Optimization of Model Harnesses (arXiv:2603.28052)4

Full trace filesystems beat compressed summaries for harness optimization — 10M tokens of context outperforms 26K

Meta-Harness gives its proposer agent a filesystem containing full source code, scores, and execution traces of every prior candidate, enabling up to 10M tokens of diagnostic context per iteration — dramatically outperforming prior methods limited to 26K compressed tokens

Mental ModelsEconomicsPhilosophy

Great companies are wrappers around a kind of person — institutions that make a new kind of person possible

The most important companies are organizational inventions: they create a new kind of institution around a new kind of work, and in doing so, they let a certain kind of talent finally express themselves

@JayaGup10 (Jaya Gupta) — The next biggest moat in AI (4)

@satyanadella (Satya Nadella) — A frontier without an ecosystem is not stable (4)

Every firm must build two capitals — human capital and token capital — and they compound together, not at each other's expense

Human capital is the knowledge, judgment, relationships, and pattern recognition of a firm's people; token capital is the AI capability it builds and owns — and human capital only becomes MORE valuable as token capital grows

@hwchase17 (Harrison Chase) — Everything Gets Rebuilt: Agents, Harnesses, and the New Compute Layer (4)

Memory defines the agent — a zip of markdown files IS the agent, and portable memory between harnesses is the frontier

An agent IS its memory — a zip of markdown (system prompt + skills + tools) defines its identity; making that portable between harnesses is the current frontier

Ayush Jhunjhunwala — KG Architecture Comparative Research (10+ systems analyzed) (4)

Navigation beats search for knowledge retrieval — let each data source keep its native query interface

Vector similarity search flattens everything into one embedding space, losing native query affordances; better to let SQL be SQL, files be files, and build a routing layer that picks the right source per question type
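A rough sketch of such a routing layer, assuming simple keyword rules stand in for whatever classifier a real system would use. The `classify` and `route` functions, the three source names, and the stub handlers are hypothetical.

```python
import re

def classify(question):
    """Pick a source type per question. Keyword rules are a placeholder;
    a real router might use a trained classifier or an LLM call."""
    q = question.lower()
    if re.search(r"\b(how many|count|average|total|sum)\b", q):
        return "sql"      # aggregates belong to the database engine
    if re.search(r"\b(file|readme|config|log)\b", q):
        return "files"    # named artifacts are read directly
    return "vector"       # fuzzy recall falls back to similarity search

def route(question, sources):
    """Dispatch the question to the handler for the chosen source."""
    kind = classify(question)
    return kind, sources[kind](question)

# Stub handlers; each would wrap the source's native query interface.
sources = {
    "sql": lambda q: "SELECT COUNT(*) FROM orders WHERE shipped = 1",
    "files": lambda q: "open('README.md').read()",
    "vector": lambda q: "nearest_neighbors(embed(q), k=5)",
}

kind, plan = route("How many orders shipped last week?", sources)
```

The point of the design is that each handler keeps its own query language, so nothing is lost by flattening everything into one embedding space.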

@vxanand (Varun Anand, Clay co-founder) — Clay's Operating Principles (value coined by George Dilthey) (4)

Non-attached action enables clearer course correction — detach from outcomes to see reality

Acting without attachment to being right, to a specific outcome, or to whose idea it was lets you see when something isn't working and change course without ego friction

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture (4)

Order the system prompt by volatility to keep prompt prefixes cache-friendly

Hermes composes the system prompt in three tiers — stable, context, volatile — so the unchanging prefix stays cacheable while turn-by-turn data lives at the end
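The three-tier ordering can be sketched as follows. This is an illustration of the idea, not Hermes's actual code: `compose_system_prompt` and all prompt strings are made up for the example.

```python
def compose_system_prompt(stable, context, volatile):
    """Concatenate tiers from least to most volatile."""
    return "\n\n".join([*stable, *context, *volatile])

# Tier 1: rarely changes (identity, tool schemas).
stable = ["You are a coding assistant.", "Tool schemas: read_file, run_tests"]
# Tier 2: changes per session or project.
context = ["Project: payments-service (Python 3.12)"]

# Tier 3: changes every turn, so it goes last.
turn_1 = compose_system_prompt(stable, context,
                               ["Current time: 10:00", "Open file: api.py"])
turn_2 = compose_system_prompt(stable, context,
                               ["Current time: 10:05", "Open file: db.py"])

# Everything before the volatile tier is identical across turns.
cacheable_prefix = "\n\n".join([*stable, *context])
```

Because both turns begin with the same `cacheable_prefix`, a prefix-based prompt cache can reuse it; putting the timestamp first would invalidate the shared prefix on every turn.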

Mental Models · Psychology · Economics

Pavlovian association builds durable brand moats that compound for over a century

Brands are conditioned reflexes — the trade name is the stimulus, purchase is the response, and Pavlovian association with things consumers admire creates advantages that scale economics alone cannot explain

Charlie Munger — Poor Charlie's Almanack, Talk 4: Practical Thought About Practical Thought (pp. 305-315) (4)

@danshipper (Dan Shipper) — 'What personal software actually is' (tweet thread) (4)

Personal software grows through relationship, not configuration

Unlike traditional SaaS where users adapt to the tool, personal software agents grow personality and skills in response to their user through ongoing interaction

Mental Models · Decision Making · Philosophy

Peter Thiel's question is a detector for actual first-principles thinking — if your conclusions match the crowd, you're analogizing

'What important truth do very few people agree with you on?' is the diagnostic — most people can't answer because most people reason by analogy and end up with the same conclusions as everyone else

@jaynitx — first principles thinking: how to see what everyone else misses (4)

Charlie Munger — Poor Charlie's Almanack, Talk 5: The Need for More Multidisciplinary Skills (pp. 327-336) (4)

The pilot training model builds reliable knowledge — fluency, checklists, and maintenance prevent cognitive failure

Just as pilot training uses six elements to prevent fatal errors — wide coverage, practice-based fluency, forward and reverse thinking, importance-weighted allocation, mandatory checklists, and regular maintenance — the same structure should govern all serious professional education

OpenAI — https://openai.com/index/scaling-postgresql/ (4)

PostgreSQL scales further than you think

OpenAI runs ChatGPT on one PostgreSQL primary plus ~50 read replicas handling millions of QPS — no sharding of PostgreSQL itself, just excellent operations

@akshay_pachaar — 'Your RAG System Has a Hidden UX Problem' (Daily Dose of Data Science blog), referencing Zilliz semantic highlighting model (4)

Response UX should match retrieval intelligence

If your system uses semantic search to find results, the display should reflect that intelligence — keyword highlighting on semantic results creates a confusing mismatch

@kevingu (Kevin Gu) — AutoAgent: First Open Source Library for Self-Optimizing Agents (4)

Same-model meta-task pairings outperform cross-model — agents understand their own architecture better than humans or other models do

Claude meta-agent + Claude task agent outperformed Claude meta-agent + GPT task agent because the meta-agent shares weights and implicitly understands how the inner model reasons

@tonygentilcore (Tony Gentilcore, Glean) — Trace Learning for Self-Improving Agents (4)

Shadow execution enables safe trace learning — replay write operations without touching production data

By replaying actions that would write to external apps in a shadow path, agents can learn from realistic end-to-end flows without impacting customer data
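A small sketch of a shadow path, assuming the agent talks to external apps through a client object. `ShadowClient`, `close_ticket`, and the ticket data are hypothetical; the source does not describe Glean's implementation at this level.

```python
class ShadowClient:
    """Serves reads from real data but records writes instead of applying them."""

    def __init__(self, production_data):
        self._data = production_data
        self.recorded_writes = []

    def read(self, key):
        return self._data[key]

    def write(self, key, value):
        # Capture the intended write for later analysis; never touch production.
        self.recorded_writes.append((key, value))
        return {"status": "shadowed"}

def close_ticket(client, ticket_id):
    """Example agent step: close a ticket only if it is currently open."""
    if client.read(ticket_id) == "open":
        return client.write(ticket_id, "closed")
    return {"status": "noop"}

production = {"ticket-1": "open"}
shadow = ShadowClient(production)
result = close_ticket(shadow, "ticket-1")
```

After the replay, `shadow.recorded_writes` holds the full write the agent attempted, which is the trace to learn from, while `production` is unchanged.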

@trq212 (Thariq) — Lessons from Building Claude Code: How We Use Skills (4)

A skill's folder structure is its context architecture — the file system is a form of context engineering

Skills are not just markdown files but folders where scripts, references, and assets enable progressive disclosure — the agent reads deeper files only when it reaches the relevant step

Charlie Munger — Poor Charlie's Almanack, Talk 11: The Psychology of Human Misjudgment (pp. 572-579) (4)

Social proof makes groups passive before visible harm — conformity overrides individual judgment even in life-or-death situations

Social-Proof Tendency causes individuals to follow the crowd into inaction or corruption, with bystander apathy and institutional silence as its most dangerous manifestations

AI Product Building · Business Models · Decision Making

System or tool? Ask whether the customer would still need you if a lab shipped a direct competitor

Three tests for being safely off the Yellow Brick Road — tools-and-steps, system-vs-tool, and customer-P&L — with the system test (would they still need you?) as the sharpest discriminator

@joeschmidtiv (Joe Schmidt IV, a16z) — Avoiding Death on the Yellow Brick Road (4)

Rohit (@rohit4verse) — How to Build Agents That Never Forget (4)

Tiered retrieval prevents context overload — summaries first, details on demand

Reading category summaries first, then drilling to items, then raw resources only if needed keeps memory retrieval within token budgets
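The summary-first order can be sketched as below, assuming a memory category stored as a dict with `summary`, `items`, and `raw` keys and a whitespace word count as a crude token estimate. The `tiered_retrieve` helper and the data are illustrative assumptions.

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())

def tiered_retrieve(category, budget, depth=1):
    """Collect context tier by tier, stopping before the budget is exceeded.

    depth=1 reads only the summary, depth=2 adds items, depth=3 adds raw resources.
    """
    tiers = [[category["summary"]], category["items"], category["raw"]][:depth]
    selected, used = [], 0
    for tier in tiers:
        for chunk in tier:
            cost = count_tokens(chunk)
            if used + cost > budget:
                return selected, used
            selected.append(chunk)
            used += cost
    return selected, used

work = {
    "summary": "User is a backend engineer planning a job change.",
    "items": ["Prefers remote roles.", "Interviewing at two fintech startups."],
    "raw": ["Full chat transcript from March with salary discussion and offer details."],
}

overview, overview_cost = tiered_retrieve(work, budget=20, depth=1)
deep, deep_cost = tiered_retrieve(work, budget=20, depth=3)
```

With a budget of 20, the deepest call still returns the summary and both items but stops before the raw transcript, which would push the total past the budget.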

@aparnadhinak (Aparna Dhinakaran) — Data Architectures For Tracing Harnesses & Agents (4)

AI trace data has an indefinite useful lifespan — SaaS observability's 30-day retention model destroys institutional knowledge

Infrastructure metrics expire quickly but AI conversations and reasoning traces gain value over time; 30-day retention windows erase the very data that reveals failure patterns and training signals

@hwchase17 (Harrison Chase) — Context Engineering Our Way to Long-Horizon Agents (4)

Traces replace code as the source of truth for agent systems — debugging shifts from 'show me the code' to 'send me the trace'

In agent systems, execution traces replace source code as the primary debugging and collaboration artifact — you can't predict step 14's context from reading the code

Mintlify — How We Built a Virtual Filesystem for Our Assistant (4)

Virtual filesystems replace sandboxes for agent navigation — intercept commands instead of provisioning infrastructure

Mintlify's ChromaFs intercepts Unix commands and translates them into database queries, cutting boot time from 46 seconds to 100ms and cost from $70k/year to near-zero
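To make the interception idea concrete, here is a toy sketch that answers `ls` and `cat` from an in-memory SQLite table. It is not ChromaFs: the `files` schema, the `run` function, and the two supported commands are assumptions chosen to keep the example small.

```python
import shlex
import sqlite3

def make_docs_db():
    """Toy document store: one row per file (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, content TEXT)")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?)",
        [("/docs/intro.md", "# Intro"), ("/docs/api/auth.md", "# Auth")],
    )
    return conn

def run(conn, command):
    """Intercept a shell-style command and answer it from the database."""
    argv = shlex.split(command)
    if len(argv) == 2 and argv[0] == "cat":
        row = conn.execute(
            "SELECT content FROM files WHERE path = ?", (argv[1],)
        ).fetchone()
        return row[0] if row else f"cat: {argv[1]}: No such file or directory"
    if len(argv) == 2 and argv[0] == "ls":
        prefix = argv[1].rstrip("/") + "/"
        rows = conn.execute(
            "SELECT path FROM files WHERE substr(path, 1, ?) = ?",
            (len(prefix), prefix),
        ).fetchall()
        # Keep only the first path component below the listed directory.
        return "\n".join(sorted({path[len(prefix):].split("/")[0] for (path,) in rows}))
    return f"{argv[0] if argv else ''}: command not supported"

conn = make_docs_db()
listing = run(conn, "ls /docs")
intro = run(conn, "cat /docs/intro.md")
```

No process or container is started: the agent sees familiar commands, and each one becomes a query, which is where the boot-time saving in a design like this would come from.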

AI Product Building · Decision Making · Future of AI

Your first subfield is an accident of timing — wander across several before you settle, because breadth is insurance

Pay tuition in interpretability, evals, RL, and systems before deciding where you live; somewhere is a corner where your specific weirdness is an unfair advantage. Subfields all saturate, usually right after they peak on Twitter, and breadth is what carries you through the transition

@itsreallyvivek (vivek) — how to be good at research (4)

@JayaGup10 (Jaya Gupta) — Who will set price / intelligence? (4)

Where inference runs decides who captures margin, owns the context, and earns trust

Value won't all accrue to the cloud — inference moves to wherever it's cheapest without breaking the product: cloud for frontier reasoning, edge for latency, on-device for privacy. Privacy matters more than in SaaS because the model isn't just storing data, it's reasoning over the user's context, memory, code, and permissions

AI Product Building · Knowledge Systems · Decision Making

Writing is the cheapest defense against fooling yourself — the page finds the gaps your head papers over

An idea feels fully formed until you try to word it; writing exposes the untested assumption, the step that doesn't follow, the two claims that contradict. Darwin made it procedural — log disconfirming evidence on the spot, because memory deletes inconvenient results faster than convenient ones

@itsreallyvivek (vivek) — how to be good at research (4)

@dharmesh (Dharmesh Shah, HubSpot CTO) (3)

Agentic UX (AUX) is a distinct design problem — agents don't want to use software the way humans do

AUX (Agentic User Experience) is neither human UX adapted for agents nor raw APIs — it's a third design discipline for how agents want to consume software

AI Product Building · Future of AI · AI Agents

Agents running the platform vs. agents on the platform — the operator shift changes what software must be

The shift from agents as features inside your product to agents as operators running your product — passengers become pilots

@dharmesh (Dharmesh Shah, HubSpot CTO) (3)

Mental Models · Decision Making · Economics

Bet seldom but heavily when the odds are extreme

The wise ones bet big when they have the odds and don't bet the rest of the time — most of Berkshire's billions came from about ten insights over a lifetime

Charlie Munger — Poor Charlie's Almanack, Talk 2: Elementary Worldly Wisdom (pp. 206-220) (3)

@hwchase17 (Harrison Chase) — Continual Learning for AI Agents (3)

Context learning spans agent, tenant, and org levels — and you can mix all three

Agent-level context updates the agent's own configuration over time; tenant-level (user/org/team) gives each tenant their own evolving context; production systems mix multiple levels simultaneously

@hwchase17 (Harrison Chase) — Everything Gets Rebuilt: Agents, Harnesses, and the New Compute Layer (3)

Enterprise agents need deterministic structure while startups need autonomous loops — same models, different harnesses

Enterprises need deterministic graph-based harnesses for predictability; startups benefit from autonomous loop agents — the harness choice, not the model, determines reliability

@aparnadhinak (Aparna Dhinakaran) — Data Architectures For Tracing Harnesses & Agents (3)

Evaluations must augment trace data in place — divergent copies drift by design

The moment you export traces to a separate eval system, the copy diverges from where annotations run; evals, annotations, and traces should share a single source of truth

@hwchase17 (Harrison Chase) — Continual Learning for AI Agents (3)

Hot-path and offline learning are two temporal modes for agent context updates — each with different tradeoffs

Agents can update their context in the hot path (during task execution, like saving to memory while working) or offline (batch processing recent traces after the fact, like OpenClaw's 'dreaming'), with an additional dimension of explicit vs implicit memory updates

Charlie Munger — Poor Charlie's Almanack, Talk 3: Elementary Worldly Wisdom, Revisited (pp. 235-239) (3)

Ideology is among the most extreme distorters of human cognition

Heavy ideology locks your brain into dysfunctional patterns — if it can warp a genius like Chomsky, imagine what it does to ordinary minds

@mvanhorn (Matt Van Horn) — Every Claude Code Hack I Know (March 2026) (3)

Inference capability lowers input fidelity requirements — smart listeners make imprecise input work

When the consumer of input has strong inference ability, the quality bar for that input drops — voice works not because transcription improved, but because the listener got smarter

AI Product Building · Business Models · Architecture

Latent demand is the strongest product signal — make the thing people already do easier

People will only do things they already do; you can't get them to do a new thing, but you can make their existing behavior frictionless

Boris Cherny (@bcherny) — Inside Claude Code With Its Creator, Y Combinator Light Cone podcast (3)

Databricks — What Is a Lakebase (3)

Lakebases decouple compute from storage — databases become elastic infrastructure

Third-generation databases separate compute and storage entirely, putting data in open formats on cloud object stores; the database becomes a serverless layer that scales to zero

@hwchase17 (Harrison Chase) — Context Engineering Our Way to Long-Horizon Agents (3)

LLM-as-judge must be calibrated against human judgment — uncalibrated judges are worse than no judges

An LLM judge without human-labeled calibration data produces false confidence; the bridge is humans labeling traces, then training the judge to replicate those labels
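Before trusting a judge, its labels can be compared with human labels on the same traces. The sketch below computes raw agreement and Cohen's kappa, a standard chance-corrected agreement statistic; the label lists are made-up example data, and using kappa here is an assumption rather than something the source prescribes.

```python
from collections import Counter

def agreement(human, judge):
    """Fraction of traces where the judge matches the human label."""
    return sum(h == j for h, j in zip(human, judge)) / len(human)

def cohens_kappa(human, judge):
    """Agreement corrected for what two raters would match on by chance."""
    n = len(human)
    observed = agreement(human, judge)
    h_counts, j_counts = Counter(human), Counter(judge)
    expected = sum(h_counts[label] * j_counts[label]
                   for label in set(h_counts) | set(j_counts)) / (n * n)
    if expected == 1.0:   # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical labels for eight traces.
human = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass"]
judge = ["pass", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]

raw = agreement(human, judge)
kappa = cohens_kappa(human, judge)
```

Raw agreement can look healthy while kappa stays modest, which is the kind of false confidence an uncalibrated judge produces; the human-labeled set is what makes the gap visible.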

@nicbstme — The Crumbling Workflow Moat: Aggregation Theory's Final Chapter (3)

LLMs complete Aggregation Theory by collapsing the interface layer

Ben Thompson's framework reaches its final chapter: LLMs eliminate the interface layer that protected software suppliers, turning the entire web into a backend database where suppliers compete on data quality alone

@izzymiller (Izzy Miller, Hex) — Building AI Agents for Data Analytics (3)

Long-horizon evals test compounding behavior, not point-in-time accuracy

Hex's Metric City benchmark simulates 90 days of agent use with evolving data to measure whether the agent gets smarter over time — Day 0: 4%, Day 90: 24%

Mental Models · Engineering · Decision Making

Negative maintenance teammates reduce future work for everyone around them

The rarest team archetype isn't high-performers or low-maintenance people — it's those who actively make life easier for others by solving problems upstream before they propagate

@vxanand (Varun Anand, Clay co-founder) — Clay's Operating Principles (3)

Learning Technical Concepts chat — Ayush exploring open source revenue (3)

Open source captures value through services, not software

Free software builds billion-dollar companies because the money is in support, cloud, and governance layers — not the code itself

@hwchase17 (Harrison Chase) — Everything Gets Rebuilt: Agents, Harnesses, and the New Compute Layer (3)

Procedural memory is the highest-impact type of agent memory — it determines what the agent actually does

Of three memory types (semantic/episodic/procedural), procedural — instructions, skills, and tools — has the highest impact because it changes what the agent actually does

@systematicls — How To Be A World-Class Agentic Engineer (3)

Separate research from implementation to preserve agent context for execution

Mixing research and implementation pollutes context with irrelevant alternatives — split them into separate agent sessions so the implementer gets only the chosen approach

Charlie Munger — Poor Charlie's Almanack, Talk 11: The Psychology of Human Misjudgment (pp. 537-545) (3)

Small concessions trigger disproportionate reciprocation — even at the subconscious level

Reciprocation Tendency operates below conscious awareness, making tiny favors or concessions produce outsized compliance — the only reliable defense is structural prohibition

AI Product Building · Future of AI · Coding Tools

Software abundance unlocks entire categories of applications that never existed

Software has always been more expensive than we can afford; when AI drops costs 10-20x, previously unviable software becomes economically possible

@dabit3 (Nader Dabit) — Joining Cognition / Software Abundance (3)

@natashamalpani (Natasha Malpani) — The Verification Economy: The Red Queen Problem (Part III) (3)

Stronger models expand the verification gap, not close it

More capable models increase the deployment surface and raise the stakes of failures, making verification infrastructure more valuable rather than less

@tonygentilcore (Tony Gentilcore, Glean) — Trace Learning for Self-Improving Agents (3)

Teacher-student trace distillation with consensus validation beats single-oracle learning

A single high-reasoning teacher trace isn't reliable enough for enterprise learning; comparing multiple student traces under production constraints with consensus validation produces trustworthy strategies

@jaynitx — first principles thinking: how to see what everyone else misses (3)

Templates encode someone else's constraints — copying a playbook silently imports its assumptions about audience, resources, and strengths

Templates work because they were tuned for a specific situation; copying them imports invisible assumptions about who you are, what you have, and what you're optimizing for — and the misfit only shows up after you've spent the time

@jaynitx — first principles thinking: how to see what everyone else misses (3)

Type 1 vs Type 2 decisions — irreversibility decides whether to spend first-principles thinking or analogy

Bezos's split: irreversible decisions deserve slow, methodical first-principles thinking; reversible ones should use fast analogy. The mistake is misallocating — burning fundamentals on what to eat for lunch, or analogizing your way through a one-way door

Boris Cherny (@bcherny) — Inside Claude Code With Its Creator, Y Combinator Light Cone podcast (3)

Uncorrelated context windows are a form of test time compute — fresh perspectives multiply capability

Multiple agents with independent context windows avoid polluting each other's reasoning, and throwing more context at a problem from different angles increases capability

@RampLabs — How We Made Ramp Sheets Self-Maintaining (3)

Unfocused agents develop path dependency — without a specific mission, they explore the same paths repeatedly

Agents given broad mandates (like 'find bugs') converge on familiar exploration paths, catching high-radius issues but missing narrow situational problems

@jaynitx — first principles thinking: how to see what everyone else misses (3)

Users describe solutions within the constraint set they know — 'faster horses' is what stated preferences look like outside the existing tool set

When asked what they want, users reason by analogy from existing tools and return better versions of those tools; first-principles asks what underlying problem is being solved, which is invisible to the user but where the actual opportunity lives

@systematicls — How To Be A World-Class Agentic Engineer (3)

Weaponize sycophancy with adversarial agent ensembles instead of fighting it

Deploy bug-finder, adversary, and referee agents with scoring incentives that exploit each agent's eagerness to please — triangulating truth from competing biases

@tonygentilcore (Tony Gentilcore, Glean) — Trace Learning for Self-Improving Agents (3)

Agents need workflow-level tool strategies, not individual tool instructions — the hard part is how tools combine

In enterprise environments, the challenge isn't finding the right tool but understanding how tools work together; intentionally narrow strategies that capture workflow patterns generalize better than broad abstractions

@hwchase17 (Harrison Chase) — Everything Gets Rebuilt: Agents, Harnesses, and the New Compute Layer (2)

Knowledge is not memory — ingesting documents is solved, learning from interactions is not

Knowledge (ingesting documents into RAG) is largely solved; memory (learning from task execution to improve future behavior) remains unsolved after 2+ years of industry effort

@JayaGup10 (Jaya Gupta) — The next biggest moat in AI (2)

Mission strength is measured by who it repels — the strongest missions make some people refuse to work there

A mission that offends no one, selects for no one, and costs nothing is functionally fake; the strongest missions take a side, and the people they repel are the same signal as the people they attract

@julienbek — Services: The New Software (2)

Already-outsourced tasks are the autopilot wedge — vendor swap beats reorg

If work is already outsourced, budget exists, external delivery is accepted, and the buyer purchases outcomes — substitution is frictionless

Mental Models · Engineering · Decision Making

Resolve ambiguity before passing it downstream — don't forward confusion

Ambiguity compounds as it flows through an organization; the person who encounters it first should resolve it, suggest a path forward, or take a first pass rather than forwarding it unresolved

@vxanand (Varun Anand, Clay co-founder) — Clay's Operating Principles (2)

@JayaGup10 (Jaya Gupta) — The next biggest moat in AI (2)

Time-denominated promises decay invisibly — 'over time' is the most dangerous denomination because time doesn't announce itself as it leaves

Promises in the 'over time it'll be bigger / you'll own more / the structure will catch up' shape rot silently because they lack a written mechanism that forces them to mature; you arrive at a later version of your life and realize the future-tense promise never came to be