Knowledge Systems
AI Product Building: 43 insights in this topic
Compound engineering makes each unit of work improve all future work
The 80/20 ratio (80% plan+review, 20% work+compound) ensures that learning, not just code, compounds across iterations
Persistent agent memory preserves institutional knowledge that would otherwise walk out the door with employees
When agents maintain daily changelogs, decision logs, and work preferences, organizational knowledge survives personnel changes
Files are the universal interface between humans and agents
Markdown and YAML files on disk beat databases because agents already know file operations and humans can inspect everything
Skill graphs enable progressive disclosure for complex domains
Single skill files hit a ceiling — complex domains need interconnected knowledge that agents navigate progressively from index to description to links to sections to full content
Evolving summaries beat append-only memory — rewrite profiles, don't accumulate facts
An evolve_summary() function that rewrites category profiles with new information handles contradictions naturally, unlike append-only logs
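
A minimal Python sketch of the rewrite-versus-append distinction described above. The `Profile` type, its field names, and the key-based merge rule are assumptions made for illustration; the source names only `evolve_summary()`, and a real system would probably ask an LLM to rewrite the summary text rather than merge a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A category profile: one current value per attribute, not a log of facts."""
    category: str
    facts: dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        # Produce the summary text an agent would read.
        lines = [f"{key}: {value}" for key, value in sorted(self.facts.items())]
        return f"[{self.category}]\n" + "\n".join(lines)


def evolve_summary(profile: Profile, new_facts: dict[str, str]) -> Profile:
    """Return a rewritten profile in which newer facts replace older ones.

    Hypothetical stand-in: a production version would likely pass the old
    summary plus the new information to an LLM and ask for a rewrite.
    """
    merged = {**profile.facts, **new_facts}  # later values win on conflict
    return Profile(category=profile.category, facts=merged)


# Append-only memory keeps both statements and leaves the conflict unresolved.
append_only_log = ["employer: Acme", "employer: Globex"]

# The evolving profile holds only the current state.
work = Profile("work", {"employer": "Acme", "role": "engineer"})
work = evolve_summary(work, {"employer": "Globex"})
print(work.render())
```

The append-only log still contains both employers, so a reader has to decide which one is current; the rewritten profile has already made that decision.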
Structure plus reasoning beats flat similarity for complex domains
Across documents, code, and skills, the same pattern holds: structured knowledge navigated by reasoning outperforms flat indexes searched by similarity
Spec files are external memory that survives context resets
A structured specs/ folder (design.md, implementation.md, decisions.md) bridges human intent and agent execution across sessions
Harness engineering — humans steer, agents execute, documentation is the system of record
OpenAI built a million-line production codebase with zero manually written code in 5 months. The discipline shifted from writing code to designing the harness: architecture constraints, documentation, tooling, and feedback loops that make agents reliable at scale.
Agents learn at three distinct layers — model weights, harness code, and context configuration
Most people jump to model fine-tuning when discussing agent learning, but learning also happens at the harness layer (code, tools, instructions baked into all instances) and the context layer (per-user or per-tenant configuration like CLAUDE.md and skills)
Building real projects teaches AI skills faster than following structured curricula
A non-technical user who built a production WhatsApp bot reached the 'Operator' level that a 30-day AI mastery roadmap targets, and got there by building rather than studying
Session capture turns ephemeral AI conversations into a compounding knowledge base
shadcn's /done pattern — dumping key decisions, questions, and follow-ups to markdown after each Claude session — applies file-based memory architecture to development workflow
Two-tier agent memory separates organizational workflow knowledge from individual user preferences
Deployment-level memory captures shared tool strategies and sequencing patterns; user-level memory captures personal templates and communication styles. Skipping the user level at first noticeably hurt performance
CLAUDE.md should be a routing table, not a knowledge base
Treat CLAUDE.md as a minimal IF-ELSE directory pointing to context files — not a 26,000-line monolith that bloats every session
Treat an agent as an operating system, not a stateless function
Agents need RAM (conversation context), a hard drive (persistent memory), garbage collection (decay/pruning), and I/O management (tools) — the OS mental model unlocks architectural clarity
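
One way to picture the OS analogy is as a small data structure, sketched here in Python. Every name below (`Agent`, `MemoryRecord`, `collect_garbage`, the turn-based age rule) is a hypothetical choice for illustration, not something the source specifies.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryRecord:
    text: str
    last_used: int  # turn number when the record was last written


@dataclass
class Agent:
    context: list[str] = field(default_factory=list)  # "RAM": conversation context
    disk: dict[str, MemoryRecord] = field(default_factory=dict)  # "hard drive"
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)  # "I/O"
    turn: int = 0

    def remember(self, key: str, text: str) -> None:
        """Write a record to persistent memory, stamped with the current turn."""
        self.disk[key] = MemoryRecord(text, self.turn)

    def collect_garbage(self, max_age: int) -> list[str]:
        """Prune records older than `max_age` turns and return the removed keys."""
        stale = [k for k, r in self.disk.items() if self.turn - r.last_used > max_age]
        for key in stale:
            del self.disk[key]
        return stale


agent = Agent(tools={"echo": lambda s: s})
agent.remember("old_preference", "prefers tabs")
agent.turn = 10
agent.remember("new_preference", "prefers spaces")
removed = agent.collect_garbage(max_age=5)
```

Here the pruning rule is simple age-based decay; the source does not say which decay policy works best.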
Accumulated agent traces produce emergent world models — discovered, not designed
When agent decision trajectories accumulate over time, they form a context graph that reveals entities, relationships, and constraints nobody explicitly modeled
Context inefficiency compounds three penalties: cost, latency, and quality degradation
Every wasted token in an LLM context window doesn't just cost money — it slows responses and degrades output quality, creating a triple tax on production agents
Cross-user knowledge transfer works without fine-tuning — just a database and prompt engineering
When one person teaches an agent something, another person benefits automatically — no RLHF, no training infrastructure, just structured storage and retrieval
Don't be the discriminator — be the patron, not the judge
Taste (selecting from AI output) is the function that gets automated first; participating in creation through friction and will is what endures
Intelligence location — code vs prompts — determines system fragility and flexibility
Critical architectural fork: prompt-driven systems (Pal's 400-line routing prompt) are flexible but break when models change; code-driven systems (our validate-graph.js) are rigid but reliable — best systems need both
Knowledge evolution is the biggest unsolved problem across all graph architectures
Almost nobody has solved how knowledge graphs grow without rotting — most are append-only, auto-decay is too aggressive, and even the best systems only add links without pruning, merging, or detecting contradictions
A loss curve is reassurance, not analysis — pull a hundred failures and read every one
Experiments throw off far more information than you consume — transcripts, failure cases, the strange tail — and most of it dies unread. Most ML bugs live in the data and fail silently; Ng's move is to pull 100 failures, sort them into piles, and attack the biggest pile
Tribal knowledge is the irreducible human input that enables agent automation
Automated context construction handles most of the corpus, but the most critical context is implicit, conditional, and historically contingent — only humans can provide it
Context centralization is why coding AI works — git is a solved context repository, knowledge work has no equivalent
Engineering AI leads because git centralizes all context in one versioned repository; knowledge work fails on three axes: its context is distributed, unstructured, and unverifiable
Shared inputs produce shared conclusions worth nothing — old and cross-disciplinary material is criminally underpriced
If your information diet is trending arxiv plus the group chat, you reach the same conclusions as everyone else at the same time, which makes them worthless. Old material (MoE 1991, LSTMs 1997, the bitter lesson) and cross-disciplinary range are underpriced sources of differentiated ideas
Knowledge systems need dual-layer storage — narrative depth and structured queries can't share a format
Every system beyond 'markdown files in a folder' discovers that narrative depth (rich prose, context, reasoning) and structured querying (filter, aggregate, cross-reference) need different storage layers with a routing mechanism between them
Memory is a harness responsibility, not a pluggable component
Managing context — what enters, what survives compaction, what's queryable — is a core capability of the harness itself, not an add-on service
Metadata consumed by LLMs needs trigger specifications, not human summaries
When an LLM scans metadata to decide what to invoke, the description should specify when to activate — not summarize what the thing does — because LLMs are a fundamentally different consumer than humans
You can offload a task, or even a job, but you can never offload your learning
The real opportunity isn't picking the best model — it's building a learning loop on top of models where the firm's accumulated learning, the one thing it can't outsource, compounds across people and AI
Taste is a muscle, not a gift — train it by forecasting every result before you see it
Predict the outcome of every experiment before running it, guess a paper's numbers from the method alone, call which releases will matter in two years and check your hit rate; a forecast plus a correction, repeated a few hundred times, trains the model in your head the way it trains any other model
A clear public explanation is a genuine contribution and an unfakeable credential
Fields choke on undigested ideas, so distilling something hard into a clear explanation is real work, not a service job — and a body of public writing doubles as the strongest credential you can hold, because it's an unfakeable sample of how you think
Compilation scales but curation compounds — two camps for knowledge graph construction
LLM-compiled systems (Karpathy, Pal) grow fast by feeding raw content through model judgment; human-curated systems (our graph, brainctl) grow slowly but every node is validated — compilation scales linearly, curation compounds through connections
Context layers supersede semantic layers for agent autonomy
Traditional semantic layers handle metric definitions, but agents need a superset: canonical entities, identity resolution, tribal knowledge instructions, and governance guidance
Embeddings measure similarity, not truth — vector databases have a temporal blind spot
Vector search can't resolve contradictions or understand time; 'I love my job' and 'I'm quitting' retrieve with equal confidence
Memory defines the agent — a zip of markdown files IS the agent, and portable memory between harnesses is the frontier
An agent IS its memory — a zip of markdown (system prompt + skills + tools) defines its identity; making that portable between harnesses is the current frontier
Navigation beats search for knowledge retrieval — let each data source keep its native query interface
Vector similarity search flattens everything into one embedding space, losing native query affordances; better to let SQL be SQL, files be files, and build a routing layer that picks the right source per question type
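
A minimal Python sketch of such a routing layer, assuming just two sources: a SQLite table for aggregates and a folder of markdown notes for narrative answers. The table, the file, and the question types (`aggregate`, `policy`) are all invented for illustration; how a real system classifies questions is left open.

```python
import sqlite3
import tempfile
from pathlib import Path

# Structured source: an in-memory SQLite table, queried with plain SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, total REAL)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ada", 40.0), ("ada", 60.0), ("bob", 25.0)],
)

# Narrative source: markdown files on disk, read as files.
notes_dir = Path(tempfile.mkdtemp())
(notes_dir / "pricing.md").write_text("# Pricing\nDiscounts need manager approval.\n")


def query_sql(customer: str) -> float:
    """Let SQL do what SQL is good at: filtering and aggregating."""
    row = db.execute(
        "SELECT SUM(total) FROM orders WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0] or 0.0


def read_note(topic: str) -> str:
    """Let files be files: return the note's full text."""
    return (notes_dir / f"{topic}.md").read_text()


def answer(question_type: str, argument: str):
    """Hypothetical router: choose the source by question type, not by embedding."""
    if question_type == "aggregate":
        return query_sql(argument)
    if question_type == "policy":
        return read_note(argument)
    raise ValueError(f"no source registered for {question_type!r}")


print(answer("aggregate", "ada"))
print(answer("policy", "pricing"))
```

Each source keeps its own query interface, and the router's only job is to decide which one to call.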
A skill's folder structure is its context architecture — the file system is a form of context engineering
Skills are not just markdown files but folders where scripts, references, and assets enable progressive disclosure — the agent reads deeper files only when it reaches the relevant step
Tiered retrieval prevents context overload — summaries first, details on demand
Reading category summaries first, then drilling to items, then raw resources only if needed keeps memory retrieval within token budgets
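
The summary-then-items-then-raw order can be sketched in Python as below. The `MEMORY` layout, the word-count token estimate, and the `depth` and `budget` parameters are assumptions for illustration only.

```python
# Hypothetical three-tier memory: summary -> items -> raw resources.
MEMORY = {
    "billing": {
        "summary": "Customer is on the annual plan and pays by invoice.",
        "items": ["Plan: annual", "Payment: invoice", "Renewal month: March"],
        "raw": ["<full email thread about the March renewal>"],
    },
}


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def retrieve(category: str, depth: int, budget: int) -> list[str]:
    """Collect tiers in order, stopping at `depth` or when the budget is spent.

    depth 1 = summary only, 2 = plus items, 3 = plus raw resources.
    """
    entry = MEMORY[category]
    tiers = [[entry["summary"]], entry["items"], entry["raw"]]
    picked: list[str] = []
    used = 0
    for tier in tiers[:depth]:
        for chunk in tier:
            cost = estimate_tokens(chunk)
            if used + cost > budget:
                return picked  # budget exhausted: stop before overflowing
            picked.append(chunk)
            used += cost
    return picked


summary_only = retrieve("billing", depth=1, budget=50)
with_items = retrieve("billing", depth=2, budget=50)
tight_budget = retrieve("billing", depth=3, budget=12)
```

A caller would start at depth 1 and ask for a deeper tier only when the summary does not answer the question.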
Writing is the cheapest defense against fooling yourself — the page finds the gaps your head papers over
An idea feels fully formed until you try to word it; writing exposes the untested assumption, the step that doesn't follow, the two claims that contradict. Darwin made it procedural — log disconfirming evidence on the spot, because memory deletes inconvenient results faster than convenient ones
Context learning spans agent, tenant, and org levels — and you can mix all three
Agent-level context updates the agent's own configuration over time; tenant-level (user/org/team) gives each tenant their own evolving context; production systems mix multiple levels simultaneously
Hot-path and offline learning are two temporal modes for agent context updates — each with different tradeoffs
Agents can update their context in the hot path (during task execution, like saving to memory while working) or offline (batch processing recent traces after the fact, like OpenClaw's 'dreaming'), with an additional dimension of explicit vs implicit memory updates
Procedural memory is the highest-impact type of agent memory — it determines what the agent actually does
Of three memory types (semantic/episodic/procedural), procedural — instructions, skills, and tools — has the highest impact because it changes what the agent actually does
Agents need workflow-level tool strategies, not individual tool instructions — the hard part is how tools combine
In enterprise environments, the challenge isn't finding the right tool but understanding how tools work together; intentionally narrow strategies that capture workflow patterns generalize better than broad abstractions
Knowledge is not memory — ingesting documents is solved, learning from interactions is not
Knowledge (ingesting documents into RAG) is largely solved; memory (learning from task execution to improve future behavior) remains unsolved after 2+ years of industry effort