All insights
AI Product Building AI Agents Architecture

Observability is the missing discipline for agent systems — you can't improve what you can't measure

Agent systems need telemetry (token usage, latency, error rates, cost per task) as a first-class engineering concern, not an afterthought bolted on after production failures

Geoff Huntley — Latent Patterns Principles (verification over testing) · · 10 connections

Most agent systems are built with zero observability — no token tracking, no error rate dashboards, no cost-per-task visibility. Teams discover problems through user complaints, not telemetry. This is the equivalent of running a web service without logging or metrics — technically possible, but operationally reckless.

The minimum viable observability for agents includes: token usage per session (are costs spiking?), latency distribution (are some tasks mysteriously slow?), error rates by task type (which capabilities are unreliable?), and cost per completed task (is the unit economics viable?). This extends Decision traces are the missing data layer — a trillion-dollar gap from “why did we decide this?” to “how did the system behave while deciding?” — decision traces capture reasoning, observability captures performance. Combined with KV cache hit rate is the most critical metric for production agents, token-level telemetry reveals whether your caching strategy is actually working or just theoretically sound. And Agents that store error patterns learn continuously without fine-tuning or retraining becomes far more powerful when error patterns are detected systematically through telemetry rather than discovered anecdotally. Beyond measuring system performance, Trust boundaries must be externalized — not held in engineers' heads argues that teams must also map where agent behavior is well-understood vs. unknown — making that characterization auditable and connecting it to deployment gates. And when evaluations run on this telemetry, Evaluations must augment trace data in place — divergent copies drift by design — they should augment the trace data directly rather than living in a sidecar system, because divergent copies drift by design.