Order the system prompt by volatility to keep prompt prefixes cache-friendly

Hermes composes the system prompt in three tiers — stable, context, volatile — so the unchanging prefix stays cacheable while turn-by-turn data lives at the end

@aparnadhinak (Aparna Dhinakaran) — Hermes Harness Architecture · Jun 1, 2026 · 4 connections

Prompt caching only pays off when the prefix is byte-stable, so where you place changing content directly determines your cache hit rate. Hermes makes this explicit by composing the system prompt in three tiers ordered by how often each changes: stable (identity, tool guidance, skills index), context (project files from the working directory), and volatile (memory snapshots, user profile, the timestamp line). Keeping stable content first and volatile content last means normal turns reuse the cached prefix, because anything after the first changed byte falls outside it. This is the concrete design pattern behind "KV cache hit rate is the most critical metric for production agents", and it is what lets the claim in "Prompt caching makes long context economically viable" hold in practice. It's the same instinct as "CLAUDE.md should be a routing table, not a knowledge base": keep the durable layer where it won't churn the prefix.
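The ordering can be illustrated with a minimal sketch. This is not Hermes's actual code: `PromptTiers`, `compose_system_prompt`, `shared_prefix_bytes`, `build_turn`, and the sample section text are all hypothetical names and values chosen for illustration. It assumes only that a cache can reuse the longest byte-identical prefix between two prompts.

```python
from dataclasses import dataclass, field


@dataclass
class PromptTiers:
    """Hypothetical container whose fields mirror the three tiers."""
    stable: list[str] = field(default_factory=list)    # identity, tool guidance, skills index
    context: list[str] = field(default_factory=list)   # project files from the working directory
    volatile: list[str] = field(default_factory=list)  # memory snapshot, user profile, timestamp


def compose_system_prompt(tiers: PromptTiers) -> str:
    """Join sections in order of increasing volatility: stable, context, volatile."""
    return "\n\n".join([*tiers.stable, *tiers.context, *tiers.volatile])


def compose_volatile_first(tiers: PromptTiers) -> str:
    """Counter-example ordering: the changing content leads the prompt."""
    return "\n\n".join([*tiers.volatile, *tiers.stable, *tiers.context])


def shared_prefix_bytes(a: str, b: str) -> int:
    """Count the leading UTF-8 bytes that two prompts have in common."""
    count = 0
    for x, y in zip(a.encode("utf-8"), b.encode("utf-8")):
        if x != y:
            break
        count += 1
    return count


def build_turn(timestamp: str) -> PromptTiers:
    """Build illustrative tiers; only the timestamp differs between turns."""
    return PromptTiers(
        stable=["You are a coding agent.", "Tool guidance: prefer small edits."],
        context=["# Project notes\nUse pytest for tests."],
        volatile=[f"Current time: {timestamp}"],
    )


turn_1 = build_turn("2026-06-01T10:00:00Z")
turn_2 = build_turn("2026-06-01T10:05:00Z")

# Volatile last: the whole stable and context region is shared between turns.
good = shared_prefix_bytes(compose_system_prompt(turn_1), compose_system_prompt(turn_2))

# Volatile first: the prompts diverge inside the timestamp line.
bad = shared_prefix_bytes(compose_volatile_first(turn_1), compose_volatile_first(turn_2))

print(f"shared prefix, volatile last:  {good} bytes")
print(f"shared prefix, volatile first: {bad} bytes")
```

In this toy setup the two turns differ only in the timestamp, yet the volatile-first ordering shares almost nothing, while the volatile-last ordering shares the entire stable and context region. Real providers typically cache at token or block granularity rather than per byte, so treat the byte count as a proxy for the reusable prefix.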

Connected Insights

References (3)

→ CLAUDE.md should be a routing table, not a knowledge base
→ KV cache hit rate is the most critical metric for production agents
→ Prompt caching makes long context economically viable

Referenced by (1)

← KV cache hit rate is the most critical metric for production agents