Prompt caching only pays off when the prefix is byte-stable, so where you place changing content directly determines your cache hit rate. Hermes makes this explicit by composing the system prompt in three tiers ordered by how often each changes: stable (identity, tool guidance, skills index), context (project files from the working directory), and volatile (memory snapshots, user profile, the timestamp line). Keeping stable content first and volatile last means normal turns reuse the cached prefix — the concrete design pattern behind why KV cache hit rate is the most critical metric for production agents and what makes Prompt caching makes long context economically viable. It’s the same instinct as CLAUDE.md should be a routing table, not a knowledge base: keep the durable layer where it won’t churn the prefix.