The naive approach to memory retrieval dumps everything potentially relevant into the prompt. The smarter approach uses tiered retrieval: pull category summaries first, ask the LLM “is this enough?”, and only drill down to atomic facts or raw resources if the summary is insufficient. Each tier is progressively more detailed and more expensive in tokens.
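The tiered loop can be sketched in a few lines. This is a minimal illustration, not a specific library's API: the tier names, the store layout, and the `is_sufficient` callback (standing in for the LLM's “is this enough?” check) are all assumptions.

```python
# Tiers ordered cheapest-first; each is more detailed and costs more tokens.
TIERS = ["summary", "items", "raw"]

def retrieve(store, query, is_sufficient):
    """Walk tiers from cheapest to most detailed, stopping early."""
    context = []
    for tier in TIERS:
        context.extend(store[tier].get(query, []))
        if is_sufficient(query, context):
            break  # the cheap tier answered; skip the expensive ones
    return context

# Toy store: each tier maps a query key to progressively detailed text.
store = {
    "summary": {"project": ["Project X: migrating auth, mostly done."]},
    "items":   {"project": ["May 1: token refresh bug fixed.",
                            "May 3: rollout paused pending review."]},
    "raw":     {"project": ["<full meeting transcript>"]},
}
```

In practice `is_sufficient` is itself an LLM call, so the early exit is what makes the pattern pay off: most queries never touch the raw tier.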
This is “Skill graphs enable progressive disclosure for complex domains” applied to memory instead of knowledge: just as an agent navigates from index to YAML frontmatter to prose to full content, a memory system navigates from summaries to items to raw resources. Both patterns are driven by the same constraint, “The context window is the fundamental constraint — everything else follows”: you can’t load everything, so you need a strategy for loading the right level of detail. Time decay compounds the effect: the system weights recent memories higher, so current state wins over stale data. That connects to “Embeddings measure similarity, not truth”: vector databases have a temporal blind spot, and recency matters as much as similarity.