AI Product Building Architecture

Similarity is not relevance — relevance requires reasoning

Vector search finds semantically similar content, but users need relevant content. Determining relevance requires LLM reasoning, not just pattern matching.

PageIndex by VectifyAI — https://github.com/VectifyAI/PageIndex · Feb 24, 2026 · 9 connections

The foundational insight behind PageIndex: vector embeddings measure semantic distance, but distance and relevance are different things. A document about “interest rates” might be semantically similar to a query about “monetary policy effects on housing,” but the section that actually answers the query lives inside a chapter on domestic policy implications. Finding it takes structure plus reasoning, not flat similarity (see “Structure plus reasoning beats flat similarity for complex domains”).
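The gap between distance and relevance can be illustrated with a toy ranking. This is a minimal sketch: the three-dimensional vectors are invented for illustration and are not real embeddings, and the section names are hypothetical stand-ins for the example above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the numbers are made up, chosen only to show the effect.
query = [0.9, 0.4, 0.1]  # stands in for "monetary policy effects on housing"
sections = {
    "Interest rates overview": [0.9, 0.3, 0.1],       # close in vocabulary
    "Domestic policy implications": [0.3, 0.2, 0.9],  # holds the actual answer
}

# Rank sections purely by similarity to the query, highest first.
ranked = sorted(sections, key=lambda name: cosine(query, sections[name]), reverse=True)
print(ranked[0])  # the similar-sounding section wins, not the relevant one
```

Under these assumed vectors, similarity alone puts the overview first, even though the answer sits in the other section. Nothing in the score encodes where the answer lives.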

PageIndex demonstrates this by building a tree index (like a table of contents optimized for LLMs) and having the LLM reason through it as a domain expert would: “this question about interest rates is probably under Monetary Policy.” The Mafin 2.5 system (powered by PageIndex) achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems. The same principle drives “Agentic search beats RAG for live codebases”: the agent reasons about where to look rather than relying on similarity to surface the right chunks. It also connects to “Context is the product, not the model”: the quality of retrieval depends on how you structure context, not on which model you use for embeddings.
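The tree-navigation idea can be sketched roughly as follows. This is not PageIndex's actual API or data format: `Node`, `navigate`, and `keyword_chooser` are hypothetical names, the sample tree is invented, and the keyword-overlap chooser is only a placeholder for the step where a real system would ask an LLM which branch most likely contains the answer.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    """One entry in a table-of-contents-style tree (hypothetical structure)."""
    title: str
    summary: str = ""
    children: List["Node"] = field(default_factory=list)

# A chooser picks which child to descend into, given the question.
Chooser = Callable[[str, List[Node]], Node]

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_chooser(question: str, candidates: List[Node]) -> Node:
    # Placeholder for an LLM call: a real system would prompt the model with
    # the question plus each candidate's title and summary, then parse its pick.
    q = tokens(question)
    return max(candidates, key=lambda n: len(q & tokens(n.title + " " + n.summary)))

def navigate(root: Node, question: str, choose: Chooser) -> List[str]:
    """Walk from the root to a leaf, choosing one branch per level."""
    path = [root.title]
    node = root
    while node.children:
        node = choose(question, node.children)
        path.append(node.title)
    return path

# Invented sample tree, loosely following the example in the text.
tree = Node("Annual Report", children=[
    Node("Monetary Policy", "interest rates, inflation targets", children=[
        Node("Rate Decisions", "changes to interest rates during the year"),
        Node("Domestic Policy Implications", "effects on housing and household credit"),
    ]),
    Node("Operations", "staffing and facilities"),
])

path = navigate(tree, "How did monetary policy affect housing?", keyword_chooser)
print(" > ".join(path))
```

Passing the chooser in as a function keeps the traversal separate from the reasoning step, so the placeholder can be swapped for a model-backed chooser without touching `navigate`.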

Connected Insights

References (3)

→ Structure plus reasoning beats flat similarity for complex domains
→ Agentic search beats RAG for live codebases
→ Context is the product, not the model

Referenced by (6)

← Agentic search beats RAG for live codebases
← Hybrid search is the default, not the exception
← Embeddings measure similarity, not truth — vector databases have a temporal blind spot
← Evaluate agent tools with real multi-step tasks, not toy single-call examples
← Data agent failures stem from missing business context, not SQL generation gaps
← Navigation beats search for knowledge retrieval — let each data source keep its native query interface