Introducing Contextual Retrieval
Single-vector embedding models face a theoretical limit: the embedding dimensionality bounds the number of distinct top-k document combinations they can retrieve, regardless of model size or training data.
This limitation is demonstrated empirically, including on the purpose-built LIMIT dataset, where state-of-the-art embedding models fail even on simple queries. Alternative architectures such as cross-encoders, multi-vector models, and sparse representations outperform single-vector embeddings on tasks that require this broader combinatorial coverage.
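To make the dimensionality limit concrete, the following toy sketch counts which top-2 document sets dot-product retrieval can return for three documents. It is an illustration under simplifying assumptions, not the LIMIT dataset or the formal argument: the function name `achievable_topk_sets`, the document vectors, and the query grids are all hypothetical choices made for this example.

```python
import math
from itertools import combinations


def achievable_topk_sets(docs, queries, k):
    """Return every distinct top-k index set produced by dot-product ranking."""
    found = set()
    for q in queries:
        scores = [sum(qi * di for qi, di in zip(q, d)) for d in docs]
        # Rank document indices by descending score (ties keep index order).
        ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        found.add(frozenset(ranked[:k]))
    return found


# Every pair of documents that a query could, in principle, ask for.
all_pairs = {frozenset(c) for c in combinations(range(3), 2)}

# Hypothetical 1-D embeddings: three documents on a line.
docs_1d = [[-1.0], [0.0], [1.0]]
queries_1d = [[q] for q in (-2.0, -0.5, 0.5, 2.0)]
sets_1d = achievable_topk_sets(docs_1d, queries_1d, k=2)

# Hypothetical 2-D embeddings: three documents spaced evenly on the unit circle.
docs_2d = [[math.cos(math.radians(a)), math.sin(math.radians(a))]
           for a in (0, 120, 240)]
queries_2d = [[math.cos(math.radians(a)), math.sin(math.radians(a))]
              for a in range(0, 360, 10)]
sets_2d = achievable_topk_sets(docs_2d, queries_2d, k=2)

print(f"1-D: {len(sets_1d)} of {len(all_pairs)} pairs retrievable")
print(f"2-D: {len(sets_2d)} of {len(all_pairs)} pairs retrievable")
```

In one dimension a query can only favor one end of the line or the other, so the pair made of the two outermost documents is never returned, whatever the query. Adding a second dimension makes all three pairs reachable in this toy setup, which matches the claim that the set of retrievable combinations depends on embedding dimensionality.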