William's Bookmark Library

On the Theoretical Limitations of Embedding-Based Retrieval

alphaxiv.org · Saved September 1, 2025 · 3 min · August 28, 2025

Jinhyuk Lee, Orion Weller, Iftekhar Naim, Michael Boratko · via arXiv

Summary

For single-vector embedding models, the embedding dimensionality sets a theoretical limit on which combinations of top-k documents can be retrieved, regardless of model size or training data.

This limitation is demonstrated empirically, including on the purpose-built LIMIT dataset, where state-of-the-art embedding models fail even on simple queries. Alternative architectures such as cross-encoders, multi-vector models, and sparse representations outperform single-vector embeddings on tasks that require more combinatorial coverage.
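
The dimensionality limit described in the summary can be made concrete with a toy experiment. The sketch below is an illustration only, not the paper's construction or proof: the document vectors are hand-picked assumptions (six points on a line for one dimension, six points on a unit circle for two dimensions), and the helper names `top_k_set` and `reachable_top_k_sets` are hypothetical. It samples many random queries, scores documents by dot product, and records which top-2 sets ever appear, so the count can be compared with the number of possible pairs.

```python
import math
import random


def top_k_set(query, docs, k):
    """Return the indices of the k documents with the highest dot-product score."""
    scores = [sum(q * x for q, x in zip(query, doc)) for doc in docs]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return frozenset(ranked[:k])


def reachable_top_k_sets(docs, k, num_queries=20000, seed=0):
    """Collect every distinct top-k set produced by randomly sampled queries."""
    rng = random.Random(seed)
    dim = len(docs[0])
    found = set()
    for _ in range(num_queries):
        # Gaussian components give query directions spread over the whole space.
        query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        found.add(top_k_set(query, docs, k))
    return found


n, k = 6, 2

# Assumed toy corpora: the same number of documents embedded in 1 and 2 dimensions.
docs_1d = [[float(i)] for i in range(n)]
docs_2d = [
    [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
    for i in range(n)
]

total_pairs = math.comb(n, k)  # every top-2 combination a query could ask for
sets_1d = reachable_top_k_sets(docs_1d, k)
sets_2d = reachable_top_k_sets(docs_2d, k)

print(f"possible top-{k} sets: {total_pairs}")
print(f"reached with 1-d embeddings: {len(sets_1d)}")
print(f"reached with 2-d embeddings: {len(sets_2d)}")
```

For these particular vectors, a one-dimensional query can only rank documents in ascending or descending order, and a two-dimensional query can only pick neighbors on the circle, so most pairs are never returned whatever the query is. Other arrangements reach different sets; the sketch only shows how a small dimension constrains the reachable combinations.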

Topics

Embedding-Based Retrieval · Information Retrieval · Machine Learning Limitations · Neural Retrieval Models · Theoretical Computer Science

Discover Similar Content

Introducing Contextual Retrieval

The Case Against pgvector | Alex Jacobs

BM42: New Baseline for Hybrid Search - Qdrant
