Indexing - Qdrant
Qdrant is an Open-Source Vector Database and Vector Search Engine written in Rust. It provides fast and scalable vector similarity search service with...
The Hacker News vector search dataset contains 28.74 million postings with 384-dimensional vector embeddings generated using the all-MiniLM-L6-v2 model.
The dataset is provided as a Parquet file and can be loaded into ClickHouse for scalable semantic search and similarity queries. Users can build vector similarity indexes and perform efficient semantic searches using cosine distance, with applications ranging from document retrieval to generative AI summarization.
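To make the cosine-distance search concrete, here is a minimal pure-Python sketch of an exact (brute-force) nearest-neighbour lookup. It is not how ClickHouse or Qdrant implement their indexes; the function names `cosine_distance` and `top_k` and the toy 3-dimensional vectors are assumptions made for illustration, standing in for the 384-dimensional embeddings.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Exact nearest neighbours: score every vector, keep the k closest."""
    scored = sorted((cosine_distance(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in scored[:k]]

# Hypothetical toy embeddings (3 dimensions instead of 384).
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
nearest = top_k([1.0, 0.0, 0.0], docs, k=2)
print(nearest)
```

A vector similarity index exists to avoid this full scan: it trades a little accuracy for not having to compare the query against every stored vector.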
A discussion on this: https://news.ycombinator.com/item?id=46081053 At this embedding size, 28.74 million vectors of 384 dimensions each, this is roughly an 11 billion parameter dataset.
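The 11 billion figure follows from multiplying the vector count by the embedding dimension. A quick check, assuming the rounded count of 28.74 million and, for the size estimate only, 4-byte float32 storage (the storage type is an assumption, not stated in the source):

```python
num_vectors = 28_740_000   # rounded count from the dataset description
dims = 384                 # all-MiniLM-L6-v2 embedding size

total_values = num_vectors * dims
print(f"{total_values / 1e9:.2f} billion values")

# Assumption: each value stored as a 4-byte float32.
raw_bytes = total_values * 4
print(f"{raw_bytes / 1e9:.1f} GB of raw vector data")
```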