Hacker News vector search dataset using ClickHouse
Dataset containing 28+ million Hacker News postings and their vector embeddings
HackerBook is a static, offline archive of 20 years of Hacker News data, packaged as HTML, JSON, and gzipped SQLite shards that run entirely in the browser via SQLite WASM.
The browser fetches only the shards it needs, which keeps navigation fast, and content hashes in the filenames make every shard immutable and safe to cache. Users can view pre-built sites locally, build a site from raw BigQuery data using Node.js tools, or deploy their own snapshots.
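
To illustrate why a content hash in the filename makes caching safe, here is a minimal sketch in Python. It is not HackerBook's actual build code (which the text says uses Node.js tools): the `shard-NNNN.<digest>.sqlite.gz` naming pattern, the use of SHA-256, the 12-character digest, and the function names `hashed_shard_name` and `pack_shard` are all assumptions made for illustration.

```python
import gzip
import hashlib


def hashed_shard_name(shard_id: int, payload: bytes, digest_len: int = 12) -> str:
    """Return a filename that embeds a digest of the shard's bytes.

    The naming pattern is hypothetical, not taken from HackerBook.
    """
    digest = hashlib.sha256(payload).hexdigest()[:digest_len]
    return f"shard-{shard_id:04d}.{digest}.sqlite.gz"


def pack_shard(shard_id: int, raw: bytes) -> tuple[str, bytes]:
    """Gzip a shard and name the result after its compressed bytes."""
    # mtime=0 keeps the gzip header free of a timestamp, so the same input
    # compresses to the same bytes on a given Python/zlib build.
    compressed = gzip.compress(raw, mtime=0)
    return hashed_shard_name(shard_id, compressed), compressed


if __name__ == "__main__":
    name, blob = pack_shard(1, b"example shard contents")
    print(name, len(blob))
```

Because the name changes whenever the bytes change, a given URL can only ever refer to one version of a shard. Under that assumption a server could mark such files as cacheable indefinitely, and a new snapshot would simply reference new filenames.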