Nathan Rooy created a collection of one million screenshots from small web sites, avoiding popular domains. The screenshots were captured with Playwright, embedded with a custom triplet loss encoder, and arranged into a layout using self-organizing maps (SOMs) with parallel color distribution.
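The summary names a triplet loss encoder but gives no details of it. As a rough illustration of the general technique only, here is the standard triplet margin loss on toy vectors. The function names, the margin of 1.0, and the 2-D example points are assumptions made for this sketch, not taken from the project.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive is closer to the anchor than
    the negative by at least `margin` (an assumed value here).
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

# Toy embeddings (hypothetical): the positive sits 1 unit from the
# anchor, the negative sits 5 units away.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

loss_satisfied = triplet_loss(anchor, positive, negative)  # 1 - 5 + 1 < 0, clamps to 0
loss_violated = triplet_loss(anchor, negative, positive)   # 5 - 1 + 1 = 5
print(loss_satisfied, loss_violated)
```

In a real encoder the three vectors would be network outputs for an anchor screenshot, a visually similar one, and a dissimilar one, and the loss would be minimized by gradient descent.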
Highlights
Curates one million screenshots specifically from the small web to bypass the noise of popular, ad-driven sites.
Employs Self-Organizing Maps with parallel color distribution to organize visual embeddings into a coherent layout.
Uses a custom triplet loss encoder to generate visual embeddings that capture micro-placement similarities.
Provides a zoomable interactive map at screenshots.nry.me for exploring the visual landscape of the small web.
Contrasts the small web with mainstream platforms, arguing that substance often outweighs click-through rates in niche corners of the internet.
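To make the self-organizing map idea in the highlights more concrete, the following is a minimal, generic SOM sketch in plain Python. It is not the author's implementation: the grid size, learning-rate and neighborhood schedules, and toy 3-D input vectors are all assumptions, and the "parallel color distribution" step is not reproduced because the summary does not describe it.

```python
import math
import random

def best_matching_unit(grid, v):
    """Return (row, col) of the grid cell whose weights are closest to v."""
    best, best_d = None, float("inf")
    for r, row in enumerate(grid):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, v))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(vectors, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols SOM with linearly decaying rate and radius."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = vectors[:]
        rng.shuffle(order)
        for v in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            br, bc = best_matching_unit(grid, v)
            # Pull every cell toward v, weighted by grid distance to the BMU.
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * sigma * sigma))
                    w = grid[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (v[i] - w[i])
            step += 1
    return grid

# Toy "embeddings" (hypothetical): two groups near 0 and near 1.
low = [[0.01 * i] * 3 for i in range(10)]
high = [[1.0 - 0.01 * i] * 3 for i in range(10)]
som = train_som(low + high, rows=4, cols=4)

# Each vector's grid cell is its position in the 2-D layout.
cell_low = best_matching_unit(som, [0.0, 0.0, 0.0])
cell_high = best_matching_unit(som, [1.0, 1.0, 1.0])
print(cell_low, cell_high)
```

The point of the sketch is the mapping step at the end: after training, similar vectors tend to land in nearby cells, which is what lets a SOM turn high-dimensional embeddings into a browsable 2-D arrangement.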
Nathan Rooy · via nry.me
Context
Audience
Web Developers, Data Scientists, Digital Archivists, Web Researchers
Common Crawl, Playwright, Self-Organizing Maps, Visual Embeddings, onemillionscreenshots.com
Discover Similar Content
onemillionscreenshots.com
One Million Screenshots
Explore the web’s biggest homepage. Discover similar sites. See changes over time. Get web data.
github.com
GitHub - DOSAYGO-STUDIO/HackerBook: Hacker Book - COMMUNITY, ALL THE HN ARE BELONG TO YOU. An unkillable, static offline archive of all of Hacker News.
shot-scraper.datasette.io
shot-scraper
shot-scraper is a command-line utility for taking automated screenshots of websites using a headless browser. Install via pip, then run commands like ...