The article explains the Lottery Ticket Hypothesis: large neural networks contain small subnetworks that, when trained in isolation from their original initializations, can match the full model's performance. This challenges the traditional bias-variance picture by suggesting that overparameterization aids generalization rather than causing overfitting.
Highlights
Large neural networks contain small, trainable subnetworks called winning tickets (see the pruning sketch after this list).
This hypothesis contradicts the traditional bias-variance tradeoff principle.
Overparameterization increases the odds of finding effective solutions.
Massive models generalize well despite classical theory predicting they would overfit.
The discovery marks a paradigm shift in understanding machine learning scaling.
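For a concrete picture of how winning tickets are found, here is a minimal sketch of the train / prune / rewind loop (iterative magnitude pruning with weight rewinding). The toy model, synthetic data, and hyperparameters are illustrative assumptions for the sketch, not the article's or the original paper's setup.

```python
# Minimal sketch of one round of iterative magnitude pruning with weight
# rewinding, in the spirit of the Lottery Ticket Hypothesis. Model, data,
# and hyperparameters are placeholders, not a real experimental setup.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and synthetic data (stand-ins for a real task).
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
init_state = copy.deepcopy(model.state_dict())  # save theta_0 for rewinding
X, y = torch.randn(256, 20), torch.randint(0, 2, (256,))

def train(model, mask, steps=200):
    """Train while keeping pruned weights frozen at zero."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
        # Re-zero pruned weights after each update so the mask holds.
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in mask:
                    p.mul_(mask[name])

# Start with an all-ones mask over the weight matrices (nothing pruned).
mask = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

train(model, mask)

# Prune the 20% smallest-magnitude surviving weights in each layer.
prune_frac = 0.2
with torch.no_grad():
    for name, p in model.named_parameters():
        if name not in mask:
            continue
        alive = p[mask[name].bool()].abs()
        threshold = alive.quantile(prune_frac)
        mask[name] = (p.abs() > threshold).float() * mask[name]

# Rewind surviving weights to their original initialization: this masked,
# re-initialized subnetwork is the candidate "winning ticket".
model.load_state_dict(init_state)
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in mask:
            p.mul_(mask[name])

train(model, mask)  # retrain the ticket in isolation
```

Repeating the prune-and-rewind round drives sparsity up geometrically; the hypothesis is that the surviving subnetwork, trained from its original initialization, matches the full network's accuracy.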
A proposal and discussion of default mode networks for LLMs as an example of the search and novelty capabilities missing from contemporary AI systems.
m.youtube.com
Deep Dive into LLMs like ChatGPT with Andrej Karpathy
This is a general-audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It covers the full...
vickiboykis.com
How big are our embeddings now and why?
Embedding sizes and architectures have changed remarkably over the past 5 years.