AutoResearch is an open-source framework by Andrej Karpathy that enables AI agents to autonomously conduct LLM pretraining experiments on a single GPU overnight.
The agent modifies the training code, runs 5-minute experiments, scores each result by validation bits-per-byte (lower is better), and uses that score to decide what to try next. Humans supply high-level research directives through a single program.md file, while the agent handles all code changes and experimental iteration, completing approximately 12 structured experiments per hour (one per 5-minute run).
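
To make the metric and the throughput figure concrete, here is a minimal Python sketch. It is not taken from the AutoResearch code: the function names (`bits_per_byte`, `experiments_per_hour`, `keep_if_better`) are hypothetical, and the greedy "keep a change only if it lowers validation bits-per-byte" rule is an assumed reading of "iterates based on performance improvements". The bits-per-byte conversion itself is the standard one: summed negative log-likelihood in nats, divided by ln 2 and by the number of bytes scored.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes.
    return total_nll_nats / (math.log(2) * total_bytes)


def experiments_per_hour(minutes_per_run: float) -> float:
    """Upper bound on runs per hour, ignoring any overhead between runs."""
    return 60.0 / minutes_per_run


def keep_if_better(baseline_bpb: float, candidate_bpbs: list[float]):
    """Hypothetical greedy loop: keep a candidate only if it beats the best so far.

    Returns the best score and the indices of the candidates that were kept.
    """
    best = baseline_bpb
    kept = []
    for index, bpb in enumerate(candidate_bpbs):
        if bpb < best:  # lower bits-per-byte means better compression
            best = bpb
            kept.append(index)
    return best, kept


if __name__ == "__main__":
    print(experiments_per_hour(5))  # 12.0, matching the figure in the summary
    print(keep_if_better(1.0, [1.1, 0.95, 0.97, 0.9]))  # (0.9, [1, 3])
```

Because bits-per-byte is normalized by raw bytes rather than by tokens, scores remain comparable even when an experiment changes the tokenizer or vocabulary size, which is one reason it suits an agent that is free to edit the training code.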