GitHub - rasbt/LLMs-from-scratch: Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
TimeCapsuleLLM is an experimental language model trained exclusively on texts from 1800–1850 London to eliminate modern bias and accurately reflect the language and worldview of that era.
Rather than fine-tuning an existing model, the project trains from scratch on historical books, legal documents, and newspapers. Initial results show that the model responds in period-appropriate language, although its output still lacks complexity and coherence because the training data is limited.
The next step is to expand the training dataset from 50 to 500–600 historical texts to improve performance and fidelity.
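The restriction to 1800–1850 London texts described above could be enforced with a simple metadata filter before training. The following is only a minimal sketch under stated assumptions: the `Document` record, its field names, and the `in_time_capsule` and `build_corpus` helpers are hypothetical and are not taken from the project itself.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical record for one source text; field names are illustrative."""
    title: str
    year: int
    place: str
    kind: str  # e.g. "book", "legal", "newspaper"
    text: str


def in_time_capsule(doc: Document, start: int = 1800, end: int = 1850,
                    place: str = "London") -> bool:
    """Return True when a document falls inside the target period and place."""
    return start <= doc.year <= end and doc.place == place


def build_corpus(docs: list[Document]) -> str:
    """Concatenate only the in-period documents into one training text."""
    kept = [d.text for d in docs if in_time_capsule(d)]
    return "\n\n".join(kept)
```

Filtering on metadata before tokenization keeps out-of-period text away from the training set entirely, which is the point of training from scratch rather than fine-tuning a model that has already seen modern text.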