Reinforcement learning from human feedback, alignment, and post-training LLMs
Nathan Lambert
Publisher: Manning
Published: 2026
Duration: 6 hr 15 min
ISBN: 9781633434301
This is a guide to reinforcement learning from human feedback (RLHF), alignment, and post-training for large language models (LLMs). RLHF is the process of using human judgments of a model's outputs to shape its alignment and behavior. Author Nathan Lambert blends perspectives from fields such as philosophy and economics with the core mathematics and computer science of RLHF to provide a practical guide for applying RLHF to models. The book explores the ideas, established techniques, and best practices of the field, beginning with an overview of leading papers before detailing training and optimization tools.

Content Overview

- Foundations: How advanced AI models are taught from human feedback and how large-scale preference data is collected.
- Core Methods: Derivations and implementations of the policy-gradient methods used to train AI models with reinforcement learning (RL).
- Optimization: Direct Preference Optimization (DPO), direct alignment algorithms, and methods for preference finetuning.
- Modern Developments: The transition to reinforcement learning from verifiable rewards (RLVR) and industry techniques for character, personality, and AI feedback training.
- Evaluation: How to approach evaluation and how the field's standards have changed over time.
- Implementation: Standard post-training recipes combining instruction tuning with RLHF, and the development history of open models such as Llama-Instruct, Zephyr, Olmo, and Tülu.

Advanced Techniques

The book details optimization tools such as reward models, regularization, and instruction tuning, alongside advanced techniques including constitutional AI, synthetic data, and current open questions in the field. It examines modern RLHF training pipelines and their trade-offs through hands-on experiments and mini-implementations.

About the Reader

This book is intended for engineers and AI scientists getting started with AI training, and for students seeking a foothold in the industry.
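Of the optimization methods listed in the content overview, Direct Preference Optimization has a particularly compact objective, which gives a flavor of the material the book covers. Below is a minimal, illustrative sketch (not code from the book) of the DPO loss for a single preference pair; it assumes the summed log-probabilities of the chosen and rejected responses are already computed, and the function and argument names are hypothetical.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Illustrative DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy (pi_*) or the frozen
    reference model (ref_*). beta scales the implicit rewards and
    controls how far the policy may drift from the reference.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response than the rejected one, relative to the reference.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)), written stably as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# When the policy matches the reference, the margin is zero and the
# loss is log(2); widening the margin in favor of the chosen response
# drives the loss toward zero.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)
```

In a real training loop this scalar form would be vectorized over a batch, but the core idea is the same: the loss rewards the policy for assigning a larger log-probability margin to preferred responses than the reference model does.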
About the Author

Nathan Lambert is the post-training lead at the Allen Institute for AI, having previously worked at Hugging Face, DeepMind, and Facebook AI. He has guest lectured at Stanford, Harvard, and MIT, and presents at NeurIPS and other AI conferences. He has received the Best Theme Paper Award at ACL and a Geekwire Innovation of the Year award. Nathan earned a PhD in Electrical Engineering and Computer Science from the University of California, Berkeley.