
Welcome

 

Thank you for purchasing the MEAP edition of The RLHF Book. I’m excited to put a polished encapsulation of my hard work over the last few years into the hands of the growing AI community.

RLHF burst onto the scene with the release of ChatGPT, where it was the crucial technique that transformed GPT-3.5 into the ChatGPT we fell in love with. Over the last few years, I’ve been doing open research building models like ChatGPT, and I’m consistently shocked by how little information is public on how to do this – even basic definitions for common training algorithms.

The goal of this book is to be the canonical reference for RLHF as it matures into an established area of research. I’m thankful to have early eyes on the polished version through the Manning Early Access Program, building on the first draft that has been public at rlhfbook.com. This is the book I wish I had when I started learning about RLHF for language models almost three years ago.

In this time, I’ve watched first-hand as RLHF – along with “post-training,” which has emerged as the term of art for related methods – has moved from a niche research topic to one of the central mechanisms shaping how modern AI systems behave. As models have scaled, the role of human data in the process has evolved. The result is a fast-developing field where research labs, open-source communities, and companies of every size are continually experimenting, iterating, and publishing new methods.