Welcome
Thank you for purchasing the MEAP edition of Reinforcement Learning from Human Feedback. I’m excited to get a polished encapsulation of my hard work over the last few years into the hands of the growing AI community.
RLHF burst onto the scene following the release of ChatGPT, where it was the crucial added technique that transformed GPT-3.5 into the ChatGPT that we fell in love with. Over the last few years, I’ve been doing open research building models like ChatGPT, and I’m consistently shocked by how little information on how to do this is public, even basic definitions for common training algorithms.
The goal of this book is to be the canonical reference for RLHF as it matures into an established area of research. I’m thankful to have early eyes on the polished version through the Manning Early Access Program, which builds on the first draft that’s been public at rlhfbook.com. This is the book I wished I had when I started learning about RLHF for language models almost three years ago.