5 Preference Alignment and RAG
This chapter covers
- Reinforcement learning from human feedback (RLHF)
- Direct preference optimization (DPO)
- Group relative policy optimization (GRPO)
- Retrieval-augmented generation (RAG) for factual grounding
As we’ve seen, decoding strategies and prompting techniques can guide a language model’s output at inference time. These methods do not change the model’s underlying parameters or architecture but significantly influence the diversity, fluency, and usefulness of its generated text. In this chapter, we shift focus to techniques that align a language model more directly with user intent, either by training the model to prefer certain outputs through reinforcement learning and preference modeling, or by augmenting its context at inference time with external, up-to-date information.
We begin with preference alignment using Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). These methods guide the model to produce outputs that better reflect human values, task-specific expectations, and reasoning. Then, we cover knowledge alignment via Retrieval-Augmented Generation (RAG), which allows a model to dynamically incorporate factual and domain-specific information at runtime, without changing the model weights.
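To make the preference-alignment idea concrete before we dive in, here is a minimal sketch of the DPO objective, assuming PyTorch; the argument names (such as `policy_chosen_logps`) are illustrative, not a fixed API. The loss rewards the policy for widening the log-probability margin of the preferred response over the rejected one, measured relative to a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Each argument is a tensor of summed log-probabilities that the trainable
    policy (or the frozen reference model) assigns to the chosen / rejected
    response for the same prompt.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps

    # Maximize the margin between chosen and rejected, scaled by beta;
    # logsigmoid keeps the computation numerically stable
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```

In practice, the log-probabilities come from scoring each (prompt, response) pair with both models; RLHF and GRPO pursue the same alignment goal but rely on an explicit reward signal rather than this closed-form preference loss.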
Together, these techniques form the foundation for controlling, specializing, and grounding large language models in real-world applications.
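RAG follows the same spirit at inference time. The toy sketch below shows the basic loop: retrieve relevant documents, prepend them to the prompt, and let the unchanged model generate from the augmented context. The `retrieve` and `build_rag_prompt` helpers and the word-overlap retriever are illustrative stand-ins, not a production retrieval pipeline.

```python
def retrieve(query, documents, k=2):
    """Toy lexical retriever: rank documents by word overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(query_terms & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query, documents):
    """Augment the user query with retrieved context before generation."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return (f"Answer the question using only the context below.\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer:")

docs = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "DPO fine-tunes a language model directly on preference pairs.",
]
print(build_rag_prompt("When was the Eiffel Tower completed?", docs))
# The resulting prompt is then sent to the language model, whose weights
# remain unchanged; only the context it conditions on is enriched.
```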