
4 Instruction Fine-tuning

 

This chapter covers

  • Why instruction fine-tuning is the foundation before RLHF
  • How models are fine-tuned today to follow instructions

Early large pretrained language models were trained with a next-token prediction objective and, by default, offered no explicit interface for following instructions. Around the release of GPT-3 [1], prompting and in-context learning became a widely used way to adapt a single model to many tasks: the user shows a few examples in the prompt and asks the model to complete a similar one (though task-specific fine-tuning remained common). A practical next step was instruction fine-tuning, which teaches the model to respond in an instruction-response format rather than simply continuing the text.
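To make the instruction-response format concrete, the sketch below wraps a single instruction-response pair in a ChatML-style chat template, one common way such data is formatted before training. The special tokens and the format_example helper are illustrative assumptions rather than a specific library's API; chat templates are covered in detail in section 4.1.

# A minimal sketch: turn one instruction-response pair into a single
# training string using ChatML-style markers. The tokens and the helper
# name are illustrative, not a particular framework's API.
def format_example(instruction: str, response: str) -> str:
    # Wrap the pair in chat-style markers so the model can learn where
    # the user turn ends and where the assistant turn begins.
    return (
        "<|im_start|>user\n"
        f"{instruction}<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{response}<|im_end|>\n"
    )

example = format_example(
    "Summarize the plot of Hamlet in one sentence.",
    "A Danish prince avenges his father's murder, at great cost to himself and his court.",
)
print(example)

Training on many strings like this one teaches the model to continue the assistant turn whenever it sees a user turn, rather than merely continuing arbitrary text.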

4.1 Chat templates and the structure of instructions

4.2 Best practices of instruction tuning

4.3 Implementation