8 Training and evaluating large language models


This chapter covers

  • Deep dive into hyperparameters
  • Hyperparameter optimization with Ray
  • Effective strategies for experiment tracking
  • Parameter-efficient fine-tuning
  • Various quantization techniques

Large language models have transformed how we approach tasks ranging from translation to content generation. However, their size brings unique challenges that require efficient strategies for training, tuning, and evaluation.

This chapter offers a practical overview of the most effective tools and techniques for improving the efficiency and manageability of large models throughout development and deployment. We begin by exploring hyperparameters and their impact on model performance, then turn to hyperparameter optimization and experiment tracking, and finally to parameter-efficient fine-tuning and quantization techniques such as LoRA and QLoRA.

To support large-scale experimentation, this chapter relies on two tools widely adopted in modern machine learning workflows: Ray and Weights & Biases. Ray provides a scalable framework for distributed training and hyperparameter optimization, with native integration into major cloud providers like AWS and GCP. Weights & Biases complements this with comprehensive tools for experiment tracking, model monitoring, and result visualization. Used together, they enable more structured and efficient development cycles.
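
As a first taste of how these pieces fit together, the following is a minimal sketch of a hyperparameter search using Ray Tune's Tuner API. The train_fn function and its toy loss formula are placeholders standing in for a real training loop, not an example from this chapter:

from ray import tune

def train_fn(config):
    # Placeholder objective standing in for a real training loop;
    # in practice this would train a model and report validation loss.
    loss = (config["lr"] - 0.01) ** 2 + 1.0 / config["num_epochs"]
    return {"loss": loss}

tuner = tune.Tuner(
    train_fn,
    param_space={
        "lr": tune.loguniform(1e-4, 1e-1),        # sampled log-uniformly
        "num_epochs": tune.choice([1, 2, 4, 8]),  # discrete choices
    },
    tune_config=tune.TuneConfig(metric="loss", mode="min", num_samples=16),
)
results = tuner.fit()
print(results.get_best_result().config)  # best hyperparameters found

Weights & Biases can be attached to a run like this through Ray's WandbLoggerCallback integration (in recent Ray releases it lives under ray.air.integrations.wandb), so that each trial's configuration and metrics are logged automatically; experiment tracking is the subject of section 8.2.1.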

8.1 Deep dive into hyperparameters

8.1.1 How parameters and hyperparameters factor into gradient descent

8.2 Model tuning and hyperparameter optimization

8.2.1 Track experiments

8.3 Parameter-efficient fine-tuning of LLMs

8.3.1 Low-rank adaptation

8.3.2 Weight-decomposed low-rank adaptation

8.3.3 Quantization

8.3.4 Efficient fine-tuning of quantized LLMs with QLoRA

8.3.5 Quantization-aware low-rank adaptation

8.3.6 Low-rank plus quantized matrix decomposition

8.3.7 Bringing it all together: Choosing the right PEFT strategy

8.4 Summary