8 Training and evaluating large language models


This chapter covers

  • Deep dive into hyperparameters
  • Hyperparameter optimization with Ray
  • Effective strategies for experiment tracking
  • Parameter-efficient fine-tuning
  • Various quantization techniques

Large language models have transformed how we approach tasks ranging from translation to content generation. However, their size brings unique challenges that require efficient strategies for training, tuning, and evaluation.

This chapter offers a practical overview of the most effective tools and techniques for improving the efficiency and manageability of large models throughout development and deployment. We begin by exploring hyperparameters and their impact on model performance, then turn to hyperparameter optimization and experiment tracking, and finally to parameter-efficient fine-tuning and quantization techniques such as LoRA and QLoRA.

To support large-scale experimentation, this chapter relies on two tools widely adopted in modern machine learning workflows: Ray and Weights & Biases. Ray provides a scalable framework for distributed training and hyperparameter optimization, with native integration into major cloud providers like AWS and GCP. Weights & Biases complements this with comprehensive tools for experiment tracking, model monitoring, and result visualization. Used together, they enable more structured and efficient development cycles.
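
As a first taste of how these pieces fit together, the following is a minimal sketch of a hyperparameter search using Ray Tune's Tuner API. The train_fn function and its toy loss formula are placeholders standing in for a real training loop, not an example from this chapter:

from ray import tune

def train_fn(config):
    # Placeholder objective standing in for a real training loop;
    # in practice this would train a model and report validation loss.
    loss = (config["lr"] - 0.01) ** 2 + 1.0 / config["num_epochs"]
    return {"loss": loss}

tuner = tune.Tuner(
    train_fn,
    param_space={
        "lr": tune.loguniform(1e-4, 1e-1),        # sampled log-uniformly
        "num_epochs": tune.choice([1, 2, 4, 8]),  # discrete choices
    },
    tune_config=tune.TuneConfig(metric="loss", mode="min", num_samples=16),
)
results = tuner.fit()
print(results.get_best_result().config)  # best hyperparameters found

Weights & Biases can be attached to a run like this through Ray's WandbLoggerCallback integration (in recent Ray releases it lives under ray.air.integrations.wandb), so that each trial's configuration and metrics are logged automatically; experiment tracking is the subject of section 8.2.1.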

8.1 Deep dive into hyperparameters

8.1.1 How parameters and hyperparameters factor into gradient descent

8.2 Model tuning and hyperparameter optimization

8.2.1 Track experiments

8.3 Parameter-efficient fine-tuning of LLMs

8.3.1 Low-rank adaptation

8.3.2 Weight-decomposed low-rank adaptation

8.3.3 Quantization

8.3.4 Efficient fine-tuning of quantized LLMs with QLoRA

8.3.5 Quantization-aware low-rank adaptation

8.3.6 Low-rank plus quantized matrix decomposition

8.3.7 Bringing it all together: Choosing the right PEFT strategy

8.4 Summary