7 Fine-Tuning LLMs for Improved Performance
This chapter covers
- Choosing among prompting, RAG, or fine-tuning for domain-specific tasks
- Preparing high-quality training data for fine-tuning
- The fine-tuning process for both closed-source and open-source models
- A complete hands-on walkthrough of building a reliable customer support assistant
- Using knowledge distillation to create smaller, faster student models for production
"This model gives okay results, but it's not precise enough for our medical terminology."
"The AI generates good text, but it doesn't understand our company's internal jargon."
"We need more accuracy for legal documents—general models make too many subtle errors."
Do these concerns sound familiar? You're not alone. While today's Large Language Models (LLMs) demonstrate impressive capabilities across a wide range of tasks, they often fall short in highly specialized domains or in complex workflows that demand deep domain expertise.
In previous chapters, we discussed various techniques to optimize LLM performance, including prompting (zero-shot, few-shot) and Retrieval-Augmented Generation (RAG). These methods let us get better results from a model without changing its underlying weights. However, there are cases where these techniques are insufficient and more fundamental adjustments to the model's behavior are necessary. This is where fine-tuning becomes valuable.
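To make the contrast concrete, recall what those weight-free techniques look like in practice. The sketch below assembles a few-shot prompt for a support-ticket classifier; the task, example tickets, and labels are all hypothetical, and the point is simply that the "training signal" lives entirely in the prompt text rather than in the model's parameters:

```python
# A minimal sketch of few-shot prompting: we steer the model by packing
# labeled examples into the prompt itself, leaving its weights untouched.
# The ticket texts and category labels below are made up for illustration.
FEW_SHOT_EXAMPLES = [
    ("Reset my password please", "account"),
    ("My invoice total looks wrong", "billing"),
    ("The app crashes on startup", "technical"),
]

def build_few_shot_prompt(query: str) -> str:
    """Assemble a classification prompt from in-context examples."""
    lines = ["Classify each support ticket as account, billing, or technical.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Ticket: {text}\nCategory: {label}\n")
    # End with an unanswered instance for the model to complete.
    lines.append(f"Ticket: {query}\nCategory:")
    return "\n".join(lines)

prompt = build_few_shot_prompt("I was charged twice this month")
print(prompt)
```

When the examples needed to pin down the desired behavior no longer fit comfortably in a prompt, or the model keeps missing domain-specific distinctions despite them, that is the signal to consider fine-tuning instead.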