
5 Exploring and evaluating language models

 

This chapter covers

  • Understanding the capabilities of LMs
  • Selecting suitable LMs
  • Customizing LMs for specific tasks
  • LMs in the wider application context
  • Evaluating LMs

In this chapter, we will dive into the world of language models (LMs) and explore their capabilities, limitations, and practical applications. When building generative AI products, you need a solid understanding of these models to make informed decisions about model selection, deployment, customization, and risk management. You also need to support your engineers when it comes to making design decisions about the integration, adaptation, and evaluation of LMs within the larger AI system you are building.

A word on terminology: while giant LLMs were the main drivers of the generative AI boom, the current trend is toward downscaling and using smaller, more efficient models. In this and the subsequent chapters, I will generally refer to “language models” (LMs), which can be either large (LLMs; 2B+ parameters) or small (SLMs; <2B parameters). We will focus on the text modality, touching on multi-modal models only for those use cases where other modalities are combined with text.

5.1 How language models work

5.1.1 Understanding the training data of a language model

5.1.2 The task of language modeling

5.1.3 Expanding the capabilities of a language model

5.2 Usage scenarios for language models

5.2.1 Direct interaction between user and model

5.2.2 Programmatic use

5.2.3 Using the language model for predefined tasks

5.3 Mapping the LM landscape

5.3.1 Mainstream commercial LLMs

5.3.2 Open-source models

5.3.3 Small language models

5.3.4 Multi-modal models

5.4 Managing the LM lifecycle

5.4.1 Model selection

5.4.2 Evaluating language models

5.4.3 Adapting the language model to your requirements

5.4.4 Collecting feedback during production

5.4.5 Continuously optimizing your LM setup

5.5 Summary