chapter five

5 Exploring and evaluating language models

This chapter covers

Understanding the capabilities of LMs
Selecting suitable LMs
Customizing LMs for specific tasks
LMs in the wider application context
Evaluating LMs

In this chapter, we will dive into the world of language models (LMs), which can be used for a wide variety of tasks, from content creation to text summarization, translation, and more complex problem-solving. The chapter will give you a solid understanding of LMs so that you can make informed decisions about model selection, deployment, customization, and risk management. It will also equip you to support your engineers in making design decisions about the integration, adaptation, and evaluation of LMs within the larger AI system you are building.

Terminology

While giant LLMs were the main “culprit” behind the generative AI boom, there is also a trend toward downscaling and using smaller, more efficient models. In what follows, I will use “language model” (LM) as a general term encompassing both large language models (LLMs; 2B+ parameters) and small language models (SLMs; <2B parameters).

5.1 How language models work

5.1.1 Understanding the training data of a language model

5.1.2 The task of language modeling

5.1.3 Expanding the capabilities of a language model

5.2 Usage scenarios for language models

5.2.1 Direct interaction between user and model

5.2.2 Programmatic use

5.2.3 Using the language model for predefined tasks

5.3 Mapping the LM landscape

5.3.1 Mainstream commercial LLMs

5.3.2 Open-source models

5.3.3 Reasoning language models

5.3.4 Small language models

5.3.5 Multi-modal models

5.4 Managing the LM lifecycle

5.4.1 Model selection

5.4.2 Evaluating language models

5.4.3 Customizing the language model to your requirements

5.4.4 Collecting feedback during production

5.4.5 Continuously optimizing your LM setup

5.5 References

5.6 Summary