1 The need for transformers


This chapter covers

  • How transformers revolutionized NLP
  • The attention mechanism: the transformer’s key architectural component
  • How to use transformers
  • When and why you’d want to use transformers

The field of machine learning (ML), and natural language processing (NLP) in particular, has undergone a revolutionary change with the invention of a new class of neural networks called transformers. These models, remarkable for their capacity to understand and generate natural language, are the backbone of widely used generative AI applications such as OpenAI’s ChatGPT and Anthropic’s Claude.

Transformers, along with derivatives such as large language models (LLMs), rely on a distinctive architectural approach built around an innovative component called the "attention mechanism." The attention mechanism enables the model to focus to varying degrees on different parts of the input, enhancing its ability to process and understand complex sequential data. This capability is central to how LLMs process natural language, and it carries over to the broader use of transformers for audio streams, images, and video.
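
To make this concrete, the following is a minimal NumPy sketch of scaled dot-product attention, the computation at the heart of the attention mechanism. The function name, the toy dimensions, and the choice of self-attention (using the same matrix for queries, keys, and values) are illustrative choices for this sketch, not taken from any particular library:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted average of the rows of V,
    with weights derived from how well the queries Q match the keys K."""
    d_k = Q.shape[-1]
    # Similarity scores between every query and every key,
    # scaled by sqrt(d_k) for numerical stability
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores into weights that sum to 1,
    # so the model "attends" to some inputs more than others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the value vectors
    return weights @ V

# Toy example: 3 input tokens, each a 4-dimensional embedding
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention
print(out.shape)  # (3, 4)

In a real transformer layer, Q, K, and V are produced from the input by separate learned linear projections, and several such attention computations run in parallel, a design we return to when we discuss multi-head attention in section 1.1.4.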

Let’s start by comparing transformers with their predecessors, long short-term memory (LSTM) models, and then examine each component of a transformer in more detail.

1.1 The transformer breakthrough

1.1.1 Translation before transformers

1.1.2 How are transformers different?

1.1.3 Unveiling the attention mechanism

1.1.4 The power of multi-head attention

1.2 How to use transformers

1.3 When and why you’d want to use transformers

1.4 From transformer to LLM: the lasting blueprint

1.5 Summary