1 The need for transformers


This chapter covers

  • How transformers revolutionized NLP
  • The attention mechanism: the transformer’s key architectural component
  • How to use transformers
  • When and why you’d want to use transformers

The field of machine learning (ML), and natural language processing (NLP) in particular, has undergone a revolutionary change with the invention of a new class of neural networks called transformers. These models, remarkable for their capacity to understand and generate natural language, are the backbone of widely used generative AI applications such as OpenAI’s ChatGPT and Anthropic’s Claude.

Transformers, along with derivatives such as large language models (LLMs), rely on a distinctive architectural approach built around an innovative component called the "attention mechanism." The attention mechanism enables the model to focus to varying degrees on different parts of the input, enhancing its ability to process and understand complex sequential data. This capability is central to how LLMs process natural language, and it carries over to the broader use of transformers for audio streams, images, and video.
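
To make this concrete, the following is a minimal NumPy sketch of scaled dot-product attention, the computation at the heart of the attention mechanism. The function name, the toy dimensions, and the choice of self-attention (using the same matrix for queries, keys, and values) are illustrative choices for this sketch, not taken from any particular library:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted average of the rows of V,
    with weights derived from how well the queries Q match the keys K."""
    d_k = Q.shape[-1]
    # Similarity scores between every query and every key,
    # scaled by sqrt(d_k) for numerical stability
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores into weights that sum to 1,
    # so the model "attends" to some inputs more than others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the value vectors
    return weights @ V

# Toy example: 3 input tokens, each a 4-dimensional embedding
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention
print(out.shape)  # (3, 4)

In a real transformer layer, Q, K, and V are produced from the input by separate learned linear projections, and several such attention computations run in parallel, a design we return to when we discuss multi-head attention in section 1.1.4.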

Let’s start by comparing transformers with their predecessors, long short-term memory (LSTM) models, and then examine each component of a transformer in more detail.

1.1 The transformer breakthrough

1.1.1 Translation before transformers

1.1.2 How are transformers different?

1.1.3 Unveiling the attention mechanism

1.1.4 The power of multi-head attention

1.2 How to use transformers

1.3 When and why you’d want to use transformers

1.4 From transformer to LLM: the lasting blueprint

1.5 Summary