11 Large Language Models (LLMs)

 

This chapter covers

  • Understanding the intuition behind large language models
  • Identifying and preparing LLM training data
  • Working through the operations of training a large language model
  • Exploring implementation details and LLM tuning approaches

What are large language models?

Large language models (LLMs) are machine learning models specialized for natural language processing tasks, such as language generation. Consider the autocomplete feature on your mobile device’s keyboard (figure 11.1). When you start typing “Hey, what are…”, the keyboard likely predicts that the next word is “you”, “we”, or “the”, because these are the most common next words after that phrase. It makes this choice by scanning a table of probabilities built from large amounts of commonly available text: this simple table is a language model.

Figure 11.1 Example of autocomplete as a language model
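
To make the “table of probabilities” idea concrete, here is a minimal sketch of an autocomplete-style model: count which word follows each word in a tiny corpus (a bigram table), then report the most probable next words. The corpus and function names are illustrative, not the book’s code.

from collections import Counter, defaultdict

# Tiny illustrative corpus; "." marks sentence boundaries
corpus = (
    "hey what are you doing . hey what are we eating . "
    "hey what are you thinking . what are the odds ."
).split()

# next_counts[w] maps a word to a Counter of the words observed after it
next_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    next_counts[current_word][next_word] += 1

def predict_next(word, k=3):
    """Return the k most probable next words with their probabilities."""
    counts = next_counts[word]
    total = sum(counts.values())
    if total == 0:
        return []  # word never appeared as a context
    return [(w, c / total) for w, c in counts.most_common(k)]

print(predict_next("are"))  # [('you', 0.5), ('we', 0.25), ('the', 0.25)]

Running predict_next("are") reproduces the keyboard’s behavior from figure 11.1: “you” is the most probable continuation, followed by “we” and “the”.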

A large language model (LLM) is built on exactly the same idea, with some fundamental upgrades: far more training text, far more parameters, and an architecture that conditions on the entire preceding sequence rather than a single word. These upgrades enable interesting capabilities that come with predicting more than one word at a time, as the sketch that follows illustrates.
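
Predicting more than one word at a time follows naturally from the same table: feed each predicted word back in as the new context and repeat. The sketch below continues the bigram example above (reusing its next_counts table); this greedy loop is only an illustration, since a real LLM conditions on the whole preceding sequence, not just the last word.

# Greedy generation loop over the bigram table from the previous sketch
def generate(word, length=4):
    words = [word]
    for _ in range(length):
        candidates = next_counts[words[-1]]
        if not candidates:
            break  # no observed continuation for this word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(generate("hey"))  # 'hey what are you doing'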

The rest of this chapter works through these upgrades step by step:

  • The intuition behind language prediction
  • Why the size of tokens and parameters matters
  • An LLM training workflow
  • Preparing training data
      • Selecting and collecting data
      • Cleaning and preprocessing data
  • Encoding: From text to numbers
      • Tokenization
      • Vectorization
  • Designing the ANN architecture (and why transformers)
  • Encoding: Creating trainable embeddings
      • Sampling a batch of tokens
      • Creating a trainable embedding matrix
      • Creating positional encodings
      • Combining the embedding matrix and positional encodings
  • Self-attention: Start training the LLM
      • Linear weight matrix projections
      • Ask every other token
      • Calculating attention weights
      • Weighted sum
      • Multiple attention heads
      • Layer normalization
  • Decoding: Meaning through neural networks
      • Project up layer
      • Project down layer
      • Layer normalization
  • Stacking transformer blocks
  • Making a prediction
  • Backpropagation and calculating loss
  • Controlling the LLM
      • Training epochs
      • Saving checkpoints
      • Stopping mechanisms
  • Hyperparameter tuning
  • Few-shot and zero-shot learning
  • Refining LLMs with reinforcement learning