12 Training a Transformer to generate text

 

This chapter covers

  • Building a scaled-down version of the GPT-2XL model tailored to your needs
  • Preparing data for training a GPT-style Transformer
  • Training a GPT-style Transformer from scratch
  • Generating text using the trained GPT model

In chapter 11, we built the GPT-2XL model from scratch but did not train it ourselves because of its vast number of parameters. Training a model with 1.5 billion parameters requires supercomputing facilities and an enormous amount of data. Consequently, we loaded pretrained weights from OpenAI into our model and then used it to generate text.
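As a quick reminder of what that looked like in practice, the short sketch below loads OpenAI's pretrained GPT-2XL weights and generates a continuation of a prompt. It uses the Hugging Face transformers library as a stand-in for chapter 11's from-scratch model and manual weight loading, so the classes, model name, and prompt shown here are assumptions for illustration rather than the book's own code.

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the tokenizer and the 1.5-billion-parameter GPT-2XL weights released
# by OpenAI (pulled here from the Hugging Face hub as a stand-in for the
# manual weight loading done in chapter 11).
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
model.eval()

# Encode an illustrative prompt and sample a 40-token continuation.
prompt = "The old man looked at the sea and said"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=40,
                            do_sample=True, top_k=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

In this chapter, by contrast, we build a much smaller GPT-style model that we can train ourselves from scratch, so no pretrained weights are needed.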

12.1 Building and training a GPT from scratch

12.1.1 The architecture of a GPT to generate text

12.1.2 The training process of the GPT model to generate text

12.2 Tokenizing text of Hemingway novels

12.2.1 Tokenizing the text

12.2.2 Creating batches for training

12.3 Building a GPT to generate text

12.3.1 Model hyperparameters

12.3.2 Modeling the causal self-attention mechanism

12.3.3 Building the GPT model

12.4 Training the GPT model to generate text

12.4.1 Training the GPT model

12.4.2 A function to generate text

12.4.3 Text generation with different versions of the trained model

Summary