12 Training a Transformer to generate text
This chapter covers
- Building a scaled-down version of the GPT-2XL model tailored to your needs
- Preparing data for training a GPT-style Transformer
- Training a GPT-style Transformer from scratch
- Generating text using the trained GPT model
In chapter 11, we built the GPT-2XL model from scratch but could not train it ourselves because of its sheer size: training a model with 1.5 billion parameters requires supercomputing facilities and an enormous amount of data. Instead, we loaded OpenAI's pretrained weights into our model and used it to generate text.
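The chapter-11 workflow, in a nutshell, was "take the published GPT-2XL weights and sample text from them." The sketch below reproduces only that workflow, and it does so with the Hugging Face transformers library rather than the hand-built PyTorch model of chapter 11; the library, the model name string ("gpt2-xl"), the prompt, and the generation settings are assumptions used purely for illustration, not the book's code.

```python
# Minimal sketch (assumed stand-in, not the chapter-11 implementation):
# load OpenAI's pretrained GPT-2 XL weights via Hugging Face transformers
# and sample a short continuation of a prompt.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
model.eval()

prompt = "The old man looked at the sea and"   # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,                  # length of the continuation
        do_sample=True,                     # sample instead of greedy decoding
        top_k=50,                           # keep only the 50 most likely tokens
        temperature=0.9,                    # soften the next-token distribution
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

In this chapter, by contrast, we train a much smaller GPT-style model end to end, so the remaining sections walk through the data preparation, model definition, and training loop ourselves.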
12.1 Building and training a GPT from scratch
12.1.1 The architecture of a GPT to generate text
12.1.2 The training process of the GPT model to generate text
12.2 Tokenizing text of Hemingway novels
12.2.1 Tokenizing the text
12.2.2 Creating batches for training
12.3 Building a GPT to generate text
12.3.1 Model hyperparameters
12.3.2 Modeling the causal self-attention mechanism
12.3.3 Building the GPT model
12.4 Training the GPT model to generate text
12.4.1 Training the GPT model
12.4.2 A function to generate text
12.4.3 Text generation with different versions of the trained model
Summary