Chapter 12
This chapter covers
- Building a scaled-down version of the GPT-2 XL model tailored to your needs
- Preparing data for training a GPT-style Transformer
- Training a GPT-style Transformer from scratch
- Generating text using the trained GPT model
In chapter 11, we developed the GPT-2 XL model from scratch but were unable to train it because of its vast number of parameters: training a model with 1.5 billion parameters requires supercomputing facilities and an enormous amount of data. Consequently, we loaded OpenAI's pretrained weights into our model and used GPT-2 XL to generate text.
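To see why training GPT-2 XL from scratch is impractical on ordinary hardware, note that at 32-bit precision its 1.5 billion parameters alone occupy roughly 6 GB of memory, before counting activations, gradients, and optimizer state. As a reminder of the workflow from chapter 11, here is a minimal sketch of loading pretrained GPT-2 XL weights and generating text. It assumes the Hugging Face transformers library rather than our from-scratch implementation, and the prompt string and sampling settings are purely illustrative.

```python
# A minimal sketch using the Hugging Face transformers library
# (an alternative to the from-scratch model built in chapter 11).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Download the pretrained GPT-2 XL weights (1.5B parameters, ~6 GB at fp32).
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
model.eval()  # inference mode: disables dropout

prompt = "The future of artificial intelligence"  # illustrative prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation; max_length and top_k are illustrative settings.
with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=50,
        do_sample=True,
        top_k=50,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Sampling with do_sample=True and top_k trades determinism for variety; setting do_sample=False instead would return the single most likely continuation at each step.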