9 Introducing generative models

 

This chapter covers

  • An explanation of the text generation problem
  • An introduction to self-supervised learning
  • Learning structure with attention mechanisms
  • Building up from simple probabilistic models to deep learning models
  • The transformer architecture, its variants, and their applications

Previously, we’ve seen AI, specifically deep learning, excel at classification and regression. The potential of this technology, however, goes beyond passive data analysis: it extends to generating new data, such as images, text, and even video. This creative work, once considered the exclusive purview of human intelligence, is now being taken on by deep learning models, fueling much of the AI boom and enthusiasm we’ve witnessed in recent years.
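To make the problem concrete before we dive in, here is a minimal sketch of the character-by-character approach this chapter builds up from: count how often each character follows another in a small list of names (the bigram model of section 9.2.1), then sample new names from those counts. The sample names and the helper function here are illustrative assumptions, not the chapter's actual dataset or code.

import random
from collections import defaultdict

# Illustrative training names; the chapter builds a real dataset in section 9.3.
names = ["emma", "olivia", "ava", "isabella", "sophia", "mia"]

# Count bigram transitions; "." marks both the start and the end of a name.
counts = defaultdict(lambda: defaultdict(int))
for name in names:
    chars = ["."] + list(name) + ["."]
    for prev, nxt in zip(chars, chars[1:]):
        counts[prev][nxt] += 1

def sample_name():
    """Sample one name character by character from the bigram counts."""
    out, prev = [], "."
    while True:
        choices, weights = zip(*counts[prev].items())
        nxt = random.choices(choices, weights=weights)[0]
        if nxt == ".":  # end-of-name marker: stop generating
            return "".join(out)
        out.append(nxt)
        prev = nxt

print(sample_name())  # e.g. "mia", or a novel blend such as "avia"

Every model in this chapter, from this simple bigram counter to the transformer, follows the same recipe: predict a distribution over the next character (or token) given what came before, then sample from it.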

9.1 A motivating example: generating names character by character

9.2 Self-supervised learning

9.2.1 Limits of the bigram model

9.3 Generating our training data

9.4 Embeddings and multi-layer perceptrons

9.5 Attention

9.5.1 Dot product self-attention

9.5.2 Scaled dot product causal self-attention

9.6 Transformers

9.6.1 The decoder

9.7 Other transformer architectures

9.7.1 The encoder

9.7.2 The encoder-decoder

9.8 Tokenization

9.8.1 Generating sentences

9.9 Conclusion

9.10 Exercises

9.11 Summary