chapter nine
9 How transformers work
This chapter covers
- An explanation of the text generation problem
- An introduction to unsupervised learning
- Learning structure using an attention mechanism
- Building up from simple probabilistic models to deep learning models
- The transformer architecture and its variants and applications
Earlier chapters showcased deep learning’s capabilities in regression and classification, but the technology’s reach extends well beyond analyzing existing data. Deep learning models can now generate new images, compose original text, and even produce realistic videos. These generative capabilities, once considered the exclusive purview of human intelligence, are central to the current AI boom and account for much of the enthusiasm around AI in recent years.