9 How transformers work
This chapter covers
- An explanation of the text generation problem.
- An introduction to unsupervised learning.
- Learning structure using attention mechanism.
- Building up from simple probabilistic models to deep learning models.
- The transformer architecture and its variants and applications.
While earlier chapters have showcased deep learning’s capabilities in regression and classification, the true transformative power of this technology extends far beyond analyzing existing data. Deep learning now ventures into creative territory—generating entirely new images, composing original text, and even producing realistic videos. These generative capabilities, once considered to be within the exclusive purview of human intelligence, have become central to the current AI revolution, fueling much of the AI boom and enthusiasm we’ve witnessed in recent years.