9 Stackable deep learning: Transformers
This chapter covers
- Seeing how transformers enable limitless stacking and scaling
- Fine-tuning transformers for your application
- Applying transformers to extractive and abstractive summarization of long documents
- Generating plausible, grammatically correct text with transformers
- Estimating the information capacity of a transformer
Transformers are changing the world. The increased intelligence transformers bring to AI is transforming culture, society, and the economy. For the first time, transformers are making us question the long-term economic value of human intelligence and creativity. And the ripple effects of transformers extend beyond the economy. Transformers are changing not only how we work and play, but even how we think, communicate, and create. In less than a year, transformer-enabled AI, known as large language models (LLMs), created whole new job categories, such as prompt engineering, real-time content curation, and fact-checking (grounding). Tech companies are racing to recruit engineers who can design effective LLM prompts and incorporate LLMs into their workflows. Transformers are automating and accelerating productivity for information economy jobs that previously required a level of creativity and abstraction out of reach for machines.