Part 3 begins by introducing neural attention (chapter 7). This chapter shows how simple forms of attention can improve model performance and deepen our understanding of what these models do with data. Chapter 8 introduces multitask learning, in which several tasks are learned simultaneously, a technique that can benefit each of the individual tasks. Chapter 9 introduces Transformers, including BERT and its competitor, XLNet. In chapter 10, we get hands-on with BERT and inspect the embeddings it produces.