
5 State-of-the-art in deep learning: Transformers

 

This chapter covers

  • Representing text in numerical format for machine learning models
  • Building a Transformer model using the Keras sub-classing API

We have seen several different deep learning models so far: fully connected networks, convolutional neural networks, and recurrent neural networks (RNNs). We used a fully connected network to reconstruct corrupted images, a convolutional neural network to distinguish images of vehicles from other images, and an RNN to predict future CO2 concentration values. In this chapter, we are going to talk about a new type of model known as the Transformer.
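Before we get there, it is worth previewing the Keras sub-classing API mentioned above, since the chapter's Transformer is built with it: new layers are defined by subclassing tf.keras.layers.Layer and implementing build() and call(). The following is a minimal sketch of that pattern only; the SimpleDense layer here is a hypothetical stand-in, not one of the Transformer components we build later.

    import tensorflow as tf

    class SimpleDense(tf.keras.layers.Layer):
        """A toy layer illustrating the sub-classing pattern."""

        def __init__(self, units):
            super().__init__()
            self.units = units

        def build(self, input_shape):
            # Weights are created lazily, once the input shape is known.
            self.w = self.add_weight(
                shape=(input_shape[-1], self.units),
                initializer="glorot_uniform",
            )
            self.b = self.add_weight(shape=(self.units,), initializer="zeros")

        def call(self, inputs):
            # Forward computation: a simple affine transform.
            return tf.matmul(inputs, self.w) + self.b

    # A subclassed layer behaves like any built-in Keras layer.
    layer = SimpleDense(units=4)
    print(layer(tf.ones((2, 3))).shape)  # (2, 4)

The same three-method structure (constructor, build, call) is what we will use, at a larger scale, for the self-attention and fully connected sub-layers of the Transformer.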

5.1 Representing text as numbers

5.2 Understanding the Transformer model

5.2.1 The encoder-decoder view of the Transformer

5.2.2 Diving deeper

5.2.3 Self-attention layer

5.2.4 Understanding self-attention using scalars

5.2.5 Self-attention as a cooking competition

5.2.6 Masked self-attention layers

5.2.7 Multi-head attention

5.2.8 Fully connected layer

5.2.9 Putting everything together

Summary

Answers to exercises