
5 State-of-the-art in deep learning: Transformers

 

This chapter covers

  • Representing text in numerical format for machine learning models
  • Implementing a small-scale Transformer, using the Keras sub-classing API to create reusable layers that form the basic building blocks of the Transformer model (a brief sketch of this style appears after this list)
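To give a flavor of what "reusable layers via sub-classing" means, here is a minimal sketch: it turns a short sentence into integer token IDs and passes them through a small sub-classed embedding layer. This is not the implementation we build later in the chapter; the vocabulary, layer name, and dimensions are made up purely for illustration.

import tensorflow as tf
from tensorflow.keras import layers

# Toy vocabulary mapping words to integer IDs (made up for this sketch).
vocabulary = {"<pad>": 0, "i": 1, "like": 2, "transformers": 3}

def to_token_ids(sentence, vocab, max_length=5):
    """Convert a sentence to a fixed-length list of integer token IDs."""
    ids = [vocab.get(word, 0) for word in sentence.lower().split()]
    # Pad (or truncate) so every sequence has the same length.
    return ids[:max_length] + [0] * max(0, max_length - len(ids))

class TokenEmbedding(layers.Layer):
    """A reusable layer written with the Keras sub-classing API."""
    def __init__(self, vocab_size, output_dim, **kwargs):
        super().__init__(**kwargs)
        # Maps each integer token ID to a dense vector of size output_dim.
        self.embedding = layers.Embedding(vocab_size, output_dim)

    def call(self, inputs):
        return self.embedding(inputs)

token_ids = tf.constant([to_token_ids("I like Transformers", vocabulary)])
layer = TokenEmbedding(vocab_size=len(vocabulary), output_dim=8)
print(token_ids.numpy())        # e.g. [[1 2 3 0 0]]
print(layer(token_ids).shape)   # (1, 5, 8)

The layers we develop in this chapter follow the same pattern: each one is a self-contained Keras layer that can be composed with others to assemble the full Transformer.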

We have seen many different deep learning models so far: fully-connected networks, convolutional neural networks, and recurrent neural networks. We used a fully-connected network to reconstruct corrupted images, a convolutional neural network to distinguish images of vehicles from other images, and finally an RNN to predict future CO2 concentration values. In this chapter, we are going to talk about a new type of model known as the Transformer.

5.1      Representing text as numbers

5.2      Understanding the Transformer model

5.2.1   The encoder-decoder view of the Transformer

5.2.2   Diving deeper

5.2.3   Self-attention layer

5.2.4   Understanding self-attention using scalars

5.2.5   Self-attention as a cooking competition

5.2.6   Masked self-attention layers

5.2.7   Multi-head attention

5.2.8   Fully-connected layer

5.2.9   Putting everything together

5.3      Summary

5.4      Answers