This chapter covers

- How the attention mechanism assigns weights to elements in a sequence (see the sketch after this list)
- Building an encoder–decoder transformer from scratch for language translation
- Word embedding and positional encoding
- Training a transformer from scratch to translate German to English
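As a preview of the first item above, here is a minimal sketch of scaled dot-product attention, the computation the chapter develops in sections 2.1.1 and 2.3.1. The function name, tensor shapes, and toy inputs are illustrative assumptions, not the chapter's implementation:

```python
import math

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Return the attention output and the weights assigned to each element."""
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # Block masked positions (e.g., padding or future tokens)
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v, weights

# Toy example: 1 batch, a 4-token sequence, 8-dimensional vectors
q = k = v = torch.randn(1, 4, 8)
out, weights = scaled_dot_product_attention(q, k, v)
print(weights.sum(dim=-1))  # every row of weights sums to 1.0
```

Because each row of `weights` sums to 1, every output vector is a weighted average of the value vectors; this is the sense in which attention assigns weights to elements in a sequence.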
Understanding attention and transformer architectures is foundational for modern generative AI, especially for text-to-image models. This chapter comes at the very beginning of our journey to build a text-to-image generator from scratch for two reasons:
2.1 An overview of attention and transformers
2.1.1 How the attention mechanism works
2.1.2 How to create a transformer
2.2 Word embedding and positional encoding
2.2.1 Word tokenization with the spaCy library
2.2.2 A sequence padding function
2.2.3 Input embedding from word embedding and positional encoding
2.3 Creating an encoder–decoder transformer
2.3.1 Coding the attention mechanism
2.3.2 Defining the Transformer() class
2.3.3 Creating a language translator
2.4 Training and using the German-to-English translator
2.4.1 Training the encoder–decoder transformer
2.4.2 Translating German to English with the trained model
Summary
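Sections 2.2.1 through 2.2.3 build the input pipeline sketched below: tokenize with spaCy, pad to a fixed length, then add a positional encoding to the word embeddings. This is a minimal sketch under stated assumptions, not the chapter's implementation: it uses the standard sinusoidal encoding from the original transformer paper, the tiny vocabulary is made up for illustration, and the German spaCy model must be installed (`python -m spacy download de_core_news_sm`).

```python
import spacy
import torch

nlp = spacy.load("de_core_news_sm")  # assumes the German model is installed

def tokenize(text):
    # 2.2.1: split raw text into word tokens with spaCy
    return [tok.text.lower() for tok in nlp(text)]

def pad(tokens, max_len, pad_token="<pad>"):
    # 2.2.2: truncate long sequences; right-pad short ones to a fixed length
    return tokens[:max_len] + [pad_token] * max(0, max_len - len(tokens))

def sinusoidal_positional_encoding(seq_len, d_model):
    # 2.2.3: fixed sin/cos position signal (assumes d_model is even)
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)
    angle = position / torch.pow(10000.0, i / d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)  # even dimensions
    pe[:, 1::2] = torch.cos(angle)  # odd dimensions
    return pe

# Input embedding = word embedding + positional encoding
vocab = {"<pad>": 0, "das": 1, "wetter": 2, "ist": 3, "heute": 4, "schön": 5, ".": 6}
ids = torch.tensor([vocab.get(t, 0) for t in pad(tokenize("Das Wetter ist heute schön."), 10)])
emb = torch.nn.Embedding(len(vocab), 512)
x = emb(ids) + sinusoidal_positional_encoding(len(ids), 512)
print(x.shape)  # torch.Size([10, 512])
```

Sinusoidal encodings are fixed rather than learned, so the same function works for any sequence length at inference time.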