10 Training a Transformer to translate English to French
This chapter covers
- Tokenizing English and French phrases to subwords
- Understanding word embedding and positional encoding
- Training a Transformer from scratch to translate English to French
- Using the trained Transformer to translate an English phrase into French
In the last chapter, we built a Transformer from scratch that can translate between any two languages, based on the paper “Attention Is All You Need.”1 Specifically, we implemented the self-attention mechanism, using query, key, and value vectors to calculate scaled dot-product attention (SDPA).
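As a quick refresher before we put it to work, the SDPA computation described above can be sketched in a few lines. This is a minimal NumPy illustration of the formula softmax(QKᵀ/√d_k)V for a single sequence, not the exact code from the previous chapter (the function name and toy shapes are chosen here for illustration):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one sequence."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_len, seq_len) similarity scores
    # Numerically stable softmax over each row
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                         # each output is a weighted mix of values

# Toy example: 3 tokens, head dimension d_k = 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)    # shape (3, 4)
```

Each row of the attention weights sums to 1, so every output vector is a convex combination of the value vectors, weighted by query-key similarity.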
To deepen our understanding of self-attention and Transformers, we’ll use English-to-French translation as our case study in this chapter. By working through the process of training a model to convert English sentences into French, you will gain a concrete understanding of the Transformer’s architecture and how the attention mechanism functions.