10 Training a Transformer to translate English to French

 

This chapter covers

  • Tokenizing English and French phrases into subwords
  • Understanding word embedding and positional encoding
  • Training a Transformer from scratch to translate English to French
  • Using the trained Transformer to translate an English phrase into French

In the last chapter, we built a Transformer from scratch that can translate between any two languages, based on the paper “Attention Is All You Need.”1 Specifically, we implemented the self-attention mechanism, using query, key, and value vectors to calculate scaled dot-product attention (SDPA).
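
As a quick refresher, the core computation looks like the following minimal PyTorch sketch. The function name and tensor shapes here are illustrative, not the exact code from the previous chapter:

import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Q, K, V: (batch, heads, seq_len, d_k)
    d_k = Q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Blocked positions (e.g., padding or future tokens) get -inf
        scores = scores.masked_fill(mask == 0, float("-inf"))
    # Each query's attention weights sum to 1 across the keys
    weights = torch.softmax(scores, dim=-1)
    return weights @ V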

To gain a deeper understanding of self-attention and Transformers, we’ll use English-to-French translation as our case study in this chapter. By working through the process of training a model to convert English sentences into French, you’ll see in concrete detail how the Transformer’s architecture and the attention mechanism operate.

10.1 Subword tokenization

 
 
 

10.1.1 Tokenizing English and French phrases
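
To make the idea of subword tokenization concrete, here is a minimal from-scratch sketch of byte-pair encoding (BPE), one common subword scheme. The helper name learn_bpe_merges is hypothetical, and the tokenizer actually used in this chapter may differ:

from collections import Counter

def learn_bpe_merges(corpus, num_merges):
    # Start from individual characters (plus an end-of-word marker),
    # then repeatedly merge the most frequent adjacent symbol pair.
    vocab = Counter()
    for word in corpus:
        vocab[tuple(word) + ("</w>",)] += 1
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

After enough merges, frequent words survive as single tokens while rare words split into reusable pieces, which keeps the vocabulary compact for both English and French.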

 

10.1.2 Sequence padding and batch creation
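
Sentences in a batch have different lengths, so shorter sequences are padded to match the longest one. PyTorch’s pad_sequence handles this directly; in this sketch, PAD_ID = 0 and the helper name make_batch are assumptions:

import torch
from torch.nn.utils.rnn import pad_sequence

PAD_ID = 0  # assumed id of the padding token

def make_batch(token_id_lists):
    # Turn variable-length lists of token ids into one rectangular
    # tensor, padding shorter sequences on the right with PAD_ID.
    seqs = [torch.tensor(ids, dtype=torch.long) for ids in token_id_lists]
    return pad_sequence(seqs, batch_first=True, padding_value=PAD_ID)

batch = make_batch([[5, 8, 2], [5, 9, 13, 4, 2]])
# tensor([[ 5,  8,  2,  0,  0],
#         [ 5,  9, 13,  4,  2]])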

 
 

10.2 Word embedding and positional encoding

 
 

10.2.1 Word embedding
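
A word-embedding layer maps each token id to a learned d_model-dimensional vector. Here is a minimal sketch following the “Attention Is All You Need” convention of scaling the embeddings by the square root of d_model; the class name TokenEmbedding is illustrative:

import math
import torch
from torch import nn

class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.d_model = d_model

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> (batch, seq_len, d_model)
        return self.embed(token_ids) * math.sqrt(self.d_model)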

 
 
 
 

10.2.2 Positional encoding
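
Because self-attention is order-agnostic, the original paper adds a sinusoidal positional encoding to the embeddings: position pos and dimension pair i receive the values sin(pos / 10000^(2i/d_model)) and cos(pos / 10000^(2i/d_model)). A minimal sketch, assuming an even d_model:

import math
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    # pe[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    # pe[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    position = torch.arange(max_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # (max_len, d_model); added to the token embeddings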

 
 
 
 

10.3 Training the Transformer for English-to-French translation

 
 
 

10.3.1 Loss function and the optimizer
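
A common setup, and the one the original paper builds on, is cross-entropy loss over the French vocabulary combined with the Adam optimizer. In this sketch, the pad id of 0 and the fixed learning rate are assumptions (the paper itself uses Adam with a warmup schedule rather than a constant rate):

import torch
from torch import nn

def make_loss_and_optimizer(model, pad_id=0, lr=1e-4):
    # ignore_index excludes padded target positions from the loss,
    # so the model isn't rewarded for predicting padding tokens.
    loss_fn = nn.CrossEntropyLoss(ignore_index=pad_id)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr,
                                 betas=(0.9, 0.98), eps=1e-9)
    return loss_fn, optimizer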

 
 
 

10.3.2 The training loop
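
A typical training loop uses teacher forcing: feed the French target shifted right and train the model to predict the next token at every position. The model(src, tgt_in) call signature below is an assumption about how the Transformer from the previous chapter is invoked:

def train_one_epoch(model, batches, loss_fn, optimizer, device="cpu"):
    # Each batch holds padded source (English) and target (French) ids.
    model.train()
    total_loss = 0.0
    for src, tgt in batches:
        src, tgt = src.to(device), tgt.to(device)
        # Input is the target without its last token; the label is the
        # target shifted one step left (next-token prediction).
        tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]
        logits = model(src, tgt_in)  # (batch, seq_len, vocab_size), assumed
        loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                       tgt_out.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(batches)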

 
 
 

10.4 Translating English to French with the trained model
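
At inference time there is no French target to feed in, so decoding is autoregressive. Here is a minimal greedy-decoding sketch; the model call signature and the bos/eos token ids are assumptions, and the chapter’s own decoding procedure may differ:

import torch

def greedy_translate(model, src_ids, bos_id, eos_id, max_len=50):
    # src_ids: (1, src_len) tensor of English token ids.
    # Start from the beginning-of-sentence token and repeatedly append
    # the most probable next token until end-of-sentence (or max_len).
    model.eval()
    ys = torch.tensor([[bos_id]], dtype=torch.long)
    with torch.no_grad():
        for _ in range(max_len):
            logits = model(src_ids, ys)          # assumed signature
            next_id = logits[0, -1].argmax().item()
            ys = torch.cat([ys, torch.tensor([[next_id]])], dim=1)
            if next_id == eos_id:
                break
    return ys.squeeze(0).tolist()  # French token ids, to be detokenized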

 
 
 

Summary

 
 
 