11 Sequence-to-sequence

 

This chapter covers

  • Preparing a sequence-to-sequence dataset and loader
  • Combining RNNs with attention mechanisms
  • Building a machine translation model
  • Interpreting attention scores to understand a model’s decisions

Now that we have learned about attention mechanisms, we can wield them to build something new and powerful. In particular, we will develop an algorithm known as sequence-to-sequence (Seq2Seq for short) that can perform machine translation. As the name implies, this is an approach for getting neural networks to take one sequence as input and produce a different sequence as the output. Seq2Seq has been used to get computers to perform symbolic calculus,1 summarize long documents,2 and even translate from one language to another. I’ll show you step by step how we can translate from English to French. In fact, Google used essentially this same approach for its production machine-translation tool, and you can read about it at https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html. If you can imagine your inputs and outputs as sequences of things, there is a good chance Seq2Seq can help you solve the task.
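Before we build the full model, here is a minimal sketch of the core encoder-decoder idea in PyTorch: one RNN reads the input sequence into a hidden state, and a second RNN uses that state to produce the output sequence. The class and variable names here are hypothetical, and this sketch omits attention and the other details the rest of the chapter adds.

import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """A bare-bones encoder-decoder: not the chapter's model, just the shape of the idea."""
    def __init__(self, src_vocab, tgt_vocab, hidden=64):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, hidden)
        self.tgt_embed = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_tokens, tgt_tokens):
        # Encode the source sequence into a final hidden state ...
        _, h = self.encoder(self.src_embed(src_tokens))
        # ... then let the decoder read the target tokens (teacher forcing,
        # covered in section 11.3.2) and predict the next token at each step.
        dec_out, _ = self.decoder(self.tgt_embed(tgt_tokens), h)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits

src = torch.randint(0, 100, (2, 7))   # batch of 2 source sentences, length 7
tgt = torch.randint(0, 120, (2, 5))   # the corresponding target sentences, length 5
logits = TinySeq2Seq(100, 120)(src, tgt)
print(logits.shape)                   # torch.Size([2, 5, 120])

Notice that the input and output sequences can have different lengths and different vocabularies; that flexibility is what makes the approach a natural fit for translation.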

11.1 Sequence-to-sequence as a kind of denoising autoencoder

11.1.1  Adding attention creates Seq2Seq

11.2 Machine translation and the data loader

11.2.1  Loading a small English-French dataset

11.3 Inputs to Seq2Seq

11.3.1  Autoregressive approach

11.3.2  Teacher-forcing approach

11.3.3  Teacher forcing vs. an autoregressive approach

11.4 Seq2Seq with attention

11.4.1  Implementing Seq2Seq

11.4.2  Training and evaluation

Exercises

Summary