This chapter covers
- Mapping one text sequence to another with a neural network
- Understanding sequence-to-sequence tasks and how they differ from the other tasks you’ve learned about
- Using encoder-decoder model architectures for translation and chat
- Training a model to pay attention to what is important in a sequence
You now know how to create natural language models and use them for everything from sentiment classification to generating novel text (see chapter 9).
Could a neural network translate from English to German? Or even better, would it be possible to predict disease by translating genotype to phenotype (genes to body type)?[1] And what about the chatbot we’ve been talking about since the beginning of the book? Can a neural net carry on an entertaining conversation? These are all sequence-to-sequence problems. They map one sequence of indeterminate length to another sequence whose length is also unknown.
1 geno2pheno: https://academic.oup.com/nar/article/31/13/3850/2904197.
In this chapter, you’ll learn how to build sequence-to-sequence models using an encoder-decoder architecture.
Which of our previous architectures do you think might be useful for sequence-to-sequence problems? The word vector embedding model of chapter 6? The convolutional net of chapter 7, or the recurrent nets of chapters 8 and 9? You guessed it: we’re going to build on the LSTM architecture from the last chapter.
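Before we dig into the details, here is a rough sketch of the idea, assuming a Keras-style API; the layer sizes and token counts below are illustrative placeholders, not the values we’ll use later. An encoder LSTM reads the input sequence and hands its final internal state to a decoder LSTM, which generates the output sequence one token at a time.

```python
# A minimal encoder-decoder sketch, assuming Keras-style LSTM layers.
# num_tokens and latent_dim are hypothetical placeholder values.
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

num_tokens = 71    # hypothetical vocabulary size (e.g. one-hot characters)
latent_dim = 256   # hypothetical size of the encoder's "thought vector"

# Encoder: read the input sequence and keep only its final internal state
encoder_inputs = Input(shape=(None, num_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: generate the output sequence, seeded with the encoder's state
decoder_inputs = Input(shape=(None, num_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_outputs = Dense(num_tokens, activation='softmax')(decoder_outputs)

# Train on pairs of input sequences and target sequences
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
```

The only link between the two halves is that final encoder state, a fixed-length summary of the input. Squeezing a whole sequence through that bottleneck is exactly the limitation that the attention mechanism later in this chapter is designed to relieve.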