10 Sequence-to-sequence models and attention

 

This chapter covers

  • Mapping one text sequence to another with a neural network
  • Understanding sequence-to-sequence tasks and how they’re different from the others you’ve learned about
  • Using encoder-decoder model architectures for translation and chat
  • Training a model to pay attention to what is important in a sequence

You now know how to create natural language models and use them for everything from sentiment classification to generating novel text (see chapter 9).

Could a neural network translate from English to German? Or even better, would it be possible to predict disease by translating genotype to phenotype (genes to observable traits)?[1] And what about the chatbot we’ve been talking about since the beginning of the book? Can a neural net carry on an entertaining conversation? These are all sequence-to-sequence problems. They map an input sequence of indeterminate length to an output sequence whose length is also unknown in advance and need not match the input's.

In this chapter, you’ll learn how to build sequence-to-sequence models using an encoder-decoder architecture.

10.1 Encoder-decoder architecture

Which of our previous architectures do you think might be useful for sequence-to-sequence problems? The word vector embedding model of chapter 6? The convolutional net of chapter 7 or the recurrent nets of chapter 8 and chapter 9? You guessed it; we’re going to build on the LSTM architecture from the last chapter.
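Before diving in, it may help to see the shape of the idea in miniature. The following is only a rough sketch, not the model this chapter builds: it swaps the LSTM for a plain tanh recurrent cell, uses random untrained weights, and relies only on the Python standard library. All names here (`HIDDEN`, `encode`, `decode`, the `START` and `STOP` token IDs) are hypothetical choices made for illustration. The point is that the encoder squeezes an input of any length into one fixed-size vector, and the decoder unrolls that vector into an output whose length it decides for itself.

```python
import math
import random

random.seed(0)
HIDDEN = 4           # size of the fixed-length "thought" vector (assumed)
VOCAB = 6            # toy vocabulary size (assumed)
START, STOP = 0, 1   # hypothetical start and stop token IDs


def rand_matrix(rows, cols):
    """Random weights; a real model would learn these during training."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)]
            for _ in range(rows)]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def rnn_step(w_in, w_h, x, h):
    """One step of a plain recurrent cell: h_new = tanh(w_in @ x + w_h @ h)."""
    return [
        math.tanh(sum(w_in[i][j] * x[j] for j in range(len(x)))
                  + sum(w_h[i][k] * h[k] for k in range(len(h))))
        for i in range(len(h))
    ]


def encode(token_ids, w_in, w_h):
    """Read the whole input and keep only the final hidden state."""
    h = [0.0] * HIDDEN
    for token in token_ids:
        h = rnn_step(w_in, w_h, one_hot(token, VOCAB), h)
    return h


def decode(thought, w_in, w_h, w_out, max_len=10):
    """Generate tokens one at a time, starting from the encoder's state."""
    h, token, output = thought, START, []
    for _ in range(max_len):
        h = rnn_step(w_in, w_h, one_hot(token, VOCAB), h)
        scores = [sum(w_out[v][k] * h[k] for k in range(HIDDEN))
                  for v in range(VOCAB)]
        token = max(range(VOCAB), key=scores.__getitem__)  # greedy choice
        if token == STOP:
            break
        output.append(token)
    return output


# Separate weights for the encoder and the decoder
enc_w_in, enc_w_h = rand_matrix(HIDDEN, VOCAB), rand_matrix(HIDDEN, HIDDEN)
dec_w_in, dec_w_h = rand_matrix(HIDDEN, VOCAB), rand_matrix(HIDDEN, HIDDEN)
dec_w_out = rand_matrix(VOCAB, HIDDEN)

short_thought = encode([2, 3], enc_w_in, enc_w_h)
long_thought = encode([2, 3, 4, 5, 2, 3, 4], enc_w_in, enc_w_h)
reply = decode(long_thought, dec_w_in, dec_w_h, dec_w_out)
print(len(short_thought), len(long_thought), reply)
```

Both inputs, two tokens and seven tokens long, come out of the encoder as vectors of the same size. Because the weights are random, the decoded token IDs are meaningless; training is what would make them a translation or a reply.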

10.1.1 Decoding thought

10.1.2 Look familiar?

10.1.3 Sequence-to-sequence conversation

10.1.4 LSTM review

10.2 Assembling a sequence-to-sequence pipeline

10.2.1 Preparing your dataset for the sequence-to-sequence training

10.2.2 Sequence-to-sequence model in Keras

10.2.3 Sequence encoder

10.2.4 Thought decoder
