Chapter 4. Sequence-to-sequence models and attention

 

Chapter 10 from Natural Language Processing in Action by Hobson Lane, Cole Howard, and Hannes Max Hapke

This chapter covers

  • Mapping one text sequence to another with a neural network
  • Understanding sequence-to-sequence tasks and how they differ from the tasks you’ve learned about so far
  • Using encoder-decoder model architectures for translation and chat
  • Training a model to pay attention to what is important in a sequence

You now know how to create natural language models and use them for everything from sentiment classification to generating novel text (see chapter 9).

Could a neural network translate from English to German? Or even better, would it be possible to predict disease by translating genotype to phenotype (genes to body type)?[1] And what about the chatbot we’ve been talking about since the beginning of the book? Can a neural net carry on an entertaining conversation? These are all sequence-to-sequence problems: they map an input sequence of indeterminate length to an output sequence whose length may differ and isn’t known in advance.

In this chapter, you’ll learn how to build sequence-to-sequence models using an encoder-decoder architecture.
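To give you a feel for where the chapter is headed, here is a minimal sketch of an encoder-decoder network written with Keras’s functional API. This is an illustrative outline, not the book’s own listing: the vocabulary sizes and the latent dimension below are placeholder values chosen only for the example.

# Minimal encoder-decoder sketch (illustrative values, not the book's listing)
from keras.models import Model
from keras.layers import Input, LSTM, Dense

num_encoder_tokens = 71   # placeholder: size of the input token vocabulary
num_decoder_tokens = 93   # placeholder: size of the output token vocabulary
latent_dim = 256          # placeholder: width of the LSTM "thought vector"

# Encoder: read the input sequence and keep only its final internal state
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
_, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: generate the output sequence, primed with the encoder's final state
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

# Train end to end on pairs of (input sequence, shifted target sequence)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

The key idea is that the encoder compresses the entire input sequence into its final LSTM state, and the decoder starts from that state to produce the output one token at a time; section 10.1 examines this architecture in detail.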

10.1. Encoder-decoder architecture

10.2. Assembling a sequence-to-sequence pipeline

10.3. Training the sequence-to-sequence network

10.4. Building a chatbot using sequence-to-sequence networks

10.5. Enhancements

10.6. In the real world

Summary