Chapter 3. From plain retrieval to text generation

This chapter covers

  • Expanding queries
  • Using search logs to build training data
  • Understanding recurrent neural networks
  • Generating alternative queries with RNNs

In the early days of the internet and search engines (the late 1990s), users searched with keywords only. A user might have typed “movie zemeckis future” to find information about the movie Back to the Future, directed by Robert Zemeckis. Although search engines have evolved and today accept queries in natural language, many users still rely on keywords when searching. For these users, it would be helpful if the search engine could generate a proper query from the keywords they type: for example, taking “movie Zemeckis future” and generating “Back to the Future by Robert Zemeckis.” Let’s call the generated query an alternative query, in the sense that it’s an alternative (text) representation of the information need expressed by the user.

This chapter will teach you how to add text-generation capabilities to your search engine so that, given a user query, it will generate a few alternative queries to run under the hood together with the original one. The goal is to express the query in additional ways so as to widen the net of the search—without asking the user to think of or type in alternatives. To add text generation to a search engine, you’ll use a powerful architecture for neural networks called a recurrent neural network (RNN).
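To make this concrete before diving into RNNs, here is a minimal sketch of the "under the hood" step: run the original query together with a few generated alternatives and merge the results. Both `generate_alternatives` (which would eventually wrap the RNN-based generator built later in this chapter) and `run_search` (which queries the underlying search engine) are hypothetical callables introduced here for illustration, not part of any particular library.

```python
def search_with_alternatives(query, generate_alternatives, run_search,
                             max_alternatives=2):
    """Run the original query plus generated alternatives; merge the results.

    `generate_alternatives` and `run_search` are placeholder callables:
    the first stands in for a learned query generator, the second for
    the search engine, which returns (doc_id, score) pairs.
    """
    queries = [query] + list(generate_alternatives(query))[:max_alternatives]
    seen = set()
    merged = []
    for q in queries:
        for doc_id, score in run_search(q):
            if doc_id not in seen:  # deduplicate documents found by several queries
                seen.add(doc_id)
                merged.append((doc_id, score))
    # return the widened result set, best-scoring documents first
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged
```

The key point is that the user never sees the alternative queries: they type one query, and the engine quietly casts a wider net on their behalf.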

3.1. Information need vs. query: Bridging the gap

3.2. Learning over sequences

3.3. Recurrent neural networks

3.4. LSTM networks for unsupervised text generation

3.5. From unsupervised to supervised text generation

3.6. Considerations for production systems

Summary