Chapter 7. Searching across languages

This chapter covers

  • Cross-language information retrieval
  • Statistical machine translation
  • Seq2seq models for machine translation
  • Word embeddings for machine translation
  • Comparing the effectiveness of machine translation methods for search

In this chapter, we’ll focus on serving users who speak, read, and write queries in a language other than the one your documents are written in. Specifically, you’ll see how to use machine translation to build a search engine that automatically translates queries, so that a query written in one language can retrieve content written in others. We’ll look at how this ability is useful in various contexts, from common web searches to more specific cases where it’s important not to miss search results because of a language barrier. The benefit of translating queries automatically is that your search engine can reach more users without requiring you to store multiple copies of each text document in different languages.
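To make the idea concrete before going into the details, here is a minimal sketch of query translation at search time. It is only an illustration under loud assumptions: the tiny Italian-to-English dictionary, the three sample documents, and every function name are hypothetical, and the word-by-word lookup merely stands in for the machine translation models this chapter covers. The point it shows is that documents are indexed once, in their original language, and only the query gets translated.

```python
from collections import defaultdict

# Hypothetical toy dictionary (Italian -> English). A real system would call
# a machine translation model here instead of a lookup table.
TOY_DICTIONARY = {
    "gatto": "cat",
    "nero": "black",
    "cane": "dog",
}

# Hypothetical documents, stored once and only in English.
DOCUMENTS = {
    1: "a black cat sleeps on the sofa",
    2: "the dog chases a ball",
    3: "black coffee in the morning",
}


def build_index(documents):
    """Build a minimal inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def translate_query(query, dictionary):
    """Translate word by word, keeping words the dictionary does not know."""
    return " ".join(dictionary.get(term, term) for term in query.lower().split())


def search(query, index):
    """Rank documents by the number of query terms they contain."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def cross_language_search(query, index, dictionary):
    """Search with the original query terms plus their translations."""
    translated = translate_query(query, dictionary)
    return search(f"{query} {translated}", index)


index = build_index(DOCUMENTS)
print(search("gatto nero", index))                                 # no translation
print(cross_language_search("gatto nero", index, TOY_DICTIONARY))  # with translation
```

Without translation, the Italian query matches nothing in the English index. With translation, the same query retrieves the English documents about a black cat and black coffee. Keeping the original terms alongside the translated ones is a design choice in this sketch, so that names and untranslatable words can still match.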

7.1. Serving users who speak multiple languages

7.2. Statistical machine translation

7.3. Working with parallel corpora

7.4. Neural machine translation

7.5. Word and document embeddings for multiple languages

Summary