
3 Text embeddings


This chapter covers

  • Preparing texts for deep learning, using word and document embeddings
  • Weighing the benefits and drawbacks of self-developed text embeddings against pre-trained embeddings developed by others
  • Implementing word similarity with Word2Vec
  • Retrieving documents via document embeddings, using Doc2Vec

After reading this chapter, you will have a practical command of basic, popular text embedding algorithms, and you will have gained insight into how to use embeddings for NLP. We will work through a number of concrete scenarios to reach that goal.

But first, let’s review the basics of embeddings.

3.1 Embeddings

Figure 3.1. There are different types of embeddings: representational and procedural. We will discuss two types of procedural embeddings in detail: Word2Vec and Doc2Vec.

Embeddings are procedures for converting input data into vector representations. As mentioned in Chapter 1, a vector is a container (such as an array) of numbers. Every vector is a single point in a multidimensional vector space, with each of its values interpreted as a coordinate along one of the space's dimensions. Embeddings result from systematic, well-crafted procedures for projecting ('embedding') input data into such a space.
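To make the idea of a point in a vector space concrete, here is a minimal sketch (illustrative code, not a listing from this book) that embeds words as one-hot vectors. The tiny vocabulary and the one_hot function are assumptions made up for this example; real embeddings, as we will see, are computed or learned far more systematically.

import numpy as np

# A toy vocabulary; every word gets its own dimension in the vector space.
# (This vocabulary is a made-up example, not data from the book.)
vocabulary = ["cat", "dog", "apple"]

def one_hot(word):
    """Embed a word as a one-hot vector: a single point in a space
    with one dimension per vocabulary entry."""
    vector = np.zeros(len(vocabulary))
    vector[vocabulary.index(word)] = 1.0
    return vector

print(one_hot("dog"))  # [0. 1. 0.]: a point in a three-dimensional space

Each word lands on its own axis, so every pair of distinct words is equally far apart. That is exactly the limitation that the embeddings in this chapter overcome: they place related words close together in the space.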

3.1.1 Embedding by direct computation: representational embeddings

3.1.2 Learning to embed: procedural embeddings

3.2 From words to vectors: Word2Vec

3.3 From documents to vectors: Doc2Vec

3.4 Summary

3.5 Resources

3.6 References