After reading this chapter, you will have a practical command of basic and popular text embedding algorithms, and you will have developed insight into how to use embeddings for NLP. We will go through a number of concrete scenarios to reach that goal. But first, let’s review the basics of embeddings.
Embeddings are procedures for converting input data into vector representations. As mentioned in chapter 1, a vector is a container of numbers, such as an array. Every vector lives as a single point in a multidimensional vector space, with each of its values interpreted as a coordinate along a specific dimension. Embeddings result from systematic, well-crafted procedures for projecting (embedding) input data into such a space.
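To make the idea of “a point in a multidimensional space” concrete, here is a minimal sketch in Python. The mapping and its values are invented purely for illustration; they are not the output of any real embedding procedure:

```python
import numpy as np

# A toy "embedding": a procedure (here, just a lookup table) that maps each
# word to a point in a three-dimensional vector space. The numbers are
# made up for illustration only.
toy_embedding = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.9, 0.7]),
}

vector = toy_embedding["cat"]
print(vector.shape)  # (3,) -- one value per dimension
print(vector[1])     # the coordinate of "cat" along dimension 1
```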
We have seen ample vector representations of texts in chapters 1 and 2, such as one-hot vectors (binary-valued vectors with a single bit “on” for a specific word), which underlie bag-of-words representations, and frequency- or TF.IDF-based vectors. All these vector representations were created by embeddings.
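As a reminder of how such vectors can be produced in practice, the following sketch uses scikit-learn (an assumption of a recent version with `get_feature_names_out`; the book’s own examples may use different tooling) to build binary bag-of-words vectors and TF.IDF vectors for two toy documents:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat on the mat", "the dog chased the cat"]

# Binary bag-of-words: each document becomes a vector with a 1 for every
# vocabulary word it contains and a 0 otherwise.
onehot = CountVectorizer(binary=True)
print(onehot.fit_transform(docs).toarray())
print(onehot.get_feature_names_out())

# TF.IDF: the same vocabulary dimensions, but each value weighs a term's
# frequency in the document against how common it is across all documents.
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(docs).toarray().round(2))
```

Each row of the printed matrices is one document embedded as a point in a vector space whose dimensions correspond to vocabulary words.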