6 Reasoning with word vectors (Word2vec)

 

This chapter covers

  • Understanding how word vectors are created
  • Using pretrained models for your applications
  • Reasoning with word vectors to solve real problems
  • Visualizing word vectors
  • Uncovering some surprising uses for word embeddings

One of the most exciting recent advancements in NLP is the “discovery” of word vectors. This chapter will help you understand what they are and how to use them to do some surprisingly powerful things. You’ll learn how to recover some of the fuzziness and subtlety of word meaning that was lost in the approximations of earlier chapters.

In previous chapters, we ignored the nearby context of a word: the words that surround it. We ignored the effect a word's neighbors have on its meaning and how those relationships contribute to the overall meaning of a statement. Our bag-of-words approach jumbled all the words from each document together into one statistical bag. In this chapter, you'll create much smaller bags of words from a "neighborhood" of only a few words, typically fewer than 10 tokens. You'll also ensure that these neighborhoods of meaning don't spill over into adjacent sentences, which keeps your word vector training focused on the relevant nearby words.
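The short sketch below shows one way to carve tokenized sentences into such neighborhoods. The context_windows helper and its window_size parameter are illustrative names invented for this example, not part of gensim or any other library; Word2vec implementations perform this windowing for you internally during training.

    # A minimal sketch of building token "neighborhoods" for word vector training.
    # The helper name and window_size parameter are illustrative, not a library API.

    def context_windows(sentences, window_size=5):
        """Yield (center_word, context_words) pairs one sentence at a time,
        so neighborhoods never spill over into adjacent sentences."""
        for tokens in sentences:
            for i, center in enumerate(tokens):
                # Take up to `window_size` tokens on each side of the center word.
                left = tokens[max(0, i - window_size):i]
                right = tokens[i + 1:i + 1 + window_size]
                yield center, left + right

    sentences = [
        "the quick brown fox jumps over the lazy dog".split(),
        "word vectors capture meaning from context".split(),
    ]

    for center, context in context_windows(sentences, window_size=2):
        print(center, context)

Running this with a window size of 2 prints each word alongside at most two neighbors on either side, taken only from the same sentence. That small, sentence-bounded neighborhood is the raw material Word2vec learns from.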

6.1 Semantic queries and analogies

6.1.1 Analogy questions

6.2 Word vectors

6.2.1 Vector-oriented reasoning

6.2.2 How to compute Word2vec representations

6.2.3 How to use the gensim.word2vec module

6.2.4 How to generate your own word vector representations

6.2.5 Word2vec vs. GloVe (Global Vectors)

6.2.6 fastText

6.2.7 Word2vec vs. LSA

6.2.8 Visualizing word relationships

6.2.9 Unnatural words

6.2.10 Document similarity with Doc2vec

Summary
