4 Finding meaning in word counts: Semantic analysis
This chapter covers
- Analyzing semantics (meaning) to create topic vectors
- Semantic search using the semantic similarity between topic vectors
- Scalable semantic analysis and semantic search for large corpora
- Using semantic components (topics) as features in your NLP pipeline
- Navigating high-dimensional vector spaces
In the first few chapters, you learned quite a few natural language processing tricks, but this chapter may be the first time you get to do a little bit of “magic.” This is the first time we will talk about a machine being able to understand the meanings of words.
The term frequency–inverse document frequency (TF–IDF) vectors you learned about in chapter 3 helped you estimate the importance of words in a chunk of text. You used TF–IDF vectors and matrices to tell you how important each word is to the overall meaning of a bit of text in a document collection. These TF–IDF “importance” scores worked not only for individual words but also for short sequences of words, n-grams. They are great for searching text if you know the exact words or n-grams you’re looking for, but they have limitations: often you need a representation that captures not just the counts of words but also their meanings.
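This exact-word limitation is easy to see in code. Here is a minimal sketch using scikit-learn's TfidfVectorizer (the tiny corpus and query strings are illustrative assumptions, not part of the book's pipeline): a query phrased with synonyms shares no tokens with the documents, so its TF–IDF cosine similarity is zero even though its meaning clearly overlaps with one of them.

```python
# Minimal sketch (assumes scikit-learn is installed) of the exact-word limitation of TF-IDF.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "The cat sat on the mat.",
    "A dog chased the ball in the park.",
]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)  # document-term TF-IDF matrix

# A query that reuses words from the corpus matches the right document...
exact_query = vectorizer.transform(["cat on the mat"])
print(cosine_similarity(exact_query, tfidf))   # high score for the first document

# ...but a query phrased with synonyms shares no tokens, so TF-IDF finds nothing.
synonym_query = vectorizer.transform(["feline napping atop rugs"])
print(cosine_similarity(synonym_query, tfidf))  # [[0. 0.]] -- no lexical overlap, no match
```

Topic vectors, the subject of this chapter, are designed to close exactly this gap by scoring documents on their semantic content rather than their literal word counts.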