3 Math with words: Term frequency–inverse document frequency vectors
- Counting words, n-grams, and term frequencies to analyze meaning
- Predicting word occurrence probabilities with Zipf’s law
- Representing natural language texts as vectors
- Finding relevant documents in a collection of text using document frequencies
- Estimating the similarity of pairs of documents with cosine similarity
3.1 Bag-of-words vectors
3.2 Vectorizing text with the DataFrame constructor
3.2.1 Faster, better, easier token counting
3.2.2 Vectorizing your code
3.2.3 Vector space TF–IDF (term frequency–inverse document frequency)
3.3 Vector distance and similarity
3.3.1 Dot product
3.4 Counting TF–IDF frequencies
3.4.1 Analyzing “this”
3.5 Zipf’s law
3.6 Inverse document frequency
3.6.1 Return of Zipf
3.6.2 Relevance ranking
3.6.3 Smoothing out the math
3.7 Using TF–IDF for your bot
3.8 What’s next
3.9 Test yourself
Summary