chapter twelve

Chapter 12. Efficiently finding frequent itemsets with FP-growth

This chapter covers

Finding common patterns in transaction data
The FP-growth algorithm
Finding co-occurring words in a Twitter feed

Have you ever gone to a search engine, typed in a word or part of a word, and the search engine automatically completed the search term for you? Perhaps it recommended something you didn’t even know existed, and you searched for that instead. That has happened to me, sometimes with comical results when I started a search with “why does....” To come up with those search terms, researchers at the search company used a version of the algorithm we’ll discuss in this chapter. They looked at words used on the internet and found pairs of words that frequently occur together.^[1] This requires a way to find frequent itemsets efficiently.

¹ J. Han, J. Pei, Y. Yin, R. Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery 8 (2004), 53–87.

Chapter 12. Efficiently finding frequent itemsets with FP-growth

This chapter covers

12.1. FP-trees: an efficient way to encode a dataset

12.2. Build an FP-tree

12.3. Mining frequent items from an FP-tree

12.4. Example: finding co-occurring words in a Twitter feed

12.5. Example: mining a clickstream from a news site

12.6. Summary