Chapter 12. Efficiently finding frequent itemsets with FP-growth

 

This chapter covers

  • Finding common patterns in transaction data
  • The FP-growth algorithm
  • Finding co-occurring words in a Twitter feed

Have you ever gone to a search engine, typed in a word or part of a word, and the search engine automatically completed the search term for you? Perhaps it recommended something you didn’t even know existed, and you searched for that instead. That has happened to me, sometimes with comical results when I started a search with “why does....” To come up with those search terms, researchers at the search company used a version of the algorithm we’ll discuss in this chapter. They looked at words used on the internet and found pairs of words that frequently occur together.[1] This requires a way to find frequent itemsets efficiently.

1 J. Han, J. Pei, Y. Yin, R. Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery 8 (2004), 53–87.

12.1. FP-trees: an efficient way to encode a dataset

12.2. Build an FP-tree

12.3. Mining frequent items from an FP-tree

12.4. Example: finding co-occurring words in a Twitter feed

12.5. Example: mining a clickstream from a news site

12.6. Summary