4 Frequency Estimation and Count-Min Sketch
This chapter covers:
- Understanding the streaming model and its constraints
- Exploring practical use cases where frequency estimates arise and how count-min sketch can help
- Learning how count-min sketch works
- Exploring the error in count-min sketch and configuring the data structure
- Understanding how range queries can be solved with count-min sketch
- Exploring heavy hitters and approximate heavy hitters with count-min sketch
Measuring frequency is one of the most common operations in today’s data-intensive applications. Any kind of popularity analysis on a massive-data application, such as producing the bestseller list on Amazon.com, computing top-k trending queries on Google, or monitoring the most frequent source-destination IP address pairs on the network are all frequency estimation problems. Estimating frequency also shows up when monitoring changes in systems that are awake 24/7, such as sensor networks or surveillance cameras. Here we can observe changes in parameters such as the temperature or location change of a sensor in the ocean, new object appearance in the frame, or the number of units by which a stock on the stock market rose or fell in a given time interval. For this purpose, in the next section, we introduce the streaming model of data that emphasizes challenges related to this particular setup.