concept count - min sketch in category algorithms

appears as: count-min sketch, count-min sketch, The count-min sketch
Algorithms and Data Structures for Massive Datasets MEAP V01

This is an excerpt from Manning's book Algorithms and Data Structures for Massive Datasets MEAP V01.

In this chapter, we will learn how to solve popularity problems of interest such as top-k queries, heavy hitters and frequency range queries under the constraint of limited space and time. We will see that with the limitations of the streaming model, many problems that had rather trivial solutions before can now only be solved approximately, yet with count-min sketch, we can achieve enormous space savings and not lose a lot of accuracy.

To that end, we will learn about Count-Min sketch. Count-min sketch has been devised by Cormode and Muthukrishnan in 2005[41] and can be thought of as a young, up-and-coming cousin of Bloom filter. Similarly to how Bloom filter answers membership queries approximately with less space than hash tables, the count-min sketch estimates frequencies of items in less space than a hash table or any linear-space key-value dictionary. Another important similarity is that the count-min sketch is hashing-based, so we continue in the vein of using hashing to create compact and approximate sketches of data. But as we will see in this chapter, count-min sketch is a different animal than Bloom filter, mainly due to the contexts in which its main task of estimating frequency arises.

An example of how estimate works is shown in Figure 4.3 below. As we can see, count-min sketch can overestimate the actual frequency of an item when, during updates for different items, hashes collide, but the overestimate only happens if there was a collision in each row.

Figure 4.3: Example of estimate operations on the count-min sketch from Figure 4.2. In the case of element y, whose true frequency is 1, count-min sketch reports the correct answer of 1 (the minimum of 1, 3 and 1). However, in the case of the element x, whose true frequency is 2, count-min sketch reports 3 (the minimum of 5, 3 and 5). Refer to the Figure 3.2 to convince yourself that during earlier update operations, y and z together incremented all the counters that are used by x, thus resulting in an overestimate for x.
sitemap

Unable to load book!

The book could not be loaded.

(try again in a couple of minutes)

manning.com homepage
test yourself with a liveTest