3 Approximate Membership and Bloom Filter

This chapter covers:

Learning what Bloom filters are, why and when they are useful
Understanding how Bloom filters work
Configuring a Bloom filter in a practical setting
Exploring the interplay between Bloom filter parameters
Learning about quotient filter as a Bloom filter replacement
Understanding how quotient filter works, and its comparison to the Bloom filter

Bloom filters seem to be all the rage these days. Most self-respecting industry blogs have articles fleshing out how Bloom filters enhance the performance in their infrastructure, and there are dozens of Bloom filter implementations floating around in various programming languages, each touting its own benefits. Bloom filters are also interesting to computer science researchers, who have, in past decade, designed many modifications and alternatives to the basic data structure, enhancing its various aspects. A skeptic and a curmudgeon in you might ask himself: What’s all the hype?

The large part of the reason behind Bloom filter popularity is that they have that combination of being a fairly simple data structure to design and implement, yet very useful in many contexts. They were invented in 1970s by Burton Bloom[28],[29] but they only really “bloomed” in the last few decades with the onslaught of large amount of data in various domains, and the need to tame and compress such huge datasets.

3.1 How It Works

3.1.1 Insert

3.1.2 Lookup

3.2 Use Cases

3.2.1 Bloom Filters in Networks: Squid

3.2.2 Bitcoin mobile app

3.3 Configuring a Bloom filter for your application

3.3.1 Examples

3.4 A bit of theory

3.4.1 Can we do better?

3.5 Further reading: Bloom filter adaptations and alternatives

3.6 Quotient filter

3.6.1 Quotienting

3.6.2 Resizing

3.7 Summary