3 Approximate membership: Bloom and quotient filters

This chapter covers

Learning what Bloom filters are and why and when they are useful
Configuring a Bloom filter in a practical setting
Exploring the interplay in Bloom filter parameters
Learning about quotient filters as Bloom filter replacements
Comparing the performance of a Bloom filter and a quotient filter

Bloom filters have become a standard in systems that process large datasets. Their widespread use, especially in networks and distributed databases, comes from the effectiveness they exhibit in situations where we need a hash table functionality but do not have the luxury of space. They were invented in the 1970s by Burton Bloom [1], but they only really “bloomed” in the last few decades due to an increasing need to tame and compress big datasets. Bloom filters have also piqued the interest of the computer science research community, which developed many variants on top of the basic data structure in order to address some of the filters’ shortcomings and adapt them to different contexts.

3.1 How it works

3.1.1 Insert

3.1.2 Lookup

3.2 Use cases

3.2.1 Bloom filters in networks: Squid

3.2.2 Bitcoin mobile app

3.3 A simple implementation

3.4 Configuring a Bloom filter

3.4.1 Playing with Bloom filters: Mini experiments

3.5 A bit of theory

3.5.1 Can we do better?

3.6 Bloom filter adaptations and alternatives

3.7 Quotient filter

3.7.1 Quotienting

3.7.2 Understanding metadata bits