3 Approximate Membership: Bloom filter and Quotient filter

 

This chapter covers

  • Learning what Bloom filters are, why and when they are useful
  • Understanding how Bloom filters work
  • Configuring a Bloom filter in a practical setting
  • Exploring the interplay between Bloom filter parameters
  • Learning about quotient filter as a Bloom filter replacement
  • Understanding how quotienting works, as well as merging and resizing of a quotient filter
  • Comparing the performance of Bloom filter and quotient filter

Bloom filters have become a standard in systems that process large datasets. Their widespread use, especially in networks and distributed databases, comes from the effectiveness they exhibit in situations where we need a hash table functionality, but do not have the luxury of space. They were invented in 1970s by Burton Bloom[1],[2] but they only really “bloomed” in the last few decades due to an increasing need to tame and compress big datasets. Bloom filters have also piqued the interest of the computer science research community who developed many variants on top of the basic data structure, in order to address some of its shortcomings and adapt it to different contexts.

3.1 How It Works

3.1.1 Insert

3.1.2 Lookup

3.2 Use Cases

3.2.1 Bloom filters in networks: Squid

3.2.2 Bitcoin mobile app

3.3 A simple implementation

3.4 Configuring a Bloom filter

3.4.1 Playing with Bloom filters: Mini experiments

3.5 A bit of theory

3.6 Bloom filter adaptations and alternatives

3.7 Quotient filter

3.7.1 Quotienting

3.7.2 Understanding metadata bits

3.7.3 Inserting into a quotient filter: an example

3.7.5 Resizing and merging

3.7.6 False positive rate and space considerations

3.8 Comparison between Bloom filters and quotient filters

3.9 Summary