4 Bloom filters: Reducing the memory for tracking content

 

This chapter covers

  • Describing and analyzing Bloom filters
  • Keeping track of large documents using little memory
  • Showing why dictionaries are an imperfect solution
  • Improving the memory print by using Bloom filters
  • Recognizing use cases where Bloom filters improve performance
  • Using metrics to tune the quality of Bloom filters’ solutions

Starting with this chapter we’ll be reviewing less common data structures that solve, as strange as it might seem, common problems. Bloom filters are one of the most prominent examples; they are widely used in most industries, but not as widely known as you would expect for such a cornerstone.

In section 4.1 we introduce the problem that will be our North Star in this chapter: we need to keep track of large entities with the smallest memory print possible.

In section 4.2 we continue our narration by discussing a few increasingly complex solutions, showing their strengths and weaknesses; the latter, in particular, ought to be considered chances for improvement and fertile ground for algorithms designers.

As part of this discussion, we introduce the dictionary, an abstract data type that we discuss in depth in section 4.3, while section 4.4 switches to concrete data structures that implement dictionaries: hash tables, binary search trees, and Bloom filters.

4.1 The dictionary problem: Keeping track of things

4.2 Alternatives to implementing a dictionary

4.3 Describing the data structure API: Associative array

4.4 Concrete data structures

4.4.1 Unsorted array: Fast insertion, slow search

4.4.2 Sorted arrays and binary search: Slow insertion, fast(-ish) search

4.4.3 Hash table: Constant-time on average, unless you need ordering

4.4.4 Binary search tree: Every operation is logarithmic

4.4.5 Bloom filter: As fast as hash tables, but saves memory (with a catch)

4.5 Under the hood: How do Bloom filters work?

4.6 Implementation

4.6.1 Using a Bloom filter

4.6.2 Reading and writing bits

4.6.3 Find where a key is stored

sitemap