4 Bloom Filters:
Reducing the Memory Needed to Keep Track of Content
This chapter covers
- Describing and analyzing Bloom filters
- Solving a problem: keeping track of large documents using little memory
- Showing why dictionaries are an imperfect solution
- Improving the memory print by using Bloom filters
- Recognizing use cases where Bloom filters improve performance
- Using metrics to tune the quality of Bloom filters’ solution
Starting with this chapter we’ll be reviewing less common data structures that solve - as strange as it might seem - common problems. Bloom filters are one of the most prominent examples: they are widely used in most industries, but not as widely known as you would expect for such a cornerstone.
In section 4.1, we introduce the problem that will be our North star in this chapter: we need to keep track of large entities, with the smallest memory print possible.
In section 4.2 we continue our narration by discussing a few increasingly complex solutions, showing their strengths and weaknesses: the latter, in particular, ought to be considered chances of improvements, fertile ground for algorithms designers.
As part of this discussion, we introduce the dictionary, an abstract data type that we discuss in-depth in section 4.3, while section 4.4 switches to concrete data structures that implements dictionaries: hash tables, binary search trees, and bloom filters.