The idea for writing this book took shape while we were teaching together at the International University of Sarajevo. In discussions with our students, many of whom were working for local companies, we realized that data structures for massive data were becoming common in the everyday work of data engineers and data scientists. It was not just the Googles and the Facebooks of the world that employed these techniques to solve their scalability problems; companies with much smaller data footprints were also starting to face ever-increasing demands on data-processing speed.
Over lunch, we would ponder where a student learning to deploy HyperLogLog or a Bloom filter in a working production system could go for an application-friendly overview of these data structures. The original papers introducing them were often mathematically deep but offered little context for a data engineer trying to fit such a data structure into a real system with real data. Aside from an occasional blog post featuring a data structure implementation, resources that gathered the algorithmic knowledge specific to massive data were scarce to nonexistent.