chapter one

1 Introduction

This chapter covers

What this book is about and its structure
What makes this book different than other books on algorithms
How massive datasets shape the design of algorithms and data structures
How this book can help you design practical algorithms at a workplace
Fundamentals of computer and system architecture that make massive data challenging for today’s systems

Having picked up this book, you might be wondering what the algorithms and data structures for massive datasets are, and what makes them different than “normal” algorithms you might have encountered thus far? Does the title of this book imply that the classical algorithms (e.g., binary search, merge sort, quicksort, depth-first search, breadth-first search and many other fundamental algorithms) as well as canonical data structures (e.g., arrays, matrices, hash tables, binary search trees, heaps) were built exclusively for small datasets? And if so, why no one has told you that.

1.1 An example

1.1.1 An example: how to solve it

1.1.2 How to solve it, take two/ A book walkthrough

1.2 The structure of this book

1.3 What makes this book different and whom it is for

1.4 Why is massive data so challenging for today’s systems?

1.4.1 The CPU-memory performance gap

1.4.2 Memory hierarchy

1.4.3 Latency vs. bandwidth

1.4.4 What about distributed systems?

1.5 Designing algorithms with hardware in mind

1.6 Summary