1 Introduction


This chapter covers

  • What this book is about and its structure
  • What makes this book different from other books on algorithms
  • How massive datasets shape the design of algorithms and data structures
  • How this book can help you design practical algorithms at a workplace
  • Computer and system architecture fundamentals that make large amounts of data challenging for today’s systems

Since you picked up this book, you might be wondering what the algorithms and data structures for massive datasets are and what makes them different from “normal” algorithms you might have encountered thus far. Does the title of this book imply that the classical algorithms (e.g., binary search, merge sort, quicksort, depth-first search, breadth-first search, and many other fundamental algorithms) as well as canonical data structures (e.g., arrays, matrices, hash tables, binary search trees, heaps) were built exclusively for small datasets?

1.1 An example

1.1.1 An example: How to solve it

1.1.2 How to solve it, take two: A book walkthrough

1.2 The structure of this book

1.3 What makes this book different and whom it is for

1.4 Why is massive data so challenging for today’s systems?

1.4.1 The CPU memory performance gap

1.4.2 Memory hierarchy

1.4.3 Latency vs. bandwidth

1.4.4 What about distributed systems?

1.5 Designing algorithms with hardware in mind