This chapter covers
- Introducing computer limitations that affect the design of data-intensive applications
- Describing the external memory model (the DAM model)
- Building simple scanning, searching, and merging algorithms in external memory
- Reviewing use cases where data scientists and programmers work with huge files
- Using Big-O notation to measure the I/O efficiency of these algorithms
This chapter introduces fundamental ideas that form the basis of part 3 of the book. We begin with external memory algorithms and the external memory model [1]. This model teaches us how to reason about the efficiency of algorithms and data structures when working with large datasets stored on disk.
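To make the model concrete, here is a minimal sketch (in Python, not from the book) of how cost is counted in the external memory model: only block transfers between disk and RAM matter, so scanning a file of N bytes with block size B costs roughly ⌈N/B⌉ I/Os, regardless of the computation performed on each block. The block size of 4,096 bytes is an illustrative assumption.

```python
BLOCK_SIZE = 4096  # assumed block size B, in bytes

def scan_count_lines(path):
    """Scan a file block by block, counting block transfers.

    In the DAM model only the transfers count: a full scan of an
    N-byte file costs ceil(N / B) I/Os, while the CPU work done
    on each block in RAM is treated as free.
    """
    io_count = 0
    newlines = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)  # one block transfer
            if not block:               # end of file
                break
            io_count += 1
            newlines += block.count(b"\n")
    return newlines, io_count
```

For a 1 GiB file and 4 KiB blocks, this scan costs 262,144 block transfers whether the file holds ten lines or ten million; the model deliberately charges nothing for the in-RAM counting.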
Most applications maintain data on some type of local or remote storage, files and databases being prominent examples. Storage offers the flexibility of capturing large amounts of data persistently and cheaply. Even when a system benefits from data summaries that can satisfy queries quickly from RAM, we still want to preserve the original data on slower, larger storage. As we saw with Bloom filters and Google’s WebTable, when the query returns Present, we make a trip to disk either to fetch the (key, value) pair and its metadata or to establish that we have a false positive.
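A minimal sketch of that lookup path, assuming a hypothetical `bloom` object that supports membership tests with no false negatives and a hypothetical `store.get(key)` that performs the actual disk read:

```python
def lookup(key, bloom, store):
    # bloom: hypothetical in-RAM Bloom filter; `key not in bloom`
    # means the key is definitely absent (no false negatives).
    # store: hypothetical on-disk (key, value) store; store.get(key)
    # is the expensive disk access and returns None when absent.
    if key not in bloom:
        return None               # answered from RAM, no disk I/O
    value = store.get(key)        # one trip to disk
    if value is None:
        return None               # the filter gave a false positive
    return value
```

The point of the filter is that the cheap in-RAM test screens out most absent keys, so the expensive disk access happens only for keys that are present, or for the occasional false positive.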