chapter ten

10 Data Structures for Databases: B-trees, B^ε-trees, LSM-trees

This chapter covers

Learning how database indexes work under the hood
Exploring data structures that live underneath MySQL, LevelDB, RocksDB, TokuDB, etc.
Learning what B-trees are and how lookups, inserts and deletes in B-trees work
Understanding how -trees work, and how buffering helps writes
Learning how log-structured merge-trees (LSM-trees) work, and their performance benefits

Choosing the right database for one’s application requires some understanding of the way in which different database engines are built. Specifically, most databases implement indexes to speed up the search of their large and frequently queried tables. Commonly, an index is placed on one of the columns to speed up the searches on that column. To understand all the performance ramifications of creating an index, we need to understand how different databases build and maintain their indexes.

In most basic terms, an index is a data structure, usually separate from the database table itself, that helps to efficiently route the query to the correct row back into the table. Without an index, the search operation boils down to a linear scan of keys in a given column. When dealing with systems that record 10 billion new rows daily or more (or less!), it is safe to say that the linear search will not cut it.

10.1 How indexing works

10.2 Data structures in this chapter

10.3 B-trees

10.3.1 B-tree balancing

10.3.2 Lookup

10.3.3 Insert

10.3.4 Delete

10.3.5 B⁺-trees

10.3.6 How operations on a B^ε-tree are different

10.3.7 Use case: B-trees in MySQL (and many other places)

10.4 Math bit: Why are B-tree lookups optimal in external memory?

10.4.1 Why B-tree inserts/deletes are not optimal in external memory?

10.5 B^ε-trees

10.5.1 B^ε-tree: How it works

10.5.2 Buffering mechanics

10.5.3 Inserts and deletes

10.5.4 Lookups

10.5.5 Cost Analysis

10.5.6 B^ε-tree: The spectrum of data structures

10.5.7 Use case: B^ε-trees in TokuDB

10.5.8 Make haste slowly, the I/O way

10.6 Log-Structured Merge-trees (LSM-trees)

10.6.1 LSM-tree: How it works

10.6.2 LSM-tree cost analysis

10.6.3 Use case: LSM-trees in Cassandra

10.7 Summary

10 Data Structures for Databases: B-trees, Bε-trees, LSM-trees