10 Data structures for databases: B-trees, Bε-trees, and LSM-trees

 

This chapter covers

  • Learning how database indices work under the hood
  • Exploring data structures that live underneath MySQL, LevelDB, RocksDB, TokuDB, and so on
  • Learning what B-trees are and how lookups, inserts, and deletes in B-trees work
  • Understanding how Bε-trees work and how buffering helps writes
  • Learning how log-structured merge trees (LSM-trees) work and their performance benefits

Choosing the right database for one’s application requires some understanding of the way in which different database engines are built. Specifically, most databases implement indices to speed up the search of their large and frequently queried tables. Commonly, an index is placed on one of the columns to speed up the searches on that column. To understand all the performance ramifications of creating an index, we need to understand how different databases build and maintain their indices.

In the most basic terms, an index is a data structure, usually separate from the database table itself, that helps to efficiently route the query to the correct row back into the table. Without an index, the search operation boils down to a linear scan of keys in a given column. When dealing with systems that record 10 billion new rows daily or more (or less!), it is safe to say that the linear search will not cut it.

10.1 How indexing works

10.2 Data structures in this chapter

10.3 B-trees

10.3.1 B-tree balancing

10.3.2 Lookup

10.3.3 Insert

10.3.4 Delete

10.3.5 B+-trees

10.3.6 How operations on a B+-tree are different

10.3.7 Use case: B-trees in MySQL (and many other places)

10.4 Math bit: Why are B-tree lookups optimal in external memory?

10.4.1 Why B-tree inserts/deletes are not optimal in external memory

10.5 Bε-trees