10 Data Structures for Databases: B-trees, Bε-trees, LSM-trees
This chapter covers
- Learning how database indexes work under the hood
- Exploring data structures that live underneath MySQL, LevelDB, RocksDB, TokuDB, etc.
- Learning what B-trees are and how lookups, inserts and deletes in B-trees work
- Understanding how
-trees work, and how buffering helps writes
- Learning how log-structured merge-trees (LSM-trees) work, and their performance benefits
Choosing the right database for one’s application requires some understanding of the way in which different database engines are built. Specifically, most databases implement indexes to speed up the search of their large and frequently queried tables. Commonly, an index is placed on one of the columns to speed up the searches on that column. To understand all the performance ramifications of creating an index, we need to understand how different databases build and maintain their indexes.
In most basic terms, an index is a data structure, usually separate from the database table itself, that helps to efficiently route the query to the correct row back into the table. Without an index, the search operation boils down to a linear scan of keys in a given column. When dealing with systems that record 10 billion new rows daily or more (or less!), it is safe to say that the linear search will not cut it.