10 Similarity Search Trees: Approximate nearest neighbors search for image retrieval

This chapter covers

Discussing the limits of k-d trees
Describing image retrieval as a use case where k-d trees would struggle
Introducing a new data structure, the R-tree
Presenting SS-trees, a scalable variant of R-trees
Comparing SS-trees and k-d trees
Introducing approximate similarity search

This chapter will be structured slightly differently from our book’s standard, because we will continue here a discussion started in chapter 8. There, we introduced the problem of searching multidimensional data for the nearest neighbor(s) of a generic point (possibly not in the dataset itself). In chapter 9, we introduce k-d trees, a data structure specifically invented to solve this problem.

K-d trees are the best solution to date for indexing low- to medium-dimensional datasets that will completely fit in memory. When we have to operate on high-dimensional data or with big datasets that won’t fit in memory, k-d trees are not enough, and we will need to use more advanced data structures.

In this chapter we first present a new problem, one that will push our indexing data structure beyond its limits, and then introduce two new data structures, R-trees and SS-trees, that can help us solve this category of problems efficiently.

10.1 Right where we left off

10.1.1 A new (more complex) example

10.1.2 Overcoming k-d trees’ flaws

10.2 R-tree

10.2.1 A step back: Introducing B-trees

10.2.2 From B-Tree to R-tree

10.2.3 Inserting points in an R-tree

10.2.4 Search

10.3 Similarity search tree

10.3.1 SS-tree search

10.3.2 Insert

10.3.3 Insertion: Variance, means, and projections

10.3.4 Insertion: Split nodes

10.3.5 Delete

10.4 Similarity Search

10.4.1 Nearest neighbor search