10 Similarity Search Trees
Approximate Nearest Neighbors Search for Image Retrieval
This chapter covers
- Discussing the limits of k-d trees
- Describing image retrieval as a use case where k-d trees would struggle
- Introducing a new data structure, the R-tree
- Presenting SS-trees, a scalable variant of R-trees
- Comparing SS-trees and k-d trees
- Introducing approximate similarity search
This chapter will be structured slightly differently from our book’s standard, simply because we will continue here a discussion started in chapter 8. Back there, we introduced a problem, searching multidimensional data for the nearest neighbor(s) of a generic point (possibly not in the dataset itself); in chapter 9, we introduce k-d trees, a data structure specifically invented to solve this problem.
K-d trees are the best solution to date for indexing low to medium dimensional datasets that will completely fit in memory; when, however, either we have to operate on high-dimensional data, or with big datasets that won’t fit in memory, then k-d trees are not enough, we will need to use more advanced data structures.
In this chapter we first present a new problem, one that will push our indexing data-structure beyond its limits, and then introduce two new data structures, R-trees and SS-trees, that can help us solve this category of problems efficiently.