chapter eight

8 Nearest Neighbors Search

This chapter covers

Solving a problem: finding closest points in a multidimensional dataset
Introducing data structures to index multidimensional space
Understanding the issues in indexing high-dimensional spaces
Introducing efficient nearest neighbor search

So far in this book we have worked with containers that hold one-dimensional data: the entries we stored in queues, trees, and hash tables were always assumed to be (or to be translatable to) numbers, simple values that could be compared in the most intuitive mathematical sense.

In this chapter, we will see that this simplification doesn’t always hold for real datasets, and we will look at the issues that come with handling more complex, multidimensional data. Do not despair, though: in the next chapters we will also describe data structures that can help handle such data, and we will see real applications, like clustering, that leverage efficient nearest neighbor search as part of their workflow.
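To make the difference concrete, here is a minimal Python sketch of why the one-dimensional intuition breaks down. It is only an illustration under stated assumptions: the function name `nearest_neighbor`, the sample points, and the choice of Euclidean distance are all hypothetical, and the linear scan shown is the naive baseline, not the data structure developed in this chapter.

```python
import math

def nearest_neighbor(points, target):
    """Naive linear scan: return the point closest to `target`.

    Assumes Euclidean distance; compares the target with every point.
    """
    return min(points, key=lambda p: math.dist(p, target))

# Hypothetical 2D dataset and query point, chosen for illustration only.
points = [(0.0, 5.0), (1.0, 0.0), (3.0, 3.0)]
target = (0.9, 4.0)

# Closest point when both coordinates are taken into account.
closest = nearest_neighbor(points, target)

# Closest point if we treat the data as one-dimensional and
# compare only the first coordinate, as we would with plain numbers.
closest_by_x = min(points, key=lambda p: abs(p[0] - target[0]))

print(closest)       # nearest by Euclidean distance
print(closest_by_x)  # nearest by first coordinate only
```

With these sample values the two answers differ: comparing a single coordinate picks a point that is far away once the second coordinate is considered. Ordering multidimensional points by one number is therefore not enough, and scanning the whole dataset for every query costs time linear in the number of points.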

8.1 The Nearest Neighbors Search Problem

8.2 Solutions

8.2.1 First Attempts

8.2.2 Sometimes Caching is NOT the Answer

8.2.3 Simplify Things to Get a Hint

8.2.4 Carefully Choose a Data Structure

8.3 Description and API

8.4 Moving to k-dimensional Spaces

8.4.1 Unidimensional Binary Search

8.4.2 Moving to Higher Dimensions

8.4.3 Modeling 2D Partitions with a Data Structure

8.5 Summary