chapter eight

8 Nearest Neighbors search

 

This chapter covers

  • Solving a problem: finding closest points in a multidimensional dataset
  • Introducing data structures to index multidimensional space
  • Understanding the issues in indexing high dimensional spaces
  • Introducing efficient nearest neighbor search

So far in this book we have worked with containers that were holding unidimensional data: the entries we stored in queues, trees and hash tables were always assumed to be (or to be translatable to) numbers, simple values that could be compared in the most intuitive mathematical sense.

 In this chapter, we will see how this simplification doesn’t always hold true in real datasets, and the issues connected to handling more complex, multidimensional data. Do not despair, though, as in the next chapters we will also describe data structures that can help handle these data, and see real applications that leverage efficient nearest neighbor search as part of their workflow, like clustering.

Let’s start our journey for this chapter from figure 8.1: a map showing a few cities (taken from an alternate universe) and the locations of some warehouses in their area.

Imagine you are living in 1990s, the dawn of the Internet era, with e-commerce moving its very first steps.