9 K-d Trees: Multi-dimensional Data Indexing

 

This chapter covers

  • Efficiently indexing a 2-D (and in general k-D) dataset
  • Implementing nearest neighbor search with k-d trees
  • Discussing k-d trees strengths and flaws

This chapter will be structured slightly differently from our book’s standard, simply because we will continue here a discussion started in chapter 8. Back there, we introduced a problem, searching multidimensional data for the nearest neighbor(s) of a generic point (possibly not in the dataset itself); in chapter 9, we introduce k-d trees, a data structure specifically invented to solve this problem.

In this chapter, we follow up on those topics, and so we won’t introduce a new problem, but pick up the “closest hub” example from chapter 8, and show a different option to solve it, using an evolution of k-d trees, SS-trees[39].

Let’s recap where we left off in previous chapters: we are designing software for an e-commerce company, an application to find the closest warehouse selling a given product, for any point on a very large map: see figure 9.1 to visualize. To have a ballpark idea of the kind of scale we need, we want to serve millions of clients per day across the Country, taking products from thousands of warehouses, also spread across the whole map.

In section 8.2, we have already established that a brute-force solution where we skim through the whole list of points to compare differences can’t work for a live solution.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
sitemap

Unable to load book!

The book could not be loaded.

(try again in a couple of minutes)

manning.com homepage
test yourself with a liveTest