Chapter 2. K-d Trees: Multidimensional data indexing

 

Chapter 9 from Advanced Algorithms and Data Structures by Marcello La Rocca

This chapter covers

  • Efficiently indexing a 2-D (and, in general, a k-D) dataset (a first sketch follows this list)
  • Implementing nearest neighbor search with k-d trees
  • Discussing the strengths and flaws of k-d trees
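
As a quick preview of the data structure developed in the rest of this chapter, here is a minimal, illustrative k-d tree in Python: a node holding a k-dimensional point, and an insertion routine that cycles through the coordinates, splitting on x at the root, on y one level down, and so on. The names KdNode and insert are placeholders for this sketch rather than the book's reference implementation, and balancing, removal, and search are left out here; they are covered in section 9.3.

class KdNode:
    """A single node of a k-d tree; 'point' is a tuple of k coordinates."""
    def __init__(self, point, level=0):
        self.point = point
        self.level = level          # depth in the tree; selects the split coordinate
        self.left = None
        self.right = None

def insert(node, point, level=0):
    """Insert 'point' into the (sub)tree rooted at 'node', cycling through
    the k coordinates: level 0 splits on x, level 1 on y, and so on."""
    if node is None:
        return KdNode(point, level)
    axis = level % len(point)       # the coordinate compared at this depth
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, level + 1)
    else:
        node.right = insert(node.right, point, level + 1)
    return node

# Example: index a handful of 2-D locations.
root = None
for p in [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0)]:
    root = insert(root, p)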

This chapter will be structured slightly differently from our book’s standard, simply because we continue here a discussion started in chapter 8. Back there, we introduced a problem: searching multidimensional data for the nearest neighbor(s) of a generic point (one possibly not in the dataset itself); in this chapter, we introduce k-d trees, a data structure specifically invented to solve this problem.

In this chapter, we follow up on those topics, so we won’t introduce a new problem; instead, we pick up the “closest hub” example from chapter 8 and show a data structure designed to solve it: the k-d tree.

9.1      Right Where We Left Off

Let’s recap where we left off in previous chapters: we are designing software for an e-commerce company, an application to find the closest warehouse selling a given product, for any point on a very large map (see figure 9.1 for a visualization). To get a ballpark idea of the scale we need to handle, we want to serve millions of clients per day across the country, shipping products from thousands of warehouses, themselves spread across the whole map.
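
To see why that scale matters, recall the baseline from chapter 8: without an index, the only option is a brute-force scan that measures the distance from the query point to every warehouse. The sketch below (the function name closest_warehouse is purely illustrative) makes the cost explicit: every single query touches all n warehouses, which quickly becomes prohibitive with thousands of warehouses and millions of queries per day.

from math import dist   # Euclidean distance between two points (Python 3.8+)

def closest_warehouse(query, warehouses):
    """Brute-force nearest neighbor: scan every warehouse and keep the closest.
    Each call costs O(n) distance computations for n warehouses."""
    best, best_distance = None, float("inf")
    for w in warehouses:
        d = dist(query, w)
        if d < best_distance:
            best, best_distance = w, d
    return best

# Example: one customer location checked against a few warehouse coordinates.
warehouses = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0)]
print(closest_warehouse((6.0, 5.0), warehouses))   # -> (5.0, 4.0)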

9.2      Moving to k-D Spaces: Cycle Through Dimensions

9.2.1   Constructing the BST

9.2.2   Invariants

9.2.3   The Importance of Being Balanced

9.3      Methods

9.3.1   Search

9.3.2   Insert

9.3.3   Balanced Tree

9.3.4   Remove

9.3.5   Nearest Neighbor

9.3.6   Region Search

9.3.7   A Recap of all Methods

9.4      Limits and Possible Improvements

9.5      Summary