Chapter 2. K-d Trees: Multidimensional data indexing

 

Chapter 9 from Advanced Algorithms and Data Structures by Marcello La Rocca

This chapter covers

  • Efficiently indexing a 2-D (and, in general, a k-D) dataset (a first sketch follows this list)
  • Implementing nearest neighbor search with k-d trees
  • Discussing the strengths and flaws of k-d trees
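
As a quick preview of the data structure developed in the rest of this chapter, here is a minimal, illustrative k-d tree in Python: a node holding a k-dimensional point, and an insertion routine that cycles through the coordinates, splitting on x at the root, on y one level down, and so on. The names KdNode and insert are placeholders for this sketch rather than the book's reference implementation, and balancing, removal, and search are left out here; they are covered in section 9.3.

class KdNode:
    """A single node of a k-d tree; 'point' is a tuple of k coordinates."""
    def __init__(self, point, level=0):
        self.point = point
        self.level = level          # depth in the tree; selects the split coordinate
        self.left = None
        self.right = None

def insert(node, point, level=0):
    """Insert 'point' into the (sub)tree rooted at 'node', cycling through
    the k coordinates: level 0 splits on x, level 1 on y, and so on."""
    if node is None:
        return KdNode(point, level)
    axis = level % len(point)       # the coordinate compared at this depth
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, level + 1)
    else:
        node.right = insert(node.right, point, level + 1)
    return node

# Example: index a handful of 2-D locations.
root = None
for p in [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0)]:
    root = insert(root, p)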

This chapter will be structured slightly differently from our book’s standard, simply because we continue here a discussion started in chapter 8. Back there, we introduced a problem: searching multidimensional data for the nearest neighbor(s) of a generic point (one possibly not in the dataset itself); in this chapter, we introduce k-d trees, a data structure specifically invented to solve this problem.

In this chapter, we follow up on those topics, so we won’t introduce a new problem; instead, we pick up the “closest hub” example from chapter 8 and show a data structure designed to solve it: the k-d tree.

9.1      Right Where We Left Off

Let’s recap where we left off in previous chapters: we are designing software for an e-commerce company, an application to find the closest warehouse selling a given product, for any point on a very large map (see figure 9.1 for a visualization). To get a ballpark idea of the scale we need to handle, we want to serve millions of clients per day across the country, shipping products from thousands of warehouses, themselves spread across the whole map.
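
To see why that scale matters, recall the baseline from chapter 8: without an index, the only option is a brute-force scan that measures the distance from the query point to every warehouse. The sketch below (the function name closest_warehouse is purely illustrative) makes the cost explicit: every single query touches all n warehouses, which quickly becomes prohibitive with thousands of warehouses and millions of queries per day.

from math import dist   # Euclidean distance between two points (Python 3.8+)

def closest_warehouse(query, warehouses):
    """Brute-force nearest neighbor: scan every warehouse and keep the closest.
    Each call costs O(n) distance computations for n warehouses."""
    best, best_distance = None, float("inf")
    for w in warehouses:
        d = dist(query, w)
        if d < best_distance:
            best, best_distance = w, d
    return best

# Example: one customer location checked against a few warehouse coordinates.
warehouses = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0)]
print(closest_warehouse((6.0, 5.0), warehouses))   # -> (5.0, 4.0)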

9.2      Moving to k-D Spaces: Cycle Through Dimensions

9.2.1   Constructing the BST

9.2.2   Invariants

9.2.3   The Importance of Being Balanced

9.3      Methods

9.3.1   Search

9.3.2   Insert

9.3.3   Balanced Tree

9.3.4   Remove

9.3.5   Nearest Neighbor

9.3.6   Region Search

9.3.7   A Recap of all Methods

9.4      Limits and Possible Improvements

9.5      Summary