This chapter will be structured slightly differently from our book’s standard, simply because we will continue here a discussion started in chapter 8. We introduced a problem: searching multidimensional data for the nearest neighbor(s) of a generic point (possibly not in the dataset itself).
In this chapter, we follow up on those topics, so we won’t introduce a new problem, but pick up the “closest hub” example from chapter 8 and show a different option to solve it, using k-d trees.
Let’s recap where we left off in previous chapters. We are designing software for an e-commerce company, an application to find the closest warehouse selling a given product for any point on a very large map. See figure 9.1 to visualize. To have a ballpark idea of the kind of scale we need, we want to serve millions of clients per day across the country, taking products from thousands of warehouses, also spread across the map.
In section 8.2, we established that a brute-force solution where we skim through the whole list of points to compare differences can’t work for a live solution.