The common thread of the central part of this book is nearest neighbor search. It is first introduced as yet another special case of search, then used as a building block for more advanced algorithms.
This part opens with a description of the challenges that arise when dealing with multi-dimensional data, namely indexing the data and performing spatial queries on it. We will once again show how ad hoc data structures can provide drastic improvements over basic search algorithms.
Next, this part describes two advanced data structures that can be used to search multi-dimensional data.
In the second half of this part, we’ll check out applications of nearest neighbor search, starting with some practical examples, and then focusing on clustering, which heavily leverages spatial queries. Talking about clustering also allows us to introduce distributed computing, in particular the MapReduce programming model, which can be used to process volumes of data that are too large to be handled by any single machine.