So far in this book we have worked with containers that held unidimensional data: the entries that we stored in queues, trees, and hash tables were always assumed to be (or to be translatable to) numbers, that is, simple values that could be compared in the most intuitive mathematical sense.
In this chapter, we will see how this simplification doesn’t always hold true in real datasets, and we’ll examine the issues connected to handling more complex, multidimensional data. Do not despair, though, because in the next chapters we will also describe data structures that can help handle this data, and we will look at real applications, such as clustering, that leverage efficient nearest neighbor search as part of their workflow.
As we’ll see, this area is particularly rich, and there is no way that we can cover it, or even just its crucial parts, in a single chapter. Therefore, while each chapter in part 1 followed the same pattern to explain its topic, we’ll have to stretch this pattern across the whole of part 2, where each chapter will cover only a single piece of our usual discussion: