5 Disjoint Set: Sub-linear-time processing of disjoint sets

This chapter covers

Solving a problem: how to keep a set partitioned into disjoint sets, merging partitions dynamically
Describing an API for a data structure for disjoint sets
Providing a simple linear-time solution for all methods
Improving the running time by using the right underlying data structure
Adding easy-to-implement heuristics to get quasi-constant running time
Recognizing use cases where the best solution is needed for performance

In this chapter we are going to introduce a problem that seems quite trivial: so trivial that many developers wouldn't even consider it worth a performance analysis and would just implement the obvious solution. Nevertheless, if the expression "wolf in sheep's clothing" were to be applied to data structures, this would be the best heading for this chapter.

We use a disjoint set whenever, starting with a set of objects, we want to keep track of how that set is partitioned into disjoint groups (that is, subsets with no element in common). For instance, we might start with a list of wines, which is our initial set, and partition those wines by flavor, obtaining a disjoint set where wines with a similar flavor are grouped together and no two groups intersect. A trivial example of a disjoint set is shown in figure 5.1.
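To make the idea concrete, here is a minimal sketch in Python of a disjoint set applied to the wine example. It assumes the classic union-find technique (each partition stored as a tree identified by its root, with union by rank and path compression), which is the kind of solution this chapter builds toward. The class name `DisjointSet`, the methods `add`, `find_partition`, `merge`, and `are_disjoint`, and the wine groupings are illustrative assumptions, not necessarily the API described later in the chapter.

```python
class DisjointSet:
    """Union-find sketch (hypothetical API): each partition is a tree identified by its root."""

    def __init__(self, elements=()):
        self._parent = {}  # element -> parent element (roots point to themselves)
        self._rank = {}    # element -> rank; only meaningful for roots (bounds tree height)
        for elem in elements:
            self.add(elem)

    def add(self, elem):
        """Add elem as a new singleton partition; return False if already present."""
        if elem in self._parent:
            return False
        self._parent[elem] = elem
        self._rank[elem] = 0
        return True

    def find_partition(self, elem):
        """Return the root (representative) of the partition containing elem."""
        if elem not in self._parent:
            raise KeyError(elem)
        root = elem
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: re-point every node on the path straight to the root.
        while elem != root:
            next_elem = self._parent[elem]
            self._parent[elem] = root
            elem = next_elem
        return root

    def merge(self, a, b):
        """Merge the partitions of a and b; return False if they were already one."""
        root_a, root_b = self.find_partition(a), self.find_partition(b)
        if root_a == root_b:
            return False
        # Union by rank: hang the lower-rank tree under the higher-rank root.
        if self._rank[root_a] < self._rank[root_b]:
            root_a, root_b = root_b, root_a
        self._parent[root_b] = root_a
        if self._rank[root_a] == self._rank[root_b]:
            self._rank[root_a] += 1
        return True

    def are_disjoint(self, a, b):
        """True if a and b currently belong to different partitions."""
        return self.find_partition(a) != self.find_partition(b)


# Illustrative grouping only: which wines count as "similar" is made up here.
wines = ["Barolo", "Chianti", "Riesling", "Gewürztraminer", "Amarone"]
ds = DisjointSet(wines)
ds.merge("Barolo", "Chianti")
ds.merge("Riesling", "Gewürztraminer")
print(ds.are_disjoint("Barolo", "Chianti"))   # False
print(ds.are_disjoint("Barolo", "Riesling"))  # True
```

Each wine starts in its own singleton group; `merge` joins two groups, and `are_disjoint` checks whether two wines ended up in different groups. Keeping parent pointers in a dictionary lets the sketch work with any hashable element, not just integer indices.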

5.1 The Distinct Subsets Problem

5.2 Reasoning on Solutions

5.3 Describing the Data Structure API: Disjoint Set

5.4 Naïve Solution

5.4.1 Implementing Naïve Solution

5.5 Using a Tree-like Structure

5.5.1 From List to Trees

5.5.2 Implementing the Tree Version

5.6 Heuristics to Improve the Running Time

5.6.1 Path Compression

5.6.2 Implementing Balancing and Path Compression

5.7 Applications

5.7.1 Graphs: Connected Components

5.7.2 Graphs: Kruskal Algorithm for Minimum Spanning Tree

5.7.3 Clustering

5.7.4 Unification