Data Structures for Disjoint Sets

Some applications involve grouping n distinct objects into a collection of disjoint sets. Two important operations are then finding which set a given object belongs to and uniting the two sets.
A disjoint set data structure maintains a collection S={ S1 , S2 ,, Sk } of disjoint dynamic sets. Each set is identified by a representative, which usually is a member in the set.

1  Operations on Sets

Let x be an object. We wish to support the following operations.

2  Applications of Disjoint-set Data Structures

Here we show two application examples.

2.1  Algorithm for Connected Components

CONNECTED-COMPONENTS (G)
 1    for each vV
 2             do  MAKE-SET (v)
 3    for every edge (u,v)E
 4             do if  FIND-SET (u) FIND-SET (v)
 5                         then  UNION (u,v)
SAME-COMPONENTS (u,v)
 1    if  FIND-SET (u) = FIND-SET (v)
 2          then return TRUE
 3    return FALSE
images/DisjointSets_ConnectedComponents.png
Figure 1: A graph with four connected components: {a,b,c,d}, {e,f,g}, {h,i}, and {j}.
Edge processed Collection of disjoint sets
initial sets {a} {b} {c} {d} {e} {f} {g} {h} {i} {j}
(b,d) {a} {b,d} {c} {e} {f} {g} {h} {i} {j}
(e,g) {a} {b,d} {c} {e,g} {f} {h} {i} {j}
(a,c) {a,c} {b,d} {e,g} {f} {h} {i} {j}
(h,i) {a,c} {b,d} {e,g} {f} {h,i} {j}
(a,b) {a,b,c,d} {e,g} {f} {h,i} {j}
(e,f) {a,b,c,d} {e,f,g} {h,i} {j}
(b,c) {a,b,c,d} {e,f,g} {h,i} {j}
Table 1: This table shows the state of the collection of disjoint sets as each edge is processed. The processed edge is listed on the left and the rest of the columns show the state of the collection.

3  The Disjoint Set Representation

3.1  The Linked-list Representation

A set can be represented by a linked list. In this representation, each node has a pointer to the next node and a pointer to the first node.
images/DisjointSets_LinkedList.png
Figure 2: Linked-list representation. Each disjoint set can be represented by a linked-list. Each node has a pointer to the head node and a pointer to the next node.
Consider the following operation sequence:
MAKE-SET ( x1 ),
MAKE-SET ( x2 ),
:,
MAKE-SET ( xn-1 ),
MAKE-SET ( xn ),
UNION ( x1 , x2 ),
UNION ( x2 , x3 ),
:,
UNION ( xn-1 , xn ).
What's the total time complexity of the above operations?
A weighted-union heuristic: Assume that the representative of each set maintains the number of objects in the set, and always merge the smaller list to the larger list, then
Theorem: Using the linked list representation of disjoint sets and the weighted-union heuristic, a sequence of m MAKE-SET, Union, and Find_Set operations, n of which are MAKE-SET, takes O(m+nlogn) time.
Hint: observe that for any kn, after x's representative pointer has been updated logk times, the resulting set containing x must have at least k members.

4  Disjoint Set Forests

A faster implementation of disjoint sets is through the rooted trees, with each node containing one member and each tree representing one set. Each member in the tree has only one parent.
images/DisjointSets_RootedTree.png
Figure 3: Rooted tree representation of disjoint sets. Each tree has a "representative" h for the left tree and a for the right tree.

4.1  The Heuristics for Disjoint Set Operations

  1. union by rank.
  2. path compression
The idea of union by rank is to make the root of the tree with fewer nodes point to the root of the tree with more nodes. Rather than explicitly keeping track of the size of the subtree rooted at each node, for each node we maintain a rank that approximates the logarithm of the subtree size and is also an upper bound on the height of the node.
Time complexity: O(mlogn), assuming that there are m union operations.
Path compression is quite simple and very effective. We use this approach during FIND-SET operations to make each node on the path point directly to the root. Path compression does not change any ranks.
images/DisjointSets_PathCompression.png
Figure 4: Path compression takes place during the FIND-SET operation. This works by recursing from the given input node up to the root of the tree, forming a path. Then the root is returned and assigned as the parent for each node in path. The parent of c after FIND-SET (c) is h.
Time complexity: Θ(f log1+f/n) n) if fn and Θ(n+flogn) otherwise, assume that there are n MAKE-SET operations and f FIND-SET operations.
MAKE-SET (x)
 1     p[x]   x
 2     rank[x]   0
FIND-SET (x)
 1    if  xp[x]
 2          then  p[x]   FIND-SET (p[x])
 3    return  p[x]
UNION (x,y)
 1     LINK( FIND-SET (x), FIND-SET (y))
LINK (x,y)
 1    if  rank[x]>rank[y]
 2          then  p[y]   x
 3          else   p[x]   y
 4                      if  rank[x]=rank[y]
 5                            then  rank[y]   rank[y]+1
where rank[x] is the height of x in the tree. If both of the above methods are used together, the time complexity is O(mα(m,n)).
The Rank properties



File translated from TEX by TTM, version 3.67.
On 31 Mar 2006, 18:12.