Some applications involve grouping
$n$ distinct objects into a
collection of disjoint sets. Two important operations are then
finding which set a given object belongs to and uniting the two
sets.
A
Figure 1: A graph with four connected components:
$\{a,b,c,d\}$,
$\{e,f,g\}$,
$\{h,i\}$, and
$\{j\}$.

Table 1: This table shows the state of the collection of disjoint
sets as each edge is processed. The processed edge is listed on the
left and the rest of the columns show the state of the collection.
Figure 2: Linked-list representation. Each disjoint set can be
represented by a linked-list. Each node has a pointer to the
head node and a pointer to the next node.
Consider the following operation sequence:
Figure 3: Rooted tree representation of disjoint sets. Each tree has
a "representative"
$h$ for the left tree and
$a$ for the
right tree.
Figure 4: Path compression takes place during the
FIND-SET operation. This works by recursing from the
given input node up to the root of the tree, forming a
Time complexity:
$\Theta (f{\mathrm{log}}_{1+f/n)}n)$ if
$f\ge n$ and
$\Theta (n+f\mathrm{log}n)$ otherwise, assume that there are
$n$
MAKE-SET operations and
$f$ FIND-SET operations.

**disjoint set data structure**maintains a collection $S=\{{S}_{1},{S}_{2},\dots ,{S}_{k}\}$ of disjoint*dynamic*sets. Each set is identified by a representative, which usually is a member in the set.## 1 Operations on Sets

Let $x$ be an object. We wish to support the following operations.- MAKE-SET( $x$) creates a new set whose only member is pointed by $x$; Note that $x$ is not in the other sets.
- UNION( $x,y$) unites two dynamic sets containing objects $x$ and $y$, say ${S}_{x}$ and ${S}_{y}$, into a new set that ${S}_{x}\cup {S}_{y}$, assuming that ${S}_{x}\cap {S}_{y}=\varnothing $;
- FIND-SET( $x$) returns a pointer to the representative of the set containing $x$.
- INSERT( $a,S$) inserts an object $a$ to $S$, and returns $S\cup \{a\}$.
- DELETE( $a,S$) deletes an object $a$ from $S$, and returns $S-\{a\}$.
- SPLIT( $a,S$) partitions the objects of $S$ into two sets ${S}_{1}$ and ${S}_{2}$ such that ${S}_{1}=\{b\hspace{0.5em}|\hspace{0.5em}b\le a\hspace{0.5em}\&\hspace{0.5em}b\in S\}$, and ${S}_{2}=S-{S}_{1}$.
- MINIMUM( $S$) returns the minimum object in $S$.

## 2 Applications of Disjoint-set Data Structures

Here we show two application examples.- Connected components (CCs)
- Minimum Spanning Trees (MSTs)

### 2.1 Algorithm for Connected Components

CONNECTED-COMPONENTS $(G)$

1foreach $v\in V$

2doMAKE-SET $(v)$

3forevery edge $(u,v)\in E$

4doifFIND-SET $(u)$ $\ne $ FIND-SET $(v)$

5thenUNION $(u,v)$

SAME-COMPONENTS $(u,v)$

1ifFIND-SET $(u)$ $=$ FIND-SET $(v)$

2thenreturnTRUE

3returnFALSE

Edge processed | Collection of disjoint sets | |||||||||

initial sets | $\{a\}$ | $\{b\}$ | $\{c\}$ | $\{d\}$ | $\{e\}$ | $\{f\}$ | $\{g\}$ | $\{h\}$ | $\{i\}$ | $\{j\}$ |

$(b,d)$ | $\{a\}$ | $\{b,d\}$ | $\{c\}$ | $\{e\}$ | $\{f\}$ | $\{g\}$ | $\{h\}$ | $\{i\}$ | $\{j\}$ | |

$(e,g)$ | $\{a\}$ | $\{b,d\}$ | $\{c\}$ | $\{e,g\}$ | $\{f\}$ | $\{h\}$ | $\{i\}$ | $\{j\}$ | ||

$(a,c)$ | $\{a,c\}$ | $\{b,d\}$ | $\{e,g\}$ | $\{f\}$ | $\{h\}$ | $\{i\}$ | $\{j\}$ | |||

$(h,i)$ | $\{a,c\}$ | $\{b,d\}$ | $\{e,g\}$ | $\{f\}$ | $\{h,i\}$ | $\{j\}$ | ||||

$(a,b)$ | $\{a,b,c,d\}$ | $\{e,g\}$ | $\{f\}$ | $\{h,i\}$ | $\{j\}$ | |||||

$(e,f)$ | $\{a,b,c,d\}$ | $\{e,f,g\}$ | $\{h,i\}$ | $\{j\}$ | ||||||

$(b,c)$ | $\{a,b,c,d\}$ | $\{e,f,g\}$ | $\{h,i\}$ | $\{j\}$ |

## 3 The Disjoint Set Representation

### 3.1 The Linked-list Representation

A set can be represented by a linked list. In this representation, each node has a pointer to the next node and a pointer to the first node.MAKE-SET $({x}_{1})$,What's the total time complexity of the above operations?

MAKE-SET $({x}_{2})$,

$:$,

MAKE-SET $({x}_{n-1})$,

MAKE-SET $({x}_{n})$,

UNION $({x}_{1},{x}_{2})$,

UNION $({x}_{2},{x}_{3})$,

$:$,

UNION $({x}_{n-1},{x}_{n})$.

**A weighted-union heuristic:**Assume that the representative of each set maintains the number of objects in the set, and always merge the smaller list to the larger list, then**Theorem:**Using the linked list representation of disjoint sets and the weighted-union heuristic, a sequence of $m$ MAKE-SET, Union, and Find_Set operations, $n$ of which are MAKE-SET, takes $O(m+n\mathrm{log}n)$ time.*Hint:*observe that for any $k\le n$, after $x$'s representative pointer has been updated $\lceil \mathrm{log}k\rceil $ times, the resulting set containing $x$ must have at least $k$ members.## 4 Disjoint Set Forests

A faster implementation of disjoint sets is through the rooted trees, with each node containing one member and each tree representing one set. Each member in the tree has only one parent.### 4.1 The Heuristics for Disjoint Set Operations

- union by rank.
- path compression

**union by rank**is to make the root of the tree with fewer nodes point to the root of the tree with more nodes. Rather than explicitly keeping track of the size of the subtree rooted at each node, for each node we maintain a*rank*that approximates the logarithm of the subtree size and is also an upper bound on the height of the node. Time complexity: $O(m\mathrm{log}n)$, assuming that there are $m$ union operations.**Path compression**is quite simple and very effective. We use this approach during FIND-SET operations to make each node on the path point directly to the root. Path compression does not change any ranks.*path*. Then the root is returned and assigned as the parent for each node in

*path*. The parent of $c$ after FIND-SET $(c)$ is $h$.

MAKE-SET $(x)$

1 $p[x]$$\leftarrow $$x$

2 $\mathrm{rank}[x]$$\leftarrow $$0$

FIND-SET $(x)$

1if$x\ne p[x]$

2then$p[x]$$\leftarrow $FIND-SET $(p[x])$

3return$p[x]$

UNION $(x,y)$

1 LINK( FIND-SET $(x)$, FIND-SET $(y)$)

LINK $(x,y)$where $\mathrm{rank}[x]$ is the height of $x$ in the tree. If both of the above methods are used together, the time complexity is $O(m\alpha (m,n))$.

1if$\mathrm{rank}[x]>\mathrm{rank}[y]$

2then$p[y]$$\leftarrow $$x$

3else$p[x]$$\leftarrow $$y$

4if$\mathrm{rank}[x]=\mathrm{rank}[y]$

5then$\mathrm{rank}[y]$$\leftarrow $$\mathrm{rank}[y]+1$

**The Rank properties**- $\mathrm{rank}[x]\le \mathrm{rank}[p[x]]$
- for any tree root $x$, $\mathrm{size}(x)\ge {2}^{\mathrm{rank}[x]}$ (Link operation)
- for any integer $r$, there are at most $n/{2}^{r}$ nodes of rank $r$
- each node has rank at most $\lfloor \mathrm{log}n\rfloor $, assuming there are at $n$ objects involved.