| 1 | +# A Deep Dive into Weighted Quick-Union with Path Compression |
| 2 | + |
| 3 | +In the world of computer science, some algorithms are so efficient and elegant that they feel like magic. The **Weighted Quick-Union with Path Compression** algorithm is one of them. It's the gold standard for incremental "dynamic connectivity" problems, where connections are added over time but never removed. |
| 4 | + |
| 5 | +Let's break it down. |
| 6 | + |
| 7 | +## The Problem: Dynamic Connectivity |
| 8 | + |
| 9 | +Imagine you have a set of objects. Over time, you're told that certain pairs of these objects are now connected. The fundamental question you want to answer, at any point, is: "Are object A and object B connected?" |
| 10 | + |
| 11 | +This "connection" could mean anything: |
| 12 | +- **Social Networks:** Are two people in the same network of friends? |
| 13 | +- **Computer Networks:** Can two computers on a network send messages to each other? |
| 14 | +- **Image Processing:** Are two pixels part of the same contiguous region of color? |
| 15 | +- **Maze Solving:** Is there a path from the start to the end? |
| 16 | + |
| 17 | +The data structure that handles this is often called a **Union-Find** or **Disjoint-Set Union (DSU)**. It has two primary operations: |
| 18 | +1. `union(p, q)`: Connect object `p` and object `q`. |
| 19 | +2. `find(p)`: Find the "identifier" of the group that `p` belongs to. If `find(p)` equals `find(q)`, then `p` and `q` are connected. |
| 20 | + |
| 21 | +## The Journey to Optimization |
| 22 | + |
| 23 | +Let's build up to the final, optimized algorithm by looking at simpler versions first. We'll represent our objects as nodes in a forest (a collection of trees). Each tree represents a connected component. The root of the tree is the unique identifier for that component. |
| 24 | + |
| 25 | +### Attempt 1: Quick-Union |
| 26 | + |
| 27 | +In this approach, each node has a pointer to its parent. To find the root of a node, you just follow the parent pointers until you reach a node that points to itself. |
| 28 | + |
| 29 | +- `find(p)`: Follow `parent[p]` until you reach the root. |
| 30 | +- `union(p, q)`: Find the root of `p` (let's call it `rootP`) and the root of `q` (`rootQ`). Then, simply set the parent of `rootP` to be `rootQ`. |
| 31 | + |
| 32 | +**The Problem:** This can lead to very tall, skinny trees. Imagine connecting items in a line: `union(0,1)`, `union(1,2)`, `union(2,3)`, ... |
| 33 | +The tree becomes a long chain. A `find` operation on the deepest node would have to traverse the entire chain, making it slow (O(N) in the worst case). |
| 34 | + |
| 35 | +``` |
| 36 | +union(0,1), union(1,2), union(2,3) |
| 37 | + |
| 38 | + 3 |
| 39 | + | |
| 40 | + 2 |
| 41 | + | |
| 42 | + 1 |
| 43 | + | |
| 44 | + 0 |
| 45 | +``` |
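The chain above can be reproduced in a few lines. This is a minimal sketch of plain quick-union, not the final algorithm; the names `QuickUnion`, `NewQuickUnion`, and `chainDepth` are made up for illustration, and `Find` returns a hop count only so the depth can be observed.

```go
package main

import "fmt"

// QuickUnion is an illustrative, unweighted union-find (hypothetical name).
type QuickUnion struct {
	parent []int
}

// NewQuickUnion puts each of the n elements in its own one-node tree.
func NewQuickUnion(n int) *QuickUnion {
	parent := make([]int, n)
	for i := range parent {
		parent[i] = i
	}
	return &QuickUnion{parent: parent}
}

// Find follows parent pointers up to the root and also reports how many
// pointers it had to follow.
func (qu *QuickUnion) Find(p int) (root, hops int) {
	for p != qu.parent[p] {
		p = qu.parent[p]
		hops++
	}
	return p, hops
}

// Union always hangs the root of p under the root of q, ignoring tree sizes.
func (qu *QuickUnion) Union(p, q int) {
	rootP, _ := qu.Find(p)
	rootQ, _ := qu.Find(q)
	if rootP != rootQ {
		qu.parent[rootP] = rootQ
	}
}

// chainDepth performs union(0,1), union(1,2), ..., union(n-2,n-1) and
// returns the number of hops Find(0) needs afterwards.
func chainDepth(n int) int {
	qu := NewQuickUnion(n)
	for i := 0; i+1 < n; i++ {
		qu.Union(i, i+1)
	}
	_, hops := qu.Find(0)
	return hops
}

func main() {
	fmt.Println(chainDepth(4))    // 3, the chain drawn above
	fmt.Println(chainDepth(1000)) // 999
}
```

The hop count grows linearly with the number of elements, which is exactly the O(N) worst case.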
| 46 | + |
| 47 | +### Attempt 2: Weighted Quick-Union (Union by Size/Rank) |
| 48 | + |
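| 49 | +To avoid creating long chains, we can be smarter about our `union` operation. Instead of arbitrarily connecting one root to another, let's keep track of the "size" (number of nodes) of each tree. (Union by rank is a close variant that tracks an upper bound on each tree's height instead of its node count; this article uses size.) |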
| 50 | + |
| 51 | +When we perform `union(p, q)`, we find their roots (`rootP` and `rootQ`). We then connect the *smaller* tree to the root of the *larger* tree. |
| 52 | + |
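| 53 | +This simple change has a profound impact: it keeps our trees short and bushy. A node's depth grows by one only when its tree is attached under another root, and that other tree is at least as large, so the combined tree is at least twice the size of the node's old tree. Starting from a tree of size 1, that doubling can happen at most log₂(N) times, so no tree is ever taller than log₂(N) and `find` runs in O(log N) even in the worst case. |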
| 54 | + |
| 55 | +**Example:** |
| 56 | +- Tree A has 5 nodes. |
| 57 | +- Tree B has 2 nodes. |
| 58 | +- To union them, we make the root of Tree B a child of the root of Tree A. The new combined tree has a size of 7. |
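To see the height bound in action, here is a rough sketch of union by size with path compression deliberately left out, so the effect of weighting can be measured on its own. The type `WeightedQuickUnion` and the helpers `chainMaxDepth` and `pairwiseMaxDepth` are illustrative names, not part of the implementation later in this article.

```go
package main

import "fmt"

// WeightedQuickUnion is union by size WITHOUT path compression
// (illustrative name, used only for this experiment).
type WeightedQuickUnion struct {
	parent []int
	size   []int
}

func NewWeighted(n int) *WeightedQuickUnion {
	w := &WeightedQuickUnion{parent: make([]int, n), size: make([]int, n)}
	for i := 0; i < n; i++ {
		w.parent[i] = i
		w.size[i] = 1
	}
	return w
}

// Depth returns the number of parent pointers between p and its root.
func (w *WeightedQuickUnion) Depth(p int) int {
	d := 0
	for p != w.parent[p] {
		p = w.parent[p]
		d++
	}
	return d
}

func (w *WeightedQuickUnion) Find(p int) int {
	for p != w.parent[p] {
		p = w.parent[p]
	}
	return p
}

// Union attaches the smaller tree under the root of the larger one.
func (w *WeightedQuickUnion) Union(p, q int) {
	rootP, rootQ := w.Find(p), w.Find(q)
	if rootP == rootQ {
		return
	}
	if w.size[rootP] < w.size[rootQ] {
		rootP, rootQ = rootQ, rootP
	}
	w.parent[rootQ] = rootP
	w.size[rootP] += w.size[rootQ]
}

// MaxDepth returns the depth of the deepest node in the forest.
func (w *WeightedQuickUnion) MaxDepth() int {
	max := 0
	for p := range w.parent {
		if d := w.Depth(p); d > max {
			max = d
		}
	}
	return max
}

// chainMaxDepth replays the chain of unions that hurt plain quick-union.
func chainMaxDepth(n int) int {
	w := NewWeighted(n)
	for i := 0; i+1 < n; i++ {
		w.Union(i, i+1)
	}
	return w.MaxDepth()
}

// pairwiseMaxDepth always merges two trees of equal size, which is the
// worst case for union by size. n is assumed to be a power of two.
func pairwiseMaxDepth(n int) int {
	w := NewWeighted(n)
	for s := 1; s < n; s *= 2 {
		for i := 0; i+s < n; i += 2 * s {
			w.Union(i, i+s)
		}
	}
	return w.MaxDepth()
}

func main() {
	fmt.Println(chainMaxDepth(1000))    // 1: every node hangs directly off one root
	fmt.Println(pairwiseMaxDepth(1024)) // 10, which is log2(1024)
}
```

The chain that produced a depth of 999 under plain quick-union now produces a depth of 1, and even the adversarial equal-size merges only reach log₂(N).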
| 59 | + |
| 60 | +### The Final Touch: Path Compression |
| 61 | + |
| 62 | +We can do even better. This optimization is applied during the `find` operation and is incredibly clever. |
| 63 | + |
| 64 | +When we call `find(p)`, we traverse a path of nodes from `p` up to the root. After we find the root, we can go back along that same path and make every node we visited point *directly* to the root. |
| 65 | + |
| 66 | +**Before Path Compression:** |
| 67 | +`find(0)` requires traversing 0 -> 1 -> 2 -> 3 -> 4 (root) |
| 68 | + |
| 69 | +``` |
| 70 | +      4 |
| 71 | +      | |
| 72 | +      3 |
| 73 | +     / \ |
| 74 | +    2   5 |
| 75 | +    | |
| 76 | +    1 |
| 77 | +    | |
| 78 | +    0 |
| 79 | +``` |
| 80 | + |
| 81 | +**After Path Compression on `find(0)`:** |
| 82 | +Now nodes 0, 1, and 2 point directly to the root, 4, just as node 3 already did. Node 5 was not on the path from 0 to the root, so it is still a child of 3. |
| 83 | + |
| 84 | +``` |
| 85 | +          4 |
| 86 | +       / | | \ |
| 87 | +      /  | |  \ |
| 88 | +     3   2 1   0 |
| 89 | +     | |
| 90 | +     5 |
| 91 | +``` |
| 92 | + |
| 93 | +The next time we call `find` on any of those nodes (0, 1, 2, or 3), we'll get to the root in a single step! Over many operations, this keeps the trees incredibly flat. |
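The before-and-after trees can be checked with a tiny sketch that stores the "before" tree as a parent slice and runs a two-pass `find` over it. The standalone `find` function and `compressExample` are illustrative helpers written for this walkthrough; the slice index is the node and the value is its parent.

```go
package main

import "fmt"

// find locates the root of p, then walks the same path a second time and
// points every node on it directly at the root.
func find(parent []int, p int) int {
	root := p
	for root != parent[root] {
		root = parent[root]
	}
	for p != root {
		next := parent[p]
		parent[p] = root
		p = next
	}
	return root
}

// compressExample builds the "before" tree (0 -> 1 -> 2 -> 3 -> 4, with 5
// also a child of 3), calls find(0), and returns the resulting parent slice.
func compressExample() []int {
	parent := []int{1, 2, 3, 4, 4, 3}
	find(parent, 0)
	return parent
}

func main() {
	fmt.Println(compressExample()) // [4 4 4 4 4 3]
}
```

Nodes 0 through 3 now list 4 as their parent, while node 5 still lists 3, because only nodes on the traversed path are rewritten.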
| 94 | + |
| 95 | +## The Result: Nearly Constant Time |
| 96 | + |
| 97 | +When you combine **Weighted Quick-Union** with **Path Compression**, the performance becomes astonishingly good. The amortized time complexity for both `union` and `find` is nearly constant, often written as O(α(N)), where α(N) is the inverse Ackermann function. "Amortized" means a single call can still take O(log N), but any sequence of M operations on N objects takes O(M α(N)) time in total. |
| 98 | + |
| 99 | +This function grows so slowly that for any input size you could possibly encounter in the real world (even larger than the number of atoms in the universe), α(N) is never greater than 5. For all practical purposes, the algorithm runs in constant time per operation. |
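One way to get a feel for the amortized behaviour is to count how many parent pointers `find` actually follows. The sketch below is an informal experiment, not a proof: `countingUF`, its `hops` counter, and `sweepHops` are hypothetical names, and the input is assumed to be a power of two so that the worst-case tree for union by size can be built first.

```go
package main

import "fmt"

// countingUF is a weighted quick-union with path compression that also
// counts every parent pointer followed while searching for a root.
type countingUF struct {
	parent []int
	size   []int
	hops   int
}

func newCountingUF(n int) *countingUF {
	uf := &countingUF{parent: make([]int, n), size: make([]int, n)}
	for i := 0; i < n; i++ {
		uf.parent[i] = i
		uf.size[i] = 1
	}
	return uf
}

func (uf *countingUF) find(p int) int {
	root := p
	for root != uf.parent[root] {
		root = uf.parent[root]
		uf.hops++
	}
	for p != root {
		next := uf.parent[p]
		uf.parent[p] = root
		p = next
	}
	return root
}

func (uf *countingUF) union(p, q int) {
	rootP, rootQ := uf.find(p), uf.find(q)
	if rootP == rootQ {
		return
	}
	if uf.size[rootP] < uf.size[rootQ] {
		rootP, rootQ = rootQ, rootP
	}
	uf.parent[rootQ] = rootP
	uf.size[rootP] += uf.size[rootQ]
}

// sweepHops merges equal-sized trees until one tree of height log2(n)
// remains, then calls find on every node twice and returns the hops
// spent in each sweep. n is assumed to be a power of two.
func sweepHops(n int) (first, second int) {
	uf := newCountingUF(n)
	for s := 1; s < n; s *= 2 {
		for i := 0; i+s < n; i += 2 * s {
			uf.union(i, i+s)
		}
	}
	uf.hops = 0
	for p := 0; p < n; p++ {
		uf.find(p)
	}
	first = uf.hops
	uf.hops = 0
	for p := 0; p < n; p++ {
		uf.find(p)
	}
	second = uf.hops
	return first, second
}

func main() {
	first, second := sweepHops(1 << 16)
	fmt.Printf("first sweep: %d hops, second sweep: %d hops\n", first, second)
}
```

After the first sweep every node points straight at the root, so the second sweep costs exactly one hop per non-root node (N - 1 in total), even though the tree started out with height log₂(N).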
| 100 | + |
| 101 | +## Golang Implementation |
| 102 | + |
| 103 | +Here is a full implementation in Go that combines both optimizations. |
| 104 | + |
| 105 | +```go |
| 106 | +package main |
| 107 | + |
| 108 | +import "fmt" |
| 109 | + |
| 110 | +// WeightedQuickUnionPathCompression implements the union-find data structure with |
| 111 | +// both weighting and path compression optimizations. |
| 112 | +type WeightedQuickUnionPathCompression struct { |
| 113 | +	// parent[i] = parent of i |
| 114 | +	parent []int |
| 115 | +	// size[i] = number of nodes in the subtree rooted at i |
| 116 | +	size []int |
| 117 | +	// count is the number of disjoint sets |
| 118 | +	count int |
| 119 | +} |
| 120 | + |
| 121 | +// New initializes a new union-find data structure with n elements. |
| 122 | +// Each element initially is in its own set. |
| 123 | +func New(n int) *WeightedQuickUnionPathCompression { |
| 124 | +	parent := make([]int, n) |
| 125 | +	size := make([]int, n) |
| 126 | +	for i := 0; i < n; i++ { |
| 127 | +		parent[i] = i |
| 128 | +		size[i] = 1 |
| 129 | +	} |
| 130 | +	return &WeightedQuickUnionPathCompression{ |
| 131 | +		parent: parent, |
| 132 | +		size:   size, |
| 133 | +		count:  n, |
| 134 | +	} |
| 135 | +} |
| 136 | + |
| 137 | +// Find returns the root of the component/set containing element p. |
| 138 | +// It uses path compression to flatten the tree structure. |
| 139 | +func (uf *WeightedQuickUnionPathCompression) Find(p int) int { |
| 140 | +	// Find the root |
| 141 | +	root := p |
| 142 | +	for root != uf.parent[root] { |
| 143 | +		root = uf.parent[root] |
| 144 | +	} |
| 145 | +	// Path compression: make every node on the path point to the root |
| 146 | +	for p != root { |
| 147 | +		newp := uf.parent[p] |
| 148 | +		uf.parent[p] = root |
| 149 | +		p = newp |
| 150 | +	} |
| 151 | +	return root |
| 152 | +} |
| 153 | + |
| 154 | +// Connected returns true if elements p and q are in the same set. |
| 155 | +func (uf *WeightedQuickUnionPathCompression) Connected(p, q int) bool { |
| 156 | +	return uf.Find(p) == uf.Find(q) |
| 157 | +} |
| 158 | + |
| 159 | +// Union merges the set containing element p with the set containing element q. |
| 160 | +// It uses weighting (union by size) to keep the trees flat. |
| 161 | +func (uf *WeightedQuickUnionPathCompression) Union(p, q int) { |
| 162 | +	rootP := uf.Find(p) |
| 163 | +	rootQ := uf.Find(q) |
| 164 | + |
| 165 | +	if rootP == rootQ { |
| 166 | +		return |
| 167 | +	} |
| 168 | + |
| 169 | +	// Weighted union: attach the smaller tree to the root of the larger tree. |
| 170 | +	if uf.size[rootP] < uf.size[rootQ] { |
| 171 | +		uf.parent[rootP] = rootQ |
| 172 | +		uf.size[rootQ] += uf.size[rootP] |
| 173 | +	} else { |
| 174 | +		uf.parent[rootQ] = rootP |
| 175 | +		uf.size[rootP] += uf.size[rootQ] |
| 176 | +	} |
| 177 | +	uf.count-- |
| 178 | +} |
| 179 | + |
| 180 | +// Count returns the number of disjoint sets. |
| 181 | +func (uf *WeightedQuickUnionPathCompression) Count() int { |
| 182 | +	return uf.count |
| 183 | +} |
| 184 | + |
| 185 | +func main() { |
| 186 | +	// Example Usage: |
| 187 | +	// Consider 10 elements, 0 through 9. |
| 188 | +	uf := New(10) |
| 189 | +	fmt.Printf("Initial components: %d\n", uf.Count()) // 10 |
| 190 | + |
| 191 | +	uf.Union(4, 3) |
| 192 | +	uf.Union(3, 8) |
| 193 | +	uf.Union(6, 5) |
| 194 | +	uf.Union(9, 4) |
| 195 | +	uf.Union(2, 1) |
| 196 | + |
| 197 | +	fmt.Printf("Are 8 and 9 connected? %t\n", uf.Connected(8, 9)) // true |
| 198 | +	fmt.Printf("Are 5 and 4 connected? %t\n", uf.Connected(5, 4)) // false |
| 199 | + |
| 200 | +	uf.Union(5, 0) |
| 201 | +	uf.Union(7, 2) |
| 202 | +	uf.Union(6, 1) |
| 203 | +	uf.Union(1, 8) |
| 204 | + |
| 205 | +	fmt.Printf("Are 5 and 4 connected now? %t\n", uf.Connected(5, 4)) // true |
| 206 | +	fmt.Printf("Final components: %d\n", uf.Count())                  // 1 |
| 207 | +} |
| 208 | + |
| 209 | +``` |
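As a usage sketch, the image-processing scenario from the start of the article (are two pixels part of the same contiguous region?) maps onto union-find quite directly. The example below is one possible approach under stated assumptions: the grid is rectangular, pixels are plain `int` colour values, neighbours are the four orthogonal cells, and `countRegions` is a hypothetical helper with a compact union-find inlined so the block runs on its own.

```go
package main

import "fmt"

// find is the same two-pass find with path compression, written against a
// bare parent slice so this example stays self-contained.
func find(parent []int, p int) int {
	root := p
	for root != parent[root] {
		root = parent[root]
	}
	for p != root {
		next := parent[p]
		parent[p] = root
		p = next
	}
	return root
}

// countRegions returns the number of 4-connected regions of equal colour.
// It assumes every row of grid has the same length.
func countRegions(grid [][]int) int {
	rows := len(grid)
	if rows == 0 {
		return 0
	}
	cols := len(grid[0])
	parent := make([]int, rows*cols)
	size := make([]int, rows*cols)
	for i := range parent {
		parent[i] = i
		size[i] = 1
	}
	count := rows * cols // every pixel starts as its own region

	// union by size; each successful merge removes one region.
	union := func(p, q int) {
		rootP, rootQ := find(parent, p), find(parent, q)
		if rootP == rootQ {
			return
		}
		if size[rootP] < size[rootQ] {
			rootP, rootQ = rootQ, rootP
		}
		parent[rootQ] = rootP
		size[rootP] += size[rootQ]
		count--
	}

	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			id := r*cols + c // flatten (row, column) into one index
			if c+1 < cols && grid[r][c] == grid[r][c+1] {
				union(id, id+1) // same colour as right neighbour
			}
			if r+1 < rows && grid[r][c] == grid[r+1][c] {
				union(id, id+cols) // same colour as neighbour below
			}
		}
	}
	return count
}

func main() {
	grid := [][]int{
		{1, 1, 0},
		{0, 1, 0},
		{0, 0, 0},
	}
	fmt.Println(countRegions(grid)) // 2: one region of 1s, one region of 0s
}
```

Only the right and lower neighbours are checked because the left and upper ones were already handled when those cells were visited.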
| 210 | + |
| 211 | +## Conclusion |
| 212 | + |
| 213 | +The Weighted Quick-Union with Path Compression algorithm is a testament to how clever optimizations can turn a slow, impractical solution into one that is breathtakingly fast. It's a fundamental tool in a programmer's arsenal, perfect for any problem that can be modeled as a set of objects with evolving connections. Its elegance and efficiency make it a classic and beautiful piece of computer science. |