Minimum Spanning Trees
A spanning tree is a connected, acyclic subgraph of a graph . That is it is the subset of edges that are connected and acyclic. If G itself is not connected, then we can generalize this to a spanning forest.
A minimum spanning tree (MST) of a graph with weight function for each , is a spanning tree that has the smallest total weight.
How to find a minimum spanning tree
To find an MST, we can either start with every edge in the graph and remove edges till we get an MST or we can select edges till we form an MST.
For the removal method, we can start out with every edge in the graph and start eliminating the edges with the highest weights. An edge can be removed as long as we don't disconnect some vertex (ie if that is the only edge to a vertex, we can't remove it). We keep removing edges until we have an MST. However, this is not practical. If there are n nodes in a graph, a spanning tree has n-1 edges. potentially there are edges in total. This means that any algorithm that finds an MST in this manner would end up with worst case
Instead of removing edges, we can find the MST by building the MST instead. The general generic algorithm for doing this is as follows:
greedyMST(G=(V,E)){
T=[]; //empty set of edges
while(T is not an MST){
find an edge e from E that is safe for T
add e to T
}
}
Now... this is a very generic algorithm and there are parts not yet defined. Lets start with looking for an being safe for T. An edge e is safe for T iff adding e to T makes T subset of some MST of G. Of course... problem is how do we know if e is part of some MST of G when that is exactly what this function is suppose to find?
Theorem: If G is a connected, undirected weighted graph and T is a subset of some MST of G and e is any edge of minimum weight that are in different connected components of T, then adding e is safe for T (See text for proof)
Kruskal's Algorithm
Kruskal's algorithm starts by sorting edges according to weights. It then picks the smallest edge out and adds it to T only if the end points are in different connected components. If we use something like a BFS or DFS to find connected components, it will be very slow. Instead, what we can do is use a disjoint set
KruskalsMST(G=(V,E)){
T=[]; //set of edges that form MST, initially empty
sort edges in E by its weight
//create a set for each vertex
for(each vertex v in V) {
makeSet(v);
}
for(each edge e=(u,v) in E){
//find reps for endpoints of edge
ep1=findSet(u);
ep2=findSet(v);
//if different reps then they are not in same set
if(ep1!=ep2){
//union them the set together
union(ep1,ep2);
//add e to the result
add e to T;
}
}
}
Example
So let us consider the following graph:
Given the graph above, the edges sorted in non-descending order by weight are: (C,D), (F,G), (A,B),(D,E),(A,C), (B,E), (E,F),(B,G). We will exam the edges in this order in the example below
Step # | graph, T shown in yellow | Disjoint Sets | Comments |
---|---|---|---|
Initial | T= {} | {A}, {B}, {C}, {D},{E},{F},{G} | every vert is in its own disjoint set, initial MST is empty |
1 | T={(C,D)} | {A}, {B}, {CD}, {E},{F},{G} | (C,D) is added first. They were in different disjoint sets |
2 | T={(C,D), (F,G)} | {A}, {B}, {C,D}, {E},{F,G} | (F,G) is added next. They were not in the same disjoint set |
3 | T={(C,D), (F,G),(A,B)} | {A,B}, {C,D}, {E},{F,G} | (A,B) is added next as they were not in the same disjoint set |
4 | T={(C,D), (F,G),(A,B),(D,E)} | {A,B}, {C,D,E}, {F,G} | (D,E) is added next as they were not in the same disjoint set |
5 | T={(C,D), (F,G),(A,B),(D,E),(A,C)} | {A,B,C,D,E}, {F,G} | (A,C) is added next, they were not in the same disjoint set |
6 | T={(C,D), (F,G),(A,B),(D,E),(A,C)} | {A,B,C,D,E,F,G} | (B,E) was next but not added because B and E are in same disjoint instead. |
We next consider (E,F) and as they are in different sets we connect them.
(B,G) are in same disjoint set so we are now done |
Prim's Algorithm
We pick a vertex to be the "root" of the MST. After that we simply grow the tree by joining isolated vertices one at a time. An isolated vertex is any vertex that isn't part of the MST yet picking the smallest edge weight. To support this, we will use a MinHeap. We queue into this heap edges that will connect an isolated vertex with the current MST. We use infinity if there is no direct edge yet to any vertex in the MST
Step | Graph | Heap (vertex, parent, weight to parent), listed with by priority (smaller weight) | Comments |
---|---|---|---|
Initial | (A, NIL,0),(B, NIL,-),(C, NIL,-),(D, NIL,-),(E, NIL,-),(F, NIL,-),(G, NIL,-) | ||
Initial state. pick A as root | |||
1 | T={} | (B, A,2),(C, A,3),(D, NIL,-),(E, NIL,-),(F, NIL,-),(G, NIL,-) | update edge weights, parents of B and C |
2 | T={(A,B)} | (C, A, 3),(E, B, 4),(G, B, 5),(D, NIL,-),(F, NIL,-) | Take out vertex with smallest weight(B) and add edge to its parent to the MST. update edge weights and parents of E and G |
3 | T={(A,B), (A,C)} | (D, C, 1),(E, B, 4),(G, B, 5),(F, NIL,-) | Take out vertex with smallest weight (C) and add edge to its parent to the MST. update edge weights and parent of D. |
4 | T={(A,B),(A,C),(C,D)} | (E, D, 2),(G, B, 5),(F, NIL,-) | Take out vertex with smallest weight (D) and add edge to its parent to the MST. update edge weights and parent of E because cost to E is less going through D and not B. |
5 | T={(A,B),(A,C),(C,D),(D,E)} | (F, E, 4),(G, B, 5) | Take out vertex with smallest weight (E) and add edge to its parent to the MST. update edge weights and parent of F |
6 | T={(A,B),(A,C),(C,D),(D,E),(E,F)} | (G, F, 1) | Take out vertex with smallest weight (F) and add edge to its parent to the MST. update edge weights and parent of G because cost to G is less going through F and not B. |
7 | T={(A,B),(A,C),(C,D),(D,E),(E,F),(F,G)} | empty | Take out vertex with smallest weight (G) and add edge to its parent to the MST. |