Algorithms
Topological Sort
Strongly connected components
Minimum Spanning Trees
Example DAG
[Figure: DAG of getting-dressed tasks: Watch, Socks, Shoes, Undershorts, Pants, Belt, Tie, Shirt, Jacket]
A DAG implies an ordering on events
In a complex DAG, it can be hard to find a schedule that obeys
all the constraints.
Directed Acyclic Graphs
● A directed acyclic graph or DAG is a directed
graph with no directed cycles.
DFS and DAGs
● Theorem: a directed graph G is acyclic iff a DFS of G
yields no back edges:
■ ⇒ if G is acyclic, there will be no back edges
○ Trivial: a back edge implies a cycle
■ ⇐ if there are no back edges, G is acyclic
○ Proof by contradiction: G has a cycle ⇒ DFS finds a back edge
▪ Let v be the vertex on the cycle first discovered, and u be the
predecessor of v on the cycle
▪ When v is discovered, the whole cycle is white
▪ DFS must visit everything reachable from v before returning from
DFS-Visit(v)
▪ So u is discovered while v is still gray; edge (u, v) goes
gray → gray, thus (u, v) is a back edge
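The theorem can be turned into a cycle test. The following is a minimal sketch, assuming the graph is a dict that maps every vertex to a list of its successors; the name has_back_edge and the color constants are illustrative, not from the slides.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # undiscovered, on the DFS stack, finished


def has_back_edge(graph):
    """Return True if a DFS of `graph` meets a back edge (a directed cycle).

    Assumes every vertex appears as a key of `graph`.
    """
    color = {v: WHITE for v in graph}

    def visit(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:  # edge to a vertex still on the stack
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)
```

The recursion depth grows with the longest DFS path, so very deep graphs would need an explicit stack instead.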
Topological Sort
● For a directed acyclic graph G = (V,E)
● A topological sort is an ordering of all of G’s
vertices v1, v2, …, vn such that...
● vertex u comes before vertex v if edge (u, v) ∈ G
Formally: for every edge (vi,vk) in E, i<k.
Visually: all arrows are pointing to the right
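The formal condition can be checked directly. A small sketch, assuming a dict-of-successors graph; is_topological_order and the example graph are hypothetical and not taken from the slides.

```python
def is_topological_order(graph, order):
    """Check that `order` lists every vertex once and every edge points right."""
    position = {v: i for i, v in enumerate(order)}
    if len(position) != len(order) or set(position) != set(graph):
        return False  # a vertex is missing, repeated, or unknown
    return all(position[u] < position[v] for u in graph for v in graph[u])


# Hypothetical DAG used only for illustration.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(is_topological_order(dag, ["a", "c", "b", "d"]))  # True
print(is_topological_order(dag, ["a", "d", "b", "c"]))  # False
```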
Topological sort
● There are often many possible topological
sorts of a given DAG
● Topological orders for this DAG:
○ 1,2,5,4,3,6,7
○ 2,1,5,4,7,3,6
○ 2,5,1,4,7,3,6
○ Etc.
● Each topological order is a feasible schedule.
[Figure: the example DAG on vertices 1-7]
Topological Sorts for Cyclic Graphs?
Impossible!
[Figure: a directed cycle on vertices 1, 2, 3]
• If v and w are two vertices on a cycle, there
exist paths from v to w and from w to v.
• Any ordering will contradict one of these paths
A Topological Sort Algorithm
Topological-Sort()
{
1. Call DFS to compute finish time f[v] for
each vertex
2. As each vertex is finished, insert it onto
the front of a linked list
3. Return the linked list of vertices
}
● Time: O(V+E)
● Correctness: need to prove that
(u,v) ∈ G ⇒ f[u] > f[v]
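The three steps of Topological-Sort() might look as follows in Python. This is a sketch under the assumption that the input is a DAG given as a dict of successor lists; a deque stands in for the linked list, and no cycle check is performed.

```python
from collections import deque


def topological_sort(graph):
    """Return the vertices of a DAG in order of decreasing DFS finish time."""
    order = deque()  # plays the role of the linked list
    visited = set()

    def visit(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                visit(v)
        order.appendleft(u)  # u is finished: insert it at the front

    for u in graph:
        if u not in visited:
            visit(u)
    return list(order)
```

Each vertex and each edge is handled once, which matches the O(V+E) bound stated above.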
Correctness of Topological Sort
● Lemma: (u,v) ∈ G ⇒ f[u] > f[v]
■ When (u,v) is explored, u is gray. Consider the
following cases:
1. v is gray ⇒ (u,v) is a back edge. Can’t happen if G is
a DAG.
2. v is white ⇒ v becomes a descendant of u
⇒ f[v] < f[u]
(since we must finish v before backtracking and
finishing u)
3. v is black ⇒ v is already finished ⇒ f[v] < f[u]
Strongly Connected Directed Graphs
● Every pair of vertices is reachable from each other
[Figure: example digraph on vertices a-g]
Strongly-Connected
Graph G is strongly connected if, for every u
and v in V, there is some path from u to v and
some path from v to u.
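One way to test this definition without checking every pair: pick any vertex s, then verify that s reaches every vertex and that every vertex reaches s (the second check is a search in the reversed graph). A minimal sketch, assuming a dict-of-successors graph; the function names are illustrative.

```python
def reachable(graph, start):
    """Set of vertices reachable from `start` (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_strongly_connected(graph):
    if not graph:
        return True
    # Reverse every edge so that "v reaches s" becomes "s reaches v".
    reverse = {v: [] for v in graph}
    for u in graph:
        for v in graph[u]:
            reverse[v].append(u)
    s = next(iter(graph))
    return len(reachable(graph, s)) == len(graph) == len(reachable(reverse, s))
```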
[Figure: two example digraphs, one labeled Strongly Connected, the other Not Strongly Connected]
Strongly-Connected Components
A strongly connected component of a graph is a
maximal subset of nodes (along with their
associated edges) that is strongly connected.
Nodes share a strongly connected component
if they are inter-reachable.
Strongly Connected Components
[Figure: digraph on vertices a-g with its two strongly connected components, {a, c, g} and {f, d, e, b}]
Reduced Component Graph of Strongly Connected Components
[Figure: the digraph on vertices a-g reduced to its components {a, c, g} and {f, d, e, b}]
● Component graph G^SCC = (V^SCC, E^SCC): one
vertex for each component
■ (u, v) ∈ E^SCC if there exists at least one directed
edge from the component of u to the component of v
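Given a component label for every vertex, the edge set of the component graph can be collected in one pass over the edges. A sketch, assuming a dict-of-successors graph and a hypothetical component_of mapping from vertex to component label; the example edges are invented for illustration.

```python
def component_graph_edges(graph, component_of):
    """Edges of the component graph as a set of (component, component) pairs."""
    edges = set()
    for u in graph:
        for v in graph[u]:
            cu, cv = component_of[u], component_of[v]
            if cu != cv:  # edges inside one component are dropped
                edges.add((cu, cv))
    return edges


# Hypothetical example: two components X = {1, 2} and Y = {3, 4}.
g = {1: [2], 2: [1, 3], 3: [4], 4: [3]}
labels = {1: "X", 2: "X", 3: "Y", 4: "Y"}
print(component_graph_edges(g, labels))  # {('X', 'Y')}
```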
Strongly-Connected Components
Strongly-Connected-Components(G)
1. call DFS(G) to compute finishing times f[u] for each
vertex u.
2. compute G^T
3. call DFS(G^T), but in the main loop of DFS, consider
the vertices in order of decreasing f[u]
4. output the vertices of each tree in the depth-first forest
of step 3 as a separate strongly connected component.
The graph G^T is the transpose of G, which is visualized
by reversing the arrows on the digraph.
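The four steps can be sketched in Python as below. The graph representation (dict of successor lists) and all names are assumptions for illustration; the recursive DFS is fine for small inputs only.

```python
def strongly_connected_components(graph):
    """Two-pass DFS following steps 1-4; returns a list of vertex sets."""
    visited = set()

    def dfs(g, u, out):
        visited.add(u)
        for v in g[u]:
            if v not in visited:
                dfs(g, v, out)
        out.append(u)  # appended when u finishes

    # Step 1: DFS on G; finish_order lists vertices by increasing f[u].
    finish_order = []
    for u in graph:
        if u not in visited:
            dfs(graph, u, finish_order)

    # Step 2: build the transpose by reversing every edge.
    transpose = {u: [] for u in graph}
    for u in graph:
        for v in graph[u]:
            transpose[v].append(u)

    # Steps 3 and 4: DFS on the transpose in decreasing f[u];
    # every tree of this forest is one component.
    visited.clear()
    components = []
    for u in reversed(finish_order):
        if u not in visited:
            tree = []
            dfs(transpose, u, tree)
            components.append(set(tree))
    return components
```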
Runtime
Lines 1 and 3 are Θ(E+V) due to DFS
Line 2 builds the adjacency lists of G^T,
and it is also O(E+V)
Line 4 adds no extra cost: the components can be output during the DFS of line 3
So, SCC(G) is Θ(E+V)
Strongly connected components
● Definition: the strongly connected components
(SCC) C1, …, Ck of a directed graph G = (V,E)
are the largest disjoint sub-graphs (no common
vertices or edges) such that for any two vertices
u and v in Ci, there is a path from u to v and from
v to u.
● Equivalently: the equivalence classes of the mutual reachability
relation u ~ v (a path from u to v and a path from v to u).
● Applications: networking, communications.
● Problem: compute the strongly connected
components of G in linear time Θ(|V|+|E|).
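Example: strongly connected components
[Figure: example digraph G on vertices a-h, shown plain and then with its strongly connected components marked]
Example: transpose graph G^T
[Figure: G and its transpose G^T side by side, both on vertices a-h]
Example: SCC graph
[Figure: the example digraph on vertices a-h grouped into four components, each labeled C]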
Graph Traversals
• BFS and DFS both take time O(V+E)
Graph Searching ???
● Graph as state space (node = state, edge = action)
● For example, game trees, mazes, ...
● BFS and DFS each search the state space for a best move. If
the search is exhaustive they will find the same solution, but if
there is a time limit and the search space is large...
● DFS explores a few possible moves, looking at the effects far
in the future
● BFS explores many solutions but only sees effects in the near
future (often finds shorter solutions)
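The difference shows up even on a tiny state space. The sketch below is illustrative only: the graph, the function names, and the successor ordering are assumptions. BFS returns a path with the fewest edges, while DFS returns the first path it happens to complete.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Path with the fewest edges from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            path = []
            while u is not None:  # walk parent links back to start
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def dfs_path(graph, start, goal, seen=None):
    """First path found by going deep before trying siblings, or None."""
    seen = seen if seen is not None else set()
    seen.add(start)
    if start == goal:
        return [start]
    for v in graph[start]:
        if v not in seen:
            rest = dfs_path(graph, v, goal, seen)
            if rest is not None:
                return [start] + rest
    return None


# Hypothetical state space: a direct move s -> g and a longer route via a, b.
moves = {"s": ["a", "g"], "a": ["b"], "b": ["g"], "g": []}
print(bfs_path(moves, "s", "g"))  # ['s', 'g']
print(dfs_path(moves, "s", "g"))  # ['s', 'a', 'b', 'g']
```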
Minimum Spanning Trees
Problem: Laying Telephone Wire
[Figure: central office and customers]
Wiring: Naïve Approach
[Figure: wiring from the central office]
Expensive!
Wiring: Better Approach
[Figure: wiring from the central office]
Minimize the total length of wire connecting the customers
Minimum Spanning Tree (MST)
A minimum spanning tree is a subgraph of an
undirected weighted graph G, such that
• it is a tree (i.e., it is connected and acyclic)
• it covers all the vertices V
– contains |V| - 1 edges
• the total cost associated with tree edges is the
minimum among all possible spanning trees
• not necessarily unique
How Can We Generate a MST?
[Figure: weighted undirected graph on vertices a, b, c, d, e with edge weights 2, 4, 4, 5, 5, 5, 6, 9]
Minimum Spanning Tree: Prim's Algorithm
● Prim's algorithm for finding an MST is a
greedy algorithm.
● Start by selecting an arbitrary vertex, include it
into the current MST.
● Grow the current MST by inserting into it the
vertex closest to one of the vertices already in
current MST.
Prim’s Algorithm
1. All vertices are marked as not visited
2. Any vertex v you like is chosen as the starting vertex and
is marked as visited (this defines a cluster C)
3. The smallest-weight edge e = (v,u) that connects
a vertex v inside the cluster C with a vertex u
outside of C is chosen and added to the MST; u is marked as visited and joins C.
4. Step 3 is repeated until a spanning tree is formed
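These steps could be sketched with a binary heap holding the candidate edges that leave the cluster. The heap is a swapped-in data structure, not something the steps prescribe. The edge weights below are an assumption: they pair the sorted edge list from the Kruskal example with the weights 2, 4, 4, 5, 5, 5, 6, 9 of the a-e graph.

```python
import heapq


def prim_mst(graph, start):
    """Grow a cluster from `start`; returns MST edges as (inside, outside, weight)."""
    visited = {start}  # the cluster C
    frontier = [(w, start, u) for u, w in graph[start]]
    heapq.heapify(frontier)
    mst = []
    while frontier and len(visited) < len(graph):
        w, v, u = heapq.heappop(frontier)  # smallest-weight candidate edge
        if u in visited:
            continue  # both endpoints already inside C: skip
        visited.add(u)
        mst.append((v, u, w))
        for x, wx in graph[u]:
            if x not in visited:
                heapq.heappush(frontier, (wx, u, x))
    return mst


# Assumed weights for the a-e example graph (see the note above).
edges = [("a", "d", 2), ("c", "d", 4), ("d", "e", 4), ("a", "c", 5),
         ("b", "e", 5), ("c", "e", 5), ("b", "d", 6), ("a", "b", 9)]
graph = {v: [] for v in "abcde"}
for u, v, w in edges:  # undirected: store each edge in both directions
    graph[u].append((v, w))
    graph[v].append((u, w))

tree = prim_mst(graph, "e")
print(sum(w for _, _, w in tree))  # 15
```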
[Figure: step-by-step run of Prim’s algorithm on a weighted graph with vertices A-F, ending with the minimum spanning tree]
Note from the figure: some edges can be deleted along the way
because of Dijkstra’s label D[u] kept for
each vertex outside of the cluster.
Prim’s algorithm: example
[Figure: weighted undirected graph on vertices a, b, c, d, e, repeated at every step]
The MST initially consists of the vertex e, and we update
the distances and parent for its adjacent vertices.

Step  Vertex added  Queue (vertex: distance)     Parents
0     -             e: 0, d: -, b: -, c: -, a: -   e: -, b: -, c: -, d: -
1     e             d: 4, b: 5, c: 5, a: -         e: -, b: e, c: e, d: e
2     d             a: 2, c: 4, b: 5               e: -, b: e, c: d, d: e, a: d
3     a             c: 4, b: 5                     e: -, b: e, c: d, d: e, a: d
4     c             b: 5                           e: -, b: e, c: d, d: e, a: d
5     b             (empty)                        e: -, b: e, c: d, d: e, a: d

The final minimum spanning tree is given by the last parent table:
edges (b,e), (c,d), (d,e), (a,d).
Running time of Prim’s algorithm (without heaps)
Initialization of priority queue (array): O(|V|)
Update loop: |V| calls
• Choosing vertex with minimum cost edge: O(|V|)
• Updating distance values of unconnected
vertices: each edge is considered only once
during entire execution, for a total of O(|E|)
updates
Overall cost without heaps: O(|E| + |V|^2)
When heaps are used, apply same analysis as for
Dijkstra’s algorithm (p.469) (good exercise)
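The array-based version analyzed above might look like this. It is a sketch that assumes a connected graph stored as adjacency lists of (neighbor, weight) pairs; the edge weights are the same assumption as the sorted Kruskal edge list with weights 2, 4, 4, 5, 5, 5, 6, 9. Starting from e, it reproduces the final parent table of the example.

```python
import math


def prim_array(graph, start):
    """Prim's algorithm with plain dictionaries as the 'priority queue'."""
    dist = {v: math.inf for v in graph}  # O(|V|) initialization
    parent = {v: None for v in graph}
    dist[start] = 0
    in_tree = set()
    while len(in_tree) < len(graph):  # |V| rounds
        # O(|V|) scan for the closest vertex not yet in the tree.
        u = min((v for v in graph if v not in in_tree), key=dist.get)
        in_tree.add(u)
        for v, w in graph[u]:  # O(|E|) updates over the whole run
            if v not in in_tree and w < dist[v]:
                dist[v] = w
                parent[v] = u
    return parent


# Assumed adjacency lists for the a-e example graph.
graph = {
    "a": [("d", 2), ("c", 5), ("b", 9)],
    "b": [("e", 5), ("d", 6), ("a", 9)],
    "c": [("d", 4), ("a", 5), ("e", 5)],
    "d": [("a", 2), ("c", 4), ("e", 4), ("b", 6)],
    "e": [("d", 4), ("b", 5), ("c", 5)],
}
print(prim_array(graph, "e"))
```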
Prim’s Algorithm Invariant
● At each step, we add the edge (u,v) s.t. the weight of
(u,v) is minimum among all edges where u is in the
tree and v is not in the tree
● Each step maintains a minimum spanning tree of the
vertices that have been included thus far
● When all vertices have been included, we have a MST
for the graph!
Correctness of Prim’s
● This algorithm adds n-1 edges without creating a cycle, so
clearly it creates a spanning tree of any connected graph
(you should be able to prove this).
But is this a minimum spanning tree?
Suppose it wasn't.
● There must be a point at which it fails, and in particular there
must be a single edge whose insertion first prevented the
spanning tree from being a minimum spanning tree.
Correctness of Prim’s
• Let G be a connected, undirected graph
• Let S be the set of edges chosen by Prim’s algorithm
before choosing an erroneous edge (x,y)
• Let V' be the vertices incident with edges in S
• Let T be a MST of G containing all edges in S, but not (x,y).
[Figure: the edge (x,y)]
Correctness of Prim’s
[Figure: the cycle through x and y, with the edge (v,w) marked]
• Edge (x,y) is not in T, so there must be a path in
T from x to y since T is connected.
• Inserting edge (x,y) into T will create a cycle
• Besides (x,y), at least one edge on this cycle has exactly
one vertex in V’; call this edge (v,w)
Correctness of Prim’s
● Since Prim’s chose (x,y) over (v,w), w(v,w) >= w(x,y).
● We could form a new spanning tree T’ by swapping (x,y)
for (v,w) in T (prove this is a spanning tree).
● w(T’) is clearly no greater than w(T)
● But that means T’ is a MST
● And yet it contains all the edges in S, and also (x,y)
...Contradiction
Another Approach
[Figure: the weighted graph on vertices a, b, c, d, e]
• Create a forest of trees from the vertices
• Repeatedly merge trees by adding “safe edges”
until only one tree remains
• A “safe edge” is an edge of minimum weight which
does not create a cycle
forest: {a}, {b}, {c}, {d}, {e}
Kruskal’s algorithm
Initialization
a. Create a set for each vertex v ∈ V
b. Initialize the set of “safe edges” A
comprising the MST to the empty set
c. Sort edges by increasing weight
[Figure: the weighted graph on vertices a, b, c, d, e]
F = {a}, {b}, {c}, {d}, {e}
A = ∅
E = {(a,d), (c,d), (d,e), (a,c),
(b,e), (c,e), (b,d), (a,b)}
Kruskal’s algorithm
For each edge (u,v) ∈ E in increasing order,
while more than one set remains:
If u and v belong to different sets U and V
a. add edge (u,v) to the safe edge set
A = A ∪ {(u,v)}
b. merge the sets U and V
F = F - U - V + (U ∪ V)
Return A
● Running time bounded by sorting (or findMin)
● O(|E|log|E|), or equivalently, O(|E|log|V|) (why???)
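A compact sketch of the loop above, using a simple union-find (path halving, no union by rank) to answer "do u and v belong to different sets?". The representation of edges as (weight, u, v) tuples and the weights themselves are assumptions, chosen to match the sorted edge list E of the example.

```python
def kruskal(vertices, edges):
    """Return the safe edge set A as a list of (u, v) pairs."""
    parent = {v: v for v in vertices}  # one singleton set per vertex

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    safe = []
    for w, u, v in sorted(edges):  # increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:  # different sets: adding the edge creates no cycle
            parent[ru] = rv  # merge the two sets
            safe.append((u, v))
            if len(safe) == len(parent) - 1:
                break  # only one set remains
    return safe


# Assumed weights for the example edge list.
edges = [(2, "a", "d"), (4, "c", "d"), (4, "d", "e"), (5, "a", "c"),
         (5, "b", "e"), (5, "c", "e"), (6, "b", "d"), (9, "a", "b")]
print(kruskal("abcde", edges))
# [('a', 'd'), ('c', 'd'), ('d', 'e'), ('b', 'e')]
```

The sort dominates the running time, in line with the bound stated above.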
Kruskal’s algorithm
E = {(a,d), (c,d), (d,e), (a,c),
(b,e), (c,e), (b,d), (a,b)}

Forest                     A
{a}, {b}, {c}, {d}, {e}    ∅
{a,d}, {b}, {c}, {e}       {(a,d)}
{a,d,c}, {b}, {e}          {(a,d), (c,d)}
{a,d,c,e}, {b}             {(a,d), (c,d), (d,e)}
{a,d,c,e,b}                {(a,d), (c,d), (d,e), (b,e)}

[Figure: the weighted graph on vertices a, b, c, d, e]
Kruskal’s Algorithm Invariant
● After each iteration, every tree in the forest is a MST of the
vertices it connects
● Algorithm terminates when all vertices are connected into one
tree
Correctness of Kruskal’s
● This algorithm adds n-1 edges without creating a cycle, so
clearly it creates a spanning tree of any connected graph
(you should be able to prove this).
But is this a minimum spanning tree?
Suppose it wasn't.
● There must be a point at which it fails, and in particular there
must be a single edge whose insertion first prevented the
spanning tree from being a minimum spanning tree.
Correctness of Kruskal’s
● Let e be this first erroneous edge.
● Let K be the Kruskal spanning tree
● Let S be the set of edges chosen by Kruskal’s algorithm
before choosing e
● Let T be a MST containing all edges in S, but not e.
[Figure: diagram of the edge sets K, T, and S, and the edge e]
Correctness of Kruskal’s
Lemma: w(e’) >= w(e) for all edges e’ in T - S
Proof (by contradiction):
● Assume there exists some
edge e’ in T - S with w(e’) < w(e)
● Kruskal’s must have
considered e’ before e
● However, since e’ is not in K (why??), it must have
been discarded because it caused a cycle with some of
the other edges in S.
● But e’ + S is a subgraph of T, which means it cannot
form a cycle ...Contradiction
Correctness of Kruskal’s
● Inserting edge e into T will create a cycle
● There must be an edge on this cycle which is not in K
(why??). Call this edge e’
● e’ must be in T - S, so (by our lemma) w(e’) >= w(e)
● We could form a new spanning tree T’ by swapping e for e’ in
T (prove this is a spanning tree).
● w(T’) is clearly no greater than w(T)
● But that means T’ is a MST
● And yet it contains all the edges in S, and also e
...Contradiction
Greedy Approach
● Like Dijkstra’s algorithm, both Prim’s and Kruskal’s
algorithms are greedy algorithms
● The greedy approach works for the MST problem;
however, it does not work for many other problems!
That’s All!