Data Structures
Graphs
Introduction
• A Graph is an abstract data structure that is used to implement the graph concept
from mathematics.
• A graph is basically, a collection of vertices (also called nodes) and edges that
connect these vertices.
• A graph is often viewed as a generalization of the tree structure, where instead of
a having a purely parent-to-child relationship between tree nodes, any kind of
complex relationships between the nodes can be represented.
Why graphs are useful?
• Graphs are widely used to model any situation where entities or things are related
to each other in pairs; for example, the following information can be represented
by graphs:
• Family trees in which the member nodes have an edge from parent to
each of their children.
• Transportation networks in which nodes are airports, intersections, ports,
etc. The edges can be airline flights, one-way roads, shipping routes, etc.
Definition
• A graph G is defined as an ordered set (V, E),
• where V(G) represent the set of vertices and
• E(G) represents the edges that connect the vertices.
• The figure given shows a graph with V(G) = { A, B, C, D and E}
and E(G) = { (A, B), (B, C), (A, D), (B, D), (D, E), (C, E) }.
A B C
D E
Note that there are 5 vertices or nodes and 6 edges in the graph.
• A graph can be directed or undirected.
• In an undirected graph, the edges do not have any direction
associated with them. That is, if an edge is drawn between nodes
A and B, then the nodes can be traversed from A to B as well as
from B to A.
Definition
• The given figure shows a directed graph. In a directed graph, edges
form an ordered pair. If there is an edge from A to B, then there is a path
from A to B but not from B to A.
• The edge (A, B) is said to initiate from node A (also known as initial
node) and terminate at node B (terminal node).
A B C
D E
Graph Terminology
• Adjacent Nodes or Neighbors: For every edge, e = (u, v) that
connects nodes u and v; the nodes u and v are the end-points and are
said to be the adjacent nodes or neighbors.
• Degree of a node: Degree of a node u, deg(u), is the total number of
edges containing the node u. If deg(u) = 0, it means that u does not
belong to any edge and such a node is known as an isolated node.
• Regular graph: Regular graph is a graph where each vertex has the
same number of neighbors. That is every node has the same degree.
A regular graph with vertices of degree k is called a k-regular graph
or regular graph of degree k.
Graph Terminology
• Path: A path P, written as P = {v0, v1, v2,….., vn), of length n from a node u to v
is defined as a sequence of (n+1) nodes. Here, u = v0, v = vn and vi-1 is adjacent
to vi for i = 1, 2, 3, …, n.
• Closed path: A path P is known as a closed path if the edge has the same end-
points. That is, if v0 = vn.
• Simple path: A path P is known as a simple path if all the nodes in the path are
distinct with an exception that v0 may be equal to vn. If v0 = vn, then the path
is called a closed simple path.
• Cycle: A path in which the first and the last vertices are same.
• Connected graph: A graph in which there exists a path between any two of its nodes
is called a connected graph. That is to say that there are no isolated nodes in a
connected graph. A connected graph that does not have any cycle is called a tree.
• Complete graph: A graph G is said to be a complete, if all its nodes are fully
connected, that is, there is a path from one node to every other node in the graph. A
complete graph has n(n-1)/2 edges, where n is the number of nodes in G.
Graph Terminology
• Labeled graph or weighted graph: A graph is said to be labeled if every
edge in the graph is assigned some data. In a weighted graph, the edges of
the graph are assigned some weight or length. Weight of the edge, denoted
by w(e) is a positive value which indicates the cost of traversing the edge.
• Multiple edges: Distinct edges which connect the same end points are called
multiple edges.
That is, e = (u, v) and e’ = (u, v) are known as multiple edges of G.
• Loop: An edge that has identical end-points is called a loop.
That is, e = (u, u).
• Multi- graph: A graph with multiple edges and/or a loop is called a multi-
graph.
• Size of the graph: The size of a graph is the total number of edges in it.
Graph Terminology
Examples:
<A, B, C, D, B, A, C> is a walk. B
<A, B, D> is a path. A
D
<A, B, D, C, A> is a cycle. C
<B, C, D, A> is not a walk. Why?
Directed Graph
• A directed graph G, also known as a digraph, is a graph in which
every edge has a direction assigned to it. An edge of a directed
graph is given as an ordered pair (u, v) of nodes in G. For an edge
(u, v).
• The edge begins at u and terminates at v
• u is known as the origin or initial point of e. Correspondingly, v is
known as the destination or terminal point of e
• u is the predecessor of v. Correspondingly, v is the successor of u
nodes u and v are adjacent to each other.
A B C
D E
Terminology of a Directed Graph
• Out-degree of a node: The out degree of a node u, written as
outdeg(u), is the number of edges that originate at u.
• In-degree of a node: The in degree of a node u, written as indeg(u), is
the number of edges that terminate at u.
• Degree of a node: Degree of a node written as deg(u) is equal to the
sum of in-degree and out-degree of that node. Therefore, deg(u) =
indeg(u) + outdeg(u)
• Isolated vertex: A vertex with degree zero. Such a vertex is not an
end-point of any edge.
• Pendant vertex: (also known as leaf vertex) A vertex with degree one.
• Cut vertex: A vertex which when deleted would disconnect the
remaining graph.
• Source: A node u is known as a source if it has a positive out-degree
but an in-degree = 0.
• Sink: A node u is known as a sink if it has a positive in degree but a
zero out-degree.
Terminology of a Directed Graph
• Reachability: A node v is said to be reachable from node u, if and
only if there exists a (directed) path from node u to node v.
• Strongly connected directed graph: A digraph is said to be strongly
connected if and only if there exists a path from every pair of nodes
in G. That is, if there is a path from node u to v, then there must be a
path from node v to u.
• Unilaterally connected graph: A digraph is said to be unilaterally
connected if there exists a path from any pair of nodes u, v in G such
that there is a path from u to v or a path from v to u but not both.
• Parallel / Multiple edges: Distinct edges which connect the same
end points are called multiple edges. That is, e = (u, v) and e’ = (u, v)
are known as multiple edges of G.
Graph Implementation
Suppose there are n vertices in the graph. This gives us the vertex set {1, …, n}.
Graphs can be implemented using the following methods:
1. Adjacency Matrix representation
Consider a graph G(V,E) having n nodes.
• We store an n × n size 2D array A (a matrix, essentially).
• Ai,j = 1 if there is an edge between i and j in the graph.
• Ai,j = 0 if there is no edge between i and j in the graph.
• Size of the representation is O(n2).
2. Path Matrix
Consider a graph G(V,E) having n nodes. The path matrix of graph G is defined as nxn matrix P, where
i. Pi,j =1 If there is a path from Vi,to Vj
ii. Pi,j =0 If there is no path from Vi,to Vj
P can be deduced using adjacency matrix A as depicted below:
P=A+A^2+A^3+ . . . +A^n
3. Adjacency List representation
• We store a linked list for every vertex.
• The linked list for vertex i consists of the neighbors of i in the graph.
• Size of the representation is O(n + m) where m is the number of edges in the graph.
Representations of Graphs
2
1
4
3
n=4
Adjacency matrix: Adjacency list:
Aij 1 2 3 4 1 2 3 X
1 1 1 1 0
2
2 3 4 X
0 0 1 1
3 0 0 0 1
3 4 X
4 0 0 0 0
4 X
Graph Adjacency Matrix
representation
Graph Adjacency List representation
Terminology of a Directed Graph
• Simple directed graph: A directed graph G is said to be a simple
directed graph if and only if it has no parallel edges. However, a
simple directed graph may contain cycle with an exception that it
cannot have more than one loop at a given node.
• Transitive Closure of a Directed graph : A transitive closure of a
graph is constructed to answer reachability questions.
That is, is there a path from a node A to node E in one or more hops? A
binary relation indicates only whether the node A is connected to node B, whether
node B is connected to node C, etc.
A graph G
Transitive closure G*
Transitive Closure of a Directed Graph
• Definition: For a directed graph G = (V,E), where V is the set of vertices and E is
the set of edges, the transitive closure of G is a graph G* = (V,E*).
• In G*, for every vertex pair v, w in V there is an edge (v, w) in E* if and
only if there is a valid path from v to w in G.
• Why and where is it needed? Finding the transitive closure of a directed graph is
an important problem in many computational tasks that listed below.
➢ Transitive closure is used to find the reachability analysis of transition
networks representing distributed and parallel systems
➢ It is used in the construction of parsing automata in compiler construction
➢ Recently, transitive closure computation is being used to evaluate recursive
database queries (because almost all practical recursive queries are transitive
in nature).
Transitive Closure of a Directed Graph
Algorithm:
• In order to determine the transitive closure of a graph, we define a
matrix T where , (for i, j, k = 1, 2, 3, …n) if there exists a path in G
from the vertex i to vertex j with intermediate vertices in the set (1,
2, 3, .., k) and 0 otherwise. That is, G* is constructed by adding an
edge (i, j) into E* if and only if .
T=
Transitive Closure of a Directed Graph
T0ij = 0 if (i, j) is not in E When k ≥ 1 Tkij = Tk-1ij OR (Tk-1ik AND T k-1 kj)
When k = 0
1 if (i, j) is in E
Transitive_Closure( A, t, n)
Step 1: SET i=1, j=1, k=1
Step 2: Repeat Steps 3 and 4 while i<=n
Step 3: Repeat Step 4 while j<=n
Step 4: IF (A[i][j] = 1), then
SET t[i][j] = 1
ELSE
SET t[i][j] = 0
INCREMENT j
INCREMENT i
Step 5: Repeat Steps 6 to 11 while k<=n
Step 6: Repeat Step 7 to 10 while i<=n
Step 7: Repeat Step 8 and 9 while j<=n
Step 8: SET t[i,j] = t[i][j] OR ( t[i][k] AND t[k][j])
Step 9: INCREMENT j
Step 10: INCREMENT i
Step 11: INCREMENT k
Step 12: END
Bi Connected Graphs
• A vertex v of G is called an articulation point if removing v along
with the edges incident to v, results in a graph that has at least
two connected components.
• A bi-connected graph is defined as a connected graph that has
no articulation vertices. That is, a bi-connected graph is connected
and non-separable, in the sense that even if we remove any vertex
from the graph, the resultant graph is still connected.
• A bi-connected undirected graph is a connected graph that is not
broken into disconnected pieces by deleting any single vertex
• In a bi-connected directed graph for any two vertices v and w there
are two directed paths from v to w which have no vertices in
common other than v and w.
Bi Connected Graphs
A H
A H
B C I
B I
D J
D J
E F
E F
Note that the graph shown on the left is not a bi-connected graph as
deleting vertex C from the graphs results in two connected
components of the original graph (figure on right)
Bi Connected Graphs
A B
Bi-connected graph. Removal of any vertex from the graph will not disconnect
the graph
D C
Like graphs, there is a related concept for edges. An edge in a graph is called a bridge, if
removing that edge results in a disconnected graph. Also, an edge on the graph that does not
lie on a cycle is a bridge. This means that a bridge has at least one articulation point at
its end, although it is not necessary that the articulation point is linked in a bridge.
A B
C D E
In this graph:
D is a articulation point
CD and DE are bridges.
Graph Traversal Algorithms
• By traversing a graph, we mean the method of examining the nodes
and edges of the graph.
• There are two standard methods of graph traversal which we will
discuss in this section. These two methods are:
– Breadth first search
– Depth first search
• While breadth first search will use a queue as an auxiliary data
structure to store nodes for further processing, the depth-first search
scheme will use a stack.
Graph Traversal Algorithms
•But both these algorithms will make use of a variable STATUS. During the
execution of the algorithm, every node in the graph will have the variable STATUS
set to 1, 2 or depending on its current state. Below table shows the value of status and
its significance.
STATE of the
STATUS DESCRIPTION
NODE
1 Ready The initial state of the node N
Node N is placed on the queue or stack and
2 Waiting
waiting to be processed
3 Processed Node N has been completely processed
Breadth First Search
• Breadth-first search (BFS) is a graph search algorithm that begins at
the root node and explores all the neighboring nodes. Then for each
of those nearest nodes, the algorithm explores their unexplored
neighbor nodes, and so on, until it finds the goal.
• That is, we start examining the node A and then all the neighbors of
A are examined. In the next step we examine the neighbors of
neighbors of A, so on and so forth
Algorithm for breadth-first search in a graph G beginning at a starting node A
Step 1: SET STATUS = 1 (ready state) for each node in G.
Step 2: Enqueue the starting node A and set its STATUS = 2 (waiting state)
Step 3: Repeat Steps 4 and 5 until QUEUE is empty
Step 4: Dequeue a node N. Process it and set its STATUS = 3 (processed state).
Step 5: Enqueue all the neighbors of N that are in the ready state
(whose STATUS = 1) and set their STATUS = 2 (waiting state)
[END OF LOOP]
Step 6: EXIT
Breadth First Search
Example: Consider the graph G given below. The adjacency list of G is also given. Assume that G
represents the daily flights between different cities and we want to fly from city A to I with minimum
[Link] is, find the minimum path P from A to I given that every edge has length = 1.
A
•The minimum path P can be found by applying the breadth-
first search algorithm that begins at city A and ends when I is
encountered.
•During the execution of the algorithm, we use two arrays:
B C D
QUEUE and ORIG.
E F G •While QUEUE is used to hold the nodes that have to be
processed, ORIG is used to keep track of the origin of each
H I edge.
•Initially, FRONT = REAR = –1.
Adjacency Lists
A: B, C, D
B: E The algorithm for this is as follows:
C: B, G
D: C, G a)Initially add A to QUEUE and add NULL to ORIG, so
E: C, F
FRONT = 0 QUEUE = A
F: C, H
G: F, H, I
H: E, I REAR = 0 ORIG = \0
I: F
Breadth First Search
Dequeue a node by setting FRONT = FRONT + 1 (remove the FRONT element of
QUEUE) and enqueue the neighbors of A. Also add A as the ORIG of its neighbors, so
FRONT = 1 QUEUE = A B C D
REAR = 3 ORIG = \0 A A A
Dequeue a node by setting FRONT = FRONT + 1 and enqueue the neighbors of B. Also
add B as the ORIG of its neighbors, so
FRONT = 2 QUEUE = A B C D E
REAR = 4 ORIG = \0 A A A B
Dequeue a node by setting FRONT = FRONT + 1 and enqueue the neighbors of C. Also
add C as the ORIG of its neighbors. Note that C has two neighbors B and G. Since B has
already been added to the queue and it is not in the Ready state, we will not add B and add
only G, so FRONT = 3 QUEUE = A B C D E G
REAR = 5 ORIG = \0 A A A B C
Dequeue a node by setting FRONT = FRONT + 1 and enqueue the neighbors of D. Also add D as the
ORIG of its neighbors. Note that D has two neighbors C and G. Since both of them have already been
added to the queue and they are not in the Ready state, we will not add them again, so
FRONT = 4 QUEUE = A B C D E G
REAR = 5 ORIG = \0 A A A B C
Breadth First Search
Dequeue a node by setting FRONT = FRONT + 1 and enqueue the neighbors of E. Also add E as the
ORIG of its neighbors. Note that E has two neighbors C and F. Since C has already been added to the
queue and it is not in the Ready state, we will not add C and add only F, so
FRONT = 5 QUEUE = A B C D E G F
REAR = 6 ORIG = \0 A A A B C E
Dequeue a node by setting FRONT = FRONT + 1 and enqueue the neighbors of G. Also add G as the
ORIG of its neighbors. Note that G has three neighbors F, H and I.
FRONT = 6 QUEUE = A B C D E G F H I
REAR = 8 ORIG = \0 A A A B C E G G
Since I is our final destination, we stop the execution of this algorithm as soon as it is
encountered and added to the QUEUE.
Now backtrack from I using ORIG to find the minimum path P.
thus, we have P as A -> C -> G -> I.
Depth First Search Algorithm
•The Depth First Search algorithm progresses by expanding the starting node of G and thus
going deeper and deeper until a goal node is found, or until a node that has no children is
encountered. When a dead- end is reached, the algorithm backtracks, returning to the most
recent node that has not been completely explored.
•In other words, the Depth- First Search algorithm begins at a starting node A which becomes
the current node. Then it examines each node N along a path P which begins at A. That is, we
process a neighbor of A, then a neighbor of neighbor of A and so on. During the execution of
the algorithm, if we reach a path that has a node N that has already been processed, then we
backtrack to the current node. Otherwise, the un-visited (un-processed node) becomes the
current node.
Depth First Search Algorithm
Algorithm for depth-first search in a graph G beginning at a starting node A
Step 1: SET STATUS = 1 (ready state) for each node in G.
Step 2: Push the starting node A on the stack and set its STATUS = 2
(waiting state)
Step 3: Repeat Steps 4 and 5 until STACK is empty
Step 4: Pop the top node N. Process it and set its STATUS = 3
(processed state)
Step 5: Push on to the stack all the neighbors of N that are in
the ready state (whose STATUS = 1) and set their
STATUS = 2 (waiting state)
[END OF LOOP]
Step 6: EXIT
Depth First Search Algorithm
Example: Consider the graph G given [Link] adjacency list of G is also given.
Suppose we want to print all nodes that can be reached from the node H. One
alternative is to use a Depth- First Search of G starting at node H. the procedure can be
explained as below.
A
Adjacency Lists
A: B, C, D
B: E
B C D
C: B, G
D: C, G
E: C, F
F: C, H
E F G G: F, H, I
H: E, I
I: F
H I
Depth First Search Algorithm
a)Push H on to the stack
STACK: H
Pop and Print the top element of the STACK, that is, H. Push all the neighbors of H on to the
stack that are in the ready state. The STACK now becomes:
PRINT: H STACK: E, I
Pop and Print the top element of the STACK, that is, I. Push all the neighbors of I on to the
stack that are in the ready state. The STACK now becomes:
PRINT: I STACK: E, F
Pop and Print the top element of the STACK, that is, F. Push all the neighbors of F on to the
stack that are in the ready state. (Note F has two neighbors C and H. but only C will be
added as H is not in the ready state). The STACK now becomes:
PRINT: F STACK: E, C
Pop and Print the top element of the STACK, that is, C. Push all the neighbors of C on to
the stack that are in the ready state. The STACK now becomes:
Depth First Search Algorithm
PRINT: C STACK: E, B, G
Pop and Print the top element of the STACK, that is, G. Push all the neighbors of G on to the
stack that are in the ready state. Since there are no neighbors of G that are in the ready state
no push operation is performed. The STACK now becomes:
PRINT : G
STACK: E, B
Pop and Print the top element of the STACK, that is, B. Push all the neighbors of B on to the
stack that are in the ready state. Since there are no neighbors of B that are in the ready state
no Push operation is performed. The STACK now becomes:
PRINT : B
STACK: E
Pop and Print the top element of the STACK, that is, E. Push all the neighbors of E on to the
stack that are in the ready state. Since there are no neighbors of E that are in the ready state no
Push operation is performed. The STACK now becomes empty: PRINT : E
Since the STACK is now empty, the depth-first search of G starting at node H is complete and
the nodes which were printed are-H, I, F, C, G, B E.
• These are the nodes which are reachable from the node H.
• Consider the Graphs given below. Find out its depth-first (DFS)and breadth-
first (BFS) traversal scheme.
Prob1: Prob2:
•Applications of Breadth-First Search Algorithm
Breadth-first search can be used to solve many problems such as:
➢Finding all connected components in a graph G.
➢Finding all nodes within an individual connected component.
➢Finding the shortest path between two nodes u and v, of an unweighted graph.
➢Finding the shortest path between two nodes, u and v, of a weighted graph.
•Applications of Depth-First Search Algorithm
Depth-first search is useful for:
➢Finding a path between two specified nodes, u and v, of an unweighted graph.
➢Finding a path between two specified nodes, u and v, of a weighted graph.
➢Finding whether a graph is connected or not.
➢Computing the spanning tree of a connected graph.
MINIMUM SPANNING TREE
• Given a connected and undirected graph, a spanning tree of that graph is a subgraph that
is a tree and connects all the vertices together. A single graph can have many different
spanning trees.
• A minimum spanning tree (MST) or minimum weight spanning tree for a weighted,
connected, undirected graph is a spanning tree with a weight less than or equal to the
weight of every other spanning tree.
• The weight of a spanning tree is the sum of weights given to each edge of the spanning
tree. A minimum spanning tree has (V – 1) edges where V is the number of vertices in the
given graph.
Applications of Minimum Spanning Tree:
Network design.– telephone, electrical, hydraulic, TV cable, computer, road
• The standard application is to a problem like phone network design. You have a business
with several offices; you want to lease phone lines to connect them up with each other;
and the phone company charges different amounts of money to connect different pairs of
cities. You want a set of lines that connects all your offices with a minimum total cost. It
should be a spanning tree, since if a network isn’t a tree you can always remove some
edges and save money.
• max bottleneck paths, Cluster analysis
• LDPC codes for error correction
• learning salient features for real-time face verification
• reducing data storage in sequencing amino acids in a protein
• model locality of particle interactions in turbulent fluid flows
• autoconfig protocol for Ethernet bridging to avoid cycles in a network
steps for finding MST using Kruskal’s algorithm
1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far.
If cycle is not formed, include this edge. Else, discard it.
3. Repeat step#2 until there are (V-1) edges in the spanning tree.
The graph contains 9 vertices and 14 edges. So, the minimum spanning tree formed will be
having (9 – 1) = 8 edges.
Weight Src Dest
After sorting:
1 7 6
2 8 2
2 6 5
4 0 1
4 2 5
6 8 6
7 2 3
7 7 8
8 0 7
8 1 2
9 3 4
10 5 4
11 1 7
14 3 5
Now pick all edges one by one from the sorted list of edges
1. Pick edge 7-6: No cycle is formed, include it.
2. Pick edge 8-2: No cycle is formed, include it.
3. Pick edge 6-5: No cycle is formed, include it.
5. Pick edge 2-5: No cycle is formed, include it.
4. Pick edge 0-1: No cycle is formed, include it
6. Pick edge 8-6: Since including this edge results in the cycle, discard it.
7. Pick edge 2-3: No cycle is formed, include it.
8. Pick edge 7-8: Since including this edge results in the cycle, discard it.
9. Pick edge 0-7: No cycle is formed, include it.
10. Pick edge 1-2: Since including this edge results in the cycle, discard it.
11. Pick edge 3-4: No cycle is formed, include it.
Since the number of edges[8] included equals (V – 1=9-1=8), the algorithm stops here.