DSA-UNIT V Lecture Notes
DSA-UNIT V Lecture Notes
UNIT V (Graphs)
Topics to be covered:
Graph Terminology, Graph Traversal, Topological sorting - Minimum spanning tree –
Prim’s Algorithm - Kruskal’s Algorithm - Network flow problem - Shortest Path Algorithm:
Dijkstra - Graph Search: Depth First Search, Breadth First Search - Hashing: Hash
functions, Collision avoidance, Separate chaining - Open addressing: Linear probing,
Quadratic Probing, Double hashing, Rehashing, Extensible Hashing
Graph Terminology
Graph is defined as an ordered set <V, E>, where V(G) represents the set of vertices in the graph G
and E(G) represents the set of Edges in the graph G. Before proceeding further, let's discuss some
frequent terms used in graph concept –
● Vertex
Each node of the graph is represented as a vertex. In example given below, labelled circle rep
● Edge
Edge represents a path between two vertices or a line between two vertices. In example given b
● Adjacency
Two node or vertices are adjacent if they are connected to each other through an edge. In
● Path
Path represents a sequence of edges between two vertices. In example given below, ABCD rep
1
Data Structures and Algorithms18CSC201J
Fig.1: Graph
A graph is a pictorial representation of a set of objects where some pairs of objects are connected
by links. The interconnected objects are represented by points termed as vertices, and the links that
connect the vertices are called edges.
Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of edges,
connecting the pairs of vertices. Take a look at the following graph –
Fig.2: Graph
In the above graph,
V = {a, b, c, d, e}
E = {ab, ac, bd, cd, de}
Basic Operations
Following are basic primary operations of a Graph which are following.
● Add Vertex add a vertex to a graph.
● Add Edge add an edge between two vertices of a graph.
● Display Vertex display a vertex of a graph.
Graph Traversal
1. Depth First Traversal
Depth First Search algorithm(DFS) traverses a graph in a depth ward motion and uses a stack to
remember to get the next vertex to start a search when a dead end occurs in any iteration.
2
Data Structures and Algorithms18CSC201J
As in example given above, DFS algorithm traverses from A to B to C to D first then to E, then to F
and lastly to G. It employs following rules.
● Rule 1
Visit adjacent unvisited vertex. Mark it visited. Display it. Push it in a stack.
● Rule 2
If no adjacent vertex found, pop up a vertex from stack. (It will pop up all the vertices from t
3
Data Structures and Algorithms18CSC201J
Mark S as visited and put it onto the
stack. Explore any unvisited
adjacent node from S. We have
4
Data Structures and Algorithms18CSC201J
As C does not have any unvisited adjacent node so we keep popping the stack until we find a node
which has unvisited adjacent node. In this case, there's none and we keep popping until stack is
empty.
● Rule 2 If no adjacent vertex found, remove the first vertex from queue.
● Rule 3 Repeat Rule 1 and Rule 2 until queue is empty.
5
Data Structures and Algorithms18CSC201J
it.
6
Data Structures and Algorithms18CSC201J
Topological sort
Topological sort: an ordering of the vertices in a directed acyclic graph, such that:
If there is a path from u to v, then v appears after u in the ordering.
7
Data Structures and Algorithms18CSC201J
Types of graphs:
The graphs should be directed: otherwise for any edge (u,v) there would be a path from u to v and
also from v to u, and hence they cannot be ordered.
The graphs should be acyclic: otherwise for any two vertices u and v on a cycle u would precede v
and v would precede u. The ordering may not be unique:
V1, V2, V3, V4 and V1, V3, V2, V4 are legal orderings
Outdegree of a vertex U: the number of edges (U,V) - outgoing edges
Indegree of a vertex U: the number of edges (V,U) - incoming edges
Degree of a vertex U: total number of edges associated with u.
The algorithm for topological sort uses "indegrees" of vertices.
Algorithm
1. Compute the indegrees of all vertices
2. Find a vertex U with indegree 0 and print it (store it in the ordering)
If there is no such vertex then there is a cycle and the vertices cannot be ordered. Stop.
3. Remove U and all its edges (U,V) from the graph.
4. Update the indegrees of the remaining vertices.
5. Repeat steps 2 through 4 while there are vertices to be processed.
Example
8
Data Structures and Algorithms18CSC201J
Fig.6: Graph
1. Compute the indegrees: V1: 0, V2: 1, V3: 2, V4: 2, V5: 2
2. Find a vertex with indegree 0: V1
3. Output V1, remove V1 and update the indegrees:
Sorted: V1
Remove edges: (V1,V2) , (V1, V3) and (V1,V4)
Updated indegrees: V2: 0, V3: 1, V4: 1, V5: 2
The process is depicted in the following table:
Indegree
V1 0
V2 1 0
V3 2 1 1 0
V4 2 1 0
V5 2 2 1 0 0
9
Data Structures and Algorithms18CSC201J
We now understand that one graph can have more than one spanning trees. Below are few properties is spann
11
Data Structures and Algorithms18CSC201J
12
Data Structures and Algorithms18CSC201J
The least cost is 2 and edges involved are B,D and D,T so we add them. Adding them does not
violate spanning tree properties so we continue to our next edge selection.
Next cost is 3, and associated edges are A,C and C,D. So we add them –
13
Data Structures and Algorithms18CSC201J
Prims Algorithm
Prim's algorithm to find minimum cost spanning tree (as Kruskal's algorithm) uses greedy
approach. Prim's algorithm shares similarity with shortest path first algorithms. Prim's algorithm,
in contrast with Kruskal's algorithm, treats the nodes as a single tree and keeps on adding new
nodes to the spanning tree from the given graph.
To contrast with Kruskal's algorithm and to understand Prim's algorithm better, we shall use the
same example –
14
Data Structures and Algorithms18CSC201J
15
Data Structures and Algorithms18CSC201J
16
Data Structures and Algorithms18CSC201J
c) Update key value of all adjacent vertices of u. To update the key values, iterate through
all adjacent vertices. For every adjacent vertex v, if weight of edge u-v is less than the
previous key value of v, update the key value as weight of u-v
Augmenting Path
The idea behind the algorithm is as follows: as long as there is a path from the source (start node) to
the sink (end node), with available capacity on all edges in the path, we send flow along one of the
paths. Then we find another path, and so on. A path with available capacity is called an augmenting
path.
Ford-Fulkerson Algorithm
The following is simple idea of Ford-Fulkerson algorithm:
1) Start with initial flow as 0.
2) While there is a augmenting path from source to sink.
Add this path-flow to flow.
3) Return flow.
Example:
In the following graph each edge is labelled with <allocated flow, maximum allowed flow>
17
Data Structures and Algorithms18CSC201J
18
Data Structures and Algorithms18CSC201J
19
Data Structures and Algorithms18CSC201J
Fig.32: No more augmenting path can be found. Label all the vertices that can be reached via non-
saturated edges as “belonging to the source s”.
Fig.34: The edges on the boundary of this labelling form a minimum s-t cut.
As we can find from above figure that maximum possible flow is 40 for the given network graph.
The edges are associated with source‘s’ and sink ‘t’ having the flow is 40.
20
Data Structures and Algorithms18CSC201J
Shortest Path Algorithm
Dijkstra's algorithm - is a solution to the single-source shortest path problem in graph theory.
Works on both directed and undirected graphs. However, all edges must have non-negative
weights.
Input: Weighted graph G={E,V} and source vertex v∈V, such that all edge weights are
nonnegative
Output: Lengths of shortest paths (or the shortest paths themselves) from a given source vertex
s∈V to all other vertices
Approach:
⚫ The algorithm computes for each vertex u the distance to u from the start vertex v, that is,
the weight of a shortest path between v and u.
⚫ the algorithm keeps track of the set of vertices for which the distance has been computed,
called the cloud C
⚫ Every vertex has a label D associated with it. For any vertex u, D[u] stores an
approximation of the distance between v and u. The algorithm will update a D[u] value
when it finds a shorter path from v to u.
⚫ When a vertex u is added to the cloud, its label D[u] is equal to the actual (final) distance
between the starting vertex v and vertex u.
Algorithm:
21
Data Structures and Algorithms18CSC201J
Example:
Fig.36: Updating the distance of neighbour vertices and set of discovered vertices
22
Data Structures and Algorithms18CSC201J
Fig.37: Updating the distance of neighbour vertices and set of discovered vertices
Fig.38: Updating the distance of neighbour vertices and set of discovered vertices
23
Data Structures and Algorithms18CSC201J
Fig.39: Updating the distance of neighbour vertices and set of discovered vertices
Fig.40: Updating the distance of neighbour vertices and set of discovered vertices
24
Data Structures and Algorithms18CSC201J
Fig.41: Updating the distance of neighbour vertices and set of discovered vertices
Fig.42: Updating the distance of neighbour vertices and set of discovered vertices. As all vertices
have been discovered from Q and added to S.
Hashing
Hash Table
Hash Table is a data structure which store data in associative manner. In hash table, data is stored in
array format where each data value has its own unique index value. Access of data becomes very
fast if we know the index of desired data.
25
Data Structures and Algorithms18CSC201J
Thus, it becomes a data structure in which insertion and search operations are very fast irrespective
of size of data. Hash Table uses array as a storage medium and uses hash technique to generate
index where an element is to be inserted or to be located from.
Hashing
Hashing is a technique to convert a range of key values into a range of indexes of an array. We're
going to use modulo operator to get a range of key values. Consider an example of hash table of
size 20, and following items are to be stored. Item are in (key,value) format.
● (1,20)
● (2,70)
● (42,80)
● (4,25)
● (12,44)
● (14,32)
● (17,11)
● (13,78)
● (37,98)
S.
Key Hash Array Index
No.
1 1 1%20 = 1 1
2 2 2%20 = 2 2
3 42 42%20 = 2 2
26
Data Structures and Algorithms18CSC201J
4 4 4%20 = 4 4
5 12 12%20 = 12 12
6 14 14%20 = 14 14
7 17 17%20 = 17 17
8 13 13%20 = 13 13
9 37 37%20 = 17 17
1 1 1%20 = 1 1 1
2 2 2%20 = 2 2 2
3 42 42%20 = 2 2 3
4 4 4%20 = 4 4 4
5 12 12%20 = 12 12 12
6 14 14%20 = 14 14 14
7 17 17%20 = 17 17 17
8 13 13%20 = 13 13 13
9 37 37%20 = 17 17 18
27
Data Structures and Algorithms18CSC201J
delete delete an element from a hash table.
DataItem
Define a data item having some data, and key based on which search is to be conducted in hash
table.
struct DataItem
{
int data;
int key;
};
Hash Method
Define a hashing method to compute the hash code of the key of the data item.
int hashCode(int key)
{
return key % SIZE;
}
Search Operation
Whenever an element is to be searched. Compute the hash code of the key passed and locate the
element using that hash code as index in the array. Use linear probing to get element ahead if
element not found at computed hash code.
DataItem *search(int key)
{
//get the hash
int hashIndex = hashCode(key);
//move in array until an empty
while(hashArray[hashIndex] != NULL)
{
if(hashArray[hashIndex]->key == key)
return hashArray[hashIndex];
//go to next cell
++hashIndex;
//wrap around the table
hashIndex %= SIZE;
}
return NULL;
28
Data Structures and Algorithms18CSC201J
}
Insert Operation
Whenever an element is to be inserted. Compute the hash code of the key passed and locate the
index using that hashcode as index in the array. Use linear probing for empty location if an element
is found at computed hash code.
void insert(int key,int data)
{
struct DataItem *item = (struct DataItem*) malloc(sizeof(struct DataItem));
item->data = data;
item->key = key;
//get the hash
int hashIndex = hashCode(key);
//move in array until an empty or deleted cell
while(hashArray[hashIndex] != NULL && hashArray[hashIndex]->key != -1)
{
//go to next cell
++hashIndex;
//wrap around the table
hashIndex %= SIZE;
}
hashArray[hashIndex] = item;
}
Delete Operation
Whenever an element is to be deleted. Compute the hash code of the key passed and locate the
index using that hashcode as index in the array. Use linear probing to get element ahead if an
element is not found at computed hash code. When found, store a dummy item there to keep
performance of hashtable intact.
struct DataItem* delete(struct DataItem* item)
{
int key = item->key;
//get the hash
int hashIndex = hashCode(key);
//move in array until an empty
while(hashArray[hashIndex] !=NULL)
29
Data Structures and Algorithms18CSC201J
{
if(hashArray[hashIndex]->key == key)
{
struct DataItem* temp = hashArray[hashIndex];
//assign a dummy item at deleted position
hashArray[hashIndex] = dummyItem;
return temp;
}
//go to next cell
++hashIndex;
//wrap around the table
hashIndex %= SIZE;
}
return NULL;
}
Collision
Since a hash function gets us a small number for a key which is a big integer or string, there is
possibility that two keys result in same value. The situation where a newly inserted key maps to an
already occupied slot in hash table is called collision and must be handled using some collision
handling technique.
Collisions are very likely even if we have big table to store keys. An important observation is
Birthday Paradox. With only 23 persons, the probability that two people have same birthday is
50%.
There are mainly two methods to handle collision:
1) Separate Chaining
2) Open Addressing
Separate Chaining:
The idea is to make each cell of hash table point to a linked list of records that have same hash
function value.
Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92,
73, 101.
30
Data Structures and Algorithms18CSC201J
31
Data Structures and Algorithms18CSC201J
Insert can insert an item in a deleted slot, but search doesn’t stop at a deleted slot.
Open Addressing is done following ways:
a) Linear Probing:
In linear probing, we linearly probe for next slot. For example, typical gap between two probes is 1
as taken in below example also.
let hash(x) be the slot index computed using hash function and S be the table size
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92,
73, 101.
b) Quadratic Probing
We look for i2‘th slot in i’th iteration.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
32
Data Structures and Algorithms18CSC201J
c) Double Hashing
We use another hash function hash2(x) and look for i*hash2(x) slot in i’th rotation.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
Comparison of above three:
Linear probing has the best cache performance, but suffers from clustering. One more advantage of
linear probing is easy to compute.
Quadratic probing lies between the two in terms of cache performance and clustering.
Double hashing has poor cache performance but no clustering. Double hashing requires more
computation time as two hash functions need to be computed.
Open Addressing vs. Separate Chaining
Advantages of Chaining:
1) Chaining is Simpler to implement.
2) In chaining, Hash table never fills up, we can always add more elements to chain. In open
addressing, table may become full.
3) Chaining is Less sensitive to the hash function or load factors.
4) Chaining is mostly used when it is unknown how many and how frequently keys may be inserted
or deleted.
5) Open addressing requires extra care for to avoid clustering and load factor.
Advantages of Open Addressing
1) Cache performance of chaining is not good as keys are stored using linked list. Open addressing
provides better cache performance as everything is stored in same table.
2) Wastage of Space (Some Parts of hash table in chaining are never used). In Open
addressing, a slot can be used even if an input doesn’t map to it.
3) Chaining uses extra space for links.
Rehashing:
• It is a closed hashing technique.
• If the table gets too full, then the rehashing method builds new table that is about twice as big and
scan down the entire original hash table, comparing the new hash value for each element and
inserting it in the new table.
• Rehashing is very expensive since the running time is O(N), since there are N elements to rehash
and the table size is roughly 2N.
Rehashing can be implemented in several ways like
1. Rehash, as soon as the table is half full
33
Data Structures and Algorithms18CSC201J
2. Rehash only when an insertion fails
Example: Suppose the elements 13, 15, 24, 6 are inserted into an open addressing hash table of size
7 and if linear probing is used when collision occurs.
0 6
1 15
3 24
6 13
1 15
2 23
3 24
6 13
34
Data Structures and Algorithms18CSC201J
5
6 6
7 23
8 24
10
11
12
13 13
14
15 15
16
Extendible Hashing
Extendible Hashing is a dynamic hashing method wherein directories, and buckets are used to hash
data. It is an aggressively flexible method in which the hash function also experiences dynamic
changes.
Main features of Extendible Hashing: The main features in this hashing technique are:
● Directories: The directories store addresses of the buckets in pointers. An id is assigned to
each directory which may change each time when Directory Expansion takes place.
● Buckets: The buckets are used to hash the actual data.
Basic Structure of Extendible Hashing:
35
Data Structures and Algorithms18CSC201J
Fig.46
Frequently used terms in Extendible Hashing:
● Directories: These containers store pointers to buckets. Each directory is given a unique id
which may change each time when expansion takes place. The hash function returns this
directory id which is used to navigate to the appropriate bucket. Number of Directories =
2^Global Depth.
● Buckets: They store the hashed keys. Directories point to buckets. A bucket may contain
more than one pointers to it if its local depth is less than the global depth.
● Global Depth: It is associated with the Directories. They denote the number of bits which are
used by the hash function to categorize the keys. Global Depth = Number of bits in directory
id.
● Local Depth: It is the same as that of Global Depth except for the fact that Local Depth is
associated with the buckets and not the directories. Local depth in accordance with the global
depth is used to decide the action that to be performed in case an overflow occurs. Local
Depth is always less than or equal to the Global Depth.
● Bucket Splitting: When the number of elements in a bucket exceeds a particular size, then
the bucket is split into two parts.
● Directory Expansion: Directory Expansion Takes place when a bucket overflows. Directory
Expansion is performed when the local depth of the overflowing bucket is equal to the global
depth.
Basic Working of Extendible Hashing:
36
Data Structures and Algorithms18CSC201J
● Step 1 – Analyze Data Elements: Data elements may exist in various forms eg. Integer,
String, Float, etc.. Currently, let us consider data elements of type integer. eg: 49.
● Step 2 – Convert into binary format: Convert the data element in Binary form. For string
elements, consider the ASCII equivalent integer of the starting character and then convert the
integer into binary form. Since we have 49 as our data element, its binary form is 110001.
● Step 3 – Check Global Depth of the directory. Suppose the global depth of the Hash-
directory is 3.
● Step 4 – Identify the Directory: Consider the ‘Global-Depth’ number of LSBs in the binary
number and match it to the directory id. For example binary obtained is: 110001 and the
global-depth is 3. So, the hash function will return 3 LSBs of 110001 viz. 001.
● Step 5 – Navigation: Now, navigate to the bucket pointed by the directory with directory-id
001.
● Step 6 – Insertion and Overflow Check: Insert the element and check if the bucket
overflows. If an overflow is encountered, go to step 7 followed by Step 8, otherwise, go
to step 9.
● Step 7 – Tackling Overflow Condition during Data Insertion: Many times, while inserting
data in the buckets, it might happen that the Bucket overflows. In such cases, we need to
follow an appropriate procedure to avoid mishandling of data. First, Check if the local depth
is less than or equal to the global depth. Then choose one of the cases below.
● Case1: If the local depth of the overflowing Bucket is equal to the global depth, then
Directory Expansion, as well as Bucket Split, needs to be performed. Then increment
the global depth and the local depth value by 1. And, assign appropriate pointers.
Directory expansion will double the number of directories present in the hash structure.
37
Data Structures and Algorithms18CSC201J
● Case2: In case the local depth is less than the global depth, then only Bucket Split takes
place. Then increment only the local depth value by 1. And, assign appropriate pointers.
38
Data Structures and Algorithms18CSC201J
Fig.51: Insert 22
As directed by Step 7-Case 1, Since Local Depth = Global Depth, the bucket splits and directory
expansion takes place. Also, rehashing of numbers present in the overflowing bucket takes place
after the split. And, since the global depth is incremented by 1, now, the global depth is 2. Hence,
16, 4, 6, 22 are now rehashed w.r.t 2 LSBs. [ 16(10000), 4(100), 6(110), 22(10110) ]
39
Data Structures and Algorithms18CSC201J
Fig.54: insert 20
20 is inserted in bucket pointed out by 00. As directed by Step 7-Case 1, since the local depth of
the bucket = global-depth, directory expansion (doubling) takes place along with bucket splitting.
Elements present in overflowing bucket are rehashed with the new global depth. Now, the new
Hash table looks like this:
40
Data Structures and Algorithms18CSC201J
Fig.56: Insert 26
The bucket overflows, and, as directed by Step 7-Case 2, since the local depth of bucket < Global
depth (2<3), directories are not doubled but, only the bucket is split and elements are rehashed.
Finally, the output of hashing the given list of numbers is obtained.
41
Data Structures and Algorithms18CSC201J
Key Observations:
1. A Bucket will have more than one pointers pointing to it if its local depth is less than the
global depth.
2. When overflow condition occurs in a bucket, all the entries in the bucket are rehashed with a
new local depth.
3. If Local Depth of the overflowing bucket is equal to the global depth, only then the directories
are doubled and the global depth is incremented by 1.
4. The size of a bucket cannot be changed after the data insertion process begins.
Advantages:
1. Data retrieval is less expensive (in terms of computing).
2. No problem of Data-loss since the storage capacity increases dynamically.
3. With dynamic changes in hashing function, associated old values are rehashed w.r.t the new
hash function.
Limitations of Extendible Hashing:
1. The directory size may increase significantly if several records are hashed on the same
directory while keeping the record distribution non-uniform.
2. Size of every bucket is fixed.
3. Memory is wasted in pointers when the global depth and local depth difference becomes
drastic.
4. This method is complicated to code.
42
Data Structures and Algorithms18CSC201J
References
1. Reema Thareja, “Data Structures Using C”, Oxford Higher Education, First Edition,
2011.
2. [Link]
3. [Link]
4. [Link] 85908260#:~:text=The
%20Bellman%E2%80%93Ford%20algorithm%20is,the%20usefulness%20of%20this
%20algorithm.
43