0% found this document useful (0 votes)
5 views43 pages

DSA-UNIT V Lecture Notes

This document covers various topics related to graphs in data structures, including terminology, traversal methods (Depth First Search and Breadth First Search), and algorithms for topological sorting, minimum spanning trees (Kruskal's and Prim's algorithms), and shortest path problems. It explains fundamental concepts such as vertices, edges, adjacency, and paths, as well as basic operations on graphs. The document also discusses the properties and applications of spanning trees in network planning and routing.

Uploaded by

Coron Henth
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views43 pages

DSA-UNIT V Lecture Notes

This document covers various topics related to graphs in data structures, including terminology, traversal methods (Depth First Search and Breadth First Search), and algorithms for topological sorting, minimum spanning trees (Kruskal's and Prim's algorithms), and shortest path problems. It explains fundamental concepts such as vertices, edges, adjacency, and paths, as well as basic operations on graphs. The document also discusses the properties and applications of spanning trees in network planning and routing.

Uploaded by

Coron Henth
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Data Structures and Algorithms18CSC201J

UNIT V (Graphs)
Topics to be covered:
Graph Terminology, Graph Traversal, Topological sorting - Minimum spanning tree –
Prim’s Algorithm - Kruskal’s Algorithm - Network flow problem - Shortest Path Algorithm:
Dijkstra - Graph Search: Depth First Search, Breadth First Search - Hashing: Hash
functions, Collision avoidance, Separate chaining - Open addressing: Linear probing,
Quadratic Probing, Double hashing, Rehashing, Extensible Hashing

Graph Terminology
Graph is defined as an ordered set <V, E>, where V(G) represents the set of vertices in the graph G
and E(G) represents the set of Edges in the graph G. Before proceeding further, let's discuss some
frequent terms used in graph concept –
● Vertex

Each node of the graph is represented as a vertex. In example given below, labelled circle rep

● Edge

Edge represents a path between two vertices or a line between two vertices. In example given b

● Adjacency
Two node or vertices are adjacent if they are connected to each other through an edge. In

● Path
Path represents a sequence of edges between two vertices. In example given below, ABCD rep

1
Data Structures and Algorithms18CSC201J

Fig.1: Graph
A graph is a pictorial representation of a set of objects where some pairs of objects are connected
by links. The interconnected objects are represented by points termed as vertices, and the links that
connect the vertices are called edges.
Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of edges,
connecting the pairs of vertices. Take a look at the following graph –

Fig.2: Graph
In the above graph,
V = {a, b, c, d, e}
E = {ab, ac, bd, cd, de}
Basic Operations
Following are basic primary operations of a Graph which are following.
● Add Vertex add a vertex to a graph.
● Add Edge add an edge between two vertices of a graph.
● Display Vertex display a vertex of a graph.

Graph Traversal
1. Depth First Traversal
Depth First Search algorithm(DFS) traverses a graph in a depth ward motion and uses a stack to
remember to get the next vertex to start a search when a dead end occurs in any iteration.

2
Data Structures and Algorithms18CSC201J

Fig.3: Graph for DFS

As in example given above, DFS algorithm traverses from A to B to C to D first then to E, then to F
and lastly to G. It employs following rules.
● Rule 1
Visit adjacent unvisited vertex. Mark it visited. Display it. Push it in a stack.

● Rule 2
If no adjacent vertex found, pop up a vertex from stack. (It will pop up all the vertices from t

● Rule 3 Repeat Rule 1 and Rule 2 until stack is empty.

Step Traversal Description

1 Initialize the stack

3
Data Structures and Algorithms18CSC201J
Mark S as visited and put it onto the
stack. Explore any unvisited
adjacent node from S. We have

2 three nodes and we can pick any of


them.

For this example, we shall take the


node in alphabetical order.

Mark A as visited and put it onto the


stack. Explore any unvisited
3 adjacent node from A. Both Sand D
are adjacent to A but we are
concerned for unvisited nodes only.

Visit D and mark it visited and put


onto the stack. Here we have B and
4 C nodes which are adjacent to D and
both are unvisited. But we shall
again choose in alphabetical order.

We choose B, mark it visited and


put onto stack. Here B does not have
5
any unvisited adjacent node. So we
pop B from the stack.

4
Data Structures and Algorithms18CSC201J

We check stack top for return to


previous node and check if it has
6
any unvisited nodes. Here, we find
D to be on the top of stack.

Only unvisited adjacent node is

from D is C now. So we visit C,


7
mark it visited and put it onto the
stack.

Table1: Steps of DFS

As C does not have any unvisited adjacent node so we keep popping the stack until we find a node
which has unvisited adjacent node. In this case, there's none and we keep popping until stack is
empty.

2. Breadth First Traversal


Breadth First Search algorithm(BFS) traverses a graph in a breadth wards motion and uses a queue
to remember to get the next vertex to start a search when a dead end occurs in any iteration.
As in example given below, BFS algorithm traverses from A to B to E to F first then to C and
G lastly to D. It employs following rules.
● Rule 1
Visit adjacent unvisited vertex. Mark it visited. Display it. Insert it in a queue.

● Rule 2 If no adjacent vertex found, remove the first vertex from queue.
● Rule 3 Repeat Rule 1 and Rule 2 until queue is empty.

5
Data Structures and Algorithms18CSC201J

Fig.4: Graph for BFS

Step Traversal Description

1 Initialize the stack

We start from visiting S(starting


2
node), and mark it visited

We then see unvisited adjacent


node from S. In this example,
we have three nodes but
3
alphabetically we choose A
mark it visited and enqueue

it.

6
Data Structures and Algorithms18CSC201J

Next unvisited adjacent node

4 from S is B. We mark it visited


and enqueue it.

Next unvisited adjacent node

5 from S is C. We mark it visited


and enqueue it.

Now S is left with no unvisited

6 adjacent nodes. So we dequeue


and find A.

From A we have D as unvisited

7 adjacent node. We mark it


visited and enqueue it.

Table2: Steps of BFS


At this stage we are left with no unmarked (unvisited) nodes. But as per algorithm we keep on de-
queuing in order to get all unvisited nodes. When the queue gets emptied the program is over.

Topological sort
Topological sort: an ordering of the vertices in a directed acyclic graph, such that:
If there is a path from u to v, then v appears after u in the ordering.

7
Data Structures and Algorithms18CSC201J
Types of graphs:
The graphs should be directed: otherwise for any edge (u,v) there would be a path from u to v and
also from v to u, and hence they cannot be ordered.
The graphs should be acyclic: otherwise for any two vertices u and v on a cycle u would precede v
and v would precede u. The ordering may not be unique:

Fig.5: Directed Graph

V1, V2, V3, V4 and V1, V3, V2, V4 are legal orderings
Outdegree of a vertex U: the number of edges (U,V) - outgoing edges
Indegree of a vertex U: the number of edges (V,U) - incoming edges
Degree of a vertex U: total number of edges associated with u.
The algorithm for topological sort uses "indegrees" of vertices.
Algorithm
1. Compute the indegrees of all vertices
2. Find a vertex U with indegree 0 and print it (store it in the ordering)
If there is no such vertex then there is a cycle and the vertices cannot be ordered. Stop.
3. Remove U and all its edges (U,V) from the graph.
4. Update the indegrees of the remaining vertices.
5. Repeat steps 2 through 4 while there are vertices to be processed.

Example

8
Data Structures and Algorithms18CSC201J
Fig.6: Graph
1. Compute the indegrees: V1: 0, V2: 1, V3: 2, V4: 2, V5: 2
2. Find a vertex with indegree 0: V1
3. Output V1, remove V1 and update the indegrees:
Sorted: V1
Remove edges: (V1,V2) , (V1, V3) and (V1,V4)
Updated indegrees: V2: 0, V3: 1, V4: 1, V5: 2
The process is depicted in the following table:

Indegree

V1, V2, V4, V1, V2, V4, V3,


Sorted V1 V1, V2 V1, V2, V4 V3 V5

V1 0

V2 1 0

V3 2 1 1 0

V4 2 1 0

V5 2 2 1 0 0

Table3: Process of topological sorting


One possible sorting: V1, V2, V4, V3, V5
Another sorting: V1, V2, V4, V5, V3
Complexity of this algorithm: O(|V|2), |V| - the number of vertices. To find a vertex of indegree 0
we scan all the vertices - |V| operations. We do this for all vertices: |V|2.

Minimum Spanning Tree


A spanning tree is a subset of Graph G, which has all the vertices covered with minimum possible
number of edges. Hence, a spanning tree does not have cycles and it can not be disconnected. By
this definition we can draw a conclusion that every connected & undirected Graph G has at least
one spanning tree. A disconnected graph does not have any spanning tree, as it can not spanned to
all its vertices.
We found three spanning trees off one complete graph. A complete undirected graph can have
maximum nn-2 number of spanning trees, where n is number of nodes. In addressed example, n is
3, hence 33-2 = 3 spanning trees are possible.

9
Data Structures and Algorithms18CSC201J

Fig.7: Example of Graph and its possible spanning trees

General properties of spanning tree

We now understand that one graph can have more than one spanning trees. Below are few properties is spann

● A connected graph G can have more than one spanning tree.


● All possible spanning trees of graph G, have same number of edges and vertices.
● Spanning tree does not have any cycle (loops)
● Removing one edge from spanning tree will make the graph disconnected i.e. spanning tree
is minimally connected.
● Adding one edge to a spanning tree will create a circuit or loop i.e. spanning tree is
maximally acyclic.
Mathematical properties of spanning tree
● Spanning tree has n-1 edges, where n is number of nodes (vertices)
● From a complete graph, by removing maximum e-n+1 edges, we can construct a spanning
tree.
● A complete graph can have maximum nn-2 number of spanning trees.
So we can conclude here that spanning trees are subset of a connected Graph G and disconnected
Graphs do not have spanning tree.

Application of Spanning Tree


Spanning tree is basically used to find minimum paths to connect all nodes in a graph.
Common application of spanning trees are –
● Civil Network Planning
10
Data Structures and Algorithms18CSC201J
● Computer Network Routing Protocol
● Cluster Analysis
Let’s understand this by a small example. Consider city network as a huge graph and now plan to
deploy telephone lines such a way that in minimum lines we can connect to all city nodes. This is
where spanning tree comes in the picture.
Minimum Spanning Tree (MST)
In a weighted graph, a minimum spanning tree is a spanning tree that has minimum weight that all
other spanning trees of the same graph. In real world situations, this weight can be measured as
distance, congestion, traffic load or any arbitrary value denoted to the edges.
Minimum Spanning-Tree Algorithm
We shall learn about two most important spanning tree algorithms here
● Kruskal's Algorithm
● Prim's Algorithm
Kruskal’s Algorithm
Kruskal's algorithm to find minimum cost spanning tree uses greedy approach. This algorithm treats
the graph as a forest and every node it as an individual tree. A tree connects to another only and
only if it has least cost among all available options and does not violate MST properties.
To understand Kruskal's algorithm we shall take the following example –

Fig.8: Example graph for kruskal’s Algorithm


Step 1 - Remove all loops & Parallel Edges
Remove all loops and parallel edges from the given graph.

11
Data Structures and Algorithms18CSC201J

Fig.9: Remove the parallel edges and self loops


In case of parallel edges, keep the one which has least cost associated and remove all others.

Fig.10: Graph after step 1


Step 2 - Arrange all edges in their increasing order of weight
Next step is to create a set of edges & weight and arrange them in ascending order of weightage
(cost).

Fig.11: Edges are arranged according to their weight

Step 3 - Add the edge which has least weightage


Now we start adding edges to graph beginning from the one which has least weight. At all time, we
shall keep checking that the spanning properties are remain intact. In case, by adding one edge, the
spanning tree property does not hold then we shall consider not to include the edge in graph.

Fig.12: Adding edges of least weight

12
Data Structures and Algorithms18CSC201J
The least cost is 2 and edges involved are B,D and D,T so we add them. Adding them does not
violate spanning tree properties so we continue to our next edge selection.
Next cost is 3, and associated edges are A,C and C,D. So we add them –

Fig.13: Adding edges of least weight


Next cost in the table is 4, and we observe that adding it will create a circuit in the graph and we
ignore it. And in the process we shall ignore/avoid all edges which create circuit.

Fig.14: Remove the edges which make circuits

Fig.15: After removing the edges that makes circuit


We observe that edges with cost 5 and 6 also create circuits and we ignore them and move on.

Fig.16: After removing the edges that makes circuit


Now we are left with only one node to be added. Between two least cost edges available 7, 8, we
shall add the edge with cost 7.

13
Data Structures and Algorithms18CSC201J

Fig.17: Adding edges of least weight


By adding edge S,A we have included all the nodes of the graph and we have minimum cost
spanning tree.
Algorithm
1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far. If cycle is
not formed, include this edge. Else, discard it.
3. Repeat step#2 until there are (V-1) edges in the spanning tree.

Prims Algorithm
Prim's algorithm to find minimum cost spanning tree (as Kruskal's algorithm) uses greedy
approach. Prim's algorithm shares similarity with shortest path first algorithms. Prim's algorithm,
in contrast with Kruskal's algorithm, treats the nodes as a single tree and keeps on adding new
nodes to the spanning tree from the given graph.
To contrast with Kruskal's algorithm and to understand Prim's algorithm better, we shall use the
same example –

Fig.18: Example graph for Prim’s algorithm

Step 1 - Remove all loops & Parallel Edges

14
Data Structures and Algorithms18CSC201J

Fig.19: Removing the Parellel edges and self loops


Remove all loops and parallel edges from the given graph. In case of parallel edges, keep the one
which has least cost associated and remove all others.

Fig.20: After step1


Step 2 - Choose any arbitrary node as root node
In this case, we choose S node as root node of Prim's spanning tree. This node is arbitrarily chosen,
so any node can be root node.
Step 3 - Check outgoing edges and select the one with less cost
After choosing root node S, we see that <S,A> and <S,C> are two edges with weight 7 and 8
respectively. And we choose <S,A> edge as it is lesser than the other.

Fig.21: Choose the least cost edge associated with node S


Now, the tree <SA> is treated as one node and we check for all edges going out from it. We select
the one which has the lowest cost and include it in the tree.

15
Data Structures and Algorithms18CSC201J

Fig.22: Choose the least cost edge associated with node A


After this step, <SAC> tree is formed. Now we'll again treat is as a node and will check all the
edges again and will choose only the least cost edge one. Here in this case <C,D> is the new edge
which is less than other edges' cost 8, 6, 4 etc.

Fig.23: Choose the least cost edge associated with node C


After adding node D to the spanning tree, we now have two edges going out of it have same cost,
i.e. <D,T> and <D,B>. So we can add either one. But the next step will again yield the edge 2 as the
least cost. So here we are showing spanning tree with both edges included.

Fig.24: Choose the least cost edge associated with node D


We may find that the output spanning tree of the same graph using two different algorithms is
same.
Algorithm
1) Create a set mstSet that keeps track of vertices already included in MST.
2) Assign a key value to all vertices in the input graph. Initialize all key values as INFINITE.
Assign key value as 0 for the first vertex so that it is picked first.
3) While mstSet doesn’t include all vertices
a) Pick a vertex u which is not there in mstSet and has minimum key value.
b) Include u to mstSet.

16
Data Structures and Algorithms18CSC201J
c) Update key value of all adjacent vertices of u. To update the key values, iterate through
all adjacent vertices. For every adjacent vertex v, if weight of edge u-v is less than the
previous key value of v, update the key value as weight of u-v

Network Flow Problem


Given a graph which represents a flow network where every edge has a capacity. Also given two
vertices source‘s’ and sink ‘t’ in the graph, find the maximum possible flow from s to t with
following constraints:
a) Flow on an edge doesn’t exceed the given capacity of the edge.
b) Incoming flow is equal to outgoing flow for every vertex except s and t.

Augmenting Path
The idea behind the algorithm is as follows: as long as there is a path from the source (start node) to
the sink (end node), with available capacity on all edges in the path, we send flow along one of the
paths. Then we find another path, and so on. A path with available capacity is called an augmenting
path.
Ford-Fulkerson Algorithm
The following is simple idea of Ford-Fulkerson algorithm:
1) Start with initial flow as 0.
2) While there is a augmenting path from source to sink.
Add this path-flow to flow.
3) Return flow.
Example:
In the following graph each edge is labelled with <allocated flow, maximum allowed flow>

Fig.25: initial flow is 0


Flow = 0

17
Data Structures and Algorithms18CSC201J

Fig.26: Augmented path from s to t

Fig.27: Send flow along the path


Flow = 0+20 = 20

Fig.28: Augmented path from s to t

18
Data Structures and Algorithms18CSC201J

Fig.29: Send flow along the path


Flow = 20+10 = 30

Fig.30: Augment Path from s to t

Fig.31: Send flow along path


Flow = 30+10 = 40

19
Data Structures and Algorithms18CSC201J

Fig.32: No more augmenting path can be found. Label all the vertices that can be reached via non-
saturated edges as “belonging to the source s”.

Fig.33: Label all remaining vertices as “belonging to the sink t”

Fig.34: The edges on the boundary of this labelling form a minimum s-t cut.

As we can find from above figure that maximum possible flow is 40 for the given network graph.
The edges are associated with source‘s’ and sink ‘t’ having the flow is 40.

20
Data Structures and Algorithms18CSC201J
Shortest Path Algorithm
Dijkstra's algorithm - is a solution to the single-source shortest path problem in graph theory.
Works on both directed and undirected graphs. However, all edges must have non-negative
weights.
Input: Weighted graph G={E,V} and source vertex v∈V, such that all edge weights are
nonnegative
Output: Lengths of shortest paths (or the shortest paths themselves) from a given source vertex
s∈V to all other vertices
Approach:
⚫ The algorithm computes for each vertex u the distance to u from the start vertex v, that is,
the weight of a shortest path between v and u.
⚫ the algorithm keeps track of the set of vertices for which the distance has been computed,
called the cloud C
⚫ Every vertex has a label D associated with it. For any vertex u, D[u] stores an
approximation of the distance between v and u. The algorithm will update a D[u] value
when it finds a shorter path from v to u.
⚫ When a vertex u is added to the cloud, its label D[u] is equal to the actual (final) distance
between the starting vertex v and vertex u.
Algorithm:

21
Data Structures and Algorithms18CSC201J

Example:

Fig.35: A is source vertex and Q contains all the vertices of graph

Fig.36: Updating the distance of neighbour vertices and set of discovered vertices

22
Data Structures and Algorithms18CSC201J

Fig.37: Updating the distance of neighbour vertices and set of discovered vertices

Fig.38: Updating the distance of neighbour vertices and set of discovered vertices

23
Data Structures and Algorithms18CSC201J

Fig.39: Updating the distance of neighbour vertices and set of discovered vertices

Fig.40: Updating the distance of neighbour vertices and set of discovered vertices

24
Data Structures and Algorithms18CSC201J

Fig.41: Updating the distance of neighbour vertices and set of discovered vertices

Fig.42: Updating the distance of neighbour vertices and set of discovered vertices. As all vertices
have been discovered from Q and added to S.

Hashing
Hash Table
Hash Table is a data structure which store data in associative manner. In hash table, data is stored in
array format where each data value has its own unique index value. Access of data becomes very
fast if we know the index of desired data.

25
Data Structures and Algorithms18CSC201J
Thus, it becomes a data structure in which insertion and search operations are very fast irrespective
of size of data. Hash Table uses array as a storage medium and uses hash technique to generate
index where an element is to be inserted or to be located from.

Fig.43: Hash Mapping

Hashing
Hashing is a technique to convert a range of key values into a range of indexes of an array. We're
going to use modulo operator to get a range of key values. Consider an example of hash table of
size 20, and following items are to be stored. Item are in (key,value) format.

● (1,20)
● (2,70)
● (42,80)
● (4,25)
● (12,44)
● (14,32)
● (17,11)
● (13,78)
● (37,98)

S.
Key Hash Array Index
No.

1 1 1%20 = 1 1

2 2 2%20 = 2 2

3 42 42%20 = 2 2
26
Data Structures and Algorithms18CSC201J

4 4 4%20 = 4 4

5 12 12%20 = 12 12

6 14 14%20 = 14 14

7 17 17%20 = 17 17

8 13 13%20 = 13 13

9 37 37%20 = 17 17

Table4: Simple Hashing


Linear Probing
As we can see, it may happen that the hashing technique used here, creates already used index of
the array. In such case, we can search the next empty location in the array by looking into the next
cell until we found an empty cell. This technique is called linear probing.
After Linear
S.
Key Hash Array Index Probing,
No.
Array Index

1 1 1%20 = 1 1 1

2 2 2%20 = 2 2 2

3 42 42%20 = 2 2 3

4 4 4%20 = 4 4 4

5 12 12%20 = 12 12 12

6 14 14%20 = 14 14 14

7 17 17%20 = 17 17 17

8 13 13%20 = 13 13 13

9 37 37%20 = 17 17 18

Table5: Hashing with linear probing


Basic Operations
Following are basic primary operations of a hash table which are following.
 Search search an element in a hash table.
 Insert insert an element in a hash table.

27
Data Structures and Algorithms18CSC201J
 delete delete an element from a hash table.
DataItem
Define a data item having some data, and key based on which search is to be conducted in hash
table.
struct DataItem
{
int data;
int key;
};

Hash Method
Define a hashing method to compute the hash code of the key of the data item.
int hashCode(int key)
{
return key % SIZE;
}

Search Operation
Whenever an element is to be searched. Compute the hash code of the key passed and locate the
element using that hash code as index in the array. Use linear probing to get element ahead if
element not found at computed hash code.
DataItem *search(int key)
{
//get the hash
int hashIndex = hashCode(key);
//move in array until an empty
while(hashArray[hashIndex] != NULL)
{
if(hashArray[hashIndex]->key == key)
return hashArray[hashIndex];
//go to next cell
++hashIndex;
//wrap around the table
hashIndex %= SIZE;
}
return NULL;
28
Data Structures and Algorithms18CSC201J
}

Insert Operation
Whenever an element is to be inserted. Compute the hash code of the key passed and locate the
index using that hashcode as index in the array. Use linear probing for empty location if an element
is found at computed hash code.
void insert(int key,int data)
{
struct DataItem *item = (struct DataItem*) malloc(sizeof(struct DataItem));
item->data = data;
item->key = key;
//get the hash
int hashIndex = hashCode(key);
//move in array until an empty or deleted cell
while(hashArray[hashIndex] != NULL && hashArray[hashIndex]->key != -1)
{
//go to next cell
++hashIndex;
//wrap around the table
hashIndex %= SIZE;
}
hashArray[hashIndex] = item;
}

Delete Operation
Whenever an element is to be deleted. Compute the hash code of the key passed and locate the
index using that hashcode as index in the array. Use linear probing to get element ahead if an
element is not found at computed hash code. When found, store a dummy item there to keep
performance of hashtable intact.
struct DataItem* delete(struct DataItem* item)
{
int key = item->key;
//get the hash
int hashIndex = hashCode(key);
//move in array until an empty
while(hashArray[hashIndex] !=NULL)
29
Data Structures and Algorithms18CSC201J
{
if(hashArray[hashIndex]->key == key)
{
struct DataItem* temp = hashArray[hashIndex];
//assign a dummy item at deleted position
hashArray[hashIndex] = dummyItem;
return temp;
}
//go to next cell
++hashIndex;
//wrap around the table
hashIndex %= SIZE;
}
return NULL;
}

Collision
Since a hash function gets us a small number for a key which is a big integer or string, there is
possibility that two keys result in same value. The situation where a newly inserted key maps to an
already occupied slot in hash table is called collision and must be handled using some collision
handling technique.
Collisions are very likely even if we have big table to store keys. An important observation is
Birthday Paradox. With only 23 persons, the probability that two people have same birthday is
50%.
There are mainly two methods to handle collision:
1) Separate Chaining
2) Open Addressing
Separate Chaining:
The idea is to make each cell of hash table point to a linked list of records that have same hash
function value.
Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92,
73, 101.

30
Data Structures and Algorithms18CSC201J

Fig.44: Hashing with separate chaining


Advantages:
1) Simple to implement.
2) Hash table never fills up, we can always add more elements to chain.
3) Less sensitive to the hash function or load factors.
4) It is mostly used when it is unknown how many and how frequently keys may be inserted or
deleted.
Disadvantages:
1) Cache performance of chaining is not good as keys are stored using linked list. Open addressing
provides better cache performance as everything is stored in same table.
2) Wastage of Space (Some Parts of hash table are never used)
3) If the chain becomes long, then search time can become O(n) in worst case.
4) Uses extra space for links.
Open Addressing
Like separate chaining, open addressing is a method for handling collisions. In Open
Addressing, all elements are stored in the hash table itself. So at any point, size of table must be
greater than or equal to total number of keys (Note that we can increase table size by copying old
data if needed).
Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.
Search(k): Keep probing until slot’s key doesn’t become equal to k or an empty slot is reached.
Delete(k): Delete operation is interesting. If we simply delete a key, then search may fail. So slots
of deleted keys are marked specially as “deleted”.

31
Data Structures and Algorithms18CSC201J
Insert can insert an item in a deleted slot, but search doesn’t stop at a deleted slot.
Open Addressing is done following ways:
a) Linear Probing:
In linear probing, we linearly probe for next slot. For example, typical gap between two probes is 1
as taken in below example also.
let hash(x) be the slot index computed using hash function and S be the table size
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92,
73, 101.

Fig.45: Hashing Linear Probing


Clustering:
The main problem with linear probing is clustering, many consecutive elements form groups and it
starts taking time to find a free slot or to search an element.

b) Quadratic Probing
We look for i2‘th slot in i’th iteration.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S

32
Data Structures and Algorithms18CSC201J
c) Double Hashing
We use another hash function hash2(x) and look for i*hash2(x) slot in i’th rotation.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
Comparison of above three:
Linear probing has the best cache performance, but suffers from clustering. One more advantage of
linear probing is easy to compute.
Quadratic probing lies between the two in terms of cache performance and clustering.
Double hashing has poor cache performance but no clustering. Double hashing requires more
computation time as two hash functions need to be computed.
Open Addressing vs. Separate Chaining
Advantages of Chaining:
1) Chaining is Simpler to implement.
2) In chaining, Hash table never fills up, we can always add more elements to chain. In open
addressing, table may become full.
3) Chaining is Less sensitive to the hash function or load factors.
4) Chaining is mostly used when it is unknown how many and how frequently keys may be inserted
or deleted.
5) Open addressing requires extra care for to avoid clustering and load factor.
Advantages of Open Addressing
1) Cache performance of chaining is not good as keys are stored using linked list. Open addressing
provides better cache performance as everything is stored in same table.
2) Wastage of Space (Some Parts of hash table in chaining are never used). In Open
addressing, a slot can be used even if an input doesn’t map to it.
3) Chaining uses extra space for links.
Rehashing:
• It is a closed hashing technique.
• If the table gets too full, then the rehashing method builds new table that is about twice as big and
scan down the entire original hash table, comparing the new hash value for each element and
inserting it in the new table.
• Rehashing is very expensive since the running time is O(N), since there are N elements to rehash
and the table size is roughly 2N.
Rehashing can be implemented in several ways like
1. Rehash, as soon as the table is half full
33
Data Structures and Algorithms18CSC201J
2. Rehash only when an insertion fails
Example: Suppose the elements 13, 15, 24, 6 are inserted into an open addressing hash table of size
7 and if linear probing is used when collision occurs.
0 6

1 15

3 24

6 13

Table6: Open addressing Hash table


If 23 is inserted, the resulting table will be over 70 percent full.
0 6

1 15

2 23

3 24

6 13

Table7: Open addressing Hash table


A new table is created. The size of the new table is 17, as this is the first prime number that is twice
as large as the old table size.
0

34
Data Structures and Algorithms18CSC201J
5

6 6

7 23

8 24

10

11

12

13 13

14

15 15

16

Table8: Open addressing Hash table


Advantages
• Programmer doesn‟t worry about the table size
• Simple to implement

Extendible Hashing
Extendible Hashing is a dynamic hashing method wherein directories, and buckets are used to hash
data. It is an aggressively flexible method in which the hash function also experiences dynamic
changes.
Main features of Extendible Hashing: The main features in this hashing technique are:
● Directories: The directories store addresses of the buckets in pointers. An id is assigned to
each directory which may change each time when Directory Expansion takes place.
● Buckets: The buckets are used to hash the actual data.
Basic Structure of Extendible Hashing:

35
Data Structures and Algorithms18CSC201J

Fig.46
Frequently used terms in Extendible Hashing:
● Directories: These containers store pointers to buckets. Each directory is given a unique id
which may change each time when expansion takes place. The hash function returns this
directory id which is used to navigate to the appropriate bucket. Number of Directories =
2^Global Depth.
● Buckets: They store the hashed keys. Directories point to buckets. A bucket may contain
more than one pointers to it if its local depth is less than the global depth.
● Global Depth: It is associated with the Directories. They denote the number of bits which are
used by the hash function to categorize the keys. Global Depth = Number of bits in directory
id.
● Local Depth: It is the same as that of Global Depth except for the fact that Local Depth is
associated with the buckets and not the directories. Local depth in accordance with the global
depth is used to decide the action that to be performed in case an overflow occurs. Local
Depth is always less than or equal to the Global Depth.
● Bucket Splitting: When the number of elements in a bucket exceeds a particular size, then
the bucket is split into two parts.
● Directory Expansion: Directory Expansion Takes place when a bucket overflows. Directory
Expansion is performed when the local depth of the overflowing bucket is equal to the global
depth.
Basic Working of Extendible Hashing:

36
Data Structures and Algorithms18CSC201J

Fig.47: Working of Extendible Hashing

● Step 1 – Analyze Data Elements: Data elements may exist in various forms eg. Integer,
String, Float, etc.. Currently, let us consider data elements of type integer. eg: 49.
● Step 2 – Convert into binary format: Convert the data element in Binary form. For string
elements, consider the ASCII equivalent integer of the starting character and then convert the
integer into binary form. Since we have 49 as our data element, its binary form is 110001.
● Step 3 – Check Global Depth of the directory. Suppose the global depth of the Hash-
directory is 3.
● Step 4 – Identify the Directory: Consider the ‘Global-Depth’ number of LSBs in the binary
number and match it to the directory id. For example binary obtained is: 110001 and the
global-depth is 3. So, the hash function will return 3 LSBs of 110001 viz. 001.
● Step 5 – Navigation: Now, navigate to the bucket pointed by the directory with directory-id
001.
● Step 6 – Insertion and Overflow Check: Insert the element and check if the bucket
overflows. If an overflow is encountered, go to step 7 followed by Step 8, otherwise, go
to step 9.
● Step 7 – Tackling Overflow Condition during Data Insertion: Many times, while inserting
data in the buckets, it might happen that the Bucket overflows. In such cases, we need to
follow an appropriate procedure to avoid mishandling of data. First, Check if the local depth
is less than or equal to the global depth. Then choose one of the cases below.
● Case1: If the local depth of the overflowing Bucket is equal to the global depth, then
Directory Expansion, as well as Bucket Split, needs to be performed. Then increment
the global depth and the local depth value by 1. And, assign appropriate pointers.
Directory expansion will double the number of directories present in the hash structure.

37
Data Structures and Algorithms18CSC201J
● Case2: In case the local depth is less than the global depth, then only Bucket Split takes
place. Then increment only the local depth value by 1. And, assign appropriate pointers.

Fig.48: Handle Overflow


● Step 8 – Rehashing of Split Bucket Elements: The Elements present in the overflowing
bucket that is split are rehashed with respect to the new global depth of the directory.
● Step 9 – The element is successfully hashed.
Example: Now, let us consider a prominent example of hashing the following elements:
16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26.
Bucket Size: 3(Assume)
Hash Function: Suppose the global depth is X. Then the Hash Function returns X LSBs.
● Solution: First, calculate the binary forms of each of the given numbers.
16- 10000, 4- 00100, 6- 00110, 22- 10110, 24- 11000, 10- 01010, 31- 11111, 7- 00111, 9-
01001, 20- 0100, 26- 01101
● Initially, the global-depth and local-depth is always 1. Thus, the hashing frame looks like this:

Fig.49: Initial Hashing Frame


● Inserting 16: The binary format of 16 is 10000 and global-depth is 1. The hash function
returns 1 LSB of 10000 which is 0. Hence, 16 is mapped to the directory with id=0.
● Inserting 4 and 6: Both 4(100) and 6(110) have 0 in their LSB. Hence, they are hashed as
follows:

38
Data Structures and Algorithms18CSC201J

Fig.50: Insert 16, 4, 6


● Inserting 22: The binary form of 22 is 10110. Its LSB is 0. The bucket pointed by directory
0 is already full. Hence, Over Flow occurs.

Fig.51: Insert 22
As directed by Step 7-Case 1, Since Local Depth = Global Depth, the bucket splits and directory
expansion takes place. Also, rehashing of numbers present in the overflowing bucket takes place
after the split. And, since the global depth is incremented by 1, now, the global depth is 2. Hence,
16, 4, 6, 22 are now rehashed w.r.t 2 LSBs. [ 16(10000), 4(100), 6(110), 22(10110) ]

Fig.52: After handling Overflow Condition


● Inserting 24 and 10: 24 (11000) and 10 (1010) can be hashed based on directories with id 00
and 10. Here, we encounter no overflow condition.

39
Data Structures and Algorithms18CSC201J

Fig.53: Insert 24 and 10


● Inserting 20: Insertion of data element 20 (10100) will again cause the overflow problem.

Fig.54: insert 20
20 is inserted in bucket pointed out by 00. As directed by Step 7-Case 1, since the local depth of
the bucket = global-depth, directory expansion (doubling) takes place along with bucket splitting.
Elements present in overflowing bucket are rehashed with the new global depth. Now, the new
Hash table looks like this:

40
Data Structures and Algorithms18CSC201J

Fig.55: Handling Overflow


● Inserting 26: Global depth is 3. Hence, 3 LSBs of 26(11010) are considered. Therefore 26

best fits in the bucket pointed out by directory 010.

Fig.56: Insert 26

The bucket overflows, and, as directed by Step 7-Case 2, since the local depth of bucket < Global
depth (2<3), directories are not doubled but, only the bucket is split and elements are rehashed.
Finally, the output of hashing the given list of numbers is obtained.

41
Data Structures and Algorithms18CSC201J

Fig.57: Handling Overflow


● Hashing of 11 Numbers is Thus Completed.

Key Observations:
1. A Bucket will have more than one pointers pointing to it if its local depth is less than the
global depth.
2. When overflow condition occurs in a bucket, all the entries in the bucket are rehashed with a
new local depth.
3. If Local Depth of the overflowing bucket is equal to the global depth, only then the directories
are doubled and the global depth is incremented by 1.
4. The size of a bucket cannot be changed after the data insertion process begins.
Advantages:
1. Data retrieval is less expensive (in terms of computing).
2. No problem of Data-loss since the storage capacity increases dynamically.
3. With dynamic changes in hashing function, associated old values are rehashed w.r.t the new
hash function.
Limitations of Extendible Hashing:
1. The directory size may increase significantly if several records are hashed on the same
directory while keeping the record distribution non-uniform.
2. Size of every bucket is fixed.
3. Memory is wasted in pointers when the global depth and local depth difference becomes
drastic.
4. This method is complicated to code.

42
Data Structures and Algorithms18CSC201J

References
1. Reema Thareja, “Data Structures Using C”, Oxford Higher Education, First Edition,
2011.
2. [Link]
3. [Link]
4. [Link] 85908260#:~:text=The
%20Bellman%E2%80%93Ford%20algorithm%20is,the%20usefulness%20of%20this
%20algorithm.

43

You might also like