0% found this document useful (0 votes)
54 views191 pages

Algorithm Analysis and Complexity Basics

The document provides an overview of algorithm analysis, including concepts of time and space complexity, asymptotic notations, and characteristics of algorithms. It discusses the advantages and disadvantages of algorithms, different approaches to algorithm design, and methods for writing algorithms. Additionally, it covers algorithm complexity, analysis types, and mathematical notations, particularly focusing on Big-O notation and its limitations.

Uploaded by

pavanbeemeneni
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
54 views191 pages

Algorithm Analysis and Complexity Basics

The document provides an overview of algorithm analysis, including concepts of time and space complexity, asymptotic notations, and characteristics of algorithms. It discusses the advantages and disadvantages of algorithms, different approaches to algorithm design, and methods for writing algorithms. Additionally, it covers algorithm complexity, analysis types, and mathematical notations, particularly focusing on Big-O notation and its limitations.

Uploaded by

pavanbeemeneni
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

UNIT-I

Introduction to Algorithm Analysis, Space and Time Complexity analysis, Asymptotic


Notations. Review of Binary Search Trees: Binary Search Tree – Insertion, Deletion &
Traversal ,AVL Trees – Creation, Insertion, Deletion operations and Applications
B-Trees – Creation, Insertion, Deletion operations and Applications

ALGORITHM

Algorithm is a step-by-step procedure, which defines a set of instructions to be executed


in a certain order to get the desired output. Algorithms are generally created independent
of underlying languages, i.e. an algorithm can be implemented in more than one
programming language.
From the data structure point of view, following are some important categories of
algorithms −
• Search − Algorithm to search an item in a data structure.
• Sort − Algorithm to sort items in a certain order.
• Insert − Algorithm to insert item in a data structure.
• Update − Algorithm to update an existing item in a data structure.
• Delete − Algorithm to delete an existing item from a data structure.
Characteristics of an Algorithm
Not all procedures can be called an algorithm. An algorithm should have the following
characteristics −
An algorithm should have the following characteristics −

• Clear and Unambiguous: Algorithm should be clear and unambiguous. Each of


its steps should be clear in all aspects and must lead to only one meaning.
• Well-Defined Inputs: If an algorithm says to take inputs, it should be well-
defined inputs.
• Well-Defined Outputs: The algorithm must clearly define what output will be
yielded and it should be well-defined as well.
• Finite-ness: The algorithm must be finite, i.e. it should not end up in an infinite
loops or similar.
• Feasible: The algorithm must be simple, generic and practical, such that it can be
executed upon will the available resources. It must not contain some future
technology, or anything.

• Language Independent: The Algorithm designed must be language-


independent, i.e. it must be just plain instructions that can be implemented in any
language, and yet the output will be same, as expected

ADSA UNIT-I, AITS, TPT JSR


1
Advantages and Disadvantages of Algorithm

Advantages of Algorithms:

• It is easy to understand.
• Algorithm is a step-wise representation of a solution to a given problem.
• In Algorithm the problem is broken down into smaller pieces or steps hence, it
is easier for the programmer to convert it into an actual program.

Disadvantages of Algorithms:

• Writing an algorithm takes a long time so it is time-consuming.


• Branching and Looping statements are difficult to show in Algorithms.
Different approach to design an algorithm
1. Top-Down Approach: A top-down approach starts with identifying major
components of system or program decomposing them into their lower level components
& iterating until desired level of module complexity is achieved . In this we start with
topmost module & incrementally add modules that is calls.

2. Bottom-Up Approach: A bottom-up approach starts with designing most basic or


primitive component & proceeds to higher level components. Starting from very bottom ,
operations that provide layer of abstraction are implemented
How to Write an Algorithm?
There are no well-defined standards for writing algorithms. Rather, it is problem and
resource dependent. Algorithms are never written to support a particular programming
code.
As we know that all programming languages share basic code constructs like loops (do,
for, while), flow-control (if-else), etc. These common constructs can be used to write an
algorithm.

We write algorithms in a step-by-step manner, but it is not always the case. Algorithm
writing is a process and is executed after the problem domain is well-defined. That is, we
should know the problem domain, for which we are designing a solution.
Example
Let's try to learn algorithm-writing by using an example.
Problem − Design an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b & c
Step 3 − define values of a & b

ADSA UNIT-I, AITS, TPT JSR


2
Step 4 − add values of a & b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP
Algorithms tell the programmers how to code the program. Alternatively, the algorithm
can be written as −
Step 1 − START ADD
Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP
In design and analysis of algorithms, usually the second method is used to describe an
algorithm. It makes it easy for the analyst to analyze the algorithm ignoring all unwanted
definitions. He can observe what operations are being used and how the process is flowing.
Writing step numbers, is optional.
We design an algorithm to get a solution of a given problem. A problem can be solved
in more than one ways.

Hence, many solution algorithms can be derived for a given problem. The next step is to
analyze those proposed solution algorithms and implement the best suitable solution

ALGORITHM COMPLEXITY
Suppose X is an algorithm and n is the size of input data, the time and space used by the
algorithm X are the two main factors, which decide the efficiency of X.
• Time Factor − Time is measured by counting the number of key operations such
as comparisons in the sorting algorithm.
• Space Factor − Space is measured by counting the maximum memory space
required by the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space
required by the algorithm in terms of n as the size of input data

ADSA UNIT-I, AITS, TPT JSR


3
Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle. The space required by an algorithm is equal to the sum of the
following two components −
• A fixed part that is a space required to store certain data and variables, that are
independent of the size of the problem. For example, simple variables and
constants used, program size, etc.
• A variable part is a space required by variables, whose size depends on the size of
the problem. For example, dynamic memory allocation, recursion stack space, etc.
Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part
and S(I) is the variable part of the algorithm, which depends on instance characteristic I.
Following is a simple example that tries to explain the concept −
Algorithm: SUM(A, B)
Step 1 - START
Step 2 - C ← A + B + 10
Step 3 - Stop
Here we have three variables A, B, and C and one constant. Hence S(P) = 1 + 3. Now,
space depends on data types of given variables and constant types and it will be multiplied
accordingly.
Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm
to run to completion. Time requirements can be defined as a numerical function T(n),
where T(n) can be measured as the number of steps, provided each step consumes constant
time.
For example, addition of two n-bit integers takes n steps. Consequently, the total
computational time is T(n) = c ∗ n, where c is the time taken for the addition of two bits.
Here, we observe that T(n) grows linearly as the input size increases.

ALGORITHM ANALYSIS
Efficiency of an algorithm can be analyzed at two different stages, before implementation
and after implementation. They are the following –
• A Priori Analysis or Performance or Asymptotic Analysis − This is a theoretical
analysis of an algorithm. Efficiency of an algorithm is measured by assuming that
all other factors, for example, processor speed, are constant and have no effect on
the implementation.
• A Posterior Analysis or Performance Measurement − This is an empirical analysis
of an algorithm. The selected algorithm is implemented using programming language.
This is then executed on target computer machine. In this analysis, actual statistics
like running time and space required, are collected.

ADSA UNIT-I, AITS, TPT JSR


4
We shall learn about a priori algorithm analysis. Algorithm analysis deals with the execution
or running time of various operations involved. The running time of an operation can be
defined as the number of computer instructions executed per operation.
Analysis of an algorithm is required to determine the amount of resources such as time and
storage necessary to execute the algorithm. Usually, the efficiency or running time of an
algorithm is stated as a function which relates the input length to the time complexity or space
complexity.

Algorithm analysis framework involves finding out the time taken and the memory space
required by a program to execute the program. It also determines how the input size of a
program influences the running time of the program.

In theoretical analysis of algorithms, it is common to estimate their complexity in the


asymptotic sense, i.e., to estimate the complexity function for arbitrarily large input. Big-
O notation, Omega notation, and Theta notation are used to estimate the complexity
function for large arbitrary input.
Types of Analysis
The efficiency of some algorithms may vary for inputs of the same size. For such algorithms,
we need to differentiate between the worst case, average case and best case efficiencies.
Best Case Analysis
If an algorithm takes the least amount of time to execute a specific set of input,then it is
called the best case time complexity. The best case efficiency of an algorithm is the efficiency
for the best case input of size n. Because of this input, the algorithm runs the fastest among
all the possible inputs of the same size.
Average Case Analysis
If the time complexity of an algorithm for certain sets of inputs are on an average, then such
a time complexity is called average case time complexity.
Average case analysis provides necessary information about an algorithm’s behavior on a
typical or random input. You must make some assumption about the possible inputs of size n
to analyze the average case efficiency of algorithm.

Worst Case Analysis


If an algorithm takes maximum amount of time to execute for a specific set of input, then
it is called the worst case time complexity. The worst case efficiency of an algorithm is the
efficiency for the worst case input of size n. The algorithm runs the longest among all the
possible inputs of the similar size because of this input of size n.

ADSA UNIT-I, AITS, TPT JSR


5
MATHEMATICAL NOTATION

Algorithms are widely used in various areas of study. We can solve different problems
using the same algorithm. Therefore, all algorithms must follow a standard. The
mathematical notations use symbols or symbolic expressions, which have a precise
semantic meaning.
Asymptotic Notations
A problem may have various algorithmic solutions. In order to choose the best algorithm
for a particular process, you must be able to judge the time taken to run a particular solution.
More accurately, you must be able to judge the time taken to run two solutions, and choose
the better among the two.
To select the best algorithm, it is necessary to check the efficiency of each algorithm. The
efficiency of each algorithm can be checked by computing its time complexity. The
asymptotic notations help to represent the time complexity in a shorthand way. It can
generally be represented as the fastest possible, slowest possible or average possible.

The notations such as O (Big-O), Ώ (Omega), and θ (Theta) are called as asymptotic
notations. These are the mathematical notations that are used in three different cases of
time complexity.
Big-O Notation
‘O’ is the representation for Big-O notation. Big -O is the method used to express the upper
bound of the running time of an algorithm. It is used to describe the performance or time
complexity of the algorithm. Big-O specifically describes the worst-case scenario and can
be used to describe the execution time required or the space used by the algorithm.

Table 2.1 gives some names and examples of the common orders used to
describe functions. These orders are ranked from top to bottom.

Table 2.1: Common Orders


Time complexity Examples
1 O(1) Constant Adding to the front of a linked list
2 O(log n) Logarithmic Finding an entry in a sorted array
3 O(n) Linear Finding an entry in an unsorted array
4 O(n log n) Linearithmic Sorting ‘n’ items by ‘divide-and-conquer’
5 O(n2) Quadratic Shortest path between two nodes in a graph
6 O(n3) Cubic Simultaneous linear equations
7 O(2n) Exponential The Towers of Hanoi problem

ADSA UNIT-I, AITS, TPT JSR


6
Big-O notation is generally used to express an ordering property among the functions. This
notation helps in calculating the maximum amount of time taken by an algorithm to compute
a problem. Big-O is defined as:

f(n) ≤ c ∗ g(n)

where, n can be any number of inputs or outputs and f(n) as well as g(n) are two non-negative
functions. These functions are true only if there is a constant c and a non-negative integer n0
such that,
n ≥ n0.

The Big-O can also be denoted as f(n) = O(g(n)), where f(n) and g(n) are two non -negative
functions and f(n) < g(n) if g(n) is multiple of some constant c. The graphical representation
of f(n) = O(g(n)) is shown in figure 2.1, where the running time increases considerably when
n increases.

Example: Consider f(n)=15n3+40n2+2nlog n+2n. As the value of n increases, n3 becomes


much larger than n2, nlog n, and n. Hence, it dominates the function f(n) and we can consider
the running time to grow by the order of n3. Therefore, it can be written as f(n)=O(n3).

The values of n for f(n) and C* g(n) will not be less than n0. Therefore, the
values less than n0 are not considered relevant.

Figure 1.8: Big-O Notation f(n) = O(g(n))


Let us take an example to understand the Big-O notation more clearly.
Example:
Consider function f(n) = 2(n)+2 and g(n) = n2.

We need to find the constant c such that f(n) ≤ c ∗ g(n).


Let n = 1, then
f(n) = 2(n)+2 = 2(1)+2 = 4
g(n) = n2 = 12 = 1
Here, f(n)>g(n)

ADSA UNIT-I, AITS, TPT JSR


7
Let n = 2, then
f(n) = 2(n)+2 = 2(2)+2 = 6
g(n) = n2 = 22 = 4
Here, f(n)>g(n)
Let n = 3, then
f(n) = 2(n)+2 = 2(3)+2 = 8
g(n) = n2 = 32 = 9
Here, f(n)<g(n)

Thus, when n is greater than 2, we get f(n)<g(n). In other words, as n becomes larger,
the running time increases considerably. This concludes that the Big-O helps to
determine the ‘upper bound’ of the algorithm’s run-time.
Limitations of Big O Notation
There are certain limitations with the Big O notation of expressing the complexity of
algorithms. These limitations are as follows:
• Many algorithms are simply too hard to analyse mathematically.
• There may not be sufficient information to calculate the behaviour of the
algorithm in the average case.
• Big O analysis only tells us how the algorithm grows with the size of the problem,
not how efficient it is, as it does not consider the programming effort.
It ignores important constants. For example, if one algorithm takes O(n2 ) time to execute and the other
takes O(100000n2 ) time to execute, then as per Big O, both algorithm have equal time complexity. In
real-time systems, this may be a serious consideration
Omega Notation
‘Ω’ is the representation for Omega notation. Omega describes the manner in which an
algorithm performs in the best case time complexity. This notation provides the minimum
amount of time taken by an algorithm to compute a problem. Thus, it is considered that
omega gives the "lower bound" of the algorithm's run-time. Omega is defined as:

f(n) ≥ c ∗ g(n)

Where, n is any number of inputs or outputs and f(n) and g(n) are two non-negative
functions. These functions are true only if there is a constant c and a non-negative integer
n0 such that n>n0.

Omega can also be denoted as f(n) = Ώ (g(n)) where, f of n is equal to Omega of g of n .


The graphical representation of f(n) = Ώ (g(n)) is shown in figure 2.2. The function f(n) is
said to be in Ώ (g(n)), if f(n) is bounded below by some constant multiple of g(n) for all
large values of n, i.e., if there exists some positive constant c and some non-negative integer
n0, such that f(n) ≥ c ∗ g(n) for all n ≥n0.

ADSA UNIT-I, AITS, TPT JSR


8
Figure 1.9 Omega Notation f(n) = Ώ (g(n))

Let us take an example to understand the Omega notation more clearly.

Example:
Consider function f(n) = 2n2+5 and g(n) = 7n.
We need to find the constant c such that f(n) ≥ c ∗ g(n).

Let n = 0, then
f(n) = 2n2+5 = 2(0)2+5 = 5
g(n) = 7(n) = 7(0) = 0
Here, f(n)>g(n)
Let n = 1, then

f(n) = 2n2+5 = 2(1)2+5 = 7


g(n) = 7(n) = 7(1) = 7
Here, f(n)=g(n)
Let n = 2, then
f(n) = 2n2+5 = 2(2)2+5 = 13

g(n) = 7(n) = 7(2) = 14


Here, f(n)<g(n)

Thus, for n=1, we get f(n) ≥ c ∗ g(n). This concludes that Omega helps to determine the "lower bound"
of the algorithm's run-time.
Theta Notation
'θ' is the representation for Theta notation. Theta notation is used when the upper
bound and lower bound of an algorithm are in the same order of magnitude.

ADSA UNIT-I, AITS, TPT JSR


9
Theta can be defined as:
c1 ∗ g(n) ≤ f(n) ≤ c2 ∗ g(n) for all n>n0

Where, n is any number of inputs or outputs and f(n) and g(n) are two
non- negative functions. These functions are true only if there are two
constants namely, c1, c2, and a non-negative integer n0.

Theta can also be denoted as f(n) = θ(g(n)) where, f of n is equal to Theta


of g of n. The graphical representation of f(n) = θ(g(n)) is shown in figure
2.3. The function f(n) is said to be in θ (g(n)) if f(n) is bounded both above
and below by some positive constant multiples of g(n) for all large values
of n, i.e., if there exists some positive constant c1 and c2 and some non-
negative integer n0, such that C2g(n)≤f(n)≤ C1g(n) for all n≥n0.
Figure shows Theta notation.

Figure 1.10: Theta Notation f(n) = θ(g(n))


Let us take an example to understand the Theta notation more clearly.

Example: Consider function f(n) = 4n + 3 and g(n) = 4n for all n ≥ 3; and f(n) = 4n
+ 3 and g(n) = 5n for all n ≥ 3.

Then the result of the function will be:


Let n = 3
f(n) = 4n + 3 = 4(3)+3 = 15
g(n) = 4n =4(3) = 12 and

f(n) = 4n + 3 = 4(3)+3 = 15

ADSA UNIT-I, AITS, TPT JSR


10
g(n) = 5n =5(3) = 15 and
here, c1 is 4, c2 is 5 and n0 is 3
Thus, from the above equation we get c1 g(n) f(n) c2 g(n). This concludes that
Theta notation depicts the running time between the upper bound and lower
bound.

BINARY SEARCH TREE

Binary Search Tree is a data structure used in computer science for organizing and storing data in
a sorted manner. Binary search tree follows all properties of binary tree and for every nodes,
its left subtree contains values less than the node and the right subtree contains values greater than
the node. This hierarchical structure allows for efficient Searching, Insertion,
and Deletion operations on the data stored in the tree.

Properties of Binary Search Tree:


• The left subtree of a node contains only nodes with keys less than the node's key.
• The right subtree of a node contains only nodes with keys greater than the node's key.
• The left and right subtree each must also be a binary search tree.
• There must be no duplicate nodes (BST may have duplicate values with different handling
approaches).

Fig: Binary Search Tree


Important Points about BST

• A Binary Search Tree is useful for maintaining sorted stream of data. It allows search, insert,

ADSA UNIT-I, AITS, TPT JSR


11
delete, ceiling, max and min in O(h) time. Along with these, we can always traverse the tree
items in sorted order.
• With Self Balancing BSTs, we can ensure that the height of the BST is bound be Log n.
Hence we achieve, the above mentioned O(h) operations in O(Log n) time.
• When we need only search, insert and delete and do not need other operations, we
prefer Hash Table over BST as a Hash Table supports these operations in O(1) time on
average.

Applications of BST

Binary Search Tree (BST) is a data structure that is commonly used to implement efficient
searching, insertion, and deletion operations along with maintaining sorted sequence of data. The
following properties are:
• The left subtree of a node contains only nodes with keys lesser than the node's key.
• The right subtree of a node contains only nodes with keys greater than the node's key.
• The left and right subtree each must also be a binary search tree. There must be no duplicate
nodes.

Binary Search Tree - Structure


A BST supports operations like search, insert, delete, maximum, minimum, floor, ceil, greater,
smaller, etc in O(h) time where h is height of the BST. To keep height less, self balancing BSTs
(like AVL and Red Black Trees) are used in practice. These Self-Balancing BSTs maintain the
height as O(Log n). Therefore all of the above mentioned operations become O(Log n). Together
with these, BST also allows sorted order traversal of data in O(n) time.
1. A Self-Balancing Binary Search Tree is used to maintain sorted stream of data. For example,
suppose we are getting online orders placed and we want to maintain the live data (in RAM)
in sorted order of prices. For example, we wish to know number of items purchased at cost
below a given cost at any moment. Or we wish to know number of items purchased at higher

ADSA UNIT-I, AITS, TPT JSR


12
cost than given cost.
2. A Self-Balancing Binary Search Tree is used to implement doubly ended priority queue.
With a Binary Heap, we can either implement a priority queue with support of extractMin()
or with extractMax(). If we wish to support both the operations, we use a Self-Balancing
Binary Search Tree to do both in O(Log n)
3. There are many more algorithm problems where a Self-Balancing BST is the best suited data
structure, like count smaller elements on right, Smallest Greater Element on Right Side, etc.
4. A BST can be used to sort a large dataset. By inserting the elements of the dataset into a
BST and then performing an in-order traversal, the elements will be returned in sorted order.
When compared to normal sorting algorithms, the advantage here is, we can later insert /
delete items in O(Log n) time.
5. Variations of BST like B Tree and B+ Tree are used in Database indexing.
6. TreeMap and TreeSet in Java, and set and map in C++ are internally implemented using
self-balancing BSTs, more formally a Red-Black Tree.

Advantages of Binary Search Tree (BST):

• Efficient searching: O(log n) time complexity for searching with a self balancing BST
• Ordered structure: Elements are stored in sorted order, making it easy to find the next or
previous element
• Dynamic insertion and deletion: Elements can be added or removed efficiently
• Balanced structure: Balanced BSTs maintain a logarithmic height, ensuring efficient
operations
• Doubly Ended Priority Queue: In BSTs, we can maintain both maximum and minimum
efficiently

Disadvantages of Binary Search Tree (BST):

• Not self-balancing: Unbalanced BSTs can lead to poor performance


• Worst-case time complexity: In the worst case, BSTs can have a linear time complexity
for searching and insertion
• Memory overhead: BSTs require additional memory to store pointers to child nodes
• Not suitable for large datasets: BSTs can become inefficient for very large datasets
• Limited functionality: BSTs only support searching, insertion, and deletion operations

Insertion in Binary Search Tree (BST)

Definition:
Insertion in a Binary Search Tree is the process of adding a new node at the correct position so that
the BST properties are maintained.
BST Property:
• Left subtree contains values less than the node.
• Right subtree contains values greater than the node.

ADSA UNIT-I, AITS, TPT JSR


13
How it works:

• Start at the root.


• Recursively compare the key to be inserted with the current node.
• If the key is smaller, go to the left subtree; if greater, go to the right.
• When the correct NULL position is found, insert the node.

Syntax:
//Insert a node
struct Node* insert(struct Node* root, int key)
{
if (root == NULL)
return createNode(key);
if (key < root->key)
root->left = insert(root->left, key);
else if (key > root->key)
root->right = insert(root->right, key);
return root;
}

Given a BST, the task is to insert a new node in this BST.


Example:

How to Insert a value in a Binary Search Tree:


A new key is always inserted at the leaf by maintaining the property of the binary search tree. We
start searching for a key from the root until we hit a leaf node. Once a leaf node is found, the new
node is added as a child of the leaf node. The below steps are followed while we try to insert a
node into a binary search tree:
• Initilize the current node (say, currNode or node) with root node
• Compare the key with the current node.
• Move left if the key is less than or equal to the current node value.
• Move right if the key is greater than current node value.
• Repeat steps 2 and 3 until you reach a leaf node.
• Attach the new key as a left or right child based on the comparison with the leaf node's

ADSA UNIT-I, AITS, TPT JSR


14
value.

ADSA UNIT-I, AITS, TPT JSR


15
Deletion in Binary Search Tree (BST)

Definition:
Deletion in a BST means removing a node from the tree while maintaining the BST properties.
Three cases when deleting a node:
1. Leaf Node (no children): Just remove the node.
2. One Child: Replace the node with its child.
3. Two Children:
o Find the in-order successor (smallest in right subtree) or in-order predecessor
(largest in left subtree).
o Replace the node’s value with it.
o Delete the successor/predecessor node.
Syntax:
// Delete a node
struct Node* deleteNode(struct Node* root, int key) {
if (root == NULL)
return root;

if (key < root->key)


root->left = deleteNode(root->left, key);
else if (key > root->key)
root->right = deleteNode(root->right, key);
else {
// Node with one child or no child
if (root->left == NULL) {
struct Node* temp = root->right;
free(root);
return temp;
}
else if (root->right == NULL) {
struct Node* temp = root->left;
free(root);

ADSA UNIT-I, AITS, TPT JSR


16
return temp;
}

// Node with two children


struct Node* temp = findMin(root->right);
root->key = temp->key;
root->right = deleteNode(root->right, temp->key);
}
return root;
}

A BST, The task is to delete a node in this BST, which can be broken down into 3 scenarios:
Case 1. Delete a Leaf Node in BST

Case 2. Delete a Node with Single Child in BST

Deleting a single child node is also simple in BST. Copy the child to the node and delete the
node.

ADSA UNIT-I, AITS, TPT JSR


17
ADSA UNIT-I, AITS, TPT JSR
18
Case 3. Delete a Node with Both Children in BST

Deleting a node with both children is not so simple. Here we have to delete the node is such a way,
that the resulting tree follows the properties of a BST.
The trick is to find the inorder successor of the node. Copy contents of the inorder successor to the
node, and delete the inorder successor.

ADSA UNIT-I, AITS, TPT JSR


19
ADSA UNIT-I, AITS, TPT JSR
20
Note: Inorder successor is needed only when the right child is not empty. In this particular case, the
inorder successor can be obtained by finding the minimum value in the right child of the node.

Searching in Binary Search Tree

Given a BST, the task is to search a node in this BST. For searching a value in BST, consider it as
a sorted array. Now we can easily perform search operation in BST using Binary Search Algorithm.
Definition:
Search in a BST is used to find whether a given key exists in the tree.
How it works:
• Start at the root.
• Compare the key with the current node:
o If equal → found.
o If smaller → search left subtree.
o If larger → search right subtree.
• If NULL is reached, the key does not exist.
Syntax:
// Search a node
struct Node* search(struct Node* root, int key) {
if (root == NULL || root->key == key)
return root;
if (key < root->key)
return search(root->left, key);
else
return search(root->right, key);
}

Input: Root of the below BST

Output: True
Explanation: 8 is present in the BST as right child of root

Input: Root of the below BST

ADSA UNIT-I, AITS, TPT JSR


21
Output: False
Explanation: 14 is not present in the BST
Example:

ADSA UNIT-I, AITS, TPT JSR


22
AVL Tree
An AVL tree data structure is a self-balancing binary search tree (BST) that maintains balance to
ensure efficient operations. The full form of AVL tree is Adelson-Velsky and Landis. It is named

ADSA UNIT-I, AITS, TPT JSR


23
after its inventors.
The AVL tree in data structure ensures that the heights of subtrees differ by at most one.
This balance factor helps keep the tree height logarithmic, providing O(log n) time complexity for
search, insertion, and deletion operations.

What is AVL Tree?


An AVL tree is a type of binary search tree used in computer science. It is special because it keeps
itself balanced. This means it makes sure that the tree doesn't get too tall and skinny, which would
make finding things slow.
In an AVL data structure, each node has a balance factor. This balance factor is the difference height
between the left and right subtrees. The balance factor can only be -1, 0, or 1. If the balance factor
gets outside this range, the tree does a rotation to fix itself. This keeps the tree short and wide, making
it faster to search, insert, and delete items.

The importance of AVL trees in data structures is that they ensure operations like search, insert, and
delete are always fast, even if we do them many times. This makes AVL trees very useful in
applications like databases and memory management, where we need to quickly find or update
information.

Properties of AVL Tree


The following are the key properties of AVL tree in data structure, explained in simple terms:
• Balanced Tree
An AVL tree is always balanced. This means that the heights of the left and right subtrees of any
node differ by at most one.
• Height-Balanced Property
The difference in height (also called the balance factor) between the left and right subtrees of any
node is -1, 0, or 1.
• Self-Balancing
An AVL tree automatically performs rotations to maintain balance after every insertion and deletion.
• Binary Search Tree Properties
An AVL tree follows the properties of a binary search tree (BST). This means that for any node,

ADSA UNIT-I, AITS, TPT JSR


24
the left subtree contains only nodes with values less than the node's value, and the right subtree
contains only nodes with values greater than the node's value.
• Rotations
To maintain balance, an AVL tree uses four types of rotations: left rotation, right rotation, left-right
rotation, and right-left rotation.

Balance Factor of AVL Tree


The AVL tree balance factor is a key concept used to maintain the tree's balanced property. In AVL
trees, the heights of the two child subtrees of any node differ by no more than one. This balance is
maintained through rotations during insertion and deletion operations.
The balancing factor (BF) of a node in an AVL tree is defined as the height difference between its
left and right subtrees.
For a node N, the balancing factor is calculated as:
BF(N) = height (left subtree) − height (right subtree)
Properties:
The balancing factor can be -1, 0, or 1 for a balanced AVL tree.
• BF = -1: The right subtree is one level higher than the left subtree.
• BF = 0: The left and right subtrees are of equal height.
• BF = 1: The left subtree is one level higher than the right subtree.
Balancing Factor Calculation:
• For each node, calculate the heights of its left and right subtrees.
• Subtract the height of the right subtree from the height of the left subtree to get the balancing
factor.

Example:
Consider the following AVL tree:

Balancing Factors:
• For node 10: Left height = 0, Right height = 0, BF = 0 - 0 = 0
• For node 20: Left height = 1, Right height = 0, BF = 1 - 0 = 1
• For node 50: Left height = 0, Right height = 0, BF = 0 - 0 = 0
• For node 40: Left height = 0, Right height = 1, BF = 0 - 1 = -1

ADSA UNIT-I, AITS, TPT JSR


25
• For node 30: Left height = 2, Right height = 2, BF = 2 - 2 = 0
This tree is balanced as all nodes have a balancing factor of -1, 0, or 1.

Operations on AVL Tree Data Structure


1. Insertion
Insertion in an AVL tree involves adding a new key while ensuring the tree remains balanced.
After inserting a new node, the balance factor of each node is checked, and rotations are performed
if necessary to maintain the AVL property.
Algorithm:
• Perform a standard BST insertion.
• Update the height of the current node.
• Calculate the balance factor of the current node.
• Perform rotations if the node becomes unbalanced.
Syntax: // Insertion
struct Node* insert(struct Node* node, int key) {
if (node == NULL)
return newNode(key);

if (key < node->key)


node->left = insert(node->left, key);
else if (key > node->key)
node->right = insert(node->right, key);
else
return node; // no duplicates

node->height = 1 + max(height(node->left), height(node->right));

int balance = getBalance(node);

Example:
Insert keys 10, 20, 30 into an AVL tree.

• Insert 10:

• Insert 20:

ADSA UNIT-I, AITS, TPT JSR


26
• Insert 30 (requires left rotation):
Before rotation:

After rotation:

2. Deletion
Deletion in an AVL tree involves removing a node and then ensuring the tree remains balanced.
After deleting a node, the balance factor of each node is checked, and rotations are performed if
necessary to maintain the AVL property.
Algorithm:
• Perform a standard BST deletion.
• Update the height of the current node.
• Calculate the balance factor of the current node.
• Perform rotations if the node becomes unbalanced.
Syntax: // Deletion

ADSA UNIT-I, AITS, TPT JSR


27
struct Node* deleteNode(struct Node* root, int key) {
if (root == NULL)
return root
if (key < root->key)
root->left = deleteNode(root->left, key);
else if (key > root->key)
root->right = deleteNode(root->right, key);
else {
if ((root->left == NULL) || (root->right == NULL)) {
struct Node *temp = root->left ? root->left : root->right;
if (temp == NULL) {
temp = root;
root = NULL;
} else
*root = *temp;

free(temp);
} else {
struct Node* temp = minValueNode(root->right);
root->key = temp->key;
root->right = deleteNode(root->right, temp->key);
}
}
if (root == NULL)
return root;
root->height = 1 + max(height(root->left), height(root->right));
int balance = getBalance(root);

Example:
Delete key 20 from the AVL tree:

ADSA UNIT-I, AITS, TPT JSR


28
3. Searching
Searching in an AVL tree is similar to searching in a binary search tree (BST). Since AVL trees
are balanced, searching is efficient with a time complexity of O(log n).
Algorithm:
• Start from the root node.
• Compare the target key with the key of the current node.
• If the target key is equal to the current node's key, return the node.
• If the target key is less than the current node's key, move to the left child.
• If the target key is greater than the current node's key, move to the right child.
• Repeat steps 2-5 until the target key is found or the leaf node is reached.
Syntax: // Search
struct Node* search(struct Node* root, int key) {
if (root == NULL || root->key == key)
return root;

if (key < root->key)


return search(root->left, key);
else
return search(root->right, key);
}
Example:
Search for key 20 in the AVL tree:

Result: Node with key 20 found.

AVL Tree Rotations


Rotations in AVL tree are operations performed to maintain the balance of the tree after insertions
or deletions.
There are four types of rotations: Left Rotation, Right Rotation, Left-Right Rotation, and Right-

ADSA UNIT-I, AITS, TPT JSR


29
Left Rotation. Each rotation helps to rebalance the tree when a node becomes unbalanced.
1. Left Rotation
A left rotation is performed when a node's right subtree is heavier than its left subtree (balance
factor < -1). This rotation shifts the balance to the left.
Algorithm:
• Let the unbalanced node be x.
• Let y be the right child of x.
• Perform the rotation by making y the new root of the subtree.
• The left child of y becomes the right child of x.
Example and Diagram:

Before rotation:

After left rotation:

ADSA UNIT-I, AITS, TPT JSR


30
2. Right Rotation
A right rotation is performed when a node's left subtree is heavier than its right subtree (balance
factor > 1). This rotation shifts the balance to the right.
Algorithm:
• Let the unbalanced node be x.
• Let y be the left child of x.
• Perform the rotation by making y the new root of the subtree.
• The right child of y becomes the left child of x.
Example and Diagram:

Before rotation:

After right rotation:

3. Left-Right (LR) Rotation


A left-right or LR rotation in AVL tree is performed when a node's left subtree has a right-heavy

ADSA UNIT-I, AITS, TPT JSR


31
child, causing the balance factor of the left subtree to be less than zero. This rotation is a
combination of a left rotation followed by a right rotation.
Algorithm:
• Perform a left rotation on the left child.
• Perform a right rotation on the original unbalanced node.
Example and Diagram:

Before rotation:

After left-right rotation:

4. Right-Left (RL) Rotation


A right-left or RL rotation in AVL tree is performed when a node's right subtree has a left-heavy
child, causing the balance factor of the right subtree to be greater than zero. This rotation is a
combination of a right rotation followed by a left rotation.
Algorithm:

ADSA UNIT-I, AITS, TPT JSR


32
• Perform a right rotation on the right child.
• Perform a left rotation on the original unbalanced node.
Example and Diagram:

Before rotation:

After right-left rotation:

AVL Tree Traversal Methods


Tree traversal is the process of visiting all the nodes in a tree in a specific order. The main
traversal methods for AVL trees, as for any binary search tree, are in-order, pre-order, and post-
order traversal.
1. In-order Traversal

ADSA UNIT-I, AITS, TPT JSR


33
Algorithm:
• Traverse the left subtree in in-order.
• Visit the root node.
• Traverse the right subtree in in-order.
Example:
Consider the following AVL tree:

In-order Traversal Steps:


• Traverse left subtree of 20: 10
• Visit root: 20
• Traverse right subtree of 20:
• Traverse left subtree of 30: 25
• Visit root: 30
• Traverse right subtree of 30: 40
In-order Traversal Result: 10, 20, 25, 30, 40
2. Pre-order Traversal
Algorithm:
• Visit the root node.
• Traverse the left subtree in pre-order.
• Traverse the right subtree in pre-order.
Example:
Consider the following AVL tree:

ADSA UNIT-I, AITS, TPT JSR


34
Pre-order Traversal Steps:
• Visit root: 20
• Traverse left subtree of 20: 10
• Traverse right subtree of 20:
• Visit root: 30
• Traverse left subtree of 30: 25
• Traverse right subtree of 30: 40
Pre-order Traversal Result: 20, 10, 30, 25, 40
3. Post-order Traversal
Algorithm:
• Traverse the left subtree in post-order.
• Traverse the right subtree in post-order.
• Visit the root node.
Example:
Consider the following AVL tree:

Post-order Traversal Steps:


• Traverse left subtree of 20: 10
• Traverse right subtree of 20:
• Traverse left subtree of 30: 25
• Traverse right subtree of 30: 40
• Visit root: 30
• Visit root: 20
Post-order Traversal Result: 10, 25, 40, 30, 20

ADSA UNIT-I, AITS, TPT JSR


35
Illustration of Insertion at AVL Tree:

ADSA UNIT-I, AITS, TPT JSR


36
ADSA UNIT-I, AITS, TPT JSR
37
ADSA UNIT-I, AITS, TPT JSR
38
ADSA UNIT-I, AITS, TPT JSR
39
Applications of AVL Tree:
1. AVL Tree is used as a first example self-balancing BST in teaching DSA as it is easier to
understand and implement compared to Red Black
2. Applications, where insertions and deletions are less common but frequent data lookups along with
other operations of BST like sorted traversal, floor, ceil, min and max.
3. Red Black tree is more commonly implemented in language libraries like map in C++, set in
C++, TreeMap in Java and TreeSet in Java.
4. AVL Trees can be used in a real time environment where predictable and consistent performance is
required.

Advantages of AVL Tree:


1. AVL trees can self-balance themselves and therefore provides time complexity as O(log
n) for search, insert and delete.
2. As it is a balanced BST, so items can be traversed in sorted order.
3. Since the balancing rules are strict compared to Red Black Tree, AVL trees in general have
relatively less height and hence the search is faster.
4. AVL tree is relatively less complex to understand and implement compared to Red Black Trees.

Disadvantages of AVL Tree:


1. It is difficult to implement compared to normal BST.
2. Less used compared to Red-Black trees. Due to its rather strict balance.
3. AVL trees provide complicated insertion and removal operations as more rotations are performed.

B Trees: B trees are extended binary search trees that are specialized in m-way searching, since the order of B trees
is 'm'. Order of a tree is defined as the maximum number of children a node can accommodate. Therefore, the height
of a b tree is relatively smaller than the height of AVL tree and RB tree.
They are general form of a Binary Search Tree as it holds more than one key and two children.
The various properties of B trees include –
• Every node in a B Tree will hold a maximum of m children and (m-1) keys, since the order of the
tree is m.

ADSA UNIT-I, AITS, TPT JSR


40
• Every node in a B tree, except root and leaf, can hold at least m/2 children
• The root node must have no less than two children.
• All the paths in a B tree must end at the same level, i.e. the leaf nodes must be at the same level.
• A B tree always maintains sorted data.
A B-Tree is a specialized m-way tree designed to optimize data access, especially on disk-based storage
systems.
• In a B-Tree of order m, each node can have up to m children and m-1 keys, allowing it to
efficiently manage large datasets.
• The value of m is decided based on disk block and key sizes.
• One of the standout features of a B-Tree is its ability to store a significant number of keys
within a single node, including large key values. It significantly reduces the tree’s height,
hence reducing costly disk operations.
• B Trees allow faster data retrieval and updates, making them an ideal choice for systems
requiring efficient and scalable data management. By maintaining a balanced structure at
all times,
• B-Trees deliver consistent and efficient performance for critical operations such as search,
insertion, and deletion.
Following is an example of a B-Tree of order 5 .

Properties of a B-Tree
A B Tree of order m can be defined as an m-way search tree which satisfies the following properties:
1. All leaf nodes of a B tree are at the same level, i.e. they have the same depth (height of the
tree).
2. The keys of each node of a B tree (in case of multiple keys), should be stored in the
ascending order.
3. In a B tree, all non-leaf nodes (except root node) should have at least m/2 children.
4. All nodes (except root node) should have at least m/2 - 1 keys.
5. If the root node is a leaf node (only node in the tree), then it will have no children and will
have at least one key. If the root node is a non-leaf node, then it will have at least 2
children and at least one key.
6. A non-leaf node with n-1 key values should have n non NULL children.
We can see in the above diagram that all the leaf nodes are at the same level and all non-leaf nodes have
no empty sub-tree and have number of keys one less than the number of their children.
Interesting Facts about B-Tree
• The minimum height of the B-Tree that can exist with n number of nodes and m is the
maximum number of children of a node can have is: hmin=⌈log⁡m(n+1)⌉−1 hmin
=⌈logm(n+1)⌉−1
• The maximum height of the B-Tree that can exist with n number of nodes and t is the

ADSA UNIT-I, AITS, TPT JSR


41
minimum number of children that a non-root node can have
is: hmax=⌊log⁡tn+12⌋ hmax=⌊logt2n+1⌋ and t=⌈m2⌉t=⌈2m⌉
Need of a B-Tree
The B-Tree data structure is essential for several reasons:
• Improved Performance Over M-way Trees:
While M-way trees can be either balanced or skewed, B-Trees are always self-balanced.
This self-balancing property ensures fewer levels in the tree, significantly reducing access
time compared to M-way trees. This makes B-Trees particularly suitable for external
storage systems where faster data retrieval is crucial.
• Optimized for Large Datasets:
B-Trees are designed to handle millions of records efficiently. Their reduced height and
balanced structure enable faster sequential access to data and simplify operations like
insertion and deletion. This ensures efficient management of large datasets while
maintaining an ordered structure.
Operations on B-Tree

Note: "n" is the total number of elements in the B-tree


Search Operation in B-Tree
Search is similar to the search in Binary Search Tree. Let the key to be searched is k.
• Start from the root and recursively traverse down.
• For every visited non-leaf node
o If the current node contains k, return the node.
o Otherwise, determine the appropriate child to traverse. This is the child just before
the first key greater than k.
• If we reach a leaf node and don't find k in the leaf node, then return NULL.
Searching a B-Tree is similar to searching a binary tree. The algorithm is similar and goes with recursion.
At each level, the search is optimized as if the key value is not present in the range of the parent then the
key is present in another branch. As these values limit the search they are also known as limiting values
or separation values. If we reach a leaf node and don’t find the desired key then it will display NULL.
// Search a key
BTreeNode *search(BTreeNode *root, int k) {
int i = 0;
while (i < root->n && k > root->keys[i]) i++;
if (i < root->n && root->keys[i] == k) return root;
if (root->leaf) return NULL;
return search(root->children[i], k);
}

ADSA UNIT-I, AITS, TPT JSR


42
Input: Search 120 in the given B-Tree.

ADSA UNIT-I, AITS, TPT JSR


43
Insert Operation in B-Tree

The insert() operation in a B-Tree. A new key is always inserted into a leaf node. To insert a key k,
we start from the root and traverse down the tree until we reach the appropriate leaf node. Once there, the
key is added to the leaf.
Unlike Binary Search Trees (BSTs), nodes in a B-Tree have a predefined range for the number of keys
they can hold. Therefore, before inserting a key, we ensure the node has enough space. If the node is full,
an operation called splitChild() is performed to create space by splitting the node.

Insertion Operation
To insert a new key, we go down from root to leaf. Before traversing down to a node, we first check if the
node is full. If the node is full, we split it to create space. Following is the complete algorithm.
Insertion Algorithm
1: procedure B-Tree-Insert (Node x, Key k)
2: find i such that x:keys[i] > k or i >=numkeys(x)
3: if x is a leaf then
4: Insert k into [Link] at i
5: else
6: if x:child[i] is full then
7: Split x:child[i]
8: if k > x:key[i] then
9: i = i + 1
10: end if
11: end if
12: B-Tree-Insert(x:child[i]; k)

ADSA UNIT-I, AITS, TPT JSR


44
13: end if
14: end procedure
• The algorithm starts with a node x and a key k to insert.
• Find the position i in the node x where k should be inserted:
o Locate the first key in x greater than k, or move to the end if no such key exists.
• If x is a leaf node, directly insert k at position i in sorted order.
• If x is not a leaf node, proceed to the child node at position i:
o Check if the child node is full (has the maximum number of keys allowed).
o If the child is full:
o Split the child node into two nodes.
o Move the middle key of the child node up to the parent node (x).
o Adjust the position i if k is greater than the promoted key.
• Recursively call the B-Tree-Insert procedure on the appropriate child node to continue the
insertion.
• The process ends when k is successfully inserted into a leaf node, ensuring the tree remains
balanced and within its key limit.

Example

Delete Operation in B-Tree


A B Tree is a type of data structure commonly known as a Balanced Tree that stores multiple data items
very easily. B Trees are one of the most useful data structures that provide ordered access to the data in
the database. The delete operation in the B-Tree. B-Trees are self-balancing trees.

ADSA UNIT-I, AITS, TPT JSR


45
Deletion Process in B-Trees
Deletion from a B-tree is more complicated than insertion because we can delete a key from any node-not
just a leaf—and when we delete a key from an internal node, we will have to rearrange the node’s children.
As in insertion, we must make sure the deletion doesn't violate the B-tree properties. Just as we had to
ensure that a node didn't get too big due to insertion, we must ensure that a node doesn't get too small
during deletion (except that the root is allowed to have fewer than the minimum number t-1 of keys). Just
as a simple insertion algorithm might have to back up if a node on the path to where the key was to be
inserted was full, a simple approach to deletion might have to back up if a node (other than the root) along
the path to where the key is to be deleted has the minimum number of keys.

The deletion procedure deletes the key k from the subtree rooted at x. This procedure guarantees that
whenever it calls itself recursively on a node x, the number of keys in x is at least the minimum degree t.
Note that this condition requires one more key than the minimum required by the usual B-tree conditions,
so sometimes a key may have to be moved into a child node before recursion descends to that child. This
strengthened condition allows us to delete a key from the tree in one downward pass without having to
"back up" (with one exception, which we’ll explain). You should interpret the following specification for
deletion from a B-tree with the understanding that if the root node x ever becomes an internal node having
no keys (this situation can occur in cases 2c and 3b then we delete x, and x’s only child x.c1 becomes the
new root of the tree, decreasing the height of the tree by one and preserving the property that the root of
the tree contains at least one key (unless the tree is empty).
Various Cases of Deletion
Case 1: If the key k is in node x and x is a leaf, delete the key k from x.
Case 2: If the key k is in node x and x is an internal node, do the following.
• If the child y that precedes k in node x has at least t keys, then find the predecessor k0 of k
in the sub-tree rooted at y. Recursively delete k0, and replace k with k0 in x. (We can find
k0 and delete it in a single downward pass.)
• If y has fewer than t keys, then, symmetrically, examine the child z that follows k in node x.
If z has at least t keys, then find the successor k0 of k in the subtree rooted at z. Recursively
delete k0, and replace k with k0 in x. (We can find k0 and delete it in a single downward
pass.)
• Otherwise, if both y and z have only t-1 keys, merge k and all of z into y, so that x loses both
k and the pointer to z, and y now contains 2t-1 keys. Then free z and recursively delete k
from y.
Case 3: If the key k is not present in internal node x, determine the root x.c(i) of the appropriate subtree
that must contain k, if k is in the tree at all. If x.c(i) has only t-1 keys, execute steps 3a or 3b as
necessary to guarantee that we descend to a node containing at least t keys. Then finish by recursing on
the appropriate child of x.
• If x.c(i) has only t-1 keys but has an immediate sibling with at least t keys, give x.c(i) an
extra key by moving a key from x down into x.c(i), moving a key from x.c(i) ’s immediate
left or right sibling up into x, and moving the appropriate child pointer from the sibling into
x.c(i).
• If x.c(i) and both of x.c(i)'s immediate siblings have t-1 keys, merge x.c(i) with one sibling,
which involves moving a key from x down into the new merged node to become the median
key for that node.
Since most of the keys in a B-tree are in the leaves, deletion operations are most often used to delete
keys from leaves. The recursive delete procedure then acts in one downward pass through the tree,
without having to back up. When deleting a key in an internal node, however, the procedure makes a

ADSA UNIT-I, AITS, TPT JSR


46
downward pass through the tree but may have to return to the node from which the key was deleted to
replace the key with its predecessor or successor (cases 2a and 2b).
Syntax: // Delete key from node
void deleteFromNode(BTreeNode *x, int k);
// Borrow from previous
void borrowFromPrev(BTreeNode *x, int idx) {
BTreeNode *child = x->children[idx];
BTreeNode *sibling = x->children[idx - 1];
for (int i = child->n - 1; i >= 0; i--) child->keys[i + 1] = child->keys[i];
if (!child->leaf) {
for (int i = child->n; i >= 0; i--) child->children[i + 1] = child->children[i];
}
child->keys[0] = x->keys[idx - 1];
if (!child->leaf) child->children[0] = sibling->children[sibling->n];

x->keys[idx - 1] = sibling->keys[sibling->n - 1];


child->n++;
sibling->n--;
}
The following figures explain the deletion process.

Deletion Operation in B+ Trees

ADSA UNIT-I, AITS, TPT JSR


47
The next processes are shown below in the figure.

Deletion Operation in B+ Trees


Applications of B-Trees
• It is used in large databases to access data stored on the disk
• Searching for data in a data set can be achieved in significantly less time using the B-Tree
• With the indexing feature, multilevel indexing can be achieved.
• Most of the servers also use the B-tree approach.
• B-Trees are used in CAD systems to organize and search geometric data.
• B-Trees are also used in other areas such as natural language processing, computer
networks, and cryptography.
Advantages of B-Trees
• B-Trees have a guaranteed time complexity of O(log n) for basic operations like insertion,
deletion, and searching, which makes them suitable for large data sets and real-time
applications.
• B-Trees are self-balancing.
• High-concurrency and high-throughput.
• Efficient storage utilization.
Disadvantages of B-Trees
• B-Trees are based on disk-based data structures and can have a high disk usage.
• Not the best for all cases.
• For small datasets, the search time in a B-Tree might be slower compared to a binary
search tree, as each node may contain multiple keys.

ADSA UNIT-I, AITS, TPT JSR


48
UNIT-II
Heap trees(Priority Queues)-Min And Max heaps, Operations and applications.
Graphs-Terminology, Representations , Basic Search and Traversals, connected and Bi connected
components, Applications.
Divide And Conquer(DAC)- The general Method, Quick sort, Merge sort, Finding Minimum and
Maximum , Stassen’s Matric ,multiplication.

Heap trees(Priority Queues):


A Heap is a complete binary tree data structure that satisfies the heap property: for every node,
the value of its children is greater than or equal to its own value. Heaps are usually used to
implement priority queues, where the smallest (or largest) element is always at the root of the
tree.

1
ADSA UNIT-II JSR
Representation of Heap Tree:
An array can be used to simulate a tree in the following way. If we are storing one element at
index i in array Ar, then its parent will be stored at index i/2 (unless its a root, as root has no
parent) and can be accessed by Ar[ i/2 ], and its left child can be accessed by Ar[ 2 * i ] and its
right child can be accessed by Ar[ 2 * i +1 ]. The index of root will be 1 in an array.

Heap Operations
1. Insertion
● Insert the new element at the end of the heap (bottom-most, right-most position).
● Perform heapify-up (bubble-up):
○ Compared with parent.
○ If heap property is violated → swap with parent.
○ Repeat until the root or property is satisfied.

2. Deletion (Extract Root)


● Remove the root element (minimum for Min Heap, maximum for Max Heap).
● Replace root with the last element in the heap.
● Perform heapify-down (bubble-down):
○ Compare with children.
○ Swap with smaller (Min Heap) or larger (Max Heap) child.
○ Repeat until property is satisfied.

3. Heapify
● Operation to fix heap property in a subtree.
● Compare node with its children:
○ If property is violated, swap with the appropriate child.
○ Recursively apply to the affected subtree.
● Used when building a heap or after deletion.

Types of Heaps
There are two main types of heaps:

[Link] Heap: The root node contains the maximum value, and the values
decrease as you move down the tree.

● In other words, the max heap can be defined as for every node i; the value
of node i is less than or equal to its parent value except the root node.
Mathematically, it can be defined as:
● A[Parent(i)] >= A[i]

2
ADSA UNIT-II JSR
The above tree is a max heap tree as it satisfies the property of the max heap. Now,
let's see the array representation of the max heap.

[Link] Heap: The root node contains the minimum value, and the values increase
as you move down the tree
● In other words, the min-heap can be defined as, for every node i, the value
of node i is greater than or equal to its parent value except the root node.
Mathematically, it can be defined as:
A[Parent(i)] <= A[i]
Let's understand the min-heap through an example.
● In the above figure, 11 is the root node, and the value of the root node is

less than the value of all the other nodes (left child or a right child).
Max Heap Construction Algorithm
We shall use the same example to demonstrate how a Max Heap is created. The procedure to
create Min Heap is similar but we go for min values instead of max values.
We are going to derive an algorithm for max heap by inserting one element at a time. At any
point of time, heap must maintain its property. While insertion, we also assume that we are
inserting a node in an already heapified tree.
Step 1 − Create a new node at the end of heap.
Step 2 − Assign new value to the node.
Step 3 − Compare the value of this child node with its parent.
Step 4 − If value of parent is less than child, then swap them.
Step 5 − Repeat step 3 & 4 until Heap property holds.
Note − In Min Heap construction algorithm, we expect the value of the parent node
to be less than that of the child node.
Let’s understand the max heap through an example.
In the above figure, 55 is the parent node and it is greater than both of its child, and
11 is the parent of 9 and 8, so 11 is also greater than from both of its child.
Therefore, we can say that the above tree is a max heap tree.
Insertion in the Heap tree 44,33, 77, 11, 55, 88, 66
Suppose we want to create the max heap tree. To create the max heap tree, we need
to consider the following two cases:
• First, we have to insert the element in such a way that the property of the
complete binary tree must be maintained.
• Secondly, the value of the parent node should be greater than the either of
its child.
3
ADSA UNIT-II JSR
Step 1: First we add the 44 element in the tree as shown below:

Step 2: The next element is 33. As we know that insertion in the binary tree always starts from the

left side so 44 will be added at the left of 33 as shown below:

Step 3: The next element is 77 and it will be added to the right of the 44 as shown below:

As we can observe in the above tree that it does not satisfy the max heap property, i.e., parent node
44 is less than the child 77. So, we will swap these two values as shown below:

Step 4: The next element is 11. The node 11 is added to the left of 33 as shown below:

4
ADSA UNIT-II JSR
Step 5: The next element is 55. To make it a complete binary tree, we will add the node 55 to the
right of 33 as shown below:

As we can observe in the above figure that it does not satisfy the property of the max heap because
33<55, so we will swap these two values as shown below:

Step 6: The next element is 88. The left subtree is completed so we will add 88 to the left of 44 as
shown below:

As we can observe in the above figure that it does not satisfy the property of the max heap because
44<88, so we will swap these two values as shown below:
Again, it is violating the max heap property because 88>77 so we will swap these two values as
shown below:
Step 7: The next element is 66. To make a complete binary tree, we will add the 66 element to the
right side of 77 as shown below:
5
ADSA UNIT-II JSR
Max Heap Deletion Algorithm

Let us derive an algorithm to delete from max heap. Deletion in Max (or Min) Heap always happens at the root
to remove the Maximum (or minimum) value.
Step 1 − Remove root node.
Step 2 − Move the last element of the last level to root.
Step 3 − Compare the value of this child node with its parent.
Step 4 − If the value of the parent is less than the child, then swap them.
Step 5 − Repeat step 3 & 4 until Heap property holds.

Array representation:
88, 55, 77, 11, 33, 44, 66

Tree view:
88
/ \
55 77
/ \ /\
11 33 44 66

Step-by-Step Deletion of Max Element (88)


Step 1: Replace 88 with last element (66)
Array: [66, 55, 77, 11, 33, 44]
Tree:
66
/ \
55 77
/ \ /
11 33 44

Step 2: Heapify Down (Bubble Down)


● Root: 66
● Children: 55 (left), 77 (right)
● 77 > 66 → Swap 66 and 77
Array: [77, 55, 66, 11, 33, 44]

6
ADSA UNIT-II JSR
Tree:
77
/ \
55 66
/ \ /
11 33 44
● Now, 66 is at index 2 (value = 66)

● Children: 44 (left), no right child

● 66 > 44 → No swap needed


Final Heap After Deletion:
Array:
[77, 55, 66, 11, 33, 44]
Tree:
77
/ \
55 66
/ \ /
11 33 44

Min-Heap Data Structure


A Min-Heap is defined as a type of Heap Data Structure in which each node is smaller than or equal to
its children .
The heap data structure is a type of binary tree that is commonly used in computer science for various
purposes, including sorting, searching, and organizing data.

Min Heap Insertion Algorithm


Steps:
1. Insert element at the end (bottom-most, right-most position).
2. Compare with parent:
3. If the new element < parent, swap.
4. Repeat step 2 until:
5. The root is reached, or
6. The parent is smaller (heap property holds)

Min Heap : Insertion & Deletion)

This walkthrough shows all steps (placement + every swap) while building a Min Heap by inserting the keys
[20, 15, 30, 10, 8, 25], then

Start
1. (empty)
2. Insert 20
20
3. Insert 15
20
/
15
Compare & swap with parent (15 < 20)

7
ADSA UNIT-II JSR
Array (after swap):** `[15, 20]`
15
/
20
4. Insert 30
15
/ \
20 30
5. Insert 10
15
/ \
20 30
/
10
Step 1 – swap with parent (10 < 20)
Array: [15, 10, 30, 20]
15
/ \
10 30
/
20
Step 2 – swap with parent (10 < 15)
Array (final after insert 10):
[10, 15, 30, 20]
10
/ \
15 30
/
20
6. Insert 8
10
/ \
15 30
/ \
20 8
Step 1 – swap with parent (8 < 15)
Array:[10, 8, 30, 20, 15]
10
/ \
8 30
/ \
20 15
Step 2 – swap with parent (8 < 10)
Array (final after insert 8):[8, 10, 30, 20, 15]
8
/ \
10 30
/ \
20 15
[Link] 25

8
ADSA UNIT-II JSR
8
/ \
10 30
/ \ /
20 15 25
Step 1 – swap with parent (25 < 30)
Array (final after insert 25):[8, 10, 25, 20, 15, 30]
8
/ \
10 25
/ \ /
20 15 30
Final Min Heap after all insertions

Array: 8, 10, 25, 20, 15, 30


8
/ \
10 25
/ \ /
20 15 30

Min Heap Deletion Algorithm

Steps:
1. Remove root element (minimum element).

2. Replace root with the last element in the heap.

3. Reduce heap size by 1.

4. Heapify-down (bubble down) from the root:


○ Compare node with its children.
○ If node > smaller child → swap with smaller child
○ Repeat until heap property is restored (or leaf reached)
We now delete the minimum (root = 8).
Step 0 – Replace root with last element (30) and remove last slot
Array (after replacement, before heapify-down):[30, 10, 25, 20, 15]
30
/ \
10 25
/ \
20 15

Step 1 – Compare 30 with children (10, 25) → swap with smaller (10)
Array:[10, 30, 25, 20, 15]
10
/ \
30 25
/ \
20 15

9
ADSA UNIT-II JSR
Step 2 – Compare 30 with children (20, 15) → swap with smaller (15)
Array (final after delete-min): [10, 15, 25, 20, 30]
10
/ \
15 25
/ \
20 30
Applications

Heap Data Structure is generally taught with Heapsort. Heap sort algorithm has limited uses because Quicksort is
better in practice. Nevertheless, the Heap data structure itself is enormously used.

1. Priority Queues: Heaps are commonly used to implement priority queues, where elements with higher
priority are extracted first. This is useful in many applications such as scheduling tasks, handling
interruptions, and processing events.
2. Sorting Algorithms: Heapsort, a comparison-based sorting algorithm, is implemented using the Heap
data structure. It has a time complexity of O(n log n), making it efficient for large datasets.
3. Graph algorithms: Heaps are used in graph algorithms such as Prim's Algorithm, Dijkstra's algorithm., and
the A* search algorithm.
4. Lossless Compression: Heaps are used in data compression algorithms such as Huffman coding, which
uses a priority queue implemented as a min-heap to build a Huffman tree.
5. Medical Applications: In medical applications, heaps are used to store and manage patient information based
on priority, such as vital signs, treatments, and test results.
6. Load balancing: Heaps are used in load balancing algorithms to distribute tasks or requests to servers,
by processing elements with the lowest load first.
7. Order statistics: The Heap data structure can be used to efficiently find the kth smallest (or largest) element in
an array. See method 4 and 6 of this post for details.
8. Resource allocation: Heaps can be used to efficiently allocate resources in a system, such as memory blocks
or CPU time, by assigning a priority to each resource and processing requests in order of priority.
9. Job scheduling: The heap data structure is used in job scheduling algorithms, where tasks are scheduled based
on their priority or deadline. The heap data structure allows efficient access to the highest-priority task, making
it a useful data structure for job scheduling applications

GRAPHS
Introduction to Graphs
Graph is a non linear data structure; A map is a well-known example of a graph. In a map various connections are
made between the cities. The cities are connected via roads, railway lines and aerial network. We can assume that
the graph is the interconnection of cities by roads. Euler used graph theory to solve Seven Bridges of Königsberg
problem. Is there a possible way to traverse every bridge exactly once – Euler Tour

Figure: Section of the river Pregal in Koenigsberg and Euler's graph.


Defining the degree of a vertex to be the number of edges incident to it, Euler showed that there is a walk starting
at any vertex, going through each edge exactly once and terminating at the start vertex iff the degree of each, vertex
is even. A walk which does this is called Eulerian. There is no Eulerian walk for the Koenigsberg bridge problem

10
ADSA UNIT-II JSR
as all four vertices are of odd degree.
A graph contains a set of points known as nodes (or vertices) and set of links known as edges (or Arcs) which
connects the vertices.
Definition:
A graph is defined as Graph is a collection of vertices and arcs which connects vertices in the graph. A graph G is
represented as G = ( V , E ), where V is set of vertices and E is set of edges.
Example: graph G can be defined as G = ( V , E ) Where V = {A,B,C,D,E} and
E = {(A,B),(A,C)(A,D),(B,D),(C,D),(B,E),(E,D)}. This is a graph with 5 vertices and 6 edges.

Graph Terminology
[Link] : An individual data element of a graph is called as Vertex. Vertex is also known as node. In
above example graph, A, B, C, D & E are known as vertices.
[Link] : An edge is a connecting link between two vertices. Edge is also known as Arc. An edge is represented
as (starting Vertex, ending Vertex).
In above graph, the link between vertices A and B is represented as (A,B).
Edges are three types:

[Link] Edge - An undirected edge is a bidirectional edge. If there is an undirected edge between vertices A
and B then edge (A , B) is equal to edge (B , A).

[Link] Edge - A directed edge is a unidirectional edge. If there is a directed edge between vertices A and B
then edge (A , B) is not equal to edge (B , A).

[Link] Edge - A weighted edge is an edge with cost on it.


Types of Graphs
[Link] Graph
A graph with only undirected edges is said to be undirected graph.

[Link] Graph
A graph with only directed edges is said to be directed graph.

11
ADSA UNIT-II JSR
[Link] Graph
A graph in which any V node is adjacent to all other nodes present in the graph is known as a complete graph. An
undirected graph contains the edges that are equal to edges = n(n-1)/2 where n is the number of vertices present in
the graph. The following figure shows a complete graph.

[Link] Graph
Regular graph is the graph in which nodes are adjacent to each other, i.e., each node is accessible from
any other node.

[Link] Graph
A graph having cycle is called cycle graph. In this case the first and last nodes are the same. A closed simple path
is a cycle.

12
ADSA UNIT-II JSR
[Link] Graph
A graph without cycle is called acyclic graphs.

7. Weighted Graph
A graph is said to be weighted if there are some non negative value assigned to each edges of the graph. The
value is equal to the length between two vertices. Weighted graph is also called a network.

Outgoing Edge
A directed edge is said to be outgoing edge on its orign vertex.
Incoming Edge
A directed edge is said to be incoming edge on its destination vertex.
Degree
Total number of edges connected to a vertex is said to be degree of that vertex.
Indegree
Total number of incoming edges connected to a vertex is said to be indegree of that vertex.
Outdegree
Total number of outgoing edges connected to a vertex is said to be outdegree of that vertex.
Parallel edges or Multiple edges
If there are two undirected edges to have the same end vertices, and for two directed edges to have the same
origin and the same destination. Such edges are called parallel edges or multiple edges.
Self-loop
An edge (undirected or directed) is a self-loop if its two endpoints coincide.
Simple Graph
A graph is said to be simple if there are no parallel and self-loop edges.
Adjacent nodes
When there is an edge from one node to another then these nodes are called adjacent nodes.
Incidence
In an undirected graph the edge between v1 and v2 is incident on node v1 and v2.

13
ADSA UNIT-II JSR
Walk
A walk is defined as a finite alternating sequence of vertices and edges, beginning and ending with vertices, such
that each edge is incident with the vertices preceding and following it.
Closed walk
A walk which is to begin and end at the same vertex is called close walk. Otherwise it is an open walk.

If e1,e2,e3,and e4 be the edges of pair of vertices (v1,v2),(v2,v4),(v4,v3) and (v3,v1) respectively ,then v1 e1 v2
e2 v4 e3 v3 e4 v1 be its closed walk or circuit.
Path
A open walk in which no vertex appears more than once is called a path.

If e1 and e2 be the two edges between the pair of vertices (v1,v3) and (v1,v2) respectively, then v3 e1 v1 e2 v2 be
its path.
Length of a path
The number of edges in a path is called the length of that path. In the following, the length of the path is 3.

Fig: An open walk Graph


Circuit
A closed walk in which no vertex (except the initial and the final vertex) appears more than once is called a
circuit.
A circuit having three vertices and three edges.

14
ADSA UNIT-II JSR
Sub Graph
A graph S is said to be a sub graph of a graph G if all the vertices and all the edges of S are in G, and each edge
of S has the same end vertices in S as in G. A subgraph of G is a graph G’ such that V(G’) ⊆ V(G) and E(G’) ⊆
E(G)

Connected Graph
A graph G is said to be connected if there is at least one path between every pair of vertices in G. Otherwise,G is
disconnected.

A connected graph G A disconnected graph G


This graph is disconnected because the vertex v1 is not connected with the other vertices of the graph.
Degree
In an undirected graph, the number of edges connected to a node is called the degree of that node or the degree of
a node is the number of edges incident on it.
In the above graph, degree of vertex v1 is 1, degree of vertex v2 is 3, degree of v3 and v4 is 2 in a connected
graph.
Indegree
The indegree of a node is the number of edges connecting to that node or in other words edges incident to it.

15
ADSA UNIT-II JSR
In the above graph,the indegree of vertices v1, v3 is 2, indegree of vertices v2, v5 is 1 and indegree of v4 is zero.

Outdegree
The outdegree of a node (or vertex) is the number of edges going outside from that node or in other words the
ADT of Graph:
Structure Graph is
objects: a nonempty set of vertices and a set of undirected edges, where each edge is a pair of vertices
functions: for all graph ∈ Graph, v, v1 and v2 ∈ Vertices
Graph Create()::=return an empty graph
Graph InsertVertex(graph, v)::= return a graph with v inserted. v has no edge.
Graph InsertEdge(graph, v1,v2)::= return a graph with new edge between v1 and v2
Graph DeleteVertex(graph, v)::= return a graph in which v and all edges incident to it are removed
Graph DeleteEdge(graph, v1, v2)::=return a graph in which the edge (v1, v2) is removed
Boolean IsEmpty(graph)::= if (graph==empty graph) return TRUE else return FALSE
List Adjacent(graph,v)::= return a list of all vertices that are adjacent to v
Graph Representations
Graph data structure is represented using following representations
1. Adjacency Matrix
2. Adjacency List
[Link] Matrix
In this representation, graph can be represented using a matrix of size total number of vertices by total number of
vertices; means if a graph with 4 vertices can be represented using a matrix of 4X4 size.
In this matrix, rows and columns both represent vertices.
This matrix is filled with either 1 or 0. Here, 1 represents there is an edge from row vertex to column vertex and 0
represents there is no edge from row vertex to column vertex.
Adjacency Matrix : let G = (V, E) with n vertices, n ≥ 1. The adjacency matrix of G is a 2-dimensional n × n
matrix, A, A(i, j) = 1 iff (vi, vj) ∈E(G) (〈vi, vj〉 for a diagraph), A(i, j) = 0 otherwise.
example : for undirected graph

+n

16
ADSA UNIT-II JSR
The adjacency matrix for an undirected graph is symmetric; the adjacency matrix for a digraph need not be
symmetric.
Merits of Adjacency Matrix:
From the adjacency matrix, to determine the connection of vertices is easy
n−1
The degree of a vertex is ∑ adj _ mat[i][ j]
j =0
For a digraph, the row sum is the out_degree, while the column sum is the in_degree
n−1 n−1
ind (vi) = A[ j, i] outd(vi) = A[i, j]
j =0 j =0

The space needed to represent a graph using adjacency matrix is n2 bits. To identify the edges in a graph,
adjacency matrices will require at least O(n2) time.
2. Adjacency List
In this representation, every vertex of graph contains list of its adjacent vertices. The n rows of the adjacency
matrix are represented as n chains. The nodes in chain I represent the vertices that are adjacent to vertex i.
It can be represented in two forms. In one form, array is used to store n vertices and chain is used to store its
adjacencies. Example:
So that we can access the adjacency list for any vertex in O(1) time. Adjlist[i] is a pointer to to first node in the

adjacency list for vertex i. Structure is


Another type of representation is given [Link]: consider the following directed graph representation
implemented using linked list

17
ADSA UNIT-II JSR
This r0130epresentation can also be implemented using array

ELEMENTARY GRAPH OPERATIONS


Given a graph G = (V E) and a vertex v in V(G) we wish to visit all vertices in G that are reachable from v (i.e.,
all vertices that are connected to v). We shall look at two ways of doing this: depth-first search and breadth-first
search. Although these methods work on both directed and undirected graphs the following discussion assumes
that the graphs are undirected.
Depth-First Search
● Begin the search by visiting the start vertex v
o If v has an unvisited neighbor, traverse it recursively
o Otherwise, backtrack
● Time complexity
o Adjacency list: O(|E|)
o Adjacency matrix: O(|V|2)

We begin by visiting the start vertex v. Next an unvisited vertex w adjacent to v is selected, and a depth-first
search from w is initiated. When a vertex u is reached such that all its adjacent vertices have been visited, we back
up to the last vertex visited that has an unvisited vertex w adjacent to it and initiate a depth-first search from w.
The search terminates when no unvisited vertex can be reached from any of the visited vertices.
DFS traversal of a graph, produces a spanning tree as final result. Spanning Tree is a graph without any loops.
We use Stack data structure with maximum size of total number of vertices in the graph to implement DFS
traversal of a graph.

We use the following steps to implement DFS traversal...

Step 1: Define a Stack of size total number of vertices in the graph.


Step 2: Select any vertex as starting point for traversal. Visit that vertex and push it on to the Stack.
Step 3: Visit any one of the adjacent vertex of the verex which is at top of the stack which is not visited and push
it on to the stack.
Step 4: Repeat step 3 until there are no new vertex to be visit from the vertex on top of the stack.
Step 5: When there is no new vertex to be visit then use back tracking and pop one vertex from the stack.
Step 6: Repeat steps 3, 4 and 5 until stack becomes Empty.
Step 7: When stack becomes Empty, then produce final spanning tree by removing unused edges from the graph
This function is best described recursively as in Program.
#define FALSE 0
#define TRUE 1
int visited[MAX_VERTICES];
void dfs(int v)
{

18
ADSA UNIT-II JSR
node_pointer w;
visited[v]= TRUE;
printf(“%d”, v);
for (w=graph[v]; w; w=w->link)
if (!visited[w->vertex])
dfs(w->vertex);
}
Consider the graph G of Figure 6.16(a), which is represented by its adjacency lists as in Figure 6.16(b). If a depth-
first search is initiated from vertex 0 then the vertices of G are visited in the following order: 0 1 3 7 4 5 2 6. Since
DFS(O) visits all vertices that can be reached from 0 the vertices visited, together with all edges in G incident to
these vertices form a connected component of G.

Figure: Graph and its adjacency list representation, DFS spanning tree
Analysis or DFS:
When G is represented by its adjacency lists, the vertices w adjacent to v can be determined by following a chain
of links. Since DFS examines each node in the adjacency lists at most once and there are 2e list nodes the time to
complete the search is O(e). If G is represented by its adjacency matrix then the time to determine all vertices
adjacent to v is O(n). Since at most n vertices are visited the total time is O(n2).

19
ADSA UNIT-II JSR
20
ADSA UNIT-II JSR
Breadth-First Search
In a breadth-first search, we begin by visiting the start vertex v. Next all unvisited vertices adjacent to v are visited.
Unvisited vertices adjacent to these newly visited vertices are then visited and so on. Algorithm BFS (Program 6.2)
gives the details.
typedef struct queue *queue_pointer;
typedef struct queue {
int vertex;

queue_pointer link;
};
void addq(queue_pointer *,
queue_pointer *, int);
int deleteq(queue_pointer *);
void bfs(int v)
{
node_pointer w;
queue_pointer front, rear;
front = rear = NULL;
printf(“%d”, v);
visited[v] = TRUE;
addq(&front, &rear, v);
while (front) {
v= deleteq(&front);
for (w=graph[v]; w; w=w->link)
if (!visited[w->vertex]) {
printf(“%d”, w->vertex);
addq(&front, &rear, w->vertex);
visited[w->vertex] = TRUE;
}
}
}
Steps:
BFS traversal of a graph, produces a spanning tree as final result. Spanning Tree is a graph without any loops. We
use Queue data structure with maximum size of total number of vertices in the graph to implement BFS traversal
of a graph.

We use the following steps to implement BFS traversal...


Step 1: Define a Queue of size total number of vertices in the graph.
Step 2: Select any vertex as starting point for traversal. Visit that vertex and insert it into the Queue.
Step 3: Visit all the adjacent vertices of the vertex which is at front of the Queue which is not visited and insert
them into the Queue.
Step 4: When there is no new vertex to be visit from the vertex at front of the Queue then delete that vertex from
the Queue.
Step 5: Repeat step 3 and 4 until queue becomes empty.
Step 6: When queue becomes Empty, then produce final spanning tree by removing unused edges from the graph

21
ADSA UNIT-II JSR
22
ADSA UNIT-II JSR
Difference between DFS and BFS:

Parameters BFS DFS

BFS stands for Breadth First


Stands for Search. DFS stands for Depth First Search.

BFS(Breadth First Search) uses


Data Queue data structure for finding DFS(Depth First Search) uses Stack data
Structure the shortest path. structure.

DFS is also a traversal approach in which


BFS is a traversal approach in the traverse begins at the root node and
which we first walk through all proceeds through the nodes as far as
nodes on the same level before possible until we reach the node with no
Definition moving on to the next level. unvisited nearby nodes.

Conceptual BFS builds the tree level by


Difference level. DFS builds the tree sub-tree by sub-tree.

Approach It works on the concept of FIFO It works on the concept of LIFO (Last In
used (First In First Out). First Out).

BFS is more suitable for


searching vertices closer to the DFS is more suitable when there are
Suitable for given source. solutions away from source.

BFS is used in various DFS is used in various applications such


applications such as bipartite as acyclic graphs and finding strongly
Applications graphs, shortest paths, etc. connected components etc.

Connected and Biconnected Components


CONNECTED COMPONENT
A connected component of graph G is said to be a maximally connected induced subgraph. This
means that it is a connected induced subgraph which is not a proper subgraph by itself of any other
connected subgraph of G.

23
ADSA UNIT-II JSR
For example in the graph shown below, {0, 1, 2} form a connected component and {3, 4} form
another connected component.
Biconnected Components

Example-1

Example-2

Example-3

Example-4

24
ADSA UNIT-II JSR
Example-5
Example-6

In above graph, following are the biconnected components:


● 4–2 3–4 3–1 2–3 1–2
● 8–9
● 8–5 7–8 5–7
● 6–0 5–6 1–5 0–1
● 10–11

25
ADSA UNIT-II JSR
Divide And Conquer(DAC)

Divide and Conquer is an algorithmic pattern. In algorithmic methods, the design is to take a dispute
on a huge input, break the input into minor pieces, decide the problem on each of the small pieces,
and then merge the piecewise solutions into a global solution. This mechanism of solving the
problem is called the Divide & Conquer Strategy.
Divide and Conquer algorithm consists of a dispute using the following three steps.
Divide the original problem into a set of subproblems. Break down the original problem into smaller
subproblems.
● Each subproblem should represent a part of the overall problem.
● The goal is to divide the problem until no further division is possible.
Conquer: Solve every subproblem individually, recursively. Solve each of the smaller subproblems
individually.
● If a subproblem is small enough (often referred to as the “base case”), we solve it directly
without further recursion.
● The goal is to find solutions for these subproblems independently.
Combine: Put together the solutions of the subproblems to get the solution to the whole problem.
Combine the sub-problems to get the final solution of the whole problem.
● Once the smaller subproblems are solved, we recursively combine their solutions to get the
solution of larger problem.
● The goal is to formulate a solution for the original problem by merging the results from the
subproblems.

Generally, we can follow the divide-and-conquer approach in a three-step process. Examples:


The specific computer algorithms are based on the Divide & Conquer approach: Applications
of Divide and Conquer Approach:
Following algorithms are based on the concept of the Divide and Conquer Technique:
1. Binary Search: The binary search algorithm is a searching algorithm, which is also called a
half-interval search or logarithmic search. It works by comparing the target value with the
middle element existing in a sorted array. After making the comparison, if the value differs,
then the half that cannot contain the target will eventually eliminate, followed by continuing
the search on the other half. We will again consider the middle element and compare it with
the target value. The process keeps on repeating until the target value is met. If we found the
other half to be empty after ending the search, then it can be concluded that the target is not
present in the array.
2. Quicksort: It is the most efficient sorting algorithm, which is also known as partition-

26
ADSA UNIT-II JSR
exchange sort. It starts by selecting a pivot value from an array followed by dividing the rest
of the array elements into two sub-arrays. The partition is made by comparing each of the
elements with the pivot value. It compares whether the element holds a greater value or lesser
value than the pivot and then sort the arrays recursively.
3. Merge Sort: It is a sorting algorithm that sorts an array by making comparisons. It starts by
dividing an array into sub-array and then recursively sorts each of them. After the sorting is
done, it merges them back.

4. 4. Closest Pair of Points: It is a problem of computational geometry. This algorithm


emphasizes finding out the closest pair of points in a metric space, given n points, such that
the distance between the pair of points should be minimal.
5. Strassen's Algorithm: It is an algorithm for matrix multiplication, which is named after
Volker Strassen. It has proven to be much faster than the traditional algorithm when works
on large matrices.
6. Cooley-Tukey Fast Fourier Transform (FFT) algorithm: The Fast Fourier Transform
algorithm is named after J. W. Cooley and John Turkey. It follows the Divide and Conquer
Approach and imposes a complexity of O(nlogn).
7. Karatsuba algorithm for fast multiplication: It is one of the fastest multiplication
algorithms of the traditional time, invented by Anatoly Karatsuba in late 1960 and got
published in 1962. It multiplies two n-digit numbers in such a way by reducing it to at most
single-digit.
Advantages of Divide and Conquer
o Divide and Conquer tend to successfully solve one of the biggest problems, such as the
Tower of Hanoi, a mathematical puzzle. It is challenging to solve complicated problems for
which you have no basic idea, but with the help of the divide and conquer approach, it has
lessened the effort as it works on dividing the main problem into two halves and then solve
them recursively. This algorithm is much faster than other algorithms.
o It efficiently uses cache memory without occupying much space because it solves simple
subproblems within the cache memory instead of accessing the slower main memory.
o It is more proficient than that of its counterpart Brute Force technique.
o Since these algorithms inhibit parallelism, it does not involve any modification and is
handled by systems incorporating parallel processing.
Disadvantages of Divide and Conquer
o Since most of its algorithms are designed by incorporating recursion, so it necessitates high
memory management.
o An explicit stack may overuse the space.
o It may even crash the system if the recursion is performed rigorously greater than the stack
present in the CPU.
Characteristics of Divide and Conquer Algorithm:
Divide and Conquer Algorithm involves breaking down a problem into smaller, more manageable
parts, solving each part individually, and then combining the solutions to solve the original problem.
The characteristics of Divide and Conquer Algorithm are:
● Dividing the Problem: The first step is to break the problem into smaller, more manageable
subproblems. This division can be done recursively until the subproblems become simple
enough to solve directly.
● Independence of Subproblems: Each subproblem should be independent of the others,
meaning that solving one subproblem does not depend on the solution of another. This allows
for parallel processing or concurrent execution of subproblems, which can lead to efficiency
gains.

27
ADSA UNIT-II JSR
● Conquering Each Subproblem: Once divided, the subproblems are solved individually.
This may involve applying the same divide and conquer approach recursively until the
subproblems become simple enough to solve directly, or it may involve applying a different
algorithm or technique.
● Combining Solutions: After solving the subproblems, their solutions are combined to obtain
the solution to the original problem. This combination step should be relatively efficient and
straightforward, as the solutions to the subproblems should be designed to fit together
seamlessly.

Quick Sort Algorithm


It is an algorithm of Divide & Conquer type.

Divide: Rearrange the elements and split arrays into two sub-arrays and an element in between
search that each element in left sub array is less than or equal to the average element and each
element in the right sub- array is larger than the middle element.
Conquer: Recursively, sort two sub arrays.
Combine: Combine the already sorted array.
Algorithm:
QuickSort function
void quicksort(int a[], int first, int last) {
int i, j, pivot, temp;

if (first < last) {


pivot = first;
i = first;
j = last;

while (i < j) {
while (a[i] <= a[pivot] && i < last)
i++;
while (a[j] > a[pivot])
j--;

if (i < j) {
temp = a[i];
a[i] = a[j];
a[j] = temp;
}
}

temp = a[pivot];
a[pivot] = a[j];
a[j] = temp;

quicksort(a, first, j - 1);


quicksort(a, j + 1, last);
}
}

28
ADSA UNIT-II JSR
Figure: shows the execution trace partition algorithm

Example of Quick Sort:


1. 44 33 11 55 77 90 40 60 99 22 88
Let 44 be the Pivot element and scanning done from right to left
Comparing 44 to the right-side elements, and if right-side elements are smaller than 44, then swap
it. As 22 is smaller than 44 so swap them.
22 33 11 55 77 90 40 60 99 44 88
Now comparing 44 to the left side element and the element must be greater than 44 then swap
them. As 55 are greater than 44 so swap them.
22 33 11 44 77 90 40 60 99 55 88
Recursively, repeating steps 1 & steps 2 until we get two lists one left from pivot element 44 &
one right from pivot element.
22 33 11 40 77 90 44 60 99 55 88
Swap with 77:
22 33 11 40 44 90 77 60 99 55 88
Now, the element on the right side and left side are greater than and smaller than 44 respectively.
Now we get two sorted lists:

And these sublists are sorted under the same process as above done. These two sorted sublists side by side

29
ADSA UNIT-II JSR
Merging Sublists:

Example for QuickSort


Initial Array
We start with an unsorted array [10, 70, 40, 90, 50]. In Quick Sort, we select a 'pivot' element from
the array, which we will use to divide the array into two parts. Elements less than the pivot go to the
left, and elements greater than the pivot go to the right.

Step 1: Choosing a Pivot:


The first step in Quick Sort is to choose a pivot element. In this case we select 50 as our pivot
element. The pivot can be chosen in different ways - either the first element, the last element, a
random element, or the median among a few elements from the array.

30
ADSA UNIT-II JSR
Step 2: Partitioning:
We reorder the array so that all elements less than the pivot come before it and all elements greater
than the pivot go after it. In the images, we can see that 70 and 40 are in place because they are less
than the pivot, and we are moving 50 to the position just after the pivot because it is the next smallest
element after the pivot.

Step 3: Sorting the Subarrays:


After partitioning, we have two subarrays around the pivot. We then apply the same Quick Sort
process recursively to these subarrays. The subarray [10, 40] is already sorted because it contains all
elements less than the pivot and only has two elements. For the subarray [90, 70, 50], we need to
select a new pivot, partition this subarray, and continue the process.

Step 4: Final Sorted Array:


After sorting both subarrays and placing the pivot in its correct position, we combine the elements
to form the sorted array [10, 40, 50, 70, 90]. Quick Sort doesn't literally combine arrays like Merge
Sort; instead, the elements are sorted in place.

The process of selecting a pivot and partitioning the array is repeated recursively on the subarrays,
and this recursion continues until all the subarrays are sorted.
Dry Run

31
ADSA UNIT-II JSR
Step Array i j Pivot Element Subarrays

1 [10, 70, 40, 90, 50] 0 4 50 Full Array

2 [10, 40, 90, 70, 50] 1 3 50 Full Array

3 [10, 40, 50, 70, 90] 2 2 50 [10, 40] (50) [70, 90]

4 [10, 40] (50) [70, 90] - - 50 -

5 [10, 40] 0 1 40 [10] (40)

6 [10, 40] 1 0 40 -

7 [10, 40) - - 40 -

8 [90, 70] 0 1 90 (90)[70]

9 [70, 90] 1 0 90 -

10 [70, 90] - - 90 -

Pseudo Code
Before we code, let's outline our strategy with pseudo-code:
function quickSort(array, low, high) if
low < high
pivotIndex = partition(array, low, high)
quickSort(array, low, pivotIndex - 1) // Before pivot
quickSort(array, pivotIndex + 1, high) // After pivot
Time Complexity
The speed of an algorithm is often captured by its time complexity, which expresses the
relationship between the input size and the time required to complete the task.
● Best and Average Case: In the best and average scenarios, where the pivot element is
chosen to divide the array into proportionate halves consistently, the time complexity is O(n
log n). This is because the array is halved with each pass (log n), and this operation is
performed for each n element.
● Worst Case: The worst case occurs when the pivot selection results in one partition
containing all but one of the elements, leading to O(n^2). This situation commonly happens
when the pivot is the smallest or largest element in the array, as it would be in an already
sorted or reverse-sorted array.

32
ADSA UNIT-II JSR
Space Complexity
Space complexity refers to the amount of memory space required by an algorithm in its life cycle.
● In-Place Sorting: Quick Sort is an in-place sorting algorithm. However, it requires space
for the stack frames of the recursive calls. In the best case, this leads to a space complexity
of O(log n), representing the height of the balanced partition tree.
● Worst Case: In the worst case, with unbalanced partitions, the space complexity can degrade
to O(n), where n is the depth of the recursive call stack, one for each array element
Advantages of Quick Sort
● Efficient average-case complexity: Quick Sort has an average-case time complexity of O(n
log n), making it efficient for large datasets.
● In-place sorting: It doesn't require additional memory beyond the existing array, reducing
space complexity.
● Parallelizable: The algorithm can be easily parallelized across multiple processors or cores,
further improving its speed.
● Cache-friendly: Quick Sort works well with modern processor caches due to its locality of
reference, enhancing performance.
● No recursion depth limit: Unlike some other recursive algorithms, Quick Sort doesn't have
a bound on its recursion depth, making it suitable for very large datasets.
Disadvantages of Quick Sort
● Worst-case complexity: In the worst-case scenario, Quick Sort's time complexity can
deteriorate to O(n^2), which can occur when the chosen pivot elements are poorly selected.
● Unstable sorting: Quick Sort does not preserve the relative order of equal elements, which
might be important for specific applications.
● High constant factor: Compared to other sorting algorithms like Merge Sort, Quick Sort
might have a higher constant factor in its time complexity, leading to slightly slower
performance in some cases.
● Not ideal for small datasets: For small datasets, the overhead of partitioning and recursion
might outweigh the benefits, making simpler algorithms like Bubble Sort or Insertion Sort
more efficient.

Merge Sort
Merge sort is yet another sorting algorithm that falls under the category of Divide and Conquer
technique. It is one of the best sorting techniques that successfully build a recursive algorithm.

Divide and Conquer Strategy

In this technique, we segment a problem into two halves and solve them individually. After finding
the solution of each half, we merge them back to represent the solution of the main problem.
Suppose we have an array A, such that our main concern will be to sort the subsection, which starts
at index p and ends at index r, represented by A[p..r].

Divide
If assumed q to be the central point somewhere in between p and r, then we will fragment the
subarray A[p..r] into two arrays A[p..q] and A[q+1, r].
Conquer
After splitting the arrays into two halves, the next step is to conquer. In this step, we individually
sort both of the subarrays A[p..q] and A[q+1, r]. In case if we did not reach the base situation, then
we again follow the same procedure, i.e., we further segment these subarrays followed by sorting
them separately.

33
ADSA UNIT-II JSR
Combine
As when the base step is acquired by the conquer step, we successfully get our sorted subarrays
A[p..q] and A[q+1, r], after which we merge them back to form a new sorted array [p..r].
Merge Sort algorithm
The MergeSort function keeps on splitting an array into two halves until a condition is met where
we try to perform MergeSort on a subarray of size 1, i.e., p == r.
Algorithm
Step 1: Create two pointers, one for each sorted half.
Step 2: Initialize an empty temporary array to hold the merged result.
Step 3: Compare the elements at the pointers of the two halves:
Copy the smaller element into the temporary array.
Move the pointer of the sublist with the smaller element forward.
Step 4: Repeat step 3 until one of the sublists is empty.
Step 5: Copy the remaining elements from the non-empty sublist to the temporary array.
Step 6: Copy the elements back from the temporary array to the original list. Algorithm
Algorithm:

Merge function
void merge(int arr[], int low, int mid, int high) {
int i = low, j = mid + 1, k = low;

while (i <= mid && j <= high) {


if (arr[i] <= arr[j])
temp[k++] = arr[i++];
else
temp[k++] = arr[j++];
}

while (i <= mid)


temp[k++] = arr[i++];

while (j <= high)


temp[k++] = arr[j++];

for (i = low; i <= high; i++)


arr[i] = temp[i];
}

// Merge Sort function


void mergeSort(int arr[], int low, int high) {
if (low < high) {
int mid = (low + high) / 2;
mergeSort(arr, low, mid); // sort left half
mergeSort(arr, mid + 1, high); // sort right half
merge(arr, low, mid, high); // merge two halves
}
}

Working of Merge sort Algorithm

34
ADSA UNIT-II JSR
Now, let's see the working of merge sort Algorithm.
To understand the working of the merge sort algorithm, let's take an unsorted array. It will be
easier to understand the merge sort via an example.
Let the elements of array are -

According to the merge sort, first divide the given array into two equal halves. Merge sort keeps
dividing the list into equal parts until it cannot be further divided.
As there are eight elements in the given array, so it is divided into two arrays of size 4.

Now, again divide these two arrays into halves. As they are of size 4, so divide them into new arrays
of size 2.

Now, again divide these arrays to get the atomic value that cannot be further divided.

Now, combine them in the same manner they were broken.


In combining, first compare the element of each array and then combine them into another array in
sorted order.
So, first compare 12 and 31, both are in sorted positions. Then compare 25 and 8, and in the list of
two values, put 8 first followed by 25. Then compare 32 and 17, sort them and put 17 first followed
by 32. After that, compare 40 and 42, and place them sequentially.

In the next iteration of combining, now compare the arrays with two data values and merge them
into an array of found values in sorted order.

35
ADSA UNIT-II JSR
Now, there is a final merging of the arrays. After the final merging of above arrays, the array will
look like -

Now, the array is completely sorted.


Merge( ) Function Explained Step-By-Step
Consider the following example of an unsorted array, which we are going to sort with the help of
the Merge Sort algorithm.
A= (36,25,40,2,7,80,15)
Step1: The merge sort algorithm iteratively divides an array into equal halves until we achieve an
atomic value. In case if there are an odd number of elements in an array, then one of the halves will
have more elements than the other half.
Step2: After dividing an array into two subarrays, we will notice that it did not hamper the order of
elements as they were in the original array. After now, we will further divide these two arrays into
other halves.
Step3: Again, we will divide these arrays until we achieve an atomic value, i.e., a value that cannot
be further divided.
Step4: Next, we will merge them back in the same way as they were broken down.
Step5: For each list, we will first compare the element and then combine them to form a new sorted
list.
Step6: In the next iteration, we will compare the lists of two data values and merge them back into
a list of found data values, all placed in a sorted manner

36
ADSA UNIT-II JSR
Hence the array is sorted.
Advantages of Merge Sort:
● Stability : Merge sort is a stable sorting algorithm, which means it maintains the relative
order of equal elements in the input array.
● Guaranteed worst-case performance: Merge sort has a worst-case time complexity of O(N
logN) , which means it performs well even on large datasets.
● Simple to implement: The divide-and-conquer approach is straightforward.
● Naturally Parallel : We independently merge subarrays that makes it suitable for parallel
processing.
Disadvantages of Merge Sort:
● Space complexity: Merge sort requires additional memory to store the merged sub- arrays
during the sorting process.
● Not in-place: Merge sort is not an in-place sorting algorithm, which means it requires
additional memory to store the sorted data. This can be a disadvantage in applications where
memory usage is a concern.
● Slower than QuickSort in general. QuickSort is more cache friendly because it works in-
place.
Complexity Analysis of Merge Sort:
● Time Complexity:
o Best Case: O(n log n), When the array is already sorted or nearly sorted.

o Average Case: O(n log n), When the array is randomly ordered.
o Worst Case: O(n log n), When the array is sorted in reverse order.
● Auxiliary Space: O(n), Additional space is required for the temporary array used during
merging.
Applications of Merge Sort:
● Sorting large datasets
● External sorting (when the dataset is too large to fit in memory)
● Inversion counting
● Merge Sort and its variations are used in library methods of programming languages. For
example its variation TimSort is used in Python, Java Android and Swift. The main reason
why it is preferred to sort non-primitive types is stability which is not there in QuickSort.
For example [Link] in Java uses QuickSort while [Link] uses MergeSort.
● It is a preferred algorithm for sorting Linked lists.
● It can be easily parallelized as we can independently sort subarrays and then merge.
● The merge function of merge sort to efficiently solve the problems like union and
intersection of two sorted arrays.
Finding Minimum And Maximum (Application Of Divide And Conquer)
Finding a maximum and minimum element from a given array is the application of the Divide and
Conquer algorithm.
There are various ways to this problem, but the most traditional approach to to solve this problem is
the linear approach. In the linear approach, we traverse all elements once and find the minimum and
maximum element.
In this approach, the time complexity to solve this problem is θ(n).
Let's see the algorithm for the linear approach,
suppose A is the array of integers and n is the number of elements in the array A.
MinMax(A, n){ int
min = A[0]; int
max = A[0];
for(int i = 1; i < n; i++){

37
ADSA UNIT-II JSR
if(max < A[i])
max = A[i]; else
if(min > A[i])
min = A[i];
}
return (min, max);
}
Time Complexity for the above algorithm is T(n) = 2(n-1) + C ≈ θ(n).
Using Divide And Conquer Approach:

As we know in the divide and conquer approach, we first divide the problem into small problems
and combine the results of these small problems to solve the main problem.
In Divide and Conquer approach:
● Step 1: Find the mid of the array.
● Step 2: Find the maximum and minimum of the left subarray recursively.
● Step 3: Find the maximum and minimum of the right subarray recursively.
● Step 4: Compare the result of step 3 and step 4
● Step 5: Return the minimum and maximum.

Let's see the algorithm for the Divide and Conquer approach,
suppose A is the array of integers and n is the number of elements in the array A.
● i = 0 (Index of first element of array)
● j = n -1 (index of last element of array)
MinMaxDAC(A, i, j) {
if(i == j)
return (A[i], A[i]); if((j -
i) == 1)
if (A[i] < A[j])
return (A[i], A[j]) else
return (A[j], A[i]) else
{
int mid = (i + j) / 2;
LMin, LMax = MinMaxDAC(A, i, mid); RMin,
RMax = MinMaxDAC(A, mid + 1, j); if(LMax >
RMax)
max = LMax; else
max = RMax; if(LMin
< RMin)

min = LMin; else


min = RMin; return
(min, max);
Time Complexity for the DAC algorithm:
T(n) = 1, if n = 1 or n = 2 T(n)
= 2T(n/2) + C, if n > 2
After solving the above recurrence relation,
T(n) = 3n2-2 ≈ Ο(n)

38
ADSA UNIT-II JSR
Strassen’s Matrix Multiplication
1. Standard Multiplication
● Multiplying two n×nn \times n matrices → O(n³) time.

● Example: For 2×2 matrices

● Needs 8 multiplications + 4 additions.


2. Strassen’s Improvement
● Volker Strassen (1969) reduced it to:
7 multiplications + 18 additions/subtractions.

3. Reconstructing Result

4. Time Complexity
● Recurrence:
T(n)=7T(n2)+O(n2)T(n) = 7T\left(\frac{n}{2}\right) + O(n^2)
● Using Master Theorem:
T(n)=O(nlog⁡27)≈O(n2.81)T(n) = O(n^{\log_2 7}) \approx O(n^{2.81})
Faster than O(n3)O(n^3), especially for large n.

5. Applications
● Large-scale scientific computing
● Linear algebra problems
● Basis for faster algorithms (Coppersmith–Winograd, etc.)\
● Not efficient for small matrices (overhead cost of extra additions).

39
ADSA UNIT-II JSR
UNIT – III
Greedy Method: General Method, Job Sequencing with deadlines, Knapsack Problem, Minimum
cost spanning trees, Single Source Shortest Paths
Dynamic Programming: General Method, Multi Stage graphs, All pairs shortest paths, Single
Source Shortest Paths – General Weights (Bellman Ford Algorithm), Optimal Binary Search
Trees, 0/1 Knapsack, Travelling Salesperson problem

Greedy Method

General Method
Optimization Problem: An optimization problem is the problem of finding the best solution
(optimal solution) from all the feasible solutions (practicable of possible solutions). In an
optimization problem we are given a set of constraints and an optimization function. Solutions
that satisfy the constraints are called feasible solutions. A feasible solution for which the
optimization function has the best possible value is called optimal solution.
Optimization Problem Can be divided into three types
1. Greedy Method
2. Dynamic Programming
3. Branch and Bound

Greedy Method:The Greedy Method is a technique for solving optimization problems by


making the best possible choice at each step, assuming that this approach will lead to the
optimal overall solution.A greedy algorithm builds up a solution incrementally, selecting the
next component that offers the most immediate advantage and satisfies the problem's
constraints, without revisiting earlier decisions.

Characteristics of Greedy Algorithms


● feasible, i.e., it has to satisfy the problem’s constraints
● Optimal Solution: solution that is feasible and best among all possible feasible
solutions, according to some objective (e.g. max profit, min cost).
Control Abstraction for Greedy Method:
Algorithm GreedyMethod (a, n)
{
// a is an array of n inputs
Solution: =Ø;
for i: =0 to n
do
{
s: = select (a);
if (feasible (Solution, s))then

ADSA UNIT-III JSR 1


{
Solution: = union (Solution, s);
}
else
reject ();
// if the solution is not feasible, reject it.
}
return solution;
}

In the greedy method there are three important activities.


1. A selection of solutions from the given input domain is performed, i.e. s:= select(a).
2. The feasibility of the solution is performed, by using feasible ‘(solution, s)’ and then all
feasible solutions are obtained.
3. From the set of feasible solutions, the particular solution that minimizes or maximizes the
given objection function is obtained. Such a solution is called the optimal solution.
General Steps (Method)

ADSA UNIT-III JSR 2


1. Define the Optimal Substructure

○ The problem must be broken into subproblems such that the optimal solution
can be formed from optimal solutions of subproblems.
○ Example: In job sequencing, the optimal profit depends on the optimal scheduling
of remaining jobs.

2. Greedy Choice Property

○ At each step, make the choice that looks best at the moment.
○ This local best choice should lead to the global optimum.
○ Example: Pick the job with the maximum profit first (if feasible).

3. Sort / Order the Candidates

○ Arrange the possible decisions (jobs, items, edges, etc.) in an order that allows
greedy selection.
○ Example: Sort items by value/weight ratio in fractional knapsack.

4. Iteratively Make Choices


From the ordered set, choose the next best candidate that is feasible (does not violate
constraints like capacity, deadlines, etc.).
○ Example: Keep adding edges in Kruskal’s MST if no cycle is formed.

5. Check Feasibility
○ Ensure that including the current choice doesn’t break constraints.
○ Example: In Job Sequencing, a job is placed only if a free slot ≤ deadline is
available.

6. Construct the Solution


○ Continue until no more candidates can be added.
○ The solution constructed is expected to be the optimal solution.

Examples of Problems using Greedy Method


● Activity Selection Problem (choose max activities in minimum time)

● Job Sequencing with Deadlines (maximize profit)


● Fractional Knapsack Problem (maximize profit/weight ratio)
● Minimum Spanning Tree (Kruskal’s, Prim’s algorithm)
● Huffman Coding (optimal data compression)

ADSA UNIT-III JSR 3


Greedy Method – Advantages vs Disadvantages

Advantages Disadvantages

Simple and easy to implement Not always optimal (may miss the best
solution)

Fast and efficient (often O(n log n)) Works only if problem satisfies Greedy
Choice Property + Optimal Substructure

Requires less memory May fail for complex problems (e.g., 0/1
Knapsack, TSP)

Gives optimal solutions for many standard Each problem needs a correctness proof
problems (MST, Huffman, Activity Selection) (cannot assume greedy works)

JOB SEQUENCING WITH DEADLINES

This problem consists of n jobs each associated with a deadline and profit and
our objective is to earn maximum profit. We will earn profit only when the job is
completed on or before the deadline. We assume that each job will take unit time to
complete.
Points to remember:

● In this problem we have n jobs j1, j2, … jn, each has an associated deadlines are d1,
d2, … dn and profits are p1, p2, ... pn.
● Profit will only be awarded or earned if the job is completed on or before the
deadline.
● We assume that each job takes unit time to complete.
● The objective is to earn maximum profit when only one job can be scheduled or
processed at any given time

Example: Consider the following 5 jobs and their associated deadline and profit.

index 1 2 3 4 5
JOB j1 j2 j3 j4 j5
2 1 3 2 1
DEADLINE
PROFIT 60 100 20 40 20

ADSA UNIT-III JSR 4


 Sort the jobs according to their profit in descending order.
Note! If two or more jobs are having the same profit then sorts them as per their entry in the job
list.

index 1 2 3 4 5

JOB j2 j1 j4 j3 j5

DEADLINE 1 2 2 3 1

PROFIT 100 60 40 20 20

 Find the maximum deadline value

Looking at the jobs we can say the max deadline value is 3. So, dmax = 3

As dmax = 3 so we will have THREE slots to keep track of free time slots. Set the time slot
status to EMPTY
time slot 1 2 3

status EMPTY EMPTY EMPTY

Total number of jobs is 5. So we can write n = 5. Note!


If we look at job j2, it has a deadline 1. This means we have to complete job j2 in time slot
1 if we want to earn its profit.

Similarly, if we look at job j1 it has a deadline 2. This means we have to complete job j1
on or before time slot 2 in order to earn its profit.

Similarly, if we look at job j3 it has a deadline 3. This means we have to complete job j3
on or before time slot 3 in order to earn its profit.

Our objective is to select jobs that will give us higher profit.

time slot 1 2 3

Job J1 J2 J4

Profit 100 60 20

Total Profit is 180

ADSA UNIT-III JSR 5


Algorithm:
Step 1: Sort all jobs in decreasing order of profit.
Step 2: Initialize an array of slots (size = max deadline). Each slot represents a unit of time.
Step 3: For each job in the sorted list:

● Find the latest free slot before its deadline.

● If a free slot is available, assign the job to that slot.


● Otherwise, discard the job.

Step 4: Return the scheduled jobs and the total profit.

Pseudo Code:

JobSequencing(jobs):

1. Sort jobs by profit in decreasing order

2. maxDeadline = maximum of all deadlines

3. Create slot[maxDeadline] = {empty}

4. For each job in jobs (sorted):

for s = min([Link], maxDeadline) downto 1:

if slot[s] is empty:

slot[s] = job

break

5. Return slot[] and total profit

Knapsack Problem
Firstly, we have given a knapsack of the maximum capacity of m kg and n items with their weight and
profit. Fill in the knapsack with a subset of items such that the selected weight is less than or equal to the
capacity of the knapsack and the profit of items is maximum.

Algorithm of solving Knapsack Problem using Greedy Method:


->Assume a knapsack of capacity M and n items having profit pi and weight wi

->Sort items by profit/weight ratio: pi/wi

->Consider items in order of decreasing ratio

->Take as much of each item as possible.

ADSA UNIT-III JSR 6


->Traverse this sorted list as:

Example:-

Let us consider that the capacity of the knapsack M=60 and the list of provided items are shown in the
following table-

However, the provided items are not sorted based on (pi/wi). After sorting, the items are shown in the
following table.

Solution:-

Afterward, sorting all the items according to (pi/wi), first item B is chosen as the weight of B is less than
the capacity of the knapsack. Next, item A is chosen, as the available capacity of the knapsack is greater
than the weight of A. Then, C is chosen as the next item.

However, the whole item cannot be chosen as the remaining capacity of the knapsack is less than the

ADSA UNIT-III JSR 7


weight of C.

Hence, a fraction of C (i.e.(60-50)/20)is chosen.

Chiefly, the capacity of the knapsack is equal to the selected items. Hence, no more items can be selected.

So, the total weight of the selected items is 10+40+20*(10/20)=60

Also, the total profit is 100+280+120*(10/20)=380+60=440

Hence, this is the optimal solution. Moreover, we cannot gain more profit by selecting any different
combination of items.

Minimum cost spanning trees


Definition: A spanning tree of a connected graph is its connected acyclic subgraph (i.e., a tree) that
contains all the vertices of the graph. A minimum spanning tree of a weighted connected graph is its
spanning tree of the smallest weight, where the weight of a tree is defined as the sum of the weights on all
its edges. The minimum spanning tree problem is the problem of finding a minimum spanning tree for a
given weighted connected graph.

Properties of a Spanning Tree

The spanning tree holds the below-mentioned principles:

● The number of vertices (V) in the graph and the spanning tree is the same.
● There is a fixed number of edges in the spanning tree which is equal to one less than the total
number of vertices ( E = V-1 ).
● The spanning tree should not be disconnected, as in there should only be a single source of
component, not more than that.
● The spanning tree should be acyclic, which means there would not be any cycle in the tree.
● The total cost (or weight) of the spanning tree is defined as the sum of the edge weights of all
the edges of the spanning tree.
● There can be many possible spanning trees for a graph.

ADSA UNIT-III JSR 8


Applications of Minimum Spanning Trees:
● Network design: Spanning trees can be used in network design to find the minimum number
of connections required to connect all nodes. Minimum spanning trees, in particular, can help
minimize the cost of the connections by selecting the cheapest edges.
● Image processing: Spanning trees can be used in image processing to identify regions of
similar intensity or color, which can be useful for segmentation and classification tasks.
● Biology: Spanning trees and minimum spanning trees can be used in biology to construct
phylogenetic trees to represent evolutionary relationships among species or genes.
● Social network analysis: Spanning trees and minimum spanning trees can be used in social
network analysis to identify important connections and relationships among individuals or
groups.

Prim’s Algorithm
● Idea: Grow MST from a starting vertex by adding the smallest edge that connects a vertex inside
the tree to one outside.

Steps:
1. Start with any vertex.

2. Choose the minimum weight edge from this vertex to a new vertex.
3. Keep adding the minimum edge that connects a visited vertex to an unvisited vertex.
4. Repeat until all vertices are included.

Time Complexity:
● Using adjacency matrix: O(V²)

● Using min-heap & adjacency list: O(E log V)

ADSA UNIT-III JSR 9


ADSA UNIT-III JSR 10
ADSA UNIT-III JSR 11
ADSA UNIT-III JSR 12
Kruskal’s Algorithm
● Idea: Sort edges by weight, then add them one by one if they don’t form a cycle (use Disjoint Set
Union – DSU/Union-Find).

Steps:
1. Sort all edges in non-decreasing order of weight.

2. Pick the smallest edge. If it doesn’t form a cycle, include it in MST.


3. Repeat until MST has (V−1) edges.

Time Complexity:
● Sorting edges: O(E log E) ≈ O(E log V)

The graph contains 9 vertices and 14 edges. So, the minimum spanning tree formed will have (9
- 1) = 8 edges.

ADSA UNIT-III JSR 13


ADSA UNIT-III JSR 14
ADSA UNIT-III JSR 15
ADSA UNIT-III JSR 16
Single Source Shortest Paths
Definition: Find the shortest distance from a source vertex to all other vertices in a weighted
graph.
Dijkstra’s Algorithm
● Works for non-negative weights.

● Greedy approach.

Steps:
1. Assign distance 0 to source and ∞ to all others.
2. Pick the unvisited vertex with the smallest distance.
3. Update distances of its neighbors.
4. Repeat until all vertices are visited.
Complexity:

● Using simple array: O(V²)

● Using min-heap (priority queue): O(E log V)

Given a weighted undirected graph represented as an edge list and a source vertex src, find the
shortest path distances from the source vertex to all other vertices in the graph. The graph
contains V vertices, numbered from 0 to V - 1.
Note: The given graph does not contain any negative edge.

Examples:
Input: src = 0, V = 5, edges[][] = [[0, 1, 4], [0, 2, 8], [1, 4, 6], [2, 3, 2], [3, 4, 10]]

ADSA UNIT-III JSR 17


Step-by-Step Implementation
1. Set dist[source]=0 and all other distances as infinity.
2. Push the source node into the min heap as a pair <distance, node> → i.e., <0,
source>.

3. Pop the top element (node with the smallest distance) from the min heap.
1. For each adjacent neighbor of the current node:
2. Calculate the distance using the formula:
dist[v] = dist[u] + weight[u][v]
If this new distance is shorter than the current dist[v], update it.
Push the updated pair <dist[v], v> into the min heap

4. Repeat step 3 until the min heap is empty.


5. Return the distance array, which holds the shortest distance from the source to all
nodes.

ADSA UNIT-III JSR 18


ADSA UNIT-III JSR 19
Dynamic Programming
General Method
Dynamic programming is a name coined by Richard Bellman in 1955. Dynamic programming,
as a greedy method, is a powerful algorithm design technique that can be used when the solution
to the problem may be viewed as the result of a sequence of decisions.
In the greedy method we make irrevocable decisions one at a time, using a greedy
criterion. However, in dynamic programming we examine the decision sequence to see whether
an optimal decision sequence contains optimal decision subsequence. When optimal decision
sequences contain optimal decision subsequences, we can establish recurrence equations, called
dynamic-programming recurrence equations, that enable us to solve the problem in an efficient
way.
Dynamic programming is based on the principle of optimality (also coined by Bellman).
The principle of optimality states that no matter whatever the initial state and initial decision are,
the remaining decision sequence must constitute an optimal decision sequence with regard to the

ADSA UNIT-III JSR 20


state resulting from the first decision. The principle implies that an optimal decision sequence
consists of optimal decision subsequences. Since the principle of optimality may not hold for
some formulations of some problems, it is necessary to verify that it does hold for the problem
being solved.
Dynamic programming cannot be applied when this principle does not hold. The steps in
a dynamic programming solution are:
1. Verify that the principle of optimality holds
2. Set up the dynamic-programming recurrence equations
3. Solve the dynamic-programming recurrence equations for the value of the optimal
solution.
4. Performa trace back step in which the solution itself is constructed.

What is principal of optimality?

Principle of optimality: Suppose that in solving a problem, we have to make a sequence of


decisions D1, D2, …, Dn. If this sequence is optimal, then the last k decisions, 1 <k <n must
be optimal.
e.g. the shortest path problem
If i, i1, i2, …, j is a shortest path from i to j, then i1, i2, …, j must be a shortest path from i1to j.

Multi Stage graphs


A multi-stage graph is a directed graph in which vertices are divided into stages (or levels),
and edges are allowed only from one stage to the next stage.

It is mainly used to solve optimization problems (e.g., shortest path, minimal cost, maximal
profit).

Characteristics
1. The graph is divided into k stages (from source to destination).

2. Source node is in stage 1, destination node in stage k.


3. Edges go only from one stage to the next stage (not backward).
4. Used with Dynamic Programming to find the optimal path.

Approaches to Multistage Graph

• Forward Approach

• Backward Approach

ADSA UNIT-III JSR 21


Backward Approach

A multistage graph G = (V, E) is a directed graph where vertices are partitioned into k (where k
> 1) number of disjoint subsets S = {s1,s2,,sk} such that edge (u, v) is in E, then u Є si and v Є
s1 + 1 for some subsets in the partition and |s1| = |sk| = 1.

The vertex s Є s1 is called the source and the vertex t Є sk is called sink.

G is usually assumed to be a weighted graph. In this graph, the cost of an edge (i, j) is
represented by c(i, j). Hence, the cost of path from source s to sink t is the sum of costs of each
edge in this path.

The multistage graph problem is finding the path with minimum cost from source s to sink t.

Example
Find minimum path cost between vertex s and t for following multistage graph using dynamic
programming.

Solution:In the above graph, the cost of an edge is represented as c(i, j). We need to find the
minimum cost path from vertex 1 to vertex 12. Using the below formula we can find the shortest
cost path from source to destination:
cost(i,j)=min{c(j,l)+cost(i+1,l)}
Step 1:Stage 5
cost(5,12)=c(12,12)=0
We use a forward approach here therefore (cost(5,12) = 0 ). Here, 5 represents the stage number
and 12 represents a node in that stage. Since there are no outgoing edges from vertex 12, the cost
is 0 and D[12]=12
Step 2:Stage 4
cost(4,9)= c(9,12) = 4 D[9]=12
cost(4,10) = c(10,12) = 2 D[10]=12

ADSA UNIT-III JSR 22


cost(4,11) = c(11,12) = 5 D[11]=12
Step 3:Stage 3
cost(3,6)=min{c(6,9)+cost(4,9), c(6,10)+cost(4,10)}
= min{6+4,5+2} =min{10,7}=7 D[6]=10
cost(3,7)=min{c(7,9)+cost(4,9), c(7,10)+cost(4,10)}
= min{4+4,3+2} =min{8,5}=5 D[7]=10
cost(3,8)=min{c(8,10)+cost(4,10), c(8,11)+cost(4,10)}
= min{5+2,6+5} =min{7,11}=7 D[7]=10
Step 4:Stage 2
cost(2,2) = min{c(2,6)+cost(3,6), c(2,7)+cost(3,7), c(2,8)+cost(3,8)}
= min{4+7,2+5,1+7}=min{11,7,8}=7 D[2]=7
cost(2,3) = min{c(3,6)+cost(3,6), c(3,7)+cost(3,7)}
= min{2+7,7+5}=min{9,12}=9 D[3]=6
cost(2,4) = min{c(4,8)+cost(3,8)}
= min{11+7}=min{18}=18 D[4]=8
cost(2,5) = min{c(5,7)+cost(3,7), c(5,8)+cost(3,8)}
= min{11+5,8+7}=min{16,15}=15 D[3]=8
Step 5:Stage 1
cost(1,1) = min(c(1,2)+cost(2,2), c(1,3)+cost(2,3), c(1,4)+cost(2,4), c(1,5)+cost(2,5)}
= min{9+7, 7+9, 3+18, 2+15}=min{16,16,21,17}=16
D[1] = 2 (we can take 3 also )
The path through which we have to find the shortest distance
Start from vertex — 2
D ( 1) = 2
D ( 2) = 7
D ( 7) = 10
D (10) = 12
The cost is 9+2+3+2=16
ANALYSIS: The time complexity of this forward method is �(|�| + |�|)

So, the minimum –cost path is,

ADSA UNIT-III JSR 23


Forward Approach:
If there are ‘K’ stages in a graph using a backward approach. we will find out the cost of each &
every vertex starting from 1st stage to the kth stage. We will find out the minimum cost path
from destination to source (i.e.,) from stage k to stage 1
Method
1. Initialize: dist[source] = 0, and all other distances as ∞.

2. Process nodes in increasing stage order (source → destination).

3. For each edge (u → v) with weight w, relax it:

4. Continue until the last stage.

5. Finally, dist[destination] gives the shortest path cost.


Example (same graph as before)
Stages:
● Stage 1: 1 (source)

● Stage 2: 2, 3

● Stage 3: 4, 5

● Stage 4: 6 (destination)

Edges:
● 1→2=2, 1→3=3

● 2→4=2, 2→5=3

● 3→4=1, 3→5=4

● 4→6=3, 5→6=2
Step-by-step Forward Computation
Initialization:
● dist[1] = 0

● dist[2..6] = ∞

Stage 1 → Stage 2
● From 1→2:
dist[2] = min(∞, 0+2) = 2

ADSA UNIT-III JSR 24


● From 1→3:
dist[3] = min(∞, 0+3) = 3

Now:
dist = [0, 2, 3, ∞, ∞, ∞]
Stage 2 → Stage 3
● From 2→4: dist[4] = min(∞, 2+2) = 4

● From 2→5: dist[5] = min(∞, 2+3) = 5

● From 3→4: dist[4] = min(4, 3+1) = 4 (no change)

● From 3→5: dist[5] = min(5, 3+4) = 5 (no change)

Now:
dist = [0, 2, 3, 4, 5, ∞]
Stage 3 → Stage 4
● From 4→6: dist[6] = min(∞, 4+3) = 7

● From 5→6: dist[6] = min(7, 5+2) = 7 (no change)

Now:
dist = [0, 2, 3, 4, 5, 7]
Final Result
● Shortest path cost from 1 → 6 = 7

● Possible shortest paths:

○ 1→2→4→6

○ 1→3→4→6

○ 1→2→5→6

(All have cost 7).

Applications
● Shortest path problems.
● Project scheduling.
● Network routing.
● Resource allocation problems.
All pairs shortest paths

ADSA UNIT-III JSR 25


The Floyd-Warshall algorithm is a method used in computer science to find the shortest paths
between all pairs of nodes in a [Link] algorithm is particularly useful when you need to know
the shortest path for every possible pair of nodes, making it a key technique in solving all-pairs
shortest path problems. A graph is like a map that shows how different points (called nodes) are
connected by lines (called edges).

Each edge has a number, called a weight, that represents the distance or cost to travel between two
nodes. The goal of the Floyd-Warshall algorithm is to find the shortest possible path between every
pair of nodes in the graph.

The algorithm works by checking every possible path between the nodes and updating the shortest
known distance between them. It repeats this process for each node in the graph until it has
considered all possible paths, ensuring that the shortest path is found for every pair of nodes.

Key Concepts in Floyd-Warshall Algorithm

● Dynamic Programming: This is a method used to solve problems by breaking them


down into smaller, simpler subproblems. In the Floyd-Warshall algorithm, dynamic
programming is used to update and keep track of the shortest paths between nodes by
gradually improving the known distances as more paths are considered.

● All-Pairs Shortest Paths: The Floyd-Warshall algorithm finds the shortest paths for all
pairs of nodes in the graph, not just from one node to all others. This means that after
running the algorithm, you know the shortest path between any two nodes in the graph.

● Matrix Representation: The algorithm uses a matrix (a table with rows and columns) to
represent the graph and store the distances between nodes. Each cell in the matrix
represents the distance between two nodes. As the algorithm runs, it updates the matrix to
reflect the shortest known distances between all pairs of nodes.

Example Walkthrough

Let’s walk through an example using a simple graph with 4 nodes (A, B, C, D) and the following

weighted edges:

ADSA UNIT-III JSR 26


The graph can be represented as an adjacency matrix:

Where "∞" represents that there is no direct path between the nodes.
Step 1: Initialization
Initialize the dist[][] matrix as the adjacency matrix of the graph.
Step 2: Iteration through all nodes (k)
● For each node k, consider it as an intermediate node, and update the shortest paths.
Iteration with k = A:
Update distances considering A as an intermediate node:
● No updates needed as there are no shorter paths through A.
Iteration with k = B:
Update distances considering B as an intermediate node:
● dist[A][C] = min(dist[A][C], dist[A][B] + dist[B][C]) = min(∞, 3 + 1) = 4
● dist[D][C] remains unchanged as there is no shorter path.
Updated matrix:

Iteration with k = C:
Update distances considering C as an intermediate node:
● dist[A][D] = min(dist[A][D], dist[A][C] + dist[C][D]) = min(7, 4 + 2) = 6
● dist[B][D] = min(dist[B][D], dist[B][C] + dist[C][D]) = min(∞, 1 + 2) = 3
Updated matrix:

ADSA UNIT-III JSR 27


Iteration with k = D:
Update distances considering D as an intermediate node:
● dist[A][B] and dist[C][B] remain unchanged as there are no shorter paths through D.
Final matrix:

This matrix represents the shortest paths between all pairs of nodes.

Floyd-Warshall Algorithm Time Complexity

The time complexity of the Floyd-Warshall algorithm is O(V^3).

Aspect Complexity Explanation

Time O(V^3) The algorithm uses three nested loops, each iterating
Complexity over the vertices (V), leading to O(V^3).


V represents the number of vertices in the graph.

● The time complexity is O(V^3) because the algorithm checks all pairs of vertices for each
possible intermediate vertex.

ADSA UNIT-III JSR 28


Floyd-Warshall Algorithm Space Complexity

The space complexity of the Floyd-Warshall algorithm is O(V^2).

Aspect Complexity Explanation

Space O(V^2) The algorithm uses a 2D matrix to store the shortest


Complexity path distances between all pairs of nodes, requiring
O(V^2) space.

The space complexity is O(V^2) due to the use of a matrix that stores the shortest path between
every pair of vertices.

Applications of Floyd-Warshall Algorithm

● Network Routing: Finds the shortest paths between all pairs of nodes in communication
networks to optimize routing protocols.

● Urban Planning: Helps in determining the most efficient routes between multiple
locations in a city for infrastructure development.

● Traffic Management: Assists in calculating the shortest travel paths between


intersections in road networks.

● Social Network Analysis: Used to determine the shortest connection paths between
individuals in social networks.

● Computer Graphics: Computes shortest paths in 3D meshes for rendering and


animation.

● Telecommunications: Optimizes signal transmission paths across various nodes in


telecommunication networks.

● Flight Scheduling: Determines the shortest flight routes between multiple airports for
airlines.

● Game Development: Used in pathfinding algorithms to determine the shortest route


between points in a game environment.

ADSA UNIT-III JSR 29


Floyd-Warshall vs Dijkstra vs Bellman-Ford Algorithms

Aspect Floyd-Warshall Dijkstra’s Bellman-Ford


Algorithm Algorithm Algorithm

Purpose Finds the shortest Finds the shortest Finds the shortest path
paths between all pairs path from a single from a single source to
of nodes source to all other all other nodes
nodes

Graph Type Weighted, directed or Weighted, directed Weighted, directed or


undirected, with or undirected, with undirected, with positive
positive or negative non-negative weights or negative weights
weights

Handles Yes, but cannot handle No Yes, and can detect


Negative negative weight cycles negative weight cycles
Weights

Time O(V^3) O((V + E) log V) O(V * E)


Complexity with a priority queue

Space O(V^2) O(V) with a priority O(V)


Complexity queue

Best for Dense graphs or when Sparse graphs or Graphs with negative
all-pairs shortest paths when shortest path weights or where
are needed from a single source negative weight cycle
is needed detection is needed

Algorithm Dynamic Greedy Dynamic Programming


Type Programming

Updates Iteratively updates the Iteratively relaxes Iteratively relaxes all


Distances distance matrix for all edges from the edges V-1 times
pairs of nodes source node

ADSA UNIT-III JSR 30


Output Matrix of shortest Shortest path Shortest path distances
paths between all pairs distances from from source to all other
of nodes source to all other nodes
nodes

Use Case Network routing with GPS navigation, Finding shortest paths in
Examples multiple connections, shortest path in road graphs with negative
traffic management networks weights, detecting
negative cycles

Optimality Provides exact shortest Provides exact Provides exact shortest


paths for all pairs of shortest paths from paths from source to all
nodes source to all other other nodes
nodes

Single Source Shortest Paths

General Weights (Bellman Ford Algorithm)


The single source shortest path algorithm (for arbitrary weight positive or negative) is also
known Bellman-Ford algorithm is used to find minimum distance from source vertex to any
other vertex. The main difference between this algorithm and Dijkstra’s algorithm is, in
Dijkstra’s algorithm we cannot handle the negative weight, but here we can handle it easily.

ADSA UNIT-III JSR 31


Bellman-Ford algorithm finds the distance in a bottom up manner. At first it finds those distances
which have only one edge in the path. After that increase the path length to find all possible
solutions.

ADSA UNIT-III JSR 32


Optimal Binary Search Trees
OBST is a binary search tree which provides the smallest possible search time (or expected
search) for a given sequence of accesses (or access probabilities).

The search time can be improved in Optimal Cost Binary Search Tree, placing the most
frequently used data in the root and closer to the root element, while placing the least
frequently used data near leaves and in leaves.

Eg.

For our tiny example, we could find the optimal tree by generating all 14 binary search trees
with 4 keys. As a general algorithm, this exhaustive- search approach is unrealistic: the total
number of binary search trees with n keys is equal to the nth Catalan number,

ADSA UNIT-III JSR 33


Given a set of identifiers {a1,a2,..,an}. Suppose we need to construct a binary search tree and
p(i) be the probability with which we search for ai then:

If a binary search tree represents n identifiers, then there will be exactly n internal nodes and
n+1 external nodes. Every node internal node represents a point where a successful search
may terminate. Every external node represents a point where an unsuccessful search may
terminate.

If a successful search terminates at an internal node at level l, then l comparison is needed.


Hence the expected cost contribution from the internal node for ai is p(i)*level(ai).

The identifiers not in the binary search tree can be partitioned into n+1 equivalence classes Ei,
0 ≤ i ≤ n. If the failure node for Ei is at level l, then only l -1 comparison are needed.

Let q(i) be the probability that the identifier x being searched for is in then clearly

and the cost contribution for the failure node for Ei is q(i)*(level (Ei) -
1).

There fore, the cost of the optimal binary search tree is:

Time complexity: The computing time for above algorithm is O(n2). To construct obst from r[i,j] is
O(n). So total time to construct obst is O(n3).

Space complexity = O(n2)

ADSA UNIT-III JSR 34


ADSA UNIT-III JSR 35
ADSA UNIT-III JSR 36
ADSA UNIT-III JSR 37
ADSA UNIT-III JSR 38
ADSA UNIT-III JSR 39
Successful Search cost of the tree = 1(2) + 4(1) + 2(2) + 1(3) = 13.

Unsuccessful search cost of the tree = 4(3-1) + 2(3-1) + 4(3-1) + 1(4-1) + 1(4-1) = 26

So, total cost of the tree = 13+26 =39.

Optimal Binary Search Tree with Successful search probabilities with suitable examples.

• OBST is a binary search tree which provides the smallest possible search time
(or expected search) for a given sequence of accesses (or access probabilities).

ADSA UNIT-III JSR 40


• The search time can be improved in Optimal Cost Binary Search Tree, placing
the most frequently used data in the root and closer to the root element, while placing
the least frequently used data near leaves and in leaves.

• Given a set of identifiers {a1,a2,..,an}. Suppose we need to construct a binary


search tree and p(i) be the probability with which we search for ai then:
If a successful search terminates at an internal node at level l, then l
comparison is needed. Hence the expected cost contribution from the internal node for
ai is p(i)*level(ai).

• There fore, the cost of the optimal binary search tree is:

ADSA UNIT-III JSR 41


Time Complexity: The computing time for above algorithm is O(n2). To construct obst from
r[i,j] is O(n). So total time to construct obst is O(n3).

Space complexity: O(n2)

ADSA UNIT-III JSR 42


ADSA UNIT-III JSR 43
ADSA UNIT-III JSR 44
0/1 Knapsack

Knapsack Problem
Given a set of items, each with a weight and a value, determine a subset of items to include
in a collection so that the total weight is less than or equal to a given limit and the total value
is as large as possible.

Fractional Knapsack

ADSA UNIT-III JSR 45


In this case, items can be broken into smaller pieces, hence we can select fractions of items.

According to the problem statement,

● There are n items in the store


● Weight of ith item wi > 0
● Profit for ith item pi>0 and
● · Capacity of the Knapsack is W

In this version of Knapsack problem, items can be broken into smaller pieces. So, the thief
may take only a fraction xi of ith item.
0 ≤ xi ≤ 1

0/1 Knapsack:
In this item cannot be broken which means we should take the item as a whole or should
leave it. That's why it is called 0/1 knapsack Problem.

The classic dynamic programming approach, on the other hand, works

bottom up: it fills a table with solutions to all smaller subproblems, but each of them is solved
only once. An unsatisfying aspect of this approach is that solutions to some of these smaller
subproblems are often not necessary for getting a solution to the problem given. Since this
drawback is not present in the top-down approach, it is natural to try to combine the strengths
of the top-down and bottom-up approaches.

The goal is to get a method that solves only subproblems that are necessary and does so only
once. Such a method exists; it is based on using memory functions.

This method solves a given problem in the top-down manner but, in addition, maintains a
table of the kind that would have been used by a bottom-up dynamic programming algorithm.

ADSA UNIT-III JSR 46


Initially, all the table’s entries are initialized with a special ―null‖ symbol to indicate that
they have not yet been calculated. Thereafter, whenever a new value needs to be calculated,
the method checks the corresponding entry in the table first: if this entry is not ―null,‖ it is
simply retrieved from the table; otherwise, it is computed by the recursive call whose result is
then recorded in the table.

Time Complexity:

The time efficiency and space efficiency of this algorithm are both in θ(nW). The time

needed to find the composition of an optimal solution is in O(n).

ADSA UNIT-III JSR 47


ADSA UNIT-III JSR 48
ADSA UNIT-III JSR 49
Travelling Salesperson problem

Given a set of cities and distance between every pair of cities, the problem is to find the
shortest possible route that visits every city exactly once and returns to the starting point.

ADSA UNIT-III JSR 50


Let G=(V,E) be a directed graph with edge cost Cij. The variable Cij is define such that Cij > 0
every i,j and Cij = ∞ if (i,j) ∈ E. Let |V| = n and assume n>1.
A tour of G is a directed simple cycle that include every vertex in V. The cost of a tour is the
sum of the cost of the edges on the tour. The travelling salesperson problem is to find a tour
of minimum cost without loss of generality, assume a tour is a simple path that starts and ends
at vertex 1.

Every tour consists of an edge (1,k) for some k ∈ V-{1} and a path from vertex k to vertex 1.
The path from vertex k to vertex 1 goes through each vertex in V-{1,k} exactly once. It is
easy to see that if the tour is optimal, then the path from k to 1 must be a shortest k to1 path
going through all vertices in V-{1,k}

Let g(i,S) be the length of a shortest path starting at vertex i, going through all vertices in S
and terminating at vertex 1. The function g(1, V-{1}) is the length of an optimal salesperson
tour.
From the principal of optimality, it follows that g(1,V-

{1}) = min {C1k + g(k, V-{1,k})}

2≤k≤n

Generalizing above, we obtain for i E S

g(i,S) = min {Cij+g(j,S-


{j})} -- (1) j ∈ S
Time complexity: O(n22n) as the computation of g(i,S) with|S| = k requires k-1 comparisons
when solving equation (1).

Space Complexity: O(n.2n)

ADSA UNIT-III JSR 51


ADSA UNIT-III JSR 52
ADSA UNIT-III JSR 53
1
2
UNIT-IV

Backtracking- General Method- 8 Queens Problem- Sum of subset problem- Graph coloring- Hamiltonian
Cycles- 0/1Knapsack Problem

Backtracking- General Method:


 The name Backtrack was first coined by [Link] in 1950’s.
 It is a method of determining the correct solution to a problem by examining all the available paths.
 If a particular path leads to unsuccessful solution then its previous solution is examined in-order to final
correct solution.
 In many applications of the backtrack method, the desired solution is expressible as an n-tuple (x1, x2,..
xn) where xi is chosen from some finite set Si. Often the problem to be solved calls for finding one vector
that maximizes or minimizes or satisfies a criterion function P(x1,x2… xn)
 In Brute force algorithm, we consider all feasible solutions for finding optimal solution.
 In Backtracking algorithm, it is having ability to yield same answer with far fewer than m trails.
 Many of the problems, we solve using backtracking require that all solutions satisfy the complex set of
constraints.
 Two types of Constraints
1. Explicit Constraints are the rules that restrict each xi to take on values only from a given set.
Eg: xi>=0 or Si= {all non negative real numbers}
Xi= 0 or 1 or Si= { 0 , 1}
2. Implicit Constraints are the rules that determine which of the tuples in the solution space I satisfy the
criterion functions. Thus Implicit Constraints describe the way in which the xi must relate to each other.

Some Important Definitions:

1
2
Recursive Backtracking Algorithm:

Iterative Backtracking Algorithm:

3
Applications of Backtracking:
 Backtracking method is applied to solve various problems like:
1. N Queens Problem
2. Sum of Subsets Problem
3. Graph Coloring
4. Hamiltonian Cycles
5. Knapsack Problem

N Queens Problem (8 Queens Problem)


 N Queens Problems means:
1. Place N Queens placed on N X N chess board.
2. No Two Queens are placed in same row or same column or diagonal.
3. No Two Queens attack to each other.

4-Queens Problem solution:

4
4- Queens Problem –state space tree:

5
N-Queens Problem- algorithm1: Placing a new queen in kth row & ith column.

N-Queens Problem- algorithm2: All solutions for N Queens Problem.

6
8-Queens Problem solution:

Sum of subset problem

7
Sum of Subsets Problem-Algorithm

8
Sum of Subsets Problem-Example

9
Graph coloring:
 Let G be a graph and m be a positive integer.
 It is to find whether that nodes of G can be colored in such a way that no two adjacent nodes have the same
color
yet only m colors are used where m is a chromatic number.
 If d is degree of a given graph G, then it is colored with d+ 1 colors.
 Degree means number of edges connected to that node.

10
Graph coloring- m coloring algorithm

Graph coloring- state space tree

11
Graph coloring- generating color algorithm

Graph coloring- another example

12
Hamiltonian Cycles

Hamiltonian Cycles-Generating a next vertex Algorithm

13
Hamiltonian Cycles-Finding all Hamiltonian Cycles Algorithm

Knapsack Problem

14
Knapsack Problem- Bounding Function Algorithm

Backtracking Knapsack Problem- Algorithm

15
Knapsack Problem- Example

16
17
18
Branch and Bound: The method, Travelling salesperson, 0/1 Knapsack problem.

Branch and Bound (B & B)- The method:


 B & B is a general algorithmic method for finding optimal solutions of various problems.
 In B & B, a state space tree is built and all the children of E-nodes are generated before any other node
become a live node.
 E node is a live node that can be expanded to generate its children node.
 Live node is a node that can be expanded without generating its children node.
 B & B is used only for optimization problem.
 B & B needs two additional values when compared to backtracking.
1. A bound value of objective function for every node of state space tree.
2. Value of best solution is compared to node’s bound.
 If node’s bound is better than best solution node is terminated.
 Lower bound is for minimization problems.
 Upper bound is for maximization problems.
 In Branching, we define tree structure from set of candidates in a recursive manner.
 In Bounding, we calculate lower bound & upper bound of each node in the tree.
 Lower bound > Upper bound first node is discarded from the search Pruning.
 B & B is based on advanced BFS which is done with priority queue instead of traditional list. That
means highest priority element is always on first position.
 Bounding functions are useful because it doesn’t allow to generate sub tree that has no answer nodes.
 3 types of search strategies:
1. FIFO (First- In- First- Out) Search or BFS.
2. LIFO (Last- In- First- Out) Search or DFS.
3. Least Count (LC) Search.

Difference between Backtracking & Branch and Bound:

1
FIFO (First- In- First- Out) Search or BFS:

LIFO (Last- In- First- Out) Search or DFS:

2
Least Count (LC) Search:

3
Travelling Sales Person using B & B

4
TSP Example:

5
6
7
8
9
10
11
12
0/1 Knapsack problem using B & B

13
14
15
16
UNIT-V
P, NP, NP-COMPLETE, NP-HARD PROBLEMS

Introduction to Problems:

Types of Algorithms

 Two types of Algorithms:


1. Deterministic Algorithm: It has a property that result of every operation is uniquely defined.

1
2. Non Deterministic Algorithm: It terminates unsuccessfully if and only if there exists no set of choices
leading to a success signal.
 To specify such algorithms, we introduce 3 functions:

2
3
4
Cook’s Theorem:

5
6
Satifiability problem

7
3 CNF Satifiability problem

8
9
10
11
12
NP Hard Scheduling Problems: Scheduling Identical Processors, Job Shop Scheduling

13
14
15

You might also like