Advanced Data Structures & Algorithms
Advanced Data Structures & Algorithms
Algorithm:
Algorithm is a step-by-step procedure, which defines a set of instructions to be executed in a
certain order to get the desired output. Algorithms are generally created independent of
underlying languages, i.e. an algorithm can be implemented in more than one programming
language.
From the data structure point of view, following are some important categories of algorithms
−
Search − Algorithm to search an item in a data structure.
Sort − Algorithm to sort items in a certain order.
Insert − Algorithm to insert item in a data structure.
Update − Algorithm to update an existing item in a data structure.
Delete − Algorithm to delete an existing item from a data structure.
How to Write an Algorithm?
There are no well-defined standards for writing algorithms. Rather, it is problem and
resource dependent. Algorithms are never written to support a particular programming
code.
As we know that all programming languages share basic code constructs like loops (do,
for, while), flow-control (if-else), etc. These common constructs can be used to write
an algorithm.
We write algorithms in a step-by-step manner, but it is not always the case. Algorithm
writing is a process and is executed after the problem domain is well-defined. That is,
we should know the problem domain, for which we are designing a solution.
Example
Let's try to learn algorithm-writing by using an example.
Problem − Design an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b & c
Step 3 − define values of a & b
Step 4 − add values of a & b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP
Algorithms tell the programmers how to code the program. Alternatively, the algorithm can be
written as −
Step 1 − START ADD
Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP
In design and analysis of algorithms, usually the second method is used to describe an
algorithm. It makes it easy for the analyst to analyze the algorithm ignoring all unwanted
definitions. He can observe what operations are being used and how the process is flowing.
Writing step numbers, is optional.
We design an algorithm to get a solution of a given problem. A problem can be solved in more
than one ways.
Hence, many solution algorithms can be derived for a given problem. The next step is to
analyze those proposed solution algorithms and implement the best suitable solution.
Algorithm Analysis
Efficiency of an algorithm can be analyzed at two different stages, before implementation and
after implementation. They are the following −
A Priori Analysis − This is a theoretical analysis of an algorithm. Efficiency of an
algorithm is measured by assuming that all other factors, for example, processor speed,
are constant and have no effect on the implementation.
A Posterior Analysis − This is an empirical analysis of an algorithm. The selected
algorithm is implemented using programming language. This is then executed on target
computer machine. In this analysis, actual statistics like running time and space
required, are collected.
We shall learn about a priori algorithm analysis. Algorithm analysis deals with the execution
or running time of various operations involved. The running time of an operation can be
defined as the number of computer instructions executed per operation.
Algorithms as a Technology
Efficiency Different algorithms devised to solve the same problem often differ dramatically in
their efficiency. These differences can be much more significant than differences due to
hardware and software.
Algorithms and other technologies The example above shows that we should consider
algorithms, like computer hardware, as a technology. Total system performance depends on
choosing efficient algorithms as much as on choosing fast hardware. Just as rapid advances are
being made in other computer technologies, they are being made in algorithms as well. You
might wonder whether algorithms are truly that important on contemporary computers in light
of other advanced technologies, such as
advanced computer architectures and fabrication technologies,
easy-to-use, intuitive, graphical user interfaces (GUIs),
object-oriented systems,
integrated Web technologies, and
fast networking, both wired and wireless.
Insertion sort
Our first algorithm, insertion sort, solves the sorting problem introduced in Chapter 1: Input:
A sequence of n numbers ha1; a2;:::;ani. Output: A permutation (reordering) ha0 1; a0 2;:::;a0
ni of the input sequence such that a0 1 a0 2 a0 n. The numbers that we wish to sort are also
known as the keys. Although conceptually we are sorting a sequence, the input comes to us in
the form of an array with n elements.
Algorithm Complexity:
Suppose X is an algorithm and n is the size of input data, the time and space used by the
algorithm X are the two main factors, which decide the efficiency of X.
Time Factor − Time is measured by counting the number of key operations such as
comparisons in the sorting algorithm.
Space Factor − Space is measured by counting the maximum memory space required
by the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space required
by the algorithm in terms of n as the size of input data.
Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle. The space required by an algorithm is equal to the sum of the
following two components −
A fixed part that is a space required to store certain data and variables, that are
independent of the size of the problem. For example, simple variables and constants
used, program size, etc.
A variable part is a space required by variables, whose size depends on the size of the
problem. For example, dynamic memory allocation, recursion stack space, etc.
Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part and
S(I) is the variable part of the algorithm, which depends on instance characteristic I. Following
is a simple example that tries to explain the concept −
Algorithm: SUM(A, B)
Step 1 - START
Step 2 - C ← A + B + 10
Step 3 - Stop
Here we have three variables A, B, and C and one constant. Hence S(P) = 1 + 3. Now, space
depends on data types of given variables and constant types and it will be multiplied
accordingly.
Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm to
run to completion. Time requirements can be defined as a numerical function T(n), where T(n)
can be measured as the number of steps, provided each step consumes constant time.
For example, addition of two n-bit integers takes n steps. Consequently, the total
computational time is T(n) = c ∗ n, where c is the time taken for the addition of two bits. Here,
we observe that T(n) grows linearly as the input size increases.
Asymptotic Notation:
The main idea of asymptotic analysis is to have a measure of efficiency of algorithms that
doesn’t depend on machine specific constants, and doesn’t require algorithms to be
implemented and time taken by programs to be compared. Asymptotic notations are
mathematical tools to represent time complexity of algorithms for asymptotic analysis. The
following 3 asymptotic notations are mostly used to represent time complexity of algorithms.
1) Θ Notation: The theta notation bounds a functions from above and below, so it defines
exact asymptotic behavior.
A simple way to get Theta notation of an expression is to drop low order terms and ignore
leading constants. For example, consider the following expression.
3n3 + 6n2 + 6000 = Θ(n3)
Dropping lower order terms is always fine because there will always be a n0 after which Θ(n3)
has higher values than Θn2) irrespective of the constants involved.
For a given function g(n), we denote Θ(g(n)) is following set of functions.
Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such
that 0 <= c1*g(n) <= f(n) <= c2*g(n) for all n >= n0}
The above definition means, if f(n) is theta of g(n), then the value f(n) is always between
c1*g(n) and c2*g(n) for large values of n (n >= n0). The definition of theta also requires that
f(n) must be non-negative for values of n greater than n0.
2) Big O Notation: The Big O notation defines an upper bound of an algorithm, it bounds a
function only from above. For example, consider the case of Insertion Sort. It takes linear time
in best case and quadratic time in worst case. We can safely say that the time complexity of
Insertion sort is O(n^2). Note that O(n^2) also covers linear time.
If we use Θ notation to represent time complexity of Insertion sort, we have to use two
statements for best and worst cases:
1. The worst case time complexity of Insertion Sort is Θ(n^2).
2. The best case time complexity of Insertion Sort is Θ(n).
The Big O notation is useful when we only have upper bound on time complexity of an
algorithm. Many times we easily find an upper bound by simply looking at the algorithm.
O(g(n)) = { f(n): there exist positive constants c and
n0 such that 0 <= f(n) <= c*g(n) for
all n >= n0}
Efficient algorithms are of paramount importance when working with advanced data structures.
Advanced data structures provide more specialized ways of organizing and storing data to
improve various aspects of algorithmic performance, such as time complexity, space
complexity, and overall efficiency. However, without efficient algorithms to operate on these
data structures, their potential benefits might not be fully realized. Here's why efficient
algorithms are crucial when working with advanced data structures:
Optimized Resource Usage: Advanced data structures are designed to optimize certain
operations, like insertion, deletion, search, etc. However, if inefficient algorithms are used, the
benefits of these data structures might be offset by poor performance. Efficient algorithms help
ensure that the advanced data structures are utilized optimally, leading to minimized resource
usage (CPU time, memory, etc.).
Time Complexity: Advanced data structures often offer improved time complexity for specific
operations compared to simpler data structures. For example, self-balancing binary search trees
like AVL trees or red-black trees have logarithmic height, which ensures efficient searching.
However, to maintain this time complexity, you need algorithms that take advantage of the
balanced properties. Inefficient algorithms might lead to worst-case behavior, degrading
performance.
Space Complexity: While advanced data structures might have more complex internal
arrangements, they often optimize space usage. For instance, a hash table with open addressing
can have better space efficiency compared to a basic array. Efficient algorithms ensure that this
space optimization is maintained and not squandered due to poor algorithmic choices.
Consistency: Advanced data structures often come with certain properties and invariants that
need to be maintained to ensure their efficiency. Efficient algorithms help in performing
operations that respect these properties, keeping the data structure in a consistent and optimized
state.
Scalability: As datasets grow larger, the impact of algorithmic inefficiencies becomes more
pronounced. Advanced data structures are often chosen to handle large-scale applications
precisely because of their efficiency improvements. Efficient algorithms are crucial for scaling
these structures to handle massive datasets.
Real-world Applications: Many real-world applications, such as databases, networking
protocols, compilers, and simulations, rely on advanced data structures to optimize their
operations. Without efficient algorithms, these applications might suffer from poor performance
and sluggish behavior.
Algorithmic Innovation: The study of advanced data structures and the development of
efficient algorithms often go hand in hand. The design and analysis of algorithms for these
structures can lead to new insights and innovations in computer science, driving progress in
various fields.
Competitive Advantage: In industries where performance is crucial, such as finance, gaming,
and machine learning, having efficient algorithms operating on advanced data structures can
provide a competitive advantage. Faster processing and response times can lead to better user
experiences and more efficient business operations.
Comparison of running times For each function f .n/ and time t in the following table,
determine the largest size n of a problem that can be solved in time t, assuming that the
algorithm to solve the problem takes f .n/ microseconds.
Recurrences:
Many algorithms are recursive in nature. When we analyze them, we get a recurrence relation
for time complexity. We get running time on an input of size n as a function of n and the
running time on inputs of smaller sizes. For example in merge sort, to sort a given array, we
divide it in two halves and recursively repeat the process for the two halves. Finally we merge
the results. Time complexity of Merge Sort can be written as T(n) = 2T(n/2) + cn.
1) Substitution Method: We make a guess for the solution and then we use mathematical
induction to prove the guess is correct or incorrect.
For example consider the recurrence T(n) = 2T(n/2) + n
We need to prove that T(n) <= cnLogn. We can assume that it is true
for values smaller than n.
T(n) = 2T(n/2) + n
<= cn/2Log(n/2) + n
= cnLogn - cnLog2 + n
= cnLogn - cn + n
<= cnLogn
2) Recurrence Tree Method: In this method, we draw a recurrence tree and calculate the
time taken by every level of tree. Finally, we sum the work done at all levels. To draw the
recurrence tree, we start from the given recurrence and keep drawing till we find a pattern
among levels. The pattern is typically a arithmetic or geometric series.
For example consider the recurrence relation
T(n) = T(n/4) + T(n/2) + cn2
cn2
/ \
T(n/4) T(n/2)
cn2
/ \
2
c(n )/16 c(n2)/4
/ \ / \
T(n/16) T(n/8) T(n/8) T(n/4)
Breaking down further gives us following
cn2
/ \
c(n2)/16 c(n2)/4
/ \ / \
c(n2)/256 c(n2)/64 c(n2)/64 c(n2)/16
/ \ / \ / \ / \
In recurrence tree method, we calculate total work done. If the work done at leaves is
polynomially more, then leaves are the dominant part, and our result becomes the work done
at leaves (Case 1). If work done at leaves and root is asymptotically same, then our result
becomes height multiplied by work done at any level (Case 2). If work done at root is
asymptotically more, then our result becomes work done at root (Case 3).
Examples of some standard algorithms whose time complexity can be evaluated using
Master Method
Merge Sort: T(n) = 2T(n/2) + Θ(n). It falls in case 2 as c is 1 and Logba] is also 1. So the
solution is Θ(n Logn)
Binary Search: T(n) = T(n/2) + Θ(1). It also falls in case 2 as c is 0 and Logba is also 0. So the
solution is Θ(Logn)
Notes:
1) It is not necessary that a recurrence of the form T(n) = aT(n/b) + f(n) can be solved using
Master Theorem. The given three cases have some gaps between them. For example, the
recurrence T(n) = 2T(n/2) + n/Logn cannot be solved using master method.
2) Case 2 can be extended for f(n) = Θ(ncLogkn)
If f(n) = Θ(ncLogkn) for some constant k >= 0 and c = Logba, then T(n) = Θ(ncLogk+1n)
Applications of Data Structure and Algorithms
Algorithm is a step-by-step procedure, which defines a set of instructions to be executed in a
certain order to get the desired output. Algorithms are generally created independent of
underlying languages, i.e. an algorithm can be implemented in more than one programming
language.
From the data structure point of view, following are some important categories of algorithms −
Search − Algorithm to search an item in a data structure.
Sort − Algorithm to sort items in a certain order.
Insert − Algorithm to insert item in a data structure.
Update − Algorithm to update an existing item in a data structure.
Delete − Algorithm to delete an existing item from a data structure.
The following computer problems can be solved using Data Structures −
Important Terms
Following are the important terms with respect to tree.
Path − Path refers to the sequence of nodes along the edges of a tree.
Root − The node at the top of the tree is called root. There is only one root per tree and
one path from the root node to any node.
Parent − Any node except the root node has one edge upward to a node called parent.
Child − The node below a given node connected by its edge downward is called its child
node.
Leaf − The node which does not have any child node is called the leaf node.
Subtree − Subtree represents the descendants of a node.
Visiting − Visiting refers to checking the value of a node when control is on the node.
Traversing − Traversing means passing through nodes in a specific order.
Levels − Level of a node represents the generation of a node. If the root node is at level
0, then its next child node is at level 1, its grandchild is at level 2, and so on.
Keys − Key represents a value of a node based on which a search operation is to be
carried out for a node.
Types of Trees
There are three types of trees −
General Trees
Binary Trees
Binary Search Trees
General Trees
General trees are unordered tree data structures where the root node has minimum 0 or
maximum ‘n’ subtrees.
The General trees have no constraint placed on their hierarchy. The root node thus acts like the
superset of all the other subtrees.
UNIT II- HIERARCHICAL DATA STRUCTURES
Binary Search Trees: Basics – Querying a Binary search tree – Insertion and Deletion- Red
Black trees: Properties of Red-Black Trees – Rotations – Insertion – Deletion -B-Trees:
Definition of B -trees – Basic operations on B-Trees – Deleting a key from a B-Tree- Heap –
Heap Implementation – Disjoint Sets - Fibonacci Heaps: structure – Mergeable-heap
operations- Decreasing a key and deleting a node-Bounding the maximum degree.
Representation
BST is a collection of nodes arranged in a way where they maintain BST properties. Each node
has a key and an associated value. While searching, the desired key is compared to the keys in
BST and if found, the associated value is retrieved.
Following is a pictorial representation of BST −
We observe that the root node key (27) has all less-valued keys on the left sub-tree and the
higher valued keys on the right sub-tree.
Basic Operations
Following are the basic operations of a tree −
Search − Searches an element in a tree.
Insert − Inserts an element in a tree.
Pre-order Traversal − Traverses a tree in a pre-order manner.
In-order Traversal − Traverses a tree in an in-order manner.
Post-order Traversal − Traverses a tree in a post-order manner.
Now, let's see the creation of binary search tree using an example.
Suppose the data elements are - 45, 15, 79, 90, 10, 55, 12, 20, 50
o First, we have to insert 45 into the tree as the root of the tree.
o Then, read the next element; if it is smaller than the root node, insert it as the root of the
left subtree, and move to the next element.
o Otherwise, if the element is larger than the root node, then insert it as the root of the
right subtree.
Now, let's see the process of creating the Binary search tree using the given data element. The
process of creating the BST is shown below -
Step 1 - Insert 45. Step 2 - Insert 15. Step 3 - Insert 79.
As 15 is smaller than 45, so insert it as the root node of the left subtree.
As 79 is greater than 45, so insert it as the root node of the right subtree.
90 is greater than 45 and 79, so it will be inserted as the right subtree of 79.
50 is greater than 45 but smaller than 79 and 55. So, it will be inserted as a left subtree of 55.
Now, the creation of binary search tree is completed. After that, let's move towards the
operations that can be performed on Binary search tree.
We can perform insert, delete and search operations on the binary search tree.
Searching means to find or locate a specific element or node in a data structure. In Binary search
tree, searching a node is easy because elements in BST are stored in a specific order. The steps
of searching a node in Binary Search tree are listed as follows -
1. First, compare the element to be searched with the root element of the tree.
2. If root is matched with the target element, then return the node's location.
3. If it is not matched, then check whether the item is less than the root element, if it is
smaller than the root element, then move to the left subtree.
4. If it is larger than the root element, then move to the right subtree.
5. Repeat the above procedure recursively until the match is found.
6. If the element is not found or not present in the tree, then return NULL.
Now, let's understand the searching in binary tree using an example. We are taking the binary
search tree formed above. Suppose we have to find node 20 from the below tree.
Step1: Step2:
Step3:
Now, let's see the algorithm to search an element in the Binary search tree.
Algorithm to search an element in Binary search tree
1. Search (root, item)
2. Step 1 - if (item = root → data) or (root = NULL)
3. return root
4. else if (item < root → data)
5. return Search(root → left, item)
6. else
7. return Search(root → right, item)
8. END if
9. Step 2 - END
Now let's understand how the deletion is performed on a binary search tree. We will also see an
example to delete an element from the given tree.
Insertion O(n)
Deletion O(n)
Search O(n)
o The space complexity of all operations of Binary search tree is O(n).
1. Search O(log n)
2. Insert O(log n)
3. Delete O(log n)
The following are some rules used to create the Red-Black tree:
1. If the tree is empty, then we create a new node as a root node with the color black.
2. If the tree is not empty, then we create a new node as a leaf node with a color red.
3. If the parent of a new node is black, then exit.
4. If the parent of a new node is Red, then we have to check the color of the parent's sibling
of a new node.
4a) If the color is Black, then we perform rotations and recoloring.
4b) If the color is Red then we recolor the node. We will also check whether the parents' parent
of a new node is the root node or not; if it is not a root node, we will recolor and recheck the
node.
Step 1: Initially, the tree is empty, so we create a new node having value 10. This is the first
node of the tree, so it would be the root node of the tree. As we already discussed, that root node
must be black in color, which is shown below:
Step 2: The next node is 18. As 18 is greater than 10 so it will come at the right of 10 as shown
below.
We know the second rule of the Red Black tree that if the tree is not empty then the newly
created node will have the Red color. Therefore, node 18 has a Red color, as shown in the below
figure:
Now we verify the third rule of the Red-Black tree, i.e., the parent of the new node is black or
not. In the above figure, the parent of the node is black in color; therefore, it is a Red-Black tree.
Step 3: Now, we create the new node having value 7 with Red color. As 7 is less than 10, so it
will come at the left of 10 as shown below.
Now we verify the third rule of the Red-Black tree, i.e., the parent of the new node is black or
not. As we can observe, the parent of the node 7 is black in color, and it obeys the Red-Black
tree's properties.
Step 4: The next element is 15, and 15 is greater than 10, but less than 18, so the new node will
be created at the left of node 18. The node 15 would be Red in color as the tree is not empty.
The above tree violates the property of the Red-Black tree as it has Red-red parent-child
relationship. Now we have to apply some rule to make a Red-Black tree. The rule 4 says that if
the new node's parent is Red, then we have to check the color of the parent's sibling of a new
node. The new node is node 15; the parent of the new node is node 18 and the sibling of the
parent node is node 7. As the color of the parent's sibling is Red in color, so we apply the rule
4a. The rule 4a says that we have to recolor both the parent and parent's sibling node. So, both
the nodes, i.e., 7 and 18, would be recolored as shown in the below figure.
We also have to check whether the parent's parent of the new node is the root node or not. As
we can observe in the above figure, the parent's parent of a new node is the root node, so we do
not need to recolor it.
Step 5: The next element is 16. As 16 is greater than 10 but less than 18 and greater than 15, so
node 16 will come at the right of node 15. The tree is not empty; node 16 would be Red in color,
as shown in the below figure:
In the above figure, we can observe that it violates the property of the parent-child relationship
as it has a red-red parent-child relationship. We have to apply some rules to make a Red-Black
tree. Since the new node's parent is Red color, and the parent of the new node has no sibling, so
rule 4a will be applied. The rule 4a says that some rotations and recoloring would be performed
on the tree.
Since node 16 is right of node 15 and the parent of node 15 is node 18. Node 15 is the left of
node 18. Here we have an LR relationship, so we require to perform two rotations. First, we
will perform left, and then we will perform the right rotation. The left rotation would be
performed on nodes 15 and 16, where node 16 will move upward, and node 15 will move
downward. Once the left rotation is performed, the tree looks like as shown in the below figure:
In the above figure, we can observe that there is an LL relationship. The above tree has a Red-
red conflict, so we perform the right rotation. When we perform the right rotation, the median
element would be the root node. Once the right rotation is performed, node 16 would become
the root node, and nodes 15 and 18 would be the left child and right child, respectively, as shown
in the below figure.
After rotation, node 16 and node 18 would be recolored; the color of node 16 is red, so it will
change to black, and the color of node 18 is black, so it will change to a red color as shown in
the below figure:
Step 6: The next element is 30. Node 30 is inserted at the right of node 18. As the tree is not
empty, so the color of node 30 would be red.
The color of the parent and parent's sibling of a new node is Red, so rule 4b is applied. In rule
4b, we have to do only recoloring, i.e., no rotations are required. The color of both the parent
(node 18) and parent's sibling (node 15) would become black, as shown in the below image.
We also have to check the parent's parent of the new node, whether it is a root node or not. The
parent's parent of the new node, i.e., node 30 is node 16 and node 16 is not a root node, so we
will recolor the node 16 and changes to the Red color. The parent of node 16 is node 10, and it
is not in Red color, so there is no Red-red conflict.
Step 7: The next element is 25, which we have to insert in a tree. Since 25 is greater than 10,
16, 18 but less than 30; so, it will come at the left of node 30. As the tree is not empty, node 25
would be in Red color. Here Red-red conflict occurs as the parent of the newly created is Red
color.
Since there is no parent's sibling, so rule 4a is applied in which rotation, as well as recoloring,
are performed. First, we will perform rotations. As the newly created node is at the left of its
parent and the parent node is at the right of its parent, so the RL relationship is formed. Firstly,
the right rotation is performed in which node 25 goes upwards, whereas node 30 goes
downwards, as shown in the below figure.
After the first rotation, there is an RR relationship, so left rotation is performed. After right
rotation, the median element, i.e., 25 would be the root node; node 30 would be at the right of
25 and node 18 would be at the left of node 25.
Now recoloring would be performed on nodes 25 and 18; node 25 becomes black in color, and
node 18 becomes red in color.
Step 8: The next element is 40. Since 40 is greater than 10, 16, 18, 25, and 30, so node 40 will
come at the right of node 30. As the tree is not empty, node 40 would be Red in color. There is
a Red-red conflict between nodes 40 and 30, so rule 4b will be applied.
As the color of parent and parent's sibling node of a new node is Red so recoloring would be
performed. The color of both the nodes would become black, as shown in the below image.
After recoloring, we also have to check the parent's parent of a new node, i.e., 25, which is not
a root node, so recoloring would be performed, and the color of node 25 changes to Red.
After recoloring, red-red conflict occurs between nodes 25 and 16. Now node 25 would be
considered as the new node. Since the parent of node 25 is red in color, and the parent's sibling
is black in color, rule 4a would be applied. Since 25 is at the right of the node 16 and 16 is at
the right of its parent, so there is an RR relationship. In the RR relationship, left rotation is
performed. After left rotation, the median element 16 would be the root node, as shown in the
below figure.
After rotation, recoloring is performed on nodes 16 and 10. The color of node 10 and node 16
changes to Red and Black, respectively as shown in the below figure.
Step 9: The next element is 60. Since 60 is greater than 16, 25, 30, and 40, so node 60 will come
at the right of node 40. As the tree is not empty, the color of node 60 would be Red.
As we can observe in the above tree that there is a Red-red conflict occurs. The parent node is
Red in color, and there is no parent's sibling exists in the tree, so rule 4a would be applied. The
first rotation would be performed. The RR relationship exists between the nodes, so left rotation
would be performed.
When left rotation is performed, node 40 will come upwards, and node 30 will come
downwards, as shown in the below figure:
After rotation, the recoloring is performed on nodes 30 and 40. The color of node 30 would
become Red, while the color of node 40 would become black.
The above tree is a Red-Black tree as it follows all the Red-Black tree properties.
Initially, we are having the address of the root node. First, we will apply BST to search the node.
Since 30 is greater than 10 and 20, which means that 30 is the right child of node 20. Node 30
is a leaf node and Red in color, so it is simply deleted from the tree.
If we want to delete the internal node that has one child. First, replace the value of the internal
node with the value of the child node and then simply delete the child node.
Let's take another example in which we want to delete the internal node, i.e., node 20.
We cannot delete the internal node; we can only replace the value of that node with another
value. Node 20 is at the right of the root node, and it is having only one child, node 30. So, node
20 is replaced with a value 30, but the color of the node would remain the same, i.e., Black. In
the end, node 20 (leaf node) is deleted from the tree.
If we want to delete the internal node that has two child nodes. In this case, we have to decide
from which we have to replace the value of the internal node (either left subtree or right subtree).
We have two ways:
o Inorder predecessor: We will replace with the largest value that exists in the left
subtree.
o Inorder successor: We will replace with the smallest value that exists in the right
subtree.
Suppose we want to delete node 30 from the tree, which is shown below:
Node 30 is at the right of the root node. In this case, we will use the inorder successor. The
value 38 is the smallest value in the right subtree, so we will replace the value 30 with 38, but
the node would remain the same, i.e., Red. After replacement, the leaf node, i.e., 30, would be
deleted from the tree. Since node 30 is a leaf node and Red in color, we need to delete it (we do
not have to perform any rotations or any recoloring).
Case 2: If the root node is also double black, then simply remove the double black and make it
a single black.
Case 3: If the double black's sibling is black and both its children are black.
o Remove the double black node.
o Add the color of the node to the parent (P) node.
1. If the color of P is red then it becomes black.
2. If the color of P is black, then it becomes double black.
o The color of double black's sibling changes to red.
o If still double black situation arises, then we will apply other cases.
Once the swapping of the color is completed, the rotation towards the double black would be
performed. The node 30 will move upwards and the node 20 will move downwards as shown
in the below figure.
In the above tree, we can observe that double black situation still exists in the tree. It satisfies
the case 3 in which double black's sibling is black as well as both its children are black. First,
we remove the double black from the node and add the black color to its parent node. At the
end, the color of the double black's sibling, i.e., node 25 changes to Red as shown in the below
figure.
In the above tree, we can observe that the double black situation has been resolved. It also
satisfies the properties of the Red Black tree.
Case 5: If double black's sibling is black, sibling's child who is far from the double black is
black, but near child to double black is red.
o Swap the color of double black's sibling and the sibling child which is nearer to the
double black node.
o Rotate the sibling in the opposite direction of the double black.
o Apply case 6
First, we replace the value 1 with the nil value. The node becomes double black as both the
nodes, i.e., 1 and nil are black. It satisfies the case 3 that implies if DB's sibling is black and
both its children are black. First, we remove the double black of the nil node. Since the parent
of DB is Black, so when the black color is added to the parent node then it becomes double
black. After adding the color, the double black's sibling color changes to Red as shown below.
We can observe in the above screenshot that the double black problem still exists in the tree.
So, we will reapply the cases. We will apply case 5 because the sibling of node 5 is node 30,
which is black in color, the child of node 30, which is far from node 5 is black, and the child of
the node 30 which is near to node 5 is Red. In this case, first we will swap the color of node 30
and node 25 so the color of node 30 changes to Red and the color of node 25 changes to Black
as shown below.
Once the swapping of the color between the nodes is completed, we need to rotate the sibling
in the opposite direction of the double black node. In this rotation, the node 30 moves
downwards while the node 25 moves upwards as shown below.
As we can observe in the above tree that double black situation still exists. So, we need to case
6. Let's first see what is case 6.
Now we will apply case 6 in the above example to solve the double black's situation.
In the above example, the double black is node 5, and the sibling of node 5 is node 25, which is
black in color. The far child of the double black node is node 30, which is Red in color as shown
in the below figure:
First, we will swap the colors of Parent and its sibling. The parent of node 5 is node 10, and the
sibling node is node 25. The colors of both the nodes are black, so there is no swapping would
occur.
In the second step, we need to rotate the parent in the double black's direction. After rotation,
node 25 will move upwards, whereas node 10 will move downwards. Once the rotation is
performed, the tree would like, as shown in the below figure:
In the next step, we will remove double black from node 5 and node 5 will give its black color
to the far child, i.e., node 30. Therefore, the color of node 30 changes to black as shown in the
below figure.
B Tree:
B Tree is a specialized m-way tree that can be widely used for disk access. A B-Tree of order
m can have at most m-1 keys and m children. One of the main reason of using B tree is its
capability to store large number of keys in a single node and large key values by keeping the
height of the tree relatively small.
A B tree of order m contains all the properties of an M way tree. In addition, it contains the
following properties.
1. Every node in a B-Tree contains at most m children.
2. Every node in a B-Tree except the root node and the leaf node contain at least m/2
children.
3. The root nodes must have at least 2 nodes.
4. All leaf nodes must be at the same level.
It is not necessary that, all the nodes contain the same number of children but, each node must
have m/2 number of nodes.
Operations
Searching :
Searching in B Trees is similar to that in Binary search tree. For example, if we search for an
item 49 in the following B Tree. The process will something like following :
1. Compare item 49 with root node 78. since 49 < 78 hence, move to its left sub-tree.
2. Since, 40<49<56, traverse right sub-tree of 40.
3. 49>45, move to right. Compare 49.
4. match found, return.
Searching in a B tree depends upon the height of the tree. The search algorithm takes O(log n)
time to search any element in a B tree.
Insertionof B-Tree:
Insertions are done at the leaf node level. The following algorithm needs to be followed in order
to insert an item into B Tree.
1. Traverse the B Tree in order to find the appropriate leaf node at which the node can be
inserted.
2. If the leaf node contain less than m-1 keys then insert the element in the increasing order.
3. Else, if the leaf node contains m-1 keys, then follow the following steps.
o Insert the new element in the increasing order of elements.
o Split the node into the two nodes at the median.
o Push the median element upto its parent node.
o If the parent node also contain m-1 number of keys, then split it too by following
the same steps.
Example:
Insert the node 8 into the B Tree of order 5 shown in the following image.
The node, now contain 5 keys which is greater than (5 -1 = 4 ) keys. Therefore split the node
from the median i.e. 8 and push it up to its parent node shown as follows.
Deletion of B-Tree:
Deletion is also performed at the leaf nodes. The node which is to be deleted can either be a leaf
node or an internal node. Following algorithm needs to be followed in order to delete a node
from a B tree.
1. Locate the leaf node.
2. If there are more than m/2 keys in the leaf node then delete the desired key from the
node.
3. If the leaf node doesn't contain m/2 keys then complete the keys by taking the element
from eight or left sibling.
o If the left sibling contains more than m/2 elements then push its largest element
up to its parent and move the intervening element down to the node where the
key is deleted.
o If the right sibling contains more than m/2 elements then push its smallest
element up to the parent and move intervening element down to the node where
the key is deleted.
4. If neither of the sibling contain more than m/2 elements then create a new leaf node by
joining two leaf nodes and the intervening element of the parent node.
5. If parent is left with less than m/2 nodes then, apply the above process on the parent too.
If the the node which is to be deleted is an internal node, then replace the node with its in-order
successor or predecessor. Since, successor or predecessor will always be on the leaf node hence,
the process will be similar as the node is being deleted from the leaf node.
Example 1
Delete the node 53 from the B Tree of order 5 shown in the following figure.
Now, 57 is the only element which is left in the node, the minimum number of elements that
must be present in a B tree of order 5, is 2. it is less than that, the elements in its left and right
sub-tree are also not sufficient therefore, merge it with the left sibling and intervening element
of parent i.e. 49.
The final B tree is shown as follows.
Application of B tree
B tree is used to index the data and provides fast access to the actual data stored on the disks
since, the access to value stored in a large database that is stored on a disk is a very time
consuming process.
Searching an un-indexed and unsorted database containing n key values needs O(n) running
time in worst case. However, if we use B Tree to index this database, it will be searched in
O(log n) time in worst case.
Heap:
A Heap is a special Tree-based data structure in which the tree is a complete binary tree.
Operations of Heap Data Structure:
Heapify: a process of creating a heap from an array.
Insertion: process to insert an element in existing heap time complexity O(log N).
Deletion: deleting the top element of the heap or the highest priority element, and then
organizing the heap and returning the element with time complexity O(log N).
Peek: to check or find the first (or can say the top) element of the heap.
Suppose we want to create the max heap tree. To create the max heap tree, we need
to consider the following two cases:
o First, we have to insert the element in such a way that the property of the
complete binary tree must be maintained.
o Secondly, the value of the parent node should be greater than the either of its
child.
Step 2: The next element is 33. As we know that insertion in the binary tree always
starts from the left side so 44 will be added at the left of 33 as shown below:
Step 3: The next element is 77 and it will be added to the right of the 44 as shown
below:
As we can observe in the above tree that it does not satisfy the max heap property, i.e.,
parent node 44 is less than the child 77. So, we will swap these two values as shown
below:
Step 4: The next element is 11. The node 11 is added to the left of 33 as shown below:
Step 5: The next element is 55. To make it a complete binary tree, we will add the node
55 to the right of 33 as shown below:
As we can observe in the above figure that it does not satisfy the property of the max
heap because 33<55, so we will swap these two values as shown below:
Step 6: The next element is 88. The left subtree is completed so we will add 88 to the
left of 44 as shown below:
As we can observe in the above figure that it does not satisfy the property of the max
heap because 44<88, so we will swap these two values as shown below:
Again, it is violating the max heap property because 88>77 so we will swap these two
values as shown below:
Step 7: The next element is 66. To make a complete binary tree, we will add the 66
element to the right side of 77 as shown below:
In the above figure, we can observe that the tree satisfies the property of max heap;
therefore, it is a heap tree.
In Deletion in the heap tree, the root node is always deleted and it is replaced with the
last element.
Step 1: In the above tree, the first 30 node is deleted from the tree and it is replaced
with the 15 element as shown below:
Now we will heapify the tree. We will check whether the 15 is greater than either of its
child or not. 15 is less than 20 so we will swap these two values as shown below:
Again, we will compare 15 with its child. Since 15 is greater than 10 so no swapping will
occur.
Disjoint set
The disjoint set data structure is also known as union-find data structure and merge-
find set. It is a data structure that contains a collection of disjoint or non-overlapping
sets. The disjoint set means that when the set is partitioned into the disjoint subsets.
The various operations can be performed on the disjoint subsets. In this case, we can
add new sets, we can merge the sets, and we can also find the representative member
of a set. It also allows to find out whether the two elements are in the same set or not
efficiently.
The disjoint set can be defined as the subsets where there is no common element
between the two sets. Let's understand the disjoint sets through an example.
s1 = {1, 2, 3, 4}
s2 = {5, 6, 7, 8}
When the union operation is applied, the set would be represented as:
s1Us2 = {1, 2, 3, 4, 5, 6, 7, 8}
Suppose we add one more edge between 1 and 5. Now the final set can be represented
as:
s3 = {1, 2, 3, 4, 5, 6, 7, 8}
Each vertex is labelled with some weight. There is a universal set with 8 vertices. We will
consider each edge one by one and form the sets.
First, we consider vertices 1 and 2. Both belong to the universal set; we perform the
union operation between elements 1 and 2. We will add the elements 1 and 2 in a set
s1 and remove these two elements from the universal set shown below:
s1 = {1, 2}
The vertices that we consider now are 3 and 4. Both the vertices belong to the universal
set; we perform the union operation between elements 3 and 4. We will form the set
s3 having elements 3 and 4 and remove the elements from the universal set shown as
below:
s2 = {3, 4}
Fibonacci Heap
Heaps are the abstract data type which is used to show the relationship between parents and
children: Heap is categorized into Min-Heap and Max-Heap:
1. A min-heap is a tree in which, for all the nodes, the key value of the parent must be
smaller than the key value of the children.
2. A max-heap is a tree in which, for all the nodes, the key value of the parent must be
greater than the key value of the children.
Define Fibonacci Heap:
Fibonacci Heap - A Fibonacci heap is defined as the collection of rooted-tree in which all the
trees must hold the property of Min-heap. That is, for all the nodes, the key value of the parent
node should be greater than the key value of the parent node:
Structure
A Fibonacci heap is a collection of trees satisfying the minimum-heap property, that is, the key
of a child is always greater than or equal to the key of the parent. This implies that the minimum
key is always at the root of one of the trees.
Implementation of operations
Operation find minimum is now trivial because we keep the pointer to the node containing it.
It does not change the potential of the heap, therefore both actual and amortized cost are
constant.
As mentioned above, merge is implemented simply by concatenating the lists of tree roots of
the two heaps. This can be done in constant time and the potential does not change, leading
again to constant amortized time.
Operation insert works by creating a new heap with one element and doing merge. This takes
constant time, and the potential increases by one, because the number of trees increases. The
amortized cost is thus still constant.
Operation extract minimum (same as delete minimum) operates in three phases. First we take
the root containing the minimum element and remove it. Its children will become roots of new
trees. If the number of children was d, it takes time O(d) to process all new roots and the
potential increases by d−1. Therefore, the amortized running time of this phase is O(d)
= O(log n).
Operation decrease key will take the node, decrease the key and if the heap property becomes
violated (the new key is smaller than the key of the parent), the node is cut from its parent. If
the parent is not a root, it is marked. If it has been marked already, it is cut as well and its
parent is marked. We continue upwards until we reach either the root or an unmarked node.
Now we set the minimum pointer to the decreased value if it is the new minimum. In the
process we create some number, say k, of new trees. Each of these new trees except possibly
the first one was marked originally but as a root it will become unmarked. One node can
become marked. Therefore, the number of marked nodes changes by −(k − 1) + 1 = − k + 2.
Combining these 2 changes, the potential changes by 2(−k + 2) + k = −k + 4. The actual time
to perform the cutting was O(k), therefore (again with a sufficiently large choice of c) the
amortized running time is constant.
Finally, operation delete can be implemented simply by decreasing the key of the element to
be deleted to minus infinity, thus turning it into the minimum of the whole heap. Then we call
extract minimum to remove it. The amortized running time of this operation is O(log n).
proved by induction) that for all integers , where . (We then have , and
1. It can have multiple trees of equal degrees, and each tree doesn't need to have 2^k nodes.
2. All the trees in the Fibonacci Heap are rooted but not ordered.
3. All the roots and siblings are stored in a separated circular-doubly-linked list.
4. The degree of a node is the number of its children. Node X -> degree = Number of X's
children.
5. Each node has a mark-attribute in which it is marked TRUE or FALSE. The FALSE
indicates the node has not any of its children. The TRUE represents that the node has
lost one child. The newly created node is marked FALSE.
6. The potential function of the Fibonacci heap is F(FH) = t[FH] + 2 * m[FH]
7. The Fibonacci Heap (FH) has some important technicalities listed below:
1. min[FH] - Pointer points to the minimum node in the Fibonacci Heap
2. n[FH] - Determines the number of nodes
3. t[FH] - Determines the number of rooted trees
4. m[FH] - Determines the number of marked nodes
5. F(FH) - Potential Function.
UNIT III – GRAPHS
Adjacency lists can readily be adapted to represent weighted graphs, that is, graphs for which
each edge has an associated weight, typically given by a weight function w : E R. For
example, let G = (V, E) be a weighted graph with weight function w. The weight w(u,v) of the
edge (u,v) E is simply stored with vertex v in u's adjacency list. The adjacency-list
representation is quite robust in that it can be modified to support many other graph variants.
For the adjacency-matrix representation of a graph G = (V, E), we assume that the vertices are
numbered 1, 2, . . . , |V| in some arbitrary manner. The adjacency-matrix representation of a
graph G then consists of a |V| |V| matrix A = (aij) such that
Breadth-first search
Breadth-first search is one of the simplest algorithms for searching a graph and the archetype
for many important graph algorithms. Dijkstra's single-source shortest-paths algorithm
(Chapter 25) and Prim's minimum-spanning-tree algorithm (Section 24.2) use ideas similar
to those in breadth-first search.
BFS(G,s)
1 for each vertex u V[G] - {s}
2 do color[u] WHITE
3 d[u]
4 [u] NIL
5 color[s] GRAY
6 d[s] 0
7 [s] NIL
8 Q {s}
9 while Q
10 do u head[Q]
11 for each v Adj[u]
12 do if color[v] = WHITE
13 then color[v] GRAY
14 d[v] d[u] + 1
15 [v] u
16 ENQUEUE(Q,v)
17 DEQUEUE(Q)
18 color[u] BLACK
The procedure BFS works as follows. Lines 1-4 paint every vertex white, set d [u]
to be infinity for every vertex u, and set the parent of every vertex to be NIL. Line 5
paints the source vertex s gray, since it is considered to be discovered when the
procedure begins. Line 6 initializes d[s] to 0, and line 7 sets the predecessor of the
source to be NIL. Line 8 initializes Q to the queue containing just the vertex s;
thereafter, Q always contains the set of gray vertices.
The main loop of the program is contained in lines 9-18. The loop iterates as long
as there remain gray vertices, which are discovered vertices that have not yet had
their adjacency lists fully examined. Line 10 determines the gray vertex u at the
head of the queue Q. The for loop of lines 11-16 considers each vertex v in the
adjacency list of u. If v is white, then it has not yet been discovered, and the
algorithm discovers it by executing lines 13-16. It is first grayed, and its
distance d[v] is set to d[u] + 1. Then, u is recorded as its parent. Finally, it is placed
at the tail of the queue Q. When all the vertices on u's adjacency list have been
examined, u is removed from Q and blackened in lines 17-18.
Analysis
Before proving all the various properties of breadth-first search, we take on the somewhat
easier job of analyzing its running time on an input graph G = (V,E). After initialization, no
vertex is ever whitened, and thus the test in line 12 ensures that each vertex is enqueued at most
once, and hence dequeued at most once. The operations of enqueuing and dequeuing take O(1)
time, so the total time devoted to queue operations is O(V). Because the adjacency list of each
vertex is scanned only when the vertex is dequeued, the adjacency list of each vertex is scanned
at most once. Since the sum of the lengths of all the adjacency lists is (E), at most O(E) time
is spent in total scanning adjacency lists. The overhead for initialization is O(V), and thus the
total running time of BFS is O(V + E). Thus, breadth-first search runs in time linear in the size
of the adjacency- list representation of G.
Depth-first search
As in breadth-first search, whenever a vertex v is discovered during a scan of the adjacency list
of an already discovered vertex u, depth-first search records this event by setting v's
predecessor field [v] to u. Unlike breadth-first search, whose predecessor subgraph forms a
tree, the predecessor subgraph produced by a depth-first search may be composed of several
trees, because the search may be repeated from multiple sources. The predecessor subgraph of
a depth-first search is therefore defined slightly differently from that of a breadth-first search.
DFS(G)
1 for each vertex u V[G]
2 do color[u] WHITE
3 [u] NIL
4 time 0
5 for each vertex u V[G]
6 do if color[u] = WHITE
7 then DFS-VISIT(u)
Figure 23.4 illustrates the progress of DFS on the graph shown in Figure 23.2.
Procedure DFS works as follows. Lines 1-3 paint all vertices white and initialize their fields
to NIL. Line 4 resets the global time counter. Lines 5-7 check each vertex in V in turn and,
when a white vertex is found, visit it using DFS-VISIT. Every time DFS-VISIT(u) is called in
line 7, vertex u becomes the root of a new tree in the depth-first forest. When DFS returns,
every vertex u has been assigned a discovery time d[u] and a finishing time â[u].
In each call DFS-VISIT(u), vertex u is initially white. Line 1 paints u gray, and line 2 records
the discovery time d[u] by incrementing and saving the global variable time. Lines 3-6 examine
each vertex v adjacent to u and recursively visit v if it is white. As each vertex v Adj[u] is
considered in line 3, we say that edge (u, v) is explored by the depth-first search. Finally, after
every edge leaving u has been explored, lines 7-8 paint u black and record the finishing time
in â[u].
Topological sort
A topological sort of a dag G = (V, E) is a linear ordering of all its vertices such
that if G contains an edge (u, v), then u appears before v in the ordering. (If
the graph is not acyclic, then no linear ordering is possible.) A topological sort
of a graph can be viewed as an ordering of its vertices along a horizontal line
so that all directed edges go from left to right. Topological sorting is thus
different from the usual kind of "sorting".
STRONGLY-CONNECTED-COMPONENTS(G)
1 call DFS(G) to compute finishing times f[u] for each vertex u
2 compute GT
3 call DFS(GT), but in the main loop of DFS, consider the vertices
in order of decreasing f[u] (as computed in line 1)
4 output the vertices of each tree in the depth-first forest of step 3 as a
separate strongly connected component
There are two famous algorithms for finding the Minimum Spanning Tree:
Kruskal’s Algorithm
Kruskal’s Algorithm builds the spanning tree by adding edges one by one into a growing spanning tree.
Kruskal's algorithm follows greedy approach as in each iteration it finds an edge which has least weight and
add it to the growing spanning tree.
Algorithm Steps:
Sort the graph edges with respect to their weights.
Start adding edges to the MST from the edge with the smallest weight until the edge of the largest weight.
Only add edges which doesn't form a cycle , edges which connect only disconnected components.
So now the question is how to check if 2 vertices are connected or not ?
This could be done using DFS which starts from the first vertex, then check if the second vertex is visited or
not. But DFS will make time complexity large as it has an order of �(�+�) where � is the number of
vertices, � is the number of edges. So the best solution is "Disjoint Sets":
Disjoint sets are sets whose intersection is the empty set so it means that they don't have any element in
common.
Consider following example:
In Kruskal’s algorithm, at each iteration we will select the edge with the lowest weight. So, we will start
with the lowest weighted edge first i.e., the edges with weight 1. After that we will select the second lowest
weighted edge i.e., edge with weight 2. Notice these two edges are totally disjoint. Now, the next edge will
be the third lowest weighted edge i.e., edge with weight 3, which connects the two disjoint pieces of the
graph. Now, we are not allowed to pick the edge with weight 4, that will create a cycle and we can’t have
any cycles. So we will select the fifth lowest weighted edge i.e., edge with weight 5. Now the other two
edges will create cycles so we will ignore them. In the end, we end up with a minimum spanning tree with
total cost 11 ( = 1 + 2 + 3 + 5).
Prim’s Algorithm
Prim’s Algorithm also use Greedy approach to find the minimum spanning tree. In Prim’s
Algorithm we grow the spanning tree from a starting position. Unlike an edge in Kruskal's,
we add vertex to the growing spanning tree in Prim's.
Algorithm Steps:
Maintain two disjoint sets of vertices. One containing vertices that are in the growing
spanning tree and other that are not in the growing spanning tree.
Select the cheapest vertex that is connected to the growing spanning tree and is not in
the growing spanning tree and add it into the growing spanning tree. This can be done
using Priority Queues. Insert the vertices, that are connected to growing spanning tree,
into the Priority Queue.
Check for cycles. To do that, mark the nodes which have been already selected and
insert only those nodes in the Priority Queue that are not marked.
Time Complexity:
The time complexity of the Prim’s Algorithm is because each edge is inserted in the
priority queue only once and insertion in priority queue take logarithmic time.
The single-source shortest path problem, in which we have to find shortest paths from a source
vertex v to all other vertices in the graph.
The single-destination shortest path problem, in which we have to find shortest paths from all
vertices in the directed graph to a single destination vertex v. This can be reduced to the single-source
shortest path problem by reversing the arcs in the directed graph.
The all-pairs shortest path problem, in which we have to find shortest paths between every pair of
vertices v, v' in the graph.
Bellman-Ford algorithm finds the distance in bottom up manner. At first it finds those
distances which have only one edge in the path. After that increase the path length to find all
possible solutions.
Input − The cost matrix of the graph:
06∞7∞
∞ 0 5 8 -4
∞ -2 0 ∞ ∞
∞ ∞ -3 0 9
2∞7∞0
Output − Source Vertex: 2Vert: 0 1 2 3 4Dist: -4 -2 0 3 -6Pred: 4 2 -1 0 1The graph has no
negative edge cycle
Algorithm
bellmanFord(dist, pred, source)
Input − Distance list, predecessor list and the source vertex.
Output − True, when a negative cycle is found.
Begin
iCount := 1
maxEdge := n * (n - 1) / 2 //n is number of vertices
for all vertices v of the graph, do
dist[v] := ∞
pred[v] := ϕ
done
dist[source] := 0
eCount := number of edges present in the graph
create edge list named edgeList
while iCount < n, do
for i := 0 to eCount, do
if dist[edgeList[i].v] > dist[edgeList[i].u] + (cost[u,v] for edge i)
dist[edgeList[i].v] > dist[edgeList[i].u] + (cost[u,v] for edge i)
pred[edgeList[i].v] := edgeList[i].u
done
done
iCount := iCount + 1
for all vertices i in the graph, do
if dist[edgeList[i].v] > dist[edgeList[i].u] + (cost[u,v] for edge i), then
return true
done
return false
End
Single-Source Shortest paths in Directed Acyclic Graphs
Dijkstra’s Algorithm – Single Source Shortest Path Algorithm
Dijkstra’s Algorithm is also known as Single Source Shortest Path (SSSP) problem. It is used
to find the shortest path from source node to destination node in graph.
The graph is widely accepted data structure to represent distance map. The distance between
cities effectively represented using graph.
Dijkstra proposed an efficient way to find the single source shortest path from the
weighted graph. For a given source vertex s, the algorithm finds the shortest path to
every other vertex v in the graph.
Assumption : Weight of all edges is non-negative.
Steps of the Dijkstra’s algorithm are explained here:
1. Initializes the distance of source vertex to zero and remaining all other vertices to
infinity.
2. Set source node to current node and put remaining all nodes in the list of unvisited
vertex list. Compute the tentative distance of all immediate neighbour vertex of the current
node.
3. If the newly computed value is smaller than the old value, then update it.
For example, C is the current node, whose distance from source S is dist (S, C) = 5.
d(S, N) = 11
d(S, N) = 7
d(S, C) + d(C, N) < d(S, N) ⇒ Relax
d(S, C) + d(C, N) > d(S, N) ⇒ Don’t
edge (S, N)
update d(S, N)
Update d(S, N) = 8
Weight updating in Dijkstra’s algorithm
4. When all the neighbours of a current node are explored, mark it as visited. Remove it
from unvisited vertex list. Mark the vertex from unvisited vertex list with minimum distance
and repeat the procedure.
5. Stop when the destination node is tested or when unvisited vertex list becomes empty.
Dynamic programming
Dynamic Programming (DP) is defined as a technique that solves some particular
type of problems in Polynomial Time. Dynamic Programming solutions are faster than
the exponential brute method and can be easily proved their correctness.
Characteristics of Dynamic Programming Algorithm:
In general, dynamic programming (DP) is one of the most powerful techniques for
solving a certain class of problems.
There is an elegant way to formulate the approach and a very simple thinking process,
and the coding part is very easy.
Essentially, it is a simple idea, after solving a problem with a given input, save the
result as a reference for future use, so you won’t have to re-solve it.. briefly ‘Remember
your Past’ :).
It is a big hint for DP if the given problem can be broken up into smaller sub-problems,
and these smaller subproblems can be divided into still smaller ones, and in this
process, you see some overlapping subproblems.
Additionally, the optimal solutions to the subproblems contribute to the optimal
solution of the given problem (referred to as the Optimal Substructure Property).
The solutions to the subproblems are stored in a table or array (memoization) or in a
bottom-up manner (tabulation) to avoid redundant computation.
The solution to the problem can be constructed from the solutions to the subproblems.
Dynamic programming can be implemented using a recursive algorithm, where the
solutions to subproblems are found recursively, or using an iterative algorithm, where
the solutions are found by working through the subproblems in a specific order.
Dynamic programming works on following principles:
Characterize structure of optimal solution, i.e. build a mathematical model of the
solution.
Recursively define the value of the optimal solution.
Using bottom-up approach, compute the value of the optimal solution for each possible
subproblems.
Construct optimal solution for the original problem using information computed in the
previous step.
Applications:
ynamic programming is used to solve optimization problems. It is used to solve many real-
life problems such as,
(i) Make a change problem
(ii) Knapsack problem
(iii) Optimal binary search tree
Floyd–Warshall algorithm
The Floyd–Warshall algorithm compares many possible paths through the graph between each
pair of vertices. It is guaranteed to find all shortest paths and is able to do this
with comparisons in a graph, even though there may be edges in the graph. It does
so by incrementally improving an estimate on the shortest path between two vertices, until the
estimate is optimal.
Example
Algorithm
Step 1 − Construct an adjacency matrix A with all the costs of edges present in the graph. If
there is no path between two vertices, mark the value as ∞.
Step 2 − Derive another adjacency matrix A1 from A keeping the first row and first column of
the original adjacency matrix intact in A1. And for the remaining values, say A1[i,j],
if A[i,j]>A[i,k]+A[k,j] then replace A1[i,j] with A[i,k]+A[k,j]. Otherwise, do not change the
values. Here, in this step, k = 1 (first vertex acting as pivot).
Step 3 − Repeat Step 2 for all the vertices in the graph by changing the k value for every pivot
vertex until the final matrix is achieved.
Step 4 − The final adjacency matrix obtained is the final solution with all the shortest paths.
Analysis
From the pseudocode above, the Floyd-Warshall algorithm operates using three for loops to
find the shortest distance between all pairs of vertices within a graph. Therefore, the time
complexity of the Floyd-Warshall algorithm is O(n3), where ‘n’ is the number of vertices in
the graph. The space complexity of the algorithm is O(n2).
UNIT IV - ALGORITHM DESIGN TECHNIQUES
Dynamic Programming: Matrix-Chain Multiplication – Elements of Dynamic Programming –
Longest Common Subsequence- Greedy Algorithms: – Elements of the Greedy Strategy- An
Activity-Selection Problem - Huffman Coding.
Dynamic programming
Dynamic programming approach is similar to divide and conquer in breaking down the
problem into smaller and yet smaller possible sub-problems. But unlike divide and conquer,
these sub-problems are not solved independently. Rather, results of these smaller sub-problems
are remembered and used for similar or overlapping sub-problems.
Mostly, dynamic programming algorithms are used for solving optimization problems. Before
solving the in-hand sub-problem, dynamic algorithm will try to examine the results of the
previously solved sub-problems. The solutions of sub-problems are combined in order to
achieve the best optimal final solution. This paradigm is thus said to be using Bottom-up
approach.
So we can conclude that −
The problem should be able to be divided into smaller overlapping sub-problem.
Final optimum solution can be achieved by using an optimum solution of smaller sub-
problems.
Dynamic algorithms use memorization.
Steps of Dynamic Programming Approach
Dynamic Programming algorithm is designed using the following four steps −
Characterize the structure of an optimal solution.
Recursively define the value of an optimal solution.
Compute the value of an optimal solution, typically in a bottom-up fashion.
Construct an optimal solution from the computed information.
Examples
Fibonacci number series
Knapsack problem
Tower of Hanoi
All pair shortest path by Floyd-Warshall and Bellman Ford
Shortest path by Dijkstra
Project scheduling
Matrix Chain Multiplication
Greedy Algorithms
A greedy algorithm, as the name suggests, always makes the choice that seems to be the best
at that moment. This means that it makes a locally-optimal choice in the hope that this choice
will lead to a globally-optimal solution.
Assume that you have an objective function that needs to be optimized (either maximized or
minimized) at a given point. A Greedy algorithm makes greedy choices at each step to ensure
that the objective function is optimized. The Greedy algorithm has only one shot to compute
the optimal solution so that it never goes back and reverses the decision.
1. It is quite easy to come up with a greedy algorithm (or even multiple greedy
algorithms) for a problem.
2. Analyzing the run time for greedy algorithms will generally be much easier than
for other techniques (like Divide and conquer). For the Divide and conquer technique,
it is not clear whether the technique is fast or slow. This is because at each level of
recursion the size of gets smaller and the number of sub-problems increases.
3. The difficult part is that for greedy algorithms you have to work much harder to
understand correctness issues. Even with the correct algorithm, it is hard to prove
why it is correct. Proving that a greedy algorithm is correct is more of an art than a
science. It involves a lot of creativity.
This is a simple Greedy-algorithm problem. In each iteration, you have to greedily select the
things which will take the minimum amount of time to complete while maintaining two
variables currentTime and numberOfThings. To complete the calculation, you must:
currentTime = 1
numberOfThings = 1
currentTime is 1 + 2 = 3
numberOfThings = 2
currentTime is 3 + 3 = 6
numberOfThings = 3
After the 4th iteration, currentTime is 6 + 4 = 10, which is greater than T. Therefore, the
answer is 3.
Algorithm:
The method which is used to construct optimal prefix code is called Huffman coding.
This algorithm builds a tree in bottom up manner. We can denote this tree by T
Let, |c| be number of leaves
|c| -1 are number of operations required to merge the nodes. Q be the priority queue which
can be used while constructing binary heap.
Algorithm Huffman (c)
{
n= |c|
Q=c
for i<-1 to n-1
do
{
3. Create a new internal node with a frequency equal to the sum of the two nodes
frequencies. Make the first extracted node as its left child and the other extracted node
as its right child. Add this node to the min heap.
4. Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the
root node and the tree is complete.
Let us understand the algorithm with an example:
character Frequency
a 5
b 9
c 12
d 13
e 16
f 45
Step 1. Build a min heap that contains 6 nodes where each node represents root of a tree
with single node.
Step 2 Extract two minimum frequency nodes from min heap. Add a new internal node
with frequency 5 + 9 = 14.
Illustration of step 2
Now min heap contains 5 nodes where 4 nodes are roots of trees with single element each,
and one heap node is root of tree with 3 elements
character Frequency
c 12
d 13
Internal Node 14
e 16
f 45
Step 3: Extract two minimum frequency nodes from heap. Add a new internal node with
frequency 12 + 13 = 25
Illustration of step 3
Now min heap contains 4 nodes where 2 nodes are roots of trees with single element each,
and two heap nodes are root of tree with more than one nodes
character Frequency
Internal Node 14
e 16
Internal Node 25
f 45
Step 4: Extract two minimum frequency nodes. Add a new internal node with frequency 14
+ 16 = 30
Illustration of step 4
Illustration of step 5
NP-Completeness:
Show that may of the problems with no polynomial time algorithms are computational time
algorithms are computationally related.
There are two classes of non-polynomial time problems
1. NP-Hard
2. NP-Complete
NP Hard and NP-Complete
A problem is in the class NPC if it is in NP and is as hard as any problem in NP. A problem
is NP-hard if all problems in NP are polynomial time reducible to it, even though it may not
be in NP itself.
If a polynomial time algorithm exists for any of these problems, all problems in NP would be
polynomial time solvable. These problems are called NP-complete. The phenomenon of NP-
completeness is important for both theoretical and practical reasons.
Definition of NP-Completeness
A language B is NP-complete if it satisfies two conditions
B is in NP
Every A in NP is polynomial time reducible to B.
If a language satisfies the second property, but not necessarily the first one, the language B is
known as NP-Hard. Informally, a search problem B is NP-Hard if there exists some NP-
Complete problem A that Turing reduces to B.
The problem in NP-Hard cannot be solved in polynomial time, until P = NP. If a problem is
proved to be NPC, there is no need to waste time on trying to find an efficient algorithm for it.
Instead, we can focus on design approximation algorithm.
NP-Complete Problems
Following are some NP-Complete problems, for which no polynomial time algorithm is
known.
NP Completeness
If you can do both of these things, then you have proved that a problem is NP-Complete. If
you can prove that either of these things cannot be done, then you have proved that a problem
is not NP-Complete. Sometimes you can't come up with good proofs, and you just don't
know.
The complexity classes P and NP-Hard may be put in terms of the above:
3-SAT is a very simple NP-Complete problem. You are given a boolean expression, which is
a big AND (∧) of clauses:
E = C0 ∧ C1 ∧ ... ∧ Cm-1
Each clause Ci is the OR (∨) of three literals, where a literal is either a variable xi or the
negation of a variable ¬ xi (or sometimes the negation of a is denoted a). Here is an example
with three clauses and three variables. To make it easier to read, I'm simply calling the
variables a, b and c .
E=(a∨b∨c)∧(a∨b∨c)∧(a∨b∨c)
Given this definition, 3-SAT is simple -- is there an assignment of the variables so that E is
true? In the above example, it's easy to find such an assignment. For example, set a and c to
TRUE and b to FALSE (I'm coloring the true statements red -- you can see that there is
always at least one TRUE in each clause).
E=(a∨b∨c)∧(a∨b∨c)∧(a∨b∨c)
In general, 3-SAT can be a very difficult problem to solve. Here's a harder example with
seven clauses and four variables.
E=(a∨b∨c)∧(a∨b∨d)∧(a∨c∨d)∧(b∨c∨d)∧(a∨b∨c)∧(b∨c∨d)∧(
b∨c∨d)
E=(a∨b∨c)∧(a∨b∨d)∧(a∨c∨d)∧(b∨c∨d)∧(a∨b∨c)∧(b∨c∨d)∧
(b∨c∨d)
From our lecture notes on enumeration, we can answer whether an instance of 3-SAT is true
or false with a simple power set enumeration. That enumerates all possible true/false settings
of the literals, and for each setting, you can test to see whether the expression is true. Of
course, if there are n literals, the power set enumeration will enumerate 2n settings, so this is
definitely not polynomial time.
It is an easy matter to prove that 3-SAT is in NP. How many different clauses can there be?
(4/3) * n * (n-1) * (n-2) -- we'll go over that in class. That's a polynomial of n. If we have a
solution, we can test its validity by simply setting the variables and seeing if E is true. That
test is polynomial time, so 3-SAT is in NP.
As for proving that 3-SAT is NP-Complete, that is well beyond the scope of this class.
However, 3-SAT is a very popular problem for proving that other problems are NP-
Complete.
The yellow nodes are an independent set of size 5. There is no independent set of size 6.
Next, I need to figure out how to take an instance of 3-SAT, and convert it into an instance of
ISDP, so that if you can solve the ISDP instance in polynomial time, then you can solve the
instance of 3-SAT in polynomial time. Here's one way:
Turn each clause into three nodes, and label the nodes with their literals (including the
not). Add an edge between each of these nodes.
For every pair of nodes with the same, but negated, literals, add an edge between that
pair of nodes.
Any independent set of size k=n will correspond to an assignment of the literals for
which the 3-SAT expression is true.
Here's the simple three-clause 3-SAT problem above, converted to a graph, with an example
3-node independent set colored magenta. You'll note that the set corresponds to a setting of
the variables that makes the 3-SAT equation true:
Below, I also convert the more complicated 7-node expression to a graph for the ISDP
problem. I have the clauses clumped together going clockwise around the graph, starting at
roughly 1:00. I also have colored inter-clause edges according to the literals that they
connect:
I've colored the nodes in the Independent Set gray. You should be able to verify that:
Polynomial-Time Verification:
Before talking about the class of NP-complete problems, it is essential to introduce the notion
of a verification algorithm.
Many problems are hard to solve, but they have the property that it easy to authenticate the
solution if one is provided.
Consider the Hamiltonian cycle problem. Given an undirected graph G, does G have a cycle
that visits each vertex exactly once? There is no known polynomial time algorithm for this
dispute.
Let us understand that a graph did have a Hamiltonian cycle. It would be easy for someone to
convince of this. They would similarly say: "the period is hv3, v7, v1....v13i.
We could then inspect the graph and check that this is indeed a legal cycle and that it visits all
of the vertices of the graph exactly once. Thus, even though we know of no efficient way to
solve the Hamiltonian cycle problem, there is a beneficial way to verify that a given cycle is
indeed a Hamiltonian cycle.
Definition of Certificate: - A piece of information which contains in the given path of a vertex
is known as certificate
1. Observe that P contains in NP. In other words, if we can solve a problem in polynomial
time, we can indeed verify the solution in polynomial time. More formally, we do not
need to see a certificate (there is no need to specify the vertex/intermediate of the
specific path) to solve the problem; we can explain it in polynomial time anyway.
2. However, it is not known whether P = NP. It seems you can verify and produce an
output of the set of decision-based problems in NP classes in a polynomial time which
is impossible because according to the definition of NP classes you can verify the
solution within the polynomial time. So this relation can never be held.
Reductions:
The class NP-complete (NPC) problems consist of a set of decision problems (a subset of class
NP) that no one knows how to solve efficiently. But if there were a polynomial solution for
even a single NP-complete problem, then every problem in NPC will be solvable in polynomial
time. For this, we need the concept of reductions.
Suppose there are two problems, A and B. You know that it is impossible to solve problem A
in polynomial time. You want to prove that B cannot be explained in polynomial time. We
want to show that (A ∉ P) => (B ∉ P)
3-color: Given a graph G, can each of its vertices be labeled with one of 3 different colors such
that two adjacent vertices do not have the same label (color).
Coloring arises in various partitioning issues where there is a constraint that two objects cannot
be assigned to the same set of partitions. The phrase "coloring" comes from the original
application which was in map drawing. Two countries that contribute a common border should
be colored with different colors.
It is well known that planar graphs can be colored (maps) with four colors. There exists a
polynomial time algorithm for this. But deciding whether this can be done with 3 colors is hard,
and there is no polynomial time algorithm for it.
Fig: Example of 3-colorable and non-3-colorable graphs.
NP-Completeness
Definition: L is NP-complete if
1. L ϵ NP and
2. L' ≤ p L for some known NP-complete problem L.' Given this formal definition, the
complexity classes are:
NP: is the set of decision problems that can be verified in polynomial time.
NP-Hard: L is NP-hard if for all L' ϵ NP, L' ≤p L. Thus if we can solve L in polynomial time,
we can solve all NP problems in polynomial time.
NP-Complete L is NP-complete if
1. L ϵ NP and
2. L is NP-hard
If any NP-complete problem is solvable in polynomial time, then every NP-Complete problem
is also solvable in polynomial time. Conversely, if we can prove that any NP-Complete
problem cannot be solved in polynomial time, every NP-Complete problem cannot be solvable
in polynomial time.
Reductions
Concept: - If the solution of NPC problem does not exist then the conversion from one NPC
problem to another NPC problem within the polynomial time. For this, you need the concept
of reduction. If a solution of the one NPC problem exists within the polynomial time, then the
rest of the problem can also give the solution in polynomial time (but it's hard to believe). For
this, you need the concept of reduction.
Example: - Suppose there are two problems, A and B. You know that it is impossible to solve
problem A in polynomial time. You want to prove that B cannot be solved in polynomial time.
So you can convert the problem A into problem B in polynomial time.
1. The point to be noted here, the output is already given, and you can verify the
output/solution within the polynomial time but can't produce an output/solution in
polynomial time.
2. Here we need the concept of reduction because when you can't produce an output of
the problem according to the given input then in case you have to use an emphasis on
the concept of reduction in which you can convert one problem into another problem.