Unit-I
SORTING
Bubble Sort Algorithm
Bubble Sort is a simple algorithm which is used to sort a given set of n elements provided
in form of an array with n number of elements. Bubble Sort compares all the element one
by one and sort them based on their values.
If the given array has to be sorted in ascending order, then bubble sort will start by
comparing the first element of the array with the second element, if the first element is
greater than the second element, it will swap both the elements, and then move on to
compare the second and the third element, and so on.
If we have total n elements, then we need to repeat this process for n-1 times.
It is known as bubble sort, because with every complete iteration the largest element in
the given array, bubbles up towards the last place or the highest index, just like a water
bubble rises up to the water surface.
Sorting takes place by stepping through all the elements one-by-one and comparing it with
the adjacent element and swapping them if required.
Steps:
1. Starting with the first element(index = 0), compare the current element with the
next element of the array.
2. If the current element is greater than the next element of the array, swap them.
3. If the current element is less than the next element, move to the next element.
Repeat Step 1.
Example:
Selection Sort
Selection Sort algorithm is used to arrange a list of elements in a particular order (Ascending or
Descending). In selection sort, the first element in the list is selected and it is compared
repeatedly with remaining all the elements in the list. If any element is smaller than the
selected element (for Ascending order), then both are swapped. Then we select the element at
second position in the list and it is compared with remaining all elements in the list. If any
element is smaller than the selected element, then both are swapped. This procedure is
repeated till the entire list is sorted.
Step by Step Process
The selection sort algorithm is performed using following steps...
Step 1: Select the first element of the list (i.e., Element at first position in the list).
Step 2: Compare the selected element with all other elements in the list.
Step 3: For every comparision, if any element is smaller than selected element (for Ascending
order), then these two are swapped.
Step 4: Repeat the same procedure with next position in the list till the entire list is sorted.
Algorithm:
Small=arr[L]
for i=L to u do
{
Small=arr[i], pos=i
For j=i+1 to u do
{
If arr[j]<small then
{
Small=arr[j]
pos=j
}
}
j=j+1
temp=arr[i]
arr[i]=small
arr[pos]=temp
}
end
Complexity of the Insertion Sort Algorithm
To sort a unsorted list with 'n' number of elements we need to make ((n-1)+(n-2)+(n-3)+......+1)
= (n (n-1))/2 number of comparisions in the worst case. If the list already sorted, then it
requires 'n' number of comparisions.
Worst Case : O(n2)
Best Case : Ω(n2)
Average Case : Θ(n2)
INSERTION SORT:
Sorting is the process of arranging a list of elements in a particular order (Ascending or
Descending).
Insertion sort algorithm arranges a list of elements in a particular order. In insertion sort
algorithm, every iteration moves an element from unsorted portion to sorted portion until all
the elements are sorted in the list.
Step by Step Process
The insertion sort algorithm is performed using following steps...
Step 1: Asume that first element in the list is in sorted portion of the list and remaining all
elements are in unsorted portion.
Step 2: Consider first element from the unsorted list and insert that element into the sorted list
in order specified.
Step 3: Repeat the above process until all the elements from the unsorted list are moved into
the sorted list.
Algorithm:
Int i,key,j;
for(i=1;i<n;i++)
Key=arr[i];
j=j-1;
while(j>=0&&arr[j]>key)
{
arr[j+1]=arr[j];
j=j-1;
}
Arr[j+1]=key;
}
}
Example
Complexity of the Insertion Sort Algorithm
To sort a unsorted list with 'n' number of elements we need to make (1+2+3+......+n-1) = (n (n-
1))/2 number of comparisions in the worst case. If the list already sorted, then it requires 'n'
number of comparisions.
Worst Case : O(n2)
Best Case : Ω(n)
Average Case : Θ(n2)
QUICK SORT ALGORITHM
Quick sort is based on the divide-and-conquer approach based on the idea of choosing one
element as a pivot element and partitioning the array around it such that: Left side of pivot
contains all the elements that are less than the pivot element Right side contains all elements
greater than the pivot
It reduces the space complexity and removes the use of the auxiliary array that is used in merge
sort. Selecting a random pivot in an array results in an improved time complexity in most of the
cases.
Implementation :
Select the first element of array as the pivot element First, we will see how the partition of the
array takes place around the pivot.
In the implementation below, the following components have been used: Here, A[ ] = array whose
elements are to be sorted
start: Leftmost position of the array
end: Rightmost position of the array
i : Boundary between the elements that are less than pivot and those greater than pivot
j: Boundary between the partitioned and unpartitioned part of array
piv: Pivot element
Time Complexity:
Average and worst case complexity=O(nlogn)
Here we find the proper position of the pivot element by rearranging the array using partition
function. Then we divide the array into two halves left side of the pivot (elements less than
pivot element) and right side of the pivot (elements greater than pivot element) and apply the
same step recursively.
You have an array A=[9,7,8,3,2,1] Observe in the diagram below, that the randpartition()
function chooses pivot randomly as 7 and then swaps it with the first element of the array and
then the partition() function call takes place, which divides the array into two halves. The first
half has elements less than 7 and the other half has elements greater than 7. For elements less
than 7, in 5th call, randpartition() function chooses 2 as pivot element randomly and then
swap it with first element and call to the partition() function takes place.
After the 7th and 8th call, no further calls can take place as only one element left in both the
calls. Similarly, you can observe the order of calls for the elements greater than 7.
Time Complexity
The worst case=O(N2) , but as this is randomized algorithm, its time complexity fluctuates
between O(N2) and O(NlogN) and mostly it comes out to be O(NlogN)
Time Complexity
The worst case=O(N2) , but as this is randomized algorithm, its time complexity fluctuates
between O(N2) and O(NlogN) and mostly it comes out to be O(NlogN)
MERGE SORT:
Merge sort is a divide-and-conquer algorithm based on the idea of breaking down a list into
several sub-lists until each sublist consists of a single element and merging those sublists in a
manner that results into a sorted list.
Idea:
Divide the unsorted list into N sublists, each containing 1 element.
Take adjacent pairs of two singleton lists and merge them to form a list of 2 elements. N will
now convert into N/2 lists of size 2.
Repeat the process till a single sorted list of obtained.
While comparing two sublists for merging, the first element of both lists is taken into
consideration. While sorting in ascending order, the element that is of a lesser value becomes a
new element of the sorted list. This procedure is repeated until both the smaller sublists are
empty and the new combined sublist comprises all the elements of both the sublists.
Algortihm:
While(i<nL&&j<nR)
{
If(L[i]<=R[j])
{
A[k]L[i]; ii+1
}
Else
{
A[k]R[j]; jj+1
}
k k+1
}
Example:
As one may understand from the image above, at each step a list of size M is being divided into
2 sublists of size M/2, until no further division can be done. To understand better, consider a
smaller array A containing the elements (9,7,8).
At the first step this list of size 3 is divided into 2 sublists the first consisting of elements
(9,7) and the second one being(8). Now, the first list consisting of elements (9,7) is further
divided into 2 sublists consisting of elements (9) and(7) respectively.
As no further breakdown of this list can be done, as each sublist consists of a maximum of 1
element, we now start to merge these lists. The 2 sub-lists formed in the last step are then
merged together in sorted order using the procedure mentioned above leading to a new list
(7,9). Backtracking further, we then need to merge the list consisting of element (8) too with
this list, leading to the new sorted list (7,8,9).
Time Complexity: The list of size N is divided into a max of logN parts, and the merging of all
sublists into a single list takes O(N) time, the worst case run time of this algorithm is O(NLogN)
Topological sorting
Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such
that for every directed edge uv, vertex u comes before v in the ordering. Topological
Sorting for a graph is not possible if the graph is not a DAG.
For example, a topological sorting of the following graph is “5 4 2 3 1 0”. There can be more
than one topological sorting for a graph. For example, another topological sorting of the
following graph is “4 5 2 3 1 0”. The first vertex in topological sorting is always a vertex
with in-degree as 0 (a vertex with no incoming edges)
first see implementation of DFS . We can modify DFS to find Topological Sorting of a graph.
In DFS, we start from a vertex, we first print it and then recursively call DFS for its adjacent
vertices. In topological sorting, we use a temporary stack. We don’t print the vertex
immediately, we first recursively call topological sorting for all its adjacent vertices, then
push it to a stack. Finally, print contents of stack. Note that a vertex is pushed to stack only
when all of its adjacent vertices (and their adjacent vertices and so on) are already in stack.
Time Complexity: The above algorithm is simply DFS with an extra stack. So time
complexity is the same as DFS which is O(V+E)
Graphs:
A graph is a pictorial representation of a set of objects where some pairs of objects are
connected by links. The interconnected objects are represented by points termed
as vertices, and the links that connect the vertices are called edges.
Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of
edges, connecting the pairs of vertices. Take a look at the following graph −
In the above graph,
V = {a, b, c, d, e}
E = {ab, ac, bd, cd, de}
We can represent a graph using an array of vertices and a two-dimensional array of edges.
Before we proceed further, let's familiarize ourselves with some important terms −
Vertex − Each node of the graph is represented as a vertex. In the following
example, the labeled circle represents vertices. Thus, A to G are vertices. We can
represent them using an array as shown in the following image. Here A can be
identified by index 0. B can be identified using index 1 and so on.
Edge − Edge represents a path between two vertices or a line between two vertices.
In the following example, the lines from A to B, B to C, and so on represents edges.
We can use a two-dimensional array to represent an array as shown in the
following image. Here AB can be represented as 1 at row 0, column 1, BC as 1 at
row 1, column 2 and so on, keeping other combinations as 0.
Adjacency − Two node or vertices are adjacent if they are connected to each other
through an edge. In the following example, B is adjacent to A, C is adjacent to B, and
so on.
Path − Path represents a sequence of edges between the two vertices. In the
following example, ABCD represents a path from A to D.
There are many ways to traverse graphs. BFS is the most commonly used approach.
BFS is a traversing algorithm where you should start traversing from a selected node
(source or starting node) and traverse the graph layer wise thus exploring the neighbour
nodes (nodes which are directly connected to source node). You must then move towards
the next-level neighbour nodes.
As the name BFS suggests, you are required to traverse the graph breadth wise as follows:
1. First move horizontally and visit all the nodes of the current layer
2. Move to the next layer
BFS traversal of a graph produces a spanning tree as final result. Spanning Tree is a graph
without loops. We use Queue data structure with maximum size of total number of vertices
in the graph to implement BFS traversal.
We use the following steps to implement BFS traversal...
Step 1 - Define a Queue of size total number of vertices in the graph.
Step 2 - Select any vertex as starting point for traversal. Visit that vertex and insert it into
the Queue.
Step 3 - Visit all the non-visited adjacent vertices of the vertex which is at front of the
Queue and insert them into the Queue.
Step 4 - When there is no new vertex to be visited from the vertex which is at front of the
Queue then delete that vertex.
Step 5 - Repeat steps 3 and 4 until queue becomes empty.
Step 6 - When queue becomes empty, then produce final spanning tree by removing unused
edges from the graph
Example:
Shortest Path In Edge-Weighted Case (Dijkastra's)
Dijkstra’s algorithm solves the single-source shortest-paths problem on a directed weighted
graph G = (V, E), where all the edges are non-negative (i.e., w(u, v) ≥ 0 for each edge (u, v) Є
E).
In the following algorithm, we will use one function Extract-Min(), which extracts the node with
the smallest key.
Algorithm: Dijkstra’s-Algorithm (G, w, s)
for each vertex v Є G.V
v.d := ∞
v.∏ := NIL
s.d := 0
S := Ф
Q := G.V
while Q ≠ Ф
u := Extract-Min (Q)
S := S U {u}
for each vertex v Є [Link][u]
if v.d > u.d + w(u, v)
v.d := u.d + w(u, v)
v.∏ := u
Analysis
The complexity of this algorithm is fully dependent on the implementation of Extract-Min
function. If extract min function is implemented using linear search, the complexity of this
algorithm is O(V2 + E).
In this algorithm, if we use min-heap on which Extract-Min() function works to return the
node from Q with the smallest key, the complexity of this algorithm can be reduced
further.
Example
Let us consider vertex 1 and 9 as the start and destination vertex respectively. Initially, all
the vertices except the start vertex are marked by ∞ and the start vertex is marked by 0.
Vertex Initial Step1 V1 Step2 V3 Step3 V2 Step4 V4 Step5 V5 Step6 V7 Step7 V8 Step8 V6
1 0 0 0 0 0 0 0 0 0
2 ∞ 5 4 4 4 4 4 4 4
3 ∞ 2 2 2 2 2 2 2 2
4 ∞ ∞ ∞ 7 7 7 7 7 7
5 ∞ ∞ ∞ 11 9 9 9 9 9
6 ∞ ∞ ∞ ∞ ∞ 17 17 16 16
7 ∞ ∞ 11 11 11 11 11 11 11
8 ∞ ∞ ∞ ∞ ∞ 16 13 13 13
9 ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞ 20
Hence, the minimum distance of vertex 9 from vertex 1 is 20. And the path is
1→ 3→ 7→ 8→ 6→ 9
This path is determined based on predecessor information.
DFS (Depth First Search)
DFS traversal of a graph produces a spanning tree as final result. Spanning Tree is a graph
without loops. We use Stack data structure with maximum size of total number of vertices
in the graph to implement DFS traversal.
We use the following steps to implement DFS traversal...
Step 1 - Define a Stack of size total number of vertices in the graph.
Step 2 - Select any vertex as starting point for traversal. Visit that vertex and push it on to
the Stack.
Step 3 - Visit any one of the non-visited adjacent vertices of a vertex which is at the top of
stack and push it on to the stack.
Step 4 - Repeat step 3 until there is no new vertex to be visited from the vertex which is at
the top of the stack.
Step 5 - When there is no new vertex to visit then use back tracking and pop one vertex
from the stack.
Step 6 - Repeat steps 3, 4 and 5 until stack becomes Empty.
Step 7 - When stack becomes Empty, then produce final spanning tree by removing unused
edges from the graph
Back tracking is coming back to the vertex from which we reached the current vertex.
Example:
Computation Of Strongly Connected Components
A directed graph is strongly connected if there is a path between all pairs of vertices. A
strongly connected component (SCC) of a directed graph is a maximal strongly connected
subgraph. For example, there are 3 SCCs in the following graph.
We can find all strongly connected components in O(V+E) time using Kosaraju’s algorithm.
Following is detailed Kosaraju’s algorithm.
1) Create an empty stack ‘S’ and do DFS traversal of a graph. In DFS traversal, after calling
recursive DFS for adjacent vertices of a vertex, push the vertex to stack. In the above graph,
if we start DFS from vertex 0, we get vertices in stack as 1, 2, 4, 3, 0.
2) Reverse directions of all arcs to obtain the transpose graph.
3) One by one pop a vertex from S while S is not empty. Let the popped vertex be ‘v’. Take v
as source and do DFS (call DFSUtil(v)). The DFS starting from v prints strongly connected
component of v. In the above example, we process vertices in order 0, 3, 4, 2, 1 (One by one
popped from stack).
The above algorithm is DFS based. It does DFS two times. DFS of a graph produces a single
tree if all vertices are reachable from the DFS starting point. Otherwise DFS produces a
forest. So DFS of a graph with only one SCC always produces a tree. The important point to
note is DFS may produce a tree or a forest when there are more than one SCCs depending
upon the chosen starting point. For example, in the above diagram, if we start DFS from
vertices 0 or 1 or 2, we get a tree as output. And if we start from 3 or 4, we get a forest. To
find and print all SCCs, we would want to start DFS from vertex 4 (which is a sink vertex),
then move to 3 which is sink in the remaining set (set excluding 4) and finally any of the
remaining vertices (0, 1, 2). So how do we find this sequence of picking vertices as starting
points of DFS? Unfortunately, there is no direct way for getting this sequence. However, if
we do a DFS of graph and store vertices according to their finish times, we make sure that
the finish time of a vertex that connects to other SCCs (other that its own SCC), will always
be greater than finish time of vertices in the other SCC. For example, in DFS of above
example graph, finish time of 0 is always greater than 3 and 4 (irrespective of the sequence
of vertices considered for DFS). And finish time of 3 is always greater than 4. DFS doesn’t
guarantee about other vertices, for example finish times of 1 and 2 may be smaller or
greater than 3 and 4 depending upon the sequence of vertices considered for DFS. So to use
this property, we do DFS traversal of complete graph and push every finished vertex to a
stack. In stack, 3 always appears after 4, and 0 appear after both 3 and 4.
In the next step, we reverse the graph. Consider the graph of SCCs. In the reversed graph,
the edges that connect two components are reversed. So the SCC {0, 1, 2} becomes sink and
the SCC {4} becomes source. As discussed above, in stack, we always have 0 before 3 and 4.
So if we do a DFS of the reversed graph using sequence of vertices in stack, we process
vertices from sink to source (in reversed graph). That is what we wanted to achieve and
that is all needed to print SCCs one by one.
Time and Space Complexity
Time Complexity:
Every algorithm requires some amount of computer time to execute its instruction to
perform the task. This computer time required is called time complexity.
The time complexity of an algorithm is the total amount of time required by an algorithm to
complete its execution.
Generally, the running time of an algorithm depends upon the following...
1. Whether it is running on Single processor machine or Multi processor machine.
2. Whether it is a 32 bit machine or 64 bit machine.
3. Read and Write speed of the machine.
4. The amount of time required by an algorithm to perform Arithmetic operations,
logical operations, return value and assignment operations etc.,
5. Input data
Note - When we calculate time complexity of an algorithm, we consider only input
data and ignore the remaining things, as they are machine dependent.
Calculating Time Complexity of an algorithm based on the system configuration is a very
difficult task because the configuration changes from one system to another system. To
solve this problem, we must assume a model machine with a specific configuration. So that,
we can able to calculate generalized time complexity according to that model machine.
To calculate the time complexity of an algorithm, we need to define a model machine. Let us
assume a machine with following configuration...
1. It is a Single processor machine
2. It is a 32 bit Operating System machine
3. It performs sequential execution
4. It requires 1 unit of time for Arithmetic and Logical operations
5. It requires 1 unit of time for Assignment and Return value
6. It requires 1 unit of time for Read and Write operations
Space Complexity:
When we design an algorithm to solve a problem, it needs some computer memory to
complete its execution. For any algorithm, memory is required for the following purposes...
1. To store program instructions.
2. To store constant values.
3. To store variable values.
4. And for few other things like funcion calls, jumping statements etc,.
Generally, when a program is under execution it uses the computer memory for THREE
reasons. They are as follows...
1. Instruction Space: It is the amount of memory used to store compiled version of
instructions.
2. Environmental Stack: It is the amount of memory used to store information of
partially executed functions at the time of function call.
3. Data Space: It is the amount of memory used to store all the variables and
constants.
Note - When we want to perform analysis of an algorithm based on its Space
complexity, we consider only Data Space and ignore Instruction Space as well as
Environmental Stack.
That means we calculate only the memory required to store Variables, Constants,
Structures, etc.,
To calculate the space complexity, we must know the memory required to store different
datatype values (according to the compiler). For example, the C Programming Language
compiler requires the following...
1. 2 bytes to store Integer value.
2. 4 bytes to store Floating Point value.
3. 1 byte to store Character value.
4. 6 (OR) 8 bytes to store double value.
Example:
int square(int a)
{
return a*a;
}
In the above piece of code, it requires 2 bytes of memory to store variable 'a' and another 2
bytes of memory is used for return value.
That means, totally it requires 4 bytes of memory to complete its execution. And this
4 bytes of memory is fixed for any input value of 'a'. This space complexity is said to
be Constant Space Complexity.