0% found this document useful (0 votes)
36 views74 pages

CS3401: Algorithms Course Overview

The document outlines the objectives and structure of a course on algorithms, covering topics such as algorithm analysis, graph algorithms, algorithm design techniques, state space search algorithms, and NP-completeness. It includes practical exercises for implementing various algorithms and emphasizes the importance of understanding time and space complexity. The course aims to equip students with the skills to analyze and apply different algorithmic techniques effectively.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views74 pages

CS3401: Algorithms Course Overview

The document outlines the objectives and structure of a course on algorithms, covering topics such as algorithm analysis, graph algorithms, algorithm design techniques, state space search algorithms, and NP-completeness. It includes practical exercises for implementing various algorithms and emphasizes the importance of understanding time and space complexity. The course aims to equip students with the skills to analyze and apply different algorithmic techniques effectively.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

CS3401 ALGORITHMS

COURSE OBJECTIVES:
• To understand and apply the algorithm analysis techniques on searching and sorting algorithms
• To critically analyze the efficiency of graph algorithms
• To understand different algorithm design techniques
• To solve programming problems using state space tree
• To understand the concepts behind NP Completeness, Approximation algorithms and
randomized algorithms.

UNIT I INTRODUCTION
Algorithm analysis: Time and space complexity - Asymptotic Notations and its properties Best
case, Worst case and average case analysis – Recurrence relation: substitution method - Lower
bounds – searching: linear search, binary search and Interpolation Search, Pattern search: The
naïve string- matching algorithm - Rabin-Karp algorithm - Knuth-Morris-Pratt algorithm. Sorting:
Insertion sort – heap sort

UNIT II GRAPH ALGORITHMS


Graph algorithms: Representations of graphs - Graph traversal: DFS – BFS - applications -
Connectivity, strong connectivity, bi-connectivity - Minimum spanning tree: Kruskal’s and
Prim’s algorithm- Shortest path: Bellman-Ford algorithm - Dijkstra’s algorithm - Floyd-Warshall
algorithm Network flow: Flow networks - Ford-Fulkerson method – Matching: Maximum
bipartite matching

UNIT III ALGORITHM DESIGN TECHNIQUES


Divide and Conquer methodology: Finding maximum and minimum - Merge sort - Quick sort
Dynamic programming: Elements of dynamic programming — Matrix-chain multiplication -
Multistage graph — Optimal Binary Search Trees. Greedy Technique: Elements of the greedy
strategy - Activity-selection problem –- Optimal Merge pattern — Huffman Trees.

UNIT IV STATE SPACE SEARCH ALGORITHMS


Backtracking: n-Queens problem - Hamiltonian Circuit Problem - Subset Sum Problem – Graph
colouring problem Branch and Bound: Solving 15-Puzzle problem - Assignment problem -
Knapsack Problem - Travelling Salesman Problem
UNIT V NP-COMPLETE AND APPROXIMATION ALGORITHM
Tractable and intractable problems: Polynomial time algorithms – Venn diagram representation
- NP- algorithms - NP-hardness and NP-completeness – Bin Packing problem - Problem reduction:
TSP – 3- CNF problem. Approximation Algorithms: TSP - Randomized Algorithms: concept
and application - primality testing - randomized quick sort - Finding kth smallest number
45 PERIODS

PRACTICAL EXERCISES: 30 PERIODS


Searching and Sorting Algorithms
1. Implement Linear Search. Determine the time required to search for an element. Repeat the
experiment for different values of n, the number of elements in the list to be searched and plot a
graph of the time taken versus n.
2. Implement recursive Binary Search. Determine the time required to search an element. Repeat
the experiment for different values of n, the number of elements in the list to be searched and plot
a graph of the time taken versus n.
3. Given a text txt [0...n-1] and a pattern pat [0...m-1], write a function search (char pat [ ], char
txt [ ]) that prints all occurrences of pat [ ] in txt [ ]. You may assume that n > m.
4. Sort a given set of elements using the Insertion sort and Heap sort methods and determine the
time required to sort the elements. Repeat the experiment for different values of n, the number of
elements in the list to be sorted and plot a graph of the time taken versus n.
Graph Algorithms
1. Develop a program to implement graph traversal using Breadth First Search
2. Develop a program to implement graph traversal using Depth First Search
3. From a given vertex in a weighted connected graph, develop a program to find the shortest
paths to other vertices using Dijkstra’s algorithm.
4. Find the minimum cost spanning tree of a given undirected graph using Prim’s algorithm.
5. Implement Floyd’s algorithm for the All-Pairs- Shortest-Paths problem.
6. Compute the transitive closure of a given directed graph using Warshall's algorithm.
Algorithm Design Techniques
1. Develop a program to find out the maximum and minimum numbers in a given list of n
numbers using the divide and conquer technique.
2. Implement Merge sort and Quick sort methods to sort an array of elements and determine the
time required to sort. Repeat the experiment for different values of n, the number of elements in
the list to be sorted and plot a graph of the time taken versus n.
State Space Search Algorithms
1. Implement N Queens problem using Backtracking.
Approximation Algorithms Randomized Algorithms
1. Implement any scheme to find the optimal solution for the Traveling Salesperson problem and
then solve the same problem instance using any approximation algorithm and determine the error
in the approximation.
2. Implement randomized algorithms for finding the kth smallest number.
The programs can be implemented in C/C++/JAVA/ Python.

COURSE OUTCOMES: At the end of this course, the students will be able to:
CO1: Analyze the efficiency of algorithms using various frameworks
CO2: Apply graph algorithms to solve problems and analyze their efficiency.
CO3: Make use of algorithm design techniques like divide and conquer, dynamic programming
and greedy techniques to solve problems
CO4: Use the state space tree method for solving problems.
CO5: Solve problems using approximation algorithms and randomized algorithms

TEXT BOOKS: TOTAL: 75 PERIODS


1. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein, "Introduction
to Algorithms", 3rd Edition, Prentice Hall of India, 2009.
2. Ellis Horowitz, Sartaj Sahni, Sanguthevar Rajasekaran “Computer Algorithms/C++” Orient
Blackswan, 2nd Edition, 2019.
REFERENCES:
1. Anany Levitin, “Introduction to the Design and Analysis of Algorithms”, 3rd Edition, Pearson
Education, 2012.
2. Alfred V. Aho, John E. Hopcroft and Jeffrey D. Ullman, "Data Structures and Algorithms",
Reprint Edition, Pearson Education, 2006.
3. S. Sridhar, “Design and Analysis of Algorithms”, Oxford university press, 2014.
UNIT I INTRODUCTION
Algorithm analysis: Time and space complexity - Asymptotic Notations and its properties Best
case, Worst case and average case analysis – Recurrence relation: substitution method - Lower
bounds – Searching: linear search, binary search and Interpolation Search, Pattern search: The
naïve string- matching algorithm - Rabin-Karp algorithm - Knuth-Morris-Pratt algorithm. Sorting:
Insertion sort – heap sort

ALGORITHM
Definition:
An algorithm is a sequence of unambiguous instruction for solving a problem, for obtaining
a required output for any legitimate input in a finite amount of time.

Algorithm is a step-by-step procedure, which defines a set of instructions to be executed in


a certain order to get the desired output. Algorithms are generally created independent of
underlying languages, i.e. an algorithm can be implemented in more than one programming
language.
Important categories of algorithms −
 Search − Algorithm to search an item in a data structure.
 Sort − Algorithm to sort items in a certain order.
 Insert − Algorithm to insert item in a data structure.
 Update − Algorithm to update an existing item in a data structure.
 Delete − Algorithm to delete an existing item from a data structure.

Characteristics of an algorithm:
Not all procedures can be called an algorithm. An algorithm should have the following
characteristics
 Unambiguous − Algorithm should be clear and unambiguous. Each of its steps (or
phases), and their inputs/outputs should be clear and must lead to only one meaning.
 Input − An algorithm should have 0 or more well-defined inputs.
 Output − An algorithm should have 1 or more well-defined outputs, and should match the
desired output.
 Finiteness − Algorithms must terminate after a finite number of steps.
 Feasibility − Should be feasible with the available resources.
 Independent − An algorithm should have step-by-step directions, which should be
independent of any programming code.

How to Write an Algorithm?


There are no well-defined standards for writing algorithms. Rather, it is problem and
resource dependent. Algorithms are never written to support a particular programming code.
As we know that all programming languages share basic code constructs like loops (do, for, while),
flow-control (if-else), etc. These common constructs can be used to write an algorithm.
We write algorithms in a step-by-step manner, but it is not always the case. Algorithm
writing is a process and is executed after the problem domain is well-defined. That is, we should
know the problem domain, for which we are designing a solution.
Example
Let's try to learn algorithm-writing by using an example.
Problem − Design an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b & c
Step 3 − define values of a & b
Step 4 − add values of a & b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP
Algorithms tell the programmers how to code the program. Alternatively, the algorithm can be
written as −
Step 1 − START ADD
Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP
In design and analysis of algorithms, usually the second method is used to describe an
algorithm. It makes it easy for the analyst to analyze the algorithm ignoring all unwanted
definitions. He can observe what operations are being used and how the process is flowing.
Writing step numbers, is optional.

ALGORITHM ANALYSIS
Definition
Algorithm analysis is the process of determining the time and space complexity of an
algorithm, which are measures of the algorithm's efficiency.

Efficiency of an algorithm can be analyzed at two different stages, before implementation


and after implementation. They are the following −
A Priori Analysis − This is a theoretical analysis of an algorithm. Efficiency of an algorithm is
measured by assuming that all other factors, for example, processor speed, are constant and have
no effect on the implementation.
A Posterior Analysis − This is an empirical analysis of an algorithm. The selected algorithm is
implemented using programming language. This is then executed on target computer machine. In
this analysis, actual statistics like running time and space required, are collected.

Algorithm analysis deals with the execution or running time of various operations involved.
The running time of an operation can be defined as the number of computer instructions executed
per operation.
Algorithm Complexity
Suppose X is an algorithm and n is the size of input data, the time and space used by the
algorithm X are the two main factors, which decide the efficiency of X.
 Time Factor − Time is measured by counting the number of key operations such as
comparisons in the sorting algorithm.
 Space Factor − Space is measured by counting the maximum memory space required by
the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space required by
the algorithm in terms of n as the size of input data.

TIME AND SPACE COMPLEXITY


Time Complexity
Time complexity refers to the amount of time it takes for an algorithm to run as a function
of the size of the input, and is typically expressed using big O notation.
To analyze the time complexity of an algorithm, we need to consider the number of
operations performed by the algorithm, and how the number of operations changes as the size of
the input increases. This can be done by counting the number of basic operations performed in the
algorithm, such as comparisons, assignments, and function calls. The number of basic operations
is then used to determine the algorithm's time complexity using big O notation.
Time complexity is a measure of how long an algorithm takes to run as a function of the
size of the input. It is typically expressed using big O notation, which describes the upper bound
on the growth of the time required by the algorithm. For example, an algorithm with a time
complexity of O(n) takes longer to run as the input size (n) increases.
Types of time complexities:
 O(1)or constant time: the algorithm takes the same amount of time to run regardless of the size
of the input.
 O(log n) or logarithmic time: the algorithm's running time increases logarithmically with the
size of the input.
 O(n)or linear time: the algorithm's running time increases linearly with the size of the input.
 O(nlogn)or linear logarithmic time: the algorithm's running time increases linearly with the
size of the input and logarithmically with the size of the input.
 O(n^2) or quadratic time: the algorithm's running time increases quadratically with the size of
the input.
 O(2^n) or exponential time: the algorithm's running time increases exponentially with the size
of the input.

Space Complexity
Space complexity refers to the amount of memory required by an algorithm as a function of
the size of the input, and is also typically expressed using big O notation.
To analyze the space complexity of an algorithm, we need to consider the amount of memory
used by the algorithm, and how the amount of memory used changes as the size of the input
increases. This can be done by counting the number of variables used by the algorithm, and how
the number of variables used changes as the size of the input increases. The amount of memory
used is then used to determine the algorithm's space complexity using big O notation.
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle. The space required by an algorithm is equal to the sum of the following
two components −
 A fixed part that is a space required to store certain data and variables, that are independent
of the size of the problem. For example, simple variables and constants used, program size,
etc.
 A variable part is a space required by variables, whose size depends on the size of the
problem. For example, dynamic memory allocation, recursion stack space, etc.

Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part and
S(I) is the variable part of the algorithm, which depends on instance characteristic I.
Space complexity, on the other hand, is a measure of how much memory an algorithm uses as
a function of the size of the input. Like time complexity, it is typically expressed using big O
notation. For example, an algorithm with a space complexity of O(n) uses more memory as the
input size (n)increases.
Type of Space complexities:
 O(1) or constant space: the algorithm uses the same amount of memory regardless of the size
of the input.
 O(n) or linear space: the algorithm's memory usage increases linearly with the size of the input.
 O(n^2) or quadratic space: the algorithm's memory usage increases quadratically with the size
of the input.
 O(2^n)or exponential space: the algorithm's memory usage increases exponentially with the
size of the input.

Order of Growth:
A typical order of growth for time complexity on a graph, from fastest to slowest, would be:
O(1) (constant) < O(log n) (logarithmic) < O(n) (linear) < O(n log n) < O(n^2) (quadratic) < O(2^n)
(exponential) < O(n!) (factorial).
Key points about the order of growth:
 Constant (O(1)): Operations that take the same amount of time regardless of input size.
 Logarithmic (O(log n)): Time grows proportionally to the logarithm of the input size,
meaning it increases slowly as the input gets larger.
 Linear (O(n)): Time grows directly with the input size.
 Log-linear (O(n log n)): A combination of linear and logarithmic growth.
 Quadratic (O(n^2)): Time grows proportionally to the square of the input size.
 Exponential (O(2^n)): Time grows exponentially with the input size, usually considered
very inefficient for large inputs.
 Factorial (O(n!)): Time grows very quickly based on the factorial function, typically
impractical for large inputs.
Order of growth for various time complexities, demonstrating how their values change as the
input size n increases. We'll calculate the values for n=1,2,4,8,16,32 to illustrate how each time
complexity grows as the input size doubles.
ASYMPTOTIC NOTATION AND ITS PROPERTIES
Asymptotic notation
Asymptotic notation is a mathematical notation used to describe the behavior of an
algorithm as the size of the input (usually denoted by n) becomes arbitrarily large.
Types Asymptotic Notations:
1. O Notation (Big Oh)
2. Ω Notation (Big Omega)
3. θ Notation (Big Theta)
4. Little Oh o Notation
5. Little Omega ω Notation

[Link] O notation (Big Oh) (O(f(n)))


It provides an upper bound on the growth of a function. It describes the worst-case
scenario for the time or space complexity of an algorithm.
Definition:
A function f(n) is said to be in O(g(n)), denoted f(n) ∈ O(g(n)), if f(n) is bounded above by some
constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and some
nonnegative integer n0 such that

f(n) ≤ cg(n) for all n ≥ n0 and c > 0.

O notation analyses the worst case for the given function.

Here n is size of an input &f(n) is a function of n.


When the input size increases, then the time also increases.
Example:
Let take f(n)=3n+2 and g(n)=n.
We have to prove f(n) ∈ O(g(n)). By the definition, f(n)≤c g(n)
3n+2≤ c * n where c>0, n0≥1.
We can substitute any value for c. The best option is, When c=4,
3n+2≤ 4 * n.
Most of the cases, 4*n is greater than 3n+2. Where n≥2.
Reason for taking n≥2:
If n=1, 3(1)+2 ≤ 4(1)=> 5≤4=> Becomes False.
If n=2, 3(2)+2 ≤ 4(2)=> 8≤8=> Becomes True.
If n=3, 3(3)+2 ≤ 4(3)=> 11≤12=> Becomes True.
If n=4, 3(4)+2 ≤ 4(4)=> 14≤16=> Becomes True. And so on.
Therefore 3n+2 ∈ O(n).
2. Big Ω notation (Big Omega) (Ω(f(n)))
It provides a lower bound on the growth of a function. It describes the best-case scenario
for the time or space complexity of an algorithm.
Definition:
A function f(n) is said to be in Ω(g(n)), denoted f(n) ∈ Ω(g(n)), if f(n) is bounded below by some
positive constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and
some nonnegative integer n0 such that

f(n) ≥ cg(n) for all n ≥ n0 and c > 0.

Ω notation analyses the best case for the given function.


The definition is illustrated in the following figure.

Let take f(n)=3n+2 and g(n)=n.


We have to prove f(n) ∈ Ω(g(n)). By the definition, f(n)≤c g(n)
3n+2 ≥ c * n Fo all n ≥ n0.
We can substitute any value for c. The best option is, When c=4,
3n+2 ≥ 1 * n.
Most of the cases, 1*n is Lesser than 3n+2.
If n=1, 3(1)+2 ≥1(1)=> 5≥1=> Becomes True.
If n=2, 3(2)+2 ≥1(2)=> 8≥2=> Becomes True.
If n=3, 3(3)+2 ≥1(3)=> 11≥3=> Becomes True. And so on.
Therefore 3n+2 ∈ Ω(n).

3. Big Θ notation (Big Theta) (Θ(f(n)))


It provides a tight bound on the growth of a function. It describes the average-case scenario
for the time or space complexity of an algorithm.
Definition:
A function f(n) is said to be in θ(g(n)), denoted f(n) ∈ θ(g(n)), if f(n) is bounded both above and
below by some positive constant multiples of g(n) for all large n, i.e., if there exist some positive
constants c1 and c2 and some nonnegative integer n0 such that

c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0.

θ notation analyses the average case for the given function.


Here n is size of an input & f(n) is a function of n.
When the input size increases, then the time also increases. c1 and c2 are different
constants. f(n) is bounded by both upper and lower i.e) c1g(n) ≤ f(n) ≤ c2g(n).
Example:
Let take f(n)=3n+2 and g(n)=n.
We have to prove f(n) ∈θ(g(n)). By the definition,
c1g(n) ≤ f(n) ≤ c2g(n)
c1 * n ≤ 3n+2 ≤ c2 * n 3n+2 ≤ c2 * n
when c2=4 and 3n+2≥ c1 * n when c1=1.
Such that, 1*n ≤ 3n+2 ≤ 4*n. Where n≥2.
Therefore 3n+2 = θ(n).

4. Little-o Notation (o):


Definition:
Describes an upper bound that is not tight. It implies that the function grows strictly
slower than the given function.
Usage: It is used to describe a growth rate that is asymptotically smaller than the function in
question.
Example: means that the function grows strictly slower than for large n.
Key Property:

5. Little-omega Notation (ω):


Definition:
Describes a lower bound that is not tight. It implies that the function grows strictly faster
than the given function.
Usage: It is used to describe a growth rate that is asymptotically greater than the function in
question.
Example: means that grows strictly faster than for large n.

Key Property:
SEARCHING
Various searching techniques can be applied on the data structures to retrieve certain data.
A search operation is said to be successful only if it returns the desired element or data;
otherwise, the searching method is unsuccessful.
There are two categories these searching techniques fall into. They are −
1. Sequential Searching
2. Interval Searching
[Link] Searching
As the name suggests, the sequential searching operation traverses through each element
of the data sequentially to look for the desired data. The data need not be in a sorted manner for
this type of search.
Example − Linear Search

Fig. 1: Linear Search Operation


[Link] Searching
Unlike sequential searching, the interval searching operation requires the data to be in a
sorted manner. This method usually searches the data in intervals; it could be done by either
dividing the data into multiple sub-parts or jumping through the indices to search for an element.
Example − Binary Search, Jump Search etc.

Fig. 2: Binary Search Operation


Types of searching methods
1. Linear Search
2. Binary Search
3. Interpolation Search

LINEAR SEARCH
Definition
Linear search is a type of sequential searching algorithm. In this method, every element
within the input array is traversed and compared with the key element to be found. If a match is
found in the array the search is said to be successful; if there is no match found the search is said
to be unsuccessful and gives the worst-case time complexity.
Procedure : Linear Search
The algorithm for linear search is relatively simple. The procedure starts at the very first index of
the input array to be searched.
Step 1 − Start from the 0th index of the input array, compare the key value with the value present
in the 0th index.
Step 2 − If the value matches with the key, return the position at which the value was found.
Step 3 − If the value does not match with the key, compare the next element in the array.
Step 4 − Repeat Step 3 until there is a match found. Return the position at which the match was
found.
Step 5 − If it is an unsuccessful search, print that the element is not present in the array and exit
the program.
Algorithm

Linear Search ( Array Arr, Value a ) // Arr is the name of the array, and a is the searched
element.

Step 1: Set i to 0 // i is the index of an array which starts from 0


Step 2: if i > n then go to step 7 // n is the number of elements in array
Step 3: if Arr[i] = a then go to step 6
Step 4: Set i to i + 1
Step 5: Goto step2
Step 6: Print element a found at index i and go to step 8
Step 7: Print element not found
Step 8: Exit

Pseudocode
procedure linear_search (list, value)
for each item in the list
if match item == value
return the item's location
end if
end for
end procedure
Program
def linear_search(a, n, key):
count = 0
for i in range(n):
if(a[i] == key):
print("The element is found at position", (i+1))
count = count + 1
if(count == 0):
print("The element is not present in the array")
a = [14, 56, 77, 32, 84, 9, 10]
n = len(a)
key = 32
linear_search(a, n, key)
key = 3
linear_search(a, n, key)
Output
The element is found at position 4
The element is not present in the array
Analysis
Linear search traverses through every element sequentially therefore, the best case is
when the element is found in the very first iteration. The best-case time complexity would
be O(1).
However, the worst case of the linear search method would be an unsuccessful search that
does not find the key value in the array, it performs n iterations. Therefore, the worst-case time
complexity of the linear search algorithm would be O(n).
Example
Let us look at the step-by-step searching of the key element (say 47) in an array using the linear
search method.

Step 1
The linear search starts from the 0th index. Compare the key element with the value in the
0th index, 34.

However, 47 ≠ 34. So it moves to the next element.


Step 2
Now, the key is compared with value in the 1st index of the array.

Still, 47 ≠ 10, making the algorithm move for another iteration.


Step 3
The next element 66 is compared with 47. They are both not a match so the algorithm compares
the further elements.

Step 4
Now the element in 3rd index, 27, is compared with the key value, 47. They are not equal so the
algorithm is pushed forward to check the next element.

Step 5
Comparing the element in the 4th index of the array, 47, to the key 47. It is figured that both the
elements match. Now, the position in which 47 is present, i.e., 4 is returned.

The output achieved is “Element found at 4th index”.


Step 6
Now the remaining all the elements are checked by the key 47 until the list completed
Time and Space Complexity of Linear Search Algorithm:
Time Complexity:
 Best Case: In the best case, the key might be present at the first index. So the best case
complexity is O(1)
 Worst Case: In the worst case, the key might be present at the last index i.e., opposite to
the end from which the search has started in the list. So the worst-case complexity is O(n)
where N is the size of the list.
 Average Case: O(n)
Space complexity:
O (1) as except for the variable to iterate through the list, no other variable is used.
Applications of Linear Search Algorithm:
 Unsorted Lists: When we have an unsorted array or list, linear search is most commonly
used to find any element in the collection.
 Small Data Sets: Linear Search is preferred over binary search when we have small data
sets with
 Searching Linked Lists: In linked list implementations, linear search is commonly used
to find elements within the list. Each node is checked sequentially until the desired
element is found.
 Simple Implementation: Linear Search is much easier to understand and implement as
compared to Binary Search or Ternary Search.
Advantages of Linear Search Algorithm:
 Linear search can be used irrespective of whether the array is sorted or not. It can be used
on arrays of any data type.
 Does not require any additional memory.
 It is a well-suited algorithm for small datasets.
Disadvantages of Linear Search Algorithm:
 Linear search has a time complexity of O(N), which in turn makes it slow for large
datasets.
 Not suitable for large arrays.
When to use Linear Search Algorithm?
 When we are dealing with a small dataset.
 When you are searching for a dataset stored in contiguous memory.
BINARY SEARCH
Definition
Binary search is a fast search algorithm with run-time complexity of Ο(log n). This
search algorithm works on the principle of divide and conquer, since it divides the array into half
before searching. For this algorithm to work properly, the data collection should be in the sorted
form.
Binary Search algorithm is an interval searching method that performs the searching in
intervals only. The input taken by the binary search algorithm must always be in a sorted array
since it divides the array into subarrays based on the greater or lower values
Binary search looks for a particular key value by comparing the middle most item of the
collection. If a match occurs, then the index of item is returned. But if the middle item has a
value greater than the key value, the right sub-array of the middle item is searched. Otherwise,
the left sub-array is searched. This process continues recursively until the size of a subarray
reduces to zero.
Procedure :Binary Search
The algorithm follows the procedure below −
Step 1 − Select the middle item in the array and compare it with the key value to be searched. If
it is matched, return the position of the median.
Step 2 − If it does not match the key value, check if the key value is either greater than or less
than the median value.
Step 3 − If the key is greater, perform the search in the right sub-array; but if the key is lower
than the median value, perform the search in the left sub-array.
Step 4 − Repeat Steps 1, 2 and 3 iteratively, until the size of sub-array becomes 1.
Step 5 − If the key value does not exist in the array, then the algorithm returns an unsuccessful
search.
Pseudocode
The pseudocode of binary search algorithms should look like this −
Procedure binary_search
A ← sorted array
n ← size of array
x ← value to be searched
Set lowerBound = 1
Set upperBound = n
while x not found
if upperBound < lowerBound
EXIT: x does not exists.
set midPoint = lowerBound + ( upperBound - lowerBound ) / 2
if A[midPoint] < x
set lowerBound = midPoint + 1
if A[midPoint] > x
set upperBound = midPoint - 1
if A[midPoint] = x
EXIT: x found at location midPoint
end while
end procedure
Program
def binary_search(a, low, high, key):
mid = (low + high) // 2
if (low <= high):
if(a[mid] == key):
print("The element is present at index:", mid)
elif(key < a[mid]):
binary_search(a, low, mid-1, key)
elif (a[mid] < key):
binary_search(a, mid+1, high, key)
if(low > high):
print("Unsuccessful Search")
a = [6, 12, 14, 18, 22, 39, 55, 182]
n = len(a)
low = 0
high = n-1
key = 22
binary_search(a, low, high, key)
key = 54
binary_search(a, low, high, key)
Output
The element is present at index: 4
Unsuccessful Search
Analysis
Since the binary search algorithm performs searching iteratively, calculating the time
complexity is not as easy as the linear search algorithm.
The input array is searched iteratively by dividing into multiple sub-arrays after every
unsuccessful iteration. Therefore, the recurrence relation formed would be of a dividing function.
Example
For a binary search to work, it is mandatory for the target array to be sorted. We shall
learn the process of binary search with a pictorial example.
The following is our sorted array and let us assume that we need to search the location of
value 31 using binary search.

First, we shall determine half of the array by using this formula −


mid = low + (high - low) / 2
Here it is, 0 + (9 - 0) / 2 = 4 (integer value of 4.5). So, 4 is the mid of the array.

4th_index_array
Now we compare the value stored at location 4, with the value being searched, i.e. 31. We
find that the value at location 4 is 27, which is not a match. As the value is greater than 27 and
we have a sorted array, so we also know that the target value must be in the upper portion of the
array.
location_4_value_27
We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid = low + (high - low) / 2
Our new mid is 7 now. We compare the value stored at location 7 with our target value 31.

at_loaction_7
The value stored at location 7 is not a match, rather it is less than what we are looking for. So, the
value must be in the lower part from this location.

location_7_not_ match
Hence, we calculate the mid again. This time it is 5.

at_location_5
We compare the value stored at location 5 with our target value. We find that it is a match.

location_5_matched
We conclude that the target value 31 is stored at location 5.
Implementation
Binary search is a fast search algorithm with run-time complexity of Ο(log n). This search
algorithm works on the principle of divide and conquer. For this algorithm to work properly, the
data collection should be in a sorted form.
Time and space complexity of Binary Search
 Time Complexity:
o Best Case: O(1)
o Average Case: O(log n)
o Worst Case: O(log n)
 Space Complexity: O(1), If the recursive call stack is considered then the auxiliary space
will be O(log n).
Applications of Binary Search Algorithm
 Binary search can be used as a building block for more complex algorithms used in
machine learning, such as algorithms for training neural networks or finding the optimal
hyperparameters for a model.
 It can be used for searching in computer graphics such as algorithms for ray tracing or
texture mapping.
 It can be used for searching a database.
Advantages of Binary Search
 Binary search is faster than linear search, especially for large arrays.
 More efficient than other searching algorithms with a similar time complexity, such as
interpolation search or exponential search.
 Binary search is well-suited for searching large datasets that are stored in external
memory, such as on a hard drive or in the cloud.
Disadvantages of Binary Search
 The array should be sorted.
 Binary search requires that the data structure being searched be stored in contiguous
memory locations.
 Binary search requires that the elements of the array be comparable, meaning that they
must be able to be ordered.
INTERPOLATION SEARCH
Definition
Interpolation search finds a particular item by computing the probe position. Initially, the
probe position is the position of the middle most item of the collection
Interpolation search is an improved variant of binary search. This search algorithm works
on the probing position of the required value. For this algorithm to work properly, the data
collection should be in a sorted form and equally distributed.
If a match occurs, then the index of the item is returned. To split the list into two parts,
we use the following method −

where −
A = list
Lo = Lowest index of the list
Hi = Highest index of the list
A[n] = Value stored at index n in the list

If the middle item is greater than the item, then the probe position is again calculated in
the sub-array to the right of the middle item. Otherwise, the item is searched in the sub-array to
the left of the middle item. This process continues on the sub-array as well until the size of
subarray reduces to zero.
Algorithm: Interpolation Search
1. Start searching data from middle of the list.
2. If it is a match, return the index of the item, and exit.
3. If it is not a match, probe position.
4. Divide the list using probing formula and find the new middle.
5. If data is greater than middle, search in higher sub-list.
6. If data is smaller than middle, search in lower sub-list.
7. Repeat until match.
Pseudocode
A → Array list
N → Size of A
X → Target Value
Procedure Interpolation_Search()
Set Lo → 0
Set Mid → -1
Set Hi → N-1
While X does not match
if Lo equals to Hi OR A[Lo] equals to A[Hi]
EXIT: Failure, Target not found
end if
Set Mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])
if A[Mid] = X
EXIT: Success, Target found at Mid
else
if A[Mid] < X
Set Lo to Mid+1
else if A[Mid] > X
Set Hi to Mid-1
end if
end if
End While
End Procedure
Analysis
Runtime complexity of interpolation search algorithm is Ο(log (log n)) as compared to Ο(log n)
of BST in favorable situations.
Program
def interpolation_search( data, arr):
lo = 0
hi = len(arr) - 1
mid = -1
comparisons = 1
index = -1
while(lo <= hi):
print("Comparison ", comparisons)
print("lo : ", lo)
print("list[", lo, "] = ")
print(arr[lo])
print("hi : ", hi)
print("list[", hi, "] = ")
print(arr[hi])
comparisons = comparisons + 1
#probe the mid point
mid = lo + (((hi - lo) * (data - arr[lo])) // (arr[hi] - arr[lo]))
print("mid = ", mid)
#data found
if(arr[mid] == data):
index = mid
break
else:
if(arr[mid] < data):
#if data is larger, data is in upper half
lo = mid + 1
else:
#if data is smaller, data is in lower half
hi = mid - 1
print("Total comparisons made: ")
print(comparisons-1)
return index
arr = [10, 14, 19, 26, 27, 31, 33, 35, 42, 44]
#find location of 33
location = interpolation_search(33, arr)
#if element was found
if(location != -1):
print("Element found at location: ", (location+1))
else:
print("Element not found.")
Output
Comparison 1
lo : 0
list[ 0 ] = 10
hi : 9
list[ 9 ] = 44
mid = 6
Total comparisons made:
1
Element found at location: 7
Example
To understand the step-by-step process involved in the interpolation search, let us look at an
example and work around it.
Consider an array of sorted elements given below −

array_of_sorted_elements
Let us search for the element 19.
Solution
Unlike binary search, the middle point in this approach is chosen using the formula −

So in this given array input,


Lo = 0, A[Lo] = 10
Hi = 9, A[Hi] = 44
X = 19
Applying the formula to find the middle point in the list, we get
Since, mid is an index value, we only consider the integer part of the decimal. That is, mid = 2.

at_index_2
Comparing the key element given, that is 19, to the element present in the mid index, it is found
that both the elements match.
Therefore, the element is found at index 2.
Time Complexity for Interpolation Search:
Time Complexity
 Best Case: O(1) — the target is found immediately.
 Average Case: O(logn) — the search works efficiently with uniformly distributed data.
 Worst Case: O(n) — when the data is not uniformly distributed, and the probe position
leads to inefficient searches (degrades to linear search).
Space Complexity:
 The space complexity of interpolation search is O(1), as it only uses a few extra variables
to store the low, high, and pos indices.

Applications
1. Uniformly distributed data.
2. Predicting values based on known data.
3. Memory address calculation in systems programming.

Advantages
1. Faster than binary search for uniformly distributed data.
2. Adaptive to data distribution.
3. Efficient in uniformly distributed datasets.
4. Low space complexity O(1).

Disadvantages 4.
1. Poor performance with non-uniformly distributed data (worst-case O(n).
2. Requires sorted data.
3. More complex to implement compared to binary search.
4. Can degrade to linear search on sparse datasets
Time complexity of linear, binary and Interpolation Search

Comparison between linear, binary and Interpolation Search

PATTERN SEARCH

The pattern searching/matching algorithm is a technique that is used to locate or find a


specific pattern or substring within given text. Its basic idea is to find all the occurrences of a
particular pattern in the specified data structure. For instance, given an array of numbers [1, 2, 3,
4, 5, 6, 3, 4, 9] and we need to find all the occurrences of the pattern [3, 4] in the array. The
answer would be index number 2 and 6.
How pattern searching algorithm works?

There are multiple pattern searching algorithms that works in different ways. The main goal to
design these type of algorithms is to reduce the time complexity. The traditional approach may
take lots of time to complete the pattern searching task for a longer text.

Applications of Pattern Searching


Pattern searching algorithms have numerous applications, including:
 Text Processing: Searching for keywords in a document, finding and replacing text, spell
checking, and plagiarism detection.
 Information Retrieval: Finding relevant documents in a database, web search, and data
mining.
 Bioinformatics: Searching for DNA sequences in a genome, protein analysis, and gene
expression analysis.
 Network Security: Detecting malicious patterns in network traffic, intrusion detection,
and malware analysis.
 Data Mining: Identifying patterns in large datasets, customer segmentation, and fraud
detection.
Pattern searching algorithms
1. The naïve string-matching algorithm
2. Rabin-Karp algorithm
3. Knuth-Morris-Pratt algorithm.
[Link] NAÏVE STRING-MATCHING ALGORITHM
Naive pattern searching is the simplest method among other pattern-searching
algorithms. It checks for all characters of the main string to the pattern. This algorithm is helpful
for smaller texts. It does not need any pre-processing phases. We can find the substring by
checking once for the string. It also does not occupy extra space to perform the operation.

Algorithm
1. Start at the first position of the text T (from index i = 0).
2. For each position i in the text, check if the substring of T starting at i and having the same
length as P matches the pattern P.
3. If a match is found, print the starting position i.
4. Continue this process until all positions in T are checked.
Pseudocode:
Algorithm NaiveStringMatch(text, pattern):
Input:
- text: a string of length n
- pattern: a string of length m
Output:
- The starting indices of all occurrences of pattern in text
n = length of text
m = length of pattern
for i = 0 to (n - m): // Traverse text from 0 to n - m
if text[i:i+m] == pattern: // Compare substring of text with pattern
print("Pattern found at index", i)

Explanation:
1. Input: You have two strings: text (of length n) and pattern (of length m).
2. Loop: The loop starts from index i = 0 and goes until i = n - m because that's the last
index where the pattern can fit inside the text.
3. Substring Comparison: For each index i, we compare the substring text[i:i+m] (which is
the substring of length m starting from index i) with the pattern.
4. Match Found: If a match is found, the algorithm prints the index i where the match
starts.
5. Output: The algorithm prints all the starting positions where the pattern is found in the
text.

Program
def naive_string_match(text, pattern):
n = len(text)
m = len(pattern)
# Loop through all positions in the text where the pattern could fit
for i in range(n - m + 1):
# Compare the substring of text with the pattern
if text[i:i+m] == pattern:
print(f"Pattern found at index {i}")
# Test the function
text = "abcabcabc"
pattern = "abc"
naive_string_match(text, pattern)
Output
Pattern found at index 0
Pattern found at index 3
Pattern found at index 6

Time Complexity:
 Best case: O(n) (If the first position matches)
 Worst case: O(n * m) (When there is no match, and we compare each substring of length
m with the pattern)
fxQnipla. : &how q-1,._ (?.;.1Y1pl)¥i,gon rfta.. No.\ve. olh;~ N~
P = ~ oor T: o oc, o / o c c ,c I c c,o 1 •

0 I L J 4 5" b 1 i" ') lo ll I\... ll I'+


0 (?) 0 C)
0 () C C) I 0 C) I I 0 I
'~ j
I\ ' ' 1
"", ' I \ I '
. 4 No mcJch
IC C) 0 I

c3hifl 1k p~r11 t,. t~ht b~ ON!- [Link]

'
I L. s- b , lt ~ (0
I
II 1'1- ~ 14
(!) 0
I
0 e> I 0 c:, 0 ·l 0 I .0 0 0 J
j
'
. '
. ,,
fiotclt £011-fld . ir1 ind,.x J]
P<Jli~t> C) 0 0 J
~

8 hif t tk po.1µ.oi to rl~ht bo ONL rosi tion •


0 I '2- ~ Lt S" b i S- °t IO I, 1\- 13 \"
'

Tut '
0 0
!
C) C) \ C>
.• I
C) G l I
(!J I C> C) D , I

J\ ' i\ ,, '
.. I,
' i,
' ,, l ?;, No ~ch-
pcJ:t.,n . C!) G 0 I
I
I

8 h,'ft 1fu.. paltilb [o rjht bJ Of\sL posih'on


o I l.. J lt 5" I, 1 i C, lo II I~ \!a \~
I

I C C {) C) C> C) \
C, C C 0
'' ', ', . I '
0 I
' I I -
~ No ma.l-ch .
j
0 0 0 I

8hif t 1k rpJii~t) 'bi 'li(Jht blJ or\o... ro511°lon •


0 I "l- ~ Lt 5"" t 1 a- ') lo ( I , I 'l. ll I l/
l
[Link] C') b 0 C> I C 0 C) l 0 \' ~ 0 0 \
__.

.I'
\✓ :::>N o tlletf-ch •
Poitvrn I
'
-◊Io C) j

_S~ifi- '._H\f-_ p4n ~ 'rijht b_j •~ ~asit/on


0 J
4 5" b 1 t q to ll ti..- l~ )Lt
I 1- I

--I [Link] 0 0 0 0 l 0 0 0 l 0 \ C D 0 I
I ... , '"
\
,---
. I'
.,
"' 1PJ:Ji fcll-tld o± i
I
1
0 D 0 )
p~ln

a 0

(!) 0 0 0 ( o O D I O f C O O I
~➔ No [Link].h
...,_i.....,..s. .~.,----,

0 o o I &hrf 'Y/6 ht 011a... posit,·,,,..


0 ,,

1ut e>oo o( ooe t oloo oJ


I

0 c, 0 J
&ft '13ht ol\4.- posi l:io~
to 1 ,,._ ) I

0 O OD\ o DD I 0 \ C> o o ·J
--:;) No mal-~ .
o o o I
0 ' 'L

r~t OODb l~OO f 0 0 D O I

0 I 1- ~

itxt oooc I C>oc l ·C) I O C o J

t-f-------1
--.....> No Holc..h.
C) 0 O /

l- I~ f4
0 C) 00 l O CJ O l b J O o o J

No.1:J. fow1J a:1. t' n x 0 o o I


. • II

[Link] [Link] oi
l I
l1
\ I ' -
RABIN-KARP ALGORITHM
The Rabin-Karp Algorithm is a more efficient string matching algorithm that uses
hashing to find patterns in a text. Instead of comparing the pattern with every substring of the
text directly, it computes a hash value for the pattern and for substrings of the text, and then
compares these hash values. If two substrings have the same hash value, only then do we check
if they are actually identical. This approach significantly reduces the number of comparisons,
especially when dealing with large texts and patterns.

1. Hash Function:
A hash function is used to generate a hash value for both the pattern and the
substrings of the text. We usually use a rolling hash function to efficiently compute
hashes for all substrings of T.
2. Rolling Hash:
A rolling hash is a technique that helps in calculating the hash for the next
substring using the hash of the current substring. This allows us to avoid recalculating the
hash from scratch for each new substring.
3. Spurious Hit:
When the hash value of the pattern matches with the hash value of a window
of the text but the window is not the actual pattern then it is called a spurious hit.
Spurious hit increases the time complexity of the algorithm. In order to minimize
spurious hit, we use good hash function. It greatly reduces the spurious hit.

Algorithm:
1. Calculate the hash of the pattern (P).
2. Calculate the hash of the first substring of T of length m (same as the pattern).
3. Compare the hashes of the pattern and the first substring of T. If they are equal, check if
the actual substrings are the same (because different strings can have the same hash,
though rare).
4. Slide over the text: For the next substring, calculate the new hash using the rolling hash
technique and compare it with the hash of the pattern.
5. Repeat this process for each substring in the text.
Pseudocode:

Program (Using Rolling Hash):


def rabin_karp(text, pattern, d=256, q=101):
n = len(text)
m = len(pattern)
hpattern = 0 # Hash value of the pattern
htext = 0 # Hash value of the text
h=1 # The value of d^(m-1) % q

# Calculate d^(m-1) % q
for i in range(m - 1):
h = (h * d) % q

# Calculate hash of the pattern and first window of text


for i in range(m):
hpattern = (d * hpattern + ord(pattern[i])) % q
htext = (d * htext + ord(text[i])) % q

# Slide the pattern over text one by one


for i in range(n - m + 1):
# If hash values match, check the actual characters
if hpattern == htext:
if text[i:i + m] == pattern:
print(f"Pattern found at index {i}")

# Calculate hash value for the next window of text


if i < n - m:
htext = (d * (htext - ord(text[i]) * h) + ord(text[i + m])) % q
# Make sure hash is positive
if htext < 0:
htext = htext + q

# Test the function


text = "ababcababcab"
pattern = "abc"
rabin_karp(text, pattern)

Output:
Pattern found at index 0
Pattern found at index 3
Pattern found at index 6
Pattern found at index 9
1·1 ...,, ... I (' c_ V- -

f [Link]~ Rab;IJ .Karp Aao,nfim


Rollih~ H~h '[Link]

1-l - 1-1~ h Va1ui__


r.--
t)-1
b - [Link].- VcJu.g_ ~f Codsi- 25 b ~ loW no 0/

_P - lo..rBQ_ pt;/TuL nu.1111beK -

J~()J)h Upd.o.t. '301 [Link].


I

Hhe_LJ ~ (b x (~)
\_ old
- C Id X
0
bn -1)\ + C new\ ) n,od p. "

~ Hord - J,lltiol [Link] v~


[Link] ~ 6h -) - f ()Dt Charo.~t_ t

ChevJ - /\Jew e.,ho racli r \

• _f-r< 1 Text
Q....1
6-e 2.
c.. •3
<?1 - ',-
e - s-
h(p) f -b
a a b j -j
lfJ t2 ~ 4 h-a
~ 'f' "I - 9
hCV)b ~
I

h(p_) J -f◊

Te:ttt o o. {)._ o..0 - b. l10..-:il) Ccc ~ =J.i-

h-:: b f f- l t I
\_-- ',./
t { I
2..
'
P~ r mo.b:h ::- ~ C__au b)

....3 -3 3 fit-
Ex·. 2- c,.. ~ \
1e~t o. b c__ b C Q_
b9'-2-
c..- 3
ci - Lt
e-s

p<illR-1n f-b
HObh eoct- foY
5,1
h-8
palhrn : - 1 ~ : : [o j '

.' ·-9
2- f-6t 5:.. ID J :c- lO

,~xt a b c__ ej
l L 1\ ]
I t 2-t J
(
b

£x:3

Te.)( t : c... c_ 0. C. c.. (). 0.. e.. ~ b a.. ~ -\


b-1-
n:. l l c..-:>
cl - 4-
PoJ1i-,o '
n~~
@
-
b a. ,. j Hcuh [Link] db o- =1/:t +')...+
1
1
IJ e.- 5
f-6
3-1
h-~
i-9
.
J - ID
-
n: II

C. e.. 0. - .;+~+, :: 7

'T~ ().'4 0i ~ Ni>,e.. Jpo.1 io<Ao H;,t W<L- er t t, HOloh fc,.nc1Jo o

0.. - \
b:. I() b-1-
h :- 3 -i -J
c, : d
Q.2--: b d-4
--
3~ 3-2- 3-3 e-s-
H = 4x lo +~xlo + lx10 C3 ::: a. .
f- l, ·
l.. \
+2X.!O + I ')(ID
()
:: 4-x Jo ~-)

h- &
:: J+o o + 2-0 +I = 1t 2 l l -~
'
j - l0
Text : c.. c_ a
J ~ I
'l-- l Cl
I-} :JXIO +3Xlb + IKID C. c_ 0-- e. c... 0..
J \ t J
~ 3o o + 2> 0 + I :: 33) 33\
300+2>0-tl
/1on h urxkt- .ror rr,lili)_ fOY l'\ILX.t 1-\ov:ih [Link] f01 Tux t
KNUTH-MORRIS-PRATT ALGORITHM
The Knuth-Morris-Pratt (KMP) algorithm is an efficient string-matching algorithm
that avoids unnecessary comparisons by utilizing previously gathered information about the
pattern. Instead of starting from the beginning each time when a mismatch occurs, KMP uses a
"partial match" table (often called the prefix function or failure function) to skip ahead
intelligently.

KMP Algorithm Steps:


1. Preprocessing the Pattern:
o Compute the Longest Prefix-Suffix (LPS) array or prefix function for the
pattern.
o This array tells us how much we can skip forward if a mismatch occurs after a
successful partial match.

Longest Prefix Suffix (LPS) Array


The KMP algorithm preprocesses the pattern to create an auxiliary array called
the LPS (Longest Proper Prefix which is also Suffix) array. This array helps in
determining the number of characters to be skipped when a mismatch occurs 12. The LPS
array stores the length of the longest proper prefix of the pattern that is also a suffix for
each sub-pattern
 For a given pattern P, the LPS array is an array where LPS[i] gives the length of the
longest proper prefix of P[0...i] which is also a suffix of P[0...i].
 Example: For the pattern "ABAB", the LPS array would be [0, 0, 1, 2] because:
o No proper prefix-suffix for "A" → 0
o No proper prefix-suffix for "AB" → 0
o "A" is a proper prefix and suffix for "ABA" → 1
o "AB" is a proper prefix and suffix for "ABAB" → 2
2. Search Process:
 Use the LPS array to skip comparisons:
o If a mismatch occurs at position j, the LPS array will tell you the next position i
where you should start comparing in the pattern, avoiding rechecking the same
characters.
KMP Pseudocode:
Step 1: Preprocessing the Pattern (Computing LPS array)
Algorithm ComputeLPSArray(pattern):
n = length of pattern
LPS = array of size n
LPS[0] = 0 // First character has no proper prefix-suffix
length = 0 // Length of the previous longest prefix suffix
i=1

while i < n:
if pattern[i] == pattern[length]:
length += 1
LPS[i] = length
i += 1
else:
if length != 0:
length = LPS[length - 1] // Use the previous LPS value
else:
LPS[i] = 0
i += 1
return LPS
Step 2: Searching the Text
arduino
Copy
Algorithm KMP_Search(text, pattern):
n = length of text
m = length of pattern
LPS = ComputeLPSArray(pattern)
i = 0 // Index for text
j = 0 // Index for pattern
while i < n:
if text[i] == pattern[j]:
i += 1
j += 1
if j == m:
print("Pattern found at index", i - j)
j = LPS[j - 1] // Use LPS to shift the pattern
elif i < n and text[i] != pattern[j]:
if j != 0:
j = LPS[j - 1] // Use LPS to shift the pattern
else:
i += 1

Example:
Let’s walk through an example using the pattern "ABAB" and the text "ABABABAB".
 Pattern: "ABAB"
 Text: "ABABABAB"
1. Compute the LPS Array for the pattern "ABAB":
o LPS = [0, 0, 1, 2]
2. Search the pattern in the text using the LPS array:
o At position i = 0, j = 0: text[0] == pattern[0] → match → move i and j to 1.
o At position i = 1, j = 1: text[1] == pattern[1] → match → move i and j to 2.
o At position i = 2, j = 2: text[2] == pattern[2] → match → move i and j to 3.
o At position i = 3, j = 3: text[3] == pattern[3] → match → pattern found at index 0
→ set j = LPS[j - 1] = 2.
o Continue this process until the whole text is processed.
Time Complexity:
 Preprocessing time (LPS array): O(m), where m is the length of the pattern.
 Search time: O(n), where n is the length of the text.
 Overall time complexity: O(n + m).
This is much faster than the naive string-matching algorithm, which has a worst-case time
complexity of O(n * m).
Advantages:
 The KMP algorithm improves the brute-force string search by reducing the number of
character comparisons through the use of the LPS array.
 It guarantees O(n + m) time complexity, which is optimal for string searching
algorithms.
Given text: abcxabcdabxabcdabcdabcy
Given pattern : abcdabcy
Step 1: we will construct the prefix table for the given pattern as follows.
0 1 2 3 4 5 6 7
a b c D a b c Y
0 0 0 0 1 2 3 0

Step2: now start matching search for pattern against the text with the help of prefix table.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 2
2
a b c x a b c d a b x a b c d a b c d a B c Y

√ √ √ x

a b c d a b c y

The pattern[3] is not matching with text[3]. Hence we find the


position using the formula,
Text index of unmatched character – prefixtable[pattern index – 1]
=3 – prefixtable[3-1]
=3- 0 = 3
That means shift pattern at starting index 3.
Step 3:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

a b c x a b c d a b X a b c d a b c d a B c Y

a b c d a b c Y

As pattern[0] is not matching with text[3], so shift pattern by one position.


Step 4:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

a b c x a b c d a b X a b c d a b c d a B c Y

√ √ √ √ √ √ x

a b c d a b c y
Text index of unmatched character – prefixtable[pattern index – 1]
=10 – prefixtable[6-1]
=10- 2 = 8
Step 5:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
a b c x a b c d a b X a b c d a b c d a B c Y

√ √ x

a b c d a b c y

Text index of unmatched character – prefixtable[pattern index – 1]


=10 – prefixtable[2-1]
=10- 0 = 10
Step 6:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

a b c x a b c d a b X a b c d a b c d a B c Y

a b c d a b c y

Text[10] is not matching with pattern[0]. Hence shift one position next.
Step 7:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

a b c x a b c d a b X a b c d a b c d a B c Y

√ √ √ √ √ √ √ x

a b c d a B c y

Text index of unmatched character – prefixtable[pattern index – 1]


=18 – prefixtable[7-1]
=18- 3
= 15
Step 8:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

a b c x a b c d a b X a b c d a b c d a B c Y

√ √ √ √ √ √ √ √

a b c d a B c Y

Thus we found the pattern at starting position 15 in text string.


~ l,L1h -t--lo n 1-s - Prntl A- I(}°rflh 01
I
.Li palt.1 I) No.t~flJ [Link]'ft.m ,3 1 (;.)IZ-

of Iii') Cotripo-y ~ ' 1k pal&T e~o.)' [Link] rit 11,J: d.o nd


I)

'
tnolLh in ~ t;t and on o c.tute11LC2..... of [Link]
w<L· aw°(j 1k Infor maIJo 'il B-n.J r eofo. rt
1k C?DIYJpo.n;\on , fof. [Link]'fun ~ef ef [Link].dPr8 ft•111
1k. Gxt ' ~ aarun and ~ai n with NZAt rt'ICYeo.f1"1/niJ
~ i-/:-1·0 " od "fu'( , if'\D.. ~hcxv-acfo,; from pcJ1im> a.r a-
tncx.t'c.W · '1fuo UIV~~ '(e.c,kW,'. itie- e-tfrden1J 13a
I ,

pallim 1notd\Jr1~ ~01itfim, ..


\ • • •• I '

~- ~- PrJl 1ri~m . ~me or toWJ


~ ~ ~ . .4.. ~01t~~n_·
~
0
"
1

% b~t · & bJ.i/ld ,1k 1fttthrn ~


t ·. bwl.d • hbb . <8oF btnRD. ~ 0-.n°c\ b
Q._ ~ ) ( , ; _

a1o •. 'ea)kd ,- 0-YY°d 01 • pik~e- ·. fancffotl 1:Dble · • .


:,. • ,k ~ - [Link]- built 0sllJ ~
•• ~ ~ e, :, alt;rn' •, I,

.1ha.,. : ov~r/appi~ p.r1x o.,J· • cYtLffi 'i .j-S

[Link].J ih . k,.iu..if1 - Kory(,s - p,·oft ~o'Ti'tf,rt'.)

~
[Link]- ~rri.s - pra±l iorir/htn t,oe_ Wlr~J'
L P.s [knaet Pre,jiv. 8U-ffitl
• LPs lLo~~t £[Link] 8(.[Link] ~]_An~:
n-f ,~ ea.. h pasit,'on tn ik_
Jha.. lJ>s orr°o lj'1oreb , c.

Jhi'ti Lor paJ!irri), 1k ~ .Q}- ~ ~~


~ oJ -~ ~hi'rJ L01. Jv.ksf1;7o/ tJia± .k ~o o.

~ , ,-tu_ LPs [Link] ,/4 we.d /::o Mf ~[Link]

<Jkip ov~Y po1


·
tions a
U
~ . r#
,Jl\i.
Jh inq
U
W kn o.. [Link]
happeJ'\,I olu.r; ~ 1k k µ_ p ~e.0-l ch phaoQ : '
I ' I \

~ poi~ ti [Link]: LP.s ,


., ' J

I· 'Prope:r [Link](. : R p1opei-. p1e.-fi ~ , ad o.. . Jt,; 'u is . o..


pye-fix: ~ol k hot ettua! to ik_ ,~hi~ I [Link] .
/>.· u:.s An<J ~~~~ion : - ~ ~-S I 0-nu (2,,.~ be- ao~iru:td
In Olm) b 1Y1D-, wk1e._ • Y'A .Al) '' ',fk ~'tli ' ~d fk- ~1o •

?.· U-se. ln 'J<.H P: Wbi · •·;._ 1111-shJ:ih •0.W-tr~ . Ju:,i~ tk_


'9eo-'fc1 phooe.. of tk k.M P. aJao1•,1firYl , fk LP.s [Link] .
is· Used · ro &Jup G\JeJY. _ Q.h°'-rac~tJ. fhJ. . ~\J"-
al.1 ~ h€[Link] , ma::hl.a.d; i~,PTal[ Ina 1 effIdenu.
' "' . :. ' 1-.a.. L?.s i 'mru .. helps ,avoid '([Link]

(hrY) p~1 i sd!>nh , ma 14U.,.ifui- k HP aJao Ii ffi m ~ tii r ifi(:t/)

b..-uk f1~ [Natv\1. Jl:rind [Link]½D [Link] .


r l \. I


..

I
. ~

;\
I

l /I> •
\ .
1 I • } 1 , , !

' l 1 • , I
~
\ : : . '
.
l .· . • • I \
I A~OY;tfi ll')

1--0cr·calb c ~ )
3 Jtw, tcu.t
LP0 [_lonJ~{ Pre_fii Jl,Lfti 1].

Cl 0.. bet ~

@ A A AA ~> A11 • ebmc.t~ o.re_ [Link](L I;


LPS :0 [ C, I '2.. 3] 81)..W\Q_ 0 To c-1.; •

@ A8c_fl~
-:::>
LP.s = [? o
'··
,_
.. ::a

I
® AGcAB ® ABCp A
~
p -.s
A ~ 0 11 .,.

R B A ;:().
R~ C.A
f,~ <: ~ f)!?,@ B ~0. flGc.. (1 c_ A
f\G>c C.11~
F,~c A GcFt~

"1a,1ho d ~ : -,
s- 3 6 -, ' ')_. .3 l.\- 5" b
~ ' --i..- 1-f
A R ~ B A A A
;....--"- ft A A 13 Fr A A
LP-.S le :2- 0 ?-
~
G)
f G) ~ ~ s-
> 4-

\
A-
"l-
G
0 p, R A- B A
-> .,
@ AA
(!)


'

@
C)

~-
' Q
(

:!>
2-

L\ s-- l> .
~ A
.3 (1
~ r-)
@ RA A ~

a \ @ C) I . 2-- 0 (, @
~ 1-- (J) 4 ~ ;
0 A'
~
\ 1-- 3 't A- R 13 A A A
@ R R Ft g ' -~ ' ~

C I 2-
>
0 0 2- C)
2- 0
& L , ).. 3 '+ 5' b 1 8 0, ID l ~ G) 4 -5"" b , R
~AAA cAAA AAC ® 8_Mc A ~ A f)
0 I 2--0 I '1.-3~

• ').-6.) '1 S- l, 1 i .,
G) A A R t_ A A '1ft_!l
'-----
O I 2-C)l 'l-333
I 'l- 3 G) 5 b ., \ °lt,<>
CD @ f) ff ft CJ A A
0 \ 2-
e_ A 0 ½,
0 \ 2-3 3 3 Lt
•'~~ _s-(,1 t
0\ • B
A £l c A A3.
\..V I 2- o I 2.3
C> t
6 7 C\-
6
Q
I 10 JJ
(_
A A g A A
M ,€) 3 1-i ;- b , 6
~RR 8 A AC A A
Cf
- CJ I .2-01 '2-

~ \ l-Qli- If 6 ""1 8 81
cv e AGflRc_AA ~
o, or 2-~ 12-_3
0 I D J 2- ~ I ~ 1 0 '> b ""! 8 ', 10

U3I A-ABAAcf,A'BA
" - ,.,,,. ..,..._ __.....

D 10 I 2.-01 2-31t
~ ~ ~ @ b , t ~ I• /J
(?;,,. I
C) I C) I 2- 0
~ A f\13, A AC.-fr A BAA
c, I C) I 2-. 0 J 2- 3 Lt S-

0 1 0 I 2- C) l
A f\ l)f1A c_~ Pt 8 ft A
L.l>.S • [CJ I D I 2..- 0 I 2- 3 J+ .5J

""' o I .2..... j Lt 5 l:, "1 I? ~ /o

/ext: .A
, A A R A BA A A f3 A

Pal: A ARA
O I 'l- 3
L.P-S [ o I 1- 3. J
.0 te:AtDJ , _ pa±[j]
'
1-r=-)
J -t; I
len3Th lrJJ "- 4 't > ~nlp~ )
j = &in (pat) -4
~ !ev{pat~
• lf= 4 - )
;ncL.x :. ', -j
:; 1+-1t
.. =- o'
J
~ l "- .3>
@ t::5 Ll -1J
j ;:
-;>Bf> ;
➔ f)
j = r p0
LPS ~ G) 1 '.I- 3 J
l
'.: I ps D-0 j = Q
mistrolch. O 2- 3

J ► lp.s[o] A A R (-)

(i) ' :: 5 -~ £ i ', =b


j ;: o -_) A jJ~ 0
mis rnol:c..~-

0 ' l..- ~ lJ r 6 1 ~ '1 (o .

T _. ;; ['1 A R A A B A A Fl 8 A
i

J
t) \ \...

L.\>S [ 0 I -'l.- 3.

8 ,.• -::-er -) AB~,-9j


.-
J :: I
-- _,_ 1 r. _,J
J ps U

"r '

iJ -:c2 • lps[2--1J
tni_s rri ol:Ji ' ' 1p..s [1J
J =- I
SORTING
Definition
A Sorting Algorithm is used to rearrange a given array or list of elements in an order.
Sorting is provided in library implementation of most of the programming languages.
There is a wide range of applications for these algorithms, including searching algorithms,
database algorithms, divide and conquer methods, and data structure algorithms
Sorting Basics
 In-place Sorting:
An in-place sorting algorithm uses constant space for producing the output (modifies the
given array only. Examples: Selection Sort, Bubble Sort, Insertion Sort and Heap Sort.
 Internal Sorting:
Internal Sorting is when all the data is placed in the main memory or internal memory. In
internal sorting, the problem cannot take input beyond allocated memory size.
 External Sorting :
External Sorting is when all the data that needs to be sorted need not to be placed in
memory at a time, the sorting is called external sorting. External Sorting is used for the
massive amount of data. For example Merge sort can be used in external sorting as the
whole array does not have to be present all the time in memory,
 Stable sorting:
When two same items appear in the same order in sorted data as in the original array
called stable sort. Examples: Merge Sort, Insertion Sort, Bubble Sort.
 Hybrid Sorting:
A sorting algorithm is called Hybrid if it uses more than one standard sorting algorithms
to sort the array. The idea is to take advantages of multiple sorting algorithms. For
example IntroSort uses Insertions sort and Quick Sort.
Types of Sorting Techniques
There are various sorting algorithms are used in data structures. The following two
types of sorting algorithms can be broadly classified:
1. Comparison-based: We compare the elements in a comparison-based sorting algorithm)
2. Non-comparison-based: We do not compare the elements in a non-comparison-based
sorting algorithm)
Sorting algorithm

Applications
 When you have hundreds of datasets you want to print, you might want to arrange them
in some way.
 Once we get the data sorted, we can get the k-th smallest and k-th largest item in O(1)
time.
 Searching any element in a huge data set becomes easy. We can use Binary search
method for search if we have sorted data. So, Sorting become important here.
 They can be used in software and in conceptual problems to solve more advanced
problems.
INSERTION SORT
Insertion sort is a simple sorting algorithm that works by iteratively inserting each
element of an unsorted list into its correct position in a sorted portion of the list. It is like sorting
playing cards in your hands. You split the cards into two groups: the sorted cards and the
unsorted cards. Then, you pick a card from the unsorted group and put it in the right place in the
sorted group.
 We start with second element of the array as first element in the array is assumed to be
sorted.
 Compare second element with the first element and check if the second element is
smaller then swap them.
 Move to the third element and compare it with the first two elements and put at its correct
position
 Repeat until the entire array is sorted.
Pseudocode:
Algorithm InsertionSort(arr):
Input: An array arr of length n
Output: Sorted array arr
for i = 1 to n-1: // Start from the second element
key = arr[i] // Set the key as the current element
j=i-1 // Set j to the index just before i
// Shift elements of arr[0..i-1], that are greater than key, to one position ahead
while j >= 0 and arr[j] > key:
arr[j + 1] = arr[j] // Shift the element to the right
j=j-1 // Move left in the array
arr[j + 1] = key // Place the key in its correct position
return arr // Return the sorted array

Program
# Function to sort array using insertion sort
def insertionSort(arr):
for i in range(1, len(arr)):
key = arr[i]
j=i-1
# Move elements of arr[0..i-1], that are
# greater than key, to one position ahead
# of their current position
while j >= 0 and key < arr[j]:
arr[j + 1] = arr[j]
j -= 1
arr[j + 1] = key
# A utility function to print array of size n
def printArray(arr):
for i in range(len(arr)):
print(arr[i], end=" ")
print()
# Driver method
if __name__ == "__main__":
arr = [12, 11, 13, 5, 6]
insertionSort(arr)
printArray(arr)
# This code is contributed by Hritik Shah.
Output
5 6 11 12 13
Time Complexity of Insertion Sort
 Best case: O(n), If the list is already sorted, where n is the number of elements in the list.
 Average case: O(n2), If the list is randomly ordered
 Worst case: O(n2), If the list is in reverse order
Space Complexity of Insertion Sort
 Auxiliary Space: O(1), Insertion sort requires O(1) additional space, making it a space-
efficient sorting algorithm.
Advantages of Insertion Sort:
 Simple and easy to implement.
 Stable sorting algorithm.
 Efficient for small lists and nearly sorted lists.
 Space-efficient as it is an in-place algorithm.
 Adoptive. the number of inversions is directly proportional to number of swaps. For
example, no swapping happens for a sorted array and it takes O(n) time only.
Disadvantages of Insertion Sort:
 Inefficient for large lists.
 Not as efficient as other sorting algorithms (e.g., merge sort, quick sort) for most cases
Example

Illustration
arr = {23, 1, 10, 5, 2}
Initial:
 Current element is 23
 The first element in the array is assumed to be sorted.
 The sorted part until 0th index is : [23]
First Pass:
 Compare 1 with 23 (current element with the sorted part).
 Since 1 is smaller, insert 1 before 23 .
 The sorted part until 1st index is: [1, 23]
Second Pass:
 Compare 10 with 1 and 23 (current element with the sorted part).
 Since 10 is greater than 1 and smaller than 23 , insert 10 between 1 and 23 .
 The sorted part until 2nd index is: [1, 10, 23]
Third Pass:
 Compare 5 with 1 , 10 , and 23 (current element with the sorted part).
 Since 5 is greater than 1 and smaller than 10 , insert 5 between 1 and 10
 The sorted part until 3rd index is : [1, 5, 10, 23]
Fourth Pass:
 Compare 2 with 1, 5, 10 , and 23 (current element with the sorted part).
 Since 2 is greater than 1 and smaller than 5 insert 2 between 1 and 5 .
 The sorted part until 4th index is: [1, 2, 5, 10, 23]
Final Array:
 The sorted array is: [1, 2, 5, 10, 23]

HEAP SORT
Heap sort is a comparison-based sorting technique based on Binary Heap Data
Structure. It can be seen as an optimization over selection sort where we first find the max (or
min) element and swap it with the last (or first). We repeat the same process for the remaining
elements.
First convert the array into a max heap using heapify, Please note that this happens in-
place. The array elements are re-arranged to follow heap properties. Then one by one delete the
root node of the Max-heap and replace it with the last node and heapify. Repeat this process
while size of heap is greater than 1.
 Rearrange array elements so that they form a Max Heap.
 Repeat the following steps until the heap contains only one element:
o Swap the root element of the heap (which is the largest element in current heap)
with the last element of the heap.
o Remove the last element of the heap (which is now in the correct position). We
mainly reduce heap size and do not remove element from the actual array.
o Heapify the remaining elements of the heap.
 Finally we get sorted array.
Heapsort Pseudocode:

Step 1: Build Max Heap


Algorithm BuildMaxHeap(arr):
n = length of arr
for i = n // 2 - 1 down to 0: // Start from the last non-leaf node
MaxHeapify(arr, i, n) // Ensure the subtree rooted at i satisfies the max-heap property

Step 2: MaxHeapify
Algorithm MaxHeapify(arr, i, n):
largest = i // Assume the current node is the largest
left = 2 * i + 1 // Left child index
right = 2 * i + 2 // Right child index
// Check if the left child exists and is greater than the current node
if left < n and arr[left] > arr[largest]:
largest = left
// Check if the right child exists and is greater than the largest so far
if right < n and arr[right] > arr[largest]:
largest = right
// If the largest is not the current node, swap and recursively heapify
if largest != i:
swap(arr[i], arr[largest])
MaxHeapify(arr, largest, n) // Recursively heapify the affected subtree
Step 3: Heapsort
Algorithm Heapsort(arr):
n = length of arr
// Step 1: Build a max heap
BuildMaxHeap(arr)
// Step 2: Extract elements from the heap one by one
for i = n - 1 down to 1: // Start from the last element
swap(arr[0], arr[i]) // Move the current root (maximum) to the end
MaxHeapify(arr, 0, i) // Heapify the root element to restore max-heap property
return arr // Sorted array
Program
# To heapify a subtree rooted with node i
# which is an index in arr[].
def heapify(arr, n, i):
# Initialize largest as root
largest = i
# left index = 2*i + 1
l=2*i+1
# right index = 2*i + 2
r=2*i+2
# If left child is larger than root
if l < n and arr[l] > arr[largest]:
largest = l
# If right child is larger than largest so far
if r < n and arr[r] > arr[largest]:
largest = r
# If largest is not root
if largest != i:
arr[i], arr[largest] = arr[largest], arr[i] # Swap
# Recursively heapify the affected sub-tree
heapify(arr, n, largest)
# Main function to do heap sort
def heapSort(arr):
n = len(arr)
# Build heap (rearrange array)
for i in range(n // 2 - 1, -1, -1):
heapify(arr, n, i)
# One by one extract an element from heap
for i in range(n - 1, 0, -1):
# Move root to end
arr[0], arr[i] = arr[i], arr[0]
# Call max heapify on the reduced heap
heapify(arr, i, 0)
def printArray(arr):
for i in arr:
print(i, end=" ")
print()
# Driver's code
arr = [9, 4, 3, 8, 10, 2, 5]
heapSort(arr)
print("Sorted array is ")
printArray(arr)
Output
Sorted array is
2 3 4 5 8 9 10
Time Complexity: O(n log n)
Auxiliary Space: O(log n), due to the recursive call stack. However, auxiliary space can be O(1)
for iterative implementation.

Advantages of Heap Sort

 Efficient Time Complexity: Heap Sort has a time complexity of O(n log n) in all cases.
This makes it efficient for sorting large datasets. The log n factor comes from the height
of the binary heap, and it ensures that the algorithm maintains good performance even
with a large number of elements.

 Memory Usage: Memory usage can be minimal (by writing an iterative heapify() instead
of a recursive one). So apart from what is necessary to hold the initial list of items to be
sorted, it needs no additional memory space to work

 Simplicity: It is simpler to understand than other equally efficient sorting algorithms


because it does not use advanced computer science concepts such as recursion.

Disadvantages of Heap Sort

 Costly: Heap sort is costly as the constants are higher compared to merge sort even if the
time complexity is O(n Log n) for both.

 Unstable: Heap sort is unstable. It might rearrange the relative order.

 Inefficient: Heap Sort is not very efficient because of the high constants in the time
complexity.
H::'i:f ~Ol r £xo.!9r~

[5 /4 1
8I 2.. .1 9 / 8

'g___~o't' b r, ~.

-g+e.r I
. b£Y I
I
[Link] dsil.J=ion of dv lJ · [Link].J w~ih (1_ Y

-
JO
pti. H V\ t BI J ~ wi.s e. Sw o. p
eru1 atk1 ·
8tep=-3.
~
j,h o.y ,~ •

1 0 [6,
D,~,°')
--
~/cD
[ ~/=f]

~~~- ~ 8tp lo
I
lh. ~'Yr ~, (],n C)..Y'"ftJ •

®
sj Li-, 4, S- b , 8-, °!]
1
(!, bdV :lJ [ it J 51 • I

801 ~ i (I C1S CQ_ I? d.; VJ ~ '


[2.. > 4 1 5 J b , Is 1 9] t{Jy ck,
Ll &iri-j rno.:•,(> h"lil(J •

You might also like