CS3401: Algorithms Course Overview
CS3401: Algorithms Course Overview
COURSE OBJECTIVES:
• To understand and apply the algorithm analysis techniques on searching and sorting algorithms
• To critically analyze the efficiency of graph algorithms
• To understand different algorithm design techniques
• To solve programming problems using state space tree
• To understand the concepts behind NP Completeness, Approximation algorithms and
randomized algorithms.
UNIT I INTRODUCTION
Algorithm analysis: Time and space complexity - Asymptotic Notations and its properties Best
case, Worst case and average case analysis – Recurrence relation: substitution method - Lower
bounds – searching: linear search, binary search and Interpolation Search, Pattern search: The
naïve string- matching algorithm - Rabin-Karp algorithm - Knuth-Morris-Pratt algorithm. Sorting:
Insertion sort – heap sort
COURSE OUTCOMES: At the end of this course, the students will be able to:
CO1: Analyze the efficiency of algorithms using various frameworks
CO2: Apply graph algorithms to solve problems and analyze their efficiency.
CO3: Make use of algorithm design techniques like divide and conquer, dynamic programming
and greedy techniques to solve problems
CO4: Use the state space tree method for solving problems.
CO5: Solve problems using approximation algorithms and randomized algorithms
ALGORITHM
Definition:
An algorithm is a sequence of unambiguous instruction for solving a problem, for obtaining
a required output for any legitimate input in a finite amount of time.
Characteristics of an algorithm:
Not all procedures can be called an algorithm. An algorithm should have the following
characteristics
Unambiguous − Algorithm should be clear and unambiguous. Each of its steps (or
phases), and their inputs/outputs should be clear and must lead to only one meaning.
Input − An algorithm should have 0 or more well-defined inputs.
Output − An algorithm should have 1 or more well-defined outputs, and should match the
desired output.
Finiteness − Algorithms must terminate after a finite number of steps.
Feasibility − Should be feasible with the available resources.
Independent − An algorithm should have step-by-step directions, which should be
independent of any programming code.
ALGORITHM ANALYSIS
Definition
Algorithm analysis is the process of determining the time and space complexity of an
algorithm, which are measures of the algorithm's efficiency.
Algorithm analysis deals with the execution or running time of various operations involved.
The running time of an operation can be defined as the number of computer instructions executed
per operation.
Algorithm Complexity
Suppose X is an algorithm and n is the size of input data, the time and space used by the
algorithm X are the two main factors, which decide the efficiency of X.
Time Factor − Time is measured by counting the number of key operations such as
comparisons in the sorting algorithm.
Space Factor − Space is measured by counting the maximum memory space required by
the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space required by
the algorithm in terms of n as the size of input data.
Space Complexity
Space complexity refers to the amount of memory required by an algorithm as a function of
the size of the input, and is also typically expressed using big O notation.
To analyze the space complexity of an algorithm, we need to consider the amount of memory
used by the algorithm, and how the amount of memory used changes as the size of the input
increases. This can be done by counting the number of variables used by the algorithm, and how
the number of variables used changes as the size of the input increases. The amount of memory
used is then used to determine the algorithm's space complexity using big O notation.
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle. The space required by an algorithm is equal to the sum of the following
two components −
A fixed part that is a space required to store certain data and variables, that are independent
of the size of the problem. For example, simple variables and constants used, program size,
etc.
A variable part is a space required by variables, whose size depends on the size of the
problem. For example, dynamic memory allocation, recursion stack space, etc.
Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part and
S(I) is the variable part of the algorithm, which depends on instance characteristic I.
Space complexity, on the other hand, is a measure of how much memory an algorithm uses as
a function of the size of the input. Like time complexity, it is typically expressed using big O
notation. For example, an algorithm with a space complexity of O(n) uses more memory as the
input size (n)increases.
Type of Space complexities:
O(1) or constant space: the algorithm uses the same amount of memory regardless of the size
of the input.
O(n) or linear space: the algorithm's memory usage increases linearly with the size of the input.
O(n^2) or quadratic space: the algorithm's memory usage increases quadratically with the size
of the input.
O(2^n)or exponential space: the algorithm's memory usage increases exponentially with the
size of the input.
Order of Growth:
A typical order of growth for time complexity on a graph, from fastest to slowest, would be:
O(1) (constant) < O(log n) (logarithmic) < O(n) (linear) < O(n log n) < O(n^2) (quadratic) < O(2^n)
(exponential) < O(n!) (factorial).
Key points about the order of growth:
Constant (O(1)): Operations that take the same amount of time regardless of input size.
Logarithmic (O(log n)): Time grows proportionally to the logarithm of the input size,
meaning it increases slowly as the input gets larger.
Linear (O(n)): Time grows directly with the input size.
Log-linear (O(n log n)): A combination of linear and logarithmic growth.
Quadratic (O(n^2)): Time grows proportionally to the square of the input size.
Exponential (O(2^n)): Time grows exponentially with the input size, usually considered
very inefficient for large inputs.
Factorial (O(n!)): Time grows very quickly based on the factorial function, typically
impractical for large inputs.
Order of growth for various time complexities, demonstrating how their values change as the
input size n increases. We'll calculate the values for n=1,2,4,8,16,32 to illustrate how each time
complexity grows as the input size doubles.
ASYMPTOTIC NOTATION AND ITS PROPERTIES
Asymptotic notation
Asymptotic notation is a mathematical notation used to describe the behavior of an
algorithm as the size of the input (usually denoted by n) becomes arbitrarily large.
Types Asymptotic Notations:
1. O Notation (Big Oh)
2. Ω Notation (Big Omega)
3. θ Notation (Big Theta)
4. Little Oh o Notation
5. Little Omega ω Notation
Key Property:
SEARCHING
Various searching techniques can be applied on the data structures to retrieve certain data.
A search operation is said to be successful only if it returns the desired element or data;
otherwise, the searching method is unsuccessful.
There are two categories these searching techniques fall into. They are −
1. Sequential Searching
2. Interval Searching
[Link] Searching
As the name suggests, the sequential searching operation traverses through each element
of the data sequentially to look for the desired data. The data need not be in a sorted manner for
this type of search.
Example − Linear Search
LINEAR SEARCH
Definition
Linear search is a type of sequential searching algorithm. In this method, every element
within the input array is traversed and compared with the key element to be found. If a match is
found in the array the search is said to be successful; if there is no match found the search is said
to be unsuccessful and gives the worst-case time complexity.
Procedure : Linear Search
The algorithm for linear search is relatively simple. The procedure starts at the very first index of
the input array to be searched.
Step 1 − Start from the 0th index of the input array, compare the key value with the value present
in the 0th index.
Step 2 − If the value matches with the key, return the position at which the value was found.
Step 3 − If the value does not match with the key, compare the next element in the array.
Step 4 − Repeat Step 3 until there is a match found. Return the position at which the match was
found.
Step 5 − If it is an unsuccessful search, print that the element is not present in the array and exit
the program.
Algorithm
Linear Search ( Array Arr, Value a ) // Arr is the name of the array, and a is the searched
element.
Pseudocode
procedure linear_search (list, value)
for each item in the list
if match item == value
return the item's location
end if
end for
end procedure
Program
def linear_search(a, n, key):
count = 0
for i in range(n):
if(a[i] == key):
print("The element is found at position", (i+1))
count = count + 1
if(count == 0):
print("The element is not present in the array")
a = [14, 56, 77, 32, 84, 9, 10]
n = len(a)
key = 32
linear_search(a, n, key)
key = 3
linear_search(a, n, key)
Output
The element is found at position 4
The element is not present in the array
Analysis
Linear search traverses through every element sequentially therefore, the best case is
when the element is found in the very first iteration. The best-case time complexity would
be O(1).
However, the worst case of the linear search method would be an unsuccessful search that
does not find the key value in the array, it performs n iterations. Therefore, the worst-case time
complexity of the linear search algorithm would be O(n).
Example
Let us look at the step-by-step searching of the key element (say 47) in an array using the linear
search method.
Step 1
The linear search starts from the 0th index. Compare the key element with the value in the
0th index, 34.
Step 4
Now the element in 3rd index, 27, is compared with the key value, 47. They are not equal so the
algorithm is pushed forward to check the next element.
Step 5
Comparing the element in the 4th index of the array, 47, to the key 47. It is figured that both the
elements match. Now, the position in which 47 is present, i.e., 4 is returned.
4th_index_array
Now we compare the value stored at location 4, with the value being searched, i.e. 31. We
find that the value at location 4 is 27, which is not a match. As the value is greater than 27 and
we have a sorted array, so we also know that the target value must be in the upper portion of the
array.
location_4_value_27
We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid = low + (high - low) / 2
Our new mid is 7 now. We compare the value stored at location 7 with our target value 31.
at_loaction_7
The value stored at location 7 is not a match, rather it is less than what we are looking for. So, the
value must be in the lower part from this location.
location_7_not_ match
Hence, we calculate the mid again. This time it is 5.
at_location_5
We compare the value stored at location 5 with our target value. We find that it is a match.
location_5_matched
We conclude that the target value 31 is stored at location 5.
Implementation
Binary search is a fast search algorithm with run-time complexity of Ο(log n). This search
algorithm works on the principle of divide and conquer. For this algorithm to work properly, the
data collection should be in a sorted form.
Time and space complexity of Binary Search
Time Complexity:
o Best Case: O(1)
o Average Case: O(log n)
o Worst Case: O(log n)
Space Complexity: O(1), If the recursive call stack is considered then the auxiliary space
will be O(log n).
Applications of Binary Search Algorithm
Binary search can be used as a building block for more complex algorithms used in
machine learning, such as algorithms for training neural networks or finding the optimal
hyperparameters for a model.
It can be used for searching in computer graphics such as algorithms for ray tracing or
texture mapping.
It can be used for searching a database.
Advantages of Binary Search
Binary search is faster than linear search, especially for large arrays.
More efficient than other searching algorithms with a similar time complexity, such as
interpolation search or exponential search.
Binary search is well-suited for searching large datasets that are stored in external
memory, such as on a hard drive or in the cloud.
Disadvantages of Binary Search
The array should be sorted.
Binary search requires that the data structure being searched be stored in contiguous
memory locations.
Binary search requires that the elements of the array be comparable, meaning that they
must be able to be ordered.
INTERPOLATION SEARCH
Definition
Interpolation search finds a particular item by computing the probe position. Initially, the
probe position is the position of the middle most item of the collection
Interpolation search is an improved variant of binary search. This search algorithm works
on the probing position of the required value. For this algorithm to work properly, the data
collection should be in a sorted form and equally distributed.
If a match occurs, then the index of the item is returned. To split the list into two parts,
we use the following method −
where −
A = list
Lo = Lowest index of the list
Hi = Highest index of the list
A[n] = Value stored at index n in the list
If the middle item is greater than the item, then the probe position is again calculated in
the sub-array to the right of the middle item. Otherwise, the item is searched in the sub-array to
the left of the middle item. This process continues on the sub-array as well until the size of
subarray reduces to zero.
Algorithm: Interpolation Search
1. Start searching data from middle of the list.
2. If it is a match, return the index of the item, and exit.
3. If it is not a match, probe position.
4. Divide the list using probing formula and find the new middle.
5. If data is greater than middle, search in higher sub-list.
6. If data is smaller than middle, search in lower sub-list.
7. Repeat until match.
Pseudocode
A → Array list
N → Size of A
X → Target Value
Procedure Interpolation_Search()
Set Lo → 0
Set Mid → -1
Set Hi → N-1
While X does not match
if Lo equals to Hi OR A[Lo] equals to A[Hi]
EXIT: Failure, Target not found
end if
Set Mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])
if A[Mid] = X
EXIT: Success, Target found at Mid
else
if A[Mid] < X
Set Lo to Mid+1
else if A[Mid] > X
Set Hi to Mid-1
end if
end if
End While
End Procedure
Analysis
Runtime complexity of interpolation search algorithm is Ο(log (log n)) as compared to Ο(log n)
of BST in favorable situations.
Program
def interpolation_search( data, arr):
lo = 0
hi = len(arr) - 1
mid = -1
comparisons = 1
index = -1
while(lo <= hi):
print("Comparison ", comparisons)
print("lo : ", lo)
print("list[", lo, "] = ")
print(arr[lo])
print("hi : ", hi)
print("list[", hi, "] = ")
print(arr[hi])
comparisons = comparisons + 1
#probe the mid point
mid = lo + (((hi - lo) * (data - arr[lo])) // (arr[hi] - arr[lo]))
print("mid = ", mid)
#data found
if(arr[mid] == data):
index = mid
break
else:
if(arr[mid] < data):
#if data is larger, data is in upper half
lo = mid + 1
else:
#if data is smaller, data is in lower half
hi = mid - 1
print("Total comparisons made: ")
print(comparisons-1)
return index
arr = [10, 14, 19, 26, 27, 31, 33, 35, 42, 44]
#find location of 33
location = interpolation_search(33, arr)
#if element was found
if(location != -1):
print("Element found at location: ", (location+1))
else:
print("Element not found.")
Output
Comparison 1
lo : 0
list[ 0 ] = 10
hi : 9
list[ 9 ] = 44
mid = 6
Total comparisons made:
1
Element found at location: 7
Example
To understand the step-by-step process involved in the interpolation search, let us look at an
example and work around it.
Consider an array of sorted elements given below −
array_of_sorted_elements
Let us search for the element 19.
Solution
Unlike binary search, the middle point in this approach is chosen using the formula −
at_index_2
Comparing the key element given, that is 19, to the element present in the mid index, it is found
that both the elements match.
Therefore, the element is found at index 2.
Time Complexity for Interpolation Search:
Time Complexity
Best Case: O(1) — the target is found immediately.
Average Case: O(logn) — the search works efficiently with uniformly distributed data.
Worst Case: O(n) — when the data is not uniformly distributed, and the probe position
leads to inefficient searches (degrades to linear search).
Space Complexity:
The space complexity of interpolation search is O(1), as it only uses a few extra variables
to store the low, high, and pos indices.
Applications
1. Uniformly distributed data.
2. Predicting values based on known data.
3. Memory address calculation in systems programming.
Advantages
1. Faster than binary search for uniformly distributed data.
2. Adaptive to data distribution.
3. Efficient in uniformly distributed datasets.
4. Low space complexity O(1).
Disadvantages 4.
1. Poor performance with non-uniformly distributed data (worst-case O(n).
2. Requires sorted data.
3. More complex to implement compared to binary search.
4. Can degrade to linear search on sparse datasets
Time complexity of linear, binary and Interpolation Search
PATTERN SEARCH
There are multiple pattern searching algorithms that works in different ways. The main goal to
design these type of algorithms is to reduce the time complexity. The traditional approach may
take lots of time to complete the pattern searching task for a longer text.
Algorithm
1. Start at the first position of the text T (from index i = 0).
2. For each position i in the text, check if the substring of T starting at i and having the same
length as P matches the pattern P.
3. If a match is found, print the starting position i.
4. Continue this process until all positions in T are checked.
Pseudocode:
Algorithm NaiveStringMatch(text, pattern):
Input:
- text: a string of length n
- pattern: a string of length m
Output:
- The starting indices of all occurrences of pattern in text
n = length of text
m = length of pattern
for i = 0 to (n - m): // Traverse text from 0 to n - m
if text[i:i+m] == pattern: // Compare substring of text with pattern
print("Pattern found at index", i)
Explanation:
1. Input: You have two strings: text (of length n) and pattern (of length m).
2. Loop: The loop starts from index i = 0 and goes until i = n - m because that's the last
index where the pattern can fit inside the text.
3. Substring Comparison: For each index i, we compare the substring text[i:i+m] (which is
the substring of length m starting from index i) with the pattern.
4. Match Found: If a match is found, the algorithm prints the index i where the match
starts.
5. Output: The algorithm prints all the starting positions where the pattern is found in the
text.
Program
def naive_string_match(text, pattern):
n = len(text)
m = len(pattern)
# Loop through all positions in the text where the pattern could fit
for i in range(n - m + 1):
# Compare the substring of text with the pattern
if text[i:i+m] == pattern:
print(f"Pattern found at index {i}")
# Test the function
text = "abcabcabc"
pattern = "abc"
naive_string_match(text, pattern)
Output
Pattern found at index 0
Pattern found at index 3
Pattern found at index 6
Time Complexity:
Best case: O(n) (If the first position matches)
Worst case: O(n * m) (When there is no match, and we compare each substring of length
m with the pattern)
fxQnipla. : &how q-1,._ (?.;.1Y1pl)¥i,gon rfta.. No.\ve. olh;~ N~
P = ~ oor T: o oc, o / o c c ,c I c c,o 1 •
'
I L. s- b , lt ~ (0
I
II 1'1- ~ 14
(!) 0
I
0 e> I 0 c:, 0 ·l 0 I .0 0 0 J
j
'
. '
. ,,
fiotclt £011-fld . ir1 ind,.x J]
P<Jli~t> C) 0 0 J
~
Tut '
0 0
!
C) C) \ C>
.• I
C) G l I
(!J I C> C) D , I
J\ ' i\ ,, '
.. I,
' i,
' ,, l ?;, No ~ch-
pcJ:t.,n . C!) G 0 I
I
I
I C C {) C) C> C) \
C, C C 0
'' ', ', . I '
0 I
' I I -
~ No ma.l-ch .
j
0 0 0 I
.I'
\✓ :::>N o tlletf-ch •
Poitvrn I
'
-◊Io C) j
--I [Link] 0 0 0 0 l 0 0 0 l 0 \ C D 0 I
I ... , '"
\
,---
. I'
.,
"' 1PJ:Ji fcll-tld o± i
I
1
0 D 0 )
p~ln
a 0
(!) 0 0 0 ( o O D I O f C O O I
~➔ No [Link].h
...,_i.....,..s. .~.,----,
0 c, 0 J
&ft '13ht ol\4.- posi l:io~
to 1 ,,._ ) I
0 O OD\ o DD I 0 \ C> o o ·J
--:;) No mal-~ .
o o o I
0 ' 'L
0 I 1- ~
t-f-------1
--.....> No Holc..h.
C) 0 O /
l- I~ f4
0 C) 00 l O CJ O l b J O o o J
[Link] [Link] oi
l I
l1
\ I ' -
RABIN-KARP ALGORITHM
The Rabin-Karp Algorithm is a more efficient string matching algorithm that uses
hashing to find patterns in a text. Instead of comparing the pattern with every substring of the
text directly, it computes a hash value for the pattern and for substrings of the text, and then
compares these hash values. If two substrings have the same hash value, only then do we check
if they are actually identical. This approach significantly reduces the number of comparisons,
especially when dealing with large texts and patterns.
1. Hash Function:
A hash function is used to generate a hash value for both the pattern and the
substrings of the text. We usually use a rolling hash function to efficiently compute
hashes for all substrings of T.
2. Rolling Hash:
A rolling hash is a technique that helps in calculating the hash for the next
substring using the hash of the current substring. This allows us to avoid recalculating the
hash from scratch for each new substring.
3. Spurious Hit:
When the hash value of the pattern matches with the hash value of a window
of the text but the window is not the actual pattern then it is called a spurious hit.
Spurious hit increases the time complexity of the algorithm. In order to minimize
spurious hit, we use good hash function. It greatly reduces the spurious hit.
Algorithm:
1. Calculate the hash of the pattern (P).
2. Calculate the hash of the first substring of T of length m (same as the pattern).
3. Compare the hashes of the pattern and the first substring of T. If they are equal, check if
the actual substrings are the same (because different strings can have the same hash,
though rare).
4. Slide over the text: For the next substring, calculate the new hash using the rolling hash
technique and compare it with the hash of the pattern.
5. Repeat this process for each substring in the text.
Pseudocode:
# Calculate d^(m-1) % q
for i in range(m - 1):
h = (h * d) % q
Output:
Pattern found at index 0
Pattern found at index 3
Pattern found at index 6
Pattern found at index 9
1·1 ...,, ... I (' c_ V- -
Hhe_LJ ~ (b x (~)
\_ old
- C Id X
0
bn -1)\ + C new\ ) n,od p. "
• _f-r< 1 Text
Q....1
6-e 2.
c.. •3
<?1 - ',-
e - s-
h(p) f -b
a a b j -j
lfJ t2 ~ 4 h-a
~ 'f' "I - 9
hCV)b ~
I
h(p_) J -f◊
h-:: b f f- l t I
\_-- ',./
t { I
2..
'
P~ r mo.b:h ::- ~ C__au b)
....3 -3 3 fit-
Ex·. 2- c,.. ~ \
1e~t o. b c__ b C Q_
b9'-2-
c..- 3
ci - Lt
e-s
p<illR-1n f-b
HObh eoct- foY
5,1
h-8
palhrn : - 1 ~ : : [o j '
.' ·-9
2- f-6t 5:.. ID J :c- lO
,~xt a b c__ ej
l L 1\ ]
I t 2-t J
(
b
£x:3
C. e.. 0. - .;+~+, :: 7
0.. - \
b:. I() b-1-
h :- 3 -i -J
c, : d
Q.2--: b d-4
--
3~ 3-2- 3-3 e-s-
H = 4x lo +~xlo + lx10 C3 ::: a. .
f- l, ·
l.. \
+2X.!O + I ')(ID
()
:: 4-x Jo ~-)
h- &
:: J+o o + 2-0 +I = 1t 2 l l -~
'
j - l0
Text : c.. c_ a
J ~ I
'l-- l Cl
I-} :JXIO +3Xlb + IKID C. c_ 0-- e. c... 0..
J \ t J
~ 3o o + 2> 0 + I :: 33) 33\
300+2>0-tl
/1on h urxkt- .ror rr,lili)_ fOY l'\ILX.t 1-\ov:ih [Link] f01 Tux t
KNUTH-MORRIS-PRATT ALGORITHM
The Knuth-Morris-Pratt (KMP) algorithm is an efficient string-matching algorithm
that avoids unnecessary comparisons by utilizing previously gathered information about the
pattern. Instead of starting from the beginning each time when a mismatch occurs, KMP uses a
"partial match" table (often called the prefix function or failure function) to skip ahead
intelligently.
while i < n:
if pattern[i] == pattern[length]:
length += 1
LPS[i] = length
i += 1
else:
if length != 0:
length = LPS[length - 1] // Use the previous LPS value
else:
LPS[i] = 0
i += 1
return LPS
Step 2: Searching the Text
arduino
Copy
Algorithm KMP_Search(text, pattern):
n = length of text
m = length of pattern
LPS = ComputeLPSArray(pattern)
i = 0 // Index for text
j = 0 // Index for pattern
while i < n:
if text[i] == pattern[j]:
i += 1
j += 1
if j == m:
print("Pattern found at index", i - j)
j = LPS[j - 1] // Use LPS to shift the pattern
elif i < n and text[i] != pattern[j]:
if j != 0:
j = LPS[j - 1] // Use LPS to shift the pattern
else:
i += 1
Example:
Let’s walk through an example using the pattern "ABAB" and the text "ABABABAB".
Pattern: "ABAB"
Text: "ABABABAB"
1. Compute the LPS Array for the pattern "ABAB":
o LPS = [0, 0, 1, 2]
2. Search the pattern in the text using the LPS array:
o At position i = 0, j = 0: text[0] == pattern[0] → match → move i and j to 1.
o At position i = 1, j = 1: text[1] == pattern[1] → match → move i and j to 2.
o At position i = 2, j = 2: text[2] == pattern[2] → match → move i and j to 3.
o At position i = 3, j = 3: text[3] == pattern[3] → match → pattern found at index 0
→ set j = LPS[j - 1] = 2.
o Continue this process until the whole text is processed.
Time Complexity:
Preprocessing time (LPS array): O(m), where m is the length of the pattern.
Search time: O(n), where n is the length of the text.
Overall time complexity: O(n + m).
This is much faster than the naive string-matching algorithm, which has a worst-case time
complexity of O(n * m).
Advantages:
The KMP algorithm improves the brute-force string search by reducing the number of
character comparisons through the use of the LPS array.
It guarantees O(n + m) time complexity, which is optimal for string searching
algorithms.
Given text: abcxabcdabxabcdabcdabcy
Given pattern : abcdabcy
Step 1: we will construct the prefix table for the given pattern as follows.
0 1 2 3 4 5 6 7
a b c D a b c Y
0 0 0 0 1 2 3 0
Step2: now start matching search for pattern against the text with the help of prefix table.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 2
2
a b c x a b c d a b x a b c d a b c d a B c Y
√ √ √ x
a b c d a b c y
a b c x a b c d a b X a b c d a b c d a B c Y
a b c d a b c Y
a b c x a b c d a b X a b c d a b c d a B c Y
√ √ √ √ √ √ x
a b c d a b c y
Text index of unmatched character – prefixtable[pattern index – 1]
=10 – prefixtable[6-1]
=10- 2 = 8
Step 5:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
a b c x a b c d a b X a b c d a b c d a B c Y
√ √ x
a b c d a b c y
a b c x a b c d a b X a b c d a b c d a B c Y
a b c d a b c y
Text[10] is not matching with pattern[0]. Hence shift one position next.
Step 7:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
a b c x a b c d a b X a b c d a b c d a B c Y
√ √ √ √ √ √ √ x
a b c d a B c y
a b c x a b c d a b X a b c d a b c d a B c Y
√ √ √ √ √ √ √ √
a b c d a B c Y
'
tnolLh in ~ t;t and on o c.tute11LC2..... of [Link]
w<L· aw°(j 1k Infor maIJo 'il B-n.J r eofo. rt
1k C?DIYJpo.n;\on , fof. [Link]'fun ~ef ef [Link].dPr8 ft•111
1k. Gxt ' ~ aarun and ~ai n with NZAt rt'ICYeo.f1"1/niJ
~ i-/:-1·0 " od "fu'( , if'\D.. ~hcxv-acfo,; from pcJ1im> a.r a-
tncx.t'c.W · '1fuo UIV~~ '(e.c,kW,'. itie- e-tfrden1J 13a
I ,
~
[Link]- ~rri.s - pra±l iorir/htn t,oe_ Wlr~J'
L P.s [knaet Pre,jiv. 8U-ffitl
• LPs lLo~~t £[Link] 8(.[Link] ~]_An~:
n-f ,~ ea.. h pasit,'on tn ik_
Jha.. lJ>s orr°o lj'1oreb , c.
•
..
•
I
. ~
;\
I
l /I> •
\ .
1 I • } 1 , , !
' l 1 • , I
~
\ : : . '
.
l .· . • • I \
I A~OY;tfi ll')
1--0cr·calb c ~ )
3 Jtw, tcu.t
LP0 [_lonJ~{ Pre_fii Jl,Lfti 1].
Cl 0.. bet ~
@ A8c_fl~
-:::>
LP.s = [? o
'··
,_
.. ::a
I
® AGcAB ® ABCp A
~
p -.s
A ~ 0 11 .,.
R B A ;:().
R~ C.A
f,~ <: ~ f)!?,@ B ~0. flGc.. (1 c_ A
f\G>c C.11~
F,~c A GcFt~
"1a,1ho d ~ : -,
s- 3 6 -, ' ')_. .3 l.\- 5" b
~ ' --i..- 1-f
A R ~ B A A A
;....--"- ft A A 13 Fr A A
LP-.S le :2- 0 ?-
~
G)
f G) ~ ~ s-
> 4-
\
A-
"l-
G
0 p, R A- B A
-> .,
@ AA
(!)
'©
'
@
C)
~-
' Q
(
:!>
2-
L\ s-- l> .
~ A
.3 (1
~ r-)
@ RA A ~
a \ @ C) I . 2-- 0 (, @
~ 1-- (J) 4 ~ ;
0 A'
~
\ 1-- 3 't A- R 13 A A A
@ R R Ft g ' -~ ' ~
C I 2-
>
0 0 2- C)
2- 0
& L , ).. 3 '+ 5' b 1 8 0, ID l ~ G) 4 -5"" b , R
~AAA cAAA AAC ® 8_Mc A ~ A f)
0 I 2--0 I '1.-3~
• ').-6.) '1 S- l, 1 i .,
G) A A R t_ A A '1ft_!l
'-----
O I 2-C)l 'l-333
I 'l- 3 G) 5 b ., \ °lt,<>
CD @ f) ff ft CJ A A
0 \ 2-
e_ A 0 ½,
0 \ 2-3 3 3 Lt
•'~~ _s-(,1 t
0\ • B
A £l c A A3.
\..V I 2- o I 2.3
C> t
6 7 C\-
6
Q
I 10 JJ
(_
A A g A A
M ,€) 3 1-i ;- b , 6
~RR 8 A AC A A
Cf
- CJ I .2-01 '2-
~ \ l-Qli- If 6 ""1 8 81
cv e AGflRc_AA ~
o, or 2-~ 12-_3
0 I D J 2- ~ I ~ 1 0 '> b ""! 8 ', 10
U3I A-ABAAcf,A'BA
" - ,.,,,. ..,..._ __.....
D 10 I 2.-01 2-31t
~ ~ ~ @ b , t ~ I• /J
(?;,,. I
C) I C) I 2- 0
~ A f\13, A AC.-fr A BAA
c, I C) I 2-. 0 J 2- 3 Lt S-
0 1 0 I 2- C) l
A f\ l)f1A c_~ Pt 8 ft A
L.l>.S • [CJ I D I 2..- 0 I 2- 3 J+ .5J
/ext: .A
, A A R A BA A A f3 A
Pal: A ARA
O I 'l- 3
L.P-S [ o I 1- 3. J
.0 te:AtDJ , _ pa±[j]
'
1-r=-)
J -t; I
len3Th lrJJ "- 4 't > ~nlp~ )
j = &in (pat) -4
~ !ev{pat~
• lf= 4 - )
;ncL.x :. ', -j
:; 1+-1t
.. =- o'
J
~ l "- .3>
@ t::5 Ll -1J
j ;:
-;>Bf> ;
➔ f)
j = r p0
LPS ~ G) 1 '.I- 3 J
l
'.: I ps D-0 j = Q
mistrolch. O 2- 3
J ► lp.s[o] A A R (-)
T _. ;; ['1 A R A A B A A Fl 8 A
i
J
t) \ \...
L.\>S [ 0 I -'l.- 3.
iJ -:c2 • lps[2--1J
tni_s rri ol:Ji ' ' 1p..s [1J
J =- I
SORTING
Definition
A Sorting Algorithm is used to rearrange a given array or list of elements in an order.
Sorting is provided in library implementation of most of the programming languages.
There is a wide range of applications for these algorithms, including searching algorithms,
database algorithms, divide and conquer methods, and data structure algorithms
Sorting Basics
In-place Sorting:
An in-place sorting algorithm uses constant space for producing the output (modifies the
given array only. Examples: Selection Sort, Bubble Sort, Insertion Sort and Heap Sort.
Internal Sorting:
Internal Sorting is when all the data is placed in the main memory or internal memory. In
internal sorting, the problem cannot take input beyond allocated memory size.
External Sorting :
External Sorting is when all the data that needs to be sorted need not to be placed in
memory at a time, the sorting is called external sorting. External Sorting is used for the
massive amount of data. For example Merge sort can be used in external sorting as the
whole array does not have to be present all the time in memory,
Stable sorting:
When two same items appear in the same order in sorted data as in the original array
called stable sort. Examples: Merge Sort, Insertion Sort, Bubble Sort.
Hybrid Sorting:
A sorting algorithm is called Hybrid if it uses more than one standard sorting algorithms
to sort the array. The idea is to take advantages of multiple sorting algorithms. For
example IntroSort uses Insertions sort and Quick Sort.
Types of Sorting Techniques
There are various sorting algorithms are used in data structures. The following two
types of sorting algorithms can be broadly classified:
1. Comparison-based: We compare the elements in a comparison-based sorting algorithm)
2. Non-comparison-based: We do not compare the elements in a non-comparison-based
sorting algorithm)
Sorting algorithm
Applications
When you have hundreds of datasets you want to print, you might want to arrange them
in some way.
Once we get the data sorted, we can get the k-th smallest and k-th largest item in O(1)
time.
Searching any element in a huge data set becomes easy. We can use Binary search
method for search if we have sorted data. So, Sorting become important here.
They can be used in software and in conceptual problems to solve more advanced
problems.
INSERTION SORT
Insertion sort is a simple sorting algorithm that works by iteratively inserting each
element of an unsorted list into its correct position in a sorted portion of the list. It is like sorting
playing cards in your hands. You split the cards into two groups: the sorted cards and the
unsorted cards. Then, you pick a card from the unsorted group and put it in the right place in the
sorted group.
We start with second element of the array as first element in the array is assumed to be
sorted.
Compare second element with the first element and check if the second element is
smaller then swap them.
Move to the third element and compare it with the first two elements and put at its correct
position
Repeat until the entire array is sorted.
Pseudocode:
Algorithm InsertionSort(arr):
Input: An array arr of length n
Output: Sorted array arr
for i = 1 to n-1: // Start from the second element
key = arr[i] // Set the key as the current element
j=i-1 // Set j to the index just before i
// Shift elements of arr[0..i-1], that are greater than key, to one position ahead
while j >= 0 and arr[j] > key:
arr[j + 1] = arr[j] // Shift the element to the right
j=j-1 // Move left in the array
arr[j + 1] = key // Place the key in its correct position
return arr // Return the sorted array
Program
# Function to sort array using insertion sort
def insertionSort(arr):
for i in range(1, len(arr)):
key = arr[i]
j=i-1
# Move elements of arr[0..i-1], that are
# greater than key, to one position ahead
# of their current position
while j >= 0 and key < arr[j]:
arr[j + 1] = arr[j]
j -= 1
arr[j + 1] = key
# A utility function to print array of size n
def printArray(arr):
for i in range(len(arr)):
print(arr[i], end=" ")
print()
# Driver method
if __name__ == "__main__":
arr = [12, 11, 13, 5, 6]
insertionSort(arr)
printArray(arr)
# This code is contributed by Hritik Shah.
Output
5 6 11 12 13
Time Complexity of Insertion Sort
Best case: O(n), If the list is already sorted, where n is the number of elements in the list.
Average case: O(n2), If the list is randomly ordered
Worst case: O(n2), If the list is in reverse order
Space Complexity of Insertion Sort
Auxiliary Space: O(1), Insertion sort requires O(1) additional space, making it a space-
efficient sorting algorithm.
Advantages of Insertion Sort:
Simple and easy to implement.
Stable sorting algorithm.
Efficient for small lists and nearly sorted lists.
Space-efficient as it is an in-place algorithm.
Adoptive. the number of inversions is directly proportional to number of swaps. For
example, no swapping happens for a sorted array and it takes O(n) time only.
Disadvantages of Insertion Sort:
Inefficient for large lists.
Not as efficient as other sorting algorithms (e.g., merge sort, quick sort) for most cases
Example
Illustration
arr = {23, 1, 10, 5, 2}
Initial:
Current element is 23
The first element in the array is assumed to be sorted.
The sorted part until 0th index is : [23]
First Pass:
Compare 1 with 23 (current element with the sorted part).
Since 1 is smaller, insert 1 before 23 .
The sorted part until 1st index is: [1, 23]
Second Pass:
Compare 10 with 1 and 23 (current element with the sorted part).
Since 10 is greater than 1 and smaller than 23 , insert 10 between 1 and 23 .
The sorted part until 2nd index is: [1, 10, 23]
Third Pass:
Compare 5 with 1 , 10 , and 23 (current element with the sorted part).
Since 5 is greater than 1 and smaller than 10 , insert 5 between 1 and 10
The sorted part until 3rd index is : [1, 5, 10, 23]
Fourth Pass:
Compare 2 with 1, 5, 10 , and 23 (current element with the sorted part).
Since 2 is greater than 1 and smaller than 5 insert 2 between 1 and 5 .
The sorted part until 4th index is: [1, 2, 5, 10, 23]
Final Array:
The sorted array is: [1, 2, 5, 10, 23]
HEAP SORT
Heap sort is a comparison-based sorting technique based on Binary Heap Data
Structure. It can be seen as an optimization over selection sort where we first find the max (or
min) element and swap it with the last (or first). We repeat the same process for the remaining
elements.
First convert the array into a max heap using heapify, Please note that this happens in-
place. The array elements are re-arranged to follow heap properties. Then one by one delete the
root node of the Max-heap and replace it with the last node and heapify. Repeat this process
while size of heap is greater than 1.
Rearrange array elements so that they form a Max Heap.
Repeat the following steps until the heap contains only one element:
o Swap the root element of the heap (which is the largest element in current heap)
with the last element of the heap.
o Remove the last element of the heap (which is now in the correct position). We
mainly reduce heap size and do not remove element from the actual array.
o Heapify the remaining elements of the heap.
Finally we get sorted array.
Heapsort Pseudocode:
Step 2: MaxHeapify
Algorithm MaxHeapify(arr, i, n):
largest = i // Assume the current node is the largest
left = 2 * i + 1 // Left child index
right = 2 * i + 2 // Right child index
// Check if the left child exists and is greater than the current node
if left < n and arr[left] > arr[largest]:
largest = left
// Check if the right child exists and is greater than the largest so far
if right < n and arr[right] > arr[largest]:
largest = right
// If the largest is not the current node, swap and recursively heapify
if largest != i:
swap(arr[i], arr[largest])
MaxHeapify(arr, largest, n) // Recursively heapify the affected subtree
Step 3: Heapsort
Algorithm Heapsort(arr):
n = length of arr
// Step 1: Build a max heap
BuildMaxHeap(arr)
// Step 2: Extract elements from the heap one by one
for i = n - 1 down to 1: // Start from the last element
swap(arr[0], arr[i]) // Move the current root (maximum) to the end
MaxHeapify(arr, 0, i) // Heapify the root element to restore max-heap property
return arr // Sorted array
Program
# To heapify a subtree rooted with node i
# which is an index in arr[].
def heapify(arr, n, i):
# Initialize largest as root
largest = i
# left index = 2*i + 1
l=2*i+1
# right index = 2*i + 2
r=2*i+2
# If left child is larger than root
if l < n and arr[l] > arr[largest]:
largest = l
# If right child is larger than largest so far
if r < n and arr[r] > arr[largest]:
largest = r
# If largest is not root
if largest != i:
arr[i], arr[largest] = arr[largest], arr[i] # Swap
# Recursively heapify the affected sub-tree
heapify(arr, n, largest)
# Main function to do heap sort
def heapSort(arr):
n = len(arr)
# Build heap (rearrange array)
for i in range(n // 2 - 1, -1, -1):
heapify(arr, n, i)
# One by one extract an element from heap
for i in range(n - 1, 0, -1):
# Move root to end
arr[0], arr[i] = arr[i], arr[0]
# Call max heapify on the reduced heap
heapify(arr, i, 0)
def printArray(arr):
for i in arr:
print(i, end=" ")
print()
# Driver's code
arr = [9, 4, 3, 8, 10, 2, 5]
heapSort(arr)
print("Sorted array is ")
printArray(arr)
Output
Sorted array is
2 3 4 5 8 9 10
Time Complexity: O(n log n)
Auxiliary Space: O(log n), due to the recursive call stack. However, auxiliary space can be O(1)
for iterative implementation.
Efficient Time Complexity: Heap Sort has a time complexity of O(n log n) in all cases.
This makes it efficient for sorting large datasets. The log n factor comes from the height
of the binary heap, and it ensures that the algorithm maintains good performance even
with a large number of elements.
Memory Usage: Memory usage can be minimal (by writing an iterative heapify() instead
of a recursive one). So apart from what is necessary to hold the initial list of items to be
sorted, it needs no additional memory space to work
Costly: Heap sort is costly as the constants are higher compared to merge sort even if the
time complexity is O(n Log n) for both.
Inefficient: Heap Sort is not very efficient because of the high constants in the time
complexity.
H::'i:f ~Ol r £xo.!9r~
[5 /4 1
8I 2.. .1 9 / 8
'g___~o't' b r, ~.
-g+e.r I
. b£Y I
I
[Link] dsil.J=ion of dv lJ · [Link].J w~ih (1_ Y
-
JO
pti. H V\ t BI J ~ wi.s e. Sw o. p
eru1 atk1 ·
8tep=-3.
~
j,h o.y ,~ •
1 0 [6,
D,~,°')
--
~/cD
[ ~/=f]
~~~- ~ 8tp lo
I
lh. ~'Yr ~, (],n C)..Y'"ftJ •
®
sj Li-, 4, S- b , 8-, °!]
1
(!, bdV :lJ [ it J 51 • I