2
O(n ) sorting algorithms
Selection sort and insertion sort are both O(n2)
O(n2) sorting is infeasible for n over 100000
Merge sort
A different strategy?
Divide array in two equal parts
Separately sort left and right half
Combine the two sorted halves to get the full array
sorted
Divide-and-Conquer
In computer science, divide and conquer is an algorithm design paradigm. A
divide-and-conquer algorithm recursively breaks down a problem into two or
more sub-problems of the same or related type, until these become simple
enough to be solved directly.
Divide And Conquer
This technique can be divided into the following three parts:
Divide: This involves dividing the problem into smaller sub-problems.
Conquer: Solve sub-problems by calling recursively until solved.
Combine: Combine the sub-problems to get the final solution of the whole
problem.
To use the divide and conquer algorithm, recursion is used
Combining sorted lists
Given two sorted lists A and B, combine into a
sorted list C
Compare first element of A and B
Move it into C
Repeat until all elements in A and B are over
Merging A and B
Merging two sorted lists
32 74 89
21 55 64
Merging two sorted lists
32 74 89
21 55 64
21
Merging two sorted lists
32 74 89
21 55 64
21 32
Merging two sorted lists
32 74 89
21 55 64
21 32 55
Merging two sorted lists
32 74 89
21 55 64
21 32 55 64
Merging two sorted lists
32 74 89
21 55 64
21 32 55 64 74
Merging two sorted lists
32 74 89
21 55 64
21 32 55 64 74 89
Merge Sort
Sort A[0] to A[n/2 - 1]
Sort A[n/2] to A[n-1]
Merge sorted halves into C[0 .. n-1]
How do we sort the halves?
Recursively, using the same strategy!
Merge Sort
43 32 22 78 63 57 91 13
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
43 32 22 78
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
43 32
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
43 32 22 78
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
43 32 22 78 63 57
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
32 43 22 78 63 57 91 13
43 32 22 78 63 57 91 13
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
32 43 22 78 63 57 91 13
43 32 22 78 63 57 91 13
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
32 43 22 78 57 63 91 13
43 32 22 78 63 57 91 13
Merge Sort
43 32 22 78 63 57 91 13
43 32 22 78 63 57 91 13
32 43 22 78 57 63 13 91
43 32 22 78 63 57 91 13
Merge Sort
43 32 22 78 63 57 91 13
22 32 43 78 63 57 91 13
32 43 22 78 57 63 13 91
43 32 22 78 63 57 91 13
Merge Sort
43 32 22 78 63 57 91 13
22 32 43 78 13 57 63 91
32 43 22 78 57 63 13 91
43 32 22 78 63 57 91 13
Merge Sort
13 22 32 43 57 63 78 91
22 32 43 78 13 57 63 91
32 43 22 78 57 63 13 91
43 32 22 78 63 57 91 13
Merging sorted lists
Combine two sorted lists A and B into C
If A is empty, copy B into C
If B is empty, copy A into C
Otherwise, compare first element of A and B and
move the smaller of the two into C
Repeat until all elements in A and B have been
moved
Merging
function Merge(A,m,B,n,C)
// Merge A[0..m-1], B[0..n-1] into C[0..m+n-1]
i = 0; j = 0; k = 0;
// Current positions in A, B, C respectively
while (k < m+n)
// Case 1: Move head of A into C
if (j==n or A[i] <= B[j])
C[k] = A[i]; i++; k++;
// Case 2: Move head of B into C
if (i==m or A[i] > B[j])
C[k] = B[j]; j++; k++;
Merge Sort
To sort A[0..n-1] into C[0..n-1]
If n is 1, nothing to be done
Otherwise
Sort A[0..n/2-1] into L (left)
Sort A[n/2..n-1] into R (right)
Merge L and R into C
Merge Sort
Left = 0 and right = n
function MergeSort(A,left,right,C)
// Sort the segment A[left..right-1] into C
if (right - left == 1) // Base case
C[0] = A[left]
if (right - left > 1) // Recursive call
mid = (left+right)/2
MergeSort(A,left,mid-1,L)
MergeSort(A,mid,right-1,R)
Merge(L,mid-left,R,right-mid,C)
Merging sorted lists
Combine two sorted lists A and B into C
If A is empty, copy B into C
If B is empty, copy A into C
Otherwise, compare first element of A and B and
move the smaller of the two into C
Repeat until all elements in A and B have been
moved
Merging
function Merge(A,m,B,n,C)
// Merge A[0..m-1], B[0..n-1] into C[0..m+n-1]
i = 0; j = 0; k = 0;
// Current positions in A,B,C respectively
while (k < m+n)
// Case 1: Move head of A into C
if (j==n or A[i] <= B[j])
C[k] = A[i]; i++; k++
// Case 2: Move head of B into C
if (i==m or A[i] > B[j])
C[k] = B[j]; j++; k++
Analysis of Merge
How much time does Merge take?
Merge A of size m, B of size n into C
In each iteration, we add one element to C
Size of C is m+n
m+n ≲2 max(m,n)
Hence O(max(m,n)) = O(n) if m ≈ n
Merge Sort
To sort A[0..n-1] into C[0..n-1]
If n is 1, nothing to be done
Otherwise
Sort A[0..n/2-1] into L (left)
Sort A[n/2..n-1] into R (right)
Merge L and R into C
Analysis of Merge Sort
…
t(n): time taken by Merge Sort on input of size n
Assume, for simplicity, that n = 2k
t(n) = 2t(n/2) + n
Two subproblems of size n/2
Merging solutions requires time O(n/2+n/2) = O(n)
Solve the recurrence by unwinding
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
L1.47
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
T(n)
L1.48
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
cn
T(n/2) T(n/2)
L1.49
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
cn
cn/2 cn/2
T(n/4) T(n/4) T(n/4) T(n/4)
L1.50
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
cn
cn/2 cn/2
cn/4 cn/4 cn/4 cn/4
(1)
L1.51
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
cn
cn/2 cn/2
h = lg n cn/4 cn/4 cn/4 cn/4
(1)
L1.52
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
cn cn
cn/2 cn/2
h = lg n cn/4 cn/4 cn/4 cn/4
(1)
L1.53
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
cn cn
cn/2 cn/2 cn
h = lg n cn/4 cn/4 cn/4 cn/4
(1)
L1.54
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
cn cn
cn/2 cn/2 cn
h = lg n cn/4 cn/4 cn/4 cn/4 cn
…
(1)
L1.55
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
cn cn
cn/2 cn/2 cn
h = lg n cn/4 cn/4 cn/4 cn/4 cn
…
(1) #leaves = n (n)
L1.56
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
cn cn
cn/2 cn/2 cn
h = lg n cn/4 cn/4 cn/4 cn/4 cn
…
(1) #leaves = n (n)
Total(n lg n)
Analysis of Merge Sort …
t(1) = 1
Analysis of Merge Sort …
t(1) = 1
t(n) = 2t(n/2) + n
Analysis of Merge Sort …
t(1) = 1
t(n) = 2t(n/2) + n
= 2 [ 2t(n/4) + n/2 ] + n = 2 2 t(n/22) + 2n
Analysis of Merge Sort …
t(1) = 1
t(n) = 2t(n/2) + n
= 2 [ 2t(n/4) + n/2 ] + n = 2 2 t(n/22) + 2n
2 3 2 3 3
= 2 [ 2t(n/2 ) + n/2 ] + 2n = 2 t(n/2 ) + 3n
…
Analysis of Merge Sort …
t(1) = 1
t(n) = 2t(n/2) + n
= 2 [ 2t(n/4) + n/2 ] + n = 2 2 t(n/22) + 2n
2 3 2 3 3
= 2 [ 2t(n/2 ) + n/2 ] + 2n = 2 t(n/2 ) + 3n
…
= 2 k t(n/2k) + kn
Analysis of Merge Sort …
t(1) = 1
t(n) = 2t(n/2) + n
= 2 [ 2t(n/4) + n/2 ] + n = 2 2 t(n/22) + 2n
2 3 2 3 3
= 2 [ 2t(n/2 ) + n/2 ] + 2n = 2 t(n/2 ) + 3n
…
= 2 k t(n/2k) + kn
When n = 2k
Hence, n/2k = 1,
so t(n/2k) = 1
Analysis of Merge Sort …
t(1) = 1
t(n) = 2t(n/2) + n
= 2 [ 2t(n/4) + n/2 ] + n = 2 2 t(n/22) + 2n
2 3 2 3 3
= 2 [ 2t(n/2 ) + n/2 ] + 2n = 2 t(n/2 ) + 3n
…
= 2 k t(n/2k) + kn
k k
When n = 2k, Hence, n/2 = 1, so t(n/2 ) = 1
log n means log2 n unless otherwise specified!
Analysis of Merge Sort …
t(1) = 1
t(n) = 2t(n/2) + n
2 2
= 2 [ 2t(n/4) + n/2 ] + n = 2 t(n/2 ) + 2n
= 2 2 [ 2t(n/23) + n/22 ] + 2n = 2 3 t(n/23) + 3n
…
= 2 k t(n/2k) + kn
When n = 2k, Hence, n/2k = 1, so t(n/2k) = 1
When n = 2k , Hence, k = log n
log n means log2 n unless otherwise specified!
t(n) = 2 k t(n/2k) + kn = n + (log n) n = n + n log n = O(n log n)
O(n log n) sorting
Recall that O(n log n) is much more efficient than
O(n2)
Assuming 108 operations per second, feasible
input size goes from 10,000 to 10,000,000
(10 million or 1 crore)
Example 1: Sorting …
Assume,
Telephone directory for mobile phone users in India
India has about 1 lakhs = 105 phones
Naïve n2 algorithm requires 1010 operations
108 operations per second ⟹ 102 seconds
Typical functions t(n)…
Input log n n n log n n2 n3 2n n!
10 3.3 10 33 100 1000 1000 106
100 6.6 100 66 104 106 1030 10157
1000 10 1000 104 106 109
104 13 104 105 108 1012
105 17 105 106 1010
106 20 106 107
107 23 107 108
108 27 108 109
109 30 109 1010
1010 33 1010
Variations on merge
Union of two sorted lists (discard duplicates)
If A[i] == B[j], copy A[i] to C[k] and increment i,j,k
Intersection of two sorted lists
If A[i] < B[j], increment i
If B[j] < A[i], increment j
If A[i] == B[j], copy A[i] to C[k] and increment i,j,k
Exercise: List difference: elements in A but not in B
Merge Sort: Shortcomings
Merging A and B creates a new array C
No obvious way to efficiently merge in place
Extra storage can be costly
Inherently recursive
Recursive call and return are expensive
Quicksort
Alternative approach
Extra space is required to merge
Merging happens because elements in left half
must move right and vice versa
Can we divide so that everything to the left is
smaller than everything to the right?
No need to merge!
Divide and conquer without merging
Suppose the median value in A is m
Move all values ≤ m to left half of A
Right half has values > m
This shifting can be done in place, in time O(n)
Recursively sort left and right halves
A is now sorted! No need to merge
t(n) = 2t(n/2) + n = O(n log n)
Divide and conquer without merging
How do we find the median?
Sort and pick up middle element
But our aim is to sort!
Instead, pick up some value in A — pivot
Split A with respect to this pivot element
Quicksort
Choose a pivot element
Typically the first value in the array
Partition A into lower and upper parts with respect
to pivot
Move pivot between lower and upper partition
Recursively sort the two partitions
Quicksort
High level view
Quicksort
High level view
43 32 22 78 63 57 91 13
Quicksort
High level view
43 32 22 78 63 57 91 13
Quicksort
High level view
43 32 22 78 63 57 91 13
Quicksort
High level view
13 32 22 43 63 57 91 78
Quicksort
High level view
13 22 32 43 57 63 78 91
Quicksort: Partitioning
Quicksort: Partitioning
43 32 22 78 63 57 91 13
Quicksort: Partitioning
43 32 22 78 63 57 91 13
Quicksort: Partitioning
43 32 22 78 63 57 91 13
Quicksort: Partitioning
43 32 22 78 63 57 91 13
Quicksort: Partitioning
43 32 22 78 63 57 91 13
Quicksort: Partitioning
43 32 22 78 63 57 91 13
Quicksort: Partitioning
43 32 22 78 63 57 91 13
Quicksort: Partitioning
43 32 22 78 63 57 91 13
Quicksort: Partitioning
43 32 22 78 63 57 91 13
Quicksort: Partitioning
43 32 22 78 63 57 91 13
Quicksort: Partitioning
43 32 22 13 63 57 91 78
Quicksort: Partitioning
13 32 22 43 63 57 91 78
Quicksort: Implementation
Quicksort(A,l,r) // Sort A[l..r-1]
if (r - 1 <= l) return; // Base case
// Partition with respect to pivot, a[l]
yellow = l+1;
for (green = l+1; green < r; green++)
if (A[green] <= A[l])
swap(A,yellow,green);
yellow++;
swap(A,l,yellow-1); // Move pivot into place
// Recursive calls
Quicksort(A, l, yellow-1);
Quicksort(A, yellow, r);
Quicksort:
Another Partitioning Strategy
Quicksort:
Another Partitioning Strategy
43 32 22 78 63 57 91 13
Quicksort:
Another Partitioning Strategy
43 32 22 78 63 57 91 13
Quicksort:
Another Partitioning Strategy
43 32 22 78 63 57 91 13
Quicksort:
Another Partitioning Strategy
43 32 22 78 63 57 91 13
Quicksort:
Another Partitioning Strategy
43 32 22 13 63 57 91 78
Quicksort:
Another Partitioning Strategy
43 32 22 13 63 57 91 78
Quicksort:
Another Partitioning Strategy
43 32 22 13 63 57 91 78
Quicksort:
Another Partitioning Strategy
43 32 22 13 63 57 91 78
Quicksort:
Another Partitioning Strategy
13 32 22 43 63 57 91 78
Quicksort
Choose a pivot element
Typically the first value in the array
Partition A into lower and upper parts with respect
to pivot
Move pivot between lower and upper partition
Recursively sort the two partitions
Analysis of Quicksort
Partitioning with respect to pivot takes O(n)
If pivot is median
Each partition is of size n/2
t(n) = 2t(n/2) + n = O(n log n)
Worst case?
Analysis of Quicksort
Worst case
Pivot is maximum or minimum
One partition is empty Other
is size n-1
t(n) = t(n-1) + n = t(n-2) + (n-1) + n
= … = 1 + 2 + … + n = O(n2)
Already sorted array is worst case input!
Analysis of Quicksort
But …
Average case is O(n log n)
Sorting is a rare example where average case
can be computed
What does average case mean?
Quicksort: Average case
Assume input is a permutation of {1,2,…,n}
Actual values not important
Only relative order matters
Each input is equally likely (uniform probability)
Calculate running time across all inputs
Expected running time can be shown O(n log n)
Quicksort: randomization
Worst case arises because of fixed choice of pivot
We chose the first element
For any fixed strategy (last element, midpoint), can
work backwards to construct O(n2) worst case
Instead, choose pivot randomly
Pick any index in [0..n-1] with uniform probability
Expected running time is again O(n log n)
Iterative Quicksort
Recursive calls work on disjoint segments of array
No recombination of results required
Can use an explicit stack to simulate recursion
Stack only needs to store left and right
endpoints of interval to be sorted
Quicksort in practice
In practice, Quicksort is very fast
Typically the default algorithm for in-built sort
functions
Spreadsheets
Built in sort function in programming
languages
A machine needs a minimum of 400 sec to sort 2000
elements by Quick sort. The minimum time needed to
sort 300 elements will be approximately _____
Sorting –
Concluding remarks
130
In-place sorting
An in-place sorting algorithm sorts an array without
using any extra memory, or only a small amount of extra
storage. This means that the algorithm modifies the
original array to produce the sorted output.
Stable sorting
A sorting algorithm is said to be stable if it
maintains the relative order of records in the case
of equality of keys.
Stable sorting
Sorting on multiple criteria
Assume students are listed in alphabetical order
Now sort students by marks
After sorting, are students with equal marks still
in alphabetical order?
Stability is crucial in applications like spreadsheets
Sorting column B should not disturb previous
sort on column A
Stable sorting
…
Quicksort, as shown, is not stable
Swap operation during partitioning disturbs
original order
Merge sort is stable if we merge carefully
Do not allow elements from right to overtake
elements from left
Favour left list when breaking ties
Adaptive sorting
The sorting algorithms in which the order of
elements affects the time complexity of the
sorting algorithm is known as an adaptive sorting
Which is the best?
Typically Quicksort
Be careful to avoid worst-case
Randomize choice of pivot element
Mergesort is used for “external” sorting
Database tables do not fit in memory
Need to sort on disk
Which is the best?
Other O(n log n) algorithms exist
Heap sort
Naive O(n2) not used except when data is small
Hybrid algorithms
Use divide and conquer for large n
Switch to insertion sort when n small (e.g. n <16)