06 Searching and Sorting Algorithms
06 Searching and Sorting Algorithms
Searching and sorting algorithms are fundamental topics in computer science. They are widely used in various
applications, such as databases, web search engines, cryptography, artificial intelligence, and more. In this
lecture, we will learn about some of the most common searching and sorting algorithms, their properties, and
how to implement them in code.
• Linear search
• Binary search
• Hash table search
You will also need to know how to implement a binary search tree.
The search is terminated once the target element is located, or it reaches the end of the array.
Steps:
ASRJC H2 Computing 1
>>> Searching and Sorting Algorithms
Python Implementation of Linear Search
# Example
arr = [5, 9, 0, 1, 3, 6, 7]
result = lin_search(arr, 3)
if result == -1:
print("Target not found")
else:
print("Target is at index", result)
For this algorithm to work, the data collection needs to be in sorted form.
Binary search looks for a target item by comparing the middle element of the array. If a match occurs, the index
is returned. If the middle element is greater, then search in the sub-array to the left of the middle element.
Otherwise, search in the sub-array to the right.
Steps:
1. Define the target value (target) you want to search for. Also, initialise variables for leftmost (left) and
rightmost (right) indices of the search range.
Set left and right to the indices of the first and last elements in the array.
2. Check that left ≤ right. If not, target is not present in array and search is unsuccessful.
if mid_element == target:
return mid # target found
elif mid_element < target:
left = mid + 1 # search right
else:
right = mid - 1 # search left
# Example
arr = [1, 3, 5, 7, 9, 11, 13, 15] # must be a sorted array!
target = 3
result = bin_search(arr, target)
if result == -1:
print("Target not found")
else:
print("Target is at index", result)
ASRJC H2 Computing 3
>>> Searching and Sorting Algorithms
Binary search is a very efficient algorithm, but it requires the input array to be sorted.
For small datasets, difference in efficiencies between linear and binary search may not be significant. However,
if you need to perform multiple searches on the same data and if the dataset is large, the upfront cost of
sorting the data first and then applying binary search may be outweighed by the efficiency gained in
subsequent searches.
It builds the sorted array one element at a time by repeatedly taking the next element and inserting it into its
correct position in the already sorted portion of the array.
Steps:
1. Assume the first element in the array is already sorted. Start from the second element.
2. Compare and insert:
- Compare current element with elements before it, from right to left.
- Keep shifting elements to the right until correct position for the current element is found.
- Insert current element into its correct position in the sorted part of the array.
3. Repeat step 2 for each element until the entire array is sorted.
ASRJC H2 Computing 4
>>> Searching and Sorting Algorithms
def insertion_sort(arr):
# loop from the second element until the last element
for i in range(1, len(arr)):
key_element = arr[i] # element to be inserted
j = i – 1 # initialise variable to be used to identify the correct position
# once done shifting elements, insert key element in its correct position
arr[j + 1] = key_element
# Example
arr = [5, 9, 1, 7, 2, 12, 6, 3]
insertion_sort(arr)
print(arr)
In-place sorting
- It sorts the array by modifying it directly. Therefore, it does not require additional memory allocation
proportional to the input array size.
Stable sorting
- It preserves the relative order of equal elements in the sorted array.
Adaptive
- It performs well on nearly sorted or partially sorted arrays. Therefore it is useful in situations where
elements are continuously added to a sorted list and need to be maintained in a sorted order.
The worst case for insertion sort occurs when the array is initially in reverse order.
ASRJC H2 Computing 5
>>> Searching and Sorting Algorithms
2.2 Bubble Sort (Non-Optimised) Time complexity: O(n2)
Bubble sort is another simple and intuitive algorithm. It repeatedly steps through the array, comparing adjacent
elements and swaps them if they are in the wrong order.
Steps:
The following is an example. Notice that an array with length 5 requires a maximum of 4 passes.
# Example
arr = [5, 9, 1, 7, 2]
bubble_sort(arr)
print(arr)
Thinking point: What is one small change you can make to the code to produce an output in reversed order?
ASRJC H2 Computing 6
>>> Searching and Sorting Algorithms
2.2.1 Bubble Sort (Optimised)
Let’s apply the algorithm to [21, 5, 3, 10, 8, 7, 20, 19, 15]. By adding a print statement in the outer for loop, we
can see the output after each pass:
Pass 1: [5, 3, 10, 8, 7, 20, 19, 15, 21]
Pass 2: [3, 5, 8, 7, 10, 19, 15, 20, 21]
Pass 3: [3, 5, 7, 8, 10, 15, 19, 20, 21]
Pass 4: [3, 5, 7, 8, 10, 15, 19, 20, 21]
Pass 5: [3, 5, 7, 8, 10, 15, 19, 20, 21]
Pass 6: [3, 5, 7, 8, 10, 15, 19, 20, 21]
Pass 7: [3, 5, 7, 8, 10, 15, 19, 20, 21]
Pass 8: [3, 5, 7, 8, 10, 15, 19, 20, 21]
In our 4th pass, no swaps are made. This means that our array is already sorted by then!
Therefore, we may want to keep track of whether any swaps are made. If no swaps are made in a pass, we have
achieved a sorted array and so we can exit our loop early.
def bubble_sort(arr):
n = len(arr)
for i in range(n - 1): # requires max (n - 1) passes
swapped = False # flag to check if any swaps occur
If Cambridge wants you to write the Bubble Sort algorithm, always give the OPTIMISED
version!
ASRJC H2 Computing 7
>>> Searching and Sorting Algorithms
Characteristics of bubble sort
Simplicity
- Bubble sort is straightforward to implement. For small arrays, bubble sort is efficient, compared to
some of the sorting algorithms that use recursion (like quicksort or merge sort) which has overhead.
In-place sorting
- It sorts the array by modifying it directly. No additional memory allocation that is proportional to
the input array size is needed.
Stable sorting
- Preserves the relative order of equal elements in the sorted array.
Similar to insertion sort, the worst case scenario would be when the input array is in reverse order. Maximum
number of comparisons and swaps are required to sort the array.
For a nearly sorted array, insertion sort is still generally more efficient compared to bubble sort. For such a case,
insertion sort’s approach of inserting one element at a time in its correct position is effective and requires fewer
comparisons and swaps.
ASRJC H2 Computing 8
>>> Searching and Sorting Algorithms
2.3 Quicksort Time complexity: O(n2)
Quicksort is a highly efficient and widely-used sorting algorithm based on the divide-and-conquer paradigm.
It divides the array into sub-arrays based on a pivot element and recursively sorts these sub-arrays.
Steps:
# Here, we choose pivot as the middle element, but you can choose others.
pivot = arr[len(arr) // 2]
left = []
middle = []
right = []
# Example
arr = [5, 9, 1, 7, 2, 12, 6, 3]
print(quicksort(arr))
ASRJC H2 Computing 9
>>> Searching and Sorting Algorithms
Remarks:
1. List comprehension could possibly be used to partition the elements:
left = [x for x in arr if x < pivot] # all elements smaller than pivot
middle = [x for x in arr if x == pivot] # all elements equal to pivot
right = [x for x in arr if x > pivot] # all elements greater than pivot
2. While the above implementation is not in-place, it is possible to write code for an in-place quicksort
implementation that does not require the creation of sub-arrays (left, middle, right).
The worst-case scenario for Quicksort occurs when the algorithm consistently choose the smallest or
largest element in the array as the pivot in each step. This results in highly imbalanced partitions, reducing
the efficiency of the algorithm.
Having said that, on average, Quicksort performs at O(n log n) time complexity.
Depending on how the partitioning process is done, the relative order of equal elements may change. This is
especially the case if the partition method is used (refer to the last section). Quicksort focuses on rearranging
elements based on values in relation to the pivot without necessarily preserving the original order of the
elements. Therefore, Quicksort is generally considered not a stable sorting algorithm.
Choice of pivot
As mentioned, the pivot selection strategy is crucial to quicksort’s performance, as it directly relates to how
balanced the partitions will be. Here we discuss the main approaches:
• True median
The median is always the ideal pivot, as it splits the array into perfectly balanced partitions.
However, we are not able to directly identify the exact median of the array, unless we add an additional
layer of work, looping through the array (with O(n) time complexity) to identify it every time we perform
partitioning. Therefore, this approach is impractical for typical cases.
• Random pivot
Selecting a random element as the pivot is a popular strategy. The probability of hitting the worst case
is low due to the randomisation. Therefore, we expect O(n log n) time complexity for any input
distribution.
• Median-of-Three
This method examines the first, middle and last elements of the array, and chooses their median as the
pivot. It’s a good compromise between simplicity and effectiveness, often avoiding the worst case
scenario. Many quicksort implementations use this as their default strategy.
ASRJC H2 Computing 10
>>> Searching and Sorting Algorithms
• Middle element
In the code example that we have for quicksort, we chose the middle element as the pivot. It’s simple
and performs better than choosing the first or last elements on sorted or reverse-sorted arrays.
However, it is still deterministic and predictable. Someone could construct an array where the middle
element is always a poor choice, leading to worst-case behaviour.
Exercise
Construct an array containing 16 distinct integers from 1 to 16 that would lead to maximum operations
(worst-case scenario) using the quicksort algorithm given in the lecture notes (where the middle element is
chosen as the pivot).
Possible answer: [1, 3, 5, 7, 9, 11, 13, 15, 16, 14, 12, 10, 8, 6, 4, 2]
ASRJC H2 Computing 11
>>> Searching and Sorting Algorithms
2.4 Merge Sort Time complexity: O(n log n)
Merge sort is also an efficient algorithm based on the divide-and-conquer paradigm.
Steps:
1. Divide the array into smaller sub-arrays until each sub-array only has one element.
2. Merge the sub-arrays until a single sorted array is obtained.
def merge_sort(arr):
if len(arr) <= 1:
return arr # base case
return result
# Example
arr = [5, 9, 1, 2, 12, 6]
print(merge_sort(arr))
ASRJC H2 Computing 12
>>> Searching and Sorting Algorithms
Remark: Some might write the merge function as follows.
while left and right: # while none of the two lists are empty
if left[0] < right[0]:
[Link]([Link](0))
else:
[Link]([Link](0))
return result
This version of the code, while having the benefit of being more concise, is less efficient.
Each time one element is removed from the front of a list (.pop(0)), all elements to the right have to be
shifted one position to the left. The time complexity for this operation is O(n), where n is the list length.
If the input array is large, this particular implementation would be very slow.
ASRJC H2 Computing 13
>>> Searching and Sorting Algorithms
Section 3: Performance analysis of an algorithm
Time complexity
Time complexity quantifies the time taken by an algorithm in terms of the input size.
Note that it does NOT refer to the actual execution time on a machine.
We commonly consider the worst-case time complexity, and we consider only the asymptotic behaviour.
Therefore, we can use the “big O” notation to represent such worst-case time complexities. It describes an
upper bound on the growth rate of the time relative to the input size.
Below are some common growth rates, ranking from lowest to highest:
O(n log n)
O(n)
O(log n)
O(1)
Input size
Based on this chart, O(1), which stands for constant time complexity, is the best. If you see time complexities of
O(n2), O(2n), O(n!), you know that these are bad for large datasets.
ASRJC H2 Computing 14
>>> Searching and Sorting Algorithms
Let’s look at each complexity with examples.
ASRJC H2 Computing 15
>>> Searching and Sorting Algorithms
Let’s discuss the time complexities of the algorithms that we have discussed in this chapter.
If T(n) is the maximum number of comparisons required by the algorithm for an input size n,
T(n) = n.
If an array has size 8, it requires a maximum of 3 comparisons to either find the target or to determine that it
is absent in the input array.
We start with the second element. It has to be compared with the first element.
The third element has to be compared with the previous two elements.
…
The nth element has to be compared with the previous (n – 1) elements.
𝑛−1 1
T(n) = 1 + 2 + 3 + … + (n – 1) = 2
(1 + (𝑛 − 1)) = (𝑛2 − 𝑛).
2
The number of swaps for this worst-case scenario is the same as the number of comparisons.
For “big O” notation, we only consider the dominant term, which in this case is the quadratic term.
Therefore, it is O(n2).
Total number of swaps is also the same. Therefore, time complexity is also O(n2).
• To mitigate such a worst-case scenario, we may opt for a variation, known as Randomised Quicksort. The
choice of pivot is made randomly, often leading to a more balanced partition, which means better
performance.
Merge Sort Space Complexity: O(n). A temporary array is needed to store the merged elements.
ASRJC H2 Computing 17
>>> Searching and Sorting Algorithms
Section 4: Quicksort continued – the Partition Method
For the quicksort algorithm, we have demonstrated the simple approach by creating two new subarrays to
store values smaller than and greater than the pivot value. This is easy to understand but uses extra memory.
If we want to conserve memory space, we might opt for an in-place partitioning, where elements are
rearranged within the original array.
If we choose the rightmost element (i.e. 5) as the pivot, we hope to rearrange the elements so that we get:
Here we introduce the partition method, with the rightmost element chosen as the pivot:
def partition(arr, left, right):
pivot = arr[right]
i = left - 1
for j in range(left, right):
if arr[j] <= pivot:
i += 1
arr[i], arr[j] = arr[j], arr[i]
# insert pivot
arr[i + 1], arr[right] = arr[right], arr[i + 1]
return i + 1 # returns pivot index
# Test
arr = [9, 3, 7, 1, 2, 8, 6, 5]
quicksort(arr, 0, len(arr) - 1)
print("Sorted array: ", arr)
Here, we shall use index i to mark the last position of the “≤ pivot” zone, while j will scan from left to right – 1.
Refer to the walkthrough on the next page to understand how this process works.
ASRJC H2 Computing 18
>>> Searching and Sorting Algorithms
def partition(arr, left, right):
pivot = arr[right]
i = left - 1
for j in range(left, right):
if arr[j] <= pivot:
i += 1
arr[i], arr[j] = arr[j], arr[i]
# insert pivot
arr[i + 1], arr[right] = arr[right], arr[i + 1]
return i + 1 # returns pivot index
pivot = arr[7] = 5
i = left – 1 = –1
j=0
Currently, j = 0.
Perform comparison:
Is 9 ≤ 5? No.
Do NOT increment i and
do NOT swap
Proceed with j = 1.
Currently, j = 1.
Perform comparison:
Is 3 ≤ 5? Yes!
We found an element ≤ pivot.
Increment i, then
swap arr[i] with arr[j]
Proceed with j = 2.
Currently, j = 2.
Perform comparison:
Is 7 ≤ 5? No.
Do NOT increment i and
do NOT swap
Proceed with j = 3.
ASRJC H2 Computing 19
>>> Searching and Sorting Algorithms
Currently, j = 3.
Perform comparison:
Is 1 ≤ 5? Yes!
We found an element ≤ pivot.
Increment i, then
swap arr[i] with arr[j]
Proceed with j = 4.
Currently, j = 4.
Perform comparison:
Is 2 ≤ 5? Yes!
We found an element ≤ pivot.
Increment i, then
swap arr[i] with arr[j]
Proceed with j = 5.
Currently, j = 5.
Perform comparison:
Is 8 ≤ 5? No.
Do NOT increment i and
do NOT swap
Proceed with j = 6.
ASRJC H2 Computing 20
>>> Searching and Sorting Algorithms
Quick exercises
1. For the previous example, show the state of the array for each of the subsequent steps, until the array is
completely sorted using quicksort.
3. The following shows the partition method for the case where the first element is selected as the pivot:
# insert pivot
i -= 1
arr[left], arr[i] = arr[i], arr[left]
return i
• The partition operation is not stable – it does not preserve the relative order of equal elements.
ASRJC H2 Computing 21
>>> Searching and Sorting Algorithms
Recap
• Searching Algorithms
Linear def lin_search(arr, target):
for i in range(len(arr)):
Search
if arr[i] == target:
O(n) return i
return -1
if mid_element == target:
return mid # target found
elif mid_element < target:
return bin_search(arr, target, mid + 1, right) # search right
else:
return bin_search(arr, target, left, mid - 1) # search left
# Test
bin_search([2, 7, 12, 15, 16, 19], 16, 0, 5)
- Data structures such as binary search trees (BST) and hash tables can also support efficient
searching.
ASRJC H2 Computing 22
>>> Searching and Sorting Algorithms
• Sorting Algorithms
Insertion sort def insertion_sort(arr):
# loop from the second element until the last element
O(n2)
for i in range(1, len(arr)):
key_element = arr[i] # element to be inserted
- In-place j = i – 1
- Stable sort
- Adaptive while j >= 0 and arr[j] > key_element:
arr[j + 1] = arr[j] # shift the element to the right
j -= 1
left = []
middle = []
right = []
ASRJC H2 Computing 23
>>> Searching and Sorting Algorithms
Merge sort def merge_sort(arr):
# base case
O(n log n)
if len(arr) <= 1:
return arr
- Stable sort
- NOT in-place # divide array into 2 halves
- NOT mid = len(arr) // 2
left = arr[:mid]
adaptive right = arr[mid:]
return result
ASRJC H2 Computing 24