0% found this document useful (0 votes)
6 views70 pages

Searching and Sorting

The document discusses various sorting and searching algorithms, including Insertion Sort, Binary Search, and Quick Sort, detailing their strategies, correctness, and running times. It highlights the best, worst, and average case scenarios for these algorithms, emphasizing the importance of understanding their efficiency in different contexts. Additionally, it touches on the significance of sorting in algorithm design and its applications in searching and other computational problems.

Uploaded by

yikivob244
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views70 pages

Searching and Sorting

The document discusses various sorting and searching algorithms, including Insertion Sort, Binary Search, and Quick Sort, detailing their strategies, correctness, and running times. It highlights the best, worst, and average case scenarios for these algorithms, emphasizing the importance of understanding their efficiency in different contexts. Additionally, it touches on the significance of sorting in algorithm design and its applications in searching and other computational problems.

Uploaded by

yikivob244
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Example: Sorting

INPUT OUTPUT
sequence of numbers a permutation of the
sequence of numbers

a1, a2, a3,….,an b1,b2,b3,….,bn


Sort
2 5 4 10 7 2 4 5 7 10

Correctness
Correctness(requirements
(requirementsfor forthe
the Running
Runningtime
time
output)
output) Depends
Dependson on
For
Forany
anygiven
giveninputinputthe
thealgorithm
algorithm ••number
numberof ofelements
elements(n)(n)
halts
haltswith
withthetheoutput:
output: ••how
how(partially)
(partially)sorted
sorted
••bb1 <<bb2 <<bb3 <<…. ….<< bbnn they
theyare
are
1 2 3
••bb1,,bb2,,bb3,,…., b is a ••algorithm
1 2 3 …., bnn is a algorithm
permutation
permutationof ofaa1,,aa2,,aa3,….,a
,….,an
1 2 3 n
Insertion Sort
A 3 4 6 8 9 7 2 5 1
1 j n
i

Strategy
Strategy INPUT:
INPUT:A[1..n]
A[1..n]––an
anarray
arrayof
ofintegers
integers
OUTPUT:
OUTPUT:aapermutation
permutationofofAAsuch
suchthat
that
••Start
Start“empty
“emptyhanded”
handed” A[1]≤ A[2]≤ …≤A[n]
A[1]≤ A[2]≤ …≤A[n]
••Insert
Insertaacard
cardininthe
theright
right
position
positionof
ofthe
thealready
alreadysorted
sorted for
for j←2
j←2 to to nn dodo
hand
hand key ←
key ←A[j]
A[j]
••Continue
Continueuntil
untilall
allcards
cardsare
are Insert
InsertA[j]
A[j]into
intothe
thesorted
sortedsequence
sequence
inserted/sorted
inserted/sorted A[1..j-1]
A[1..j-1]
i←j-1
i←j-1
while
while i>0 i>0 andand A[i]>key
A[i]>key
do
do A[i+1]←A[i]
A[i+1]←A[i]
i--
i--
A[i+1]←key
A[i+1]←key
Analysis of Insertion Sort
cost times
for j←2 to n do c1 n
key←A[j] c2 n-1
Insert A[j] into the sorted 0 n-1
sequence A[1..j-1]
i←j-1 c3 n-1

n
c4 t
while i>0 and A[i]>key j =2 j


n
c5 (t j − 1)
do A[i+1]←A[i] j =2


n
i-- c6 j =2
(t j − 1)

A[i+1] Ã key c7 n-1

Total time = n(c1+c2+c3+c7) + ∑nj=2 tj (c4+c5+c6)


– (c2+c3+c5+c6+c7)
Best/Worst/Average Case
Total time = n(c1+c2+c3+c7) + ∑nj=2 tj (c4+c5+c6)
– (c2+c3+c5+c6+c7)

Best case: elements already sorted; tj=1,


running time = f(n), i.e., linear time.
Worst case: elements are sorted in
inverse order; tj=j, running time = f(n2), i.e.,
quadratic time
Average case: tj=j/2, running time = f(n2),
i.e., quadratic time
Best/Worst/Average Case (2)
For a specific size of input n, investigate
running times for different input instances:
Best/Worst/Average Case (3)
For inputs of all sizes:

worst-case

6n average-case
Running time

5n
best-case
4n

3n

2n

1n

1 2 3 4 5 6 7 8 9 10 11 12 …..
Input instance size
Best/Worst/Average Case (4)
Worst case is usually used: It is an upper-
bound and in certain application domains (e.g.,
air traffic control, surgery) knowing the worst-
case time complexity is of crucial importance
For some algorithms worst case occurs fairly
often
Average case is often as bad as the worst
case
Finding average case can be very difficult
Asymptotic Analysis
Goal: to simplify analysis of running time by
getting rid of ”details”, which may be affected by
specific implementation and hardware
like “rounding”: 1,000,001 ≈ 1,000,000
3n2 ≈ n2
Capturing the essence: how the running time of
an algorithm increases with the size of the input
in the limit.
Asymptotically more efficient algorithms are best for
all but small inputs
Searching
INPUT OUTPUT
• sequence of numbers • index of the found
(database) number or NIL
• a single number (query)

a1, a2, a3,….,an; q j

2 5 4 10 7; 5 2

2 5 4 10 7; 9 NIL
Binary Search
Idea: Divide and conquer, a key design technique
narrow down the search range in stages
findElement(22)
A recursive procedure
Algorithm BinarySearch(A, k, low, high)
if low > high then return Nil
else mid ← (low+high) / 2
if k = A[mid] then return mid
elseif k < A[mid] then
return BinarySearch(A, k, low, mid-1)
else return BinarySearch(A, k, mid+1, high)
2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

low mid high

2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

low mid high

2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

low mid
An iterative procedure
INPUT: A[1..n] – a sorted (non-decreasing) array of
integers, key – an integer.
OUTPUT: an index j such that A[j] = k.
NIL, if ∀j (1 ≤ j ≤ n): A[j] ≠ k
2 4 5 7 8 9 12 17 19 22 25 27 28 33 37

low ←1
14

low mid high

high ← n 2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

do 2 4 5 7 8 9 12 14
low

17 19 22
mid

25 27 28 33
high

37

mid ← (low+high)/2 low mid

if A[mid] = k then return mid


else if A[mid] > k then high ← mid-1
else low ← mid+1
while low <= high
return NIL
Running Time of Binary Search
The range of candidate items to be searched is
halved after each comparison

In the array-based implementation, access by


rank takes O(1) time, thus binary search runs in
O(log n) time
Searching in an unsorted array
INPUT: A[1..n] – an array of integers, q – an integer.
OUTPUT: an index j such that A[j] = q. NIL, if ∀j (1≤j≤n):
A[j] ≠ q
j←1
while j ≤ n and A[j] ≠ q
do j++
if j ≤ n then return j
else return NIL

Worst-case running time: O(n), average-case:


O(n)
We can’t do better. This is a lower bound for the
problem of searching in an arbitrary sequence.
The Problem

T&T is a large phone company, and they


want to provide caller ID capability:
given a phone number, return the caller’s
name
phone numbers range from 0 to r = 108 -1
There are n phone numbers, n << r.
want to do this as efficiently as possible
Using an unordered sequence

searching and removing takes O(n) time


inserting takes O(1) time
applications to log files (frequent
insertions, rare searches and removals)
Using an ordered sequence

searching takes O(log n) time (binary


search)
inserting and removing takes O(n) time
application to look-up tables (frequent
searches, rare insertions and removals)
Other Suboptimal ways
direct addressing: an array indexed by key:
takes O(1) time,
O(r) space where r is the range of numbers
(108)
huge amount of wasted space

(null) (null) Ankur (null) (null)


0000-0000 0000-0000 9635-8904 0000-0000 0000-0000
Binary Search
Narrow down the search range in stages
findElement(22)

3
Running Time

The range of candidate items to be searched is


halved after comparing the key with the middle
element
Binary search runs in O(lg n) time (remember
recurrence...)
What about insertion and deletion?

4
Quick Sort

Characteristics
sorts almost in "place," i.e., does not require
an additional array
very practical, average sort performance O(n
log n) (with small constant factors), but worst
case O(n2)

1
Quick Sort – the Principle

To understand quick-sort, let’s look at a


high-level description of the algorithm
A divide-and-conquer algorithm
Divide: partition array into 2 subarrays such
that elements in the lower part <= elements in
the higher part
Conquer: recursively sort the 2 subarrays
Combine: trivial since sorting is done in place

2
Partitioning
Linear time partitioning procedure
Partition(A,p,r) i i j j
01 x←A[r] 17 12 6 19 23 8 5 10
02 i←p-1
03 j←r+1 ≤ X=10 ≤ i j
04 while TRUE
10 12 6 19 23 8 5 17
05 repeat j←j-1
06 until A[j] ≤x
i j
07 repeat i←i+1
08 until A[i] ≥x 10 5 6 19 23 8 12 17
09 if i<j
10 then exchange A[i]↔A[j] j i
11 else return j
10 5 6 8 23 19 12 17

3
Quick Sort Algorithm

Initial call Quicksort(A, 1, length[A])


Quicksort(A,p,r)
01 if p<r
02 then q ←Partition(A,p,r)
03 Quicksort(A,p,q)
04 Quicksort(A,q+1,r)

4
Analysis of Quicksort

Assume that all input elements are distinct


The running time depends on the
distribution of splits

5
Best Case
If we are lucky, Partition splits the array evenly
T (n) = 2T (n / 2) + Θ(n)

6
Worst Case
What is the worst case?
One side of the parition has only one element
T (n) = T (1) + T (n − 1) + Θ(n)
= T (n − 1) + Θ(n)
n
= ∑ Θ( k )
k =1
n
= Θ( ∑ k )
k =1

= Θ( n 2 )

7
Worst Case (2)

8
Worst Case (3)
When does the worst case appear?
input is sorted
input reverse sorted
Same recurrence for the worst case of
insertion sort
However, sorted input yields the best case
for insertion sort!

9
Analysis of Quicksort
Suppose the split is 1/10 : 9/10
T (n) = T (n /10) + T (9n /10) + Θ(n) = Θ(n log n)!

10
An Average Case Scenario
Suppose, we alternate L(n) = 2U (n / 2) + Θ(n) lucky
lucky and unlucky U (n) = L(n − 1) + Θ(n) unlucky
cases to get an we consequently get
average behavior L(n) = 2( L(n / 2 − 1) + Θ(n / 2)) + Θ(n)
= 2 L(n / 2 − 1) + Θ (n)
n Θ(n) = Θ(n log n)

n Θ( n )
1 n-1

(n-1)/2 (n-1)/2 (n-1)/2+1 (n-1)/2


11
An Average Case Scenario (2)
How can we make sure that we are usually
lucky?
Partition around the ”middle” (n/2th) element?
Partition around a random element (works well in
practice)
Randomized algorithm
running time is independent of the input ordering
no specific input triggers worst-case behavior
the worst-case is only determined by the output of the
random-number generator

12
Why Sorting?
“When in doubt, sort” – one of the
principles of algorithm design. Sorting
used as a subroutine in many of the
algorithms:
Searching in databases: we can do binary
search on sorted data
A large number of computer graphics and
computational geometry problems
Closest pair, element uniqueness
1
Why Sorting? (2)

A large number of sorting algorithms are


developed representing different algorithm
design techniques.
A lower bound for sorting Ω(n log n) is
used to prove lower bounds of other
problems

2
Sorting Algorithms so far

Insertion sort, selection sort


Worst-case running time Θ(n2); in-place
Heap sort
Worst-case running time Θ(n log n).

3
Divide and Conquer

Divide-and-conquer method for algorithm


design:
Divide: if the input size is too large to deal with in a
straightforward manner, divide the problem into two or
more disjoint subproblems
Conquer: use divide and conquer recursively to solve
the subproblems
Combine: take the solutions to the subproblems and
“merge” these solutions into a solution for the original
problem

4
Merge Sort Algorithm
Divide: If S has at least two elements (nothing
needs to be done if S has zero or one elements),
remove all the elements from S and put them
into two sequences, S1 and S2 , each containing
about half of the elements of S. (i.e. S1 contains
the first ⎡n/2⎤ elements and S2 contains the
remaining ⎣n/2⎦ elements).
Conquer: Sort sequences S1 and S2 using
Merge Sort.
Combine: Put back the elements into S by
merging the sorted sequences S1 and S2 into
one sorted sequence
5
Merge Sort: Algorithm
Merge-Sort(A,
Merge-Sort(A, p, p, r)
r)
if
if pp << rr then
then
q←(p+r)/2
q←(p+r)/2
Merge-Sort(A,
Merge-Sort(A, p, p, q)
q)
Merge-Sort(A,
Merge-Sort(A, q+1,q+1, r)
r)
Merge(A,
Merge(A, p, p, q,
q, r)
r)

Merge(A,
Merge(A, p,
p, q,
q, r)
r)
Take
Take the
the smallest
smallest ofof the
the two
two topmost
topmost elements
elements of
of
sequences
sequences A[p..q]
A[p..q] and
and A[q+1..r]
A[q+1..r] andand put
put into
into the
the
resulting
resulting sequence.
sequence. Repeat
Repeat this,
this, until
until both
both sequences
sequences
are
are empty.
empty. Copy
Copy the
the resulting
resulting sequence
sequence into
into A[p..r].
A[p..r].

6
MergeSort (Example)

7
MergeSort (Example)

8
MergeSort (Example)

9
MergeSort (Example)

10
MergeSort (Example)

11
MergeSort (Example)

12
MergeSort (Example)

13
MergeSort (Example)

14
MergeSort (Example)

15
MergeSort (Example)

16
MergeSort (Example)

17
MergeSort (Example)

18
MergeSort (Example)

19
MergeSort (Example)

20
MergeSort (Example)

21
MergeSort (Example)

22
MergeSort (Example)

23
MergeSort (Example)

24
MergeSort (Example)

25
MergeSort (Example)

26
MergeSort (Example)

27
MergeSort (Example)

28
Merging Two Sequences (cont.)

29
Merging Two Sequences (cont.)

30
Merging Two Sequences (cont.)

31
Merging Two Sequences (cont.)

32
Merging Two Sequences (cont.)

33
Merge Sort Revisited
To sort n numbers
if n=1 done!
recursively sort 2 lists of
numbers ⎣n/2⎦ and ⎡n/2⎤
elements
merge 2 sorted lists in Θ(n)
time
Strategy
break problem into similar
(smaller) subproblems
recursively solve
subproblems
combine solutions to answer
34
Recurrences
Running times of algorithms with Recursive
calls can be described using recurrences
A recurrence is an equation or inequality that
describes a function in terms of its value on
smaller inputs
⎧ solving_trivial_problem if n = 1
T ( n) = ⎨
⎩num_pieces T (n / subproblem_size_factor) + dividing + combining if n > 1

Example: Merge Sort

⎧ Θ(1) if n = 1
T ( n) = ⎨
⎩2T (n / 2) + Θ(n) if n > 1 35
Solving Recurrences
Repeated substitution method
Expanding the recurrence by substitution and noticing
patterns
Substitution method
guessing the solutions
verifying the solution by the mathematical induction
Recursion-trees
Master method
templates for different classes of recurrences

36
Repeated Substitution Method
Let’s find the running time of merge sort (let’s
assume that n=2b, for some b).
⎧ 1 if n = 1
T (n) = ⎨
⎩ 2T ( n / 2) + n if n > 1

T (n) = 2T ( n / 2 ) + n substitute
= 2 ( 2T ( n / 4 ) + n / 2 ) + n expand
= 22 T (n / 4) + 2n substitute
= 22 (2T (n / 8) + n / 4) + 2n expand
= 23 T (n / 8) + 3n observe the pattern
T (n) = 2i T (n / 2i ) + in
= 2lg n T (n / n) + n lg n = n + n lg n
37
Repeated Substitution Method
The procedure is straightforward:
Substitute
Expand
Substitute
Expand

Observe a pattern and write how your expression
looks after the i-th substitution
Find out what the value of i (e.g., lgn) should be to get
the base case of the recurrence (say T(1))
Insert the value of T(1) and the expression of i into
your expression
38

You might also like