0% found this document useful (0 votes)
5 views63 pages

Heap Sort and Binary Tree Concepts

Uploaded by

anudeep1471999
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views63 pages

Heap Sort and Binary Tree Concepts

Uploaded by

anudeep1471999
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

CS-204

Sorting
Heap Sort
Special Types of Trees
• Def: Full binary tree = a 4

binary tree in which each node


1 3
is either a leaf or has degree
2 16 9 10
exactly 2.
14 8 7
12
Full binary tree

• Def: Complete binary tree =


4

a binary tree in which all 1 3


leaves are on the same level 2 16 9 10
and all internal nodes have
degree 2. Complete binary tree

3
Almost Complete Binary Tree
• Canonical labeling of nodes: Label the Nodes in the level-
wise fashion from left to right, as shown below

• Almost Complete Binary Tree: A binary tree made up of the


first n nodes of a canonically labeled complete Binary Tree is
called Almost Complete Binary Tree.
Definitions
• Height of a node = the number of edges on the longest
simple path from the node down to a leaf
• Level of a node = the length of a path from the root to the
node
• Height of tree = height of root node

4 Height of root = 3

1 3
Height of (2)= 1 2 16 9 10 Level of (10)= 2
14 8

5
Useful Properties
 There are at most 2l nodes at level (or depth) l of a
binary tree
 A binary tree with maximum level d has at most 2d+1 -
1 nodes

d 1
d
2 1
n   2l   2d 1  1 4 Height of root = 3
l 0 2 1
1 3
Height of (2)= 1 2 16 9 10 Level of (10)= 2
14 8

6
The Heap Data Structure
• Def: A heap is an almost complete binary tree with the
following two properties:
– Structural property: all levels are full, except possibly the
last one, which is filled from left to right
– Max heap property: for any node x, Parent(x) ≥ x

8 From the heap property, it follows


that: The root is the maximum
7 4 element of the mx-heap
5 2

Max Heap

7
Array Representation of Heaps
• A heap can be stored as an array
A.
– Root of tree is A[1]
– Left child of A[i] = A[2i]
– Right child of A[i] = A[2i + 1]
– Parent of A[i] = A[ i/2 ]
– Heapsize[A] ≤ length[A]
• The elements in the subarray
A[(n/2+1) .. n] are leaves

8
Heap Types
• Max-heaps (largest element at root), have the
max-heap property:
– for all nodes i, excluding the root:
A[PARENT(i)] ≥ A[i]

• Min-heaps (smallest element at root), have the


min-heap property:
– for all nodes i, excluding the root:
A[PARENT(i)] ≤ A[i]

9
Adding/Deleting Nodes
• New nodes are always inserted at the bottom
level (left to right)
• Nodes are removed from the bottom level
(right to left)

10
Operations on Heaps
• Maintain/Restore the max-heap property
– MAX-HEAPIFY
• Create a max-heap from an unordered array
– BUILD-MAX-HEAP
• Sort an array in place
– HEAPSORT

11
Maintaining the Heap Property
• Suppose a node is smaller than a child
– Left and Right subtrees of i are max-heaps
• To eliminate the violation:
– Exchange with larger child
– Move down the tree
– Continue until node is not smaller than
children

12
Example
MAX-HEAPIFY(A, 2, 10)

A[2]  A[4]

A[2] violates the heap property A[4] violates the heap property

A[4]  A[9]

Heap property restored


13
Maintaining the Heap Property
• Assumptions: Alg: MAX-HEAPIFY(A, i, n)
– Left and Right 1. l ← LEFT(i)
subtrees of i 2. r ← RIGHT(i)
are max-heaps 3. if l ≤ n and A[l] > A[i]
– A[i] may be 4. then largest ←l
smaller than 5. else largest ←i
its children 6. if r ≤ n and A[r] > A[largest]
7. then largest ←r
8. if largest  i
9. then exchange A[i] ↔ A[largest]
10. MAX-HEAPIFY(A, largest, n)

14
MAX-HEAPIFY Running Time
• It checks a path starting from current node to leaf node. At
every level it performs exactly 2 comparisons. At max
length of this path is h. So total number of comparisons is
at most 2h. So complexity is O(h) or O(logn)

• Running time of MAX-HEAPIFY is O(lgn)

15
Building a Max-Heap

2 3

1 3
4 5 6 7

2 16 9 10
8 9 10

14 8 7
Building a Heap
• Convert an array A[1 … n] into a max-heap (n = length[A])
• The elements in the subarray A[(n/2+1) .. n] are leaves
• Apply MAX-HEAPIFY on elements between 1 and n/2

Alg: BUILD-MAX-HEAP(A) 1

1. n = length[A] 4

for i ← n/2 downto 1


2 3
2. 1 3
4 5 6 7

3. do MAX-HEAPIFY(A, i, n) 8
2 9 10
16 9 10
14 8 7

A: 4 1 3 2 16 9 10 14 8 7

17
Example: A 4 1 3 2 16 9 10 14 8 7

i=5 i=4 i=3


1 1 1

4 4 4
2 3 2 3 2 3

1 3 1 3 1 3
4 5 6 7 4 5 6 7 4 5 6 7

8
2 9 10
16 9 10 8
2 9 10
16 9 10 8
14 9 10
16 9 10
14 8 7 14 8 7 2 8 7

i=2 i=1
1 1 1

4 4 16
2 3 2 3 2 3

1 10 16 10 14 10
4 5 6 7 4 5 6 7 4 5 6 7

8
14 9 10
16 9 3 8
14 9 10
7 9 3 8
8 9 10
7 9 3
2 8 7 2 8 1 2 4 1
18
Running Time of BUILD MAX HEAP
Alg: BUILD-MAX-HEAP(A)
1. n = length[A]
2. for i ← n/2 downto 1
O(n)
3. do MAX-HEAPIFY(A, i, n) O(lgn)

 Running time: O(nlgn)


• This is not an asymptotically tight upper
bound

19
Running Time of BUILD MAX HEAP
• HEAPIFY takes O(h)  the cost of HEAPIFY on a node i is
proportional to the height of the node i in the tree
h
 T (n)   ni hi   2i h  i   O(n)
h

Height i 0 i 0 Level No. of nodes


h0 = 3 (lgn) i=0 20

h1 = 2 i=1 21

h2 = 1 i=2 22

h3 = 0 i = 3 (lgn) 23

hi = h – i height of the heap rooted at level i


ni = 2i number of nodes at level i
20
Running Time of BUILD MAX HEAP
h
T (n)   ni hi Cost of HEAPIFY at level i  number of nodes at that level
i 0
h
  2i h  i  Replace the values of ni and hi computed before
i 0
h
hi h
 h i
2 Multiply by 2h-i both at the nominator and denominator
i 0 2
h
k
2  k
h
Change variables: k = h - i
k 0 2

k
 n k
The sum above is smaller than the sum of all elements to 
k 0 2
 O(n) The sum above is smaller than 2

Running time of BUILD-MAX-HEAP: T(n) = O(n)


21
Heapsort
• Goal:
– Sort an array using heap representations

• Idea:
– Build a max-heap from the array
– Swap the root (the maximum element) with the last
element in the array
– “Discard” this last node by decreasing the heap size
– Call MAX-HEAPIFY on the new root
– Repeat this process until only one node remains
22
Example: A=[7, 4, 3, 1, 2]

MAX-HEAPIFY(A, 1, 4) MAX-HEAPIFY(A, 1, 3) MAX-HEAPIFY(A, 1, 2)

MAX-HEAPIFY(A, 1, 1)

23
Alg: HEAPSORT(A)

1. BUILD-MAX-HEAP(A) O(n)
2. for i ← length[A] downto 2
3. do exchange A[1] ↔ A[i] n-1 times

4. MAX-HEAPIFY(A, 1, i - 1) O(lgn)

• Running time: O(nlgn) --- Can


be shown to be Θ(nlgn)
24
Priority Queues

12 4

25
Operations
on Priority Queues
• Max-priority queues support the following
operations:
– INSERT(S, x): inserts element x into set S
– EXTRACT-MAX(S): removes and returns element of S
with largest key
– MAXIMUM(S): returns element of S with largest key
– INCREASE-KEY(S, x, k): increases value of element x’s
key to k (Assume k ≥ x’s current key value)
26
HEAP-MAXIMUM
Goal:
– Return the largest element of the heap

Running time: O(1)


Alg: HEAP-MAXIMUM(A)
1. return A[1]
Heap A:

Heap-Maximum(A) returns 7
27
HEAP-EXTRACT-MAX
Goal:
– Extract the largest element of the heap (i.e., return the max value
and also remove that element from the heap
Idea:
– Exchange the root element with the last
– Decrease the size of the heap by 1 element
– Call MAX-HEAPIFY on the new root, on a heap of size n-1

Heap A: Root is the largest element

28
Example: HEAP-EXTRACT-MAX
16 1

14 10 max = 16 14 10
8 7 9 3 8 7 9 3
2 4 1 2 4
Heap size decreased with 1

14

Call MAX-HEAPIFY(A, 1, n-1)


8 10
4 7 9 3
2 1

29
HEAP-EXTRACT-MAX
Alg: HEAP-EXTRACT-MAX(A, n)

1. if n < 1

2. then error “heap underflow”

3. max ← A[1]

4. A[1] ← A[n]

5. MAX-HEAPIFY(A, 1, n-1) remakes heap

6. return max
Running time: O(lgn)
30
HEAP-INCREASE-KEY
• Goal:
– Increases the key of an element i in the heap
• Idea:
– Increment the key of A[i] to its new value
– If the max-heap property does not hold anymore: traverse
a path toward the root to find the proper place for the
newly increased key
16

14 10
8 i 7 9 3
Key [i] ← 15 2 4 1

31
Example: HEAP-INCREASE-KEY
16 16

14 10 14 10
8 i 7 9 3 8 i 7 9 3
2 4 1 2 15 1

Key [i ] ← 15

16 16
i
14 10 15 10
i
15 7 9 3 14 7 9 3
2 8 1 2 8 1

32
HEAP-INCREASE-KEY
Alg: HEAP-INCREASE-KEY(A, i, key)

1. if key < A[i]


2. then error “new key is smaller than current key”
3. A[i] ← key
4. while i > 1 and A[PARENT(i)] < A[i] 16

5. do exchange A[i] ↔ A[PARENT(i)]


14 10
6. i ← PARENT(i)
8 i 7 9 3
2 4 1
• Running time: O(lgn) Key [i] ← 15

33
MAX-HEAP-INSERT
• Goal:
16
– Inserts a new element into a
14 10
max-heap
8 7 9 3
• Idea: 2 4 1 -

– Expand the max-heap with a 16


new element whose key is -
14 10
– Calls HEAP-INCREASE-KEY to set
8 7 9 3
the key of the new node to its 2 4 1 15
correct value and maintain the
max-heap property
34
Example: MAX-HEAP-INSERT
Insert value 15: Increase the key to 15
- Start by inserting - Call HEAP-INCREASE-KEY on A[11] = 15
16 16

14 10 14 10
8 7 9 3 8 7 9 3
2 4 1 - 2 4 1 15

The restored heap containing


the newly added element

16 16

14 10 15 10

8 15 9 3 8 14 9 3

2 4 1 7 2 4 1 7
35
MAX-HEAP-INSERT
16
Alg: MAX-HEAP-INSERT(A, key, n)
14 10
1. heap-size[A] ← n + 1 8 7 9 3
2 4 1 -
2. A[n + 1] ← -

3. HEAP-INCREASE-KEY(A, n + 1, key)
Running time: O(lgn)

36
Summary
• We can perform the following operations on heaps:
– MAX-HEAPIFY O(lgn)
– BUILD-MAX-HEAP O(n)
– HEAP-SORT O(nlgn)
– MAX-HEAP-INSERT O(lgn)
– HEAP-EXTRACT-MAX O(lgn)
– HEAP-INCREASE-KEY O(lgn)
Average
– HEAP-MAXIMUM O(1) O(lgn)

37
Quick Sort
QuickSort Design
• Follows the divide-and-conquer paradigm.
• Divide: Partition (separate) the array A[p..r] into two
(possibly empty) subarrays A[p..q–1] and A[q+1..r].
– Each element in A[p..q–1] <= A[q].
– A[q] < each element in A[q+1..r].
– Index q is computed as part of the partitioning procedure.
• Conquer: Sort the two subarrays by recursive calls to
quicksort.
• Combine: The subarrays are sorted in place – no work is
needed to combine them.
• How do the divide and conquer steps of quicksort
compare with those of merge sort?
Pseudocode
Quicksort(A, p, r)
if p < r then Partition(A, p, r)
q := Partition(A, p, r); x:= A[r]
Quicksort(A, p, q – 1); i=p – 1;
Quicksort(A, q + 1, r) for j := p to r – 1 do
if A[j]  x then
A[p..r] i := i + 1;
A[i]  A[j]
5 A[i + 1]  A[r];
return i + 1
A[p..q – 1] A[q+1..r]
Partition 0 1 2 3 4 5 6 7
5 Z Y X

5 >5

i j
Example
Position of I and j after of line 3
p r
initially: 2 5 8 3 9 4 1 7 10 6 note: pivot (x) = 6
i j
Partition(A, p, r)
next iteration: 2 5 8 3 9 4 1 7 10 6 1 x:= A[r]
i j 2 i=p – 1;
3 for j := p to r – 1 do
next iteration: 2 5 8 3 9 4 1 7 10 6 4 if A[j]  x then
i j 5 i := i + 1;
6 A[i]  A[j]
next iteration: 2 5 8 3 9 4 1 7 10 6
7 A[i + 1]  A[r];
i j
8 return i + 1
next iteration: 2 5 3 8 9 4 1 7 10 6
i j
Example (Continued)
next iteration: 2 5 3 8 9 4 1 7 10 6
i j

next iteration: 2 5 3 4 9 8 1 7 10 6
Partition(A, p, r)
i j
x:= A[r]
next iteration: 2 5 3 4 1 8 9 7 10 6
i=p – 1;
i j
for j := p to r – 1 do
next iteration: 2 5 3 4 1 8 9 7 10 6
if A[j]  x then
i j
i := i + 1;
after final swap: 2 5 3 4 1 6 9 7 10 8
A[i]  A[j]
i j
A[i + 1]  A[r];
return i + 1
Partitioning
• Select the last element A[r] in the subarray A[p..r] as
the pivot – the element around which to partition.
• As the procedure executes, the array is partitioned into
four (possibly empty) regions.
1. A[p..i ] — All entries in this region are <= pivot.
2. A[i+1..j – 1] — All entries in this region are > pivot.
3. A[r] = pivot.
4. A[j..r – 1] — Not known how they compare to pivot.
• The above hold before each iteration of the for loop,
and constitute a loop invariant.
Correctness of Partition
• Use loop invariant.
• Initialization:
– Before first iteration
• A[p..i] and A[i+1..j – 1] are empty – Conds. 1 and 2 are satisfied
(trivially).
• r is the index of the pivot Partition(A, p, r)
– Cond. 3 is satisfied.
x:= A[r]
• Maintenance: i=p – 1;
– Case 1: A[j] > x for j := p to r – 1 do
if A[j]  x then
• Increment j only.
i := i + 1;
• Loop Invariant is maintained. A[i]  A[j]
A[i + 1]  A[r];
return i + 1
Correctness of Partition
Case 1:

p i j r
>x x

x >x
p i j r
x

x >x
Correctness of Partition
• Case 2: A[j]  x – Increment j
– Increment i • Condition 2 is maintained.

– Swap A[i] and A[j] – A[r] is unaltered.


• Condition 1 is maintained. • Condition 3 is maintained.

p i j r
x x

x >x
p i j r
x

x >x
Correctness of Partition
• Termination:
– When the loop terminates, j = r, so all elements in A
are partitioned into one of the three cases:
• A[p..i]  pivot
• A[i+1..j – 1] > pivot
• A[r] = pivot
• The last two lines swap A[i+1] and A[r].
– Pivot moves from the end of the array to between the
two subarrays.
– Thus, procedure partition correctly performs the
divide step.
Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• What is best case running time?
– Recursion:
1. Partition splits array in two sub-arrays of size n/2
2. Quicksort each sub-array
Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• What is best case running time?
– Recursion:
1. Partition splits array in two sub-arrays of size n/2
2. Quicksort each sub-array
– Depth of recursion tree?
Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• What is best case running time?
– Recursion:
1. Partition splits array in two sub-arrays of size n/2
2. Quicksort each sub-array
– Depth of recursion tree? O(log2n)
Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• What is best case running time?
– Recursion:
1. Partition splits array in two sub-arrays of size n/2
2. Quicksort each sub-array
– Depth of recursion tree? O(log2n)
– Number of accesses in partition?
Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• What is best case running time?
– Recursion:
1. Partition splits array in two sub-arrays of size n/2
2. Quicksort each sub-array
– Depth of recursion tree? O(log2n)
– Number of accesses in partition? O(n)
Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• Recurrence relation T(n)=2T(n/2)+(n)
• Running time: O(n log2n)
Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• Best case running time: O(n log2n)
• Worst case running time?
– Recursion:
1. Partition splits array in two sub-arrays:
• one sub-array of size 0
• the other sub-array of size n-1
2. Quicksort each sub-array
– Depth of recursion tree?
Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• Best case running time: O(n log2n)
• Worst case running time?
– Recursion:
1. Partition splits array in two sub-arrays:
• one sub-array of size 0
• the other sub-array of size n-1
2. Quicksort each sub-array
– Depth of recursion tree? O(n)
Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• Best case running time: O(n log2n)
• Worst case running time?
– Recursion:
1. Partition splits array in two sub-arrays:
• one sub-array of size 0
• the other sub-array of size n-1
2. Quicksort each sub-array
– Depth of recursion tree? O(n)
– Number of accesses per partition?
Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• Best case running time: O(n log2n)
• Worst case running time?
– Recursion:
1. Partition splits array in two sub-arrays:
• one sub-array of size 0
• the other sub-array of size n-1
2. Quicksort each sub-array
– Depth of recursion tree? O(n)
– Number of accesses per partition? O(n)
Quicksort Analysis
• Balanced partitioning
Quicksort Analysis
• mix of “good” and “bad” splits
A randomized version of quicksort
• RANDOMIZED-PARTITION (A, p, r)
1. i = RANDOM(p, r)
2. exchange A[r] with A[i]
3. return PARTITION(A, p, r)

• RANDOMIZED-QUICKSORT(A,p,r)
1. if p < r
1. q = RANDOMIZED-PARTITION(A,p,r)
2. RANDOMIZED-QUICKSORT(A,p,q -1)
3. RANDOMIZED-QUICKSORT(A,q+ 1, r)
Expected running time
• The running time of QUICKSORT is dominated by the
time spent in the PARTITION procedure.
• Each time the PARTITION procedure is called, it
selects a pivot element, and this element is never
included in any future recursive calls to QUICKSORT
and PARTITION.
• Thus, there can be at most n calls to PARTITION over
the entire execution of the quicksort algorithm.
Expected Running Time of Partition
• One call to PARTITION takes O(1) time plus an
amount of time that is proportional to the number of
iterations of the for loop in lines 3–6

if we can count the total


Partition(A, p, r)
number of times that line 4 is 1. x:= A[r],
executed, we can bound the 2. i=p – 1;
3. for j := p to r – 1 do
total time spent in the for 4. if A[j]  x then
loop during the entire 5. i := i + 1;
execution of QUICKSORT. 6. A[i]  A[j]
7. A[i + 1]  A[r];
8. return i + 1
• once a pivot x is chosen with zi < x < zj , we
know that zi and zj will not be compared at
any subsequent time.
• Pr {zi is compared to zj } = Pr {zi or zj is first
pivot chosen from Zij }
= Pr {zi is the first pivot chosen from Zij } +
Pr {zj is the first pivot chosen from Zij }
=1/(j-i+1)+ 1/(j-i+1)= 2/(j-i+1)

You might also like