0% found this document useful (0 votes)
6 views27 pages

CS3401-Algorithms Unit 3 Notes Final

The document discusses various algorithm design techniques including Divide and Conquer, Dynamic Programming, and Greedy Techniques. It covers specific algorithms such as Merge Sort, Quick Sort, and Matrix-chain multiplication, explaining their methodologies, time complexities, and examples. Additionally, it highlights the principles of optimality and overlapping subproblems in dynamic programming.

Uploaded by

siva
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views27 pages

CS3401-Algorithms Unit 3 Notes Final

The document discusses various algorithm design techniques including Divide and Conquer, Dynamic Programming, and Greedy Techniques. It covers specific algorithms such as Merge Sort, Quick Sort, and Matrix-chain multiplication, explaining their methodologies, time complexities, and examples. Additionally, it highlights the principles of optimality and overlapping subproblems in dynamic programming.

Uploaded by

siva
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

UNIT III ALGORITHM DESIGN TECHNIQUES

Divide and Conquer methodology: Finding maximum and minimum - Merge sort - Quick sort. Dynamic
programming: Elements of dynamic programming — Matrix-chain multiplication - Multi stage graph —
Optimal Binary Search Trees. Greedy Technique: Elements of the greedy strategy - Activity-selection problem
–- Optimal Merge pattern — Huffman Trees.

Divide and Conquer methodology:


A divide and conquer algorithm is a strategy of solving a large problem by
1. breaking the problem into smaller sub-problems
2. solving the sub-problems, and
3. combining them to get the desired output.
To use the divide and conquer algorithm, recursion is used.

How Divide and Conquer Algorithms Work?


Here are the steps involved:
1. Divide: Divide the given problem into sub-problems using recursion.
2. Conquer: Solve the smaller sub-problems recursively. If the sub problem is small enough, then solve it
directly.
3. Combine: Combine the solutions of the sub-problems that are part of the recursive process to solve the
actual problem.

Algorithm
DAC(a, i, j)
{
if(small(a, i, j))
return(Solution(a, i, j))
else
mid = divide(a, i, j) // f1(n)
b = DAC(a, i, mid) // T(n/2)
c = DAC(a, mid+1, j) // T(n/2)
d = combine(b, c) // f2(n)
return(d)
}
Finding Maximum and Minimum
Problem Statement:
To find the maximum and minimum numbers in a given array numbers[] of size n .

Naïve Method
Naïve method is a basic method to solve any problem. In this method, the maximum and minimum number
can be found separately. To find the maximum and minimum numbers, the following straightforward
algorithm can be used.
Algorithm:
Max-Min-Element (numbers[])
max := numbers[1]
min := numbers[1]
for i = 2 to n do
if numbers[i] > max then
max := numbers[i]
if numbers[i] < min then
min := numbers[i]
return (max, min)

Analysis
The number of comparison in Naive method is 2n - 2. The number of comparisons can be reduced using the
divide and conquer approach.

Finding the min and max using Divide and Conquer Approach
In this approach,
 the array is divided into two halves.
 Then using recursive approach maximum and minimum numbers in each halves are found.
 Later, return the maximum of two maxima of each half and the minimum of two minima of each half.

In this given problem, the number of elements in an array is y−x+1 , where y is greater than or equal to x.
Max−Min(x,y) will return the maximum and minimum values of an array numbers[x...y].

Algorithm:
Max - Min(x, y)
if y – x ≤ 1 then
return (max(numbers[x], numbers[y]), min((numbers[x], numbers[y]))
else
(max1, min1):= maxmin(x, ⌊((x + y)/2)⌋)
(max2, min2):= maxmin(⌊((x + y)/2) + 1)⌋,y)
return (max(max1, max2), min(min1, min2))

Analysis
Let T(n) be the number of comparisons made by Max−Min(x,y), where the number of elements n=y−x+1. If T(n)
represents the numbers, then the recurrence relation can be represented as

Compared to Naïve method, in divide and conquer approach, the number of comparisons is less.

Time Complexity: O(n).


Merge Sort
Merge Sort is one of the most popular sorting algorithms that is based on the principle of Divide and
Conquer Algorithm. Here, the array of elements to be sorted is divided into halves and then all the sub array
are divided further recursively until each array of single element is reached. Then the individual elements are
merged in the sorted order leading to all the elements are in the sorted order finally.

Divide and Conquer Strategy in merge sort


Using the Divide and Conquer technique, we divide a problem into subproblems. When the solution to each
subproblem is ready, we 'combine' the results from the subproblems to solve the main problem.
Suppose we had to sort an array A. A subproblem would be to sort a sub-section of this array starting at index
p and ending at index r, denoted as A[p..r].
[Link]
If q is the half-way point between p and r, then we can split the subarray A[p..r] into two arrays A[p..q] and
A[q+1, r].
[Link]
In the conquer step, we try to sort both the subarrays A[p..q] and A[q+1, r]. If we haven't yet reached the base
case, we again divide both these subarrays and try to sort them.
[Link]
When the conquer step reaches the base step and we get two sorted
subarrays A[p..q] and A[q+1, r] for array A[p..r], we combine the results by creating a sorted array A[p..r] from
two sorted subarrays A[p..q] and A[q+1, r].

Example
Algorithm

Time Complexity
Best Case Complexity: O(n*log n) Worst Case Complexity: O(n*log n) Average Case Complexity: O(n*log n)
Quick Sort
The Quick Sort algorithm follows the principal of divide and Conquer. It first picks up the partition element
called ‘Pivot’, which divides the list into two sub lists such that all the elements in the left sub list are
smaller than pivot and all the elements in the right sub list are greater than the pivot. The same process is
applied on the left and right sub lists separately. This process is repeated recursively until each sub list
containing more than one element.

Diivide-and-conquer strategy
[Link] by partition (rearranging) the array A[p:r] into two (possibly empty) subarrays A[p:q – 1] (the low
side) and A[q+1:r] (the high side) such that each element in the low side of the partition is less than or equal
to the pivot A[q]which is, in turn, less than or equal to each element in the high side. Compute the index q of
the pivot as part of this partitioning procedure.
[Link] by calling quicksort recursively to sort each of the subarrays A[p:q – 1] and A[q+1:r]
[Link] by doing nothing: because the two subarrays are already sorted, no work is needed to combine
them.

All elements in A[p:q – 1] are sorted and less than or equal to and A[q] and all elements A[q+1:r] are sorted
and greater than or equal to the pivot A[q] The entire subarray A[p:r] be sorted!
The QUICKSORT procedure implements quicksort. To sort an entire n-element array A[1:n], the initial call is
QUICKSORT.A;1;n/.

Algorithm:

Example
Time Complexity
Best Case Complexity: O(n*log n) Worst Case Complexity: O(n) Average Case Complexity: O(n*log n)
:
Dynamic Programming
Dynamic programming is a technique that can be used to solve optimization problems by breaking them down
into smaller subproblems and reusing solutions to those subproblems to solve the larger problem more
efficiently. It follows the principle of optimality

Basic Features of Dynamic programming :-


 Get all the possible solution and pick up best and optimal solution.
 Work on principal of optimality.
 Define sub-parts and solve them using recursively.
 Less space complexity But more Time complexity.
 Dynamic programming saves us from having to recompute previously calculated sub- solutions.
 Difficult to understanding.

Principle of optimality
The principle of optimality states that in an optimal sequence of choices or decisions , each subsequence
must also be optimal

Elements of dynamic programming:


1. Overlapping subproblems:
A problem is said to have overlapping subproblems if the same subproblem is solved multiple times during the
course of solving the larger problem. In dynamic programming, the solutions to these subproblems are stored
in a table or cache, so that they can be reused later.
2. Optimal substructure:
A problem is said to have optimal substructure if the optimal solution to the problem can be constructed from
the optimal solutions to its subproblems. In other words, if we know the optimal solutions to the subproblems,
we can combine them to obtain the optimal solution to the larger problem.
3. Memoization: Memoization is a technique in dynamic programming where the solutions to subproblems are
stored in a table or cache so that they can be reused later. This can significantly reduce the time complexity of
the algorithm, especially for problems with overlapping subproblems.
4. Recurrence relation: A recurrence relation is a way of expressing a solution to a problem in terms of solutions
to its subproblems. In dynamic programming, the recurrence relation is used to build the table or cache of
solutions to the subproblems.
5. Bottom-up approach: In a bottom-up approach, we start by solving the subproblems and then combine the
solutions to obtain the solution to the larger problem. This is in contrast to a top- down approach, where we
start with the larger problem and recursively break it down into subproblems.
6. State transition: State transition is the process of moving from one state to another in the course of solving
a dynamic programming problem. Each state represents a particular subproblem, and the state transition
defines the relationship between the subproblems.

Matrix-chain multiplication
Matrix Chain Multiplication problem uses dynamic programming to determining an optimal chain order for
multiplying sequence of matrices that has low cost

Problem Statement:
Given a sequence (chain) (A1, A2……An) of n matrices of dimension P1 x P2, P2 x P3,…..,Pn x pn+1 resp. The goal is
to compute the matrix productA1,A2…An . The problem is to find the order in which the matrices are multiplied aso
that it takes the minimum number of scalar multiplications input is the sequence of dimensions (p0,p1,p2…pn)

Matrix multiplication is associative, and so all parenthesizations yield the same product. For example, if the chain
of matrices is (A1, A2, A3, A4) then we can fully parenthesize the product (A1A2A3A4) in five distinct ways:
1:-(A1(A2(A3A4))) ,
2:-(A1((A2A3)A4)),
3:- ((A1A2)(A3A4)),
4:-((A1(A2A3))A4),
5:-(((A1A2)A3)A4) .

We can multiply two matrices A and B only if they are compatible. Ie the number of columns of A must equal
the number of rows of B. If A is a p x q matrix and B is a q x r matrix,the resulting matrix C is a p x r matrix.

The time to compute C is dominated by the number of scalar multiplications is pqr. we shall express costs in
terms of the number of scalar multiplications.

For example, if we have three matrices (A1,A2,A3) and its cost is (10x100),(100x5),(5x500) respectively. so we
can calculate the cost of scalar multiplication is 10*100*5=5000 if ((A1A2)A3), 10*5*500=25000 if (A1(A2A3)),
and so on cost calculation.

Note that in the matrix-chain multiplication problem, we are not actually multiplying matrices. Our goal is
only to determine an order for multiplying matrices that has the lowest cost.

Example:
Solution
Time Compexity: Since three nested loops , the time complexity is O
Multi Stage Graph
Multistage graph G = (V, E, W) is a weighted directed graph in which vertices are partitioned into k ≥ 2 disjoint
sub sets V = {V1, V2, …, Vk} such that if edge (u, v) is present in E then u ∈ Vi and v ∈ Vi+1, 1 ≤ i ≤ k. The goal of
multistage graph problem is to find minimum cost path from source to destination vertex.

 The input to the algorithm is a k-stage graph, n vertices are indexed in increasing order of stages.
 The algorithm operates in the backward direction, i.e. it starts from the last vertex of the graph and
proceeds in a backward direction to find minimum cost path.
 Minimum cost of vertex j ∈ Vi from vertex r ∈ Vi+1 is defined as, Cost[j] = min{ c[j, r] + cost[r] }
 where, c[j, r] is the weight of edge <j, r> and cost[r] is the cost of moving from end vertex to vertex r.

Algorithm

Complexity Analysis of Multistage Graph


If graph G has |E| edges, then cost computation time would be O(n + |E|). The complexity of tracing the minimum
cost path would be O(k), k < n. Thus total time complexity of multistage graph using dynamic programming would
be O(n + |E|).
Example:
Find minimum path cost between vertex s and t for following multistage graph using dynamic
programming.

Solution:
Solution to multistage graph using dynamic programming is constructed as, Cost[j] = min{c[j, r] + cost[r]}
Here, number of stages k = 5, number of vertices n = 12, source s = 1 and target t = 12
Initialization:
Cost[n] = 0 ⇒ Cost[12] = 0. p[1] = s ⇒ p[1] = 1
p[k] = t ⇒ p[5] = 12. r = t = 12.
Stage 4:

Cost[9] = 4, Cost[10]=2, Cost[11]=5


Stage 3:
Vertex 6 is connected to vertices 9 and 10:
Cost[6] = min{ c[6, 10] + Cost[10], c[6, 9] + Cost[9] } = min{5 + 2, 6 + 4} = min{7, 10} = 7
p[6] = 10
Vertex 7 is connected to vertices 9 and 10:
Cost[7] = min{ c[7, 10] + Cost[10], c[7, 9] + Cost[9] }= min{3 + 2, 4 + 4} = min{5, 8} = 5
p[7] = 10
Vertex 8 is connected to vertex 10 and 11:
Cost[8] = min{ c[8, 11] + Cost[11], c[8, 10] + Cost[10] }= min{6 + 5, 5 + 2} = min{11, 7} = 7 p[8] = 10

Stage 2:
Vertex 2 is connected to vertices6, 7 and 8:
Cost[2] = min{ c[2, 6] + Cost[6], c[2, 7] + Cost[7], c[2, 8] + Cost[8] }= min{4 + 7, 2 + 5, 1 + 7} = min{11, 7, 8} = 7
p[2] = 7
Vertex 3 is connected to vertices 6and 7:
Cost[3] = min{ c[3, 6] + Cost[6], c[3, 7] + Cost[7] }= min{2 + 7, 7 + 5} = min{9, 12} = 9
p[3] = 6
Vertex 4 is connected to vertex 8:
Cost[4] = c[4, 8] + Cost[8] = 11 + 7 = 18
p[4] = 8
Vertex 5 is connected to vertices 7 and 8:
Cost[5] = min{ c[5, 7] + Cost[7], c[5, 8] + Cost[8] }= min{11 + 5, 8 + 7} = min{16, 15} = 15 p[5] = 8
Stage 1:
Vertex 1 is connected to vertices 2, 3, 4 and 5:
Cost[1] = min{ c[1, 2] + Cost[2], c[1, 3] + Cost[3], c[1, 4] + Cost[4], c[1, 5] + Cost[5]}
= min{ 9 + 7, 7 + 9, 3 + 18, 2 + 15 }
= min { 16, 16, 21, 17 } = 16 p[1] = 2
Trace the solution:
p[1] = 2
p[2] = 7
p[7] = 10
p[10] = 12

Minimum cost path is : 1 – 2 – 7 – 10 – 12


Cost of the path is : 9 + 2 + 3 + 2 = 16

Optimal Binary Search Tree


Optimal Binary Search Tree extends the concept of Binary searc tree. Binary Search Tree (BST) is a nonlinear data
structure which is used in many scientific applications for reducing the search time. In BST, left child is smaller
than root and right child is greater than root. This arrangement simplifies the search procedure.

Optimal Binary Search Tree (OBST) is very useful in dictionary search. The probability of searching is different for
different words. OBST has great application in translation.
If we translate the book from English to German, equivalent words are searched from English to German
dictionary and replaced in translation. Words are searched same as in binary search tree order.
Binary search tree simply arranges the words in lexicographical order. Words like ‘the’, ‘is’, ‘there’ are very
frequent words, whereas words like ‘xylophone’, ‘anthropology’ etc. appears rarely.
It is not a wise idea to keep less frequent words near root in binary search tree.
Instead of storing words in binary search tree in lexicographical order, we shall arrange them according to their
probabilities. This arrangement facilitates few searches for frequent words as they would be near the root. Such
tree is called Optimal Binary Search Tree.

Consider the sequence of nkeys K = < k1, k2, k3, …, kn> of distinct probability in sorted order such that
k1< k2< … <kn. Words between each pair of key lead to unsuccessful search, so for n keys, binary search tree
contains n + 1 dummy keys di, representing unsuccessful searches.

Two different representation of BST with same five keys {k1, k2, k3, k4, k5} probability is shown in following figure
With n nodes, there exist (2n)!/((n + 1)! * n!) different binary search trees. An exhaustive search for optimal
binary search tree leads to huge amount of time.
The goal is to construct a tree which minimizes the total search cost. Such tree is called optimal binary search
tree. OBST does not claim minimum height. It is also not necessary that parent of sub tree has higher priority
than its child.
Dynamic programming can help us to find such optima tree.

Binary search trees with 5 keys


Mathematical formulation
 We formulate the OBST with following observations
 Any sub tree in OBST contains keys in sorted order ki…kj, where 1 ≤ i ≤ j ≤ n.
 Sub tree containing keys ki…kj has leaves with dummy keys di-1….dj.
 Suppose kr is the root of sub tree containing keys ki…..kj. So, left sub tree of root kr
contains keys
ki….kr-1 and right sub tree contain keys kr+1 to kj. Recursively, optimal sub trees are constructed
from the left and right sub trees of kr.
 Let e[i, j] represents the expected cost of searching OBST. With n keys, our aim is to find
and minimize e[1, n].
 Base case occurs when j = i – 1, because we just have the dummy key di-1 for this case.
Expected search cost for this case would be e[i, j] = e[i, i – 1] = qi-1.
 For the case j ≥ i, we have to select any key kr from ki…kj as a root of the tree.
 With kr as a root key and sub tree ki…kj, sum of probability is defined as

(Actual key starts at index 1 and dummy key starts at index 0)


Thus, a recursive formula for forming the OBST is stated below :

e[i, j] gives the expected cost in the optimal binary search tree.
Algorithm for Optimal Binary Search Tree
The algorithm for optimal binary search tree is specified below :

Algorithm OBST(p, q, n)
// e[1…n+1, 0…n ] : Optimal sub tree
// w[1…n+1, 0…n] : Sum of probability
// root[1…n, 1…n] : Used to construct OBST
for i ← 1 to n + 1 do
e[i, i – 1] ← qi – 1
w[i, i – 1] ← qi – 1
end

for m ← 1 to n do
for i ← 1 to n – m + 1 do
j ← i + m – 1 e[i, j] ←

w[i, j] ← w[i, j – 1] + pj + qj
for r ← i to j do
t ← e[i, r – 1] + e[r + 1, j] + w[i, j]

if t < e[i, j] then e[i, j] ←


t root[i, j] ← r
end end
end end
return (e, root)

Complexity Analysis of Optimal Binary Search Tree


It is very simple to derive the complexity of this approach from the above algorithm. It uses three nested loops.
Statements in the innermost loop run in Q(1) time. Thus, the OBST algorithm runs in cubic time

Example
Problem: Let p (1 : 3) = (0.5, 0.1, 0.05) q(0 : 3) = (0.15, 0.1, 0.05, 0.05) Compute and construct OBST for
above values using Dynamic approach.

Solution:
Here, given that

i 0 1 2 3

pi 0.5 0.1 0.0


5
qi 0.1 0.1 0.0 0.0
5 5 5

Recursive formula to solve OBST problem is


Where,

Initially,
Now, we will compute e[i, j]
Initially,

e[1, 0] = q0 = 0.15 (∵ j = i – 1)
e[2, 1] = q1 = 0.1 (∵ j = i – 1)
e[3, 2] = q2 = 0.05 (∵ j = i – 1)
e[4, 3] = q3 = 0.05 (∵ j = i – 1)
e[1, 1] = min { e[1, 0] + e[2, 1] + w(1, 1) }
= min { 0.15 + 0.1 + 0.75 } = 1.0
e[2, 2] = min { e[2, 1] + e[3, 2] + w(2, 2) }
= min { 0.1 + 0.05 + 0.25 } = 0.4
e[3, 3] = min { e[3, 2] + e[4, 3] + w(3, 3) }
= min { 0.05 + 0.05 + 0.15 } = 0.25
e[1, 3] is minimum for r = 1, so r[1, 3] = 1
e[2, 3] is minimum for r = 2, so r[2, 3] = 2
e[1, 2] is minimum for r = 1, so r[1, 2] = 1
e[3, 3] is minimum for r = 3, so r[3, 3] = 3
e[2, 2] is minimum for r = 2, so r[2, 2] = 2

e[1, 1] is minimum for r = 1, so r[1, 1] = 1


Let us now construct OBST for given data.
r[1, 3] = 1, so k1 will be at the root.
k2….3 are on right side of k1
r[2, 3] = 2, So k2 will be the root of this sub tree.
k3 will be on the right of k2.
Thus, finally, we get.
Greedy Technique
A greedy strategy is a method that makes locally optimal choices at each step with the hope of finding a
global optimum. That is making the best decision at each step will lead to an overall optimal solution.
 Note: However, the strategy may not always lead to optimal solutions.
A choice made in greedy approach should be
1. Feasible: it should satisfy the constraints
2. Locally optimal: best choice at the current sub problem
3. Irrevocable: Choices once made can’t be changed on subsequent steps

Elements of Greedy Technique


1. Greedy choice property:
A greedy strategy works by making locally optimal choices at each step. This means that at each step, we
choose the best possible option without considering the future consequences. The choice that appears to be
the best at the moment is made, without worrying about whether this choice will lead to the best overall
solution.
2. Optimal substructure property:
A problem exhibits optimal substructure if an optimal solution to the problem contains optimal solutions to its
subproblems. This means that we can break down the problem into smaller subproblems, solve each
subproblems optimally, and then combine the solutions to get the overall optimal solution.
3. Greedy algorithm:
A greedy algorithm is a method for finding an optimal solution that uses a greedy strategy. The algorithm
starts with an empty solution and iteratively adds elements to the solution, always choosing the element that
appears to be the best at the moment. Once all the elements have been added, the solution is complete.
4. Proof of correctness:
To prove that a greedy algorithm is correct, we need to show that it always produces an optimal solution. This
can be done using a direct proof where we show that any suboptimal solution can be transformed into an
optimal solution without violating the greedy choice property.

Examples of problems that can be solved using a greedy strategy:


Coin change problem, the activity selection problem, and the Huffman coding problem.

Comparing Greedy approach with Dynamic Programming


Activity Selection Problem
Activity Selection problem is a problem of selecting non-conflicting activities based on start and end time
with a goal of selecting a max-size set of mutually compatible activities.

Need: It is needed to schedule several competing activities that require a common resource

Problem statement:
"Given a set of n activities with their start and finish times, we need to select maximum number of non-
conflicting activities that can be performed one at a time."
Solution:
The Activity Selection problem follows Greedy approach i.e. at every step, we can make a choice that looks
best at the moment to get the optimal solution of the complete problem.
Objective:
is to complete maximum number of activities.
Strategy:
 So, choosing the activity which is going to finish first will leave us maximum time to adjust the later
activities.
 This is the intuition that greedily choosing the activity with earliest finish time will give us an optimal
solution. Thereby making the greedy choice at every step produces an optimal solution
 If we sort elements based on their starting time, the activity with least starting time could take the
maximum duration for completion, therefore we won't be able to maximize number of activities.

Algorithm

Analysis
Time Complexity:
When activities are sorted by their finish time: O(N)
When activities are not sorted by their finish time, the time complexity is O(N log N) due to complexity of
sorting
Performance: Can be solved in O(N logN) time using a simple greedy approach if we use a Dynamic
Programming approach, the time complexity will be O(N^3) that is lower performance.

Example 1:
Given set of 6 activities with their start and finish time

Task 0 1 2 3 4 5 6
Start 1 3 2 0 5 8 11
Finish 3 4 5 7 9 10 12
Note : Sort the activities based on the finish time in ascending order( if not sorted ) take time of O(log N)
Example:

First the activity 0 gets selected. As the activity 1 has starting time which is equal to the finish time of activity
0, it gets selected. Activities 2 and 3 have smaller starting time than finish time of activity 1, so they get
rejected. Based on similar comparisons, activities 4 and 6 also get selected, whereas activity 5 gets rejected. In
this example, in all the activities 0, 1, 4

Solution: the activities selected are { 0,1,4,6}

Example 2 : Find the maximum number of activities set using the greedy approach

Solution:
The resulting set of selected activities is {a1; a4; a8; a11}
Time complexity: The complexity of this problem is O(n log n) when the list is not sorted. When the sorted list
is provided the complexity will be O(n).
Optimal Merge Pattern
 Problem Statement:
To Merge a set of sorted files of different length into a single sorted file. We need to find an optimal
solution, where the resultant file will be generated in minimum time.
 If the number of sorted files are given, there are many ways to merge them into a single sorted file.
This merge can be performed pair wise. Hence, this type of merging is called as 2-way merge patterns.
 As, different pairings require different amounts of time, in this strategy we want to determine an
optimal way of merging many files together. At each step, two shortest sequences are merged.
 To merge a p-record file and a q-record file requires possibly p + q record moves, the obvious choice
being, merge the two smallest files together at each step.
 Two-way merge patterns can be represented by binary merge trees.
 Let us consider a set of n sorted files {f1, f2, f3, …, fn}. Initially, each element of this is considered as a
single node binary tree. To find this optimal solution, the following algorithm is used.

Algorithm:
TREE (n)
for i := 1 to n – 1 do
declare new node
[Link] := least (list)
[Link] := least (list)
[Link]) := (([Link]).weight)+
(([Link]).weight)
insert (list, node);
return least (list);

At the end of this algorithm, the weight of the root node represents
 the optimal cost is given by ∑di *xi
To get optimal solution that is optimal merge pattern, merge pair of smallest size files
Merging the files according to their size in an ascending order,
Hence, merge operations can be performed on this sequence.
In this context, we are now going to solve the problem using this algorithm.

Example
Let us consider the given files, f1, f2, f3, f4 and f5 with 20, 30, 10, 5 and 30 number of elements respectively.
Step 1 : Consider all the files individually as the nodes

Step 2: Merge the smallest files that is merging f3 and f4. So M1 = merge f3 and f4 => 5 + 10 = 15

Step 3 : Merge the next smallest files ,M1 with f1 that is M2 = merge M1 and f1=>15+20 = 35
Step 4 Merge the next smallest files, f2 and f5, that is M3=merge f2 and f5=>30 +30 = 60

Step 5 : Merge the next smallest files ,M2 with M3 that is M4=merge M2 and M3 =>35 +60 = 95

Hence, the optimal cost is 95+35+60+15 = 205 = Distance (path length)

Huffman Tree
Huffman codes compress data very effectively: savings of 20% to 90% are typical, depending on the
characteristics of the data being compressed. We consider the data to be a sequence of characters.
Huffman’s greedy algorithm uses a table giving how often each character occurs (i.e., its frequency) to build
up an optimal way of representing each character as a binary string.

Suppose we have a 100,000-character data file that we wish to store compactly. We observe that the
characters in the file occur with the frequencies given by Figure 16.3. That is, only 6 different characters
appear, and the character a occurs 45,000 [Link] have many options for how to represent such a file of
information. Here,
we consider the problem of designing a binary character code (or code for short)

Huffman Code
 Huffman coding provides codes to characters such that the length of the code depends on the relative
frequency or weight of the corresponding character. Huffman codes are of variable-length, and
without any prefix (that means no code is a prefix of any other). Any prefix-free binary code can be
displayed or visualized as a binary tree with the encoded characters stored at the leaves.
 Huffman tree or Huffman coding tree defines as a full binary tree in which each leaf of the tree
corresponds to a letter in the given alphabet.
 The Huffman tree is treated as the binary tree associated with minimum external path weight that
means, the one associated with the minimum sum of weighted path lengths for the given set of
leaves. So the goal is to construct a tree with the minimum external path weight.
An example is given below- Letter frequency
table
Letter z k m c u d l e
Frequency 2 7 24 32 37 42 42 120

Huffman code

Letter Freq Code Bits

e 120 0 1

d 42 101 3

l 42 110 3

u 37 100 3

c 32 1110 4

m 24 11111 5

k 7 111101 6

z 2 111100 6

Solution:

Analysis:
The time complexity of the Huffman coding algorithm is O(n log n), where n is the number of unique
characters or symbols in the input data. This complexity arises from building the Huffman tree by
repeatedly merging nodes and constructing the variable-length codes
Example 2

Solution:

The steps of Huffman’s algorithm for the frequencies given in Figure 16.3. Each part shows the contents of
the queue sorted into increasing order by frequency. At each step, the two trees with lowest frequencies
are merged. Leaves are shown as rectangles containing a character and its frequency. Internal nodes are
shown as circles containing the sum of the frequencies of their children. An edge connecting an internal
node with its children is labeled 0 if it is an edge to a left child and 1 if it is an edge to a right child. The
codeword for a letter is the sequence of labels on the edges connecting the root to the leaf for that letter.
(a) The initial set of n D 6 nodes, one for each letter. (b)–(e) Intermediate stages. (f) The final tree.

You might also like