DDA - Module 1 - Introduction
DDA - Module 1 - Introduction
Disjoint Sets – Disjoint set operations, Union and find algorithms, spanning trees, connected
components and bi-connected components.
1.1 ALGORITHM
An algorithm is any well defined computational procedure that takes some values as input and
produces some values as output. It is thus a sequence of computational steps that transform
input into output.
An algorithm is composed of finite steps each of which may require one or more operations.
Each operation may be characterized as either simple or complex.
An algorithm is a step by step procedure for performing some tasks in a finite amount of time.
According to [Link] a pioneer in the computer science discipline an algorithm must have
the following 5 basic properties
1
1. Input
2. Output
3. Finiteness
4. Definiteness
5. Effectiveness
Input: An algorithm has 0 or more inputs that are given to it initially before it begins (or)
dynamically as it runs.
Outputs: An algorithm has 1 or more outputs that have a specified relation with the inputs. An
algorithm produces at least one or more outputs.
Definiteness: Each operation specified in an algorithm must have definite meaning. Each step
of the algorithm must be precisely defined. Instructions such as “compute x/0 “ or “subtract 7
or 6 to x” are not permitted because it is not clear which of the two possibilities should be done
or what the result is.
Effectiveness: Each operation of an algorithm should be effective. i.e. The operation must be
able to be carried out in a finite amount of time. Tracing of each step should be possible.
2
Pseudocode is a way of describe an algorithm or program in a natural language like English,
instead of a specific programming language. It is a blue print for writing code, and is used to
design, plan and evaluate algorithms and programs.
3
else
statement2;
10) Case statement
case(expression)
{
condition1: statement1;
condition2: statement2;
------
}
11) Input & Output
12) Structure: algorithm is a procedure consisting of a heading and body.
Algorithm name_of_algorithm(parameters)
{
----
----
}
Ex 2) Algorithm to find largest number from the elements of a single dimensional array
Algorithm find_large(a,n)
// a is an array of size n
{
large:=a[1];
for i:= 2 to n do
if (a[i]>large) then large:=a[i];
return large;
}
4
Ex 3) Algorithm to find smallest number from the elements of a single dimensional array
Algorithm find_small(a,n)
// a is an array of size n
{
small := a[1];
for i:= 2 to n do
if (a[i]< small) then small := a[i];
return small;
}
5
Algorithm linear_search(a,n,tar)
// a is an array of size n
{
pos:=0;
for i:= 1 to n do
if (a[i]=tar) then pos:=i;
return pos;
}
6
}
}
Ex 9) Algorithm to sort elements of a single dimensional array using selection sort technique.
Algorithm selection_sort(a,n)
// a is an array of size n
{
for i := n downto 2 do
{
k:= 1;
for j:= 2 to i do
if (a[j]>a[k]) then k:=j;
temp:=a[i];
a[i]:= a[k];
a[k]:=temp;
}
}
1.3 RECURSION
7
Ex 12) Recursive algorithm to find ncr
Algorithm ncr(n,r)
{
if (r=0) then return 1;
else
if (r=1) then return n;
else
if (r=n) then return 1;
else
return ( ncr(n-1,r)+ncr(n-1,r-1));
}
8
1.4 DESIGN AND ANALYSIS OF ALGORITHMS
4) Prove Correctness
Once an algorithm has been specified, next we have to prove its correctness. Usually
testing is used for proving correctness.
9
5) Analyse Algorithm
Analysing the algorithm means studying the algorithm behavior, i.e. calculating the
time complexity and space complexity. If the time complexity is more, then we will
use one more designing technique such that time complexity should be minimum.
6)Coding an Algorithm
After completion of all phases successfully, then we will code an algorithm. Coding
should not depend on any programming language. We use general notation (pseudo
code) and English language statements.
7)Testing of Algorithms
Space Complexity
Time Complexity
Algorithm analysis refers to the task of determining the computing time and storage
requirements of an algorithm. It is also known as performance analysis which enables us to
select an efficient algorithm.
10
1)By checking the correctness of an algorithm.
2)By measuring the time and space complexity of an algorithm.
To Compute the analysis of algorithm, two phases are required. They are
1)Priori analysis
2)Posteriori analysis
1)Priori analysis: This is one of the creative analysis of algorithms. In this we obtain a function
which bounds the algorithms computing time.
Suppose there is a statement in the middle of a program. We wish to determine the total time
that the statement will spend for the execution. This requires
i)The statements frequency count i.e. the number of times the statement will be executed
ii)The time taken for one execution
The product of these two numbers is the total time.
The priori analysis is mainly concerned with the order of determining the magnitude. The
notation used in priori analysis are Big-oh(O), Omega(Ω), theta(Θ) and small-oh(o). Priori
analysis of computing time ignores all of the factors, which are machine or programming
language dependent and only concentrates on determining the order of the magnitude of the
frequency of execution of the statements.
2)Posteriori Analysis: in this we will collect the actual statistics about the algorithm, in
conjunction of the time and space while executing. Once the algorithm is written it has to be
tested. Testing a program consists of two major phases.
i) Debugging: It is the process of executing programs on sample data sets that determine
whether we get proper results. If faulty results occur, it has to be corrected.
ii) Profiling: it is the process of executing a correct program on actual data sets and
measuring the time and space it takes to compute the results during execution. The
actual time taken by the algorithm to process the data is called profiling.
Space Complexity: The space complexity of an algorithm is the amount of memory it needs
to run to completion.
11
Time Complexity: The time complexity of an algorithm is the amount of computer time it
needs to run to completion.
The space needed by the algorithms is seen to be the sum of the following components:
1) A fixed part that is independent of the characteristics of the inputs and outputs. This
part typically includes the instruction space, space for simple variables, space for
constants etc.
2) A variable part that consists of the space needed by component variables whose size
is dependent on the particular problem instance being solved; the space needed by
reference variables and the recursion stack space.
The space requirement S(P) of any algorithm P may therefore be written as S(P) = c +
Sp(instance characteristics).
----------------------------------------------------------------------------------------------------------------
Algorithm to find sum of two numbers
int findsum(a,b)
int a,b;
{
int sum;
sum=a+b;
return sum;
}
12
float evaluate(a,b,c)
float a,b,c;
{
float sum
sum = a+b+b*c + (a+b-c)/(a+b)+4.0;
return (sum);
}
In this algorithm the problem instance is characterized by the specific values of a, b and c.
making the assumption that one word is adequate to store the values of each of a, b and c, and
the result, we see that space needed by evaluate function is independent of the instance
characteristics. Sp (instance characteristics)=0
Space requirement s[p] = 4+0 =4
One space for each a, b, c and sum i.e. total 4 words.
Space complexity is O(1)
--------------------------------------------------------------------------------------------------------------
Sp (instance characteristics) =0
Space requirement s[p] = 3+0 =3
One space for each n, i, sum i.e. total 3 words.
Space complexity is O(1)
---------------------------------------------------------------------------------------
Algorithm to find sum of n natural numbers using recursion
int findsum(n)
int n;
13
{
if (n==0)
return ( 0);
else
return(n+findsum(n-1));
}
14
Space needed by n is one word; space needed by a is n words; space needed by I and sum one
word each.
S(addition)=3+n
Space complexity O(n)
int add(a,n)
int a[],n;
{
if (n==0)
return 0;
else
return (a[n-1]+add(a,n-1));
}
For example, assume that the array contains 5 elements. The recursive function will find sum
of elements as follows.
The instances are characterized by n. The recursion stack space includes space for the formal
parameters, the local variables, and the return address. Assume that the return address requires
only one word of memory. Each call to add function requires at least three words( n, return
address, pointer to a[]). Since the depth of recursion is n+1, the recursion stack space needed
is >=(3(n+1) words.
15
Recursion –stack space = 3(n+1) = 3n + 3 = O(n)
--------------------------------------------------------------------------------------------------------------
Algorithm to find factorial of a given number using recursive method.
Int factorial(int n)
{
if (n==1)
retrun(1);
else
return(n*factorial(n-1));
}
0 1 1 2 3 5 8 13 21 ------
16
int fibonacci(n)
int n;
{
int first, second, next, i;
first=0;
second=1;
for (i=3; i<=n; i++)
{
next = first + second;
first=second;
second=next;
}
return(next);
}
The time taken by a program P is the sum of the compile time and the run time. The compile
time does not depend on the instance characteristics. Also we may assume that a compiled
program will run several times without recompilation. Consequently, we concern ourselves
with just the run time of a program. This runtime is denoted by tp(instance characteristics).
17
-------------------------------------------------------------------------------------------------------
Algorithm to find sum of two numbers
int findsum(a,b)
int a,b;
{
int sum;
sum=a+b; --------------------------- 1
return sum; --------------------------- 1
}
Total 2 executable instructions
Time complexity is O(1)
--------------------------------------------------------------------------------------------------------
For Example: Let us consider the following algorithm.
float evaluate(a,b,c)
float a,b,c;
{
float sum
sum = a+b+b*c + (a+b-c)/(a+b)+4.0; -------------- 1
return (sum); -------------- 1
}
Total 2 executable instructions. As 2 is constant
Time complexity is O(1)
-----------------------------------------------------------------------------------------------------------
18
Algorithm to find sum of n natural numbers using recursion
int findsum(n)
int n;
{
if (n==0) -------------------------- 1
return ( 0); ------------------------- 1
else
return(n+findsum(n-1)); ------------------------- 1 + x
}
This is a recursive function
Every recursive function it checks condition and returns something.
Every recursive call will have 2 executable instructions.
The function will be called n+1 times
So total executable instructions will be 2*(n+1) = 2n+2
Time complexity = O(n)
19
sum = 0; 1 1 1
for (i=0;i<n;i++) 1 n+1 n+1
sum = sum+a[i]; 1 n n
return sum; 1 1 1
} 0 - 0
Total 2n+3
The number of steps per execution and the frequency of each of the statements in addition
algorithm have been listed. The total number of steps required by the algorithm is determined
to be 2n+3. It is important to note that the frequency of for statement is n+1 and not n. this is
so because i has to be incremented to n+1 before the for loop can terminate.
int add(a,n) 0 - - 0 0
int a[], n; 0 - - 0 0
{ 0 - - 0 0
if (n=0) 1 1 1 1 1
return 0; 1 1 0 1 0
else 0 - - 0 0
return a[n]+add(a,n-1); 1+x 0 1 0 1+x
} 0 - - 0 0
Total 2 2+x
x=tadd(a,n-1)
Notice that under the s/e column, the else clause has been given a count of 1 + t add(a,n-1)
it includes all steps that get executed as a result of the invocation of add() from the else clause.
The frequency and total steps columns have been split into two parts: one for the case n=0 and
the other for the case n>0.
20
Fig.1.4 Sum of elements of array by using recursion
21
int fibonacci(n) steps
int n;
{
int first, second, next, i;
first=0; --------------------------- 1
second=1; ---------------------------- 1
for (i=3; i<=n; i++) ---------------------------- n-2+1
{
next = first + second; ------------------------------ n-2
first=second; ----------------------------- n-2
second=next; ------------------------------n-2
}
return(next); ---------------------------- 1
}
Total steps = 2+ n-2+1 + n-2+n-2+n-2+1 = 4n-4 = O(n)
---------------------------------------------------------------------------------------------------------------
Algorithm to find factorial of a given number using iteration method.
int factorial(int n)
{
int i,fn;
fn=1; -------------------------------------- 1 statement
for(i=1;i<=n;i++) -------------------------------- n+1 statements
fn=fn*i; ---------------------------------- n statements
return(fn); -------------------------------- 1 statement
} ------------------------------total statements = 1+n+1+n+1 = 2n+3
int factorial(int n)
{
if (n==1)
retrun(1);
else
return(n*factorial(n-1));
22
}
Time Complexity:
Every time the function called it executes two statements
if statement and return statement.
The factorial function will be called repeatedly n times. So total statements that are executed
are 2n. The time complexity will be O(n).
The size of n is either increased or decreased. The time analysis may not be always linearly
proportional, because it depends upon what kind of basic operations we have in the algorithm.
Therefore, while analyzing an algorithm’s time efficiency the order of growth of n is also
important. We expect the algorithms must execute faster, but for as n varies the execution time
varies. To understand the meaning of the order of growth, let us list the most common
computing time functions.
23
6 n3 Cubic
7 2n Exponential
8 n! Factorial
The functions shown are quite common in the analysis of algorithms. These functions have
strong significance in terms of their relative values. The growth of these common functions
typical values for n can be considered
n log n n log n n2 n3 2n n!
1 0 0 1 1 2 1
2 1 2 4 8 4 2
4 2 8 16 64 16 24
8 3 24 64 512 256 40320
16 4 64 256 4096 65536 20922789888000
32 5 160 1024 32768 4.2x109 2.6x1035
105 17 1.7x106 1010 1015 -
106 20 2.0x107 1012 1018 -
The function log n grows very slowly compared to any other function with respect to n. But
2n and n! grows very fast.
These notations provide a concise way to express the behavior of an algorithm’s time or
space complexity as the input size approaches infinity.
24
1.7.1 Big-Oh Notation (O):
Big-O notation represents the upper bound of the running time of an algorithm. Therefore,
it gives the worst-case complexity of an algorithm
It returns the highest possible output value (Big-O) for a given input.
Big-O (Worst Case) It is defined as the condition that allows an algorithm to complete
statement execution in the longest amount of time possible.
If f(n) describes the running time of an algorithm, f(n) is O(g(n)) if there exist a positive
constant C and n0 such that, 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 . It is graphically shown in
fig.1.5.
It returns the highest possible output value (Big-O) for a given input.
The execution time serves as an upper bound on the algorithm’s time complexity.
25
Fig.1.6 Big-Oh Notation Graph
For example, Consider the case of Insertion Sort. It takes linear time in the best case
and quadratic time in the worst case. We can safely say that the time complexity of
the Insertion sort is O(n2).
Big-oh notation gives an upper bound and a function f(n). The upper bound of f(n) indicates
that the function f(n) will be the worst case that it doesnot consume more than this
computing time.
Problems on O nation
26
So 𝑐0 =50
5n+50<=6n for all n>=𝑛0 Hence f(n)=O(g(n)), Where c=6 and 𝑐0 =50
350
300
250
200
150
100
50
0
10 20 30 40 50 51
f(n) c*g(n)
n 5n+50 6n Condition
(50n+50<=6n)
10 100 60 False
20 150 120 False
30 200 180 False
40 250 240 False
50 300 300 True
51 301 306 True
Dominate term = 3n , Let g(n)=n
27
So 𝑐0 =8
So 𝑐0 =4
28
Step1: Select the dominate term from 2𝑛3 − 2𝑛2
So 𝑐0 =1
2𝑛 + 3𝑛4 <=2𝑛+1
29
𝑛0 = 13
Other examples
30
Ex 2) Given f(n) = 20n3-3 then prove that f(n) = O(n3)
f(n) = 20n3-3
20n3 – 3 <= 20n3 for all n>=1
The above inequality can be satisfied using the definition of Big-oh notation by setting
c=20 and g(n)=n3, n0=1 Therefore f(n) = O(n3)
31
Definition: f(n)=Ω(g(n)) iff there exists two positive constants c and n0 such that
f(n)>=c.g(n) for all n>= n0
Omega notation represents the lower bound of the running time of an algorithm. Thus, it
provides the best case complexity of an algorithm.
The execution time serves as a lower bound on the algorithm’s time complexity.
It is defined as the condition that allows an algorithm to complete statement execution in the
shortest amount of time.
Let g and f be the function from the set of natural numbers to itself. The function f is said to
be Ω(g), if there is a constant c > 0 and a natural number n0 such that c*g(n) ≤ f(n) for all n
≥ n0 . It is graphically shown in fig 1.6
Let us consider the same Insertion sort example here. The time complexity of Insertion Sort
can be written as Ω(n), but it is not very useful information about insertion sort, as we are
generally interested in worst-case and sometimes in the average case.
1 15 10 False
2 45 40 True
Problems on Omega Notation
10𝑛2+5>=10*𝑛2
.
10𝑛2+5>=10*𝑛2 for all n>=𝑛0 and c=10 where 𝑛0 = 2
80
70
60
50
40
30
20
10
0
1 2 3 4 5
f(n) c*g(n)
33
Examples
100n+6>=10n, c=10, 𝑛0 = 1
6 ∗ 2𝑛 + 𝑛2 ≥ 10 ∗ 2𝑛
c=6, , 𝑛0 = 1
𝑛4 + 3𝑛3 + 2𝑛 + 1 ≥ 1 ∗ (𝑛3 + 4)
c=1 and 𝑛0 = 1
34
Ex 7: Following statements are false
3𝑛 + 3 = Ω(1).
10𝑛2 + 4𝑛 + 2 = Ω(n).
Ex 8: Prove that
𝑓(𝑛)
lim ≠0
𝑛→∞ 𝑔(𝑛)
4
7𝑛 + 4 7+𝑛
lim = lim =7≠0
𝑛→∞ 𝑛 𝑛→∞ 1
1
𝑁𝑜𝑡𝑒: =0
∞
We have to show
𝑓(𝑛)
lim =0
𝑛→∞ 𝑔(𝑛)
3𝑛 3
lim = lim =0∗3 =0
𝑛→∞ 𝑛 2 𝑛→∞ 𝑛
Definition: Let g and f be the function from the set of natural numbers to itself. The function
f is said to be Θ(g), if there are constants c1, c2 > 0 and a natural number n0 such that c1*
g(n) ≤ f(n) ≤ c2 * g(n) for all n ≥ n0
35
Theta notation encloses the function from above and below. Since it represents the upper and
the lower bound of the running time of an algorithm, it is used for analysing the average-
case complexity of an algorithm. It is graphically shown in fig 1.7
Theta (Average Case) You add the running times for each possible input combination and
take the average in the average case.
The execution time serves as both a lower and upper bound on the algorithm’s time
complexity.
It exists as both, most, and least boundaries for a given input value.
A simple way to get the Theta notation of an expression is to drop low-order terms and
ignore leading constants. For example, Consider the expression 3n3 + 6n2 + 6000 =
Θ(n3), the dropping lower order terms is always fine because there will always be a
number(n) after which Θ(n3) has higher values than Θ(n2) irrespective of the constants
involved.
36
{ 100 , log (2000) , 10^4 } belongs to Θ(1)
{ (n/4) , (2n+3) , (n/100 + log(n)) } belongs to Θ(n)
{ (n^2+n) , (2n^2) , (n^2+log(n))} belongs to Θ( n2)
Examples
1
Example :1 𝑓 (𝑛) = 2 𝑛2 − 3𝑛 𝑡ℎ𝑒𝑛 𝑠ℎ𝑜𝑤 𝑡ℎ𝑎𝑡 𝑓(𝑛) = Θ(𝑛2 )
1
Step 1: Select dominate term from 𝑛2 − 3𝑛
2
Step 2:
1
𝑐1 ∗ 𝑛2 ≤ 2 𝑛2 − 3𝑛 ≤ 𝑐2 ∗ 𝑛2
1 3
𝑐1 ≤ − ≤ 𝑐2
2 𝑛
Step 3: Select c1
n 1 3 Value
−
2 𝑛
1 -ve
1 3
−
2 1
2 1 3 -ve
−
2 2
3 1 3 -ve
−
2 3
4 1 3 -ve
−
2 4
37
5 1 3 -ve
−
2 5
6 1 3 0
−
2 6
7 1 3 1
−
2 7 14
1
c1=14 and n0=7
Part-II
Find c2
1 3
− ≥ 𝐶2
2 𝑛
1
C2=2
1 1 3 1
≤ − ≤
14 2 𝑛 2
1 1
c1=14 , c2=2 and n0=7
PART-1
38
C1=1
1 ∗ (1 + 1) ≤ 1 + 3 + 5 + 1
2<=10
PART-II
C2=10
10<=10*2
10<=20
n0=1
3n<=3n+2<=4n
3*2<=3+2<=4*2
6<=6<=8
For some functions the lower bound and the upper bound may be same. i.e. big Oh and omega
will have the same function. For example, to find minimum or maximum of an array of
elements the computing time is O(n) and Ω(n).
39
There exists a special notation to denote for functions having the same time complexity for
lower and upper bounds and this notation is called Theta θ Notation.
Def: f(n)=θ (g(n)) iff there exists three positive constants c 1 c2 and n0 with the constraint that
c1 g(n) <= f(n) <= c2 (g(n) for all n >= n0
Ex 1) f(n)=1/2n2-3n θ(n2)
Ex2 f(n)=3n+2 θ(n)
Ex3 f(n)=10n2+4n+2 θ(n2)
Ex 4 f(n)=10xlogn+4 θ(log n)
Definition: f(n) = o(g(n), iff f(n) < c.g(n) for any constants c > 0, n0>0 and n>n0.
or
f ( n)
lim 0
n g ( n)
The asymptotic upper bound provided by O-notation may or may not be asymptotically tight.
The bound 2n2=O(n2) is asymptotically tight, but the bound 2n = O(n2) is not. We use o-
notation to denote an upper bound that is not asymptotically tight.
Therefore f(n)=o(n!)
Big O is used as a tight upper bound on the growth of an algorithm’s effort (this effort is
described by the function f(n)), even though, as written, it can also be a loose upper bound.
“Little o” (o) notation is used to describe an upper bound that cannot be tight.
In the domain of algorithm analysis, the little o notation is a valuable tool used to describe
the behavior of functions as they approach certain limits. When we say that a
function f(n) is o(g(n)), we are essentially stating that f(n) grows slower than g(n) as n
40
approaches infinity. In simpler terms, if f(n) = o(g(n)), it means that g(n) grows faster
than f(n).
Thus, Little o means loose upper-bound of f(n). Little o is a rough estimate of the
maximum order of growth whereas Big-O may be the actual order of growth.
Best Case: It is the minimum number of steps that can be executed for a given parameter.
Worst case: It is the maximum number of steps that can be executed for a given parameter.
Average case: It is the average number of steps executed for a given parameter.
41
Average Case: If we want to search an element 8, whether it is present in the array or not .
first A[1] is compared with 8, no match occurs. Compare A[2],A[3] and A[4] with 8, no match
occurs. Up to now 4 comparisons taken place. Now compare A[5] and 8 so match occurs. The
number of comparisons are 5. It is observed that search takes average number of comparisons.
So it comes under average case. If there are n elements, then we require n/2 comparisons. Time
complexity is O(n/2) which is O(n). we can neglect constant.
Worst Case: If we want to search an element 13, whether it is present in the array or not. First
A[1] is compared with 13. No match occurs. Continue this process until element is found or
the list exhausted. The element is found at 9th comparison. So number of comparisons are 9. It
is observed that search takes maximum number of comparisons. So it comes under worst case.
Time complexity is O(n)
Note: If the element is not found in the list then we have to search entire list, so it comes under
worst case.
Ex 2) Binary Search: The array elements are in ascending/descending order. The target is
compared with middle element. If it is equal, we stop our search otherwise we repeat the same
process either in the first half or second half of the array depending on the value of target.
Best case time complexity: O(1) : If the target found in the first comparison then it is
called best case
Assume that the number of elements is considered as 2 m as every time the list is divided into
two halfs.
n = 2m
log (n) = log(2m )
log (n) = m log(2 )
m = log(n)/log(2)
m = log(2n) i.e. log n base 2
m = log (n)
Maximum number of comparisons will be m in a binary search.
42
1.9 PROBABILISTIC ANALYSIS AND AMORTIZED
ANALYSIS
43
1.10 DISJOINT SETS
SET:
Set is a collection of distinct elements.
For example a set s1 can be represented as S1={1,2,5,10}
Set Representation:
The set will be represented as the tree structure, where all children will store the address
of parent/root node. The root node will store NULL at the place of parent address. In the
given set of elements any element can be selected as root node. Generally, we select first
node as root node.
DISJOINT SETS:
A pair of sets which does not have any common element are called disjoint sets.
For example set A = {2,3} and set B={4,5}. Set A and B are disjoint sets.
Set C={3,4,5} and set D={3,6,7}. Set C and D are not disjoint as both sets C and D are
having 3 as common elements.
44
Each set can be represented as a tree. Notice that for each set we have linked the nodes
from the children to the parent, rather than our usual method of linking from parent to
children.
A B C
These sets can be represented by storing every element of the set in the same array. Since
the set elements are numbered 1 through n, we represent the tree nodes using an array
P[1:n], where n is the maximum number of elements. The ith element of this array
represents the tree node that contains element i. This array element gives the parent
pointer of the corresponding tree node. Notice that the root nodes have parent as -1.
i [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
parent -1 1 1 -1 4 1 4 -1 8 8
UNION Operation:
UNION(i,j) it means the elements of Set i and elements of Set j are combined. If we want to
represent UNION operation in the form of a tree; the UNION(i,j) i is parent ; j is the child.
45
UNION of S1 and S2 𝑆1 U𝑆2
To obtain the union of two sets, all that has to be done is to set the parent field of one of the
roots to the other root. This can be accomplished easily if with each set name, we keep a pointer
to the root of the tree representing the set.
FIND Operation:
FIND(i) implies that it finds the root node of ith node, in other words, it returns the name of the
set.
UNION(1,3)
46
FIND(1) ------------1
FIND(3)-------------1 since its parent is 1 (i.e root node is 1)
Algorithm FIND(i) //find the root for the tree containing elements
{
integer i,j
j = i;
while (parent[j]>0) do
j:=parent[j];
return j;
}
To perform find operation along with the tree structure we need to maintain the name of each
set. So, we require one more data structure to store the set names. The data structure contains
two fields. One is the set name and the other is the pointer to root.
Although the SimpleUnion(i,j) and SimpleFind(i) algorithms are easy to state, their
performance characteristics are not very good. For example, consider the sets
47
Fig.1.16 Number of sets
The sequence of Union operations results the degenerate tree as below. Since the time taken
for union is constant, the n-1 sequence of union can be processed in time O(n).
The weighting rule for union is a technique for joining two trees in a disjoint set data structure.
It involves making the root of smaller tree point to the root of larger tree. This technique is also
known as weighted quick union.
If the number of nodes in a tree i is less than the number in tree j, then make j the parent of i,
otherwise make i the parent of j. Both arguments of UNION must be roots.
To implement the weighting rule, we need to know how many nodes are there in every tree. To
do this easily, we maintain a count field in the root of every tree. If i is a root node, then count[i]
48
equals the number of nodes in that tree. Since all the nodes other than the roots of trees have a
positive number in the P field, we can maintain the count in the P field of roots as a negative
number.
Using this convention, the time required to perform a union has increased somewhat but
still bounded by a constant. i.e O(1).
UNION (1,2) the number of nodes in tree 1 is equal to the number of nodes in tree 2
UNION (1,3) the number of nodes in tree 1 is 2. The number of nodes in tree3 is 1
So 2>1 make parent as 1 and 3 as child
Apply Union for all the n sets he final tree will be as follows
49
Algorithm WeightedUNION(i,j)
{
// UNION sets with roots i and j, i ≠ j, using weighting rule
// P[i] = -count[i] and P[j] = -count[j]
temp:=p[i]+p[j];
if (p[i]>p[j] then
{
// i has fewer nodes
p[i]:=j;
p[j]:=temp;
}
else
{
// j has fewer nodes
p[j]:=i;
p[i]:=temp;
}
}
The collapsing rule refers to a technique used during the FIND operation where, when
traversing the path from a node to its root, every node on that path is directly linked to the root,
effectively collapsing the path and improving the efficiency of future find operations on these
nodes. If a node is on the path to the root, set its parent to the root itself.
If j is a node on the path from i to its root then set Parent(j)←root(i). A more sophisticated
algorithm of FIND using collapsing rule is given below.
Algorithm Find(i)
{
j←i;
while (parent[j]>0)
{
j←parent[j];
50
}
k←i;
while(k≠j)
{
t←parent[k];
parent[k]←j;
k←t;
}
return(j);
}
51
Fig.1.19 Trees achieving worst-case bond
If SimpleFind() is used each Find(8) requires going up three parent link fields upward to reach
the root node. Similarly for 8 FIND operations it will require 8*3=24 moves.
If we use FIND algorithm with collapsing rule, the first FIND(8) requires going up 3 link fields
(moves) and then resetting 3 links. Each of the remaining 7 FINDs requires going up only 1
move (link field), the total cost is now only 13 moves.
A spanning tree for a connected undirected graph G=(V,E) is a sub group of G that is an
undirected tree and contains all the vertices of g. A spanning tree of a graph should include all
the vertices and a subset of edges.
A Spanning tree of a graph is any tree that includes every vertex in the graph.
52
A Spanning tree of a graph G is a sub graph of G that is a tree and contains all the vertices of
G containing no circuit or cycle.
An edge of a spanning tree is called a branch.
An edge in the graph that is not in the spanning tree is called a Chord.
You may notice that all the spanning trees differ from each other significantly, however for
each structure
i) The vertex set is same as that of graph G
ii) The edge set is a subset of G(E) and
iii) There is no cycle.
Such a structure is called spanning tree of graph. Take any vertex V as an initial partial tree
and add edges one by one so that each edge joins a new vertex to the partial tree.
It spans the graph, i.e. it includes every vertex of the graph. It is a minimum cost spanning tree
i.e. the total weights of all the edges is as low as possible.
If a graph consisting of n vertices then the possible spanning trees are nn-2, for above example
n=3, i.e 33-2=3 spanning trees.
53
Fig.1.21 Graph and its Spanning Trees
Example 3: Consider the following graph and some of the tree structures for the graph.
54
A A
B C
D E
G
B C D E F H
G H
Fig.1.22 Graph
A A
B C
B C D E
D
E
F
F
G G H
H
Fig.1.22 Graph and some of the tree structures for the graph.
Frequently we encounter weighted graphs and we need to build a sub-graph that must include
every vertex in the graph. To construct such a graph with least weight or least cost we must not
have cycles in it.
55
Fig.1.23 Graph
Consider the above graph. The MST for this graph could be building a least cost
communication network.
We begin by first selecting an edge with least cost, it can between any
two vertices of graph G. Subsequently from the set of remaining edges,
we can select another least cost edge and so on. Each time an edge is
picked, we determine whether or not the inclusion of this edge into the spanning tree being
constructed creates a cycle. If it does this edge is discarded. If no cycle is created, this edge is
included in the spanning tree being constructed.
The minimum cost is BA.
Now we have
Vertices that may be EDGE COST
attached
BF 8
F
AF 7
G BG 9
C AC 4
D AD 5
56
Vertices that may EDGE COST
be attached
BF 8
F AF 8
DF 10
BG 9
G
CG 10
CE 8
E
DE 9
AF is the minimum cost edge. Therefore, we add it to the partial tree. Now we have
57
Fig.1.24 Minimal Spanning Tree
CONNECTED COMPONENT
A connected component of graph G is said to be a maximally connected induced subgraph.
This means that it is a connected induced subgraph which is not a proper subgraph by itself of
any other connected subgraph of G.
Articulation Point (or Cut Vertex): An articulation point in a connected graph is a vertex
(together with the removal of any incident edges) that, if deleted, would break the graph
into two or more pieces..
58
Fig 1.24 An example graph
Biconnected
Components
59
Biconnected graphs and articulation points are of great interest in the design of network
algorithms, because these are the “critical" points, whose failure will result in the network
becoming disconnected.
Let us consider the typical case of vertex v, where v is not a leaf and v is not the root. Let
w1, w2, . . . . . . . wk be the children of v. For each child there is a subtree of the DFS tree
rooted at this child. If for some child, there is no back edge going to a proper ancestor of
v, then if we remove v, this subtree becomes disconnected from the rest of the graph, and
hence v is an articulation point.
On the other hand, if every one of the subtree rooted at the children of v have back edges
to proper ancestors of v, then if v is removed, the graph remains connected (the back edges
hold everything together). This leads to the following:
Observation 1: An internal vertex v of the DFS tree (other than the root) is an articulation
point if and only if there is a subtree rooted at a child of v such that there is no back edge
from any vertex in this subtree to a proper ancestor of v.
Observation 2: A leaf of the DFS tree is never an articulation point, since a leaf will not
have any subtrees in the DFS tree.
Thus, after deletion of a leaf from a tree, the rest of the tree remains connected, thus even
ignoring the back edges, the graph is connected after the deletion of a leaf from the DFS tree.
Observation 3: The root of the DFS is an articulation point if and only if it has two or more
children. If the root has only a single child, then (as in the case of leaves) its removal does not
disconnect the DFS tree, and hence cannot disconnect the graph in general.
60
Determining the articulation turns out to be a simple extension of depth first search.
Consider a depth first spanning tree for this graph.
Fig.1.27 Graph
Deleting node E does not disconnect the graph because G and D both have dotted links
(back edges) that point above E, giving alternate paths from them to F. On the other hand,
deleting G does disconnect the graph because there are no such alternate paths from L or
H to E (G’s parent).
A vertex ‘x’ is not an articulation point if every child ‘y’ has some node lower in the tree
connect (via a dotted link) to a node higher in the tree than ‘x’, thus providing an alternate
connection from ‘x’ to ‘y’. This rule will not work for the root node since there are no
nodes higher in the tree. The root is an articulation point if it has two or more children.
61
Fig.1.28 Graph with DFS
By using the above observations, the articulation points of this graph are:
This observation leads to a simple rule to identify articulation points. For each is
define L (u) as follows:
L (u) = min {DFN (u), min {L (w) w is a child of u}, min {DFN (w) (u,
w) is a back edge}}.
L (u) is the lowest depth first number that can be reached from ‘u’ using a path of
descendants followed by at most one back edge. It follows that, If ‘u’ is not the root then
‘u’ is an articulation point iff ‘u’ has a child ‘w’ such that:
// u is a start vertex for depth first search. V is its parent if any in the depth first
// spanning tree. It is assumed that the global array dfn is initialized to zero and that // the
global variable num is initialized to 1. n is the number of vertices in G.
{
dfn [u] := num; L [u] := num; num := num +
1; for each vertex w adjacent from u do
{
if (dfn [w] = 0) then
{
62
Art (w, u); // w is
unvisited. L [u] := min (L [u], L [w]);
}
else if (w ≠ v) then L [u] := min (L [u], dfn [w]);
}
}
// u is a start vertex for depth first search. V is its parent if any in the depth first
// spanning tree. It is assumed that the global array dfn is initially zero and that the
// global variable num is initialized to 1. n is the number of vertices in G.
{
dfn [u] := num; L [u] := num; num := num +
1; for each vertex w adjacent from u do
{
if ((v ≠ w) and (dfn [w] < dfn [u])) then
add (u, w) to the top of a stack s;
1.12.4 Problems
Example 1: For the following graph identify the articulation points and Biconnected
components:
L (u) = min {DFN (u), min {L (w) / w is a child of u}, min {DFN (w) / w is a vertex to
which there is back edge from u}}
L (1) = min {DFN (1), min {L (2)}} = min {1, L (2)} = min {1, 2} = 1
L (2) = min {DFN (2), min {L (3)}} = min {2, L (3)} = min {2, 3} = 2
L (3) = min {DFN (3), min {L (4), L (5), L (6)}} = min {3, min {6, 4, 5}} = 3
L (4) = min {DFN (4), min {L (7)} = min {6, L (7)} = min {6, 6} = 6
64
Therefore, L (1: 8) = {1, 2, 3, 6, 4, 5, 6, 6}
Check for the condition if L (w) > DFN (u) is true, where w is any child of u.
It is a root node. Root is an articulation point if it has two or more child nodes.
Example 2: For the following graph identify the articulation points and Biconnected
components:
65
1 1
3
3 6
2 2
4
4 5 5 6
7
7
8
8
L (u) = min {DFN (u), min {L (w) / w is a child of u}, min {DFN (w) / w is a vertex to
which there is back edge from u}}
L (1) = min {DFN (1), min {L (2)}} = min {1, L (2)} = min {1, 2} = 1
L (2) = min {DFN (2), min {L (3)}} = min {2, L (3)} = min {2, 3} = 2
L (3) = min {DFN (3), min {L (4), L (5), L (6)}} = min {3, min {6, 4, 5}} = 3
L (4) = min {DFN (4), min {L (7)} = min {6, L (7)} = min {6, 6} = 6
66
Vertex 1: Vertex 1 is not an articulation point.
It is a root node. Root is an articulation point if it has two or more child nodes.
Example 3: For the following graph identify the articulation points and Biconnected
components:
L (u) = min {DFN (u), min {L (w) w is a child of u}, min {DFN (w) w is a vertex
to which there is back edge from u}}
67
= min {1, L (2)} = 1
L (2) = min {DFN (2), min {L (3)}} = min {2, L (3)} = min{2, 1}= 11
L (3) = min {DFN (3), min {L (4)}} = min {3, L (4)} = min {3, L (4)}
= min {3, 1} = 1
L (4) = min {DFN (4), min {L (5), L (7)}, min {DFN (1)}}
= min {4, min {L (5), L (7)}, 1} = min {4, min {1, 3}, 1}
= min {4, 1, 1} = 1
L (5) = min {DFN (5), min {L (6)}, min {DFN (1)}} = min {5, L (6), 1}
= min {5, 4, 1} = 1
L (6) = min {DFN (6), min {L (8)}, min {DFN (4)}} = min(6, L (8), 4}
= min(6, 4, 4} = 4
nodes.
69