Chapter One
Introduction to Data Structures and Algorithms
Introduction to Data Structures and Algorithm Analysis
A program
A set of instructions written in order to solve a problem.
A solution to a problem actually consists of
two things:
A way to organize the data
Sequence of steps to solve the problem
The way data are organized in a computer's
memory is called a data structure, and
the sequence of computational steps to solve
a problem is called an algorithm.
Therefore,
a program is data structures
plus algorithms.
Introduction to Data Structures
Data structures are used to model the world or
part of the world. How?
1. The value held by a data structure represents some
specific characteristic of the world.
2. The characteristic being modeled restricts the possible
values held by a data structure and the operations to
be performed on the data structure
The first step in solving a problem is
obtaining one's own abstract view, or
model, of the problem.
This process of modeling is called
abstraction.
Abstraction
The model defines an abstract view to the
problem.
The model should focus only on
problem-related details.
Abstraction is a process of classifying characteristics
as relevant and irrelevant for the particular
purpose at hand and ignoring the irrelevant ones.
Example: model students of IU.
Relevant:
char Name[15];
char ID[11];
char Dept[20];
int Age, year;
Not relevant:
float height, weight;
Using the model, a programmer tries to
define the properties of the problem.
These properties include
The data which are affected and
The operations that are involved in the problem
An entity with the properties just described
is called an abstract data type (ADT).
Abstract Data Types
An ADT consists of data to be stored and the
operations supported on them.
It is a specification that describes a data set and
the operations on that data.
The ADT specifies:
What data is stored.
What operations can be done on the data.
It does not specify how to store the data or how to
implement the operations.
It is independent of any programming language.
Example: ADT employees of an organization:
This ADT stores employees with their relevant
attributes and discards irrelevant attributes.
Relevant: Name, ID, Sex, Age, Salary, Dept,
Address
Not relevant: weight, color, height
This ADT supports hiring, firing, retiring, … operations.
Data Structure
In contrast, a data structure is a
language construct that the programmer
has defined in order to implement an
abstract data type.
What is the purpose of data structures in
programs?
Data structures are used to model a problem.
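The split between an ADT (what) and a data structure (how) can be sketched in C++ as follows. This is only an illustration: the names Employee, EmployeeRegistry, VectorRegistry and count() are hypothetical, and an ADT itself is not tied to C++ or to any other language. The abstract class lists the operations; the derived class is one possible data structure that implements them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record holding only the attributes marked as relevant.
struct Employee {
    std::string name;
    std::string id;
    char sex;
    int age;
    double salary;
    std::string dept;
    std::string address;
};

// The ADT: states WHAT can be done, not HOW it is stored or implemented.
class EmployeeRegistry {
public:
    virtual ~EmployeeRegistry() = default;
    virtual void hire(const Employee& e) = 0;
    virtual bool fire(const std::string& id) = 0;   // true if someone was removed
    virtual std::size_t count() const = 0;          // assumed helper, not from the slides
};

// One possible data structure implementing the ADT, backed by std::vector.
class VectorRegistry : public EmployeeRegistry {
    std::vector<Employee> staff_;
public:
    void hire(const Employee& e) override { staff_.push_back(e); }

    bool fire(const std::string& id) override {
        for (std::size_t i = 0; i < staff_.size(); ++i) {
            if (staff_[i].id == id) {
                staff_.erase(staff_.begin() + i);
                return true;
            }
        }
        return false;
    }

    std::size_t count() const override { return staff_.size(); }
};
```

Code written against EmployeeRegistry would keep working if VectorRegistry were replaced by, say, a linked list, which is the point of separating the specification from its implementation.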
Example:
struct Student_Record
{
char name[20];
char ID_NO[10];
char Department[10];
int age;
};
Attributes of each variable:
Name: Textual label.
Address: Location in memory.
Scope: Visibility in statements of a program.
Type: Set of values that can be stored + set of operations that can be performed.
Size: The amount of storage required to represent the variable.
Life time: The time interval during execution of a program while the variable exists.
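A few of these attributes can be observed directly in C++. The sketch below reuses Student_Record from the example; the helper functions field_bytes, age_is_inside_record and call_count are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>

struct Student_Record {
    char name[20];
    char ID_NO[10];
    char Department[10];
    int age;
};

// Size: bytes needed by the four fields. sizeof(Student_Record) may be
// larger, because the compiler is allowed to insert padding.
std::size_t field_bytes() {
    Student_Record s{};   // name: s, type: Student_Record, scope: this function
    return sizeof(s.name) + sizeof(s.ID_NO) + sizeof(s.Department) + sizeof(s.age);
}

// Address: &s.age is the memory location of the member, which lies
// somewhere inside the storage occupied by s.
bool age_is_inside_record() {
    Student_Record s{};
    const char* start = reinterpret_cast<const char*>(&s);
    const char* age = reinterpret_cast<const char*>(&s.age);
    return age >= start && age < start + sizeof(s);
}

// Scope versus lifetime: `calls` is visible only inside this function
// (scope), but being static it exists for the whole program run (lifetime).
int call_count() {
    static int calls = 0;
    return ++calls;
}
```

Calling call_count() twice returns 1 and then 2, showing that the variable outlives each individual call even though no other function can name it.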
Algorithm
An algorithm is a brief specification of an operation for
solving a problem.
It is a well-defined computational procedure that
takes some value or a set of values as input and
produces some value or a set of values as
output.
Inputs -> Algorithm -> Outputs
An algorithm is a specification of a behavioral
process. It consists of a finite set of instructions
that govern behavior step-by-step.
Data structures model the static part of the
world. They are unchanging while the world
is changing.
In order to model the dynamic part of the
world we need to work with algorithms.
Algorithms are the dynamic part of a
program's world model.
An algorithm transforms data structures
from one state to another state.
What is the purpose of algorithms in programs?
Take values as input. Example: cin>>age;
Change the values held by data structures. Example: age=age+1;
Change the organization of the data structure. Example: sort students by name.
Produce outputs. Example: display a student's information.
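Two of these purposes, changing the values held by a data structure and changing its organization, can be sketched as follows. The Student type and both function names are hypothetical, and std::sort is used simply as a convenient standard sorting routine.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Student {
    std::string name;
    int age;
};

// Change the values held by the data structure: everyone gets one year older.
void increment_ages(std::vector<Student>& students) {
    for (Student& s : students) {
        s.age = s.age + 1;
    }
}

// Change the organization of the data structure: order students by name.
void sort_by_name(std::vector<Student>& students) {
    std::sort(students.begin(), students.end(),
              [](const Student& a, const Student& b) { return a.name < b.name; });
}
```

In both cases the algorithm moves the same data structure from one state to another: the first changes what is stored, the second changes only the order in which it is stored.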
The quality of a data structure is related to its
ability to successfully model the characteristics
of the world (problem).
Similarly, the quality of an algorithm is related
to its ability to successfully simulate the
changes in the world.
However, the quality of data structures and
algorithms is determined by their ability to work
together well.
Generally speaking, correct data structures lead
to simple and efficient algorithms.
And correct algorithms lead to accurate and
efficient data structures.
Properties of Algorithms
Finiteness:
An algorithm must complete after a finite
number of steps.
An algorithm should consist of a finite number of
steps.
Finite:
int i=0;
while(i<10){
    cout<<i;
    i++;
}
Infinite:
while(true){
    cout<<"Hello";
}
Definiteness (Absence of ambiguity):
Each step must be clearly defined, having
one and only one interpretation.
At each point in computation, one should be
able to tell exactly what happens next.
Sequential:
Each step must have a uniquely defined
preceding and succeeding step.
The first step (start step) and last step (halt
step) must be clearly noted.
Feasibility:
It must be possible to perform each
instruction.
Each instruction should have the possibility of
being executed.
1) for(int i=0; i<0; i++){
       cout<<i;   // there is no possibility that this statement will be executed
   }
2) if(5>7) {
       cout<<"hello";   // not executed
   }
Correctness:
It must compute the correct answer for all
possible legal inputs.
The output should be as expected,
as required, and correct.
Language Independence:
It must not depend on any one programming
language.
Completeness:
It must solve the problem completely.
Effectiveness:
Doing the right thing. It should yield the
correct result all the time for all of the
possible cases.
Efficiency:
It must solve the problem with the least amount of
computational resources such as time and
space.
Producing an output as per the requirement.
Example:
Write a program that takes a number and
displays the square of the number.
1) int x;
   cin>>x;
   cout<<x*x;
2) int x,y;
   cin>>x;
   y=x*x;
   cout<<y;
Example:
Write a program that takes two numbers and
displays the sum of the two.
Program a      Program b      Program c
cin>>a;        cin>>a;        cin>>a;
cin>>b;        cin>>b;        cin>>b;
sum = a+b;     a = a+b;       cout<<a+b;
cout<<sum;     cout<<a;
Which one is most efficient and which are effective?
Program c is the most efficient.
All are effective but with different efficiencies.
Input/output:
There must be a specified number of input
values, and one or more result values.
Zero or more inputs and one or more
outputs.
Simplicity:
A good general rule is that each step should carry
out one logical step.
What is simple to one processor may not be
simple to another.
Algorithm Analysis
What is Algorithm Analysis?
Algorithm analysis refers to the process of
determining how much computing time
and storage an algorithm will require.
In other words, it is a process of predicting
the resource requirements of algorithms
in a given environment.
In order to solve a problem, there are many
possible algorithms.
One has to be able to choose the best algorithm
for the problem at hand using some scientific
method.
To classify some data structures and algorithms
as good:
we need precise ways of analyzing them in
terms of resource requirement.
The main resources are:
• Running Time
• Memory Usage
• Communication Bandwidth
• Computer hardware
Note: Running time is the most important since
computational time is the most precious resource in
most problem domains.
There are two approaches to measure the efficiency of algorithms:
1. Empirical
Also known as the a posteriori approach.
Based on the total running time of the
program.
Uses actual system clock time.
Example:
t1
for(int i=0; i<=10; i++)
cout<<i;
t2
Running time taken by the above algorithm is
TotalTime = t2-t1.
It is difficult to determine the efficiency of algorithms using
this approach,
because clock time can vary based on many factors.
Example:
a) Processor speed of the computer
b) Current processor load
c) Specific data for a particular run of the program
d) Operating system
e) Programming language
f) The programmer
g) Operating environment/platform (PC, Sun, smartphone,
etc.)
Therefore, it is quite machine dependent.
2. Theoretical
Also known as the a priori approach.
Determining the quantity of resources
required using mathematical concepts.
Analyze an algorithm according to the
number of basic operations (time units)
required, rather than according to an
absolute amount of time involved.
We use the theoretical approach to determine
the efficiency of an algorithm because:
The number of operations will not vary under
different conditions.
It helps us to have a meaningful measure that
permits comparison of algorithms independent
of operating platform.
It helps to determine the complexity of an algorithm.
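The contrast between the two approaches can be sketched by measuring the same loop both ways. This is a minimal illustration, not a benchmarking method: Measurement and measure_print_loop are hypothetical names, std::chrono::steady_clock is just one reasonable clock choice, and only the output operations are counted.

```cpp
#include <chrono>
#include <iostream>

struct Measurement {
    long long microseconds;  // empirical: depends on machine, load, platform
    long long operations;    // theoretical: the same on every machine
};

// Prints 0..n and reports both the clock time and the number of
// output operations performed.
Measurement measure_print_loop(int n) {
    long long ops = 0;
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i <= n; i++) {
        std::cout << i;
        ops++;               // one output operation per iteration
    }
    auto t2 = std::chrono::steady_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1);
    return Measurement{static_cast<long long>(elapsed.count()), ops};
}
```

Running measure_print_loop(10) repeatedly will usually give different microsecond values, while the operation count is always 11. That stability is why the theoretical approach is preferred for comparing algorithms.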
Complexity Analysis
Complexity Analysis is the systematic study
of the cost of computation, measured either
in:
Time units
Operations performed, or
The amount of storage space required.
Two important ways to characterize the efficiency of
an algorithm are its space complexity and time
complexity.
Time Complexity: Determine the approximate amount
of time (number of operations) required to solve a
problem of size n.
The limiting behavior of time complexity as size
increases is called the asymptotic time complexity.
Space Complexity: Determine the approximate
memory required to solve a problem of size n.
The limiting behavior of space complexity as size
increases is called the asymptotic space
complexity.
The asymptotic complexity of an algorithm determines the
size of problems that can be solved by the algorithm.
Factors affecting the running time of a program:
- CPU type (80286, 80386, 80486, Pentium I-IV)
- Memory used
- Computer used
- Programming language
As a rough ordering: C (fastest), C++ (faster), Java (fast)
C is typically faster than Java because C is closer to machine
language, whereas Java spends additional time on
interpretation or translation to machine code at run time.
- Algorithm used
- Input size
Note: The important factors for this course are input size and the algorithm used.
Complexity analysis involves two distinct phases:
• Algorithm Analysis: Analysis of the algorithm or
data structure to produce a function T(n) that
describes the algorithm in terms of the operations
performed in order to measure the complexity of
the algorithm.
Example: Suppose we have hardware capable of
executing 10^6 instructions per second. How long would it
take to execute an algorithm whose complexity function is
T(n)=2n^2 on an input size of n=10^8?
Solution: T(n) = 2n^2 = 2(10^8)^2 = 2*10^16
Running time = T(10^8)/10^6 = 2*10^16/10^6 = 2*10^10
seconds.
• Order of Magnitude Analysis: Analysis of the
function T(n) to determine the general complexity.
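The arithmetic in the example above can be reproduced with a short sketch. The function names are hypothetical, and double is used only because 2*10^16 operations would be awkward to read as an integer literal.

```cpp
// T(n) = 2n^2, the complexity function from the worked example.
double operations(double n) {
    return 2.0 * n * n;
}

// Estimated running time in seconds on hardware that executes
// `instructions_per_second` basic operations each second.
double running_time_seconds(double n, double instructions_per_second) {
    return operations(n) / instructions_per_second;
}
```

With n = 1e8 and 1e6 instructions per second, running_time_seconds gives 2e10 seconds, which is roughly 634 years. The estimate shows why a quadratic algorithm is impractical for inputs of this size.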
There is no generally accepted set of rules for
algorithm analysis.
However, an exact count of operations is
commonly used.
To count the number of operations we can use
the following Analysis Rule.
Analysis Rules:
1. Assume an arbitrary time unit.
2. Execution of one of the following operations
takes time 1 unit:
Assignment Operation
Example: i=0;
Single Input/Output Operation
Example: cin>>a;
Single Boolean Operations
Example: i>=10
Single Arithmetic Operations (Addition, Subtraction,
Multiplication)
Example: a+b;
Function Return
Example: return sum;
3. Running time of a selection statement (if,
switch) is the time for the condition evaluation
plus the maximum of the running times for the
individual clauses in the selection.
Example: int x;
int sum=0;
if(a>b)
{
sum= a+b;
cout<<sum;
}
else
{
cout<<b;
}
T(n) = 1+1+max(3,1)
     = 5
4. Loop statements:
Running time = (running time of the statements inside the
loop * number of iterations) + time for setup (1) +
time for checking (number of iterations + 1) + time
for update (number of iterations).
The total running time of statements inside a
group of nested loops is the running time of the
statements * the product of the sizes of all the
loops.
For nested loops, analyze inside out.
Always assume that the loop executes the
maximum number of iterations possible. (Why?)
Because we are interested in the worst-case
complexity.
5. Function call:
• 1 for setup + the time for any
parameter calculations + the time
required for the execution of the
function body.
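Rule 4 can be written down as a small helper, which makes the "analyze inside out" advice concrete. The function loop_cost is a hypothetical name and assumes a simple counting loop whose setup, check and update each cost one unit.

```cpp
// Cost of a counting loop under the analysis rules:
//   body_cost * iterations   (statements inside the loop)
//   + 1                      (setup, e.g. i = 0)
//   + (iterations + 1)       (condition checks, including the failing one)
//   + iterations             (updates, e.g. i++)
long long loop_cost(long long iterations, long long body_cost) {
    return body_cost * iterations + 1 + (iterations + 1) + iterations;
}
```

For a single loop with a one-unit body, loop_cost(n, 1) is 3n+2. For two nested loops the inner cost becomes the body of the outer one, loop_cost(n, loop_cost(n, 1)), which expands to 3n^2+4n+2 before any statements outside the loops are added.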
Examples:
1)
int k=0,n;
cout<<"Enter an integer";
cin>>n;
for(int i=0;i<n; i++)
k++;
T(n)= 3+1+n+1+n+n=3n+5
2)
int k=0;
for(int i=1 ; i<=n; i++)
for( int j=1; j<=n; j++)
k++;
T(n)=1+1+(n+1)+n+n(1+(n+1)+n+n)
= 2n+3+n(3n+2)
= 2n+3+3n^2+2n
= 3n^2+4n+3
3) int i=0;
while(i<n)
{
cout<<i;
i++;
}
int j=1;
while(j<=10)
{
cout<<j;
j++;
}
T(n)=1+n+1+n+n+1+11+2(10)
= 3n+34
4). int sum=0;
for(int i=1;i<=n;i++)
sum=sum+i;
T(n)=1+1+(n+1)+n+(1+1)n
=3+4n
5). int counter(){
int a=0, n;
cout<<"Enter a number";
cin>>n;
for(int i=0;i<n;i++)
a=a+1;
return 0; }
T(n)=1+1+1+(1+n+1+n)+2n+1
=4n+6
6). int sum(int n){
int s=0;
for(int i=1;i<=n;i++)
s=s+(i*i*i*i);
return s;
}
T(n)=1+(1+n+1+n+5n)+1
=7n+4
7). int sum=0;
for(i=0;i<n;i++)
for(j=0;j<n;j++)
sum++;
T(n)=1+1+(n+1)+n+n*(1+(n+1)+n+n)
=3+2n+n^2+2n+2n^2
=3+2n+3n^2+2n
=3n^2+4n+3
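The count for the nested-loop examples (2 and 7) can be checked by instrumenting the loops so that every operation charged by the analysis rules adds one time unit. This is a sketch: count_nested is a hypothetical name, and the for loops are unrolled into while loops only so that each check and update can be counted explicitly.

```cpp
// Counts the time units the analysis rules charge for:
//   int k=0;
//   for(int i=1; i<=n; i++)
//       for(int j=1; j<=n; j++)
//           k++;
long long count_nested(long long n) {
    long long units = 0;
    long long k = 0; units++;             // assignment k = 0
    long long i = 1; units++;             // outer setup i = 1
    while (true) {
        units++;                          // outer check i <= n
        if (!(i <= n)) break;
        long long j = 1; units++;         // inner setup j = 1
        while (true) {
            units++;                      // inner check j <= n
            if (!(j <= n)) break;
            k++; units++;                 // loop body
            j++; units++;                 // inner update
        }
        i++; units++;                     // outer update
    }
    (void)k;                              // k is only there to mirror the example
    return units;
}
```

For any n the returned value equals 3n^2+4n+3; for instance n = 10 gives 343, matching the hand derivation.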
Categories of Algorithm Analysis
Algorithms may be examined under different
situations to correctly determine their
efficiency for accurate comparison.
Best Case − Minimum time required for
program execution.
Average Case − Average time required for
program execution.
Worst Case − Maximum time required for
program execution.
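The three cases can be made concrete with linear search, which is not discussed in these slides and is used here only as a familiar illustration. The function name is hypothetical; it returns the number of comparisons made rather than the position found.

```cpp
#include <cstddef>
#include <vector>

// Returns how many element comparisons a linear search performs
// while looking for `target` in `data`.
std::size_t comparisons_linear_search(const std::vector<int>& data, int target) {
    std::size_t comparisons = 0;
    for (int value : data) {
        ++comparisons;
        if (value == target) {
            break;          // found: stop searching
        }
    }
    return comparisons;
}
```

On {7, 3, 9, 1}, searching for 7 takes 1 comparison (best case), while searching for 1 or for a missing value takes 4 (worst case). The average case lies in between and depends on how likely each position is.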
Order of Magnitude
Refers to the rate at which the storage or
time grows as a function of problem size.
It is expressed in terms of its relationship
to some known functions.
This type of analysis is called Asymptotic
analysis.
Asymptotic Notations
▶ Asymptotic Analysis is concerned with how the
running time of an algorithm increases with the
size of the input in the limit, as the size of the
input increases without bound.
▶ Asymptotic Analysis makes use of the
O (Big-Oh),
Ω (Big-Omega), and
Θ (Theta) notations in performance analysis
and in characterizing the complexity of an
algorithm.
Types of Asymptotic Notations
1. Big-Oh Notation
Definition: We say f(n)=O(g(n)) if there are
positive constants n0 and c such that, to the
right of n0, the value of f(n) always lies on or
below c.g(n).
As n increases, f(n) grows no faster than g(n).
It is only concerned with what happens for
very large values of n.
Commonly used to describe the worst-case analysis.
▶ O-notation is used to represent
the amount of time an algorithm
takes on the worst possible set of
inputs, the "worst case".
Question 1
f(n)=10n+5 and g(n)=n.
Show that f(n) is O(g(n)).
Solution: To show that f(n) is O(g(n)), we
must show that there exist constants c
and k such that f(n)<=c.g(n) for all
n>=k.
10n+5<=c.n for all n>=k
Let c=15, then show that 10n+5<=15n:
5<=5n or 1<=n
So, f(n)=10n+5<=15.g(n) for all n>=1
(c=15, k=1); there exist two constants that
satisfy the above constraints.
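The chosen constants can be sanity-checked numerically. The sketch below tests f(n) <= c.g(n) for f(n)=10n+5 and g(n)=n over a finite range; bounded_above is a hypothetical name, and a finite check like this supports the algebra above but is not a proof on its own.

```cpp
// Returns true when 10n+5 <= c*n holds for every n in [k, limit].
bool bounded_above(long long c, long long k, long long limit) {
    for (long long n = k; n <= limit; ++n) {
        long long f = 10 * n + 5;
        long long g = n;
        if (f > c * g) {
            return false;   // the bound fails at this n
        }
    }
    return true;
}
```

bounded_above(15, 1, 100000) holds, whereas c = 10 fails immediately because 10n+5 is always larger than 10n. Any c greater than 10 works for a large enough k, so the pair (c=15, k=1) is only one valid choice.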
Question 2
f(n)=3n^2+4n+1
Show that f(n)=O(n^2).
Solution:
4n<=4n^2 for all n>=1 and
1<=n^2 for all n>=1
3n^2+4n+1 <= 3n^2+4n^2+n^2 = 8n^2 for all n>=1
So, we have shown that f(n)<=8n^2 for all
n>=1. Therefore, f(n) is O(n^2) (c=8, k=1);
there exist two constants that satisfy the
constraints.
2. Big-Omega (Ω) Notation (Lower bound)
Definition: We write f(n)=Ω(g(n)) if there are positive
constants n0 and c such that, to the right of n0, the value of
f(n) always lies on or above c.g(n).
As n increases, f(n) grows no slower than g(n).
Commonly used to describe the best-case analysis.
Used to represent the amount of time the algorithm takes on
the most favorable set of inputs, the "best case".
Example:
Find g(n) such that f(n)=Ω(g(n)) for f(n)=n^2
g(n)=n, c=1, k=1.
f(n)=n^2=Ω(n)
3. Theta Notation (Θ-Notation) (Optimal
bound)
Definition: We say f(n)=Θ(g(n)) if there exist positive
constants n0, c1 and c2 such that, to the right of n0,
the value of f(n) always lies between c1.g(n) and
c2.g(n) inclusive, i.e., c1.g(n)<=f(n)<=c2.g(n) for
all n>=n0.
▶ As n increases, f(n) grows as fast as g(n).
▶ Often used to describe the average-case analysis.
▶ To represent the amount of time the algorithm takes on
an average set of inputs, the "average case".
Example: Find g(n) such that f(n)=Θ(g(n))
for f(n)=2n^2+3
n^2 ≤ 2n^2+3 ≤ 3n^2 for all n ≥ 2, so g(n)=n^2 with c1=1, c2=3
and n0=2 (at n=1 the upper bound fails, since 5 > 3).
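The two-sided bound for f(n)=2n^2+3 and g(n)=n^2 can be checked numerically. The helper below is a hypothetical sketch. It shows that with c1=1 and c2=3 the upper inequality 2n^2+3 ≤ 3n^2 only starts to hold at n=2, so the threshold n0 has to be at least 2 for those constants.

```cpp
// True when c1*n^2 <= 2n^2+3 <= c2*n^2 holds for this particular n.
bool sandwiched(long long n, long long c1, long long c2) {
    long long f = 2 * n * n + 3;
    long long g = n * n;
    return c1 * g <= f && f <= c2 * g;
}
```

sandwiched(1, 1, 3) is false because 5 > 3, while every n from 2 upward passes. Keeping n0=1 is also possible if a larger upper constant is chosen, for example c2=5.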
▶ The order of the body statements of a given
algorithm is very important in determining
Big- Oh of the algorithm.
Example: Find Big-Oh of the following
algorithm.
1. for( int i=1;i<=n; i++)
sum=sum + i;
T(n)=2*n=2n=O(n).
2. for(int i=1; i<=n; i++)
for(int j=1; j<=n; j++)
k++;
T(n)=1*n*n=n^2=O(n^2).
Reading Assignments
Formal Approach to Analysis