DSA Notes
Module 1
DATA STRUCTURE —:
Data structure is a very important part of Data management.
“A data structure is basically a group of data elements that are
put together under one name and which defines a particular way
of storing and organizing data in a computer so that it can be
used efficiently.”
Data structures are almost used in every program or software
system. Some common example of data structures are ARRAY,
LINKED LIST, STACK, QUEUE,TREE, GRAPHS etc.
Data Structures are widely used in given bellow areas.
COMPILER DESIGN
OPERATING SYSTEM
DBMS
ARTIFICIAL INTELLIGENCE
GRAPHICS
SIMULATION
NUMERICAL ANALYSIS etc.
Basic Terminology
Data structures are the building blocks of any program or the software. Choosing
the appropriate data structure for a program is the most difficult task for a
programmer. Following terminology is used as far as data structures are concerned.
Data: Data can be defined as an elementary value or the collection of values, for
example, student's name and its id are the data about the student.
Group Items: Data items which have subordinate data items are called Group
item, for example, name of a student can have first name and the last name.
Record: Record can be defined as the collection of various data items, for
example, if we talk about the student entity, then its name, address, course and
marks can be grouped together to form the record for the student.
File: A File is a collection of various records of one type of entity, for example, if
there are 60 employees in the class, then there will be 20 records in the related file
where each record contains the data about each employee.
Attribute and Entity: An entity represents the class of certain objects. it contains
various attributes. Each attribute represents the particular property of that entity.
Field: Field is a single elementary unit of information representing the attribute of
an entity.
Need of Data Structures
As applications are getting complexed and amount of data is increasing day by
day, there may arrise the following problems:
Processor speed: To handle very large amout of data, high speed processing is
required, but as the data is growing day by day to the billions of files per entity,
processor may fail to deal with that much amount of data.
Data Search: Consider an inventory size of 106 items in a store, If our application
needs to search for a particular item, it needs to traverse 106 items every time,
results in slowing down the search process.
Multiple requests: If thousands of users are searching the data simultaneously on
a web server, then there are the chances that a very large server can be failed
during that process
in order to solve the above problems, data structures are used. Data is organized to
form a data structure in such a way that all items are not required to be searched
and required data can be searched instantly.
Advantages of Data Structures
Efficiency: Efficiency of a program depends upon the choice of data structures.
For example: suppose, we have some data and we need to perform the search for a
particular record. In that case, if we organize our data in an array, we will have to
search sequentially element by element. hence, using array may not be very
efficient here. There are better data structures which can make the search process
efficient like ordered array, binary search tree or hash tables.
Reusability: Data structures are reusable, i.e. once we have implemented a
particular data structure, we can use it at any other place. Implementation of data
structures can be compiled into libraries which can be used by different clients.
Abstraction: Data structure is specified by the ADT which provides a level of
abstraction. The client program uses the data structure through interface only,
without getting into the implementation details.
CLASSIFICATION OF DATA STRUCTURES —:
PRIMITIVE DATA TYPE –:
Primitive data structures are the fundamental data types which are
supported by a programming language. some basic data types are
integer , character , Boolean pointer , float , double , real.
NON PRIMITIVE DATA STRUCTURES —:
Non primitive data structures are that data structures which are
created using primitive data structures as—: array , linked list ,
tree etc
Linear Data Structures: A data structure is called linear if all of its elements are
arranged in the linear order. In linear data structures, the elements are stored in
non-hierarchical way where each element has the successors and predecessors
except the first and last element.
Types of Linear Data Structures are given below:
Arrays: An array is a collection of similar type of data items and each data item is
called an element of the array. The data type of the element may be any valid data
type like char, int, float or double.
The elements of array share the same variable name but each one carries a different
index number known as subscript. The array can be one dimensional, two
dimensional or multidimensional.
The individual elements of the array age are:
age[0], age[1], age[2], age[3],......... age[98], age[99].
Linked List: Linked list is a linear data structure which is used to maintain a list in
the memory. It can be seen as the collection of nodes stored at non-contiguous
memory locations. Each node of the list contains a pointer to its adjacent node.
Stack: Stack is a linear list in which insertion and deletions are allowed only at one
end, called top. A stack is an abstract data type (ADT), can be implemented in
most of the programming languages. It is named as stack because it behaves like a
real-world stack, for example: - piles of plates or deck of cards etc.
Queue: Queue is a linear list in which elements can be inserted only at one end
called rear and deleted only at the other end called front.
It is an abstract data structure, similar to stack. Queue is opened at both end
therefore it follows First-In-First-Out (FIFO) methodology for storing the data
items.
Non Linear Data Structures: This data structure does not form a sequence i.e.
each item or element is connected with two or more other items in a non-linear
arrangement. The data elements are not arranged in sequential structure.
Types of Non Linear Data Structures are given below:
Trees: Trees are multilevel data structures with a hierarchical relationship among
its elements known as nodes. The bottommost nodes in the herierchy are
called leaf node while the topmost node is called root node. Each node contains
pointers to point adjacent nodes.
Tree data structure is based on the parent-child relationship among the nodes. Each
node in the tree can have more than one children except the leaf nodes whereas
each node can have atmost one parent except the root node. Trees can be classfied
into many categories which will be discussed later in this tutorial.
Graphs: Graphs can be defined as the pictorial representation of the set of
elements (represented by vertices) connected by the links known as edges. A graph
is different from tree in the sense that a graph can have cycle while the tree can not
have the one.
Operations on data structure
1) Traversing: Every data structure contains the set of data elements. Traversing
the data structure means visiting each element of the data structure in order to
perform some specific operation like searching or sorting.
Example: If we need to calculate the average of the marks obtained by a student in
6 different subject, we need to traverse the complete array of marks and calculate
the total sum, then we will devide that sum by the number of subjects i.e. 6, in
order to find the average.
2) Insertion: Insertion can be defined as the process of adding the elements to the
data structure at any location.
If the size of data structure is n then we can only insert n-1 data elements into it.
3) Deletion:The process of removing an element from the data structure is called
Deletion. We can delete an element from the data structure at any random location.
If we try to delete an element from an empty data structure then underflow occurs.
4) Searching: The process of finding the location of an element within the data
structure is called Searching. There are two algorithms to perform searching,
Linear Search and Binary Search. We will discuss each one of them later in this
tutorial.
5) Sorting: The process of arranging the data structure in a specific order is known
as Sorting. There are many algorithms that can be used to perform sorting, for
example, insertion sort, selection sort, bubble sort, etc.
6) Merging: When two lists List A and List B of size M and N respectively, of
similar type of elements, clubbed or joined to produce the third list, List C of size
(M+N), then this process is called merging
Analysis of Algorithm
In theoretical analysis of algorithms, it is common to estimate their complexity in
the asymptotic sense, i.e., to estimate the complexity function for arbitrarily large
input. The term "analysis of algorithms" was coined by Donald Knuth.
Algorithm analysis is an important part of computational complexity theory, which
provides theoretical estimation for the required resources of an algorithm to solve a
specific computational problem. Most algorithms are designed to work with inputs
of arbitrary length. Analysis of algorithms is the determination of the amount of
time and space resources required to execute it.
Usually, the efficiency or running time of an algorithm is stated as a function
relating the input length to the number of steps, known as time complexity, or
volume of memory, known as space complexity.
The Need for Analysis
By considering an algorithm for a specific problem, we can begin to develop
pattern recognition so that similar types of problems can be solved by the help of
this algorithm.
Algorithms are often quite different from one another, though the objective of
these algorithms are the same. For example, we know that a set of numbers can be
sorted using different algorithms. Number of comparisons performed by one
algorithm may vary with others for the same input. Hence, time complexity of
those algorithms may differ. At the same time, we need to calculate the memory
space required by each algorithm.
Analysis of algorithm is the process of analyzing the problem-solving capability of
the algorithm in terms of the time and size required (the size of memory for storage
while implementation). However, the main concern of analysis of algorithms is the
required time or performance. Generally, we perform the following types of
analysis −
Worst-case − The maximum number of steps taken on any instance of
size a.
Best-case − The minimum number of steps taken on any instance of size a.
Average case − An average number of steps taken on any instance of size a.
Amortized − A sequence of operations applied to the input of
size a averaged over time.
To solve a problem, we need to consider time as well as space complexity as the
program may run on a system where memory is limited but adequate space is
available or may be vice-versa. In this context, if we compare bubble
sort and merge sort. Bubble sort does not require additional memory, but merge
sort requires additional space. Though time complexity of bubble sort is higher
compared to merge sort, we may need to apply bubble sort if the program needs to
run in an environment, where memory is very limited.
ASYMPTOTIC NOTATIONS
Asymptotic analysis of an algorithm refers to defining the mathematical
boundation/framing of its run-time performance. Using asymptotic analysis, we
can very well conclude the best case, average case, and worst case scenario of an
algorithm. Asymptotic analysis is input bound i.e., if there's no input to the
algorithm, it is concluded to work in a constant time. Other than the "input" all
other factors are considered constant. Asymptotic analysis refers to computing the
running time of any operation in mathematical units of computation. For example,
the running time of one operation is computed as f(n) and may be for another
operation it is computed as g(n2 ). This means the first operation running time will
increase linearly with the increase in n and the running time of the second
operation will increase exponentially when n increases. Similarly, the running time
of both operations will be nearly the same if n is significantly small.
The time required by an algorithm falls under three types −
Best Case − Minimum time required for program execution.
Average Case − Average time required for program execution.
Worst Case − Maximum time required for program execution.
Asymptotic Notations
Following are the commonly used asymptotic notations to calculate the running
time complexity of an algorithm.
Ο Notation
Ω Notation
θ Notation.
Big Oh Notation, Ο : The notation Ο(n) is the formal way to express the upper
bound of an algorithm's running time. It measures the worst case time complexity
or the longest amount of time an algorithm can possibly take to complete.
For example, for a function f(n) Ο(f(n)) = { g(n) : there exists c > 0 and n0 such that f(n) ≤ c.g(n) for
all n > n0. }
Omega Notation, Ω : The notation Ω(n) is the formal way to express the
lower bound of an algorithm's running time. It measures the best case time
complexity or the best amount of time an algorithm can possibly take to
complete.
For example, for a function f(n) Ω(f(n)) ≥ { g(n) : there exists c > 0 and n0 such that g(n) ≤ c.f(n) for
all n > n0. }
Theta Notation, θ : The notation θ(n) is the formal way to express both the
lower bound and the upper bound of an algorithm's running time. It is
represented as follows –
Algortihm
An algorithm is a Step By Step process to solve a problem, where each step
indicates an intermediate task. Algorithm contains finite number of steps that
leads to the solution of the problem.
Properties /Characteristics of an Algorithm:-
Algorithm has the following basic properties
Input-Output:- Algorithm takes ‘0’ or more input and produces the
required [Link] is the basic characteristic of an algorithm.
Finiteness:- An algorithm must terminate in countable number of
steps.
Definiteness: Each step of an algorithm must be stated clearly and
unambiguously.
Effectiveness: Each and every step in an algorithm can be converted in
to programming language statement.
Generality: Algorithm is generalized one. It works on all set of inputs
and provides the required output. In other words it is not restricted to a
single input value.
Dataflow of an Algorithm
o Problem: A problem can be a real-world problem or any instance from the
real-world problem for which we need to create a program or the set of
instructions. The set of instructions is known as an algorithm.
o Algorithm: An algorithm will be designed for a problem which is a step by
step procedure.
o Input: After designing an algorithm, the required and the desired inputs are
provided to the algorithm.
o Processing unit: The input will be given to the processing unit, and the
processing unit will produce the desired output.
o Output: The output is the outcome or the result of the program.
Why do we need Algorithms?
We need algorithms because of the following reasons:
o Scalability: It helps us to understand the scalability. When we have a big
real-world problem, we need to scale it down into small-small steps to easily
analyze the problem.
o Performance: The real-world is not easily broken down into smaller steps.
If the problem can be easily broken into smaller steps means that the
problem is feasible.
Algorithm Complexity
The performance of the algorithm can be measured in two factors:
o Time complexity: The time complexity of an algorithm is the amount of
time required to complete the execution. The time complexity of an
algorithm is denoted by the big O notation. Here, big O notation is the
asymptotic notation to represent the time complexity. The time complexity
is mainly calculated by counting the number of steps to finish the execution.
o Let's understand the time complexity through an example.
1. sum=0;
2. // Suppose we have to calculate the sum of n numbers.
3. for i=1 to n
4. sum=sum+i;
5. // when the loop ends then sum holds the sum of the n numbers
6. return sum;
In the above code, the time complexity of the loop statement will be atleast n, and
if the value of n increases, then the time complexity also increases. While the
complexity of the code, i.e., return sum will be constant as its value is not
dependent on the value of n and will provide the result in one step only. We
generally consider the worst-time complexity as it is the maximum time taken for
any given input size.
o Space complexity: An algorithm's space complexity is the amount of space
required to solve a problem and produce an output. Similar to the time
complexity, space complexity is also expressed in big O notation.
For an algorithm, the space is required for the following purposes:
1. To store program instructions
2. To store constant values
3. To store variable values
4. To track the function calls, jumping statements, etc.
Auxiliary space: The extra space required by the algorithm, excluding the input
size, is known as an auxiliary space. The space complexity considers both the
spaces, i.e., auxiliary space, and space used by the input.
So,
Space complexity = Auxiliary space + Input size.
LINEAR SEARCH
The Linear search or Sequential Search is most simple searching method. It does
not expect the list to be sorted. The Key which to be searched is compared with
each element of the list one by one. If a match exists, the search is terminated. If
the end of the list is reached, it means that the search has failed and the Key has no
matching element in the list.
Ex: consider the following Array A
23 15 18 17 42 96 103
Now let us search for 17 by Linear search.
The searching starts from the first position. Since A[0] ≠17. The search proceeds to
the next position i.e; second position A[1] ≠17. The above process continuous
until the search element is found such as A[3]=17. Here the searching element is
found in the position 4.
Algorithm:
LINEAR(DATA, N,ITEM, LOC)
Here DATA is a linear Array with N elements. And ITEM is a given item of
information.
This algorithm finds the location LOC of an ITEM in DATA. LOC=-1 if the search
is unsuccessful.
Step 1: Set DATA[N+1]=ITEM
Step 2: Set LOC=1
Step 3: Repeat while (DATA [LOC] != ITEM) Set LOC=LOC+1
Step 4: if LOC=N+1 then Set LOC= -1.
Step 5: Exit
Advantages:
It is simplest known technique.
The elements in the list can be in any order.
Disadvantages:
This method is in efficient when large numbers of elements are present in
list because time taken for searching is more.
Complexity of Linear Search:
The worst and average case complexity of Linear search is O(n), where ‘n’ is the
total number of elements present in the list.
BINARY SEARCH
Suppose DATA is an array which is stored in increasing order then there is an
extremely efficient searching algorithm called “Binary Search”. Binary Search can
be used to find the location of the given ITEM of information in DATA.
Working of Binary Search Algorithm:
During each stage of algorithm search for ITEM is reduced to a segment of
elements of DATA[BEG], DATA[BEG+1], DATA[BEG+2],……DATA[END].
Here BEG and END denotes beginning and ending locations of the segment under
considerations. The algorithm compares ITEM with middle element DATA[MID]
of a segment,
where MID=[BEG+END]/2.
If DATA[MID]=ITEM then the search is successful. and we said that LOC=MID.
Otherwise a new segment of data is obtained as follows:
ii. If ITEM>DATA[MID] then ITEM can appear only in right half of the segment
i.e. DATA[MID+1], DATA[MID+2],……………………DATA[END]. So we
reset BEG=MID+1. And begin the search again.
Initially we begin with the entire array DATA i.e. we begin with BEG=1 and
END=n Or BEG=lb(Lower Bound) END=ub(Upper Bound) If ITEM is not in
DATA then eventually we obtained END.
Ex: consider a list of sorted elements stored in an Array A is
ALGORITHM:
BINARY SEARCH[A,N,KEY]
Step 1: begin
Step 2: [Initilization] Lb=1; ub=n;
Step 3: [Search for the ITEM] Repeat through step 4,while Lower bound is less
than Upper Bound.
Step 4: [Obtain the index of middle value] MID=(lb+ub)/2
Step 5: [Compare to search for ITEM] If KeyA[MID] then Lb=MID+1 Otherwise
write “Match Found” Return Middle.
Step 6: [Unsuccessful Search] write “Match Not Found”
Step 7: Stop.
Advantages:
When the number of elements in the list is large, Binary Search executed faster
than linear search. Hence this method is efficient when number of elements is
large.
Disadvantages:
To implement Binary Search method the elements in the list must be in sorted
order, otherwise it fails.
Complexity of Binary search:-
Best Case Time Complexity of Binary Search: O(1)
Average Case Time Complexity of Binary Search: O(logN)
Worst Case Time Complexity of Binary Search: O(logN)
Space Complexity of Binary Search: O(1) for iterative, O(logN) for recursive.
Time-Space Trade-Off in Algorithms
A tradeoff is a situation where one thing increases and another thing decreases. It
is a way to solve a problem in:
Either in less time and by using more space, or
In very little space by spending a long amount of time.
The best Algorithm is that which helps to solve a problem that requires less space
in memory and also takes less time to generate the output. But in general, it is not
always possible to achieve both of these conditions at the same time. The most
common condition is an algorithm using a lookup table. This means that the
answers to some questions for every possible value can be written down. One
way of solving this problem is to write down the entire lookup table, which will
let you find answers very quickly but will use a lot of space. Another way is to
calculate the answers without writing down anything, which uses very little
space, but might take a long time. Therefore, the more time-efficient algorithms
you have, that would be less space-efficient.
Types of Space-Time Trade-off
Compressed or Uncompressed data
Re Rendering or Stored images
Smaller code or loop unrolling
Lookup tables or Recalculation