Data Structure1 Notes
Data Structure1 Notes
ON
DATA STRUCTURE
CONTENTS
2. 13/05/2008 4. 12,34,43,21
ENTITY
An entity is a thing or object in the real world that is distinguishable from all other objects.
The entity has a set of properties or attributes and the values of some sets of these attributes
may uniquely identify an entity.
An entity set is a collection of entities.
Example:
Entity Student
INFORMATION
It can be defined as meaningful data or processed data. When the raw facts are processed, we
get a related piece of information as its output/result.
Example:
Data (01/01/1980) becomes information if entity Ram is related to Date of birth attribute
(01/01/1980) as follows:
DOB of student Ram is 01/01/1980
Data 1
DATA TYPE
A data type is a term which refers to the kind of data that may appear in computation. Every
programming language has its own set of built-in data types.
Example:
Data Data type
34 Numeric
Chintech String
21,43,56 Array of integers
12/05/2008 Date
In C, the following are the basic data types are: int, long, char, float, double, void etc.
Definition:
Data Structure is a specialized format for storing data so that the data’s can be organized in
an efficient way.
Or
To identify and create useful mathematical entities and operations to determine what classes
of problems can be solved by using these entities and operations.
To determine the representation of these abstract entities and to implement the abstract
operations on these concrete representation.
NEED OF A DATA STRUCTURE
To understand the relationship of one data elements with the other and organize it within the
memory.
A data structures helps to analyze the data, store it and organize it in a logical or
mathematical manner.
Data structures allow us to achieve an important goal: component reuse.
Data 2
CLASSIFICATION OF DATA STRUCTURE
Based on how the data items are operated, it will classify into two broad categories.
Primitive Data Structure: These are basic DS and the data items are operated closest to the
machine level instruction.
Example: integer, characters, strings, pointers and double.
Non-Primitive Data Structure: These are more sophisticated DS and are derived from
primitive DS. Here data items are not operated closest to machine level instruction. It
emphasize on structuring of a group of homogeneous (same type) or heterogeneous (different
type) data items.
Linear Data Structure: In which the data items are stored in sequence order.
Example: Arrays, Linked Lists, Stacks and Queues
Non Linear Data Structure: In Non-Linear data structures, the data items are not in
sequence.
Example: Trees, Graphs.
Data 3
4. Insertion – Adding a new element to the list.
5. Deletion – Removing an element from the list.
6. Sorting – Arranging the records either in ascending or descending order.
7. Merging – Combining two lists into a single list.
8. Modifying – the values of DS can be modified by replacing old values with new ones.
9. Copying – records of one file can be copied to another file.
10. Concatenating – Records of a file are appended at the end of another file.
11. Splitting – Records of big file can be splitting into smaller files.
Fig shows a picture of what an abstract data type is and how it operates.
The user interacts with the interface by using the operations.
The user is not concerned with the details of the implementation.
The implementation of an abstract data type, often referred to as a data structure
An ADT is a data declaration packaged together with the operations that are meaningful on
the data type.
1. Declaration of Data
2. Declaration of Operations
Examples: Objects such as lists, sets and graphs with their operations can be called as ADT.
For the SET ADT, we might have operations such as union, intersection, and complement
etc.
Uses of ADT:
1. It helps to efficiently develop well designed program
2. Facilitates the decomposition of the complex task of developing a software system into a
number of simpler subtasks
3. Helps to reduce the number of things the programmers has to keep in mind at any time.
4. Breaking down a complex task into a number of earlier subtasks also simplifies testing and
debugging.
Data 4
ALGORITHM:
An algorithm is a method of representing the step by step logical procedure for solving a
problem. It is a tool for finding the logic of a problem.
In addition every algorithm must satisfy the following criteria:
Input: There are zero or more quantities which are externally supplied. i.e. each
algorithm must take zero, one or more quantities as input data.
Output: At least one quantity is produced. i.e. produce one or more output values.
Definiteness: Each instruction or step of an algorithm must be clear and unambiguous.
Finiteness: An algorithm must terminate in a finite number of steps.
Effectiveness: Each step must be effective, in the sense that it should be primitive
(easily convertible to program).
ALGORITHM COMPLEXITY:
After designing an algorithm, we have to be checking its correctness. This is done by
analyzing the algorithm. The algorithm can be analyzed by tracing all step-by-step
instructions, reading the algorithm for logical correctness, and testing it on some data
using mathematical techniques to prove it correct.
There may be more than one algorithm to solve a problem. The choice of a
particular algorithm depends on its complexity.
Complexity refers to the rate at which the required storage or consumed time grows as
a function of the problem size.
It is the performance evaluation or analysis / measurement of an algorithm is obtained by
totaling the number of occurrences of each operation when running the algorithm.
Two important ways to characterize the effectiveness of an algorithm are its
space complexity and time complexity.
Space Complexity
Time Complexity
SPACE COMPLEXITY:
Space complexity of an algorithm or program is the amount of memory it needs to run
to completion.
We can measure the space by finding out that how much memory will be consumed by
the instructions and by the variables used.
Example:
Suppose we want to add two integer numbers. To solve this problem we have
following twoalgorithms:
Algorithm 1: Algorithm 2:
Step 1- Input A. Step 1- Input A.
Step 2- Input B. Step 2- Input B.
Step 3- Set C: = A+ B. Step 3- Write: ‘Sum is ‘, A+B.
Data 5
Step 4- Write: ‘Sum is ‘,
C. Step 4- [Link] 5-
Exit.
Both algorithms will produce the same result. But first takes 6 bytes and second takes 4
bytes (2 bytes for each integer). And first has more instructions than the second one. So we
will choose the second one as it takes less space than the first one. (Space Complexity).
Suppose 1 second is required to execute one instruction. So the first algorithm will take 4
seconds and the second algorithm will take 3 seconds for execution. So we will choose the
second one as it will take less time. (Time Complexity).
Time Complexity
➢ The time complexity of a program / algorithm is the amount of computer time that is
need to run to completion.
➢ But the calculation of exact amount of time required by the algorithm is very difficult. So we
can estimate the time and to estimate the time we use some asymptotic notation, which
are described below. Let the no. of steps of an algorithm is measured by the function f(n).
The given function f(n) can be expressed by big ‘O’ notation as follows. f(n) = O (g(n)) if and
only if there exist the +ve const. ‘C’ and no such that f(n) ≤ C * g(n) for all n ≥ no.
Big ‘O’ is the upper bound Ω is the lower bound θ is the avg bound which can be estimated
in time complexity.
Space Complexity
The space complexity is the program that the amount of memory that is needed to run to
completion. The space complexity need by a program have two different Components.
Data 6
1. Fixed space requirement: - The components refer to space requirement that do not depend
upon the number and size of the programs input and output.
Data 7
2. Variable space requirement: - This component consist of the space needed by structure
variable whose size depends on the particular instruction I , of the program begin solved.
TIME-SPACE TRADE-OFF:
There may be more than one approach (or algorithm) to solve a problem.
The best algorithm (or program) to solve a given problem is one that requires less space
in memory and takes less time to complete its execution.
But in practice, it is not always possible to achieve both of these objectives.
One algorithm may require more space but less time to complete its execution while the
other algorithm requires less time space but takes more time to complete its execution.
Thus, we may have to sacrifice one at the cost of the other.
If the space is our constraint, then we have to choose a program that requires less space
at the cost of more execution time.
On the other hand, if time is our constraint such as in real time system, we have to choose a
program that takes less time to complete its execution at the cost of more space.
Data 8
Chapter-2
STRING PROCESSING
A string in C is a array of character.
It is a one dimensional array type of char.
Every string is terminated by null character( ‘\0’) .
The predefined functions gets() and puts() in C language to read and display string
respectively.
Declaration of strings:
Strings are declared in C in similar manner as arrays. Only difference is that, strings are ofchar
type.
Syntax:
char str_name[str_size];
Example:
char s[5];
Initialization of strings
char c[]="abcd";
Strlen():
The strlen( ) function is used to calculate the length of the string.
It means that it counts the total number of characters present in the string
which includes alphabets, numbers, and all special characters including blank
spaces.
Example:
char str[] = "Learn C
Online"; int strLength;
strLength = strlen(str); //strLength contains the length of the string i.e. 14
Strcpy()
• strcpy function copies a string from a source location to a destination location
and provides a null character to terminate the string.
Syntax:
strcpy(Destination_String,Source_String);
Example:
char *Destination_String;
char *Source_String = "Learn C Online";
Data 9
strcpy(Destination_String,Source_String);
printf("%s", Destination_String);
Output:
Learn C Online
Strcmp()
Data 1
Chapter-3
ARRAYS
An array is a finite, ordered and collection of homogeneous data elements.
ordered – All the elements are stored one by one in a contiguous memory location.
LINEAR ARRAY
If one subscript is required to refer all the elements in an array then this is called Linear array or
one-dimensional array.
Let b = address of the 1st element in the array i.e. base address If w = word size
a[3] = b + i x w
=b+3x2=b+6
Data 1
Operation on Array
1. TRAVERSAL
Inserting the array at the end position can be done easily, but to insert at the middle of the
array we have to move the element to create a vacant position to insert the new element.
int insert (int a[], int n, int pos, int ele)
{
int i;
for(i=n-1; i>=pos-1; i--)
{
a[i+1] = a[i];
}
a[pos-1] = ele;
n++;
return(n);
}
3. DELETION
Deleting an element at the end of the array can be done easily by only decreasing the array size
by 1.
But deleting an element at the middle of the array require that each subsequent element from the
position where to be deleted should be moved to fill up the array.
Data 1
for (i = pos-1; i<n-1; i++)
{
a[i] = a[i+1];
} n- - ;
return(0);
}
Row-major Order
In row-major order the row elements are focused first that means elements of matrix are
stored on a row-by-row basis.
Logical Representation of array a[3][3]
Data 1
Column-major Order
In column major order the column elements are focused first that means elements of the matrix
are stored in column-by-column basis.
Ex: Column major order representation of a[3][3]
Data 1
Address Calculation of Matrix in Memory
Row-major order
w Column-major order
Address of a[i][j] = b + (j x u1 + i) x w
Pointers
The pointer in C language is a variable which stores the address of another variable. This
variable can be of type int, char, array, function, or any other pointer.
Example:
Pointer Example
An example of using pointers to print the address and value is given below.
Data 1
In the above figure, pointer variable stores the address of number variable, i.e., fff4. The value of
number variable is 50. But the address of pointer variable p is aaa3.
By the help of * (indirection operator), we can print the value of pointer variable p.
Let's see the pointer example as explained for the above figure.
#include<stdio.h>
int main(){
int number=50;
int *p;
p=&number;//stores the address of number variable
printf("Address of p variable is %x \n",p);
printf("Value of p variable is %d \n",*p);
return 0;
}
Output
Address of number variable is
fff4 Address of p variable is fff4
Value of p variable is 50
Sparse matrices
Matrix with relatively a high proportion of zero entries are called sparse matrix. In other words,
the sparse matrix can be defined as the matrix that has a greater number of zero elements than the
non-zero elements.
Benefits of using the sparse matrix
i) Storage - We know that a sparse matrix contains lesser non-zero elements than zero, so less
memory can be used to store elements. It evaluates only the non-zero elements.
ii) Computing time: In the case of searching in sparse matrix, we need to traverse only the non-
zero elements rather than traversing all the sparse matrix elements. It saves computing time by
logically designing a data structure traversing non-zero elements.
Representation of sparse matrix
There are two ways to represent the sparse matrix that are listed as follows -
o Array representation
o Linked list representation
Data 1
of memory. To avoid such wastage, we can store only non-zero elements. If we store only non-
zero elements, it reduces the traversal time and the storage space.
2D array is used to represent a sparse matrix in which there are three rows named as
In a linked list representation, the linked list data structure is used to represent the sparse matrix.
The advantage of using a linked list to represent the sparse matrix is that the complexity of
inserting or deleting a node in a linked list is lesser than the array.
In linked list, each node has four fields. These four fields are defined as:
Row: Index of row, where non-zero element is located
Column: Index of column, where non-zero element is located
Value: Value of the non zero element located at index – (row,column)
Next node: Address of the next node
Data 1
Data 1
Chapter-4
STACKS & QUEUES
Stack is a linear data structure in which an element may be inserted or deleted at one end
called TOP of the stack.
That means the elements are removed from a stack in the reverse order of that in which
they were inserted into the stack.
Stack is called LIFO (Last-in-first-Out) Str. i.e. the item just inserted is deleted first.
There are 2 basic operations associated with stack
1. Push :- This operation is used to insert an element into stack.
2. Pop :- This operation is used to delete an element from stack
Data 1
Array Representation of Stack :
To implement a stack in memory, we need a pointer variable called TOP that hold the index of
the top element of the stack, a linear array to hold the elements of the stack and a variable
MAXSTK which contain the size of the stack.
MAXSTK=5
Array representation of Stack is very easy and convenient but it allows only to
represent fixed sized stack.
But in several application size of the stack may very during program execution, at
that cases we represent a stack using linked list.
Data 2
INSERTION IN STACK (PUSH OPERATION)
This operation is used to insert an element in stack at the TOP of the stack.
Algorithm :-
stack. Algorithm :-
ARITHMETIC EXPRESSION
1. Infix notation
2. Prefix notation
3. Postfix notation
1. INFIX NOTATION
• The conventional way of writing an expression is called as
infix. Ex: A+B, C+D, E*F, G/M etc.
• Here the notation is
<Operand><Operator><Operand>
• This is called infix because the operators come in between the operands.
2. PREFIX NOTATION
• This notation is also known as “POLISH NOTATION”
• Here the notation is
<Operator><Operand><Operand>
• Ex: +AB, -CD, *EF, /GH
Data 2
3. POSTFIX NOTATION
• This notation is called as postfix or suffix notation where operator is suffixed
by operand.
• Here the notation is
<Operand ><Operand><Operator>
• Ex: AB+, CD-, EF*, GH/
• This notation is also known as “ REVERSE POLISH NOTATION.”
CONVERSION FROM INFIX TO POSTFIX EXPRESSION
Algorithm:
POLISH(Q,P)
Q is an arithmetic expression written in infix notation. This algorithm finds the equivalent
postfix expression P.
1. Push “ ( ” onto stack and add “ ) ” to the end of Q.
2. Scan Q from left to right and repeat steps 3 to 6 for each element of Q until the STACK
is empty.
7. Exit
Data 2
Ex: A+(B*C – (D/E^F)*G)*H)
Algorithm:
This algorithm finds the value of an arithmetic expression P written in Postfix notation.
1. Add a right parenthesis “ ) ” at the end of P.
2. Scan P from left to right and repeat steps 3 and 4 for each element of P until the “ ) ” is
encountered.
3. If an operand is encountered, put it on STACK.
4. If an operator X is encountered then:
a) Remove the two top element of STACK, where A is top element and B is the
next-to-top element.
Data 2
b) Evaluate B X A.
c) Place the result of (b) on
STACK. [End of if str.]
5. Set value equal to the top element on STACK.
6. Exit
Ex: 5, 6, 2, +, *, 12, 4, /, - )
Symbol STAC
Scanned K
5 5
6 5,6
2 5,6,2
+ 5,8
* 40
12 40,12
4 40,12,4
/ 40,3
- 37 (Result)
)
APPLICATIONS OF STACK:
Recursion process uses stack for its implementation.
Quick sort uses stack for sorting the elements.
Evaluation of antithetic expression can be sone by using STACK.
Conversion from infix to postfix expression
Evaluation of postfix expression
Conversion from infix to prefix expression
Evaluation of prefix expression.
Backtracking
Keeptrack of post-visited (history of a web- browsing)
IMPLEMENTATION OF RECURSION
A function is said to be Recursive function if it call itself or it call a function such that the
function call back the original function.
This concept is called as Recursion.
The Recursive function has two properties.
Data 2
I. The function should have a base condition.
II. The function should come closer towards the base condition.
The following is one of the example of recursive function which is described below.
Factorial of a no. using Recursion
n. n ! = 1 X 2 X 3 X.....................X (n-) X n
Physically proved n ! = n X (n-1) !
The factorial function can be defined as follows.
I. If n=0 then n ! = 1
II. If n>0 then n ! = n.(n-1) !
The factorial algorithm is given below factorial ( fact , n )
This procedure calculates n! and returns the value in the variable fact.
Data 2
Ex: Calculate the factorial of 4.
4!=4x3!
3!=3x2!
2!=2x1!
1!=1x0!
0!=1
1!=1x1=1
2!=2x1=
2
3!=3x2=
6
4!=4x6=
24
Data 2
QUEUE
• Queue is a linear data structure or sequential data structure where insertion take place at
one end and deletion take place at the other end.
• The insertion end of Queue is called rear end and deletion end is called front end.
• Queue is based on (FIFO) First in First Out Concept that means the node i.e. added first
is processed first.
• Here Enqueue is Insert Operation.
• Dequeue is nothing but Delete Operation.
The Queue is represented by a Linear array “Q” and two pointer variable FRONT
and REAR.
FRONT gives the location of element to be deleted and REAR gives the location
after which the element will be inserted.
The deletion will be done by
setting Front = Front + 1
The insertion will be done by
setting Rear = Rear + 1
INSERTION IN QUEUE(Enqueue)
Algorithm:
Data 2
else
Data 2
Rear = Rear +1
3. Q [Rear] = ITEM
4. Exit
DELETION IN QUEUE(Dequeue)
Algorithm:
Delete (Q, ITEM, FRONT, REAR)
This procedure remove element from queue Q.
1. If Front = Rear = NULL
Print ‘Underflow’ and Exit.
2. ITEM = Q (FRONT)
3. If Front = Rear
Front = NULL
Rear = NULL
else
Front = Front + 1
4. Exit
Data 2
Enqueue and Dequeue Operations
Data 3
CIRCULAR QUEUE
Let we have a queue that contain ‘n’ elements in which Q[1] comes after Q[n].
When this technic is used to construct a queue then the queue is called circular queue.
In Circular queue when REAR is ‘n’ and any element is inserted then REAR will set as 1.
Similarly when the front is n and any element is deleted then front will
be 1.
Data 3
DELETION ALGORITHM OF CIRCULAR QUEUE
Data 3
One way list representation
Array Representation
Here the priority queue is represented by means of 2D array.
Where row value represents the priority number and each row maintained as a circular queue.
Each row has its FRONT and REAR value to represent the starting and ending of row.
FRONT REAR
2 4
1 2
3 1
Data 3
Chapter-5
LINKED LIST
Linked List is a collection of data elements called as nodes.
The node has 2 parts
Info is the data part
Next i.e. the address part that means it points to the next node.
If linked list adjacency between the elements are maintained by means of links or pointers.
A link or pointer actually the address of the next node.
The last node of the list contain ‘\0’ NULL which shows the end of the list.
The linked list contain another pointer variable ‘Start’ which contain the address the first node.
Linked List is of 4 types
Single Linked List
Double Linked List
Circular Linked List
Header Linked List
Representation of Linked List in Memory
Linked List can be represented in memory by means 2 linear arrays i.e. Data or info and Link or
address.
They are designed such that info[K] contains the Kth element and Link[K] contains the next pointer
field i.e. the address of the Kth element in the list.
The end of the List is indicated by NULL pointer i.e. the pointer field of the last element will be
NULLorZero
Data 3
Start = 4
info [4] = N Link [4] = 3
info [3] = T Link [3] = 6
info [6] = E Link [6] = 2
info [2] = F Link [2] = 0 i.e. the null
value So the list has ended .
1. Traversal
Algorithm:
Display (Start) This algorithm traverse the list starting from the 1 st node to the end of the list.
Step-1 : “Start” holds the address of the first node.
Step-2 : Set Ptr = Start [Initializes pointer Ptr]
Step-3 : Repeat Step 4 to 5, While Ptr ≠
NULL
Step-4 : Process info[Ptr] [apply Process to
info(Ptr)] Step-5 : Set Ptr = next[Ptr] [move Print to
next node] [End of loop]
Step-6 : Exit
Data 3
2. Insertion
The insertion operation is used to add an element in an existing Linked List. There is various positions where
node can be inserted.
Suppose the new mode whose information field contains 20 is inserted as the first node.
Algorithm:
This algorithm is used to insert a node at the beginning of the Linked List. Start holds the address of the first
node.
Step-1 : Create a new node named as P.
exit. Step-3 : Set info [P] = 𝑥 (copies a new data into a new node)
Step-2 : If P == NULL then print “Out of memory space” and
Step-4 : Set next [P] = Start (new node now points to original first
node) Step-5 : Set Start = P
Step-6 : Exit
Data 3
Algorithm:
This algorithm is used to insert a node at the end of the linked list. ‘Start’ holds the address of the first
node. Step-1 : Create a new node named as P.
Exit. Step-3 : Set info [P] = 𝑥 (copies a new data into a new
Step-2 : If P = NULL then print “Out of memory space” and
node)
Step-4 : Set next [P] =
NULL Step-5 : Set Ptr =
Start
Step-6 : Repeat Step-7 while Ptr ≠
NULL Step-7 : Set temp = Ptr
Ptr = next [Ptr] (End of step-6
loop) Step-8 : Set next [temp] = P
Step-9 : Exit
To insert a new node at any specific location we scan the List from the beginning and move up to the desired
node where we want to insert a new node.
In the below fig. Whose information field contain 88 is inserted at 3rd location.
Algorithm:
‘Start’ holds the address of the 1st
node. Step-1 : Set Ptr = Start
Step-2 : Create a new node named as P.
Data 3
Step-6 : Read Loc Step-7 : Set i = 1
Step-8 : Repeat steps 9 to 11 while Ptr ≠ NULL and i <
Loc Step-9 : Set temp = Ptr.
Step-10 : Set Ptr = next [Ptr]
Step-11 : Set i = i + 1
[End of step-7 loop]
Step-12 : Set next [temp] = P
Step-13 : Set next [P] = Ptr
Step-14 : Exit
3. Deletion
The deletion operation s used to delete an element from a single linked list. There are various
position where node can be deleted.
Algorithm:
Start holds the address of the 1st
node. Step-1 : Set temp = Start
Step-2 : If Start = NULL then write ‘UNDERFLOW’ & Exit.
Step-3 : Set Start = next[Start]
Step-4 : Free the space associated with
temp. Step-5 : Exit
Data 3
Algorithm:
Start holds the address of the 1st node.
Step-1 : Set Ptr = Start Step-2 : Set temp = Start
Step-3 : If Ptr = NULL then write ‘UNDERFLOW’ & Exit. Step-4 : Repeat Step-5
and 6 While next[Ptr] ≠ NULL Step-5 : Set temp = Ptr
Step-6 : Set Ptr = next[Ptr] (End of step 4
loop) Step-7 : Set next[temp] = NULL
Step-8 : Free the space associated with Ptr.
Step-9 : Exit
Algorithm:
Start holds the address of the 1st node.
Step-1 : Set Ptr = Start
Step-2 : Set temp = Start
Step-3 : If Ptr = NULL then write ‘UNDERFLOW’ and
Exit. Step-4 : Set i = 1
Step-5 : Read Loc
Step-6 : Repeat Step-7 to 9 while Ptr ≠ NULL and i <
Loc Step-7 : Set temp = Ptr
Step-8 : Set Ptr = next[Ptr]
Step-9 : Set i = i+1
(End of Step 6
loop)
Step-10 : Set next[temp] = next[Ptr]
Step-11 : Free the space associated with
Ptr. Step-12 : Exit
4. SEARCHING
Data 3
Algorithm:
Start holds the address of the 1st node.
Step-1 : Set Ptr = Start
Step-2 : Set Loc = 1
Step-3 : Read
element
Step-4 : Repeat Step-5 and 7 While Ptr ≠ NULL
Step-5 : If element = info[Ptr] then Write ‘Element found at position’, Loc and
Exit. Step-6 : Set Loc = Loc+1
Step-7 : Set Ptr = next[Ptr]
(End of step 4 loop)
Step-8 : Write ‘Element not
found’ Step-9 : Exit
This data part of this header node is generally used to hold any global information about the
entire linked list. The next part of the header node points to the first node in the list.
i) Grounded header linked list that stores NULL in the last node’s next field.
ii) Circular header linked list that stores the address of the header node in the next part of the
last node of the list.
Data 4
Garbage Collection
The operating system of a computer may periodically collect all the deleted space on to the free
storage list. Any technique which does these collections is called garbage collection.
When we delete a particular note from an existing linked list or delete the linked list the space
occupied by it must be given back to the free pool. So that the memory can be the used by some
other program that needs memory space.
The operating system will perform this operation whenever it finds the CPU idle or whenever
the programs are falling short of memory space. The OS scans through the entire memory cell &
marks those cells. That are being by some program then it collects the entire cell which are not
being used & add to the free pool. So that this cells can be used by other programs. This process
is called garbage collection. The garbage collection is invisible to the programmer.
Data 4
Chapter-6
TREE
A tree is non-linear and a hierarchical data structure consisting of a collection of nodes such that
each node of the tree stores a value.
Application:
The following are the applications of trees:
Storing naturally hierarchical data: Trees are used to store the data in the
hierarchical structure. For example, the file system.
Organize data: It is used to organize data for efficient insertion, deletion and searching.
Trie: It is a special kind of tree that is used to store the dictionary. It is a fast and
efficient way for dynamic spell checking.
Heap: It is also a tree data structure implemented using arrays. It is used to
implement priority queues.
B-Tree and B+Tree: B-Tree and B+Tree are the tree data structures used to
implement indexing in databases.
Routing table: The tree data structure is also used to store the data in routing tables
in the routers.
Basic Tree Terminologies
1. Node
A node is an entity that contains a key or value .
The last nodes of each path are called leaf nodes or external nodes that do not contain a
link/pointer to child nodes.
The node having at least a child node is called an internal node.
2. Root Node: The topmost node of a tree or the node which does not have any parent node is
called the root node. {1} is the root node of the tree. A non-empty tree must contain exactly
one root node .
3. Parent Node: The node which is a predecessor of a node is called the parent node of
that node. {2} is the parent node of {6, 7}.
4. Child Node: The node which is the immediate successor of a node is called the child node of
that node. Examples: {6, 7} are the child nodes of {2}.
Data 4
5. Degree of a Node: The total count of subtrees attached to that node is called the degree of the
node. The degree of a leaf node must be 0. The degree of a tree is the degree of its root. The
degree of the node {3} is 3. Leaf Node or External Node: The nodes which do not have any
child nodes are called leaf nodes. {6, 14, 8, 9, 15, 16, 4, 11, 12, 17, 18, 19} are the leaf nodes of
the tree
6. Sibling: Children of the same parent node are called siblings. {8, 9, 10} are called siblings.
7. Depth of a node: The count of edges from the root to the node. Depth of node {14} is 3.
8. Height of a node: The number of edges on the longest path from that node to a leaf. Height of
node {3} is 2.
9. Height of a tree: The height of a tree is the height of the root node i.e the count of edges
from the root to the deepest node. The height of the above tree is 3.
10. Level of a node: The count of edges on the path from the root node to that node. The
root node has level 0.
11. Internal node: A node with at least one child is called Internal Node.
12. Ancestor of a Node: Any predecessor nodes on the path of the root to that node are
called Ancestors of that node. The node 1 is called ancestor of node7 as there is successive
parent present from node 7 to node 1. {1, 2} are the parent nodes of the node {7}
13. Descendant: Any successor node on the path from the leaf node to that node. {7, 14} are the
descendants of the node. {2}.
14. Path: A sequence of edges is called as path.
Data 4
Binary Tree
Each node of a binary tree consists of three items:
data item
address of left child
address of right child
A binary tree is a tree-type non-linear data structure with a maximum of two children for each
parent. Here, binary name itself suggests that 'two'; therefore, each node can have either 0, 1 or 2
children.
Types of Binary Tree
1. Full/ proper/ strict Binary tree
The full binary tree is also known as a strict binary tree. The tree can only be considered as the
full binary tree if each node must contain either 0 or 2 children. The full binary tree can also
be defined as the tree in which each node must contain 2 children except the leaf nodes.
The complete binary tree is a tree in which all the nodes are completely filled except the last
level. In the last level, all the nodes must be as left as possible. In a complete binary tree,
the nodes should be added from the left.
If we number the nodes of the complete binary tree from left to right level by level then the node
k has its left child at 2k position and right child at 2k+1 position.
Data 4
Binary Tree Representation in memory
A tree must represent a hierarchical relationship between parent node and child node.
Let T be a Binary Tree. There are two ways of representing T in the memory as follow
1. Sequential or Linear Representation of Binary Tree.
2. Link Representation of Binary Tree.
In this
representation of binary tree root will contain the location of the root R of T. If any one of the
Data 4
subtree is empty, then the corresponding pointer will contain the null value if the tree T itself is
empty, the ROOT will contain the null value.
Example
Consider the binary tree T in the figure. A schematic diagram of the linked list representation of
T appears in the following figure. Observe that each node is pictured with its three fields, and
that the empty subtree is pictured by using x for null entries.
Binary Tree
Traversal is a process to visit all the nodes of a tree and may print their values too. Because, all
nodes are connected via edges (links) we always start from the root (head) node. That is, we
cannot randomly access a node in a tree.
There are three ways which we use to traverse a tree −
In-order Traversal
Pre-order Traversal
Post-order Traversal
[Link]-order Traversal
This is also known as symmetric order.
This traversal is referred as LNR.
Algorithm-
Data 4
3. Visit all the nodes in the right subtree
Left → Root → Right
Application-
Inorder traversal is used to get infix expression of an expression tree.
2. Preorder Traversal-
This is also known as depth fist
order. This traversal is referred as
NLR. Algorithm-
1. Visit the root
2. Traverse the left sub tree i.e. call Preorder (left sub tree)
3. Traverse the right sub tree i.e. call Preorder (right sub
Applications-
Preorder traversal is used to get prefix expression of an expression [Link] traversal is used
to create a copy of the tree.
3. Postorder Traversal-
This traversal is referred as LRN.
Data 4
Algorithm-
1. Traverse the left sub tree i.e. call Postorder (left sub tree)
2. Traverse the right sub tree i.e. call Postorder (right sub tree)
3. Visit the root
Applications-
Postorder traversal is used to get postfix expression of an expression tree.
Postorder traversal is used to delete the tree.
This is because it deletes the children first and then it deletes the parent.
Binary Search tree
A binary tree T is termed as Binary Search Tree if each node n of T satisfies the following
property.
The value of n is greater than the value of all nodes in its left subtree.
The value of n is less than the value of all nodes in its right subtree.
Data 4
o First, we have to insert 45 into the tree as the root of the tree.
o Then, read the next element; if it is smaller than the root node, insert it as the root of
the left subtree, and move to the next element.
o Otherwise, if the element is larger than the root node, then insert it as the root of the
right subtree.
Now, let's see the process of creating the Binary search tree using the given data element. The
process of creating the BST is shown below -
Step 1 - Insert 45.
Data 4
Step 6 - Insert 55.
55 is larger than 45 and smaller than 79, so it will be inserted as the left subtree of 79.
12 is smaller than 45 and 15 but greater than 10, so it will be inserted as the right subtree of 10.
20 is smaller than 45 but greater than 15, so it will be inserted as the right subtree of 15.
Data 5
Step 9 - Insert 50.
50 is greater than 45 but smaller than 79 and 55. So, it will be inserted as a left subtree of 55.
Data 5
Now, let's understand the searching in binary tree using an example. We are taking the binary
search tree formed above. Suppose we have to find node 20 from the below tree.
Step1:
Step2:
Step3:
Data 5
Now, let's see the algorithm to search an element in the Binary search tree.
Algorithm to search an element in Binary search tree
1. Search (root, item)
2. Step 1 - if (item = root → data) or (root = NULL)
3. return root
4. else if (item < root → data)
5. return Search(root → left, item)
6. else
7. return Search(root → right, item)
8. END if
9. Step 2 - END
Data 5
Deletion in Binary Search tree
In a binary search tree, we must delete a node from the tree by keeping in mind that the property
of BST is not violated. To delete a node from BST, there are three possible situations occur -
o The node to be deleted is the leaf node, or,
o The node to be deleted has only one child, and,
o The node to be deleted has two children
1. When the node to be deleted is the leaf node
o It is the simplest case to delete a node in BST. Here, we have to replace the leaf node
with NULL and simply free the allocated space.
o In below image, suppose we have to delete node 90, as the node to be deleted is a leaf
node, so it will be replaced with NULL, and the allocated space will free.
Data 5
In the below image, suppose we have to delete the node 79, as the node to be deleted has only
one child, so it will be replaced with its child 55.
So, the replaced node 79 will now be a leaf node that can be easily deleted.
Data 5
Chapter-7
GRAPH
A graph is a non-linear kind of data structure made up of nodes or vertices and edges.
Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of edges,
connecting the pairs of vertices.
In the above Graph, the set of vertices V = {0,1,2,3,4} and the set of edges E = {01, 12, 23, 34,
04, 14, 13}.
Applications
Graphs are used to solve many real-life problems. Graphs are used to represent networks. The
networks may include paths in a city or telephone network or circuit network. Graphs are also
used in social networks like linkedIn, Facebook. For example, in Facebook, each person is
represented with a vertex(or node). Each node is a structure and contains information like person
id, name, gender etc.
Graph Terminologies
1. Undirected: A graph in which all the edges are bi-directional. The edges do not point in a
specific direction.
Data 5
2. Directed Graph or Digraph: A graph G = (V, E) with a mapping f such that every edge maps
onto some ordered pair of vertices (Vi, Vj) is called Digraph. It is also called Directed Graph.
Ordered pair (Vi, Vj) means an edge between Vi and Vj with an arrow directed from Vi to Vj.
3. Weighted Graph: A graph that has a value associated with every edge. The values
corresponding to the edges are called weights. A value in a weighted graph can represent
quantities such as cost, distance, and time, depending on the graph. Weighted graphs are
typically used in modeling computer networks.
4. Unweighted Graph: A graph in which there is no value or weight associated with the edge.
All the graphs are unweighted by default unless there is a value associated.
Data 5
5. Complete Graph: A simple graph with n vertices is called a complete graph if the degree of
each vertex is n-1, that is, one vertex is attach with n-1 edges. A complete graph is also called
Full Graph.
6. Cyclic Graph: For a graph to be called a cyclic graph, it should consist of at least one cycle. If
a graph has a minimum of one cycle present, it is called a cyclic graph.
7. Acyclic Graph: A graph is called an acyclic graph if zero cycles are present, and an acyclic
graph is the complete opposite of a cyclic graph.
Data 5
12. Adjacent Nodes: Two nodes are called adjacent if they are connected through an edge.
13. Path: sequence of vertices in which each pair of successive vertices is connected by an edge
14. Isolated Vertex: A vertex with degree zero is called an isolated vertex.
Example:
Here, the vertex 'a' and vertex 'b' has a no connectivity between each other and also to any other
vertices. So the degree of both the vertices 'a' and 'b' are zero. These are also called as isolated
vertices.
Representation of Graph
The graph can be represented as matrices.
Adjacency Matrix
Adjacency Matrix is used to represent a graph. We can represent directed as well as undirected
graphs using adjacency matrices..
If a graph has n number of vertices, then the adjacency matrix of that graph is n x n, and each
entry of the matrix represents the number of edges from one vertex to another.
Adjacency Matrix Representation for undirected graph
If an Undirected Graph G consists of n vertices then the adjacency matrix of a graph is n x n
matrix A = [aij] and defined by -
aij = 1 {if there is a path exists from Vi to
Vj} aij = 0 {Otherwise}
In an undirected graph, if there is an edge exists between Vertex A and Vertex B, then the
vertices can be transferred from A to B as well as B to A.
Data 5
The adjacency matrix of the above graph will be -
In a directed graph, edges form an ordered pair. Edges represent a specific path from some vertex
A to another vertex B. Node A is called the initial node, while node B is called the terminal node.
Data 6
Properties of the adjacency matrix
o An adjacency matrix is a matrix that contains rows and columns used to represent a
simple labeled graph with the numbers 0 and 1 in the position of (VI, Vj), according to the
condition of whether or not the two Vi and Vj are adjacent.
o For an undirected graph, if there is an edge that exists between vertex i or Vi to Vertex
j or Vj, then the value of A[Vi][Vj] = 1 and A[Vj][Vi] = 1, otherwise the value will be 0.
Data 6
Chapter-8
SORTING SEARCHING & MERGING
Sorting refers to the operation or technique of arranging and rearranging sets of data in some
specific order. Sorting can be done in ascending and descending order such as such as
increasing or decreasing with numerical data or alphabetically with character data.
BUBBLE SORT
Bubble sort is a simple sorting algorithm. Bubble sort works on the repeatedly swapping of
adjacent elements until they are not in the intended order. This algorithm is not suitable for
large data sets as its average and worst case complexity are of Ο(n2) where n is the number of
items.
Algorithm:
a is the list of elements stored in memory to be sorted.
m is the size of the list.
bubbleSort(a[ ],m)
1. for(pass=0;pass<m-1;pass++)
2. for (i=0; i<m-pass-1;i++)
3. if (a[i]>a[i+1])
4. temp = a[i];
5. a[i] =a[i+1];
6. a[i+1]=temp;
End
First Pass
Sorting will start from the initial two elements. Let compare them to check which is greater.
Here, 32 is greater than 13 (32 > 13), so it is already sorted. Now, compare 32 with 26.
Here, 26 is smaller than 36. So, swapping is required. After swapping new array will look like -
Here, 35 is greater than 32. So, there is no swapping required as they are already sorted.
Now, the comparison will be in between 35 and 10.
Data 6
Here, 10 is smaller than 35 that are not sorted. So, swapping is required. Now, we reach at the
end of the array. After first pass, the array will be -
Here, 10 is smaller than 32. So, swapping is required. After swapping, the array will be -
Here, 10 is smaller than 26. So, swapping is required. After swapping, the array will be -
Data 6
Time Complexity
Space Complexity
Space Complexity O(1)
Quick sort
Quick sort is a faster and highly efficient sorting algorithm which follows the divide and
conquers approach.
Divide and conquer is a technique of breaking down the algorithms into sub problems, then
solving the sub problems, and combining the results back together to solve the original
problem.
Divide: In Divide, first pick a pivot element. After that, partition or rearrange the array into two
sub-arrays such that each element in the left sub-array is less than or equal to the pivot element
and each element in the right sub-array is larger than the pivot element.
Conquer: Recursively, sort two subarrays with Quicksort.
Combine: Combine the already sorted array.
Choosing the pivot
Pivot can be random, i.e. select the random pivot from the given array.
Pivot can either be the rightmost element of the leftmost element of the given array.
Select median as the pivot element.
Algorithm:
a is the list of elements stored in memory to be sorted.
beg is the initial index of the list.
end is the last index of the list.
Pivot is the position in the list where partition takes place.
quickSort(a[],beg,end)
1. if(beg< end)
2. pivot = partition(a, beg, end)
3. quickSort(a, beg, pivot-1)
4. quickSort(a, pivot+1, end)
End
partition(a, beg,
end) [Link] =
a[beg] 2.i= beg
Data 6
3. for (j=beg+1; j<=end; j++)
4. if (a[j]<=
pivotItem) 5. i=i+1
6. if(i!=j)
7. swap(a[i],a[j])
8. swap(a[beg], a[i])
9. return
i End
In the given array, we consider the leftmost element as pivot. So, in this case, a[left] = 24,
a[right] = 27 and a[pivot] = 24.
Since, pivot is at left, so algorithm starts from right and move towards left.
Now, a[pivot] < a[right], so algorithm moves forward one position towards left, i.e. –
Now, a[left] = 19, a[right] = 24, and a[pivot] = 24. Since, pivot is at right, so algorithm starts
from left and moves to right.
As a[pivot] > a[left], so algorithm moves one position to right as -
Data 6
Now, a[left] = 9, a[right] = 24, and a[pivot] = 24. As a[pivot] > a[left], so algorithm moves one
position to right as -
Now, a[left] = 29, a[right] = 24, and a[pivot] = 24. As a[pivot] < a[left], so, swap a[pivot] and
a[left], now pivot is at left, i.e. –
Since, pivot is at left, so algorithm starts from right, and move to left. Now, a[left] = 24,
a[right] = 29, and a[pivot] = 24. As a[pivot] < a[right], so algorithm moves one position to left,
as –
Now, a[pivot] = 24, a[left] = 24, and a[right] = 14. As a[pivot] > a[right], so, swap a[pivot] and
a[right], now pivot is at right, i.e. –
Now, a[pivot] = 24, a[left] = 14, and a[right] = 24. Pivot is at right, so the algorithm starts from
left and move to right.
Data 6
Now, a[pivot] = 24, a[left] = 24, and a[right] = 24. So, pivot, left and right are pointing the
same element. It represents the termination of procedure.
Element 24, which is the pivot element is placed at its exact position.
Elements that are right side of element 24 are greater than it, and the elements that are left side
of element 24 are smaller than it.
Now, in a similar manner, quick sort algorithm is separately applied to the left and right sub-
arrays. After sorting gets done, the array will be –
1. Time Complexity
2. Space Complexity
Space Complexity O(n*logn)
MERGING:
The operation of sorting is closely related to the process of merging. The merging of two order
table which can be combined to produce a single sorted table.
This process can be accomplishes easily by successively selecting the record with the smallest
key occurring by either of the table and placing this record in a new table.
SIMPLE MERGE
SIMPLE MERGE [FIRST,SECOND,THIRD,K]
Given two orders in table sorted in a vector K with FIRST, SECOND, THIRD
Data 6
The variable I & J denotes the curser associated with the FIRST & SECOND table
respectively. L is the index variable associated with the vector TEMP.
Algorithm
Step 1:
[Initialize] Set I =
FIRST
Set J = SECOND
Set L = 0
Step 2: [Compare corresponding elements and output the smallest]
Repeat while I < SECOND & J < THIRD
If K[I] <= K[J],then L=L+1
TEMP [L]=K[I]
I=I+1
Else L=L+1
TEMP[L]=K[J]
J=J+1
Step 3: [Copy remaining unprocessed element in output area]
If I>=SECOND
Then repeat while J<=THIRD
L=L+1
TEMP[L]=K[J]
J=J+1
Else
Repeat while I< SECOND
L=L+1
TEMP[L]=K[I]
I=I+1
Step 4: [Copy the element into temporary vector into original area]
Repeat for I=1,2,…..L
K[FIRST-I+1]=TEMP[I]
Step 5: Exit
SEARCHING
Finding the location of a given item in a particular array is called as searching.
These are 2 types Searching
1. Linear Search
2. Binary Search
Linear Search
Suppose ‘a’ is an array with n element.
To find an item we have to search the array ‘a’ from the first element sequentially.
This sequential search is known as Linear Search.
Data 6
Algorithm:
void linearsearch (int a[], int n, int no)
{
int loc = -1, i ; for (i=0; i<n; i++)
{
if (a[i] == no) loc = i ;
}
if (loc == -1)
Printf (“Data not found”);
else
Printf(“Data found at %d position/location”,loc);
}
Where a[] = name of the array
n = no. of element present in array ‘a’
no = number to be searched loc = location to be searched
Binary Search
- Sequential Search is a simple method for searching an element in array.
- But this is not efficient for large list because we will have to make n comparison to
search for the last element in the list.
- The binary search technique is very much efficient and applied only in sorted
array. Ex: Searching a name in telephone directory or searching a word in dictionary.
- In this method we make a comparison between the key and the middle element of the
array.
- Since the array is sorted this comparison results either in a match between key and the
middle element of the array or identifying the left half or result half of the array.
- When the current element is not equal to the middle element of the array, the procedure
is repeated on the half in which the desired element is likely to be present.
- Proceeding in this way, either the element is detected or final division leads to a half
consisting of no element.
Algorithm:
int BinarySearch (int key, int a[], int n)
{
int low, hi, mid;
low = 0 ;
hi = n-1;
while (low<=hi)
{
mid = (low+hi)/2;
if (key = = a[mid])
Data 6
return(mid);
if (key<a[mid])
hi = mid-1;
else
low = mid+1;
}
return(-1);
}
Data 7
Example:
Suppose the array elements are
11, 22, 33, 44, 55, 66, 77
Is 33 == a[3] No
Is 33 < a[3] yes
the steps will be repeated for lower half
hi = mid-1=3-1=2
low = 0
Is low <= hi
Mid=(0+2)/2=1
Is 33 == a[2]
Yes the search is successful at index 2.
Data 7
Chapter-9
FILE ORGANIZATION
File:
A file is an external collection of related data treated as a unit.
Files are stored in auxiliary/secondary storage devices.
– Disk
– Tapes
A file is a collection of data records with each record consisting of one or more fields.
A file is a collection of data records grouped together for purpose of access control
andmodification.
It is the primary resource in which we can store information and can retrieve the
information when it is required.
The absolute file name consists of:
• drive name
• directory name(s)
• file name
• Extension
For example: d:/network/[Link]
Terms Used with Files
Field: This is the basic element of data which contains a single value and
characterized by itslength and data type.
Record: This is the collection of related fields that can be treated as a unit by some
applicationprogram.
Data 7
PowerPoint Microsoft PowerPoint
.ppt
Presentation
File Operations:
Different types of operations can be performed on the file.
• Create
• Delete
• Open
• Close
• Read
• Write
Data 7
Direct access/ Random access method:
Direct access is also called as relative access.
In this method records can read/write randomly without any order.
The direct access method is based on disk model of a file, because disks allow random
access to any file block.
A direct access file allows arbitrary blocks to be read or written.
For ex. A disk consisting of 256 blocks, the current position of
read/write head is at 95th block. The block to be read or write is 250th block. Then we can
access the 250th block directly without any direction.
Best example for direct access is a CD consisting of 10 songs, at present we are listening
the song no.3, suppose we want listening the song no.9, then we can shift from song no.3
to 9 without any restriction.
Data 7
ADDRESS CALCULATION OR HASHING
Hashing is the function to generate a physical address from the record key value.
So hashing is a process to generate the physical address from key value in direct access file.
H(k) =
- The hash function should be as far as possible uniformly distribute the hash address
throughout the set L, so that there will be minimum no. of collision.
Data 7
The remainder is calculated physical address for that record.
Suppose key value of the record is 2345.
i.e. k =2345 and m = 97 then H(k) = 2345 % 97=17
So that particular record will be sorted in 17 location of memory.
Generally m should be a prime number.
Mid Square method
The key value is squared. Then the mid digits are the address for that key.
Suppose key value = 2345
k2 = 5499025
If the memory is of two bit then mid will be 99.
Folding Method
In folding method the key value is folded into no. of parts, where each part contains same no. of
digits except possibly the last.
H(k) = k1 + k2 +k3…..kn
Suppose key value = 2356
k = 2356 k1 = 23, k2 = 56
23+56 = 79
So the address for that record is 79.
Collision
Suppose we have calculated an address L for a given key value k. But if that location L is
previously occupied by some other record then that situation is known as collision.
So if the hash function will generate same address for two different key values. Then it is
said tobe collision.
Consider the division remainder
[Link] k1 = 150 , k2 = 184 , m
=17
H (k1) = 150 mod 17 = 14
H (k2) = 184 mod 17 = 14
So for key values 150 and 184 the calculated address is 14. This is collision.
Data 7
Advantages
- Accessing a record is faster.
- No need to arrange the record on a sorted order.
- If required, records can be processed sequentially.
- Insertion, deletion is easier
Disadvantage
- Expensive I/O device as compared to sequential file structure.
- Address generation overhead due to hashing function.
- Updation of all record is inefficient as compared to sequential file.
Application
Interactive online application such as airline and railway reservation system.
Data 7
References
Data 7