UNIT 1
INTRODUCTION TO DATA STRUCTURES & ALGORITHMS
Basic Terminologies:
Data
Raw facts and figures that have no meaning until processed.
Example: 10, 20, 30 are data values.
Information
Processed data that has meaning.
Example: “The average marks of a student is 20” → meaningful information.
Data Item
A single unit of data that represents a value.
Example: In student record → RollNo, Name, Marks are data items.
Record
A collection of related data items treated as a unit.
Example: A student record may contain (RollNo, Name, Marks).
Field
A single piece of data inside a record.
Example: “Name” in a student record is a field.
Key
A special data item used to uniquely identify a record.
Example: Student Roll Number.
File
A collection of records stored on a storage medium.
Example: A file containing all student records.
Data → Information → Data Item → Record → File → Data Structure.
Concept of Data Structure
Definition:
A data structure is a way of organizing, storing, and managing data so that it can
be accessed and modified efficiently.
It defines the relationship between data elements and provides operations to
manipulate them (such as insertion, deletion, searching, and updating).
Example:
• Imagine a library: Books are not kept randomly; they are organized in shelves
according to subjects or categories. This organization helps us find a book quickly.
• Similarly, in computer science, a data structure organizes data in memory so that it
can be accessed and processed effectively.
Types of Data Structure:
Primitive Data Structures:
• Primitive Data Structures are the basic building blocks for representing simple data.
• They are directly supported by most programming languages.
• All other Non-Primitive Data Structures (like arrays, linked lists, stacks, trees,
etc.) are built using these primitives.
Non-Primitive Data Structures
• Non-Primitive Data Structures are derived from primitive data types.
• They are used to handle large and complex data efficiently.
• They can be linear or non-linear depending on how data elements are arranged.
Types of Non-Primitive Data Structures
1. Linear Data Structures
A linear data structure is one in which data elements are arranged sequentially, i.e.,
one after another. Each element has a unique predecessor (previous element) and a
unique successor (next element), except the first and last elements.
Characteristics
1. Elements are arranged in a sequence.
2. Easy to traverse (visit each element one by one).
3. Memory may be contiguous (arrays) or linked dynamically (linked lists).
Examples: Array, Linked List, Stack, Queue.
1. Array:
Collection of elements of the same type stored in contiguous memory locations.
Characteristics of Array Data Structure:
• Homogeneous Elements: All elements within an array must be of the same data
type.
• Contiguous Memory Allocation: In most programming languages, elements in an
array are stored in contiguous (adjacent) memory locations.
• Zero-Based Indexing: In many programming languages, arrays use zero-based
indexing, which means that the first element is accessed with an index of 0, the
second with an index of 1, and so on.
• Random Access: Arrays provide constant-time (O(1)) access to elements. This
means that regardless of the size of the array, it takes the same amount of time to
access any element based on its index.
2. Linked List:
A Linked List is a linear data structure which looks like a chain of nodes, where
each node contains a data field and a reference(link) to the next node in the list.
Unlike Arrays, Linked List elements are not stored at a contiguous location.
Common Features of Linked List:
Node: Each element in a linked list is represented by a node, which contains two
components:
Data: The actual data or value associated with the element.
Next Pointer(or Link): A reference or pointer to the next node in the linked list.
Head: The first node in a linked list is called the "head." It serves as the starting
point for traversing the list.
3. Stack:
A stack is a linear data structure that follows the Last-In-First-Out (LIFO)
principle, meaning that the last element added to the stack is the first one to be
removed.
Stack Operations:
• push(): When this operation is performed, an element is inserted into the stack.
• pop(): When this operation is performed, an element is removed from the top of
the stack and is returned.
• top(): This operation will return the last inserted element that is at the top without
removing it.
• size(): This operation will return the size of the stack i.e. the total number of
elements present in the stack.
• isEmpty(): This operation indicates whether the stack is empty or not.
4. Queue
A queue is a linear data structure that follows the First-In-First-Out (FIFO)
principle. In a queue, the first element added is the first one to be removed.
4. Queue Operations:
• Enqueue(): Adds (or stores) an element to the end of the queue..
• Dequeue(): Removal of elements from the queue.
• Peek() or front(): Acquires the data element available at the front node of the
queue without deleting it.
• rear(): This operation returns the element at the rear end without removing it.
• isFull(): Validates if the queue is full.
• isNull(): Checks if the queue is empty.
Advantages of Linear Data Structures
• Efficient data access: Elements can be easily accessed by their position in the
sequence.
• Dynamic sizing: Linear data structures can dynamically adjust their size as
elements are added or removed.
• Ease of implementation: Linear data structures can be easily implemented using
arrays or linked lists.
• Versatility: Linear data structures can be used in various applications, such as
searching, sorting, and manipulation of data.
• Simple algorithms: Many algorithms used in linear data structures are simple
and straightforward.
Disadvantages of Linear Data Structures
• Limited data access: Accessing elements not stored at the end or the beginning
of the sequence can be time-consuming.
• Memory overhead: Maintaining the links between elements in linked lists and
pointers in stacks and queues can consume additional memory.
• Complex algorithms: Some algorithms used in linear data structures, such as
searching and sorting, can be complex and time-consuming.
• Inefficient use of memory: Linear data structures can result in inefficient use of
memory if there are gaps in the memory allocation.
• Unsuitable for certain operations: Linear data structures may not be suitable for
operations that require constant random access to elements, such as searching for
an element in a large dataset.
2. Non-Linear Data Structures
A non-linear data structure is one in which data elements are not arranged
sequentially.
Instead, they are arranged hierarchically (like a tree) or connected in a network (like
a graph).
Characteristics
1. Data elements are not stored in a straight sequence.
2. Each element may have multiple relationships.
3. Traversal is more complex compared to linear structures.
Examples: Tree, Graph.
1. Tree:
Tree data structure is a hierarchical structure that is used to represent and organize
data in the form of parent child relationship. The following are some real world
situations which are naturally a tree.
File system in a computer → Root directory → Sub-directories → Files.
2. Graph:
A graph is a non-linear data structure consisting of nodes (or vertices) and edges that
connect pairs of nodes. Graphs are widely used to represent networks, such as social
networks, communication networks, or road systems.
Basic Operations on Data Structures
1. Insertion – Adding a new element.
2. Deletion – Removing an element.
3. Traversal – Visiting each element once.
4. Searching – Finding an element.
5. Sorting – Arranging in order.
6. Merging – Combining two structures.
Why Data Structures are Important:
• Efficient data retrieval and modification.
• Optimal memory usage.
• Better program performance.
Choice of the Right Data Structure
Choosing the correct data structure is crucial for solving a problem efficiently. A
poor choice may lead to wastage of memory or slow execution time.
Factors to Consider:
1. Nature of the Problem
Example: If you need fast searching, a Hash Table or Binary Search Tree is better
than an array.
2. Operations Required
Example: If frequent insertion and deletion are needed, Linked List is better than
Array because arrays require shifting of elements.
3. Memory Usage
Example: Arrays require contiguous memory allocation, while linked lists use
dynamic memory but with extra overhead of pointers.
4. Performance (Time Complexity)
Example:
Searching in an Array (unsorted) → O(n)
Searching in a Binary Search Tree (balanced) → O(log n)
Searching in a Hash Table → O(1) on average
Examples of Choosing Data Structures
Scenario 1: Student records with roll numbers (frequent searching)
→ Use Hash Table or Binary Search Tree.
Scenario 2: Undo operation in a text editor
→ Use Stack (LIFO principle).
Scenario 3: CPU Task Scheduling
→ Use Queue (FIFO principle).
Scenario 4: Social Network connections
→ Use Graph (nodes = users, edges = friendships).
Algorithm
An Algorithm is a finite, step-by-step set of instructions to solve a specific problem.
It is language-independent and focuses on logic, not syntax.
Properties of an Algorithm:
1. Input: Zero or more inputs.
2. Output: At least one output.
3. Definiteness: Clear and unambiguous steps.
4. Finiteness: Must terminate after a finite number of steps.
5. Effectiveness: Steps should be basic and executable.
6. Clarity – Easy to understand and implement.
7. Generality – Works for all valid inputs, not just specific cases.
Steps in Designing an Algorithm
Step 1: Understand the Problem
Clearly define input, output, and the goal.
➢ What is given?
➢ What is required?
➢ What are the constraints?
Example:
Problem: Find the maximum of three numbers.
Input: Three numbers (say a, b, c)
Output: The largest number
Step 2: Define the Problem Requirements
➢ What operations are needed? (Comparison, addition, etc.)
➢ What kind of data structure might be required? (Array, List, etc.)
➢ What performance is expected? (Fast execution, less memory use?)
Step 3: Algorithm Design
➢ Break down the problem into small steps.
➢ Think logically about the order of operations.
➢ Consider multiple strategies (Brute force, Divide & Conquer, Greedy, Dynamic
Programming, etc.)
Example (Max of three numbers):
1. Compare a and b.
2. If a > b, compare a with c.
3. Else, compare b with c.
4. Whichever is greater is the maximum.
Step 4: Represent the Algorithm
• Use pseudocode or flowchart
Pseudocode Example:
Start
Algorithm Max(a, b, c):
1. If a ≥ b and a ≥ c
return a is greater
2. Else if b ≥ a and b ≥ c
return b is greater
3. Else
return c is greater
end
Flowchart can be drawn for better visualization (input → decision boxes → output).
Step 5: Refine the Algorithm
Check for correctness (does it always give the right answer?).
Check for efficiency:
Time Complexity – How fast does it run?
Space Complexity – How much memory does it use?
Step 6: Translate into Code
Implement the algorithm in a programming language .
C Example:
#include <stdio.h>
int main() {
int a, b, c;
printf("Enter three numbers: ");
scanf("%d %d %d", &a, &b, &c);
if(a >= b && a >= c)
printf("Maximum = %d", a);
else if(b >= a && b >= c)
printf("Maximum = %d", b);
else
printf("Maximum = %d", c);
return 0;
}
Step 7: Test the Algorithm
• Test with different cases:
o Normal input: (10, 20, 30) → Output = 30
o Equal values: (10, 10, 10) → Output = 10
o Negative values: (-5, -2, -9) → Output = -2
Step 8: Analyze and Optimize
• Measure performance on large inputs.
• Improve efficiency if needed (e.g., reduce comparisons, use better data structure).
Analysis of Algorithm
Apriori Analysis:
Apriori analysis means, analysis is performed prior to running it on a specific
system. This analysis is a stage where a function is defined using some theoretical
model. Hence, we determine the time and space complexity of an algorithm by just
looking at the algorithm rather than running it on a particular system with a different
memory, processor, and compiler.
Apostiari Analysis:
Apostiari analysis of an algorithm means we perform analysis of an algorithm only
after running it on a system. It directly depends on the system and changes from
system to system.
In an industry, we cannot perform Apostiari analysis as the software is generally
made for an anonymous user, which runs it on a system different from those present
in the industry.
In Apriori, it is the reason that we use asymptotic notations to determine time and
space complexity as they change from computer to computer; however,
asymptotically they are the same.
Parameters to Measure Complexities
1. Time Complexity
Time complexity is the amount of time taken by an algorithm to execute each
statement of the code till its completion. The time taken here is for the function of
the length of the input and not the actual execution time of the machine on which the
algorithm is running. The time complexity of algorithms is commonly expressed
using the Big O notation.
To calculate the time complexity, total the cost of each fundamental instruction and
the number of times the instruction is executed.
• Best case: Minimum running time.
• Worst case: Maximum running time.
• Average case: Expected running time.
Asymptotic Notations
• Asymptotic Notations are mathematical tools used to analyze the performance of
algorithms by understanding how their efficiency changes as the input size grows.
• These notations provide a concise way to express the behavior of an algorithm’s time
or space complexity as the input size approaches infinity.
• Rather than comparing algorithms directly, asymptotic analysis focuses on
understanding the relative growth rates of algorithms’ complexities.
• It enables comparisons of algorithms’ efficiency by abstracting away machine-
specific constants and implementation details, focusing instead on fundamental
trends.
• Asymptotic analysis allows for the comparison of algorithms’ space and time
complexities by examining their performance characteristics as the input size varies.
There are mainly three asymptotic notations:
1. Big-O Notation (O-notation)
2. Omega Notation (Ω-notation)
3. Theta Notation (Θ-notation)
Big Oh, O: Asymptotic Upper Bound
The notation (n) is the formal way to express the upper bound of an algorithm's
running time. is the most commonly used notation. It measures the worst case time
complexity or the longest amount of time an algorithm can possibly take to
complete.
A function f(n) can be represented is the order of g(n) that is O(g(n)), if there exists
a value of positive integer n as n0 and a positive constant c such that −
f(n)⩽c.g(n) for n>n0 in all case
Hence, function g(n) is an upper bound for function f(n), as g(n) grows faster
than f(n).
Example
Let us consider a given function, f(n)=4.n3+10.n2+5.n+1
Considering g(n)=n3,
f(n)⩽5.g(n) for all the values of n>2
Hence, the complexity of f(n) can be represented as O(g(n)), i.e. O(n3)
Big Omega, Ω: Asymptotic Lower Bound
The notation Ω(n) is the formal way to express the lower bound of an algorithm's
running time. It measures the best case time complexity or the best amount of time
an algorithm can possibly take to complete.
We say that f(n)=Ω(g(n)) when there exists constant c that f(n)⩾c.g(n) for all
sufficiently large value of n. Here n is a positive integer. It means function g is a
lower bound for function f ; after a certain value of n, f will never go below g.
Example
Let us consider a given function, f(n)=4.n3+10.n2+5.n+1.
Considering g(n)=n3, f(n)⩾4.g(n) for all the values of n>0.
Hence, the complexity of f(n) can be represented as Ω(g(n)), i.e. Ω(n3)
Theta, θ: Asymptotic Tight Bound
The notation (n) is the formal way to express both the lower bound and the upper
bound of an algorithm's running time. Some may confuse the theta notation as the
average case time complexity; while big theta notation could be almost accurately
used to describe the average case, other notations could be used as well.
We say that f(n)=θ(g(n)) when there exist
constants c1 and c2 that c1.g(n)⩽f(n)⩽c2.g(n) for all sufficiently large value of n.
Here n is a positive integer.
This means function g is a tight bound for function f.
Example
Let us consider a given function, f(n)=4.n3+10.n2+5.n+1
Considering g(n)=n3, 4.g(n)⩽f(n)⩽5.g(n) for all the large values of n.
Hence, the complexity of f(n) can be represented as θ(g(n)), i.e. θ(n3).
Common Time Complexities:
• O(1): Constant time.
• O(log n): Logarithmic.
• O(n): Linear.
• O(n log n): Log-linear.
• O(n²): Quadratic.
• O(2ⁿ): Exponential.
Space Complexity
Space complexity tells how much memory your program will need depending on
input size. It measures the total memory required by an algorithm:
• Fixed part: independent of input size (constants, program instructions, fixed-size
variables).
• Variable Part: depends on the input size (arrays, dynamic memory allocation,
recursion stack, etc.).
Space Complexity= Fixed Part +Variable Part
Example:
int sum(int arr[], int n) {
int s = 0; // fixed variable (constant space)
for(int i = 0; i < n; i++) {
s = s + arr[i]; // loop variable i
}
return s;
}
➢ Fixed space: variables, loop counter i → constant = O(1)
➢ Variable space: array arr[] of size n → O(n)
➢ Total space = O(1) + O(n) = O(n)
Array
Array is a container which can hold a fix number of items and these
items should be of the same type. Most of the data structures make use
of arrays to implement their algorithms. Following are the important
terms to understand the concept of Array.
Element − Each item stored in an array is called an element.
Index − Each location of an element in an array has a numerical index,
which is used to identify the element.
Array Representation: (Storage structure)
Arrays can be declared in various ways in different languages. For
illustration, let's take C array declaration.
Arrays can be declared in various ways in different languages. For
illustration, let's take C array declaration.
As per the above illustration, following are the important points to be considered.
• Index starts with 0.
• Array length is 10 which means it can store 10 elements.
• Each element can be accessed via its index. For example, we
can fetch an element at index 6 as 27.
Types of arrays:
• One-Dimensional Array: This is the simplest form of an array, which
consists of a single row of elements, all of the same data type. Elements
in a 1D array are accessed using a single index.
• Two-Dimensional Array: A two-dimensional array, often referred to as a
matrix or 2D array, is an array of arrays. It consists of rows and columns,
forming a grid-like structure. Elements in a 2D array are accessed using
two indices, one for the row and one for the column.
When we store a 2D array (matrix) in memory (RAM), the computer
stores it in linear memory locations (1D form).
So, there are two common ways to arrange elements:
• Row-Major Order:
In Row-major order, elements of the first row are stored first, then the
second row, and so on. C language uses Row-major order. Eg:
[5, 10, 20, 25, 30, 35, 1, 3, 4]
• Column-Major Order:
In Column-major order, elements of the first column are stored first, then
the second column, and so on. Eg:
[5, 25, 1, 10, 30, 3, 20, 35, 4]
Multi-Dimensional Array: Arrays can have more than two dimensions, leading
to multi-dimensional arrays. These are used when data needs to be organized in
a multi-dimensional grid.
Basic Operations on Array:
Following are the basic operations supported by an array.
• Traverse − print all the array elements one by one.
• Insertion − Adds an element at the given index.
• Deletion − Deletes an element at the given index.
• Search − Searches an element using the given index or by the value.
• Update − Updates an element at the given index.
Insertion Operation:
Insert operation is to insert one or more data elements into an array.
Based on the requirement, a new element can be added at the
beginning, end, or any given index of array.
Here, we see a practical implementation of insertion operation, where
we add data at the end of the array −
Algorithm
Let LA be a Linear Array r(unordered) with N elements and K is a
positive integer such that K<=N. Following is the algorithm where
ITEM is inserted into the Kth position of LA
–
1. Start
2. Set J = N - 1
3. Set N = N + 1
4. Repeat steps 5 and 6 while J >= K
5. Set LA[J + 1] = LA[J]
6. Set J = J - 1
7. Set LA[K] = ITEM
8. Stop
Example:
Array: LA = [3, 8, 12, 15, 20], N = 5
Insert ITEM = 10 at K = 2 (third position).
• Step1: Start
• Step 2: J = 4
• Step 3: N = 6
Loop (J >= 2):
• J = 4: LA[5] = LA[4] → [3, 8, 12, 15, 20, 20]; J = 3
• J = 3: LA[4] = LA[3] → [3, 8, 12, 15, 15, 20]; J = 2
• J = 2: LA[3] = LA[2] → [3, 8, 12, 12, 15, 20]; J = 1 (loop ends)
Step 7: LA[2] = 10
Final array: [3, 8, 10, 12, 15, 20]
➢ Insert at the end (K = N): The loop runs zero times; you simply place ITEM at the
new last position. This is O(1).
➢ Insert at the beginning (K = 0): Every element shifts right; this is the worst case.
➢ worst-case O(N)
Deletion Operation:
Deletion refers to removing an existing element from the array and re-
organizing all elements of an array.
Algorithm
Consider LA is a linear array with N elements and K is a positive
integer such that K<=N-1. Following is the algorithm to delete an
element available at the Kth position of LA.
1. Start
2. Set J = K
3. Repeat steps 4 and 5 while J < N-1
4. Set LA[J] = LA[J + 1]
5. Set J = J+1
6. Set N = N-1
7. Stop
Example:
LA = [10, 20, 30, 40, 50], N = 5
K = 2 (delete element at index 2 i.e., 30)
Algorithm:
• Step1: Start
• Step 2: J = K = 2
• Step 3–5 (while J < N-1):
Iteration 1: J=2 → LA[2] = LA[3] → 40
Array = [10, 20, 40, 40, 50]
Iteration 2: J=3 → LA[3] = LA[4] → 50
Array = [10, 20, 40, 50, 50]
Iteration 3: J=4 → Stop (since J < 4 fails)
• Step 6: N = N-1 = 4
Final Array = [10, 20, 40, 50] (size = 4)
Best Case
If we delete the last element (at index N-1 in 0-based indexing), then:
• No shifting is required.
• Only N = N-1 is updated.
Time Complexity = O(1)
Worst Case
If we delete the first element (at index 0), then:
• We must shift all N-1 elements one position to the left.
Time Complexity = O(N)
Update Operation:
Update operation refers to updating an existing element from the array at a given
index.
Algorithm
Consider LA is a linear array with N elements and K is a positive
integer such that K<=N. Following is the algorithm to update an
element available at the Kth position of LA.
1. Start
2. Input the array LA[0...N-1] and size N
3. Input the index K (0 ≤ K < N)
4. Input the new value ITEM
5. Set LA[K] = ITEM
6. Print "Element
Searching updated successfully"
Techniques
7. Stop
Time Complexity
Accessing element by index in an array = O(1)
Updating it is just assignment = O(1)
So, Update Operation = O(1) (constant time).
Searching:
Linear Search: Check each element sequentially until found.
Algorithm(Linear Search)
Consider LA is a linear array with N elements and K is a positive
integer such that K<=N. Following is the algorithm to find an element
with a value of ITEM using sequential search.
1. Start
2. Set J = 0
3. While J < N do
IF LA[J] = ITEM THEN
Print "ITEM found at index J"
Stop
ELSE
Set J = J + 1
• Print "ITEM not found"
4.
5. Stop
Example:
LA = [15, 30, 25, 40, 50] , N = 5, ITEM = 25
Algorithm:
J=0 → LA[0]=15 ≠ 25 → increment
J=1 → LA[1]=30 ≠ 25 → increment
J=2 → LA[2]=25 = ITEM → found!
If instead ITEM = 100, the loop will finish and output:
ITEM not found
Time Complexity:
• Best: O(1) (found at first position).
• Worst: O(n) (not found or last position).
• Average: O(n/2) ≈ O(n).
Binary Search: Works only on sorted arrays. Repeatedly divides the search range in half.
Algorithm: Sorted array LA[0…N-1], size N, search element ITEM
1. Start
2. Set low = 0, high = N-1
3. Repeat while low ≤ high
a. mid = (low + high) / 2
b. If LA[mid] == ITEM
Print "ITEM found at index mid"
Stop
c. Else If ITEM < LA[mid]
high = mid - 1
d. Else
low = mid + 1
4. Print "ITEM not found"
5. Stop
Example:
LA = [10, 20, 30, 40, 50, 60], N = 6, ITEM = 40
• Step 2: low=0, high=5
• Step 3a: mid = (0+5)/2 = 2 → LA[2]=30
• ITEM=40 > 30 → set low=3
• Step 3a: mid = (3+5)/2 = 4 → LA[4]=50
• ITEM=40 < 50 → set high=3
• Step 3a: mid = (3+3)/2 = 3 → LA[3]=40 found
Time Complexity:
➢ Each step reduces the search interval by half.
➢ Number of comparisons in worst case ≈ log₂N.
➢ Best Case: O(1) → ITEM found at mid in first comparison
➢ Worst Case: O(log N) → requires maximum divisions
➢ Average Case: O(log N)
Sparse Matrix and its representations
A matrix is a two-dimensional data object made of m rows and n
columns, therefore having total m x n values. If most of the elements of
the matrix have 0 value, then it is called a sparse matrix.
Why to use Sparse Matrix instead of simple matrix ?
▪ Storage: There are lesser non-zero elements than zeros and
thus lesser memory can be used to store only those elements.
▪ Computing time: Computing time can be saved by logically
designing a data structure traversing only non-zero elements..
Example:
00304
00570
00000
02600
Representing a sparse matrix by a 2D array leads to wastage of lots of
memory as zeroes in the matrix are of no use in most of the cases. So,
instead of storing zeroes with non-zero elements, we only store non-
zero elements. This means storing non-zero elements with triples-
(Row, Column, value).
Sparse Matrix Representations can be done in many ways following are
two common representations:
Array representation
Link list representation: