DS Full Theory Notes
DS Full Theory Notes
UNIT I UNIT II
• Intro to Data Structures & Terminology • Stacks & Applications
• Algorithms, Analysis & Asymptotic Notations • Prefix, Postfix, Infix Expressions
• Arrays & Sparse Matrices • Recursion & Iteration
• Linked Lists (SLL, DLL, CLL) • Queues & Circular Queues
• Polynomial Representation • Searching: Sequential, Binary, Index
• Hashing Concepts
UNIT I — Introduction to Data Structures
Data
Data is a collection of raw, unprocessed facts and figures that have no meaning by themselves. Data can be
numbers, characters, symbols, images, or any other form. For example, '45', 'Suraj', 'Lucknow' are all data
items. Individually, they tell us nothing without context.
Information
Information is data that has been processed, organized, and given meaning so that it becomes useful for
decision-making. When we say 'Suraj lives in Lucknow and scored 45 marks', that is information — because it is
meaningful and tells us something useful. The key transformation is: Data + Processing + Context =
Information.
Entity
An entity is a person, place, object, or concept that has some characteristics or attributes. For example, a
Student is an entity. The attributes of the student entity could be: Roll Number, Name, Age, Branch, Marks, etc.
In databases and data structures, entities are the objects we store data about.
Data Type
A data type defines the kind of value a variable can hold and the operations that can be performed on it. Every
programming language provides built-in data types. For example: int (stores whole numbers like 5, 100), float
(stores decimal numbers like 3.14), char (stores a single character like 'A'), double (stores large decimal
numbers), and boolean (stores true or false). Data types are essential because they tell the computer how
much memory to allocate and how to interpret the stored bits.
An Abstract Data Type (ADT) is a high-level description of a data structure that specifies WHAT operations are
performed, but NOT HOW those operations are implemented. An ADT defines the logical behavior (interface)
without worrying about implementation details. For example, Stack ADT says: you can push, pop, and peek —
but it doesn't say whether arrays or linked lists are used underneath. This separation of interface from
implementation is called abstraction, and it is a key principle in computer science.
KEY Build-in types (int, float) = given by language. ADT (Stack, Queue, Tree) = defined by
DIFFERENCE programmer at logical level. The implementation of ADT can vary (array-based or
pointer-based) but the behavior remains the same. 1.2
Types of
Data Structures
A data structure is a systematic way of organizing, storing, and managing data so that it can be accessed and
modified efficiently. Choosing the right data structure is crucial for writing efficient programs. Data structures
are broadly divided into two categories: Linear and Non-Linear.
Definition of Algorithm
An algorithm is a well-defined, finite sequence of instructions that takes some input, processes it, and produces
the correct output to solve a given problem. The word 'algorithm' comes from the name of the 9th-century
Persian mathematician Al-Khwarizmi. An algorithm is independent of programming language — it is a logical
plan written in English, pseudocode, or flowcharts before actual coding begins.
Properties of a Good Algorithm
For a set of steps to qualify as an algorithm, it must satisfy five essential properties:
• Input: An algorithm must have zero or more inputs. These are the values provided to the algorithm
before it starts. Example: For a sorting algorithm, the input is an unsorted array.
• Output: An algorithm must produce at least one output — the result of the computation. Example: A
sorting algorithm outputs the sorted array.
• Definiteness: Every step of the algorithm must be clear, precise, and unambiguous. There must be no
confusion about what each step does. Vague steps like 'do something' are not allowed.
• Finiteness: The algorithm must terminate (stop) after a finite number of steps. An infinite loop is not an
algorithm. Every path through the algorithm must eventually reach an end.
• Effectiveness: Each step must be basic enough to be carried out by a person with a pen and paper. Steps
should be simple, executable actions — not vague concepts.
Algorithm vs Program
Algorithm Program
A plan or design phase Implementation phase — actual code
Language independent (pseudocode, English) Written in a specific language like C, Java,
Python
Does not run on a computer Runs and executes on a computer
Created during design/planning stage Created during coding stage
Focuses on WHAT to do and in WHAT order Focuses on HOW to code it in a specific
language
Example: Steps to find largest number Example: C code to find largest number
There are several general strategies (paradigms) for designing algorithms. Choosing the right technique for a
problem can dramatically improve efficiency.
2. Greedy Method
At each step, the greedy algorithm makes the locally optimal (best at that moment) choice, hoping it leads to a
globally optimal solution. It does not reconsider past choices. This method is fast but does not always give the
best solution.
Examples: Dijkstra's Shortest Path, Kruskal's Minimum Spanning Tree, Activity Selection Problem, Huffman
Coding.
4. Backtracking
Backtracking incrementally builds candidates for solutions and abandons a candidate (backtracks) as soon as it
determines the candidate cannot lead to a valid solution. It explores all possibilities in a tree-like manner.
Examples: N-Queens Problem, Sudoku Solver, Rat in a Maze, Graph Coloring.
Performance analysis means measuring the efficiency of an algorithm. We analyze two aspects:
• Time Complexity: How much TIME (number of operations) an algorithm takes as input size grows. We
don't measure actual seconds because that depends on hardware. Instead, we count operations.
• Space Complexity: How much MEMORY (RAM) an algorithm uses as input size grows. This includes both
the input data and any extra memory used during computation.
We use Asymptotic Analysis — measuring performance as input size 'n' approaches infinity (grows very large).
This gives us a general, machine-independent measure.
Asymptotic Notations
Notation Name Meaning
O (Big-O) Big-Oh WORST case upper bound. Maximum
time/space the algorithm will ever need. Most
commonly used in practice.
Ω (Omega) Big-Omega BEST case lower bound. Minimum time/space
the algorithm will need.
Θ (Theta) Big-Theta AVERAGE / TIGHT bound. Algorithm always
takes this much time, not more, not less.
o (little-o) Little-oh Strict upper bound — algorithm is definitely
faster than this (not equal).
EXAM O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(n³) < O(2ⁿ) < O(n!) → Left = FASTER /
FORMULA BETTER
1.6 Arrays
Definition
An array is the simplest and most widely used linear data structure. It is a collection of elements of the SAME
data type stored in CONTIGUOUS (consecutive/adjacent) memory locations. Each element in an array is
identified by an index (also called subscript), which starts from 0 in most programming languages.
Arrays are static in nature — their size is fixed at the time of declaration and cannot be changed during
program execution. The advantage is that any element can be accessed directly using its index in O(1) constant
time, which is called random access.
• Row Major Order: All elements of ROW 0 are stored first, then ROW 1, then ROW 2, and so on. Used by
C, C++, Python.
• Column Major Order: All elements of COLUMN 0 are stored first, then COLUMN 1, then COLUMN 2, etc.
Used by FORTRAN, MATLAB.
Application of Arrays
• Storing and accessing large collections of data (student marks, temperatures)
• Implementation of other data structures: Stacks and Queues can be implemented using arrays
• Matrices for scientific and mathematical computation
• String manipulation (a string is an array of characters)
• Lookup tables and hash tables
WHY IT Sparse matrix storage reduces memory from O(n²) to O(number of non-zero Sparse
MATTERS elements). This is critical in scientific computing, graph algorithms, and machine
learning where matrices can be millions × millions in size.
Matrices
A sparse matrix is a 2D array (matrix) in which MOST of the elements have a value of ZERO. For example, a
100×100 matrix (10,000 elements) where only 200 elements are non-zero is a sparse matrix. Storing all 10,000
values wastes memory.
Solution: Store only the NON-ZERO elements along with their positions. A common representation is a Triplet
(3-column) table where each non-zero element is stored as (row, column, value).
The first node of a linked list is called the HEAD. The last node's link is set to NULL, indicating the end of the list.
Linked lists grow and shrink dynamically — memory is allocated when a new node is needed and freed when a
node is deleted.
Operations like polynomial addition and subtraction can be done by traversing both polynomial linked lists
simultaneously, comparing exponents, and adding coefficients of matching exponent terms.
LIFO Imagine you push: 10, 20, 30 onto the stack. Stack (bottom to top): [10, 20, 30]. Now
EXAMPLE Pop: you get 30 first, then 20, then 10. The LAST pushed (30) comes OUT first.
2.1 Stacks
A stack has one end called the TOP. All insertions and deletions happen at the TOP only. A stack is called an
Abstract Data Type (ADT) because it defines the operations (push, pop) without specifying implementation
details.
What is Recursion?
Recursion is a programming technique where a function calls ITSELF to solve a smaller version of the same
problem. Recursion works by breaking a big problem into smaller identical sub-problems until a BASE CASE is
reached. The base case is the simplest version of the problem that can be solved directly without further
recursion.
Every recursive function has two essential parts: (1) Base Case — the condition where recursion STOPS.
Without this, recursion becomes infinite. (2) Recursive Case — the part where the function calls itself with a
reduced/simpler input, moving towards the base case.
When a function calls itself, the current state (local variables, return address) is saved on the SYSTEM CALL
STACK. Each recursive call creates a new stack frame. When the base case is reached, the stack unwinds —
each frame returns its result to the caller.
Principles of Recursion
• Each recursive call must work on a SMALLER version of the problem.
• There must always be a BASE CASE that terminates the recursion.
• The recursive calls must converge towards the base case.
• The stack grows with each call — deep recursion can cause Stack Overflow.
Tail Recursion
A recursive function is called tail-recursive if the recursive call is the LAST operation performed by the function
(no computation after the recursive call). Tail recursion is important because compilers can optimize it into a
simple loop (iterative code), avoiding stack growth. This optimization is called Tail Call Optimization (TCO).
Types of Recursion
Type Description
Direct Recursion Function A calls function A directly.
Indirect Recursion Function A calls B, and B calls A (A → B → A
→ ...)
Tail Recursion Recursive call is the very LAST statement — can
be optimized to iteration.
Non-tail Recursion Operations performed after the recursive call
returns.
HANOI Hanoi(3 disks) = 2³ - 1 = 7 moves. Hanoi(10 disks) = 2¹⁰ - 1 = 1023 moves. Time
SHORTCUT complexity = O(2ⁿ). This is EXPONENTIAL — grows VERY fast.
Classic Examples
Fibonacci Series: fib(0)=0, fib(1)=1, fib(n) = fib(n-1) + fib(n-2). This is tree recursion — each call generates two
more calls. Time: O(2ⁿ). Can be improved with Dynamic Programming.
Binary Search (Recursive): Divide the sorted array in half. If target == mid, return. If target < mid, recurse on left
half. If target > mid, recurse on right half. Base case: array is empty. Time: O(log n).
Tower of Hanoi: Move n disks from Source peg to Destination peg using a Helper peg, following the rule that a
larger disk can never be placed on a smaller disk. Solution: Move top (n-1) disks to Helper, move nth disk to
Destination, move (n-1) disks from Helper to Destination. Moves = 2ⁿ − 1. Time: O(2ⁿ).
Recursion vs Iteration
Recursion Iteration
Uses function call stack (extra memory) No extra stack memory — uses loop variables
only
Each call creates a new stack frame Constant memory — O(1) space for loop
Code is shorter and more elegant for some Code can be longer but more efficient
problems
Risk of Stack Overflow for deep recursion No stack overflow risk
Best for: Trees, Graphs, Divide & Conquer, Best for: Simple repetition, counting, summing
Backtracking
Example: Merge Sort, Tree traversal, Tower of Example: Factorial with for loop, array traversal
Hanoi
2.4 Queues
Queue Operations
Operation What It Does Detail
Create Initialize empty queue Set FRONT = -1 and REAR = -1.
Enqueue (Add) Insert at REAR end Check if FULL. If not, increment REAR
and insert element at queue[REAR].
Dequeue (Delete) Remove from FRONT end Check if EMPTY. If not, read element at
FRONT, then increment FRONT.
isEmpty() Check if queue empty Returns TRUE if FRONT > REAR or
FRONT == -1.
isFull() Check if queue full Returns TRUE if REAR == MAX_SIZE - 1
(for array implementation).
2.5 Searching
Searching is the process of finding whether a specific element (called the search key or target) exists in a given
collection of data, and if so, finding its position (index or location). Searching is one of the most fundamental
operations in computer science.
The choice of searching algorithm depends on whether the data is sorted or unsorted, the size of the data, and
whether the data is in an array or a linked list.
• Algorithm: Start at index 0. Compare key with A[i]. If match: return i. If not: move to next (i++). Repeat
until end of array.
• Best Case: O(1) — key is found at the very first position.
• Worst Case: O(n) — key is at the last position or not present at all.
• Average Case: O(n/2) ≈ O(n) — on average, check half the elements.
• Works on: Both sorted and unsorted arrays; can also be used on linked lists.
• No pre-condition: Does not require the data to be sorted.
2. Binary Search
Binary Search is a much faster searching algorithm but has a strict pre-condition: the array MUST BE SORTED.
The algorithm works by repeatedly dividing the search range in HALF. It compares the search key with the
MIDDLE element. If the key equals the middle, the search is successful. If the key is LESS than middle, search
BINARY Sorted Array: [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]. Search for 23. Step 1: low=0, high=9, the LEFT
SEARCH mid=4, A[4]=16. 23>16 → low=5. Step 2: low=5, high=9, mid=7, A[7]=56. 23<56 → half. If
EXAMPLE high=6. Step 3: low=5, high=6, mid=5, A[5]=23. FOUND at index 5! Only 3 the key is
comparisons instead of 6 in linear search. GREATER
than
middle, search the RIGHT half. This halving continues until the key is found or the range is empty.
• Algorithm: Set low=0, high=n-1. Compute mid = (low+high)/2. Compare key with A[mid]. If equal: return
mid. If key < A[mid]: high = mid-1. If key > A[mid]: low = mid+1. Repeat until low > high.
• Best Case: O(1) — key is at the middle on first comparison.
• Worst Case: O(log n) — key is found or not found after log₂(n) comparisons.
• Pre-condition: Array MUST be sorted. Binary Search on unsorted data gives wrong results.
This method works best on large sorted files stored on disk (like file systems or old databases). The index table
fits in memory and reduces the number of disk accesses. Time complexity is better than pure sequential but
slightly less efficient than binary search in theory — however, for file-based systems it can be faster because of
reduced disk I/O.
The DATA STRUCTURE used in hashing is called a HASH TABLE — an array of fixed size. The position where an
element is stored is determined by: hash_index = hash_function(key). A simple and common hash function is
the division method: hash(key) = key % table_size.
Collision
A COLLISION occurs when two different keys are mapped to the SAME position (index) in the hash table.
Example: If table size = 10, then both key=15 and key=25 map to index 5 (both give 5 when modulo 10 is
taken). Collisions are unavoidable in hashing — the goal is to HANDLE them efficiently.
All the best, Suraj! Read each definition once, then focus on the comparison tables and example
boxes. You will do great! 🎯