0% found this document useful (0 votes)
6 views16 pages

Understanding Graphs in Data Science

The document discusses the concepts of graphs and maps in Machine Learning and Data Science, explaining their structures, types, and applications. Graphs represent relationships between entities through nodes and edges, while maps store data in key-value pairs for efficient access and manipulation. Both are essential for tasks such as social network analysis, recommendation systems, and data preprocessing.

Uploaded by

sahil ameriya
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views16 pages

Understanding Graphs in Data Science

The document discusses the concepts of graphs and maps in Machine Learning and Data Science, explaining their structures, types, and applications. Graphs represent relationships between entities through nodes and edges, while maps store data in key-value pairs for efficient access and manipulation. Both are essential for tasks such as social network analysis, recommendation systems, and data preprocessing.

Uploaded by

sahil ameriya
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Unit: 02

Algorithms

Graphs
In Machine Learning and Data Science, a graph is a powerful data structure used to
represent relationships between different entities or objects.
A graph is made up of nodes (also called vertices) and edges (connections between nodes).
In simple terms —
A graph shows how things are connected to each other.

Real-Life Analogy
Imagine a social media network like Instagram:
 Each user is a node.
 Each friendship or following is an edge connecting two users.
If you have 200 followers, that means there are 200 edges connected to your node.
So, the Instagram network is nothing but a graph of users connected by relationships.
Graphs in data science help computers understand and analyze such relationships.

Basic Components of a Graph


1. Vertices (Nodes):
These represent the entities in a dataset.
Example: Users, cities, websites, or molecules.
2. Edges (Links):
These represent relationships or connections between nodes.
Example: Friendship between two people, road between two cities, or hyperlink between
two web pages.
3. Weight (Optional):
Sometimes edges have a weight that represents the strength or cost of the connection.
Example: Distance between cities, similarity score between users, etc.

Types of Graphs
Graphs can be categorized based on their structure and connections:
1. Undirected Graph
 The edges have no direction.
 If A is connected to B, then B is also connected to A.
Example: Friendship network — if you’re friends with someone, it’s mutual.
Mathematically:
If an edge (A, B) exists, then (B, A) also exists.

2. Directed Graph (Digraph)


 The edges have a specific direction.
 If A follows B, it doesn’t mean B follows A.
Example: Twitter — you may follow Elon Musk, but he might not follow you back.
Mathematically:
If edge (A → B) exists, (B → A) may not exist.

3. Weighted Graph
 Each edge has a weight (cost, distance, or similarity).
Example:
In Google Maps, cities (nodes) are connected by roads (edges) with distances as weights.

4. Unweighted Graph
 All edges are considered equal, with no specific weight.
Example:
Friendship network — each connection is just a relationship, not measured by strength.

5. Cyclic and Acyclic Graphs


 Cyclic Graph: Contains at least one loop or cycle (A → B → C → A).
 Acyclic Graph: Has no cycles.
Example:
A dependency chart in project management (where one task depends on another) forms a
Directed Acyclic Graph (DAG) — commonly used in ML pipelines.

Mathematical Representation of a Graph


A graph is generally represented as:
𝐺 = (𝑉, 𝐸)

where,
 𝑉= set of vertices (nodes)
 𝐸= set of edges (connections between nodes)
If the graph is weighted, each edge 𝑒 ∈ 𝐸has an associated weight 𝑤(𝑒).

Representation Methods in Computer Memory


Graphs are stored and processed in computers using these structures:
1. Adjacency Matrix
o A 2D matrix of size 𝑛 × 𝑛(where n = number of vertices).
o If there’s an edge between vertex i and j, then matrix[i][j] = 1 (or weight).
Example:
For 3 vertices (A, B, C):
A B C
A0 1 0
B1 0 1
C0 1 0
→ Means A connected to B, B connected to C.
2. Adjacency List
o Stores each vertex and its connected neighbors.
o More memory-efficient for large, sparse graphs.
Example:
A→ B
B → A, C
C→B

Role of Graphs in Machine Learning and Data Science


Graphs are not just data structures — they are fundamental in many ML applications where
relationships between entities are crucial.
1. Social Network Analysis
Used to model and analyze relationships among people or organizations.
Algorithms like PageRank (used by Google) or Community Detection are graph-based.
Example: Detecting groups of similar users or identifying influencers in a social network.

2. Recommendation Systems
Graphs help build recommendation models by connecting users to items.
Example:
 Netflix → Users and Movies form a graph.
 If users A and B like the same movie, the system may recommend another movie liked
by A to B.
This is achieved using Graph-based Collaborative Filtering.

3. Knowledge Graphs
Knowledge graphs store and link real-world facts in a structured form.
Used by Google, IBM Watson, and AI systems to understand relationships between entities.
Example:
For “Sachin Tendulkar”, the graph connects to Cricket, India, World Cup, Records, etc.

4. Graph Neural Networks (GNNs)


A modern advancement in Machine Learning where deep learning models are applied on
graph structures.
Used in:
 Social networks (predicting connections)
 Molecular chemistry (predicting molecular properties)
 Fraud detection in finance
Example:
Banks use GNNs to detect suspicious money transfers by analyzing patterns in transaction
graphs.
5. Data Science Pipelines
Graphs represent workflow dependencies in ML pipelines — such as data collection →
preprocessing → model training → testing.
Each process is a node, and edges represent data flow between steps.
Mathematical Concepts in Graph Algorithms
Several graph-based algorithms are used in ML preprocessing and optimization tasks:
Algorithm Purpose Mathematical Idea
Dijkstra’s Uses greedy approach; updates minimum distance
Find shortest path
Algorithm 𝑑(𝑢) = min⁡(𝑑(𝑢), 𝑑(𝑣) + 𝑤(𝑢, 𝑣))
Measure 𝑃𝑅(𝑖)
PageRank
importance of 𝑃𝑅(𝐴) = (1 − 𝑑) + 𝑑∑
Algorithm 𝐿(𝑖)
nodes
Breadth-First Level-wise
Uses queue-based exploration
Search (BFS) traversal
Depth-First
Deep exploration Uses recursion or stack
Search (DFS)
These algorithms support various ML tasks such as data clustering, feature selection, and
pattern discovery.

Real-World Applications in Machine Learning


Domain Use of Graphs
Social Media Finding friend suggestions, detecting communities
Finance Fraud detection by transaction linkage
Healthcare Analyzing protein–drug interactions
Search Engines Ranking websites (Google’s PageRank)
Recommender Systems Building user-item interaction models
Transportation Finding optimal delivery or route networks

Graphs are essential in Machine Learning and Data Science for representing and analyzing
relationships in complex datasets.
They help models understand connections, influence, and structure within data, enabling
intelligent recommendations, predictions, and discoveries.

Maps:
In Machine Learning and Data Science, handling large amounts of data efficiently is very
important.
We often need to store, retrieve, and update data based on a key — like a name, ID, or
category.
This is where a Map (also called a Dictionary or Hash Map) becomes extremely useful.
A map is a data structure that stores data in the form of key–value pairs.
In simple words —
A map helps us quickly find the value associated with a key, without searching through the
entire dataset.

Real-Life Analogy
Think of a map like a real-world dictionary:
 The word is the key.
 The meaning is the value.
When you want to find the meaning of a word, you don’t read the entire book — you directly
go to the word and get its meaning.
Similarly, in programming and data science, when you know the key, you can instantly find
its value using a map.

Structure of a Map
A map is made up of:
 Key: A unique identifier (like a name, ID, or feature label).
 Value: The data or information associated with that key.
For example:
student_scores = {
"Ankesh": 95,
"Ravi": 88,
"Neha": 92
}
Here,
 “Ankesh”, “Ravi”, “Neha” → Keys
 95, 88, 92 → Values
If we need Ankesh’s score, we can directly retrieve it using the key:
student_scores["Ankesh"] → 95

Mathematical Representation
A map can be represented as a set of ordered pairs:
𝑀 = {(𝑘1, 𝑣1), (𝑘2 , 𝑣2 ), … , (𝑘𝑛 , 𝑣𝑛 )}

where:
 𝑘𝑖 = key
 𝑣𝑖 = value
and each key 𝑘𝑖 is unique.

Operations on Maps
Operation Description Example
Insert Add a new key–value pair Add a new student’s marks
Search / Lookup Find value by key Find marks of “Ravi”
Update Modify an existing key’s value Change Neha’s marks from 92 → 94
Delete Remove a key–value pair Remove “Ravi” from the record
Traversal Access all keys or values Display all students and marks

Types of Maps
There are different kinds of map structures, each useful for specific tasks:
1. Hash Map
 Stores key–value pairs using a hashing function.
 Provides average O(1) time for search, insert, and delete operations.
Example:
Python’s dict or Java’s HashMap.
💡 Analogy: Like an index in a book — you directly jump to the page instead of searching
line by line.

2. Tree Map
 Stores key–value pairs in a sorted order (usually using Binary Search Trees).
 Operations take O(log n) time.
Example:
Java’s TreeMap, used when data must stay sorted by keys.
💡 Use Case in ML: Keeping sorted feature names or model parameters.

3. Linked Hash Map


 Maintains the insertion order of elements.
 Used when the sequence of data entry matters.
Example:
Used in caching algorithms where order of use is important (like LRU cache).

Importance of Maps in Machine Learning and Data Science


Maps are not just programming tools — they’re essential for managing large datasets
efficiently.
They allow ML algorithms to quickly access and manipulate data based on unique
identifiers (like feature names or sample IDs).
1. Feature Mapping
When training a model, features (inputs) often have names.
Maps store these as key–value pairs, linking the feature name to its numerical value.
Example:
features = {
"Age": 25,
"Income": 40000,
"Education": "Graduate"
}
Here, each key represents a feature, and its value represents the data.

2. Label Encoding and One-Hot Encoding


In data preprocessing, categorical variables are converted into numbers using a mapping.
Example:
Gender: {"Male": 0, "Female": 1}
This map allows ML algorithms to understand categorical data numerically.
Analogy:
Like giving every student a roll number so computers can identify them easily.

3. Model Parameters and Hyperparameters


During ML model training, parameters (like weights, biases, learning rates) are stored as
key–value pairs for easy access.
Example:
model_params = {
"learning_rate": 0.01,
"epochs": 50,
"optimizer": "adam"
}
This map allows algorithms to quickly retrieve configuration values during training.

4. Word Mapping in NLP


In Natural Language Processing (NLP), words are converted into numerical IDs using
maps.
Example:
word_index = {"cat": 1, "dog": 2, "apple": 3}
So, the sentence "cat eats apple" becomes [1, 4, 3] (using mapped values).
This mapping allows ML models to process text efficiently.

5. Graph and Network Data


In Graph-based ML, maps are used to store connections between nodes.
Example:
graph = {
"A": ["B", "C"],
"B": ["A", "D"],
"C": ["A"]
}
Here, the map links each node (key) with its connected nodes (values).

Mathematical Use in ML (Map Function)


In some ML and data processing frameworks (like Spark or MapReduce), the term “map”
refers to a function that applies an operation to each element of data.
Mathematically:
𝑦𝑖 = 𝑓(𝑥𝑖 )

where f is applied to every element of dataset X.


Example in Python:
data = [1, 2, 3, 4]
squared = list(map(lambda x: x**2, data))
# squared = [1, 4, 9, 16]
Analogy:
Imagine stamping each paper in a pile with a label — the map function applies the same
operation to each item.
This concept is also used in distributed ML systems to process large data in parallel.

Applications of Maps in Data Science & ML


Application Use
Data Preprocessing Encode, normalize, or label data
Feature Selection Store and retrieve feature names and values
Model Parameters Keep track of learning configurations
Word Embeddings Convert words into numerical IDs
Graph Networks Represent node–edge relationships
Distributed Computing (MapReduce) Process big data across multiple machines

Advantages of Using Maps


✅ Fast Lookup: Quickly access data using keys.
✅ Memory Efficient: Stores only necessary key–value pairs.
✅ Easy to Modify: Add or remove items dynamically.
✅ Organized Data: Keeps features and parameters neatly mapped.
✅ Supports Complex Structures: Can store even nested dictionaries (maps within maps).

A map is a key–value data structure that enables fast data access, efficient storage, and
easy data manipulation — all of which are essential in Machine Learning and Data
Science for managing datasets, features, parameters, and encoded variables.
It not only helps in programmatic operations but also plays a crucial role in data
preprocessing, NLP, and big data frameworks like MapReduce.

Map Searching:
In the world of Machine Learning and Data Science, map searching refers to the process
of finding the most efficient path, location, or relationship between data points that are
represented as nodes and edges in a map-like structure.
Here, a map doesn’t only mean a physical geographical map — it can represent any dataset
structured as connections or relationships, such as social networks, logistics routes, or
even recommendation systems.
So, map searching algorithms help us navigate through data efficiently — just like how
Google Maps helps us find the shortest or fastest route from one location to another.

Real-Life AnalogyThink about when you use Google Maps to go from your home to your
college.
You enter the start and destination points, and Google instantly shows you the best path.
Behind the scenes, it runs map searching algorithms like Dijkstra’s algorithm or A (A-
star) algorithm* to determine the shortest or fastest route based on real-time data like
distance, traffic, and speed.
In Machine Learning, similar logic is used for finding optimal relationships or paths
between data points — such as clustering locations, linking similar users, or planning
efficient delivery routes.

Why Map Searching is Important in Data Science


Map searching plays a crucial role in ML-related tasks like:
1. Recommendation Systems: Finding connections between users and products (like a
map of preferences).
2. Clustering and Graph Analysis: Identifying the shortest path or closest group of data
points.
3. Routing Problems: Used in logistics, supply chain optimization, and self-driving
vehicles.
4. Social Network Analysis: Finding the shortest link between two people (like degrees
of separation).
These applications show that map searching algorithms are not just about navigation —
they’re about optimizing decisions in complex data structures.

Common Map Searching Algorithms


1. Breadth-First Search (BFS)
 It explores all the neighboring nodes first before moving to the next level.
 Works well for unweighted graphs (where all edges have equal cost).
 Example: Finding the minimum number of stops between two metro stations.
Steps:
1. Start at the source node.
2. Visit all its neighbors.
3. Move to the next level of neighbors.
Time Complexity: O(V + E)
(where V = number of vertices, E = number of edges)

2. Depth-First Search (DFS)


 It explores as far as possible along a single branch before backtracking.
 Useful for path existence and connectivity checking.
Example:
Finding if there’s a path between two cities in a map, regardless of distance.
Steps:
1. Start at a node.
2. Go deep into one branch until you reach the end.
3. Backtrack and explore another path.
Time Complexity: O(V + E)
3. Dijkstra’s Algorithm
 Used for finding the shortest path between nodes in a weighted graph (like road
networks).
 It minimizes the total cost or distance.
Real-Life Example:
Google Maps uses Dijkstra’s algorithm to find the shortest or fastest driving route.
Mathematical Expression:
If we denote distance between nodes u and v as 𝑤(𝑢, 𝑣), Dijkstra’s algorithm minimizes:
Distance(𝑣) = min⁡[Distance(𝑢) + 𝑤(𝑢, 𝑣)]

Time Complexity: O(V²) or O(E log V) using priority queue.

4. A (A-Star) Algorithm*
 A smarter version of Dijkstra’s algorithm.
 Uses both the actual cost so far (g) and the estimated future cost (h).
 Formula for evaluation:
𝑓(𝑛) = 𝑔(𝑛) + ℎ(𝑛)

Where:
 𝑔(𝑛): Cost from start to node n
 ℎ(𝑛): Heuristic (estimated cost to goal)
Example: Used in AI games and navigation systems to plan paths quickly.

Map Searching in Machine Learning


In ML, map searching is not limited to geographical maps — it’s also used to:
 Search for optimal hyperparameters (using search grids or optimization maps).
 Navigate feature spaces to find the best combinations of data attributes.
 Optimize graph-based models like Graph Neural Networks (GNNs) for link
prediction and node classification.
Essentially, the “map” here represents the search space that the ML algorithm needs to
explore to find the most efficient or accurate model configuration.

Mathematical Concept
A map can be represented as a graph 𝐺(𝑉, 𝐸):
 𝑉= set of vertices (locations or data points)
 𝐸= set of edges (connections between them)
Each edge 𝑒 ∈ 𝐸has a weight 𝑤(𝑒)representing cost, distance, or time.
The goal of a map searching algorithm is to find the minimum cost path from source 𝑠to
destination 𝑡:
min⁡∑𝑤(𝑒)

This minimization principle is at the heart of both data optimization and navigation systems.
Application of Algorithms – Stable Marriages, Dictionaries and Hashing,
Search Trees
In Machine Learning and Data Science, algorithms are not just theoretical concepts —
they are the core tools used to efficiently store, retrieve, and process data. The applications
of algorithms extend to tasks like data matching, searching, sorting, and optimization,
which are essential for training ML models, preprocessing datasets, and managing huge data
efficiently.
In this topic, we’ll explore three key applications of algorithms that are frequently
encountered in data-driven systems:
1. Stable Marriage Problem (Matching Algorithms)
2. Dictionaries and Hashing (Fast Data Lookup)
3. Search Trees (Structured Searching)
Each of these plays a crucial role in designing efficient data structures and ensuring that ML
pipelines handle data smoothly.

1. Stable Marriage Problem (Matching Algorithms)


The Stable Marriage Problem (SMP) is a matching algorithm that helps pair two sets of
entities based on preferences — in a way that no pair would prefer each other over their
current partners.
In ML and Data Science, this concept is applied to matching problems, such as
recommender systems, task assignments, and clustering.

Real-Life Analogy
Imagine you have 4 students and 4 project mentors.
 Each student ranks mentors according to preference.
 Each mentor ranks students too.
The goal is to find a stable pairing such that no student and mentor would rather be
matched with each other than their assigned pair.
This concept forms the base of the Gale–Shapley algorithm, which ensures a stable match.

Gale–Shapley Algorithm (Stable Matching Algorithm)


Steps:
1. Each unpaired student proposes to their most preferred mentor who hasn’t yet rejected
them.
2. Each mentor reviews proposals:
o If free, accepts the proposal.
o If already paired but prefers this new proposal, they accept the new one and
reject the old.
3. Repeat until everyone is paired stably.

Mathematical Representation
If 𝑆 = {𝑠1, 𝑠2, … , 𝑠𝑛 }are students and 𝑀 = {𝑚1, 𝑚2 , … , 𝑚𝑛 }are mentors,
the goal is to find a stable matching function 𝑓: 𝑆 → 𝑀such that
no pair (𝑠𝑖 , 𝑚𝑗 )prefers each other over 𝑓(𝑠𝑖 )and 𝑓 −1(𝑚𝑗 ).

Applications in Data Science


 Matching users to suitable products in recommender systems.
 Allocating tasks or computing resources in cloud systems.
 Matching job seekers to suitable job roles based on preferences.

2. Dictionaries and Hashing


Concept
A dictionary (also known as a map in programming) stores data in key-value pairs.
This structure allows quick access to data using keys rather than searching through the entire
dataset.
In Machine Learning, dictionaries are crucial for:
 Storing datasets efficiently (e.g., feature-value mappings).
 Caching model outputs for repeated queries.
 Counting word frequencies in NLP tasks.

Real-Life Analogy
Imagine a telephone directory — you don’t scan every page to find a name; you look it up
alphabetically.
Similarly, in a dictionary (hash map), you directly jump to the correct "bucket" using a key.

Hashing Technique
Hashing is the process of converting a key into a unique index (hash code) that points to its
value in a hash table.
Hash Function Formula:
ℎ(𝑘) = 𝑘mod 𝑚

Where:
 ℎ(𝑘): hash function output (index)
 𝑘: key
 𝑚: size of hash table
If multiple keys map to the same index (collision), it’s resolved using techniques like:
 Chaining (linked lists)
 Open Addressing (linear probing)

Applications in ML and Data Science


 Fast lookup of features during model training.
 Indexing large datasets for search and retrieval.
 Counting occurrences (e.g., term frequency in NLP).
 Storing model parameters efficiently (e.g., word embeddings).
3. Search Trees
A search tree is a hierarchical data structure used to store elements in a way that allows fast
searching, insertion, and deletion operations — which are vital for data-intensive ML
tasks.

🔸 Types of Search Trees


(a) Binary Search Tree (BST)
 Each node has at most two children.
 Left child < Parent < Right child.
 Searching time depends on the height of the tree.
Time Complexity:
 Best Case: 𝑂(log⁡𝑛)
 Worst Case: 𝑂(𝑛)

(b) Balanced Trees (AVL, Red-Black Tree)


 Automatically balance themselves to maintain efficiency.
 Guarantee 𝑂(log⁡𝑛)performance for insert, delete, and search.
Example: Databases often use Red-Black trees for indexing.

(c) B-Trees and B+ Trees


 Used in databases and file systems.
 Can store multiple keys in a single node, reducing disk access.

Real-Life Analogy
Think of a library bookshelf:
 Books are arranged in order by title or ID.
 You don’t check every book; you follow an ordered path to find it quickly — just like
a search tree reduces search time in large datasets.

Applications in ML and Data Science


 Decision Trees: Fundamental models used for classification and regression.
 Database Indexing: Storing and retrieving massive data efficiently.
 Search Optimization: Used in K-D trees for nearest neighbor searches in clustering
and pattern recognition.
 Feature Mapping: Organizing data points hierarchically for quick access.
Key Takeaways
Concept Purpose Example Application
Stable Marriage Matching entities based on User-product matching, resource
Algorithm preference allocation
Dictionaries & Fast data retrieval using key-
Feature lookup, caching
Hashing value pairs
Hierarchical data structure for Decision trees, clustering,
Search Trees
fast search indexing
The application of algorithms in Machine Learning and Data Science helps create systems
that are efficient, scalable, and intelligent.
Whether it’s matching users to products (stable matching), storing and searching
massive datasets (hashing and trees), or building predictive models (decision trees) —
algorithms form the foundation for making data-driven systems smarter and faster.

Dynamic Programming:
Dynamic Programming (DP) is an algorithmic technique used to solve complex problems
by breaking them down into smaller, overlapping subproblems and solving each subproblem
only once.
The results of these smaller problems are then stored (memoized) and reused, which saves
computation time and avoids redundant work.
In the context of Machine Learning and Data Science, dynamic programming plays a vital
role in tasks such as sequence prediction, time series modeling, optimization,
reinforcement learning, and probabilistic reasoning.

Real-Life Analogy
Imagine you are climbing a staircase with 10 steps.
You can take either 1 step or 2 steps at a time.
Now, to reach the top, you can either:
 Take one step from the 9th step, or
 Take two steps from the 8th step.
If you already know how many ways there are to reach the 8th and 9th steps, you don’t need
to calculate them again — you just add them up!
That’s exactly what dynamic programming does — it remembers past results and reuses
them to build up the final solution efficiently.

When to Use Dynamic Programming


A problem can be solved using Dynamic Programming if it satisfies two main properties:
1. Overlapping Subproblems:
The same smaller problems are solved multiple times.
→ DP avoids re-solving them by storing results.
2. Optimal Substructure:
The solution to a big problem depends on the solutions to smaller problems.
→ The optimal overall solution is built from optimal smaller ones.

Basic Idea
In DP, we usually follow these steps:
1. Define the problem in terms of subproblems.
2. Store (memoize) the results of subproblems.
3. Build the final solution using these stored results.
The process can be done in two main ways:
 Top-Down (Memoization): Solve the problem recursively and store results.
 Bottom-Up (Tabulation): Solve smaller subproblems first and build the solution
iteratively.

Mathematical Expression
If a problem’s solution can be expressed as:
𝑓(𝑛) = 𝑓(𝑛 − 1) + 𝑓(𝑛 − 2)

and we store results of 𝑓(𝑛 − 1)and 𝑓(𝑛 − 2)to reuse later,


then this becomes a Dynamic Programming solution (like in Fibonacci series).
Example – Fibonacci Series:
Without DP:
𝐹(𝑛) = 𝐹(𝑛 − 1) + 𝐹(𝑛 − 2)

Each call recomputes the same values multiple times → inefficient.


With DP:
 Compute 𝐹(0)and 𝐹(1).
 Store them and build up to 𝐹(𝑛).
This reduces time complexity from O(2ⁿ) to O(n).

Example Problem – Shortest Path in a Graph


One of the most famous DP-based algorithms is the Floyd–Warshall Algorithm, which
finds the shortest paths between all pairs of vertices in a weighted graph.
If we denote the shortest distance between vertex i and j as 𝐷𝑖𝑗 , then:
𝐷𝑖𝑗 (𝑘) = min⁡[𝐷𝑖𝑗 (𝑘 − 1), 𝐷𝑖𝑘 (𝑘 − 1) + 𝐷𝑘𝑗 (𝑘 − 1)]

Here, the algorithm uses DP to build solutions progressively by considering each vertex as an
intermediate point.

Applications in Machine Learning and Data Science


Dynamic Programming is deeply embedded in many data science and AI tasks, especially
where sequential decision-making and optimization are involved.
Application Area Example / Use Case
Speech recognition, handwriting recognition (using
Sequence Prediction
Hidden Markov Models)
Predicting next values efficiently using overlapping
Time Series Analysis
patterns
Value Iteration and Policy Iteration use DP to find optimal
Reinforcement Learning
actions
Natural Language Processing Viterbi Algorithm for Part-of-Speech tagging and word
(NLP) alignment
Resource allocation, route optimization, and scheduling
Optimization Problems
tasks
Real-World Example: Viterbi Algorithm (NLP)
In speech recognition or text decoding, we often need to find the most probable sequence
of hidden states (like phonemes or words) that could produce a given sequence of
observations (sounds or letters).
The Viterbi algorithm, based on DP, efficiently finds this optimal sequence without
exploring all possible combinations.

Comparison with Other Techniques


Technique Principle Example
Divide and Conquer Divide into independent subproblems Merge Sort, Quick Sort
Dynamic Divide into overlapping subproblems and Fibonacci, Shortest Path,
Programming reuse results Viterbi
Choose locally optimal choice at each Dijkstra’s, Huffman
Greedy Algorithms
step Coding

Advantages of Dynamic Programming


✅ Reduces time complexity by avoiding recomputation.
✅ Ensures optimal solutions for complex problems.
✅ Highly useful in optimization and probabilistic modeling.

Disadvantages
❌ Requires more memory to store intermediate results.
❌ Needs careful problem formulation to identify subproblems.
❌ Sometimes overkill for simple or non-overlapping problems.

Key Takeaways
 Dynamic Programming is an algorithm design strategy, not a specific algorithm.
 It works by breaking problems into smaller parts, solving once, and reusing
results.
 Plays a crucial role in optimization, sequential data, and probabilistic models in
ML.
 Used in famous algorithms like Viterbi, Bellman–Ford, Floyd–Warshall, and Value
Iteration in reinforcement learning.

You might also like