0 ratings 0% found this document useful (0 votes) 6 views 13 pages KNN HMM Notes
The document provides an overview of K Nearest Neighbors (KNN) and Hidden Markov Models (HMM), detailing their core principles, algorithms, advantages, limitations, and applications. KNN is a lazy learning algorithm used for classification and regression based on distance metrics, while HMM is a statistical model for sequential data that infers hidden states from observable outputs. Key takeaways include the importance of feature scaling in KNN and the handling of temporal dependencies in HMM.
AI-enhanced title and description
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content,
claim it here .
Available Formats
Download as PDF or read online on Scribd
Go to previous items Go to next items
Save KNN_HMM_Notes For Later
Pattern Recognition:
KNN & HMM Study Notes
K Nearest Neighbors KNN
Basic Concepts
K Nearest Neighbors KNN is a supervised machine learning algorithm used for both classification
and regression tasks. It's called a "lazy learning" algorithm because it doesn't build an explicit model
during training - instead, it stores the entire dataset and makes predictions at query time.
Core Principle
KNN operates on the assumption that similar data points exist close to each other in the feature
space. When classifying a new data point, KNN finds the K closest training examples and makes a
prediction based on their labels.
Key Characteristics:
Non-parametric: Makes no assumptions about data distribution «
Instance-based: Uses training instances directly for prediction
« Lazy learning: No explicit training phase
« Memory-based: Stores entire training dataset
Algorithm Steps
The KNN algorithm follows these steps:
Choose the value of K (number of nearest neighbors)
Calculate distances between the query point and all training points,
Sort distances in ascending order
Select K nearest neighbors based on smallest distances
Make prediction:
© Classification: Majority voting among K neighbors
o Regression: Average (or weighted average) of K neighbor valuesDistance Metrics
The choice of distance metric is crucial for KNN performance:
1, Euclidean Distance Most Common)
When to use:
« Continuous numerical data »
Well-scaled features
« Default choice in most libraries
2, Manhattan Distance
Formula: (p,q) = Epa] When
to-use:
© Grid-like data structures
« Less sensitive to outliers «
High-dimensional spaces,
3. Minkowski Distance Generalized)
Formula: d(p,q) = Eip-ipPy* Up
Special eases:
«p> 1 Manhattan distance «
p=2 Euclidean distance
« p=: Chebyshev distance
4. Cosine Similarity
When to use:
« Text analysis
« High-dimensional sparse data
« When direction matters more than magnitudeWorked Example: KNN
Problem: Classify a new tissue sample with attributes: Acid Durability = 3, Strength = 7
‘Training Data:
X,Acid Durbiliy) X,Suength) Class
7 7 Bad
7 4 Bad
3 4 Good
1 4 Good
Solution K 3
Step 1: Calculate Euclidean distances from query point 3, 7
ed V3 7274777) =V16 0-40 0d=
00 d= 1337
312+744]
116 9=
Dede
V377+747
+74%J=109
3.6
V4
Step 2: Sort by distance:
Point 3: distance = 3.0, class = Good
Point 4: distance
Point I: distance = 4.0, class = Bad
3.6, class = Good
Step 3: Select K 3 nearest neighbors (all points)
Step 4: Majority voting: 2 Good, 1 Bad — Prediction: GoodAdvantages and Limitations: KNN
Advantages
« Simple to understand and implement
« No assumptions about data distribution
« Effective for non-linear decision boundaries
« Can handle multi-class problems naturally
« Good performance with sufficient data
« Useful for both classification and regression
Limitations
« Computationally expensive O(n for each prediction)
« Sensitive to feature scaling
« Curse of dimensionality (poor performance in high dimensions)
« Sensitive to noisy data and outliers
« Requires large memory to store training data
(0 explicit model (no interpretable parameters)
« Poor performance with imbalanced datasets
Key Assumptions
« Similar data points have similar labels
+ Local neighborhood structure is meaningful
« All features are equally important (unless weighted)Applications: KNN
1. Recommendation Systems
« Netflix/Amazon: "Users who liked X also liked Y"
‘commerce: Product recommendations based on user similarity »
Music streaming: Song recommendations
2. Anomaly Detection
« Fraud detection: Identify unusual transaction patterns
Network seeurity: Detect abnormal network traffic
« Quality control: Identify defective products
3. Text Classification
« Spam detection: Email classification
« Document categorization: News article classification »
Sentiment analysis: Customer review analysis
4. Computer Vision
« Image classification: Handwritten digit recognition «
Face recognition: Identity verification systems
« Medical ima
: Disease diagnosis from X-rays/MRIs
5. Bioinformatics
« Gene classification: Identify gene functions
« Protein structure prediction: Classify protein families «
Drug discovery: Find similar molecular compounds
Question: Numerical Problem
Given the following 2D dataset, classify point 2, 3 ) using KNN with K 3 and Euclidean distance
XX Chase
1ro2 a
3 |4i8
2 108
43 8BSolution Process:
Calculate distances from 2,3
Sort distances
Select 3 neare:
neighbors
Apply majority votingHidden Markov Models HMMs
Basic Concepts: HMM
A Hidden Markov Model HMM is a statistical model for sequential data where the system transitions
between hidden (unobservable) states over time, with cach state producing observable outputs. HMMs
are particularly powerful for modeling temporal dependencies and patterns in sequences
Core Components:
« Hidden States: Unobservable system states
« Observations: Observable outputs generated by states
« Temporal Dependencies: Current state depends only on previous state Markov property)
Key Insight:
We can't see the hidden states direetly, but we can infer them from the sequence of observations
they produce.
Components of HMM
An HMM is defined by five elements Q, V, m, A, B
[Link] Q
© Q= {a1, 42,
« Example: Sunny, Rainy, Cloudy} for weather
an}: Set of N hidden states
2. Observations V
ov
V2, V2, nn Vn}: Set of M possible observations «
Example: Walk, Shop, Clean} for activities
»
initial State Probabilities (m)
[711, 2, um Rn]: Probability of starting in each state «
m= P(qx = 5): Probability of starting in state i
4. Transition Probabilities A
# A= [ai}: NN matrix of state transition probabilities
# y= P(qusa =5) | qt = 8): Probability of transitioning from state i to state j5. Emission Probabilities B
« B= [by(k)]: N M matrix of observation probabilities
# by(k) = P(ot = vie | qt =5): Probability of observing symbol k in state j
Key Algorithms
1, Forward Algorithm.
Purpose: Calculate the probability that the HMM generates a given observation sequence Time
Complexity: © N?T) where
= number of states, T= sequence length
2. Viterbi Algorithm
Purpose: Find the most likely sequence of hidden stat
s given observations Steps:
Initialization: Set initial probabilities
Recursion: For each time step, find most probable path to each state
Termination: Select state with highest probability at final step
Backtracking: Trace back the optimal path Time
Complexity: 0 N'T
3. Baum-Welch Algorithm Forward-Backward)
Purpose: Learn HMM parameters from observation sequences (unsupervised learning)
‘Type: Expectation-Maximization EM) algorithm variant
Use Case: When you have observation sequences but don't know the model parameters.Worked Example: HMM
Problem: Weather prediction using activities Setup:
« States: S
ainy, Sunny}
« Observations: © = Walk, Shop, Clean}
« Observation Sequence: Walk, Shop, Clean)
Model Parameters:
Initial Probabilities (x):
« m(Rainy) = 0.6
4
« m(Sunny) =
‘Transition Matrix A:
Rainy Sunny
Rainy 0703
Sumy 04 06
Emission Matrix B
Walk Shop Clean
Rainy 01 04 0s
Walk Shop Clean
Sumy 06 03,
Viterbi Algorithm Solution:
Step 1 Initialization (t=1, observation=Walk)
« V: Rainy = m(Rainy) * B Rainy, Walk) = 0.6 0.1 0.06
# V; Sunny = (Sunny) * B Sunny,Walk) = 0.4 0.6 0.24 Step 2
hop)
Recursion ((=2, observatio
« V: Rainy = max[V; Rainy)*A Rainy,Rainy), V; Sunny)A Sunny,Rainy)] x B Rainy,Shop)
= max|0.06 0.7, 0.24 0.4 x 0.4 = max 0.042, 0.096 x 0.4 0.096 0.4 0.0384
# Vz Sunny ~ max{0.06 0.3, 0.24 0.6 x 0.3 0.144 0.3 0.0432 Step 3
Recursion (t=3, observation=Clean)
«Vs Rainy = max{0.0384 0.7, 0.0432 0.4 * 0.5 0.02688 0.5 0.01344
# Vs Sunny = max[0,0384 0.3, 0.0432 0.6 x 0.1 0.02592 0.1 0.002592Result: Most likely state sequence is determined by backtracking from the highest final
probability
Advantages and Limitations: HMM
Advantages
« Handles sequential data effectively
« Models temporal dependencies naturally
« Unsupervised learning capability Baum-Welch)
« Probabilistic framework provides uncertainty measures
Efficient algorithms for inference and learning
« Interpretable structure with clear state meanings
Li
itations
« Markov assumption: Current state depends only on previous state
« Static parameters: Transition and emission probabilities don't change over time «
Independence assumption: Observations are independent given the state
« Model selection: Choosing number of states is challenging «
Local optima: Baum-Welch can get stuck in local optima
« Limited memory: Only remembers one previous state (first-order)
Key Assumptions
# Markoy Property: P(geslqi.-44) = P(qesla)
« Stationarity: Transition probabilities don't change over time
« Independence: Observations are independent given the hidden stateApplications: HMM
1. Speech Recognition
« Hidden States; Phonemes, words, or linguistic units
« Observations: Audio features MFCCs, spectrograms)
« Applications: Voice assistants Siri, Alexa), dictation software
2. Natural Language Processing
« Part-of-Speech Tagging: Hidden states = POS tags, observations = words «
‘Named Entity Recognition: Identify person/location/organization names
« Machine Translation: Model phrase alignments
3. Bioinformat
s
« Gene Finding: Identify coding vs non-coding regions in DNA
« Protein Structure Prediction: Predict secondary structure from sequence »
Sequence Alignment: Find similar regions in biological sequences
# CpG Island Detection: Identify methylation sites,
4, Finance
« Market Regime Detection: Bull/bear market states
« Algorithmic Trading: Model price movement patterns «
Risk Management: Detect market volatility regimes
Computer Vision
« Gesture Recognition: Model temporal sequence of hand movements «
Activity Recognition: Human behavior analysis in videos
« Object Tracking: Track objects actoss video fraKNN vs HMM Comparison
Aspect KNN
Type Supervised Leaning
Data Structure Independent instances
Training
Memory
Assumptions Similariy in feature space
Computational Complexity O(0) per prediction
KNN
No temporal modeling
Interpretabilty Distnce-hased similarity
Output Class labels o values
Scalability Poor with large datasets
Noise Sensitivity High (sensitive to outliers)
Feature Dependencies Assumes independence
Applications Classification, regression
Model Complexity Simple, parameter free
Uncertainty Quantification Limited (distance-based)
When to Choose KNN vs HMM
‘Choose KNN when:
« Data points are independent
# You have labeled training data »
Quick prototyping is needed
« Non-linear decision boundaries exist «
Sufficient training data is available
Choose HMM when:
« Data has temporal/sequential structure »
Hidden states generate observations
# You need to model state transitions
« Uncertainty quantification is important ©
Working with time series or sequences
HMM
Probabilistic Model
Sequental/Temporal data
Parameter estimation required
Stores model parameters only
Markov property stationarity
© N=) for algorithms
uM
Explicit temporal dependencies
Probabilistic state transitions
State sequences, probabilities
Good for ong sequences
Moderate (probabilistic smoothing)
Models depend
rough states
Sequence analysis, time series
Moderate (requires parameter tuning)
Natural (probabilistic framework)Key Takeaways
KKNN Quick Review
« Lazy learner: No training phase, stores all data
« Distance-based: Uses similarity metrics for classification/regression «
Parameter K: Critical choice affecting bias-variance tradeoff
« Feature scaling: Essential for distance calculations
« Applications: Recommendation systems, anomaly detection, classification
HMM Quick Review
« Sequential model: Handles temporal dependencies in data
« Hidden states: Unobservable states generate observable outputs «
Three problems: Evaluation, decoding, learning
«Key algorithms: Forward, Viterbi, Baum-Welch
« Applications: Speech recognition, bioinformatics, NLP
Exam Tips:
# Understand the five components of an HMM «
Know which algorithm solves which problem «
Practice Viterbi algorithm calculations
« Remember the Markov assumption
« Understand transition vs emission probabilities
References
#-~text=-The%20kY 51%20nei: Y
20(KNN)%20al gorithm%20is%20a%20non,used%20in%20machine%20learning%20
today.
[Link]
[Link] Ll pdf
hittps://www. [Link]/machine-learning/k-nearest-neighbours/