0% found this document useful (0 votes)
6 views13 pages

KNN HMM Notes

The document provides an overview of K Nearest Neighbors (KNN) and Hidden Markov Models (HMM), detailing their core principles, algorithms, advantages, limitations, and applications. KNN is a lazy learning algorithm used for classification and regression based on distance metrics, while HMM is a statistical model for sequential data that infers hidden states from observable outputs. Key takeaways include the importance of feature scaling in KNN and the handling of temporal dependencies in HMM.

Uploaded by

kileodlex
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
6 views13 pages

KNN HMM Notes

The document provides an overview of K Nearest Neighbors (KNN) and Hidden Markov Models (HMM), detailing their core principles, algorithms, advantages, limitations, and applications. KNN is a lazy learning algorithm used for classification and regression based on distance metrics, while HMM is a statistical model for sequential data that infers hidden states from observable outputs. Key takeaways include the importance of feature scaling in KNN and the handling of temporal dependencies in HMM.

Uploaded by

kileodlex
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
Pattern Recognition: KNN & HMM Study Notes K Nearest Neighbors KNN Basic Concepts K Nearest Neighbors KNN is a supervised machine learning algorithm used for both classification and regression tasks. It's called a "lazy learning" algorithm because it doesn't build an explicit model during training - instead, it stores the entire dataset and makes predictions at query time. Core Principle KNN operates on the assumption that similar data points exist close to each other in the feature space. When classifying a new data point, KNN finds the K closest training examples and makes a prediction based on their labels. Key Characteristics: Non-parametric: Makes no assumptions about data distribution « Instance-based: Uses training instances directly for prediction « Lazy learning: No explicit training phase « Memory-based: Stores entire training dataset Algorithm Steps The KNN algorithm follows these steps: Choose the value of K (number of nearest neighbors) Calculate distances between the query point and all training points, Sort distances in ascending order Select K nearest neighbors based on smallest distances Make prediction: © Classification: Majority voting among K neighbors o Regression: Average (or weighted average) of K neighbor values Distance Metrics The choice of distance metric is crucial for KNN performance: 1, Euclidean Distance Most Common) When to use: « Continuous numerical data » Well-scaled features « Default choice in most libraries 2, Manhattan Distance Formula: (p,q) = Epa] When to-use: © Grid-like data structures « Less sensitive to outliers « High-dimensional spaces, 3. Minkowski Distance Generalized) Formula: d(p,q) = Eip-ipPy* Up Special eases: «p> 1 Manhattan distance « p=2 Euclidean distance « p=: Chebyshev distance 4. Cosine Similarity When to use: « Text analysis « High-dimensional sparse data « When direction matters more than magnitude Worked Example: KNN Problem: Classify a new tissue sample with attributes: Acid Durability = 3, Strength = 7 ‘Training Data: X,Acid Durbiliy) X,Suength) Class 7 7 Bad 7 4 Bad 3 4 Good 1 4 Good Solution K 3 Step 1: Calculate Euclidean distances from query point 3, 7 ed V3 7274777) =V16 0-40 0d= 00 d= 1337 312+744] 116 9= Dede V377+747 +74%J=109 3.6 V4 Step 2: Sort by distance: Point 3: distance = 3.0, class = Good Point 4: distance Point I: distance = 4.0, class = Bad 3.6, class = Good Step 3: Select K 3 nearest neighbors (all points) Step 4: Majority voting: 2 Good, 1 Bad — Prediction: Good Advantages and Limitations: KNN Advantages « Simple to understand and implement « No assumptions about data distribution « Effective for non-linear decision boundaries « Can handle multi-class problems naturally « Good performance with sufficient data « Useful for both classification and regression Limitations « Computationally expensive O(n for each prediction) « Sensitive to feature scaling « Curse of dimensionality (poor performance in high dimensions) « Sensitive to noisy data and outliers « Requires large memory to store training data (0 explicit model (no interpretable parameters) « Poor performance with imbalanced datasets Key Assumptions « Similar data points have similar labels + Local neighborhood structure is meaningful « All features are equally important (unless weighted) Applications: KNN 1. Recommendation Systems « Netflix/Amazon: "Users who liked X also liked Y" ‘commerce: Product recommendations based on user similarity » Music streaming: Song recommendations 2. Anomaly Detection « Fraud detection: Identify unusual transaction patterns Network seeurity: Detect abnormal network traffic « Quality control: Identify defective products 3. Text Classification « Spam detection: Email classification « Document categorization: News article classification » Sentiment analysis: Customer review analysis 4. Computer Vision « Image classification: Handwritten digit recognition « Face recognition: Identity verification systems « Medical ima : Disease diagnosis from X-rays/MRIs 5. Bioinformatics « Gene classification: Identify gene functions « Protein structure prediction: Classify protein families « Drug discovery: Find similar molecular compounds Question: Numerical Problem Given the following 2D dataset, classify point 2, 3 ) using KNN with K 3 and Euclidean distance XX Chase 1ro2 a 3 |4i8 2 108 43 8B Solution Process: Calculate distances from 2,3 Sort distances Select 3 neare: neighbors Apply majority voting Hidden Markov Models HMMs Basic Concepts: HMM A Hidden Markov Model HMM is a statistical model for sequential data where the system transitions between hidden (unobservable) states over time, with cach state producing observable outputs. HMMs are particularly powerful for modeling temporal dependencies and patterns in sequences Core Components: « Hidden States: Unobservable system states « Observations: Observable outputs generated by states « Temporal Dependencies: Current state depends only on previous state Markov property) Key Insight: We can't see the hidden states direetly, but we can infer them from the sequence of observations they produce. Components of HMM An HMM is defined by five elements Q, V, m, A, B [Link] Q © Q= {a1, 42, « Example: Sunny, Rainy, Cloudy} for weather an}: Set of N hidden states 2. Observations V ov V2, V2, nn Vn}: Set of M possible observations « Example: Walk, Shop, Clean} for activities » initial State Probabilities (m) [711, 2, um Rn]: Probability of starting in each state « m= P(qx = 5): Probability of starting in state i 4. Transition Probabilities A # A= [ai}: NN matrix of state transition probabilities # y= P(qusa =5) | qt = 8): Probability of transitioning from state i to state j 5. Emission Probabilities B « B= [by(k)]: N M matrix of observation probabilities # by(k) = P(ot = vie | qt =5): Probability of observing symbol k in state j Key Algorithms 1, Forward Algorithm. Purpose: Calculate the probability that the HMM generates a given observation sequence Time Complexity: © N?T) where = number of states, T= sequence length 2. Viterbi Algorithm Purpose: Find the most likely sequence of hidden stat s given observations Steps: Initialization: Set initial probabilities Recursion: For each time step, find most probable path to each state Termination: Select state with highest probability at final step Backtracking: Trace back the optimal path Time Complexity: 0 N'T 3. Baum-Welch Algorithm Forward-Backward) Purpose: Learn HMM parameters from observation sequences (unsupervised learning) ‘Type: Expectation-Maximization EM) algorithm variant Use Case: When you have observation sequences but don't know the model parameters. Worked Example: HMM Problem: Weather prediction using activities Setup: « States: S ainy, Sunny} « Observations: © = Walk, Shop, Clean} « Observation Sequence: Walk, Shop, Clean) Model Parameters: Initial Probabilities (x): « m(Rainy) = 0.6 4 « m(Sunny) = ‘Transition Matrix A: Rainy Sunny Rainy 0703 Sumy 04 06 Emission Matrix B Walk Shop Clean Rainy 01 04 0s Walk Shop Clean Sumy 06 03, Viterbi Algorithm Solution: Step 1 Initialization (t=1, observation=Walk) « V: Rainy = m(Rainy) * B Rainy, Walk) = 0.6 0.1 0.06 # V; Sunny = (Sunny) * B Sunny,Walk) = 0.4 0.6 0.24 Step 2 hop) Recursion ((=2, observatio « V: Rainy = max[V; Rainy)*A Rainy,Rainy), V; Sunny)A Sunny,Rainy)] x B Rainy,Shop) = max|0.06 0.7, 0.24 0.4 x 0.4 = max 0.042, 0.096 x 0.4 0.096 0.4 0.0384 # Vz Sunny ~ max{0.06 0.3, 0.24 0.6 x 0.3 0.144 0.3 0.0432 Step 3 Recursion (t=3, observation=Clean) «Vs Rainy = max{0.0384 0.7, 0.0432 0.4 * 0.5 0.02688 0.5 0.01344 # Vs Sunny = max[0,0384 0.3, 0.0432 0.6 x 0.1 0.02592 0.1 0.002592 Result: Most likely state sequence is determined by backtracking from the highest final probability Advantages and Limitations: HMM Advantages « Handles sequential data effectively « Models temporal dependencies naturally « Unsupervised learning capability Baum-Welch) « Probabilistic framework provides uncertainty measures Efficient algorithms for inference and learning « Interpretable structure with clear state meanings Li itations « Markov assumption: Current state depends only on previous state « Static parameters: Transition and emission probabilities don't change over time « Independence assumption: Observations are independent given the state « Model selection: Choosing number of states is challenging « Local optima: Baum-Welch can get stuck in local optima « Limited memory: Only remembers one previous state (first-order) Key Assumptions # Markoy Property: P(geslqi.-44) = P(qesla) « Stationarity: Transition probabilities don't change over time « Independence: Observations are independent given the hidden state Applications: HMM 1. Speech Recognition « Hidden States; Phonemes, words, or linguistic units « Observations: Audio features MFCCs, spectrograms) « Applications: Voice assistants Siri, Alexa), dictation software 2. Natural Language Processing « Part-of-Speech Tagging: Hidden states = POS tags, observations = words « ‘Named Entity Recognition: Identify person/location/organization names « Machine Translation: Model phrase alignments 3. Bioinformat s « Gene Finding: Identify coding vs non-coding regions in DNA « Protein Structure Prediction: Predict secondary structure from sequence » Sequence Alignment: Find similar regions in biological sequences # CpG Island Detection: Identify methylation sites, 4, Finance « Market Regime Detection: Bull/bear market states « Algorithmic Trading: Model price movement patterns « Risk Management: Detect market volatility regimes Computer Vision « Gesture Recognition: Model temporal sequence of hand movements « Activity Recognition: Human behavior analysis in videos « Object Tracking: Track objects actoss video fra KNN vs HMM Comparison Aspect KNN Type Supervised Leaning Data Structure Independent instances Training Memory Assumptions Similariy in feature space Computational Complexity O(0) per prediction KNN No temporal modeling Interpretabilty Distnce-hased similarity Output Class labels o values Scalability Poor with large datasets Noise Sensitivity High (sensitive to outliers) Feature Dependencies Assumes independence Applications Classification, regression Model Complexity Simple, parameter free Uncertainty Quantification Limited (distance-based) When to Choose KNN vs HMM ‘Choose KNN when: « Data points are independent # You have labeled training data » Quick prototyping is needed « Non-linear decision boundaries exist « Sufficient training data is available Choose HMM when: « Data has temporal/sequential structure » Hidden states generate observations # You need to model state transitions « Uncertainty quantification is important © Working with time series or sequences HMM Probabilistic Model Sequental/Temporal data Parameter estimation required Stores model parameters only Markov property stationarity © N=) for algorithms uM Explicit temporal dependencies Probabilistic state transitions State sequences, probabilities Good for ong sequences Moderate (probabilistic smoothing) Models depend rough states Sequence analysis, time series Moderate (requires parameter tuning) Natural (probabilistic framework) Key Takeaways KKNN Quick Review « Lazy learner: No training phase, stores all data « Distance-based: Uses similarity metrics for classification/regression « Parameter K: Critical choice affecting bias-variance tradeoff « Feature scaling: Essential for distance calculations « Applications: Recommendation systems, anomaly detection, classification HMM Quick Review « Sequential model: Handles temporal dependencies in data « Hidden states: Unobservable states generate observable outputs « Three problems: Evaluation, decoding, learning «Key algorithms: Forward, Viterbi, Baum-Welch « Applications: Speech recognition, bioinformatics, NLP Exam Tips: # Understand the five components of an HMM « Know which algorithm solves which problem « Practice Viterbi algorithm calculations « Remember the Markov assumption « Understand transition vs emission probabilities References #-~text=-The%20kY 51%20nei: Y 20(KNN)%20al gorithm%20is%20a%20non,used%20in%20machine%20learning%20 today. [Link] [Link] Ll pdf hittps://www. [Link]/machine-learning/k-nearest-neighbours/

You might also like