Distance Measures in Machine Learning

Distance measures are essential in machine learning for quantifying similarity or dissimilarity between data points, and are grouped into metric and non-metric measures. Common distance metrics include Euclidean, Manhattan, and Minkowski distances. Hamming distance is used for categorical and binary data, while cosine similarity compares the direction of two vectors. The K-Nearest Neighbor (K-NN) algorithm uses these distance measures for classification. Classifiers are evaluated with accuracy and the confusion matrix, and regression models with MAE and MSE.

Uploaded by

praveenveerepalli729

Distance Measure in Machine Learning

A distance measure is a crucial concept in machine learning, particularly in clustering, classification,
and information retrieval tasks. It quantifies the similarity or dissimilarity between two data
points. Different types of distance measures are used depending on the nature of the data and the
application.

Types of Distance Measures:

1. Metric Distance Measures

2. Non-Metric Distance Measures

Euclidean distance: Euclidean distance is a fundamental distance metric used in Machine Learning
(ML) and various other fields. It represents the straight-line distance between two points in a
multidimensional space.

Manhattan distance: Manhattan distance (also known as L1 distance or Taxicab distance) is a
metric that measures the distance between two points by summing the absolute differences of
their coordinates.

For two points A = (x_1, x_2, ..., x_n) and B = (y_1, y_2, ..., y_n) in an n-dimensional space,
the Manhattan distance is calculated as:

$$d(A, B) = \sum_{i=1}^{n} |x_i - y_i|$$
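
As a minimal sketch of the two definitions above, the following Python functions compute the Euclidean distance (the square root of the summed squared coordinate differences) and the Manhattan distance (the sum of absolute coordinate differences). The function names and the sample points are illustrative assumptions, not part of the original notes.

```python
import math


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """L1 (taxicab) distance: sum of absolute coordinate differences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical sample points chosen so the results are easy to verify by hand.
point_a = (1, 2)
point_b = (4, 6)
print(euclidean_distance(point_a, point_b))  # 5.0
print(manhattan_distance(point_a, point_b))  # 7
```

For the same pair of points the Manhattan distance is never smaller than the Euclidean distance, since the taxicab path cannot be shorter than the straight line.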

Minkowski Distance in Machine Learning

Minkowski distance is a generalized distance metric that includes Euclidean distance and Manhattan
distance as special cases. It is widely used in machine learning, especially in clustering and
classification algorithms.

Formula

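For two points A = (x_1, x_2, ..., x_n) and B = (y_1, y_2, ..., y_n) in an n-dimensional space,
the Minkowski distance of order p (with p ≥ 1) is defined as:

$$d(A, B) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

Setting p = 1 gives the Manhattan distance, and p = 2 gives the Euclidean distance.
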
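
The generalization can be checked numerically. The sketch below assumes an order parameter named `p` (with p ≥ 1) and uses hypothetical sample points; it shows that p = 1 and p = 2 reproduce the Manhattan and Euclidean distances respectively.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two points of equal dimension."""
    if p < 1:
        raise ValueError("p must be at least 1 for the result to be a metric")
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


# Hypothetical sample points (not from the original notes).
point_a = (1, 2)
point_b = (4, 6)
print(minkowski_distance(point_a, point_b, 1))  # 7.0, same as Manhattan
print(minkowski_distance(point_a, point_b, 2))  # 5.0, same as Euclidean
print(minkowski_distance(point_a, point_b, 3))  # a value between 4 and 5
```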
(b) Non-Metric Distance Measures

Non-metric distances do not satisfy all the properties of metric distances, particularly triangle
inequality or symmetry. These are mostly used in cases where relationships between objects are not
strictly numerical but are instead based on ranks or qualitative properties.
Hamming Distance in Machine Learning

Hamming distance is a metric used to measure the number of positions at which two strings of equal
length differ. It is primarily used for categorical or binary data.

Formula

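For two strings or binary vectors A = (a_1, ..., a_n) and B = (b_1, ..., b_n) of equal length n,
the Hamming distance is calculated as:

$$d_H(A, B) = \sum_{i=1}^{n} [a_i \neq b_i]$$

where [a_i ≠ b_i] is 1 if the two sequences differ at position i and 0 otherwise.
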
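
A short Python sketch of this count is shown below: it walks both sequences in parallel and counts the positions where they differ. The function name and the sample inputs are illustrative assumptions.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


# Works for bit strings, ordinary strings, or lists of categorical values.
print(hamming_distance("1011101", "1001001"))  # 2
print(hamming_distance("karolin", "kathrin"))  # 3
```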
Cosine Similarity in Machine Learning

Cosine similarity is a metric used to measure the similarity between two vectors in an inner product
space. It calculates the cosine of the angle between two vectors, indicating their directional
alignment rather than magnitude differences.
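
For two vectors A and B, the cosine of the angle between them is the dot product divided by the product of the vector lengths, $\cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|}$. The sketch below is one possible implementation under that definition; the function name is an assumption, and raising an error for zero-length vectors is a design choice made here because the quantity is undefined in that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Same direction, different magnitude: similarity is (up to rounding) 1.
print(cosine_similarity((1, 2, 3), (2, 4, 6)))
# Perpendicular vectors: similarity is 0.
print(cosine_similarity((1, 0), (0, 1)))  # 0.0
```

The first example illustrates the point made above: scaling a vector changes its magnitude but not its direction, so the similarity stays at 1.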
Proximity Between Binary Patterns
Proximity measures are used to quantify the similarity or dissimilarity between two binary patterns
(bit strings). In machine learning and pattern recognition, these measures help in clustering,
classification, and data retrieval tasks.

When comparing binary data (where elements take values 0 or 1), we use distance (dissimilarity)
measures or similarity measures to evaluate how close or far two binary patterns are.
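
As an illustration, the sketch below compares two binary patterns with one dissimilarity measure (Hamming distance, described earlier) and two similarity measures. The simple matching coefficient and the Jaccard similarity are not named in the text above; they are assumed here as commonly used examples, and the function names and sample patterns are hypothetical.

```python
def match_counts(a, b):
    """Return (both ones, both zeros, mismatches) for two 0/1 patterns."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("patterns must be non-empty and of equal length")
    both_ones = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    both_zeros = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    mismatches = len(a) - both_ones - both_zeros
    return both_ones, both_zeros, mismatches


def simple_matching_coefficient(a, b):
    """Fraction of positions where the patterns agree (1-1 or 0-0)."""
    both_ones, both_zeros, _ = match_counts(a, b)
    return (both_ones + both_zeros) / len(a)


def jaccard_similarity(a, b):
    """Shared ones divided by positions where at least one pattern has a one."""
    both_ones, _, mismatches = match_counts(a, b)
    denominator = both_ones + mismatches
    # Assumption: two all-zero patterns are treated as identical (similarity 1).
    return both_ones / denominator if denominator else 1.0


pattern_a = [1, 0, 0, 1, 1, 0]
pattern_b = [1, 1, 0, 0, 1, 0]
print(match_counts(pattern_a, pattern_b)[2])    # 2 mismatches (Hamming distance)
print(simple_matching_coefficient(pattern_a, pattern_b))
print(jaccard_similarity(pattern_a, pattern_b))  # 0.5
```

The two similarity measures differ in how they treat shared zeros: the simple matching coefficient counts them as agreement, while the Jaccard similarity ignores them.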
Application:

• Used in error detection and correction (e.g., in digital communications).

• Used in DNA sequence comparison.

• Applied in cryptography and data clustering.

Application:

• Used in categorical data analysis.

• Applied in clustering binary data in machine learning.

Application:

• Used in text analysis and document similarity (e.g., comparing word sets).

• Applied in clustering categorical data.

• Popular in recommendation systems (e.g., collaborative filtering).

K-Nearest Neighbor (K-NN) Classifier
Definition:

K-Nearest Neighbor (K-NN) is a supervised machine learning algorithm used for classification and
regression tasks. It is a non-parametric and instance-based learning algorithm that classifies new
data points based on the majority vote of their "K" nearest neighbors.

Unlike other models that build an explicit function for classification, K-NN stores the entire dataset
and classifies new instances by comparing them to existing ones.

How K-NN Works (Step-by-Step Explanation)

1. Choose the number of neighbors (K):

o Select the value of K, which determines how many closest points will be
considered.

2. Compute the distance between the new data point and existing points:

o Common distance measures include Euclidean Distance, Manhattan Distance, and
Minkowski Distance.

3. Find the K-nearest neighbors:

o Identify the K closest data points from the training dataset.

4. Assign a class based on majority voting:

o The class with the highest number of neighbors is assigned to the new data point.

5. Return the predicted class.
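
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance (via the standard library's `math.dist`, available from Python 3.8) and a small hypothetical dataset; the function name `knn_classify` and the data are not from the original notes.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Step 1: K is chosen by the caller and must fit the dataset size.
    if k < 1 or k > len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Step 2: compute the distance from the query to every stored point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: keep the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Steps 4 and 5: majority vote. On a tie, Counter returns the label
    # that appears first among the nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-class training set.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, (2, 2), k=3))  # A
print(knn_classify(points, labels, (6, 5), k=3))  # B
```

Note that no training step appears: the "model" is simply the stored dataset, and all the work happens at prediction time.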

Performance Measures for Classifiers
When evaluating a classification model, we use performance measures to assess how well it predicts
the correct class labels. Two essential performance metrics are:

1. Classification Accuracy

2. Confusion Matrix

These metrics help in understanding the model's strengths and weaknesses.

Limitations of Accuracy:

• Accuracy is misleading in imbalanced datasets.

Example: In a medical test for a rare disease, if 98 out of 100 people are healthy, a model
that predicts "No Disease" for everyone will be 98% accurate but useless, because it never
identifies a single sick patient.

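
The rare-disease example can be reproduced directly. The sketch below computes accuracy as the fraction of correct predictions and the four confusion-matrix counts for a chosen positive class; the function names and label strings are illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(actual == predicted for actual, predicted in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, TN, FN) treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


# 2 sick and 98 healthy people; the model predicts "No Disease" for everyone.
y_true = ["Disease"] * 2 + ["No Disease"] * 98
y_pred = ["No Disease"] * 100

print(accuracy(y_true, y_pred))  # 0.98
tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive="Disease")
print(tp, fp, tn, fn)  # 0 0 98 2
```

The confusion matrix exposes what accuracy hides: zero true positives and two false negatives, meaning both sick patients were missed.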
Performance of Regression Algorithms
Regression models predict continuous values (e.g., predicting house prices, stock prices, or
temperature). The performance of regression algorithms is evaluated using different error metrics,
which measure how far the predicted values are from the actual values.

Two important performance metrics for regression are:

1. Mean Absolute Error (MAE)

2. Mean Squared Error (MSE)

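Mean Absolute Error (MAE)

MAE is the average of the absolute differences between the actual values y_i and the predicted
values \hat{y}_i over n samples:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

Interpretation:

• If MAE = 0, the model makes perfect predictions.

• A lower MAE means better model performance.

• MAE is in the same unit as the target variable, making it easy to interpret.

Advantages of MAE:

✔️ Simple and easy to understand.

✔️ Does not heavily penalize large errors.

Disadvantages of MAE:

❌ Treats all errors equally, even large ones.

Mean Squared Error (MSE)

MSE is the average of the squared differences between the actual and predicted values:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Interpretation:

• If MSE = 0, the model makes perfect predictions.

• A lower MSE means better model performance.

• MSE is more sensitive to large errors than MAE.

Advantages of MSE:

✔️ Penalizes large errors more, making it useful in applications where big mistakes are costly.

Disadvantages of MSE:

❌ Not in the same unit as the target variable (since errors are squared).

❌ Highly sensitive to outliers.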
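
To make the contrast between the two metrics concrete, the sketch below computes both on a small set of hypothetical values, then repeats the calculation with one prediction replaced by a large miss. The function names and numbers are illustrative assumptions.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]        # errors: 1, 0, 2, 1
print(mean_absolute_error(actual, predicted))  # 1.0
print(mean_squared_error(actual, predicted))   # 1.5

# Same predictions, but the last one is now off by 10.
with_outlier = [2.0, 5.0, 4.0, 17.0]    # errors: 1, 0, 2, 10
print(mean_absolute_error(actual, with_outlier))  # 3.25
print(mean_squared_error(actual, with_outlier))   # 26.25
```

A single large error raises MAE from 1.0 to 3.25 but raises MSE from 1.5 to 26.25, which is the outlier sensitivity listed above.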
