Machine Learning Model Evaluation Metrics

The document provides an overview of model performance and evaluation metrics in machine learning, focusing on classification and regression metrics such as accuracy, precision, F1 score, and ROC AUC. It discusses the importance of proper dataset partitioning, underfitting and overfitting, and hyperparameter tuning through techniques like holdout and k-fold cross-validation. The content emphasizes the need for appropriate evaluation metrics, especially in cases of imbalanced data, to ensure accurate model performance assessment.

Uploaded by

sm08

Introduction to Machine Learning

Course Teacher:
Dr. M. Shahidur Rahman
Professor, DoCSE, SUST
Model Performance and Evaluation Metrics

Topics covered:
- Evaluation Metrics
- Model Performance Evaluation
- Model Selection
Model Performance and Evaluation Metrics

- In the classification domain, the success of a model is most simply visualized using the confusion matrix.
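
As a minimal sketch of how a binary confusion matrix is tallied, the snippet below counts the four cells from true and predicted labels. The function name `confusion_counts` and the label lists are hypothetical, chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # predicted positive, actually positive
        elif pred == positive:
            fp += 1          # predicted positive, actually negative
        elif truth == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("            pred +  pred -")
print(f"actual +    {tp:6d}  {fn:6d}")
print(f"actual -    {fp:6d}  {tn:6d}")
```

Rows are the actual classes and columns the predicted classes; other layouts (transposed) are equally common, so the orientation here is a choice rather than a standard.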
Evaluation Metrics

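With TP, TN, FP, and FN denoting the true positive, true negative, false positive, and false negative counts from the confusion matrix:

- Accuracy: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

- True positive rate (TPR), also called recall, hit rate, or sensitivity: $\text{TPR} = \frac{TP}{TP + FN}$

- Precision, or positive predictive value: $\text{Precision} = \frac{TP}{TP + FP}$

- F1 score: $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$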
Evaluation Metrics…

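- Specificity, or true negative rate (TNR): $\text{TNR} = \frac{TN}{TN + FP}$

- Miss rate, or false negative rate (FNR): $\text{FNR} = \frac{FN}{FN + TP}$

- False positive rate (FPR): $\text{FPR} = \frac{FP}{FP + TN}$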
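
The metrics listed above can be sketched as one small function over the four confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical, and the sketch assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts.

    Assumes all denominators are non-zero (at least one actual positive,
    one actual negative, and one positive prediction).
    """
    recall = tp / (tp + fn)        # true positive rate / sensitivity
    precision = tp / (tp + fp)     # positive predictive value
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),   # true negative rate
        "miss_rate": fn / (fn + tp),     # false negative rate
        "fpr": fp / (fp + tn),           # false positive rate
    }


# Hypothetical counts, for illustration only.
m = classification_metrics(tp=3, fp=1, fn=1, tn=5)
for name, value in m.items():
    print(f"{name:12s} {value:.3f}")
```

Two identities are a useful sanity check: miss rate equals 1 minus recall, and FPR equals 1 minus specificity.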

Evaluation Metrics…

- Accuracy and classification error are informative measures of success when the data is balanced across the classes.
- When the data is imbalanced, i.e., one class makes up a much larger proportion of the dataset than the other, these measures become biased towards the majority class and give a misleading estimate of success.
- In such cases, base measures such as true positive rate (TPR), false positive rate (FPR), true negative rate (TNR), and false negative rate (FNR) become useful.
- Metrics such as the F1 score combine the base measures to give an overall measure of success.
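
A small worked example may make the bias concrete. The sketch below assumes a hypothetical dataset with 10 positive and 990 negative examples and a degenerate classifier that always predicts the negative class. F1 is computed in the equivalent form 2TP / (2TP + FP + FN), which stays defined even when no positives are predicted.

```python
# Hypothetical imbalanced dataset: 1% positives, 99% negatives.
y_true = [1] * 10 + [0] * 990
# Degenerate classifier: always predicts the majority (negative) class.
y_pred = [0] * len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
tpr = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

print(f"accuracy = {accuracy:.2f}")  # high despite a useless model
print(f"TPR      = {tpr:.2f}")       # no positive example is found
print(f"F1       = {f1:.2f}")
```

Accuracy is 0.99 here even though the classifier never detects a positive example, while TPR and F1 are both 0.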
Evaluation Metrics…

- The curve that plots TPR against FPR for a classifier at various thresholds is known as the receiver operating characteristic (ROC) curve.
- Precision and recall can likewise be plotted at different thresholds, giving the precision-recall curve (PRC).
- The areas under these curves are known as auROC and auPRC, respectively, and are popular performance metrics.
- In particular, auPRC is generally considered an informative metric in the presence of imbalanced classes.
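
To illustrate how a precision-recall curve arises from thresholds, here is a minimal sketch. It assumes an example is predicted positive when its score is at least the threshold, and it summarizes the curve with a step-wise sum (often called average precision), which is one common approximation of auPRC rather than the only definition. The scores, labels, and function names are hypothetical.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score,
    predicting positive when score >= threshold.
    Assumes at least one positive example in y_true."""
    n_pos = sum(y_true)
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((thr, tp / (tp + fp), tp / n_pos))
    return points


def average_precision(points):
    """Step-wise area: sum of precision weighted by the increase in recall."""
    area, prev_recall = 0.0, 0.0
    for _, precision, recall in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical classifier scores and true labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

points = precision_recall_points(y_true, scores)
for thr, precision, recall in points:
    print(f"threshold {thr:.1f}: precision {precision:.3f}, recall {recall:.3f}")
ap = average_precision(points)
print(f"average precision = {ap:.4f}")
```

Lowering the threshold can only keep recall the same or raise it, while precision may move in either direction, which is why the curve is usually jagged.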
ROC AUC

- A perfect classifier would fall into the top-left corner of the graph, with a TPR of 1 and an FPR of 0.
- Based on the ROC curve, we compute the ROC area under the curve (ROC AUC) to characterize the performance of a classification model.
- A higher ROC AUC means better classification performance.
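
The following sketch shows one way to compute ROC points and ROC AUC by hand. It assumes an example is predicted positive when its score is at least the threshold and uses the trapezoidal rule for the area. The function names, scores, and labels are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, starting at (0, 0), one per distinct threshold.
    Assumes y_true contains both classes."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores and labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
roc_auc = auc_trapezoid(roc_points(y_true, scores))
print(f"ROC AUC = {roc_auc:.4f}")

# A classifier that ranks every positive above every negative reaches AUC 1.
perfect_auc = auc_trapezoid(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
print(f"perfect ROC AUC = {perfect_auc:.1f}")
```

The first value equals the fraction of (positive, negative) pairs in which the positive example receives the higher score: 11 of 16 pairs in this toy data.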
Regression Evaluation Metrics

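With $y_i$ the true value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the true values, and $n$ the number of examples:

- Average prediction error: $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)$

- Mean absolute error (MAE): $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert$

- Root mean squared error (RMSE): $\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$

- Relative squared error (RSE), used to compare errors that are measured in different units: $\text{RSE} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$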
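
A minimal sketch of these regression metrics follows. Reading "average prediction error" as the signed mean of the errors is an assumption, as are the function name `regression_metrics` and the example values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute common regression error metrics for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    squared_error_sum = sum(e * e for e in errors)
    return {
        # Assumed reading of "average prediction error": signed mean error.
        "mean_error": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(squared_error_sum / n),
        # Unit-free: squared error relative to always predicting the mean.
        "rse": squared_error_sum / sum((t - mean_true) ** 2 for t in y_true),
    }


# Hypothetical values, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
r = regression_metrics(y_true, y_pred)
for name, value in r.items():
    print(f"{name:10s} {value:.4f}")
```

Positive and negative errors cancel in the signed mean, which is why MAE and RMSE are usually the more informative measures of error magnitude.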
Ratio for partitioning a dataset into training and test datasets

- In general, we don't want to allocate too much information to the test set, since every example held out for testing is unavailable for training.
- However, the smaller the test set, the less accurate the estimate of the generalization error.
- Dividing a dataset into training and test datasets is all about balancing this tradeoff.
- In practice, the most commonly used splits are 60:40, 70:30, or 80:20, depending on the size of the initial dataset.
- For large datasets, 90:10 or 99:1 splits are also common and appropriate.
- For example, if the dataset contains more than 100,000 training examples, it might be fine to withhold only 10,000 examples for testing in order to get a good estimate of the generalization performance.
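
The effect of test-set size on the reliability of the estimate can be sketched numerically. The snippet assumes test examples are independent, so that measured accuracy follows a binomial model with standard error sqrt(p(1 - p) / n). The accuracy value 0.9 and the test-set sizes are hypothetical.

```python
import math


def accuracy_standard_error(accuracy, n_test):
    """Binomial standard error of an accuracy estimate from n_test examples.
    Assumes independent test examples (an illustrative simplification)."""
    return math.sqrt(accuracy * (1 - accuracy) / n_test)


# Hypothetical true accuracy of 0.9, evaluated on test sets of growing size.
for n_test in (100, 1_000, 10_000):
    se = accuracy_standard_error(0.9, n_test)
    print(f"n_test = {n_test:6d}: standard error = {se:.4f}")
```

Under this model, the uncertainty shrinks with the square root of the test-set size, so a 10,000-example test set already gives a standard error of about 0.003, which is consistent with withholding only a small fraction of a large dataset.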
Underfitting and overfitting

- A model can suffer from underfitting (high bias), which means that the model is not complex enough to capture the pattern in the training data well and therefore also performs poorly on unseen data.
- If a model is too complex for a given training dataset, i.e., it has too many parameters, it tends to overfit the training data (high variance) and does not generalize well to unseen data.
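
One way to see both failure modes is to compare training and test error as model flexibility changes. The sketch below uses k-nearest-neighbor regression purely as an illustration (it is not taken from the slides): k = 1 memorizes the training set, while k equal to the training-set size predicts the global mean everywhere. The synthetic data (a noisy quadratic) and all names are hypothetical.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN regressor fitted on `train`, scored on `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


rng = random.Random(0)


def sample(n):
    """Hypothetical data: y = x^2 plus Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, x * x + rng.gauss(0, 0.1)))
    return data


train, test = sample(40), sample(40)
for k in (1, 5, len(train)):
    print(f"k = {k:2d}: train MSE = {mse(train, train, k):.4f}, "
          f"test MSE = {mse(train, test, k):.4f}")
```

With k = 1 the training error is exactly zero because each training point is its own nearest neighbor, so a gap between training and test error is the signature of overfitting. With k = 40 the model ignores x entirely, and a high error on both sets is the signature of underfitting.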
Debugging algorithms with learning and validation curves
Hyperparameter tuning

- Validation techniques are meant to answer the question of how to select a model (or models) with the right hyperparameter values.
- Hyperparameters are parameters set before training a machine learning model. They are not learned from the data but are configured manually to optimize model performance. Examples: learning rate (𝛼), number of trees, kernel type in an SVM.
- Hyperparameter C is the inverse regularization parameter of the LogisticRegression classifier; in the slide's example, C = 1 provides the best performance.
Holdout cross-validation

- One approach to estimating the generalization performance of ML models is holdout cross-validation.
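
A minimal sketch of a holdout split is shown below, assuming the common three-way variant in which the data is shuffled once and cut into training, validation, and test parts. The function name `holdout_split`, the 60:20:20 fractions, and the seed are illustrative assumptions.

```python
import random


def holdout_split(data, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into training, validation, and test parts."""
    items = list(data)
    random.Random(seed).shuffle(items)   # fixed seed for reproducibility
    n_test = int(len(items) * test_fraction)
    n_val = int(len(items) * val_fraction)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


# Hypothetical dataset of 100 example indices.
train, val, test = holdout_split(range(100))
print(len(train), len(val), len(test))
```

The validation part is used to compare hyperparameter settings, and the test part is touched only once, for the final performance estimate.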
K-fold cross validation

- The validation process needs a large number of labeled data points for creating the training set and the validation set.
- Collecting a large labeled set is usually difficult.
- In such cases, instead of physically separating the training set and validation set, k-fold cross-validation is used: the training data is divided into k folds, and each fold serves once as the validation set while the remaining k - 1 folds are used for training.
K-fold cross validation…

- Once we have found satisfactory hyperparameter values, we can retrain the model on the complete training dataset and obtain a final performance estimate using the independent test dataset.
- A typical value of k in k-fold cross-validation is 10.
- A special case of k-fold cross-validation is leave-one-out cross-validation (LOOCV), where k = n, the number of training examples.
- LOOCV is recommended for working with very small datasets.
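
The fold bookkeeping behind k-fold cross-validation can be sketched as an index generator. The function name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled, since it cuts contiguous folds. Setting k equal to n gives LOOCV.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over n examples.
    Each index appears in exactly one validation fold."""
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


# k = 10 on a hypothetical training set of 25 examples.
folds = list(k_fold_indices(n=25, k=10))
for i, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {i}: {len(train_idx)} training, validation = {val_idx}")

# LOOCV is the special case k = n: every validation fold has one example.
loocv = list(k_fold_indices(n=5, k=5))
```

The model is trained and scored once per fold, and the k validation scores are averaged to give the performance estimate.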
Model Selection

- Use holdout or k-fold cross-validation to fine-tune the performance of an ML model by varying its hyperparameter values.
- Choose the model that performs best on relevant criteria such as accuracy.
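
The selection procedure can be sketched end to end: score each candidate hyperparameter value with k-fold cross-validation on the training set, pick the best, then evaluate once on the held-out test set. The k-nearest-neighbor classifier, the synthetic one-dimensional data, the candidate values, and all names are illustrative assumptions, not part of the slides.

```python
import random


def knn_classify(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, data, k):
    """Fraction of examples in `data` classified correctly using `train`."""
    return sum(knn_classify(train, x, k) == y for x, y in data) / len(data)


def cv_accuracy(data, k_neighbors, n_folds=5):
    """Mean validation accuracy over n_folds contiguous folds.
    Assumes len(data) is divisible by n_folds."""
    fold = len(data) // n_folds
    scores = []
    for i in range(n_folds):
        val = data[i * fold:(i + 1) * fold]
        train = data[:i * fold] + data[(i + 1) * fold:]
        scores.append(accuracy(train, val, k_neighbors))
    return sum(scores) / n_folds


rng = random.Random(0)


def sample(n):
    """Hypothetical data: two overlapping Gaussian classes on a line."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


train_set, test_set = sample(100), sample(50)

candidates = [1, 5, 15]   # hypothetical hyperparameter grid (odd values avoid ties)
cv_scores = {k: cv_accuracy(train_set, k) for k in candidates}
best_k = max(cv_scores, key=cv_scores.get)

# Final estimate: use the full training set, score once on the test set.
test_accuracy = accuracy(train_set, test_set, best_k)
print("CV accuracy per candidate:", cv_scores)
print("selected k:", best_k, "test accuracy:", round(test_accuracy, 3))
```

The test set plays no role in choosing `best_k`, so its accuracy remains an unbiased estimate of generalization performance for the selected model.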
