Chapter 3
Evaluating Models
Evaluation
In the evaluation stage, we will explore different methods of evaluating an AI
model. Model evaluation is an integral part of the model development process. It
helps us find the model that best represents our data and estimate how well the
chosen model will work in the future.
Model evaluation
Model evaluation is the process of using different evaluation metrics to
understand a machine learning model’s performance.
An AI model gets better with constructive feedback.
You build a model, get feedback from metrics, make improvements and
continue until you achieve the desired accuracy.
Why do we need model evaluation?
Evaluation methods help us assess and choose the best model during the
modeling process. Model evaluation is like giving your AI model a
report card. It helps you understand its strengths, weaknesses, and suitability for
the task at hand. This feedback loop is essential for building trustworthy and
reliable AI systems.
Splitting the dataset for evaluation
Train-test split
The train-test split is a technique for evaluating the performance of a
machine learning algorithm.
It can be used for any supervised learning algorithm.
It divides the dataset into a training dataset and a
testing dataset.
The train-test procedure is appropriate when a sufficiently large
dataset is available.
Why do we need a train-test split?
The training dataset is used to make the model learn.
The input elements of the test dataset are provided to the trained model.
The model makes predictions, and the predicted values are compared to
the expected values.
The objective is to estimate the performance of the machine learning
model on new data: data not used to train the model.
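The split described above can be sketched in plain Python. This is a minimal illustration, not a library API: `split_dataset` is a hypothetical helper, and the 80/20 split and fixed random seed are assumptions chosen for the example. (In practice, scikit-learn provides `train_test_split` in `sklearn.model_selection` for the same job.)

```python
import random

def split_dataset(data, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle a copy of `data` and split it into
    (train, test). `test_size` is the fraction of items held out for testing."""
    items = list(data)                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(items)      # fixed seed makes the split repeatable
    n_test = round(len(items) * test_size)  # number of items held out
    return items[n_test:], items[:n_test]   # (training set, testing set)

# Ten hypothetical examples, split 80/20.
examples = list(range(10))
train, test = split_dataset(examples)
print(len(train), len(test))        # 8 2
# No example appears in both sets, so the test set is truly unseen data.
print(set(train).isdisjoint(test))  # True
```

Shuffling before splitting matters when the data is stored in some order (for example, sorted by date or by class); otherwise the test set may not resemble the training set.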
Overfitting
Overfitting means the model fits the training data too closely and fails to
generalize to new data.
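A toy illustration of overfitting, under stated assumptions: the hypothetical "model" below simply memorizes its training examples and falls back to the most common training label for anything it has not seen. The data points are made up for the example.

```python
def train_memorizer(train_pairs):
    """Hypothetical model that memorizes (input, label) pairs exactly."""
    table = dict(train_pairs)
    labels = [label for _, label in train_pairs]
    most_common = max(set(labels), key=labels.count)  # fallback for unseen inputs

    def predict(x):
        return table.get(x, most_common)

    return predict

def accuracy(model, pairs):
    """Fraction of pairs whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in pairs) / len(pairs)

# Made-up data: inputs are integers, labels are 0 or 1.
train_pairs = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 1)]
test_pairs = [(6, 0), (7, 1), (8, 0), (9, 1)]

model = train_memorizer(train_pairs)
print(accuracy(model, train_pairs))  # 1.0  perfect on data it has seen
print(accuracy(model, test_pairs))   # 0.5  no better than guessing on new data
```

The gap between training and testing accuracy is the warning sign of overfitting, and it only becomes visible because the model is scored on data it was not trained on.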
Accuracy and Error
Accuracy – Accuracy is an evaluation metric that measures the proportion of
predictions a model gets right. Accuracy and performance go together: the better
the model performs, the more of its predictions are correct.
Error – Error describes a prediction that is inaccurate or wrong. In machine
learning, error measures how far a model's predictions are from the actual values,
both on the data it learned from and on new, unseen data. Based on the error, we
choose the machine learning model that performs best for a particular dataset.
How to find accuracy of the AI model
To find the accuracy of an AI model, we first compare the predictions it makes on
the testing dataset with the actual values. For a numeric prediction such as a house
price, the accuracy is found as follows:
Error = Abs(Actual – Predicted)
Error Rate = Error / Actual
Accuracy = 1 – Error Rate
Accuracy in percentage = Accuracy * 100
Predicted House Price (USD) | Actual House Price (USD) | Error: Abs(Actual – Predicted) | Error Rate: Error / Actual | Accuracy: 1 – Error Rate | Accuracy %: Accuracy * 100
391k | 402k | Abs(402k – 391k) = 11k | 11k / 402k = 0.027 | 1 – 0.027 = 0.973 | 0.973 * 100% = 97.3%
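The house-price calculation can be reproduced in a few lines of Python. This is a sketch: `prediction_accuracy` is a hypothetical helper name, it takes the absolute value of the error as the table does, and it assumes the actual value is positive so that dividing by it makes sense.

```python
def prediction_accuracy(actual, predicted):
    """Accuracy of one numeric prediction (hypothetical helper)."""
    error = abs(actual - predicted)   # Error = Abs(Actual - Predicted)
    error_rate = error / actual       # Error Rate = Error / Actual
    accuracy = 1 - error_rate         # Accuracy = 1 - Error Rate
    return error, error_rate, accuracy * 100  # last value is Accuracy in percent

error, error_rate, accuracy_pct = prediction_accuracy(actual=402_000, predicted=391_000)
print(error)                   # 11000
print(round(error_rate, 3))    # 0.027
print(round(accuracy_pct, 1))  # 97.3
```

The code keeps the unrounded error rate (about 0.02736) until the end; rounding the final percentage to one decimal place still gives 97.3%.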
Evaluation metrics for classification
Classification metrics are used to evaluate the performance of a classification
model in machine learning.
Popular metrics used for classification models
Confusion matrix
Classification accuracy
Precision
Recall
F1 Score
1. What is confusion matrix?
The confusion matrix is a handy presentation of the accuracy of a model
with two or more classes.
The table presents the actual values on the y-axis and the predicted
values on the x-axis.
The number in each cell represents how many predictions made
by a machine learning algorithm fall into that particular category.
The confusion matrix allows us to understand the prediction results.
It consists of four values:
▪ True Positive (TP) is the outcome of the model correctly predicting the
positive class.
▪ True Negative (TN) is the outcome of the model correctly predicting the
negative class.
▪ False Positive (FP) is the outcome of the model wrongly predicting the
negative class as the positive class.
▪ False Negative (FN) is the outcome of the model wrongly predicting the
positive class as the negative class.
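The four values can be counted directly from a list of true labels and a list of predictions. The sketch below assumes a two-class problem with label 1 as the positive class; `confusion_counts` and the example labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a two-class problem.
    `positive` is the label treated as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, really positive
        elif p == positive:
            fp += 1  # predicted positive, really negative
        elif a == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return tp, tn, fp, fn

# Hypothetical labels for eight examples (1 = positive, 0 = negative).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)  # 3 3 1 1

# Rows are actual classes (positive, negative); columns are predicted classes.
print([[tp, fn], [fp, tn]])  # [[3, 1], [1, 3]]
```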
2. Classification accuracy
Classification accuracy measures the proportion of correct predictions among all
predictions made by a model. The accuracy calculation is as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
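A minimal sketch of classification accuracy, assuming predictions and true labels are given as two equal-length lists (the data is hypothetical). For two classes, counting matching labels like this gives the same result as (TP + TN) / (TP + TN + FP + FN).

```python
def classification_accuracy(actual, predicted):
    """Fraction of predictions that match the true labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical labels: 6 of the 8 predictions are correct.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(classification_accuracy(actual, predicted))  # 0.75
```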
Can we use Accuracy all the time?
It is only suitable when each class has an equal number of observations,
i.e., a balanced dataset (which is rarely the case), and when all
predictions and prediction errors are equally important, which is often not
the case.
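To see why accuracy can mislead on an unbalanced dataset, consider a made-up dataset with 95 negative and 5 positive examples, and a trivial "model" that always predicts the negative class. This is an illustrative sketch, not real data.

```python
# Made-up, unbalanced labels: 5 positives (1) and 95 negatives (0).
actual = [1] * 5 + [0] * 95
# A useless model that always predicts the majority (negative) class.
predicted = [0] * len(actual)

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
positives_found = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))

print(accuracy)         # 0.95
print(positives_found)  # 0, so none of the 5 positives is detected
```

An accuracy of 95% looks excellent, yet the model never detects a single positive case. Precision, recall and the F1 score are designed to expose this kind of failure.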
3. Precision
Precision is the ratio of the number of correctly classified positive
examples to the total number of predicted positive examples.
Precision = Correct positive predictions / Total positive predictions
= TP / (TP + FP)
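Precision translates directly into code. In this sketch `precision` is a hypothetical helper, the counts are made up, and returning 0.0 when the model made no positive predictions is an assumed convention to avoid dividing by zero.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP).
    Returns 0.0 when there are no positive predictions (assumed convention)."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return 0.0
    return tp / predicted_positive

# Hypothetical counts: 8 correct positive predictions, 2 false alarms.
print(precision(tp=8, fp=2))  # 0.8
```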
4. Recall
Recall measures how many of the actual positive examples our model correctly
identifies as positive (True Positives).
Recall = TP / (TP + FN)
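Recall is computed as TP / (TP + FN), the share of actual positives that the model found. The helper below is hypothetical, the counts are made up, and returning 0.0 when there are no actual positives is an assumed convention.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN): the share of actual positives the model found.
    Returns 0.0 when there are no actual positives (assumed convention)."""
    actual_positive = tp + fn
    if actual_positive == 0:
        return 0.0
    return tp / actual_positive

# Hypothetical counts: 8 positives found, 4 positives missed.
print(round(recall(tp=8, fn=4), 3))  # 0.667
```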
5. F1 Score
F1 Score combines precision and recall into a single measure that captures
both properties. It is the harmonic mean of the two:
F1 Score = 2 * Precision * Recall / (Precision + Recall)
In use cases where the dataset is unbalanced and we cannot decide whether
FP or FN is more important, the F1 score is a suitable metric.
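The F1 score uses the formula 2 * Precision * Recall / (Precision + Recall). The sketch below (a hypothetical helper with made-up values) compares it with a simple average to show why it is useful: the harmonic mean stays close to the weaker of the two values, so a model cannot hide poor recall behind high precision.

```python
def f1_score(precision, recall):
    """F1 = 2 * Precision * Recall / (Precision + Recall), the harmonic mean."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both are zero
    return 2 * precision * recall / (precision + recall)

# A model with perfect precision but very poor recall (hypothetical values).
p, r = 1.0, 0.1
print(round((p + r) / 2, 3))     # 0.55, the simple average hides the weak recall
print(round(f1_score(p, r), 3))  # 0.182, F1 stays close to the weaker value
```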
Ethical concerns around model evaluation
Textual exercise
a. Case Study 1:
A spam email detection system is used to classify emails as either spam (1) or
not spam (0). Out of 1000 emails:
True Positives(TP): 150 emails were correctly classified as spam.
False Positives(FP): 50 emails were incorrectly classified as spam.
True Negatives(TN): 750 emails were correctly classified as not
spam.
False Negatives(FN): 50 emails were incorrectly classified as not
spam.
Answer:
Accuracy=(TP+TN) / (TP+TN+FP+FN)
=(150+750)/(150+750+50+50)
=900/1000
=0.90
Precision=TP/(TP+FP)
=150/(150+50)
=150/200
=0.75
Recall=TP/(TP+FN)
=150/(150+50)
=150/200
=0.75
F1 Score = 2 * Precision * Recall / ( Precision + Recall )
=2 * 0.75 * 0.75 / (0.75+0.75)
=0.75
=75%
b. Case Study 2:
A credit scoring model is used to predict whether an applicant is likely to default
on a loan (1) or not (0). Out of 1000 loan applicants:
True Positives(TP): 90 applicants were correctly predicted to default
on the loan.
False Positives(FP): 40 applicants were incorrectly predicted to
default on the loan.
True Negatives(TN): 820 applicants were correctly predicted not to
default on the loan.
False Negatives (FN): 50 applicants were incorrectly predicted not to
default on the loan.
Calculate metrics such as accuracy, precision, recall, and F1-score.
Ans:
Accuracy=(TP+TN) / (TP+TN+FP+FN)
=(90+820)/(90+820+40+50)
=910/1000
=0.91
Precision=TP/(TP+FP)
=90/(90+40)
=90/130
=0.692
Recall=TP/(TP+FN)
=90/(90+50)
=90/140
=0.643
F1 Score = 2 * Precision * Recall / ( Precision + Recall )
=2 * 0.692 * 0.643 / (0.692+0.643)
=0.667
=66.7%
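Both case studies can be checked with a short script. This is a sketch: `classification_metrics` is a hypothetical helper, and it assumes every denominator is non-zero, which holds for the counts given in the two case studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Counts from the two case studies, in the order TP, FP, TN, FN.
case_studies = {
    "Spam detection": (150, 50, 750, 50),
    "Credit scoring": (90, 40, 820, 50),
}

for name, counts in case_studies.items():
    acc, prec, rec, f1 = classification_metrics(*counts)
    print(f"{name}: accuracy={acc:.3f} precision={prec:.3f} "
          f"recall={rec:.3f} F1={f1:.3f}")
# Spam detection: accuracy=0.900 precision=0.750 recall=0.750 F1=0.750
# Credit scoring: accuracy=0.910 precision=0.692 recall=0.643 F1=0.667
```

Computing from unrounded intermediate values avoids the small rounding drift that can appear when each step is rounded by hand.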