Unit 3: Evaluating Models
Model Evaluation
• Definition:
Model evaluation is the process of assessing how
well a model performs on unseen data.
• Explanation:
It helps determine the model’s accuracy,
robustness, and suitability for solving the
problem. It ensures that the model is not just
memorizing data (overfitting) but can generalize.
Importance of Model Evaluation
• Evaluation refers to the process of assessing a machine learning
model's performance on data that it hasn't seen before (usually
from a test dataset). The goal is to determine:
– How well the model is making predictions.
– Whether it is generalizing correctly beyond the training data.
– If the model is accurate, reliable, and fair in practical scenarios.
• This process is essential to identify:
– Overfitting (model performs well on training data but poorly on new
data).
– Underfitting (model fails to capture patterns even in training data).
– Areas where the model can be improved before deployment.
• Evaluating a model helps determine its performance and reliability.
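The overfitting and underfitting descriptions above can be turned into a rough check that compares training accuracy with test accuracy. This is only a sketch: the function name `diagnose_fit` and the thresholds `min_accuracy` and `max_gap` are illustrative assumptions, not standard values, and should be chosen per problem.

```python
def diagnose_fit(train_accuracy, test_accuracy, min_accuracy=0.7, max_gap=0.1):
    """Label a model's fit from its training and test accuracy (fractions 0..1).

    min_accuracy and max_gap are hypothetical thresholds chosen for illustration.
    """
    # Underfitting: the model does poorly even on the data it was trained on.
    if train_accuracy < min_accuracy:
        return "underfitting"
    # Overfitting: good on training data, clearly worse on unseen data.
    if train_accuracy - test_accuracy > max_gap:
        return "overfitting"
    return "reasonable fit"


print(diagnose_fit(0.99, 0.70))  # large train/test gap
print(diagnose_fit(0.55, 0.53))  # poor even on training data
print(diagnose_fit(0.90, 0.88))  # similar scores on both sets
```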
Splitting the Training Set
• Holding out part of the data lets us check that the
model generalizes well and is not overfitted.
Train-Test Split
• Definition:
Train-Test Split is dividing the dataset into two parts –
training and testing.
• Explanation:
Training set is used to build the model; the test set is
used to evaluate it. It mimics real-world data to
measure performance realistically.
Need of Train-test split
❖ The train dataset is used to make the model learn
❖ The input elements of the test dataset are
provided to the trained model. The model makes
predictions, and the predicted values are
compared to the expected values
❖ The objective is to estimate the performance of the
machine learning model on new data: data not
used to train the model
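A minimal sketch of a train-test split using only the Python standard library. The function name `split_train_test`, the 80/20 ratio, the seed, and the toy dataset are assumptions made for illustration.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle makes the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset: ten (input, expected_output) pairs.
dataset = [(x, x % 2) for x in range(10)]
train_set, test_set = split_train_test(dataset, test_fraction=0.2, seed=42)
print(len(train_set), "training rows,", len(test_set), "test rows")
```

The model would be fitted on `train_set` only; `test_set` is held back so that its rows count as new data when predictions are compared with the expected values. In practice a library routine such as scikit-learn's `train_test_split` is commonly used for this step.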
What is Accuracy and Error?
• Accuracy:
• Accuracy is an evaluation metric that measures
the proportion of predictions a model gets
right.
• Accuracy is one measure of how well the
model performs: the more of its predictions
are correct, the higher its accuracy.
What is Error?
Error can be described as an action that is inaccurate
or wrong.
In Machine Learning, the error is used to see how
accurately our model can predict both the data it
uses to learn and new, unseen data.
Based on our error, we choose the machine learning
model which performs best for a particular dataset.
3.3 Accuracy and Error
Definitions:
• Accuracy = (Correct Predictions / Total
Predictions) × 100%
• Error = 100% - Accuracy (equivalently, 1 - Accuracy
when accuracy is written as a fraction)
• Explanation:
High accuracy means fewer prediction mistakes.
Error helps track how much the model deviates
from the actual results.
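A short sketch of both definitions, with accuracy kept as a fraction between 0 and 1. The function name `accuracy` and the two label lists are made-up illustrations.

```python
def accuracy(expected, predicted):
    """Fraction of predictions that match the expected values."""
    if not expected or len(expected) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return correct / len(expected)


# Hypothetical test-set labels and model predictions.
expected = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

acc = accuracy(expected, predicted)  # 8 of the 10 predictions match
err = 1 - acc                        # error is the remaining fraction
print(f"Accuracy: {acc:.0%}  Error: {err:.0%}")
```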
Evaluation Metrics for Classification
• Definition:
Classification metrics help in understanding
model performance on classification
problems.
Classification Metrics
Popular metrics used for classification models:
▪ Confusion matrix
▪ Classification accuracy
▪ Precision
▪ Recall
The confusion matrix
Confusion Matrix: Table to visualize performance.
• The confusion matrix is a handy presentation of the accuracy of a model with
two or more classes
• The table presents the actual values on the y-axis and predicted values on the
x-axis
• The number in each cell represents the number of predictions made by a
machine learning algorithm that fall into that particular category
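One way to build such a table is sketched below, with rows for actual classes and columns for predicted classes. The function name `confusion_matrix`, the three class labels, and the sample predictions are assumptions for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a nested list: rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical three-class example.
labels = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "dog", "bird", "bird", "cat", "dog"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat", "cat", "bird"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>5}", row)
```

Correct predictions sit on the diagonal of the table; every off-diagonal cell counts one kind of mistake.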
True Positive and True Negative
1. True Positive: True Positive (TP) is the outcome of
the model correctly predicting the positive class.
Example: You had predicted that France
would win the world cup, and it won.
2. True Negative : True Negative (TN) is the outcome
of the model correctly predicting the negative
class.
Example: You had predicted that Germany
would not win, and it lost
False Positive and False Negative
False Positive: False Positive (FP) is the outcome of the
model wrongly predicting the negative class as the positive
class.
Example: You had predicted that Germany would win, but it
lost.
False Negative: False Negative (FN) is the outcome of the
model wrongly predicting the positive class as the negative
class.
Example: You had predicted that France would not win but
it won
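The four outcomes can be counted directly from paired actual and predicted labels. A minimal sketch, assuming a binary problem with "win" as the positive class. The function name `count_outcomes` and the sample data are hypothetical.

```python
def count_outcomes(actual, predicted, positive):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical match results and predictions.
actual = ["win", "lose", "lose", "win", "lose", "win"]
predicted = ["win", "lose", "win", "lose", "lose", "win"]

outcomes = count_outcomes(actual, predicted, positive="win")
print(outcomes)
```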
Calculations
Precision from Confusion matrix
Precision is the ratio of the number of correctly
classified positive examples to the total number
of predicted positive examples.
Precision = Correct positive predictions / Total positive predictions
          = TP / (TP + FP)
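The precision formula as a small function. The counts in the usage line are made up, and returning 0.0 when the model predicts no positives at all is an assumed convention, since the ratio is undefined in that case.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return 0.0  # assumption: treat "no positive predictions" as precision 0
    return tp / predicted_positive


# Hypothetical counts: 40 positive predictions, 30 of them correct.
print(precision(tp=30, fp=10))
```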
Recall from Confusion matrix
Recall measures how many of the actual positives our
model correctly identifies.
Recall = Correct positive predictions / Total actual positive values
       = TP / (TP + FN)
• Recall is also called Sensitivity or True Positive Rate
• Recall is generally used for unbalanced datasets where
False Negatives are costly and the model needs to
reduce the FNs as much as possible.
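The recall formula as a small function. The counts are made up, and returning 0.0 when there are no actual positives is an assumed convention.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    actual_positive = tp + fn
    if actual_positive == 0:
        return 0.0  # assumption: treat "no actual positives" as recall 0
    return tp / actual_positive


# Hypothetical counts: 50 actual positives, 30 of them found by the model.
print(recall(tp=30, fn=20))
```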
Example
Consider predicting a good-weather day on which
to launch a satellite.
Missing out on predicting a good weather day is
okay (low recall) but predicting the bad weather
day (Negative class) as a good weather
day (Positive class) to launch the satellite can be
disastrous. Here False Positives matter more than
False Negatives, so precision should be prioritized
over recall.
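To make the trade-off concrete, the sketch below compares two hypothetical weather models. All counts are invented for illustration: "cautious" rarely predicts a good day, while "eager" predicts good days freely.

```python
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented counts over the same 10 truly good-weather days.
models = {
    "cautious": {"TP": 2, "FP": 0, "FN": 8},  # never calls a bad day good
    "eager": {"TP": 9, "FP": 5, "FN": 1},     # calls 5 bad days good
}

scores = {
    name: (precision(c["TP"], c["FP"]), recall(c["TP"], c["FN"]))
    for name, c in models.items()
}
for name, (p, r) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

The cautious model misses most good days, but it never sends the satellite up in bad weather. For this task that makes it the safer choice despite its low recall.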
Ethical Concerns in Model Evaluation
• Bias: A model may favor one group over
another unfairly.
• Transparency: Model logic should be
understandable.
• Accountability: Developers must ensure models
behave ethically.
• Addressing these concerns helps ensure
responsible use of AI systems.
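One simple, partial way to look for bias during evaluation is to compute accuracy separately for each group and compare the results. The sketch below assumes every test example carries a group label. The function name `accuracy_by_group`, the groups "A" and "B", and the data are hypothetical, and a gap in accuracy is only one possible sign of unfairness.

```python
from collections import defaultdict


def accuracy_by_group(groups, expected, predicted):
    """Return {group: accuracy} computed over the examples in each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, e, p in zip(groups, expected, predicted):
        total[g] += 1
        correct[g] += int(e == p)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical test set with a group label per example.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
expected = [1, 0, 1, 0, 1, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 0, 0, 1]

per_group = accuracy_by_group(groups, expected, predicted)
print(per_group)
```

A large difference between groups, as in this invented data, suggests the model should be examined more closely before deployment.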
F1 Score
F1-Score provides a way to combine precision
and recall into a single measure that captures both
properties.
When the dataset is unbalanced and we are unable to
decide whether FP or FN is more important, the
F1 score is a suitable metric.
F1 Score = (2 × Precision × Recall) / (Precision + Recall)
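The F1 formula as a small function. The input values are made up, and returning 0.0 when precision and recall are both zero is an assumed convention.

```python
def f1_score(precision, recall):
    """F1 = 2 * precision * recall / (precision + recall)."""
    if precision + recall == 0:
        return 0.0  # assumption: define F1 as 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.75, 0.6)  # precision and recall are similar
lopsided = f1_score(1.0, 0.2)   # perfect precision, poor recall
print(f"balanced: {balanced:.3f}  lopsided: {lopsided:.3f}")
```

Because F1 is a harmonic mean, the lopsided case scores well below the simple average of its precision and recall (0.6): a single high value cannot hide a low one.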
Ethical Concerns in Evaluation
• Bias: Ensuring fairness in model outcomes
• Transparency: Clear model operations
• Accountability: Responsibility for model
decisions