UNIT 4
Performance measurement of models in terms of accuracy, confusion matrix, precision & recall, F1
score, Receiver Operating Characteristic (ROC) curve and AUC, Median absolute deviation
(MAD), Distribution of errors.
Performance Metrics in Machine Learning
Evaluating the performance of a Machine learning model is one of the important
steps while building an effective ML model. To evaluate the performance or
quality of the model, different metrics are used, and these metrics are known
as performance metrics or evaluation metrics. These performance metrics help
us understand how well our model has performed for the given data. In this way, we
can improve the model's performance by tuning the hyper-parameters. Each ML
model aims to generalize well on unseen/new data, and performance metrics help
determine how well the model generalizes on the new dataset.
In machine learning, most supervised learning tasks fall into one of two categories:
classification or regression. Not all metrics can be used for all types of
problems; hence, it is important to know and understand which metrics should be
used. Classification and regression tasks use different evaluation metrics. In this
topic, we will discuss the metrics used for each of them.
1. Performance Metrics for Classification
In a classification problem, the category or class of each data point is identified
based on training data. The model learns from the given dataset and then classifies new
data into classes or groups based on that training. It predicts class labels as the
output, such as Yes or No, 0 or 1, Spam or Not Spam, etc. To evaluate the
performance of a classification model, different metrics are used, and some of them
are as follows:
o Accuracy
o Confusion Matrix
o Precision
o Recall
o F-Score
o AUC-ROC (Area Under the ROC Curve)
I. Accuracy
The accuracy metric is one of the simplest classification metrics to implement. It is
the ratio of the number of correct predictions to the total number of predictions.
It can be formulated as:
$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$
To implement an accuracy metric, we can compare ground truth and predicted
values in a loop, or we can also use the scikit-learn module for this.
Firstly, we need to import the accuracy_score function from the scikit-learn library as
follows:
    from sklearn.metrics import accuracy_score
Here, metrics is a module of sklearn.
Then we need to pass the ground truth and predicted values to the function to
calculate the accuracy:
    print(f'Accuracy Score is {accuracy_score(y_test, y_hat)}')
Although it is simple to use and implement, it is reliable only when roughly an
equal number of samples belong to each class.
When to Use Accuracy?
It is good to use the Accuracy metric when the target variable classes in the data are
approximately balanced. For example, suppose 60% of the images in a fruit dataset are
of Apple and 40% are of Mango. If a model that predicts whether an image is of Apple
or Mango reports 97% accuracy, that figure is meaningful, because always guessing the
majority class would reach only 60%.
When not to use Accuracy?
It is recommended not to use the Accuracy measure when the target variable mostly
belongs to one class. For example, suppose there is a model for disease
prediction in which, out of 100 people, only five people have the disease and 95
people don't. In this case, if our model predicts that no one has the disease
(which is a bad prediction), the Accuracy measure will still be 95%, which is
misleading because the model has not found a single diseased person.
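The disease example above can be reproduced in a few lines. This is a minimal sketch in plain Python; the helper name `accuracy` and the label encoding (1 = has the disease, 0 = does not) are assumptions made for this illustration and are not part of scikit-learn.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# 100 people: 5 have the disease (1), 95 do not (0).
y_true = [1] * 5 + [0] * 95
# A useless model that predicts "no disease" for everyone.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)

# Share of the diseased people that the model actually found.
found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
detected = found / sum(y_true)

print(f"Accuracy: {acc:.2f}")
print(f"Diseased people detected: {detected:.2f}")
```

The accuracy is high even though none of the diseased people are detected, which is why accuracy alone should not be trusted on imbalanced data.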
II. Confusion Matrix
A confusion matrix is a tabular representation of prediction outcomes of any binary
classifier, which is used to describe the performance of the classification model on a
set of test data when true values are known.
The confusion matrix is simple to implement, but the terminologies used in this
matrix might be confusing for beginners.
A typical confusion matrix for a binary classifier looks like the image below (however,
it can be extended to classifiers with more than two classes).
We can determine the following from the above matrix:
o In the matrix, the columns hold the predicted values, and the rows hold the
actual values. Both the actual and the predicted values take one of two
possible classes, Yes or No. So, if we are predicting the presence of a disease
in a patient, a prediction of Yes means the patient has the disease, and a
prediction of No means the patient doesn't have the disease.
o In this example, the total number of predictions is 165: the model predicted
Yes 110 times and No 55 times.
o In reality, however, 105 patients have the disease and 60 patients don't have
the disease.
In general, the table is divided into four cells, which are named as follows:
1. True Positive (TP): The model predicts the positive class (Yes), and the actual
class is also positive.
2. True Negative (TN): The model predicts the negative class (No), and the actual
class is also negative.
3. False Positive (FP): The model predicts the positive class, but the actual
class is negative.
4. False Negative (FN): The model predicts the negative class, but the actual
class is positive.
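The four cells can be counted directly from the true and predicted labels. The sketch below is an illustration in plain Python; the function name `confusion_counts` and the ten labels are made up for this example (scikit-learn offers `confusion_matrix` in `sklearn.metrics` for the same purpose).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted Yes, actually Yes
        elif predicted != positive and actual != positive:
            tn += 1  # predicted No, actually No
        elif predicted == positive and actual != positive:
            fp += 1  # predicted Yes, actually No
        else:
            fn += 1  # predicted No, actually Yes
    return tp, tn, fp, fn


# Made-up labels for illustration: 1 = Yes (disease), 0 = No.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)

# Rows are the actual classes, columns are the predicted classes.
print("            Pred No  Pred Yes")
print(f"Actual No   {tn:7d}  {fp:8d}")
print(f"Actual Yes  {fn:7d}  {tp:8d}")
```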
III. Precision
The precision metric is used to overcome a limitation of Accuracy. Precision
is the proportion of positive predictions that were actually correct. It is
calculated as the ratio of True Positives (predictions that are actually true) to the
total positive predictions (True Positives and False Positives).
It can be formulated as:
$$\text{Precision} = \frac{TP}{TP + FP}$$
IV. Recall or Sensitivity
It is similar to the Precision metric; however, it measures the proportion
of actual positives that were identified correctly. It is calculated as the ratio of
True Positives to the total number of actual positives, whether they were correctly
predicted as positive or incorrectly predicted as negative (True Positives and False
Negatives).
The formula for calculating Recall is given below:
$$\text{Recall} = \frac{TP}{TP + FN}$$
When to use Precision and Recall?
From the above definitions of Precision and Recall, we can say that recall
determines the performance of a classifier with respect to a false negative, whereas
precision gives information about the performance of a classifier with respect to a
false positive.
So, if we want to minimize false negatives, Recall should be as close to 100% as
possible, and if we want to minimize false positives, Precision should be as close
to 100% as possible.
In simple words, maximizing precision minimizes the FP errors, and maximizing
recall minimizes the FN errors.
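Following the definitions above, both metrics can be computed from the confusion matrix counts. This is a small sketch in plain Python; the counts are hypothetical, and returning 0.0 when the denominator is zero is a convention assumed here, not something the text prescribes.

```python
def precision(tp, fp):
    """TP / (TP + FP): how many predicted positives were correct."""
    total_predicted_positive = tp + fp
    # Assumed convention: report 0.0 if nothing was predicted positive.
    return tp / total_predicted_positive if total_predicted_positive else 0.0


def recall(tp, fn):
    """TP / (TP + FN): how many actual positives were found."""
    total_actual_positive = tp + fn
    # Assumed convention: report 0.0 if there are no actual positives.
    return tp / total_actual_positive if total_actual_positive else 0.0


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 8, 2, 4

p = precision(tp, fp)
r = recall(tp, fn)

print(f"Precision: {p:.3f}")
print(f"Recall:    {r:.3f}")
```

Here the two false positives lower precision, while the four false negatives lower recall, so each metric reacts to a different kind of error.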
V. F-Scores
F-score or F1 Score is a metric to evaluate a binary classification model on the basis
of predictions that are made for the positive class. It is calculated with the help of
Precision and Recall. It is a type of single score that represents both Precision and
Recall. So, the F1 Score can be calculated as the harmonic mean of both
precision and Recall, assigning equal weight to each of them.
The formula for calculating the F1 score is given below:
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
When to use F-Score?
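As the F1 score makes use of both precision and recall with equal weight, it should
be used when both of them are important for evaluation, that is, when neither false
positives nor false negatives can be ignored. When one of them (precision or recall)
is more important than the other, for example, when false negatives are more costly
than false positives or vice versa, the more general F-beta score can be used, which
weights recall beta times as much as precision.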
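To see why the harmonic mean is used, it helps to compare it with the ordinary average. The following is a minimal sketch in plain Python; the function name `f1_from` and the precision and recall values are hypothetical and chosen only for illustration.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# A classifier with balanced precision and recall.
balanced = f1_from(0.8, 0.8)

# A classifier with perfect precision but very poor recall.
lopsided = f1_from(1.0, 0.1)
arithmetic_mean = (1.0 + 0.1) / 2

print(f"F1, balanced case:        {balanced:.3f}")
print(f"F1, lopsided case:        {lopsided:.3f}")
print(f"Plain average, lopsided:  {arithmetic_mean:.3f}")
```

In the lopsided case the F1 score stays close to the weaker of the two values, whereas the plain average would hide the poor recall.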
VI. AUC-ROC
Sometimes we need to visualize the performance of the classification model on
charts; then, we can use the AUC-ROC curve. It is one of the popular and important
metrics for evaluating the performance of the classification model.
Firstly, let's understand the ROC (Receiver Operating Characteristic) curve. The ROC
curve is a graph that shows the performance of a classification model at
different threshold levels. The curve is plotted between two parameters, which are:
o True Positive Rate
o False Positive Rate
TPR or True Positive Rate is a synonym for Recall, hence can be calculated as:
$$TPR = \frac{TP}{TP + FN}$$
FPR or False Positive Rate can be calculated as:
$$FPR = \frac{FP}{FP + TN}$$
To calculate the points on a ROC curve, we could evaluate a classifier such as a
logistic regression model many times with different classification thresholds, but
this would be inefficient. Instead, the curve is usually summarized by a single
aggregate measure, which is known as AUC.
AUC: Area Under the ROC curve
AUC stands for Area Under the ROC Curve. As its name suggests, AUC
measures the two-dimensional area under the entire ROC curve, as shown in the image
below:
AUC summarizes the performance across all the thresholds and provides an
aggregate measure. The value of AUC ranges from 0 to 1. A model whose predictions
are 100% wrong has an AUC of 0.0, a model whose predictions are 100% correct has
an AUC of 1.0, and a model that ranks examples at random scores about 0.5.
When to Use AUC
AUC should be used to measure how well the predictions are ranked rather than
their absolute values. Moreover, it measures the quality of predictions of the model
without considering the classification threshold.
When not to use AUC
AUC is scale-invariant, which is not always desirable: when we need well-calibrated
probability outputs, AUC cannot tell us whether they are calibrated, so it is not
the preferred metric.
Further, AUC ignores the classification threshold, so it is not a useful metric when
there are wide disparities in the cost of false negatives vs. false positives and it
is critical to minimize one type of classification error.
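The threshold sweep and the area calculation described above can be sketched as follows. This is an illustrative plain-Python version, not an efficient implementation; the function names `roc_points` and `auc`, the 0/1 label encoding, and the four scores are assumptions made for this example. scikit-learn provides `roc_curve` and `roc_auc_score` in `sklearn.metrics` for real use.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, from (0, 0) to (1, 1).

    Assumes labels are 0 (negative) and 1 (positive).
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    points = [(0.0, 0.0)]
    # Lower the threshold step by step, from the highest score to the lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve through the given (x, y) points (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and predicted scores for illustration.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
area = auc(points)

print("ROC points (FPR, TPR):", points)
print(f"AUC: {area:.2f}")
```

Only the ordering of the scores matters here: multiplying every score by a positive constant leaves the ROC points and the AUC unchanged, which is the scale invariance mentioned above.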
2. Performance Metrics for Regression
Regression is a supervised learning technique that aims to find the relationships
between the dependent and independent variables. A predictive regression model
predicts a continuous numeric value. The metrics used for regression are different
from the classification metrics. It means we cannot use the Accuracy metric
(explained above) to evaluate a regression model; instead, the performance of a
Regression model is reported as errors in the prediction. Following are the popular
metrics that are used to evaluate the performance of Regression models.
o Mean Absolute Error
o Mean Squared Error
o R2 Score
o Adjusted R2
I. Mean Absolute Error (MAE)
Mean Absolute Error or MAE is one of the simplest metrics. It measures the average
absolute difference between actual and predicted values, where absolute means
taking the magnitude of a number regardless of its sign.
To understand MAE, let's take an example of Linear Regression, where the model
draws a best-fit line between dependent and independent variables. To measure the
error of a single prediction, we calculate the difference between the actual value
and the predicted value. To find the absolute error for the complete dataset,
we take the mean of these absolute differences over all data points.
The below formula is used to calculate MAE:
$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |Y_i - Y'_i|$$
Here,
Y_i is the Actual outcome and Y'_i is the predicted outcome for the i-th data point,
and N is the total number of data points.
MAE is more robust to outliers than metrics based on squared errors. One of the
limitations of MAE is that it is not differentiable at zero error, which makes it
less convenient for gradient-based optimizers such as Gradient Descent. To overcome
this limitation, another metric can be used, which is Mean Squared Error or MSE.
II. Mean Squared Error
Mean Squared Error or MSE is one of the most commonly used metrics for Regression
evaluation. It measures the average of the squared differences between the values
predicted by the model and the actual values.
Since the errors are squared, MSE is never negative, and in practice it is usually
greater than zero.
Moreover, because of the squaring, it penalizes large errors much more heavily than
small ones, so a few outliers can lead to an over-estimation of how bad the model is.
MSE is often preferred over other regression metrics because it is differentiable
everywhere and hence easier to optimize.
The formula for calculating MSE is given below:
$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (Y_i - Y'_i)^2$$
Here,
Y_i is the Actual outcome and Y'_i is the predicted outcome for the i-th data point,
and N is the total number of data points.
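The difference between the two metrics is easiest to see on a small example. The sketch below uses plain Python; the function names `mae` and `mse` and all the numbers are made up for illustration (scikit-learn offers `mean_absolute_error` and `mean_squared_error` in `sklearn.metrics`).

```python
def mae(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


actual = [3.0, 5.0, 7.0, 9.0]

# Predictions with small errors only: errors are 1, 0, 1, 0.
good_pred = [2.0, 5.0, 8.0, 9.0]
# Same predictions, but the last one is badly off: errors are 1, 0, 1, 10.
outlier_pred = [2.0, 5.0, 8.0, 19.0]

mae_good, mse_good = mae(actual, good_pred), mse(actual, good_pred)
mae_outlier, mse_outlier = mae(actual, outlier_pred), mse(actual, outlier_pred)

print(f"Small errors : MAE = {mae_good}, MSE = {mse_good}")
print(f"One outlier  : MAE = {mae_outlier}, MSE = {mse_outlier}")
```

A single large error raises the MSE far more than the MAE, because the error is squared before averaging.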
III. R Squared Score
The R squared score is also known as the Coefficient of Determination, which is
another popular metric used for Regression model evaluation. The R-squared metric
enables us to compare our model with a constant baseline to determine the
performance of the model. To select the constant baseline, we take the mean of the
data and draw the line at the mean. The score is one minus the ratio of the model's
squared error to the squared error of this baseline.
The R squared score is always less than or equal to 1, regardless of whether the
values in the data are large or small.
IV. Adjusted R Squared
Adjusted R squared, as the name suggests, is an improved version of the R squared
score. R squared has the limitation that its value does not decrease when more
terms are added, even though the model is not improving, and this may mislead the
data scientists.
To overcome this issue, adjusted R squared is used, which never shows a higher
value than R². This is because it penalizes each additional predictor and only
shows improvement if there is a real improvement.
We can calculate the adjusted R squared as follows:
$$R_a^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$
Here,
n is the number of observations,
k denotes the number of independent variables,
and R_a^2 denotes the adjusted R².
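Both scores can be sketched in a few lines of plain Python. The R squared function below uses the standard definition, one minus the model's squared error divided by the squared error of the mean baseline; the function names and the data are assumptions made for this illustration, and the adjusted version requires n > k + 1.

```python
def r_squared(y_true, y_pred):
    """1 - (squared error of the model) / (squared error of the mean baseline)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjust R squared for n observations and k independent variables."""
    if n <= k + 1:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up actual and predicted values for illustration.
actual = [2.0, 4.0, 6.0, 8.0, 10.0]
predicted = [2.5, 3.5, 6.0, 8.5, 9.5]

r2 = r_squared(actual, predicted)
adj_one_predictor = adjusted_r_squared(r2, n=len(actual), k=1)
adj_two_predictors = adjusted_r_squared(r2, n=len(actual), k=2)

print(f"R squared:                  {r2:.4f}")
print(f"Adjusted R squared (k = 1): {adj_one_predictor:.4f}")
print(f"Adjusted R squared (k = 2): {adj_two_predictors:.4f}")
```

For the same R squared, the adjusted score drops as k grows, which is how it discourages adding predictors that do not improve the fit.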
Mean Absolute Deviation
Mean Absolute Deviation is a statistical metric that helps us find
out the average spread of the data, i.e., Mean Absolute Deviation shows the
average distance of the observations of the dataset from the mean of the
dataset. It is helpful in analysing the data and understanding it
better. Mean Absolute Deviation is one of the
measures of spread, alongside other measures such as range, quartiles,
interquartile range, standard deviation, and variance.
What is Mean Absolute Deviation?
Mean Absolute Deviation (MAD) of a data set is the average distance between
each data point of the data set and the mean of the data, i.e., it represents the
amount of variation that occurs around the mean value in the data set. It is
also a measure of spread. It is calculated as the average of the
absolute differences between each value of the data set and the mean.
What is Measure of Spread?
The measure of spread represents the amount of dispersion in a data set, i.e.,
how spread out the values of the dataset are around the central value
(for example, the mean, mode, or median). It tells how far away the data points
tend to fall from the central value.
A lower value of the measure of spread reflects that the data
points are close to the central value. In this case, the values in the
data set are more consistent.
The farther the data points are from the central value, the
greater the spread. In this case, the values are less
consistent.
Using the above diagram, we can infer that the narrow distribution represents
a lower spread, and the broad distribution represents a higher spread.
Mean Absolute Deviation Formula
As Mean Absolute Deviation is the average of the absolute value of deviation
about the mean of the data, its formula for grouped as well as ungrouped data
is given as follows:
For Ungrouped Data
The Mean Absolute Deviation Formula for ungrouped data is given as follows:
$$\text{MAD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \mu|$$
where,
x_i represents each observation of the dataset,
μ is the mean of the data set, and
n is the number of observations in the data set.
For Grouped Data
The Mean Absolute Deviation Formula for grouped data is given as follows:
$$\text{MAD} = \frac{\sum_{i=1}^{n} f_i \, |x_i - \mu|}{\sum_{i=1}^{n} f_i}$$
where,
x_i represents each distinct observation of the dataset,
μ is the mean of the dataset,
f_i represents the frequency of the corresponding observation x_i,
1 ≤ i ≤ n, and n is the number of distinct observations in the data set.
How to Calculate Mean Absolute Deviation?
To calculate the mean absolute deviation for a set of values, we can use the
following steps:
Step 1: Identify whether the data set is grouped or ungrouped and
calculate the mean.
Step 2: Calculate the absolute difference between each data point and the
mean.
Step 3: Add the absolute differences calculated for each data point in
Step 2.
Step 4: Divide the sum of absolute differences by the number of data points
to obtain the mean absolute deviation.
Using these steps, we can calculate the Mean Absolute Deviation of any
dataset, either grouped or ungrouped.
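The four steps above translate directly into code. This is a minimal sketch in plain Python; the function names `mad_ungrouped` and `mad_grouped` and the small data set are assumptions made for this illustration.

```python
def mad_ungrouped(values):
    """Mean absolute deviation about the mean for a plain list of values."""
    mean = sum(values) / len(values)             # Step 1: the mean
    abs_diffs = [abs(x - mean) for x in values]  # Step 2: absolute differences
    total = sum(abs_diffs)                       # Step 3: add them up
    return total / len(values)                   # Step 4: divide by the count


def mad_grouped(values, freqs):
    """Mean absolute deviation when each value comes with a frequency."""
    n = sum(freqs)  # total number of data points
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n


# The same made-up data written in both forms: the value 4 occurs twice.
data = [2, 4, 4, 6, 9]
distinct_values = [2, 4, 6, 9]
frequencies = [1, 2, 1, 1]

mad_a = mad_ungrouped(data)
mad_b = mad_grouped(distinct_values, frequencies)

print(f"Ungrouped MAD: {mad_a}")
print(f"Grouped MAD:   {mad_b}")
```

Both forms describe the same data, so they give the same result; the grouped form simply multiplies each deviation by how often the value occurs.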
Mean Absolute Deviation vs. Standard Deviation
There are some differences between Mean Absolute Deviation and Standard
Deviation, which are as follows:
Parameters | Mean Absolute Deviation | Standard Deviation
Definition | The average distance between each data point and the mean. | The measure of how spread out the data is from the mean.
Calculation | 1. Calculate the mean of the data set. 2. Calculate the absolute value of the difference between each data point and the mean. 3. Take the average of those absolute values. | 1. Calculate the mean of the data set. 2. Calculate the difference between each data point and the mean. 3. Square each of those differences. 4. Take the average of the squared differences. 5. Take the square root of the result.
Use | Useful when the data set contains outliers, as it is less affected by extreme values. | Useful when the data set does not contain outliers, as it provides a more accurate measure of the spread of the data.
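The two calculation procedures compared above can be run side by side. The sketch below is an illustration in plain Python; the function names and both data sets are made up, and the standard deviation divides by the number of data points (the population form), matching the "take the average" step.

```python
import math


def mean_absolute_deviation(values):
    """Average of the absolute differences from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def standard_deviation(values):
    """Square root of the average squared difference from the mean."""
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / len(values))


# Made-up data: the second list replaces the last value with an outlier.
clean = [10, 12, 11, 13, 9]
with_outlier = [10, 12, 11, 13, 59]

for name, values in [("clean", clean), ("with outlier", with_outlier)]:
    mad = mean_absolute_deviation(values)
    sd = standard_deviation(values)
    print(f"{name:13s} MAD = {mad:.2f}, SD = {sd:.2f}")
```

The standard deviation is never smaller than the mean absolute deviation, because squaring gives the largest deviations more weight before averaging.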
Example: Mean Absolute Deviation About the Mean
Suppose that we start with the following data set:
1, 2, 2, 3, 5, 7, 7, 7, 7, 9.
The mean of this data set is 5. The following table will organize our work in
calculating the mean absolute deviation about the mean.
Data Value   Deviation from Mean   Absolute Value of Deviation
1            1 - 5 = -4            |-4| = 4
2            2 - 5 = -3            |-3| = 3
2            2 - 5 = -3            |-3| = 3
3            3 - 5 = -2            |-2| = 2
5            5 - 5 = 0             |0| = 0
7            7 - 5 = 2             |2| = 2
7            7 - 5 = 2             |2| = 2
7            7 - 5 = 2             |2| = 2
7            7 - 5 = 2             |2| = 2
9            9 - 5 = 4             |4| = 4
Total of Absolute Deviations:      24
We now divide this sum by 10, since there are a total of ten data values. The
mean absolute deviation about the mean is 24/10 = 2.4.
Example: Mean Absolute Deviation About the Mean
Now we start with a different data set:
1, 1, 4, 5, 5, 5, 5, 7, 7, 10.
Just like the previous data set, the mean of this data set is 5.
Data Value   Deviation from Mean   Absolute Value of Deviation
1            1 - 5 = -4            |-4| = 4
1            1 - 5 = -4            |-4| = 4
4            4 - 5 = -1            |-1| = 1
5            5 - 5 = 0             |0| = 0
5            5 - 5 = 0             |0| = 0
5            5 - 5 = 0             |0| = 0
5            5 - 5 = 0             |0| = 0
7            7 - 5 = 2             |2| = 2
7            7 - 5 = 2             |2| = 2
10           10 - 5 = 5            |5| = 5
Total of Absolute Deviations:      18
Thus the mean absolute deviation about the mean is 18/10 = 1.8. We compare
this result to the first example. Although the mean was identical for each of
these examples, the data in the first example was more spread out. We see
from these two examples that the mean absolute deviation from the first
example is greater than the mean absolute deviation from the second example.
The greater the mean absolute deviation, the greater the dispersion of our
data.
Example: Mean Absolute Deviation About the Median
Start with the same data set as the first example:
1, 2, 2, 3, 5, 7, 7, 7, 7, 9.
The median of the data set is 6. In the following table, we show the details of
the calculation of the mean absolute deviation about the median.
Data Value   Deviation from Median   Absolute Value of Deviation
1            1 - 6 = -5              |-5| = 5
2            2 - 6 = -4              |-4| = 4
2            2 - 6 = -4              |-4| = 4
3            3 - 6 = -3              |-3| = 3
5            5 - 6 = -1              |-1| = 1
7            7 - 6 = 1               |1| = 1
7            7 - 6 = 1               |1| = 1
7            7 - 6 = 1               |1| = 1
7            7 - 6 = 1               |1| = 1
9            9 - 6 = 3               |3| = 3
Total of Absolute Deviations:        24
Again we divide the total by 10 and obtain a mean absolute deviation about the
median of 24/10 = 2.4.
Example: Mean Absolute Deviation About the Mode
Start with the same data set as before:
1, 2, 2, 3, 5, 7, 7, 7, 7, 9.
This time we find the mode of this data set to be 7. In the following table, we
show the details of the calculation of the mean absolute deviation about the
mode.
Data Value   Deviation from Mode   Absolute Value of Deviation
1            1 - 7 = -6            |-6| = 6
2            2 - 7 = -5            |-5| = 5
2            2 - 7 = -5            |-5| = 5
3            3 - 7 = -4            |-4| = 4
5            5 - 7 = -2            |-2| = 2
7            7 - 7 = 0             |0| = 0
7            7 - 7 = 0             |0| = 0
7            7 - 7 = 0             |0| = 0
7            7 - 7 = 0             |0| = 0
9            9 - 7 = 2             |2| = 2
Total of Absolute Deviations:      24
We divide the sum of the absolute deviations by 10 and see that we have a mean
absolute deviation about the mode of 24/10 = 2.4.
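The hand calculations in these examples can be checked with a short script. This is a sketch using Python's standard `statistics` module to find the central values; the helper name `mean_abs_deviation_about` is an assumption made for this illustration.

```python
import statistics


def mean_abs_deviation_about(values, center):
    """Average absolute distance of the values from the given central value."""
    return sum(abs(x - center) for x in values) / len(values)


data = [1, 2, 2, 3, 5, 7, 7, 7, 7, 9]

# The three central values used in the worked examples.
centers = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
}

results = {
    name: mean_abs_deviation_about(data, center)
    for name, center in centers.items()
}

for name, value in results.items():
    print(f"MAD about the {name} ({centers[name]}): {value}")
```

Passing the central value as an argument keeps one function for all three cases, so the only thing that changes between the examples is the choice of center.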
Distribution of errors
Error Distribution of Machine Learning Model