M.E.S INDIAN SCHOOL, DOHA - QATAR
Notes 5 2025-2026
Section: Girls’/Boys’ Date : 13-09-2025
Class & Div. : 10 All Divisions Subject: AI
Lesson / Topic: Unit 6:Evaluating models
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Evaluation
Model evaluation is the process of using different evaluation metrics to understand a
machine learning model’s performance. An AI model gets better with constructive
feedback: you build a model, get feedback from metrics, make improvements, and
continue until you achieve a desirable accuracy.
Evaluating Models
The evaluation process uses different evaluation metrics to understand a machine
learning model’s performance, strengths, and weaknesses. Evaluation is the process
of understanding the reliability of an AI model by feeding a test dataset into the
model and comparing its outputs with the actual answers. There can be different
evaluation techniques, depending on the type and purpose of the model.
Need for model evaluation
Evaluation models are methods for evaluating and choosing the best model during
the modeling process. Model evaluation is like giving your AI model a report card.
It helps you understand its strengths, weaknesses, and suitability for the task at hand.
This feedback loop is essential for building trustworthy and reliable AI systems.
F 061, Rev 01, dtd 10th March 2020 1|Page
Splitting the training set data for Evaluation
The train-test split is a technique for evaluating the performance of a machine
learning algorithm. It can be used for any supervised learning algorithm. The
evaluation model divides the dataset into a training set and a testing set.
Need for train-test split
The training dataset is used to make the model learn. The input elements of the test
dataset are then provided to the trained model. The model makes predictions, and the
predicted values are compared to the expected values. The objective is to estimate
the performance of the machine learning model on new data, that is, data not used to
train the model.
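The split described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the function name `train_test_split`, the 80/20 ratio, the fixed seed, and the ten-record dataset are all assumptions made for the example.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and return (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)               # copy, so the original data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    test_size = max(1, round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]

# Hypothetical dataset: ten numbered records standing in for real rows.
dataset = list(range(1, 11))
train_set, test_set = train_test_split(dataset)
print(len(train_set), "training records,", len(test_set), "testing records")
```

The model would learn only from `train_set`; `test_set` is held back so that the predictions are checked on data the model has never seen.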
Accuracy and Error
In AI model evaluation, accuracy and error are key metrics that help us understand
how well a model performs and identify the areas for improvement. Higher accuracy
means a model is better, while lower error indicates fewer mistakes.
Accuracy – Accuracy is an evaluation metric that allows you to measure the
total number of predictions a model gets right. The accuracy of the model and
the performance of the model are directly proportional: the better the
performance of the model, the more accurate its predictions.
Error – Error can be described as an action that is inaccurate or wrong. In
Machine Learning, the error is used to see how accurately our model can
predict both the data it learned from and new, unseen data. Based on the error,
we choose the machine learning model which performs best for a particular dataset.
To find accuracy of the AI model
To find the accuracy of an AI model, we first have to calculate the
percentage of correct predictions made on the testing dataset. The formula to find the
accuracy is:
Error = Abs(Actual – Predicted)
Error Rate = Error / Actual
Accuracy = 1 – Error Rate
Accuracy in percentage = Accuracy * 100
Calculate the accuracy of the House Price prediction AI model
Predicted House Price (USD) | Actual House Price (USD) | Abs Error (Actual - Predicted) | Error Rate (Error / Actual) | Accuracy (1 - Error Rate) | Accuracy % (Accuracy * 100)
391k | 402k | Abs(402k - 391k) = 11k | 11k / 402k = 0.027 | 1 - 0.027 = 0.973 | 0.973 * 100 = 97.3%
Given values:
Predicted House Price = 391k
Actual House Price = 402k
Step 1: Calculate Absolute Error
Error: 402k − 391k = 11k
Step 2: Calculate Error Rate
Error Rate: 11k / 402k = 0.0274
Step 3: Calculate Accuracy
Accuracy: 1 – 0.0274 = 0.9726
Step 4: Convert to Percentage
Accuracy in percentage: 0.9726 × 100 = 97.26%, which rounds to 97.3%
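The four steps above can also be written as a short Python function. This is a minimal sketch: the function name `accuracy_percent` and the check for a zero actual value are assumptions added for the example, and the prices are the ones from the house price table.

```python
def accuracy_percent(predicted, actual):
    """Return the accuracy (in %) of one prediction, following the four steps."""
    if actual == 0:
        raise ValueError("actual value must be non-zero to compute an error rate")
    error = abs(actual - predicted)    # Step 1: absolute error
    error_rate = error / abs(actual)   # Step 2: error rate
    accuracy = 1 - error_rate          # Step 3: accuracy as a fraction
    return accuracy * 100              # Step 4: accuracy as a percentage

house_accuracy = accuracy_percent(predicted=391_000, actual=402_000)
print(f"Accuracy: {house_accuracy:.1f}%")  # prints "Accuracy: 97.3%"
```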
Evaluation metrics for classification
Classification
In artificial intelligence, classification is a technique that organizes data into categories.
It is a type of machine learning that uses algorithms to sort data into predefined
classes. Suppose you go to a supermarket and are given two trolleys: in one, you have
to place the fruits and vegetables; in the other, you must put grocery items like
bread, oil, eggs, etc. So basically, you are classifying the items of the supermarket into
two classes:
▪ fruits and vegetables
▪ grocery
Classification metrics
Classification metrics are performance measures used to evaluate the effectiveness
of a classification model in machine learning. They help to compare different models
and identify the best one.
Different types of classification techniques in AI
Popular metrics used for classification models
Confusion matrix
Classification accuracy
Precision
Recall
F1 Score
1. Confusion Matrix
The confusion matrix is a handy presentation of the accuracy of a model with two or
more classes. The comparison between the prediction and reality is recorded in
what we call the confusion matrix. The confusion matrix allows us to understand
the prediction results.
It consists of four values:
True Positive (TP): Correctly predicted positive cases.
False Negative (FN): Model predicted negative, but it was actually positive.
False Positive (FP): Model predicted positive, but it was actually negative.
True Negative (TN): Correctly predicted negative cases.
Prediction and Reality can be easily mapped together with the help of this confusion
matrix.
2. Classification accuracy
Classification accuracy allows you to count the total number of accurate predictions
made by a model. Accuracy tells us how many of the model's predictions were
correct, so it considers both True Positives and True Negatives. The accuracy
calculation is as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Converting the accuracy to percentage: Accuracy % = Accuracy x 100
Here, total observations cover all the possible cases of prediction that can be True
Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN).
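As a rough illustration, classification accuracy can be computed from the four counts in Python. The function name `classification_accuracy` is made up for this sketch, and the counts used are the ones from the disease-prediction example worked out later in these notes (TP = 4, TN = 2, FP = 2, FN = 2).

```python
def classification_accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct (TP and TN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total

acc = classification_accuracy(tp=4, tn=2, fp=2, fn=2)
print(f"Accuracy = {acc:.2f} ({acc * 100:.0f}%)")  # prints "Accuracy = 0.60 (60%)"
```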
3. Precision
Precision is the ratio of the total number of correctly classified positive examples to
the total number of predicted positive examples:
Precision = TP / (TP + FP)
Precision use case example
▪ For example, take the case of predicting a good day, based on weather conditions, to
launch a satellite.
▪ Let’s assume a day with favorable weather conditions is considered the Positive class
and a day with non-favorable weather conditions is considered the Negative class.
▪ Missing out on predicting a good weather day is okay (low recall), but predicting a
bad weather day (Negative class) as a good weather day (Positive class) to launch
the satellite can be disastrous.
▪ So, in this case, the FPs need to be reduced as much as possible.
4. Recall
Recall is the measure of how well our model identifies the actual positive cases:
Recall = TP / (TP + FN)
Thus, for all the patients who actually have heart disease, recall tells us how many we
correctly identified as having heart disease.
Recall use case example
▪ For example, for a Covid-19 prediction classifier, let’s consider detection of a Covid-19
affected case as the positive class and detection of a non-affected case as the
negative class.
▪ Imagine a Covid-19 affected person (Positive) is falsely predicted as non-affected
(Negative). A person relying solely on the AI would not get any treatment
and might also end up infecting many other people.
▪ So, in this case, the FNs need to be reduced as much as possible.
5. F1 Score
The F1 score can be defined as the measure of balance between precision and recall:
it combines both precision and recall into a single measure that captures both
properties.
F1 Score = 2 x (Precision x Recall) / (Precision + Recall)
Take a look at the formula and think: when can we get a perfect F1 score?
An ideal situation would be when we have a value of 1 (that is 100%) for both
Precision and Recall. In that case, the F1 score would also be an ideal 1 (100%). It is
known as the perfect value for F1 Score. As the values of both Precision and Recall
range from 0 to 1, the F1 score also ranges from 0 to 1.
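Precision, recall, and the F1 score can be tried out with a small Python sketch. The function names are illustrative only, returning 0.0 when a denominator is zero is a convention assumed here (the notes do not cover that case), and the counts TP = 4, FP = 2, FN = 2 come from the disease-prediction example later in these notes.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 on 0/0 is an assumed convention

def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

p = precision(tp=4, fp=2)
r = recall(tp=4, fn=2)
print(f"Precision = {p:.3f}, Recall = {r:.3f}, F1 = {f1_score(p, r):.3f}")
print("Perfect case:", f1_score(1.0, 1.0))  # prints "Perfect case: 1.0"
```

Because the F1 score is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is used as a single balanced measure.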
Let us explore the variations we can have in the F1 Score:
1. Let’s assume we are predicting the presence of a disease; for example, “yes”
would mean the person has the disease, and “no” would mean they do not have the
disease. So, the AI model’s output will be Yes or No.
Actual Predicted
Yes Yes
No No
Yes No
No Yes
Yes Yes
Yes No
No No
Yes Yes
No Yes
Yes Yes
Now, count each type of prediction:
TP (Yes, Yes) = 4
FN (Yes, No) = 2
FP (No, Yes) = 2
TN (No, No) = 2
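The same counting can be checked with a short Python sketch. The function name `confusion_counts` is an assumption for this example; the ten (actual, predicted) pairs are copied in order from the table above, with “Yes” treated as the positive class.

```python
# (actual, predicted) pairs copied from the table, in order.
pairs = [
    ("Yes", "Yes"), ("No", "No"), ("Yes", "No"), ("No", "Yes"), ("Yes", "Yes"),
    ("Yes", "No"), ("No", "No"), ("Yes", "Yes"), ("No", "Yes"), ("Yes", "Yes"),
]

def confusion_counts(pairs, positive="Yes"):
    """Count TP, FN, FP and TN, treating `positive` as the positive class."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for actual, predicted in pairs:
        if actual == positive:
            counts["TP" if predicted == positive else "FN"] += 1
        else:
            counts["FP" if predicted == positive else "TN"] += 1
    return counts

counts = confusion_counts(pairs)
print(counts)  # prints {'TP': 4, 'FN': 2, 'FP': 2, 'TN': 2}
```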
The confusion matrix based on the table given here:
              Predicted: Yes   Predicted: No
Actual: Yes   TP = 4           FN = 2
Actual: No    FP = 2           TN = 2
Let’s find the accuracy:
Accuracy = (TP + TN) / Total Predictions
= (4 + 2) / 10
= 6 / 10
= 0.6
The model correctly predicted 6 out of 10 cases, meaning the accuracy is 60%.
Can we use Accuracy all the time?
No. Accuracy is only suitable when there are an equal number of observations in each
class, i.e., a balanced dataset (which is rarely the case), and when all predictions and
prediction errors are equally important, which is often not the case.
Classification Accuracy Calculation
Let’s assume you are testing your model on a test dataset of 1000 records, out of which
the actual values are 900 Yes and only 100 No (unbalanced dataset). Let’s assume that
you have built a faulty model which, irrespective of any input, will give a prediction as
Yes.
True Positives (TP) = 900
False Negatives (FN) = 0
False Positives (FP) = 100
True Negatives (TN) = 0
Now, applying the formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
= (900 + 0) / (900 + 0 + 100 + 0)
= 900 / 1000
= 0.9
Accuracy = 0.9 x 100 = 90%
The model appears to be 90% accurate, but this is misleading because it never
predicts “No”. We should also use precision, recall, and F1 score to get a better
evaluation.
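The sketch below works through this faulty model in Python using the counts given above. One step goes beyond the notes and is an assumption of this example: the metrics are computed twice, once with “Yes” as the positive class and once with “No” as the positive class. The function name `class_metrics` and the choice to report a 0/0 ratio as 0.0 are also illustrative conventions.

```python
def class_metrics(tp, fp, fn):
    """Return (precision, recall, F1); a 0/0 ratio is reported as 0.0 by convention."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Counts from the example above, with "Yes" as the positive class.
tp, fn, fp, tn = 900, 0, 100, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)
yes_metrics = class_metrics(tp=tp, fp=fp, fn=fn)
# With "No" as the positive class the roles swap: TP <-> TN and FP <-> FN.
no_metrics = class_metrics(tp=tn, fp=fn, fn=fp)

print(f"Accuracy: {accuracy:.2f}")
print("Yes as positive (precision, recall, F1):", [round(m, 3) for m in yes_metrics])
print("No as positive  (precision, recall, F1):", [round(m, 3) for m in no_metrics])
```

With “Yes” as the positive class the scores still look high, because almost every record is a Yes. With “No” as the positive class, precision, recall, and F1 all come out as 0, which exposes that the model never identifies a single No case.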
Ethical concerns around model evaluation
Ethical concerns around model evaluation primarily focus on three aspects: bias,
transparency, and accuracy. Nowadays, we are moving from the Information era to
the Artificial Intelligence era. Now we do not use data or information, but the
intelligence collected from the data to build solutions. We need to keep aspects
relating to ethical practices in mind while developing solutions using AI. Let us
understand some of the ethical concerns in detail.
Bias – Bias occurs when a model generates unfair or discriminatory results.
This can happen due to the model favoring certain groups or due to the
algorithm. For example, if an AI application of Amazon favors male
customers only, then most product suggestions will be shown only to
male customers, which will decrease the profit of the company.
Transparency – The AI decision-making process should be transparent, so that
people can easily understand and interpret the result. If there is a lack of
transparency, people will not trust the model. For example, if a person has
applied for a loan and the AI model denies the application, then the applicant
should be told why the loan application was rejected.
Accuracy – The AI model should predict the correct result. An accurate model
produces reliable results with few errors. For example, in medicine, an AI model
should diagnose and generate accurate predictions; otherwise, a wrong
diagnosis can lead to serious illness in people.
*******************************THE END*********************************