Class X
Subject: AI
Topic: Evaluation
Worksheet with Solution
Q1. What will happen if you deploy an AI model without evaluating it on known test-set data?
Test sets simulate real-world scenarios in a controlled way. If we deploy an AI model without evaluating it, the following consequences may arise:
1. We won’t know how well the model performs on unseen data.
2. Undetected weaknesses can lead to incorrect predictions, low reliability, and a poor user experience.
3. Without testing, biases in training data may go unnoticed.
4. Businesses may face financial loss or reputational damage.
Q2. Do you think evaluating an AI model is that essential in an AI project cycle?
Yes, in essence, model evaluation is like giving your AI model a report card. It helps you understand its
strengths, weaknesses, and suitability for the task at hand. This feedback loop is essential for building
trustworthy and reliable AI systems.
Q3. Explain train-test split with an example.
- The train-test split is a technique for evaluating the performance of a machine learning algorithm.
- It can be used for any supervised learning algorithm.
- The procedure involves taking a dataset and dividing it into two subsets: the training dataset and the testing dataset. The model is trained on the training dataset and then evaluated on the testing dataset, which it has not seen during training.
- The train-test procedure is appropriate when a sufficiently large dataset is available.
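Example: Suppose a dataset has 1000 labeled records. With an 80:20 split, 800 records are used to train the model and the remaining 200, which the model never sees during training, are used to test it.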
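A minimal sketch of how such a split can be done in plain Python, assuming a hypothetical dataset of 1000 labeled records and an 80:20 ratio. The function name `split_train_test`, the 20% test ratio and the fixed seed are illustrative choices, not part of the worksheet; in practice, libraries such as scikit-learn provide a ready-made `train_test_split` function for the same job.

```python
import random

def split_train_test(records, test_ratio=0.2, seed=42):
    """Shuffle a copy of `records` and split it into (train, test) lists."""
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(records)               # copy, so the caller's list is not modified
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    test = shuffled[:n_test]               # held-out data the model never trains on
    train = shuffled[n_test:]              # data used to fit the model
    return train, test

# Hypothetical dataset: 1000 (record_id, label) pairs with labels 0/1
dataset = [(i, i % 2) for i in range(1000)]
train_set, test_set = split_train_test(dataset, test_ratio=0.2)
print(len(train_set), len(test_set))  # 800 200
```

Shuffling before splitting matters when records are stored in some order (for example, all spam emails first); without it, the test set could end up containing only one class.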
Q4. “Understanding both error and accuracy is crucial for effectively evaluating and improving AI models.” Justify this statement.
The statement “Understanding both error and accuracy is crucial for effectively evaluating and improving AI models” is justified because error and accuracy are two sides of the same coin: together they provide a balanced and complete view of model performance. Accuracy is an evaluation metric that measures the proportion of predictions a model gets right. Error refers to the difference between a model's prediction and the actual outcome; it quantifies how often the model makes mistakes. For a classification model, error rate = 1 - accuracy, so a model that is 90% accurate is wrong 10% of the time. Accuracy tells us how well the model performs overall, while studying its errors shows where and how it goes wrong, which guides improvement. The goal is to minimize error and maximize accuracy.
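For a classification model, accuracy and error rate both come from the same comparison of actual and predicted labels. A minimal sketch, assuming hypothetical labels for 10 test samples; the helper name `accuracy_and_error` is illustrative. It also lists the positions of the wrong predictions, which is the starting point for analysing where the model goes wrong.

```python
def accuracy_and_error(actual, predicted):
    """Return (accuracy, error_rate) for two equal-length lists of class labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    accuracy = correct / len(actual)
    error_rate = 1 - accuracy  # fraction of predictions that are wrong
    return accuracy, error_rate

# Hypothetical labels for 10 test samples (1 = positive, 0 = negative)
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

acc, err = accuracy_and_error(actual, predicted)
# Indices of the samples the model got wrong
mistakes = [i for i, (a, p) in enumerate(zip(actual, predicted)) if a != p]

print(f"Accuracy: {acc:.0%}, Error: {err:.0%}")  # Accuracy: 80%, Error: 20%
print("Wrong predictions at:", mistakes)         # Wrong predictions at: [2, 5]
```

Here the two mistakes are of different kinds (sample 2 is a missed positive, sample 5 a false alarm), something a single accuracy figure does not reveal.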
Q5. What is classification accuracy? Can it be used at all times for evaluating AI models?
Classification accuracy is the number of correct predictions made as a ratio of all predictions made.
No. In the case of imbalanced datasets (e.g., 95% “no disease”, 5% “disease”), a model can achieve 95% accuracy by always predicting “no disease” while missing every one of the 5% of patients who need a diagnosis. Hence, accuracy alone cannot always be relied upon.
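The 95%/5% example can be checked with a minimal sketch, assuming a hypothetical test set of 100 patients and a "model" that always predicts "no disease". Recall, used here to expose the missed cases, is the fraction of actual disease cases that the model identifies.

```python
# Hypothetical test set of 100 patients: 95 without disease (0), 5 with disease (1)
actual = [0] * 95 + [1] * 5
# A useless "model" that always predicts "no disease"
predicted = [0] * len(actual)

correct = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct / len(actual)

# Recall = TP / (TP + FN): the share of actual disease cases the model catches
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
recall = tp / (tp + fn)

print(f"Accuracy: {accuracy:.0%}")  # Accuracy: 95%
print(f"Recall:   {recall:.0%}")    # Recall:   0%
```

The high accuracy hides the fact that not a single patient with the disease is detected.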
Case study-based questions:
Q1. Identify which metric (Precision or Recall) should be used in each of the following cases, and explain why.
a) Email Spam Detection
b) Cancer Diagnosis
c) Legal Cases (Innocent until proven guilty)
d) Fraud Detection
e) Safe Content Filtering (like Kids YouTube)
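Precision is used where a False Positive is more costly:
a) Email Spam Detection: a genuine email wrongly marked as spam (False Positive) may hide an important message from the user.
c) Legal Cases (Innocent until proven guilty): declaring an innocent person guilty (False Positive) is a serious injustice.
e) Safe Content Filtering (like Kids YouTube): if the model labels videos as “safe”, an unsafe video wrongly marked safe (False Positive) exposes children to harmful content.
Recall is used where a False Negative is more costly:
b) Cancer Diagnosis: missing a patient who actually has cancer (False Negative) delays life-saving treatment.
d) Fraud Detection: a fraudulent transaction that goes undetected (False Negative) causes financial loss.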
Q2. Examine the following case studies. Draw the confusion matrix and calculate metrics such as accuracy, precision, recall, and F1-score for each one of them.
a. Case Study 1: A spam email detection system is used to classify emails as either spam (1) or not spam (0). Out of 1000 emails:
- 150 emails were correctly classified as spam.
- 50 emails were incorrectly classified as spam.
- 750 emails were correctly classified as not spam.
- 50 emails were incorrectly classified as not spam.
Confusion Matrix (rows: Reality, columns: Prediction)
Reality ↓ / Prediction →    Yes (Spam)    No (Not Spam)
Yes (Spam)                  150 (TP)      50 (FN)
No (Not Spam)               50 (FP)       750 (TN)
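The counts in the confusion matrix above can be reproduced by comparing actual and predicted labels one email at a time. A minimal sketch, assuming the 1000 emails are represented as two hypothetical label lists (1 = spam, 0 = not spam) arranged to match the case study; the helper name `confusion_counts` is illustrative.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN and FN for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1  # predicted spam
        else:
            counts["FN" if a == positive else "TN"] += 1  # predicted not spam
    return counts

# Hypothetical label lists rebuilt from Case Study 1 (1 = spam, 0 = not spam)
actual    = [1] * 150 + [0] * 50 + [0] * 750 + [1] * 50
predicted = [1] * 150 + [1] * 50 + [0] * 750 + [0] * 50

counts = confusion_counts(actual, predicted)
print(f"{'':12}{'Pred Yes':>10}{'Pred No':>10}")
print(f"{'Actual Yes':12}{counts['TP']:>10}{counts['FN']:>10}")
print(f"{'Actual No':12}{counts['FP']:>10}{counts['TN']:>10}")
```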
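Classification Accuracy (%) = $\frac{TP+TN}{TP+TN+FP+FN} \times 100 = \frac{(150+750) \times 100}{1000} = 90\%$
Precision = $\frac{TP}{TP+FP} = \frac{150}{150+50} = 0.75$
Recall = $\frac{TP}{TP+FN} = \frac{150}{150+50} = 0.75$
F1 Score = $2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \times (0.75 \times 0.75)}{0.75 + 0.75} = \frac{1.125}{1.5} = 0.75$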
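The four formulas can be checked in code. A minimal sketch, assuming the counts from Case Study 1; the helper name `classification_metrics` is illustrative, and returning 0.0 when a denominator is zero is an assumed convention, not something the worksheet specifies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 score from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard against division by zero
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Counts from Case Study 1
acc, prec, rec, f1 = classification_metrics(tp=150, fp=50, tn=750, fn=50)
print(f"Accuracy:  {acc:.0%}")   # Accuracy:  90%
print(f"Precision: {prec:.2f}")  # Precision: 0.75
print(f"Recall:    {rec:.2f}")   # Recall:    0.75
print(f"F1 score:  {f1:.2f}")    # F1 score:  0.75
```

Because precision and recall are equal here, the F1 score (their harmonic mean) equals that same value.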