AI Model Evaluation Question Bank

The document is a question bank for evaluating AI models, consisting of multiple-choice, short answer, long answer, and case study-based questions. It covers various evaluation metrics such as accuracy, precision, recall, and F1 score, and emphasizes the importance of model evaluation in AI projects. Additionally, it includes practical examples and calculations related to model performance assessment.

Uploaded by

sandipanasane999
Copyright
© All Rights Reserved

Unit–3 Evaluating Models

Question Bank

Q.1 MULTIPLE CHOICE QUESTIONS:

Q.2 SHORT ANSWER TYPE QUESTIONS

Q.3 LONG ANSWER TYPE QUESTIONS

Q.4 CASE STUDY BASED QUESTIONS

Sanskriti School, Pune Page 1


Q.1 MULTIPLE CHOICE QUESTIONS

1. Which of the following sentences is correct regarding Evaluation?


a) It is a process to collect and clean the data for machine learning
b) It is the process to separate data for training and testing purpose
c) It is the process to develop algorithm and train it to get intelligent output
d) It is the process of using different evaluation metrics to understand a machine learning
model’s performance

2. The goal of evaluating an AI model is to:


a) Maximize error and minimize accuracy
b) Minimize error and maximize accuracy
c) Focus solely on the number of data points used
d) Prioritize the complexity of the model

3. A high F1 score generally suggests:


a) A significant imbalance between precision and recall
b) A good balance between precision and recall
c) A model that only performs well on specific data points
d) The need for more training data

4. How is the relationship between model performance and accuracy described?


a) Inversely proportional
b) Not related
c) Directly proportional
d) Randomly fluctuating

5. What is not true about error in evaluating a regression model?


a) Error refers to the difference between a model's prediction and the actual outcome.
b) Based on our error, we choose the machine learning model which performs best for a
particular dataset.
c) It quantifies how often the model makes mistakes.
d) Error = Actual value / Predicted value

6. Which of the following is a classification use case example?


a) House price prediction
b) Credit card fraud detection
c) Salary prediction
d) Product recommendation
Assertion and reasoning-based questions:
Q7. Assertion (A): Accuracy is an evaluation metric that allows you to measure the total
number of predictions a model gets right.
Reasoning (R): The accuracy of a model and its performance are directly proportional;
hence, the better the performance of the model, the more accurate its predictions.
Choose the correct option:
a) Both A and R are true and R is the correct explanation for A
b) Both A and R are true but R is not the correct explanation for A
c) A is True but R is False
d) A is false but R is True

Q8. Assertion (A): The sum of the values in a confusion matrix's row represents the total
number of instances for a given actual class.
Reasoning (R): This enables the calculation of class-specific metrics such as precision and
recall, which are essential for evaluating a model's performance across different classes.
Choose the correct option:
a) Both A and R are true and R is the correct explanation for A
b) Both A and R are true but R is not the correct explanation for A
c) A is True but R is False
d) A is false but R is True

Q.2 SHORT ANSWER TYPE QUESTIONS


1. “The data we use to train the model should not be used for its evaluation”. Why?
Ans – We should not use the training data during evaluation because the model may simply
remember the whole training set and therefore predict the correct label for any point in
the training set. Its score would then overstate how well it performs on new, unseen data.
A model that memorizes its training data in this way is said to be overfitting.

2. Name the popular metrics used for classification models.


Ans -
▪ Confusion matrix
▪ Classification accuracy
▪ Precision
▪ Recall

3. What do you mean by Accuracy in a confusion matrix in classification model? How is it
calculated?
Ans - Classification accuracy is the number of correct predictions made as a ratio of all
predictions made.

Classification Accuracy = Correct Predictions / Total Predictions
= (TP + TN) / (TP + TN + FP + FN)
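
The four counts in the accuracy formula can be tallied directly from lists of actual and predicted labels. The sketch below is only an illustration: the function name `confusion_counts` and the ten labels are made up for this example and are not part of the question bank.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP, FN for a two-class (binary) problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, actually negative
        elif p == positive and a != positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return tp, tn, fp, fn


# Hypothetical labels, invented for illustration (1 = positive, 0 = negative).
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn)   # 3 5 1 1
print(accuracy)         # 0.8
```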

4. What do you mean by Precision in a confusion matrix in classification model? Write the
formula to calculate it.
Ans –
Precision is the ratio of the total number of correctly classified positive examples to the
total number of predicted positive examples.
Precision = TP / (TP + FP)

5. Where should we use the Precision metric?

Ans - Precision is generally used for unbalanced datasets where False Positives are
costly and the model needs to reduce the FPs as much as possible.

6. What is a Recall value in confusion matrix? Write the formula to calculate it.
Ans -
Recall is the proportion of actual positive cases that the model correctly identifies as
positive.
Recall = TP / (TP + FN)

7. Where should we use the Recall metric?

Ans - Recall is generally used for unbalanced datasets where False Negatives are costly
and the model needs to reduce the FNs as much as possible.

8. In which situation shall we use the F1 Score metric? Write the formula to calculate it.
Ans -
In use cases where the dataset is unbalanced and we are unable to decide whether FP or
FN is more important, we should use the F1 score as the metric.
F1-Score provides a way to combine both precision and recall into a single measure that
captures both properties.
F1 Score = (2 X Precision X Recall) / (Precision + Recall)
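
As a quick check on the three definitions, here is a minimal sketch in Python. The function names and the counts (TP = 8, FP = 2, FN = 8) are hypothetical, chosen only for illustration. Returning 0.0 when a denominator is zero is an assumption of this sketch, not a rule stated in the question bank.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 on empty denominator (assumption)


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0  # 0.0 on empty denominator (assumption)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, invented for illustration.
p = precision(tp=8, fp=2)   # 8 / 10 = 0.8
r = recall(tp=8, fn=8)      # 8 / 16 = 0.5
f1 = f1_score(p, r)         # 0.8 / 1.3, about 0.615
print(p, r, round(f1, 3))
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values (0.5 here) than a simple average would.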

9. What will happen if you deploy an AI model without evaluating it with known test set
data?
Ans – Model evaluation is like giving your AI model a report card. It helps you understand its
strengths, weaknesses, and suitability for the task at hand. This feedback loop is essential
for building trustworthy and reliable AI systems.
If you deploy a model without evaluating it on a known test set, you will not be able to
understand its performance, and its predictions may not be trustworthy.

Q.3 LONG ANSWER TYPE QUESTIONS

Q. 1 Do you think evaluating an AI model is that essential in an AI project cycle?


Ans –
• Yes. Evaluation of an AI model is essential in AI Project Cycle.
• Model evaluation is the process of using different evaluation metrics to understand
its performance.
• An AI model gets better with constructive feedback.
• You build a model, get feedback from metrics, make improvements and continue until
you achieve a desirable accuracy

Q. 2 Explain train-test split with an example.


Ans –
The train-test split is a technique for evaluating the performance of a machine learning
algorithm. The train-test procedure is appropriate when a sufficiently large dataset is
available.
Working –
1. The given dataset is divided into two subsets: The training dataset and the testing
dataset.
2. The training dataset is used to make the model learn.
3. Then the testing dataset (the data which is not used to train the model) is provided to
the trained model and the model makes predictions.
4. The predicted values are compared with the expected values and the performance of
the model is evaluated.



Ex. Suppose there are 10000 labelled images for an image classification model. Then 8000
labelled images will be used for training and the remaining 2000 images will be used for
testing (an 80:20 split).
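
The 8000/2000 split in the example can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the helper name `train_test_split`, the 20% test fraction, the seed, and the placeholder image names are assumptions made for this example.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data, then cut it into train and test parts."""
    shuffled = list(samples)                  # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)     # fixed seed gives a repeatable split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder names standing in for 10000 labelled images.
images = [f"image_{i}" for i in range(10_000)]

train, test = train_test_split(images)
print(len(train), len(test))   # 8000 2000
```

Shuffling before cutting matters: if the data were sorted by class, an unshuffled split could leave a whole class out of the training set.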

Q.3 “Understanding both error and accuracy is crucial for effectively evaluating and improving
AI models.” Justify this statement.
Ans –
• Understanding both errors and accuracy is essential for Balanced Evaluation.
• Accuracy tells us how often the model makes correct predictions.
• Error (often represented as the complement of accuracy, i.e., 1 – accuracy) tells us
how often it makes mistakes.
• Relying on only one measure (like accuracy) may give a false sense of performance,
especially in imbalanced datasets (e.g., rare disease detection).
So, Understanding both error and accuracy is crucial for effectively evaluating and
improving AI models.

Q. 4 What is classification accuracy? Can it be used all times for evaluating AI models?
Ans – Classification Accuracy is a metric used to evaluate how well an AI model (especially
in classification tasks) is performing. It is defined as:
Classification Accuracy = Correct Predictions / Total Predictions

No, accuracy cannot be used in every situation because it is not always a reliable
metric. While it is simple and intuitive, it can be misleading in situations such as the following -
1. Imbalanced Datasets:
If 95% of the samples belong to one class, a model that always predicts that class will have
95% accuracy—but it is of no use because it fails to detect the minority class.
Example:
o Disease detection dataset: 990 healthy, 10 sick
o Model predicts all as healthy → Accuracy = 99%
o But it missed all the sick cases (0 true positives)



2. Does not reflect the types of errors:
o Accuracy doesn't specify what kind of mistakes the model is making—false
positives vs false negatives—which may be critical in areas like:
▪ Medical diagnosis
▪ Fraud detection
▪ Spam filtering

Conclusion-
Accuracy is a basic and useful metric but is not always sufficient, especially with
imbalanced data. It is best to use it alongside other metrics (such as precision, recall,
F1 score) for a complete evaluation of AI models.
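
The disease-detection example above (990 healthy, 10 sick, model predicts everyone as healthy) can be reproduced in a short sketch. The label strings and variable names are chosen for this illustration only.

```python
# 10 sick patients and 990 healthy ones, as in the example above.
actual = ["sick"] * 10 + ["healthy"] * 990
# A model that always predicts the majority class.
predicted = ["healthy"] * 1000

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Treat "sick" as the positive class.
tp = sum(a == "sick" and p == "sick" for a, p in zip(actual, predicted))
fn = sum(a == "sick" and p == "healthy" for a, p in zip(actual, predicted))
recall = tp / (tp + fn)

print(accuracy)  # 0.99
print(recall)    # 0.0
```

The accuracy looks excellent, yet the recall shows that not a single sick patient was detected.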

Q.4 CASE STUDY-BASED QUESTIONS –


Q.1 Calculate the accuracy of the House Price prediction AI model
Read the instructions and fill in the blank cells in the table.
The formula for finding error and accuracy is shown in the table.
Accuracy of the AI model is the mean accuracy of all five samples
Percentage accuracy can be seen by multiplying the accuracy with 100
Ans –
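Predicted House Price (USD) | Actual House Price (USD) | Error = Abs(Actual - Predicted) | Error Rate = Error / Actual | Accuracy = 1 - Error Rate | Accuracy% = Accuracy * 100
391K | 402K | Abs(402K - 391K) = 11K | 11K / 402K = 0.027 | 1 - 0.027 = 0.973 | 0.973 * 100 = 97.3%
453K | 488K | Abs(488K - 453K) = 35K | 35K / 488K = 0.072 | 1 - 0.072 = 0.928 | 0.928 * 100 = 92.8%
125K | 97K | Abs(97K - 125K) = 28K | 28K / 97K = 0.289 | 1 - 0.289 = 0.711 | 0.711 * 100 = 71.1%
871K | 907K | Abs(907K - 871K) = 36K | 36K / 907K = 0.040 | 1 - 0.040 = 0.960 | 0.960 * 100 = 96.0%
322K | 425K | Abs(425K - 322K) = 103K | 103K / 425K = 0.242 | 1 - 0.242 = 0.758 | 0.758 * 100 = 75.8%

Accuracy of the AI model = mean accuracy of the five samples
= (0.973 + 0.928 + 0.711 + 0.960 + 0.758) / 5
= 4.330 / 5
= 0.866
Accuracy percentage = 86.6%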
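
The per-sample steps in the table (error, error rate, accuracy) and the mean accuracy asked for in the instructions can be checked with a short script. This is a sketch under the formulas given in the table; prices are written in thousands of USD, and third-decimal differences from hand calculations can arise from rounding.

```python
# (predicted, actual) house prices in thousands of USD, taken from the table.
samples = [(391, 402), (453, 488), (125, 97), (871, 907), (322, 425)]

accuracies = []
for predicted, actual in samples:
    error = abs(actual - predicted)   # Error = Abs(Actual - Predicted)
    error_rate = error / actual       # Error Rate = Error / Actual
    accuracy = 1 - error_rate         # Accuracy = 1 - Error Rate
    accuracies.append(accuracy)
    print(f"{predicted}K {actual}K error={error}K "
          f"rate={error_rate:.3f} accuracy={accuracy:.3f}")

mean_accuracy = sum(accuracies) / len(accuracies)
print(f"Mean accuracy = {mean_accuracy:.3f} ({mean_accuracy * 100:.1f}%)")
# Mean accuracy = 0.866 (86.6%)
```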

Q.2 In a medical test for a rare disease, out of 1000 people tested, 50 actually have the
disease while 950 do not. The test correctly identifies 40 out of the 50 people with the
disease as positive, but it also wrongly identifies 30 of the healthy individuals as positive.
What is the accuracy of the test?
Ans -
Accuracy = (TP+TN) / (TP+TN+FP+FN)
Here,
TP = No. of people predicted as having disease and actually having disease
= 40
TN = No. of people predicted as having no disease and actually having no disease
= (950-30)
= 920
FP = No. of people predicted as having disease but actually having no disease
= 30
FN = No. of people predicted as having no disease but actually having disease
= 50-40
= 10

Confusion matrix will be as follows –

                  Predicted: Yes    Predicted: No
Actual: Yes       TP = 40           FN = 10
Actual: No        FP = 30           TN = 920

Accuracy = Correct Predictions / Total Predictions
= (TP + TN) / (TP + TN + FP + FN)
= (40 + 920) / (40 + 10 + 30 +920)
= 960/1000
= 0.96
Accuracy = 0.96
Accuracy percentage = 96%
So, the Accuracy of this model is 96%
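
Word problems like this one usually give only some of the four cells; the rest follow by subtraction. The sketch below shows that bookkeeping. The helper name `cells_from_problem` and its parameter names are hypothetical, introduced only for this illustration.

```python
def cells_from_problem(total, actual_positive, true_positive, false_positive):
    """Derive all four confusion-matrix cells from the usual word-problem facts."""
    tp = true_positive
    fn = actual_positive - tp                  # positives the test missed
    fp = false_positive
    tn = (total - actual_positive) - fp        # negatives not flagged
    assert tp + tn + fp + fn == total          # the four cells must cover everyone
    return tp, tn, fp, fn


# Rare-disease test: 1000 people, 50 have the disease, 40 detected, 30 false alarms.
tp, tn, fp, fn = cells_from_problem(1000, 50, 40, 30)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn)   # 40 920 30 10
print(accuracy)         # 0.96
```

The same helper applies to any question of this shape, as long as the total, the number of actual positives, and the TP and FP counts are given.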

Q.3 A student solved 90 out of 100 questions correctly in a multiple-choice exam. What is
the error rate of the student's answers?
Ans –
Total answers = 100
Student’s correct answers = 90
Error = Abs(Total answers - Correct answers)
= 100-90
= 10
Error Rate = (Error / Total answers)
= 10/100
= 0.1 (that is, 10%)
Q.4 In a spam email detection system, out of 1000 emails received, 300 are spam. The
system correctly identifies 240 spam emails as spam, but it also marks 60 legitimate
emails as spam. What is the precision of the system?
Ans –
TP = No. of predicted spam mails that are actually spam
= 240
FN = No. of mails predicted as not spam but actually they are spam
= 300-240
= 60
FP = No. of legitimate mails predicted as spam
= 60
TN = No. of legitimate(Not spam) mails predicted as not spam
= 700 - 60
= 640

                  Predicted: Yes    Predicted: No
Actual: Yes       TP = 240          FN = 60
Actual: No        FP = 60           TN = 640

Precision = TP / (TP + FP)
= 240/(240 + 60)
= 240 / 300
= 4/5
= 0.8

Q.5 In a binary classification problem, a model predicts 70 instances as positive out of
which 50 are actually positive. What is the recall of the model?
Ans –
Recall = TP / (TP + FN)
Here, TP = 50, but the total number of actual positive cases is not given, so we cannot
calculate FN, which is essential in this formula. The recall therefore cannot be determined.
But we can calculate the Precision of this model –
FP = 70 - 50 = 20
Precision = TP / (TP+FP)
= 50 / (50 + 20)
= 50 / 70
= 0.71
Q.6 In a sentiment analysis task, a model correctly predicts 120 positive sentiments out
of 200 positive instances. However, it also incorrectly predicts 40 negative sentiments as
positive. What is the F1 score of the model? (Options not matching with correct answer)
A) 0.8
B) 0.75
C) 0.72
D) 0.82
Ans –

Precision = TP/ (TP + FP)


Recall = TP / (TP + FN)
Here,
TP = 120,
FN = 200-120 = 80
FP = 40
Precision = 120/(120+40)
= 120/160
= 0.75
Recall = 120 / (120+80)
= 120 / 200
= 3/5
= 0.6
So,
F1 Score = 2 X (0.75 X 0.6) / (0.75 +0.6)
= 2 X 0.45/1.35
= 2 X 0.333
= 0.667
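
Substituting Precision = TP / (TP + FP) and Recall = TP / (TP + FN) into the F1 formula gives an equivalent form written purely in counts, F1 = 2TP / (2TP + FP + FN), which avoids rounding the intermediate values. The sketch below checks that both forms agree for this question's numbers; the variable names are chosen for this illustration.

```python
# Counts from the sentiment-analysis question.
tp, fn, fp = 120, 80, 40

precision = tp / (tp + fp)   # 120 / 160 = 0.75
recall = tp / (tp + fn)      # 120 / 200 = 0.6

# Form 1: harmonic mean of precision and recall.
f1_from_rates = 2 * precision * recall / (precision + recall)

# Form 2: the same quantity expressed directly in counts.
f1_from_counts = 2 * tp / (2 * tp + fp + fn)   # 240 / 360

print(round(f1_from_rates, 3), round(f1_from_counts, 3))   # 0.667 0.667
```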

Q.7 A medical diagnostic test is designed to detect a certain disease. Out of 1000 people
tested, 100 have the disease, and the test identifies 90 of them correctly. However, it
also wrongly identifies 50 healthy people as having the disease. What is the precision of
the test? (Options not matching with correct answer)

A) 90%
B) 80%
C) 70%
D) 60%



Ans –

                  Predicted: Yes    Predicted: No
Actual: Yes       TP = 90           FN = 10
Actual: No        FP = 50           TN = 850

Precision = TP / (TP + FP)
= 90 / (90+50)
= 90/ 140
= 0.6429
Precision = 0.6429
Precision Percentage = 64.29%

Q.8 A teacher's marks prediction system predicts the marks of a student as 75, but the
actual marks obtained by the student are 80. What is the absolute error in the prediction?
a) 5
b) 10
c) 15
d) 20
Ans – Actual marks = 80
Predicted marks = 75
Absolute error = Abs(Actual – predicted)
= Abs(80-75)
= 5
So, the correct option is a) 5

Q.9 Identify which metric (Precision or Recall) is to be used in the following cases and
why?
a) Email Spam Detection
b) Cancer Diagnosis
c) Legal Cases (Innocent until proven guilty)
d) Fraud Detection
e) Safe Content Filtering (like Kids YouTube)
Ans –



a) Email Spam Detection –

                       Predicted: Spam    Predicted: Legitimate
Actual: Spam           TP                 FN
Actual: Legitimate     FP                 TN

FP = A legitimate (not spam) email wrongly predicted as spam.
FN = A spam email wrongly predicted as legitimate.
Here, FP is more harmful than FN because an important legitimate email could be missed,
so FP should be minimized. So the metric to be used is – Precision.

b) Cancer Diagnosis –

                  Predicted: Yes    Predicted: No
Actual: Yes       TP                FN
Actual: No        FP                TN

FP = A patient with no cancer is predicted as having cancer.
FN = A patient with cancer is predicted as not having cancer.
Here, FN is more harmful than FP because a missed cancer goes untreated, so FN should be
minimized. So, the metric to be used is – Recall.

c) Legal Cases (Innocent until proven guilty) –

                     Predicted: Guilty    Predicted: Innocent
Actual: Guilty       TP                   FN
Actual: Innocent     FP                   TN

If the model is predicting guilty people, then FP (the model predicted a person as guilty
but he/she was actually innocent) is more harmful than FN. So, FP should be minimized and
the metric to be used is – Precision.



d) Fraud Detection –

                      Predicted: Fraud    Predicted: Not-fraud
Actual: Fraud         TP                  FN
Actual: Not-fraud     FP                  TN

If the model is predicting fraud instances, then a ‘Fraud’ instance not being detected by
the model (FN) is more harmful than a ‘Not-fraud’ instance predicted as fraud (FP). So, FN
should be minimized and the metric to be used is – Recall.

e) Safe Content Filtering (like Kids YouTube) –

                   Predicted: Safe    Predicted: Unsafe
Actual: Safe       TP                 FN
Actual: Unsafe     FP                 TN

The model predicting unsafe content as safe (FP) is more harmful than the model predicting
safe content as unsafe (FN). So, FP should be minimized and the metric to be used is –
Precision.

Q.10 Examine the following case studies. Draw the confusion matrix and calculate metrics
such as accuracy, precision, recall, and F1-score for each one of them.
a. Case Study 1:
A spam email detection system is used to classify emails as either spam (1) or not spam (0).
Out of 1000 emails:
True Positives (TP): 150 emails were correctly classified as spam.
False Positives (FP): 50 emails were incorrectly classified as spam.
True Negatives (TN): 750 emails were correctly classified as not spam.
False Negatives (FN): 50 emails were incorrectly classified as not spam.
Ans –
Accuracy = (TP + TN) / (TP + TN + FP + FN)
= (150 + 750) / (150 + 750 + 50 + 50)
= 900 / 1000
= 0.9
So, Accuracy = 0.9
Accuracy Percentage = 90%

Precision = TP / (TP + FP)
= 150 / (150 + 50)
= 150 / 200
= 0.75
So, Precision = 0.75

Recall = TP / (TP + FN)
= 150 / (150 + 50)
= 150 / 200
= 0.75
So, recall = 0.75

F1 Score = (2 X Precision X Recall) / (Precision + Recall)
= (2 X 0.75 X 0.75) / (0.75 + 0.75)
= 1.125 / 1.5
= 0.75
So, F1 score = 0.75

b. Case Study 2:
A credit scoring model is used to predict whether an applicant is likely to default on a loan
(1) or not (0). Out of 1000 loan applicants:
True Positives (TP): 90 applicants were correctly predicted to default on the loan.
False Positives (FP): 40 applicants were incorrectly predicted to default on the loan.
True Negatives (TN): 820 applicants were correctly predicted not to default on the loan.
False Negatives (FN): 50 applicants were incorrectly predicted not to default on the loan.
Calculate metrics such as accuracy, precision, recall, and F1-score.
Ans –
Accuracy = (TP + TN) / (TP + TN + FP + FN)
= (90 + 820) / (90 + 820 + 40 + 50)
= 910 / 1000
= 0.91
So, Accuracy = 0.91
Accuracy percentage = 91%

Precision = TP / (TP + FP)
= 90 / (90 + 40)
= 90 / 130
= 0.692
So, Precision = 0.692

Recall = TP / (TP + FN)
= 90 / (90 + 50)
= 90 / 140
= 0.643
So, recall = 0.643

F1 Score = (2 X Precision X Recall) / (Precision + Recall)
= (2 X 0.692 X 0.643) / (0.692 + 0.643)
= 0.890 / 1.335
= 0.667
So, F1 score = 0.667

c. Case Study 3:
A fraud detection system is used to identify fraudulent transactions (1) from legitimate ones
(0). Out of 1000 transactions:
True Positives (TP): 80 transactions were correctly identified as fraudulent.
False Positives (FP): 30 transactions were incorrectly identified as fraudulent.
True Negatives (TN): 850 transactions were correctly identified as legitimate.
False Negatives (FN): 40 transactions were incorrectly identified as legitimate.

Ans –
Accuracy = (TP + TN) / (TP + TN + FP + FN)
= (80 + 850) / (80 + 850 + 30 + 40)
= 930 / 1000
= 0.93
So, Accuracy = 0.93
Accuracy Percentage = 93%

Precision = TP / (TP + FP)
= 80 / (80 + 30)
= 80 / 110
= 0.727
So, Precision = 0.727

Recall = TP / (TP + FN)
= 80 / (80 + 40)
= 80 / 120
= 0.667
So, recall = 0.667

F1 Score = (2 X Precision X Recall) / (Precision + Recall)
= (2 X 0.727 X 0.667) / (0.727 + 0.667)
= 0.970 / 1.394
= 0.696
So, F1 score = 0.696

d. Case Study 4:
A medical diagnosis system is used to classify patients as having a certain disease (1) or not
having it (0). Out of 1000 patients:
True Positives (TP): 120 patients were correctly diagnosed with the disease.
False Positives (FP): 20 patients were incorrectly diagnosed with the disease.
True Negatives (TN): 800 patients were correctly diagnosed as not having the disease.
False Negatives (FN): 60 patients were incorrectly diagnosed as not having the disease.

Ans –
Accuracy = (TP + TN) / (TP + TN + FP + FN)
= (120 + 800) / (120 + 800 + 20 + 60)
= 920 / 1000
= 0.92
So, Accuracy = 0.92
Accuracy Percentage = 92%

Precision = TP / (TP + FP)
= 120 / (120 + 20)
= 120 / 140
= 0.857
So, Precision = 0.857

Recall = TP / (TP + FN)
= 120 / (120 + 60)
= 120 / 180
= 0.667
So, recall = 0.667

F1 Score = (2 X Precision X Recall) / (Precision + Recall)
= (2 X 0.857 X 0.667) / (0.857 + 0.667)
= 1.143 / 1.524
= 0.75
So, F1 score = 0.75

e. Case Study 5:
An inventory management system is used to predict whether a product will be out of stock
(1) or not (0) in the next month. Out of 1000 products:
True Positives (TP): 100 products were correctly predicted to be out of stock.
False Positives (FP): 50 products were incorrectly predicted to be out of stock.
True Negatives (TN): 800 products were correctly predicted not to be out of stock.
False Negatives (FN): 50 products were incorrectly predicted not to be out of stock.

Ans –
Accuracy = (TP + TN) / (TP + TN + FP + FN)
= (100 + 800) / (100 + 800 + 50 + 50)
= 900 / 1000
= 0.9
So, Accuracy = 0.9
Accuracy Percentage = 90%

Precision = TP / (TP + FP)
= 100 / (100 + 50)
= 100 / 150
= 0.667
So, Precision = 0.667

Recall = TP / (TP + FN)
= 100 / (100 + 50)
= 100 / 150
= 0.667
So, recall = 0.667

F1 Score = (2 X Precision X Recall) / (Precision + Recall)
= (2 X 0.667 X 0.667) / (0.667 + 0.667)
= 0.890 / 1.334
= 0.667
So, F1 score = 0.667
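
All five case studies use the same four formulas, so the answers can be cross-checked in one loop. The sketch below works from the unrounded precision and recall, so a value may differ in the third decimal place from a hand calculation that rounds intermediate results. The dictionary layout and the function name `evaluate` are choices made for this illustration.

```python
# (TP, FP, TN, FN) for each case study, copied from the question text.
cases = {
    "Case Study 1": (150, 50, 750, 50),
    "Case Study 2": (90, 40, 820, 50),
    "Case Study 3": (80, 30, 850, 40),
    "Case Study 4": (120, 20, 800, 60),
    "Case Study 5": (100, 50, 800, 50),
}


def evaluate(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 for one confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


results = {name: evaluate(*counts) for name, counts in cases.items()}
for name, m in results.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
```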
