
Evaluating AI Models: Techniques & Metrics

The document provides notes on evaluating AI models, focusing on the evaluation process, train-test split technique, accuracy, error, and classification metrics. It discusses the importance of understanding model performance through metrics like precision, recall, and confusion matrix, while also addressing ethical concerns in model evaluation. Additionally, it explains concepts such as overfitting, true positives, false positives, and includes examples for practical understanding.

Uploaded by

sreekargedela94

Class X

Part B – Unit 3: Evaluating Models Notes

1. Define Evaluation.
Ans: Evaluation is a process of understanding the reliability of any AI model, based on
outputs by feeding the test dataset into the model and comparing it with actual
answers.
2. Explain the train-test split technique for evaluation.
Ans.
• The train-test split is a technique for evaluating the performance of a machine
learning algorithm.
• It can be used for any supervised learning algorithm.
• The procedure involves taking a dataset and dividing it into two subsets:
the training dataset and the testing dataset.
• The train-test procedure is appropriate when there is a sufficiently large dataset
available.
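
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `train_test_split`, the 25% test fraction and the fixed seed are all assumptions chosen for the example.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                    # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))                       # stand-in for 20 labelled examples
train, test = train_test_split(data)
print(len(train), len(test))                 # 15 5
```

Shuffling before splitting is a common precaution so that any ordering in the original dataset does not end up concentrated in one subset.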

3. What is the need for the train-test split evaluation technique?
Ans.
• The training dataset is used to make the model learn.
• The input elements of the test dataset are provided to the trained model. The model
makes predictions, and the predicted values are compared to the expected values.
• The objective is to estimate the performance of the machine learning model on new
data, that is, data not used to train the model.
4. Explain Accuracy and Error.
Ans.
Accuracy:
• Accuracy is an evaluation metric that measures how many of a model's predictions
are right, usually expressed as a fraction of all predictions made.
• Accuracy and model performance go together: the better the model performs, the
more accurate its predictions are.
Error:
• Error can be described as an action that is inaccurate or wrong.
• In machine learning, error is used to see how accurately a model can predict both
the data it learns from and new, unseen data.
• Based on the error, we choose the machine learning model that performs best for
a particular dataset.
• Error refers to the difference between a model's prediction and the actual
outcome. It quantifies how often the model makes mistakes.
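
As a rough illustration of how accuracy and error relate for a classifier, the sketch below compares predicted labels with actual labels. The labels and the helper name `accuracy` are made up for the example, and error is taken here as the fraction of wrong predictions, which is one common reading of the definition above.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical labels for five emails.
actual = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "spam", "ham"]

acc = accuracy(actual, predicted)   # 4 of 5 predictions are right
err = 1 - acc                       # error rate is the remaining share
print(f"accuracy={acc:.2f} error={err:.2f}")   # accuracy=0.80 error=0.20
```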
5. List Classification Metrics.
Ans. Popular metrics used for classification models:
• Confusion matrix
• Classification accuracy
• Precision
• Recall
6. Ethical concerns around model evaluation
Ans. While evaluating an AI model, the following ethical concerns need to be kept in
mind:
7. Which two parameters are considered for Evaluation of a model?
Ans: Prediction and Reality are the two parameters considered for Evaluation of a
model. “Prediction” is the output given by the machine. “Reality” is the real
scenario at the time the prediction was made.
8. What is True Positive?
Ans: True Positive (TP) is the outcome of the model correctly predicting the positive
class. The predicted value matches the actual value.
9. What is True Negative?
Ans: True Negative (TN) is the outcome of the model correctly predicting the negative
class. The predicted value matches the actual value.
10. What is False Positive?
Ans: False Positive (FP) is the outcome of the model wrongly predicting the positive
class when the actual class is negative.
11. What is False Negative?
Ans: False Negative (FN) is the outcome of the model wrongly predicting the negative
class when the actual class is positive.
12. What is meant by Overfitting of Data?
Ans: Overfitting is the scenario where the model memorizes the data in the training
set. It predicts the correct label for every point in the training set, but may fail
to predict future observations in an unseen dataset.
13. What is a confusion matrix? What is it used for?
Ans: A Confusion Matrix is a table that is often used to describe the performance of a
classification model on a set of test data for which the true values are known. It stores
the results of the comparison between prediction and reality. From the confusion
matrix, we can calculate parameters like recall, precision and F1 score, which are
used to evaluate the performance of an AI model.
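
To make the table concrete, the sketch below tallies the four cells of a binary confusion matrix by comparing prediction with reality. The label lists and the function name `confusion_counts` are illustrative assumptions, with 1 standing for the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: right if reality is positive, else a false alarm.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if reality is positive, else right.
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical reality and predictions for eight test examples.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(actual, predicted)
print(counts)   # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```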
14. Explain the need for a train-test split with an example.
Ans. A train-test split is needed because overfitting may occur. Overfitting is the
scenario where the model memorizes the data in the training set. It predicts the
correct label for every point in the training set, but may fail to predict future
observations in an unseen dataset. So the performance of the model is estimated with
the test dataset, the data that is not used to train the model.
Example: If there is a model to classify images of flowers and vegetables, it will
correctly label the images given in the training dataset but may fail to label a new
image which is not in the training set.
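
The flower and vegetable example can be mimicked with a deliberately overfitted toy model. Everything in this sketch is hypothetical: the `MemorizingModel` class only stores the training examples and falls back to a fixed guess for anything it has not seen, and the feature tuples stand in for images.

```python
class MemorizingModel:
    """Toy model that memorizes training examples instead of learning a rule."""

    def __init__(self, default="flower"):
        self.memory = {}
        self.default = default

    def fit(self, examples):
        for features, label in examples:
            self.memory[features] = label

    def predict(self, features):
        # Unseen inputs get the default guess, which is often wrong.
        return self.memory.get(features, self.default)


def accuracy_on(model, examples):
    hits = sum(model.predict(features) == label for features, label in examples)
    return hits / len(examples)


train = [(("red", "petals"), "flower"), (("green", "leafy"), "vegetable"),
         (("yellow", "petals"), "flower"), (("orange", "root"), "vegetable")]
test = [(("purple", "petals"), "flower"), (("white", "root"), "vegetable"),
        (("green", "stalk"), "vegetable"), (("pink", "petals"), "flower")]

model = MemorizingModel()
model.fit(train)
train_acc = accuracy_on(model, train)
test_acc = accuracy_on(model, test)
print(train_acc, test_acc)   # 1.0 0.5
```

The perfect score on the training data says nothing about new images; only the score on the held-out test set exposes the problem.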
15. What is Accuracy? Mention its formula.
Ans: Accuracy is an evaluation metric that measures how many of a model's predictions
are right, as a fraction of all predictions made. Accuracy and model performance go
together: the better the model performs, the more accurate its predictions are.
Correct Predictions = TP + TN
Total Predictions = TP + TN + FP + FN
Accuracy = Correct Predictions / Total Predictions
= (TP + TN)/(TP + TN + FP + FN)
16. What is Precision? Mention its formula.
Ans: Precision is the ratio of the total number of correctly classified positive
examples to the total number of predicted positive examples.
Precision = Correct Positive Predictions / Total Positive Predictions
= TP/(TP + FP)
It is used for unbalanced datasets when dealing with False Positives becomes
important and the model needs to reduce the FPs as much as possible.
17. What is Recall? Mention its formula.
Ans: Recall is the ratio of the correctly predicted positive examples to the total
number of actual positive examples. It is also called Sensitivity or True Positive
Rate. It is generally used for unbalanced datasets when dealing with False Negatives
becomes important and the model needs to reduce the FNs as much as possible.
Recall = Correct Positive Predictions / Total Actual Positive Values
= TP/(TP + FN)
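
The two formulas differ only in the denominator, which a short sketch makes easy to see. The counts below are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed for this example rather than something stated in the notes.

```python
def precision(tp, fp):
    """TP / (TP + FP): how many predicted positives were really positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN): how many actual positives the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts read off a confusion matrix.
tp, fp, fn = 30, 10, 20
p = precision(tp, fp)
r = recall(tp, fn)
print(p, r)   # 0.75 0.6
```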
18. Identify which metric (Precision or Recall) is to be used in the following cases and
why?
• Mail Spamming
• Gold Mining
• Viral Outbreak
Ans:
• Mail Spamming: Precision has to be used, since False Positives (legitimate emails
marked as spam) have to be reduced as much as possible.
• Gold Mining: Precision has to be used, since False Positives (incorrectly identifying
a non-gold area as containing gold) have to be reduced as much as possible.
• Viral Outbreak: Recall is important in this case, since False Negatives have to be
reduced as much as possible. A False Negative in a viral outbreak means failing to
identify a person with the disease, which may have life-threatening consequences.
19. An AI model made the following digital payment usage prediction in a state where
the government has recently launched the facility of digital payments:
(i) Identify the total number of wrong predictions made by the model.
(ii) Calculate precision, recall and F1 Score.
Ans:
(i) The total number of wrong predictions made by the model is the sum of false
positives and false negatives.
Wrong predictions = FP + FN
= 40 + 12
= 52
(ii) Precision = TP/(TP+FP)
= 50/(50+40)
= 50/90
≈ 0.56
Recall = TP/(TP+FN)
= 50/(50+12)
= 50/62
≈ 0.81
F1 Score = 2*Precision*Recall/(Precision + Recall)
= 2*0.56*0.81/(0.56+0.81)
= 0.9072/1.37
≈ 0.66
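
The arithmetic can be checked with a short script that uses the counts from the answer above (TP = 50, FP = 40, FN = 12) and rounds only when printing. The variable names are illustrative; the equivalent form F1 = 2TP/(2TP + FP + FN) is included as a cross-check.

```python
tp, fp, fn = 50, 40, 12              # counts used in the worked answer

wrong = fp + fn                      # every FP and FN is a wrong prediction
precision = tp / (tp + fp)           # 50/90
recall = tp / (tp + fn)              # 50/62
f1 = 2 * precision * recall / (precision + recall)

# The same F1 written directly in terms of the counts: 100/152.
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(wrong)                                        # 52
print(f"{precision:.2f} {recall:.2f} {f1:.2f}")     # 0.56 0.81 0.66
```

Rounding intermediate values too early can shift the last digit of the F1 score, so it is safer to round once at the end.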

Common questions


False positives occur when a model incorrectly predicts a negative class as positive, while false negatives occur when a positive class is wrongly predicted as negative. In healthcare, a false negative might mean failing to detect a life-threatening disease, greatly risking patient safety, hence recall is prioritized to minimize such errors. Conversely, in email filtering, a false positive could lead to a legitimate email being marked as spam, demanding higher precision to prevent such errors. Thus, the impact varies significantly across different fields, shaping the choice of evaluation metric.

The train-test split technique divides a dataset into two subsets, a training set and a testing set, to evaluate how well a model generalizes to new data not seen during training. By training the model on one part of the dataset and validating it on a separate portion, this approach helps detect overfitting, where the model might simply memorize the training data. This ensures that the model's performance is evaluated in a realistic manner, indicative of its potential performance in real-world scenarios.

The F1 Score is the harmonic mean of precision and recall, providing a single metric that balances and combines both the precision and recall of a model. This score is particularly significant in scenarios where a balance is needed between precision (minimizing false positives) and recall (minimizing false negatives), offering a comprehensive evaluation of a model’s performance, especially on imbalanced datasets. It effectively captures both aspects of the model's predictive capabilities.

A confusion matrix consists of True Positives, True Negatives, False Positives, and False Negatives, which provide a detailed breakdown of a model’s prediction outcomes. It helps in assessing how often the model's predictions align with the actual outcomes, thus enabling calculation of various performance metrics like precision, recall, and F1 score. This matrix becomes a vital tool in identifying specific areas where a model excels or requires improvement.

Accuracy measures the share of predictions a model gets right, serving as an indicator of the model's overall performance. To reflect performance on new data, it should be measured on test data rather than on the data the model was trained on. The error quantifies the difference between the model's predictions and the actual outcomes, revealing how often the model makes mistakes. Together, these metrics help in understanding a model’s effectiveness and identify areas for improvement to enhance accuracy and minimize errors on new data.

Ethical concerns in model evaluation include biases that can be perpetuated or amplified by AI systems, privacy and consent related to data used during training, transparency in how evaluation metrics are reported and used, and ensuring fairness in how conclusions are drawn from model evaluations. It is crucial to be vigilant of these ethical considerations to prevent discrimination, establish trustworthiness, and uphold ethical standards in AI deployments.

Overfitting occurs when a machine learning model learns the details and noise in the training data to the extent that it negatively impacts the model's performance on new data. It results in the model performing exceptionally well on the training dataset but failing to generalize to unseen data, as the model may not extrapolate well beyond its training set. This diminishes the model’s utility in real-world applications where it must handle data it has not encountered before.

In financial fraud detection, precision is crucial because false positives (legitimate transactions incorrectly flagged as fraudulent) can result in customer dissatisfaction and operational inefficiencies. Although minimizing false negatives is equally important to catch actual fraud, reducing false positives ensures that resources are not wasted on investigating non-fraudulent activity, which maintains customer trust and keeps financial operations running smoothly.

The evaluation parameters, prediction and reality, show how closely a model's outputs align with actual outcomes, highlighting the model's predictive accuracy. By understanding the divergence or convergence of predictions and actual values, stakeholders can adjust models to improve reliability and performance. Decision-making processes benefit by basing decisions on more accurate, data-driven insights, which reduces uncertainty and boosts confidence in AI system applications.

Precision is crucial in scenarios like mail spamming and gold mining because it minimizes false positives, preventing legitimate emails from being marked as spam and reducing the misidentification of non-gold areas as containing gold. In a viral outbreak, recall is vital to reduce false negatives, thus ensuring that as many actual disease cases as possible are identified to prevent dangerous life-threatening situations. Each choice of metric aligns with the specific priority of minimizing erroneous classifications in diverse operational contexts.
