Deep Learning Model Evaluation Metrics

notes

Uploaded by

DIYA MEERA

AD3501/DEEP LEARNING/UNIT IV/AI&DS/SRRCET

SRI RAJA RAAJAN COLLEGE OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

III-YEAR-AI&DS
2021 R
SUB CODE: AD3501
SUB NAME: DEEP LEARNING

UNIT IV
MODEL EVALUATION

Performance metrics -- Baseline Models -- Hyperparameters: Manual Hyperparameter -- Automatic
Hyperparameter -- Grid search -- Random search -- Debugging strategies.

PREPARED BY VERIFIED BY APPROVED BY

[Link] CHITHRA HOD DEAN


1. Explain in Detail About Performance Metrics

 To evaluate the performance or quality of the model, different metrics are used, and these metrics are known as
performance metrics or evaluation metrics.

 These performance metrics help us understand how well our model has performed for the given data. In this way,
we can improve the model's performance by tuning the hyper-parameters. Each ML model aims to generalize well
on unseen/new data, and performance metrics help determine how well the model generalizes on the new dataset.

2. In machine learning, each task or problem is divided into:

• Classification
• Regression

3. Explain in Detail About Performance Metrics for Classification

In a classification problem, the category or classes of data is identified based on training data. The model
learns from the given dataset and then classifies the new data into classes or groups based on the training. It
predicts class labels as the output, such as Yes or No, 0 or 1, Spam or Not Spam, etc. To evaluate the
performance of a classification model, different metrics are used, and some of them are as follows:

o Accuracy
o Confusion Matrix
o Precision
o Recall
o F-Score
o AUC(Area Under the Curve)-ROC

4. Define Accuracy

Accuracy is one of the simplest classification metrics to implement. It is the ratio of the number of correct
predictions to the total number of predictions.

5. When to Use Accuracy?

Accuracy is a good metric when the target classes in the data are approximately balanced. For example, if 60% of
the images in a fruit dataset are Apple and 40% are Mango, the classes are roughly balanced, so a model that
reports 97% accuracy on the Apple-versus-Mango task really is classifying most images correctly.

6. Define Confusion Matrix

A confusion matrix is a tabular representation of prediction outcomes of any binary classifier, which is used to
describe the performance of the classification model on a set of test data when true values are known.


7. Define Precision

Precision addresses a limitation of accuracy. It is the proportion of positive predictions that were
actually correct, calculated as the ratio of True Positives to all positive predictions (True Positives
plus False Positives).

8. Define Recall or Sensitivity

Recall is similar to precision, but it measures the proportion of actual positives that were identified
correctly. It is calculated as the ratio of True Positives to the total number of actual positives, whether
correctly predicted as positive or incorrectly predicted as negative (True Positives plus False Negatives).

9. When to use Precision and Recall?

Recall describes the performance of a classifier with respect to false negatives, whereas precision describes
its performance with respect to false positives. So, if we want to minimize false negatives, recall should be
as close to 100% as possible, and if we want to minimize false positives, precision should be as close to
100% as possible.

10. Define F-Scores

The F-score or F1 Score is a metric for evaluating a binary classification model on the basis of the predictions
made for the positive class. It is calculated from Precision and Recall and is a single score that represents both.
The F1 Score is the harmonic mean of Precision and Recall, assigning equal weight to each of them.

11. Define AUC-ROC
• When we need to visualize the performance of a classification model on a chart, we can use the
AUC-ROC curve. It is one of the most popular and important metrics for evaluating the performance of a
classification model.
• ROC stands for Receiver Operating Characteristic. The ROC curve is a graph that shows the
performance of a classification model at different threshold levels.
The curve plots two parameters against each other:

o True Positive Rate

o False Positive Rate


12. Performance Metrics for Regression

Regression is a supervised learning technique that aims to find the relationships between the dependent and
independent variables. A predictive regression model predicts a continuous numeric value rather than a
discrete class label. The metrics used for regression are different from the classification metrics.

o Mean Absolute Error

o Mean Squared Error
o R2 Score
o Adjusted R2

13. Define Mean Absolute Error (MAE)

Mean Absolute Error (MAE) is one of the simplest metrics. It is the average of the absolute differences
between the actual and predicted values, where absolute means taking the magnitude of a number regardless of its sign.

MAE is calculated as:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert Y_i - Y'_i \rvert$$

Here,

Y is the actual outcome, Y' is the predicted outcome, and N is the total number of data points.

14. Define Mean Squared Error

Mean Squared Error (MSE) is one of the most widely used metrics for regression evaluation. It is the
average of the squared differences between the values predicted by the model and the actual values.

Because the differences are squared, large errors are penalized much more heavily than small ones, so a few
large errors (outliers) can make the model look worse than it is.

15. Define R Squared Score

The R squared score is also known as the Coefficient of Determination. The R-squared metric lets us compare
our model with a constant baseline to judge the performance of the model. The constant baseline is a
horizontal line drawn at the mean of the target values.

The R squared score is always less than or equal to 1, regardless of whether the target values are large or
small. A score of 1 means a perfect fit, and a score of 0 means the model is no better than the mean baseline.

16. What is a Baseline Model?

Baseline models serve as a benchmark in an ML application. Their main goal is to put the results of trained
models into context.

17. List the types of baseline models

Baseline models are divided into three main categories:

• Random Baseline Models: Real-world data isn't always predictable. For such problems, a dummy classifier
or regressor is a suitable baseline model. It tells you whether your machine learning model is learning
anything at all.
• ML Baseline Models: If the data is predictable, you can create a simple ML baseline model that helps you
analyze which features are critical for prediction and which are not. These baseline models are
commonly used together with feature engineering.
• Automated ML Baseline Models: This is the strongest baseline. It is an excellent model to compare your
ML model against. If your ML model outperforms the automated baseline model, it's a strong
indication that the model has the potential to become a product.

18. Define Hyperparameters

In neural networks, parameters are used to train the model and make predictions. There are two types of
parameters:

Model parameters are internal to the neural network – for example, neuron weights. They are estimated or
learned automatically from training samples. These parameters are also used to make predictions in a
production model.
Hyperparameters are external parameters set by the operator of the neural network, for example, which
activation function to use or the batch size used in training. Hyperparameters have a huge impact on the
accuracy of a neural network, the optimal values may differ from one problem to another, and it is
non-trivial to discover those values. The simplest way to select hyperparameters for a neural network model is
“manual search”, in other words, trial and error. Newer methods use algorithms and optimization
techniques to discover the best hyperparameters.


19. What is Hyperparameter Tuning?

A hyperparameter is a parameter of the model whose value influences the learning process and whose value cannot
be estimated from the training data. Hyperparameters are configured externally before starting the model
learning/training process. Hyperparameter tuning is the process of finding the optimal hyperparameters for any given
machine learning algorithm.

PART B REVIEW QUESTIONS

Explain in detail the Performance metrics
How the Baseline models serve as a benchmark in an ML application
Define Hyperparameters
Explain in detail the Manual Hyperparameter
Explain in detail the Automatic Hyperparameter
Explain in detail the Grid search
Explain in detail the Random search
Explain in detail the different Debugging strategies.

PART – C Question
How the Baseline models serve as a benchmark in an ML application
Explain in detail the different Debugging strategies.


PART B

1. Explain in Detail About the Performance Metrics

 To evaluate the performance or quality of the model, different metrics are used, and these metrics are
known as performance metrics or evaluation metrics.
 These performance metrics help us understand how well our model has performed for the given data. In
this way, we can improve the model's performance by tuning the hyper-parameters.
 Each ML model aims to generalize well on unseen/new data, and performance metrics help determine
how well the model generalizes on the new dataset.
• In machine learning, each task or problem is categorized as either classification or regression. Not all
metrics can be used for both types of problems.

Performance Metrics for Classification

 In a classification problem, the category or classes of data is identified based on training data. The
model learns from the given dataset and then classifies the new data into classes or groups based on the
training.
 It predicts class labels as the output, such as Yes or No, 0 or 1, Spam or Not Spam, etc. To evaluate the
performance of a classification model, different metrics are used, and some of them are as follows:

o Accuracy
o Confusion Matrix
o Precision
o Recall
o F-Score
o AUC(Area Under the Curve)-ROC

I. Accuracy

Accuracy is one of the simplest classification metrics to implement. It is the ratio of the
number of correct predictions to the total number of predictions.

When to Use Accuracy?

Accuracy is a good metric when the target classes in the data are approximately balanced. For
example, if 60% of the images in a fruit dataset are Apple and 40% are Mango, the classes are
roughly balanced, so a model that reports 97% accuracy on the Apple-versus-Mango task really
is classifying most images correctly.

When not to use Accuracy?

It is recommended not to use the Accuracy measure when the target variable mostly belongs to
one class. For example, suppose there is a disease prediction model in which, out of 100
people, only five have the disease and 95 don't. If the model predicts that no one has the
disease (a bad prediction, since it misses every actual case), the accuracy will still be 95%,
which is misleading.
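The disease example can be checked with a few lines of Python. This is only an illustrative sketch: the label encoding (1 = disease, 0 = no disease) and the helper name `accuracy` are assumptions, not part of the notes.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# 100 people: 5 have the disease (1), 95 do not (0).
y_true = [1] * 5 + [0] * 95
# A "model" that predicts "no disease" for everyone.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
# Share of the actual disease cases that the model found (recall).
found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = found / sum(y_true)

print(acc)     # 0.95
print(recall)  # 0.0
```

The accuracy looks high even though not a single sick person is detected, which is why accuracy alone is unreliable on imbalanced data.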

II. Confusion Matrix

A confusion matrix is a tabular representation of prediction outcomes of any binary classifier,
which is used to describe the performance of the classification model on a set of test data when
true values are known.

o In the matrix, columns hold the predicted values and rows hold the actual values. Both
actual and predicted values fall into two possible classes, Yes or No. So, if we are predicting
the presence of a disease in a patient, Yes in the Prediction column means the patient is
predicted to have the disease, and No means the patient is predicted not to have it.
o In this example, there are 165 predictions in total: the model predicted Yes 110 times
and No 55 times. In reality, however, 60 patients do not have the disease and 105
patients do.
o True Positive (TP): the model predicted Yes, and the actual value is Yes.
o True Negative (TN): the model predicted No, and the actual value is No.
o False Positive (FP): the model predicted Yes, but the actual value is No.
o False Negative (FN): the model predicted No, but the actual value is Yes.
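The four cells can be counted directly from paired actual and predicted labels. The sketch below is illustrative only: the function name `confusion_counts` is hypothetical, and the cell counts (100 TP, 50 TN, 10 FP, 5 FN) are assumed values chosen to be consistent with the totals quoted above; the notes themselves do not give the individual cells.

```python
def confusion_counts(y_true, y_pred, positive="Yes"):
    """Return (TP, TN, FP, FN) for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted != positive and actual != positive:
            tn += 1
        elif predicted == positive and actual != positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

# Assumed cell counts: 100 TP, 50 TN, 10 FP, 5 FN (165 predictions in total).
y_true = ["Yes"] * 100 + ["No"] * 50 + ["No"] * 10 + ["Yes"] * 5
y_pred = ["Yes"] * 100 + ["No"] * 50 + ["Yes"] * 10 + ["No"] * 5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp + fp, tn + fn)  # predicted Yes, predicted No -> 110 55
print(tp + fn, tn + fp)  # actual Yes, actual No -> 105 60
```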

III. Precision

Precision addresses a limitation of accuracy. It is the proportion of positive predictions that
were actually correct, calculated as the ratio of True Positives to all positive predictions
(True Positives plus False Positives).

Recall or Sensitivity

Recall is similar to precision, but it measures the proportion of actual positives that were
identified correctly. It is calculated as the ratio of True Positives to the total number of
actual positives, whether correctly predicted as positive or incorrectly predicted as negative
(True Positives plus False Negatives).

When to use Precision and Recall?

From the above definitions of Precision and Recall, we can say that recall determines the
performance of a classifier with respect to a false negative, whereas precision gives information
about the performance of a classifier with respect to a false positive.

So, if we want to minimize false negatives, recall should be as close to 100% as possible, and if
we want to minimize false positives, precision should be as close to 100% as possible.
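Both ratios follow directly from the confusion-matrix counts. A minimal sketch, assuming illustrative counts (the numbers and function names below are not from the notes):

```python
def precision(tp, fp):
    """TP / (TP + FP): share of positive predictions that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """TP / (TP + FN): share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Assumed counts, for illustration only.
tp, fp, fn = 100, 10, 5
print(round(precision(tp, fp), 3))  # 0.909
print(round(recall(tp, fn), 3))     # 0.952
```

Lowering the number of false positives raises precision, and lowering the number of false negatives raises recall.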

IV. F-Scores

The F-score or F1 Score is a metric for evaluating a binary classification model on the basis of
the predictions made for the positive class. It is calculated from Precision and Recall and is a
single score that represents both. The F1 Score is the harmonic mean of Precision and Recall,
assigning equal weight to each of them.
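The harmonic mean can be written as F1 = 2 × Precision × Recall / (Precision + Recall). A short sketch (the function name `f1_score` and the sample values are chosen here for illustration):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (equal weight to both)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.5, 0.5))            # 0.5
print(round(f1_score(0.9, 0.1), 2))  # 0.18
```

In the second call the arithmetic mean of the two inputs would be 0.5, but the harmonic mean is pulled toward the weaker value, so a model cannot score well on F1 by excelling at only one of the two.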

V. AUC-ROC

Sometimes we need to visualize the performance of the classification model on charts; then, we
can use the AUC-ROC curve. It is one of the popular and important metrics for evaluating the
performance of the classification model.

ROC stands for Receiver Operating Characteristic. The ROC curve is a graph that shows the
performance of a classification model at different threshold levels. The curve plots two
parameters against each other:

o True Positive Rate

o False Positive Rate

AUC calculates the performance across all the thresholds and provides an aggregate measure.
The value of AUC ranges from 0 to 1. It means a model with 100% wrong prediction will have
an AUC of 0.0, whereas models with 100% correct predictions will have an AUC of 1.0.
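To make the threshold sweep concrete, the sketch below builds the ROC points by hand and integrates them with the trapezoidal rule. The function names and the four-sample dataset are assumptions for illustration, and the code assumes both classes are present in `y_true`.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    pos = sum(y_true)           # number of actual positives (label 1)
    neg = len(y_true) - pos     # number of actual negatives (label 0)
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # predicted probability of the positive class
print(auc(roc_points(y_true, scores)))  # 0.75
```

Each threshold turns the scores into Yes/No predictions, which gives one (False Positive Rate, True Positive Rate) point; AUC summarizes all of those points in one number.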

Performance Metrics for Regression

Regression is a supervised learning technique that aims to find the relationships between the
dependent and independent variables. A predictive regression model predicts a continuous
numeric value rather than a discrete class label. The metrics used for regression are different
from the classification metrics.

o Mean Absolute Error
o Mean Squared Error
o R2 Score
o Adjusted R2

I. Mean Absolute Error (MAE)

Mean Absolute Error (MAE) is one of the simplest metrics. It is the average of the absolute
differences between the actual and predicted values, where absolute means taking the magnitude
of a number regardless of its sign.

MAE is calculated as:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert Y_i - Y'_i \rvert$$

Here,

Y is the actual outcome, Y' is the predicted outcome, and N is the total number of data points.

II. Mean Squared Error

Mean Squared Error (MSE) is one of the most widely used metrics for regression evaluation. It
is the average of the squared differences between the values predicted by the model and the
actual values.

Because the differences are squared, large errors are penalized much more heavily than small
ones, so a few large errors (outliers) can make the model look worse than it is.
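The effect of squaring is easiest to see by comparing MAE and MSE on two sets of predictions with the same total absolute error. This is a small sketch with made-up numbers; the function names are chosen for illustration.

```python
def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10.0, 10.0, 10.0, 10.0]
small_errors = [9.0, 11.0, 9.0, 11.0]    # four errors of size 1
one_outlier = [10.0, 10.0, 10.0, 14.0]   # one error of size 4

print(mae(y_true, small_errors), mse(y_true, small_errors))  # 1.0 1.0
print(mae(y_true, one_outlier), mse(y_true, one_outlier))    # 1.0 4.0
```

MAE treats both cases as equally bad, while MSE is four times larger when the error is concentrated in a single large miss.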

III. R Squared Score

The R squared score is also known as the Coefficient of Determination. The R-squared metric lets us
compare our model with a constant baseline to judge the performance of the model. The constant
baseline is a horizontal line drawn at the mean of the target values.

The R squared score is always less than or equal to 1, regardless of whether the target values are
large or small. A score of 1 means a perfect fit, and a score of 0 means the model is no better than
the mean baseline.
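The comparison against the mean baseline can be written as R² = 1 − SS_res / SS_tot, where SS_res is the squared error of the model and SS_tot is the squared error of always predicting the mean. A minimal sketch with made-up data (the function name and values are illustrative):

```python
def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot, where SS_tot is the error of the mean baseline."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y_true, y_true))                # 1.0  (perfect fit)
print(r2_score(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0  (same as the mean baseline)
print(r2_score(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the baseline)
```

The last call shows that the score has no lower bound: a model that does worse than the mean line gets a negative value.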


IV. Adjusted R Squared

• Adjusted R squared, as the name suggests, is an improved version of the R squared score. R squared has
the limitation that its value increases (or at least never decreases) as more terms are added, even when
the model is not really improving, which may mislead data scientists.
• To overcome this issue, adjusted R squared is used. Its value is always less than or equal to R²,
because it penalizes the number of predictors and only increases when an added predictor brings a
real improvement.

We can calculate the adjusted R squared as follows:

$$R_a^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

Here,

n is the number of observations

k denotes the number of independent variables

and R_a^2 denotes the adjusted R2
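A short sketch of the standard adjusted R² calculation, 1 − (1 − R²)(n − 1)/(n − k − 1), using the n and k defined above. The sample values (R² = 0.80, n = 50) are made up for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.80, n=50, k=5), 4))   # 0.7773
print(round(adjusted_r2(0.80, n=50, k=20), 4))  # 0.6621
```

With the same R², the model that needed 20 predictors is scored lower than the one that needed 5, which is the penalty described above.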

2. What is a Baseline Model?

 Baseline models serve as a benchmark in an ML application. Their main goal is to put the results
of trained models into context.
 Assume you begin working on a problem statement and complete all of the steps, including EDA, data
cleansing, and feature engineering.
 You now begin working on your model. During model training, you discover that your model's
accuracy is 54%. So, without making much effort, you now have a 54% accuracy level, which is now
your base value.
 You can now tag this as a baseline model, indicating that you will enhance this number after this. If
your model's accuracy level goes below 54% in the future, it means the model requires improvements.

Types of baseline models

Baseline models are divided into three main categories:

• Random Baseline Models: Real-world data isn't always predictable. For such problems, a
dummy classifier or regressor is a suitable baseline model. It tells you whether your
machine learning model is learning anything at all.
• ML Baseline Models: If the data is predictable, you can create a simple ML baseline model
that helps you analyze which features are critical for prediction and which are not. These
baseline models are commonly used together with feature engineering.
• Automated ML Baseline Models: This is the strongest baseline. It is an excellent model to
compare your ML model against. If your ML model outperforms the automated
baseline model, it's a strong indication that the model has the potential to become a
product.
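One of the simplest dummy classifiers always predicts the most frequent training label. The sketch below is only an illustration: the function name `majority_class_baseline` and the spam/ham label counts are assumptions, not taken from the notes.

```python
from collections import Counter

def majority_class_baseline(train_labels):
    """Return a 'model' that always predicts the most frequent training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common_label

# Assumed label distributions, for illustration only.
train_labels = ["ham"] * 80 + ["spam"] * 20
test_labels = ["ham"] * 40 + ["spam"] * 10

baseline = majority_class_baseline(train_labels)
predictions = [baseline(None) for _ in test_labels]  # input features are ignored
baseline_accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(baseline_accuracy)  # 0.8
```

Any trained model for this task would need to beat 0.8 accuracy before its result says anything about what it has learned.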

Benefits of baseline models

Understand your data

The key advantage of employing the baseline model is that it aids in data comprehension:

• Analyze observations that are challenging to categorize: With the help of a baseline
model, you'll be able to figure out which observations are difficult to categorize.
• Analyze the different classes: Likewise, if you're working on a multi-class classification
problem, a baseline model might show you which classes are simple to classify and which
are tough to classify.
• Detect data with low signal strength: A baseline model with little or no predictive power
may indicate that the signal in the data is weak or that the model fits it poorly.

Faster iteration

Baseline models also help improve the efficiency with which you can build the models.

 Increase speed and performance: With a baseline model in place, you will have
detailed information on what to improve and develop. This makes it easy to see if the
changes you're making to your model are improving metrics or not. This enables you to
quickly discover initiatives that can enhance your KPIs.
 Efficiency: If you build a baseline model, the amount of work you have to do on current
projects may reduce, allowing you to focus on other projects. The baseline model
facilitates efficiency and productivity.

Performance benchmark

Baseline models provide a suitable standard against which you can evaluate your real models.

• Some performance measures, such as logarithmic loss, are more useful for comparing
models than for assessing a single model on its own. This is because many performance
measures lack a fixed scale and instead take on different values depending on the range
of the outcome variable. A baseline gives these numbers a reference point and can help you
determine when a sophisticated model is required and when simple business logic is adequate.
• Calculate the impact on key business parameters: Creating a simple baseline model can
also help you see what kind of influence you might have on business indicators. This is
particularly true if your baseline model is stochastic as well.

3. EXPLAIN THE STRUCTURE OF HYPERPARAMETERS

 In neural networks, parameters are used to train the model and make predictions. There are two types of
parameters:

Model parameters are internal to the neural network – for example, neuron weights. They are
estimated or learned automatically from training samples. These parameters are also used to
make predictions in a production model.
Hyperparameters are external parameters set by the operator of the neural network, for example,
which activation function to use or the batch size used in training. Hyperparameters
have a huge impact on the accuracy of a neural network, the optimal values may differ from one
problem to another, and it is non-trivial to discover those values.
The simplest way to select hyperparameters for a neural network model is “manual search”, in
other words, trial and error. Newer methods use algorithms and optimization techniques to
discover the best hyperparameters.

What is Hyperparameter Tuning?

A hyperparameter is a parameter of the model whose value influences the learning process and whose
value cannot be estimated from the training data. Hyperparameters are configured externally before
starting the model learning/training process. Hyperparameter tuning is the process of finding the optimal
hyperparameters for any given machine learning algorithm.

List of Common Hyperparameters

Hyperparameters related to neural network structure

1. Number of hidden layers – adding more hidden layers of neurons generally improves
accuracy, to a certain limit which can differ depending on the problem.

2. Dropout – what percentage of neurons should be randomly “killed” (temporarily switched off) at
each training step to prevent overfitting.

3. Activation function – which function should be used to process the inputs flowing into each
neuron. The activation function can impact the network’s ability to converge and learn for
different ranges of input values, and also its training speed.

4. Weights initialization – it is necessary to set initial weights for the first forward pass. Two
basic options are to set weights to zero or to randomize them. However, this can result in a
vanishing or exploding gradient, which will make it difficult to train the model. To mitigate this
problem, you can use a heuristic (a formula tied to the number of inputs and outputs of each layer)
to determine the scale of the initial weights. A common heuristic used for the Tanh activation is
called Xavier initialization.
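As an illustration of such a heuristic, the sketch below implements the uniform variant of Xavier (Glorot) initialization, which draws each weight from U(−limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The function name and layer sizes are assumptions for this example, and the plain nested-list representation is used only to keep it dependency-free.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng=random):
    """Weight matrix with entries drawn from U(-limit, limit),
    where limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

# Hypothetical layer with 256 inputs and 128 outputs.
weights = xavier_uniform(fan_in=256, fan_out=128)
limit = math.sqrt(6.0 / (256 + 128))
print(len(weights), len(weights[0]))  # 256 128
print(limit)                          # 0.125
```

Wider layers get a smaller range, which keeps the scale of the activations roughly stable from layer to layer.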

Hyperparameters related to training algorithm

1. Learning rate – how large a step the optimizer takes along the gradient at each weight update
during gradient descent. A higher learning rate makes the network train faster but might
overshoot the minimum of the loss function, while a lower learning rate trains more slowly but
more steadily.
2. Epoch, iterations and batch size – these parameters determine how samples are fed to the
model for training. An epoch is one complete pass of the whole training set through the
model (forward pass) and through backpropagation (backward pass) to update the weights.
If the training set cannot be processed all at once due to the size of the sample or the
complexity of the network, it is split into batches, and the epoch is run in two or more
iterations, one per batch. For example, 1,000 training samples with a batch size of 100 give
10 iterations per epoch. The number of epochs and the number of batches per epoch can
significantly affect model fit.
3. Optimizer algorithm – when a neural network trains, it uses an algorithm to determine the
optimal weights for the model, called an optimizer. The basic option is Stochastic Gradient
Descent, but there are other options.
4. Momentum – another common algorithm is Momentum, which adds a fraction of the previous
weight update to the current one, so that updates build up speed along directions in which the
gradient is consistent. This speeds up training gradually, with a reduced risk of oscillation.
Other algorithms are Nesterov Accelerated Gradient, AdaDelta and Adam.
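The effect of the learning rate and of momentum can be seen on a toy problem. The sketch below minimizes f(w) = w² with plain gradient descent and with the classical momentum update (velocity = momentum × velocity − lr × gradient); the function name, starting point and rates are all chosen for illustration and do not come from the notes.

```python
def gradient_descent(lr, momentum=0.0, steps=20, w=5.0):
    """Minimise f(w) = w**2 (gradient 2*w) with optional classical momentum."""
    velocity = 0.0
    for _ in range(steps):
        grad = 2 * w
        velocity = momentum * velocity - lr * grad  # keep part of the last update
        w += velocity
    return w

slow = gradient_descent(lr=0.01)       # small rate: steady but slow progress
fast = gradient_descent(lr=0.4)        # larger rate: very close to the minimum at 0
diverged = gradient_descent(lr=1.1)    # too large: each step overshoots further
with_momentum = gradient_descent(lr=0.01, momentum=0.9)

print(abs(fast) < abs(slow))           # True
print(abs(diverged) > 5.0)             # True
print(abs(with_momentum) < abs(slow))  # True
```

With the same small learning rate and the same 20 steps, the momentum run ends closer to the minimum than the plain run.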

Manual Hyperparameter Tuning

• Traditionally, hyperparameters were tuned manually by trial and error. This is still commonly done, and
experienced operators can “guess” parameter values that will achieve very high accuracy for deep
learning models.
• However, there is a constant search for better, faster and more automatic methods to optimize
hyperparameters.
• Pros: Very simple, and effective with skilled operators.
• Cons: Not scientific; you cannot know whether you have fully optimized the hyperparameters.

Automated hyperparameter tuning:

 In automated hyperparameter tuning, the optimal set of hyperparameters is found by using an algorithm.

 An automatic hyperparameter tuning technique involves methods in which the user defines a set of
hyperparameter combinations or a range for each hyperparameter, and the tuning algorithm runs the trials to
find the optimal set of hyperparameters for the model.

4. Explain in detail about the GRID SEARCH method

Grid Search
• Grid search is slightly more sophisticated than manual tuning. It involves systematically testing
multiple values of each hyperparameter, by automatically retraining the model for each value of the
parameter.
• For example, you can perform a grid search for the optimal batch size by automatically training the
model for batch sizes between 10 and 100 samples, in steps of 20 (10, 30, 50, 70 and 90).
• The model will be trained 5 times, and the batch size selected will be the one that yields the highest
accuracy.
• Pros: Maps out the problem space and provides more opportunity for optimization.
• Cons: Can be slow to run for large numbers of hyperparameter values.
• Grid search is used to find the hyperparameters of a model that result in the most ‘accurate’
predictions.
• Grid search is the simplest algorithm for hyperparameter tuning. Basically, we divide the domain of the
hyperparameters into a discrete grid.

 Then, we try every combination of values of this grid, calculating some performance metrics using
cross-validation.
 The point of the grid that maximizes the average value in cross-validation, is the optimal combination
of values for the hyperparameters.

Example of a grid search

• Grid search is an exhaustive algorithm that tries all the combinations, so it is guaranteed to find the
best point on the grid (though not necessarily the best point in the whole domain).
 The great drawback is that it’s very slow. Checking every combination of the space requires a lot of
time that, sometimes, is not available.
 Don’t forget that every point in the grid needs k- fold cross-validation, which requires k training steps.
So, tuning the hyperparameters of a model in this way can be quite complex and expensive.
 However, if we look for the best combination of values of the hyperparameters, grid search is a very
good idea.
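The exhaustive loop can be sketched in a few lines. In this illustration `validation_score` is a hypothetical stand-in for "train the model and return its cross-validated score", and the grid values are assumptions; a real run would replace the stand-in with actual training and k-fold cross-validation.

```python
from itertools import product

def validation_score(learning_rate, batch_size):
    """Hypothetical stand-in for training plus cross-validation;
    by construction it peaks at learning_rate=0.01, batch_size=50."""
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 50) / 500

grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [10, 30, 50, 70, 90],  # 10 to 100 in steps of 20
}

best_params, best_score = None, float("-inf")
# Try every combination of values on the grid (3 x 5 = 15 evaluations).
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = validation_score(**params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)  # {'learning_rate': 0.01, 'batch_size': 50}
```

The number of evaluations is the product of the list lengths, which is why the cost grows so quickly as hyperparameters are added.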

5. Explain in detail about the Random Search method

Random Search
• According to a 2012 research study by James Bergstra and Yoshua Bengio, testing randomized values
of hyperparameters is actually more effective than manual search or grid search.
• In other words, instead of testing systematically to cover “promising areas” of the problem space, it is
preferable to test random values drawn from the entire problem space.
• Pros: According to the study, provides higher accuracy with fewer training cycles, for problems with
high dimensionality.
• Cons: Results are unintuitive; it is difficult to understand “why” particular hyperparameter values
were chosen.
• Random search is similar to grid search, but instead of using all the points in the grid, it tests only a
randomly selected subset of these points.
• The smaller this subset, the faster but less accurate the optimization. The larger this subset, the more
accurate the optimization, but the closer its cost comes to that of a grid search.

Example of random search

 Random search is a very useful option when you have several hyperparameters with a fine-grained grid
of values.
 Using a subset made by 5-100 randomly selected points, we are able to get a reasonably good set of
values of the hyperparameters. It will not likely be the best point, but it can still be a good set of values
that gives us a good model.
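A minimal sketch of random search follows. It samples values from ranges rather than from a fixed grid, which is one common variant; `validation_score` is again a hypothetical stand-in for training plus cross-validation, and the ranges, trial count and seed are assumptions.

```python
import random

def validation_score(learning_rate, batch_size):
    """Hypothetical stand-in for training plus cross-validation;
    by construction it peaks at learning_rate=0.01, batch_size=50."""
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 50) / 500

def random_search(n_trials, seed=0):
    """Evaluate n_trials randomly drawn settings and keep the best one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            # Sample the learning rate on a log scale between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.randint(10, 100),
        }
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

params, score = random_search(n_trials=20)
print(params, round(score, 3))
```

The cost is fixed by `n_trials` regardless of how many hyperparameters are searched, and the result is usually a good setting rather than the exact optimum.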

6. Explain in detail the different Debugging strategies.

Debugging strategies

 Debugging is the process of identifying and resolving errors, or bugs, in a software system. It is an
important aspect of software engineering because bugs can cause a software system to malfunction,
and can lead to poor performance or incorrect results.
 Debugging can be a time- consuming and complex task, but it is essential for ensuring that a
software system is functioning correctly.

Pay Attention to Error Messages

Error messages aren't just there as an annoyance. They can tell us exactly what the problem with our
software is. So, when an error message pops up, we should make sure we read it, because it can give us
a lot of insight into what's going on with the product. If we're not sure what the error message means,
we can try searching for it online. Chances are someone else has encountered the same problem in the
past and knows precisely how to fix it.

1. Leverage a Debugger

 A debugger, also known as a debugging tool or debugging mode, can be used to easily identify and
correct bugs.
• To use the tool effectively, we need to run our program within the debugger, which allows us
to monitor it in real time and see the error when it occurs.
• We can pause the program while it is running to pinpoint and investigate any issues that are occurring
and review our code line by line.

2. Log Everything
• We should log every issue we encounter, as well as the steps we take to address it and to
ensure our program is running correctly.
• Once we've documented the error, we can start mapping out potential scenarios and solutions. We
should keep track of all possible steps to take and the information we need to make a decision
regarding our errors.
• This will also allow us to navigate different potential solutions.
3. Localize the Problem
 The method of problem localization entails removing pieces of code line by line until we find the issue
that is interfering with our program.
 While this is a somewhat painstaking and involved way of identifying the error that’s taking place, it
can be highly effective in determining what, exactly, is going wrong with our product.
 Of course, we’ll need to keep repeating the process until we’ve tracked down the bugs.
4. Try to Replicate the Problem
 By replicating the problem, we’ll find out what the nature of the problem is precisely. In fact, this can
lead to us creating better, cleaner code in general since we’re exercising the critical thinking skills
required to find the cause of an issue.
 This, of course, demands a thorough investigation of the ins and outs of the product. But once we’ve
successfully reproduced the error that’s interfering with our product’s performance, usability, or
functionality, fixing the problem should require far less time.
 In fact, most of the time, replicating the issue is the hard work, while resolving it takes only minutes.
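In machine learning code, a common obstacle to replicating a problem is randomness. The sketch below assumes the only source of randomness is the random number generator; `init_and_score` is a hypothetical stand-in for a training run. Fixing the seed makes two runs produce the same result, so a failure seen once can be reproduced.

```python
import random


def init_and_score(seed):
    """Hypothetical stand-in for a training run with random initialization."""
    rng = random.Random(seed)  # private generator with a fixed seed
    weights = [rng.gauss(0.0, 1.0) for _ in range(5)]
    return sum(w * w for w in weights)


first = init_and_score(seed=42)
second = init_and_score(seed=42)
print(first == second)  # prints True: same seed, same result
```

Real deep learning frameworks have their own seeding functions, and some operations may remain non-deterministic even with a fixed seed.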
5. Turn to the Community
 It’s highly likely that any error we encounter is one others have encountered before us. It can be
very helpful to turn to a community associated with the language, framework, or another development
tool we’re using to find a solution for addressing the bug we’ve encountered.
 Many development tools, such as languages like Python and frameworks like Ruby on Rails, have
huge, thriving communities, offering an abundance of support to developers within them.
6. Test, Test, and Test Again
 The best way to spot bugs and resolve them before they derail our app is to test the product
repeatedly.
 While the QA team will vet the product more thoroughly, developers themselves can script simple
tests during the development phase, such as unit tests, which test individual pieces of the
code (units) in isolation.
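A unit test of this kind might look as follows, using Python's standard `unittest` module. The `accuracy` function is a simple example written for this sketch, not a library function.

```python
import unittest


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


class AccuracyTests(unittest.TestCase):
    def test_partial_match(self):
        self.assertEqual(accuracy([1, 0, 1, 1], [1, 1, 1, 1]), 0.75)

    def test_length_mismatch_is_rejected(self):
        with self.assertRaises(ValueError):
            accuracy([1, 0], [1])


# Run the tests explicitly so the script keeps going after they finish.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AccuracyTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test checks one small behaviour, so a failing test points directly at the unit that broke.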

There are several common methods and techniques used in debugging, including:
1. Code Inspection: This involves manually reviewing the source code of a software
system to identify potential bugs or errors.
2. Debugging Tools: There are various tools available for debugging such as debuggers,
trace tools, and profilers that can be used to identify and resolve bugs.
3. Unit Testing: This involves testing individual units or components of a software system to
identify bugs or errors.
4. Integration Testing: This involves testing the interactions between different
components of a software system to identify bugs or errors.
5. System Testing: This involves testing the entire software system to identify bugs or errors.
6. Monitoring: This involves monitoring a software system for unusual behaviour or
performance issues that can indicate the presence of bugs or errors.
7. Logging: This involves recording events and messages related to the software system, which
can be used to identify bugs or errors.
Debugging Process: Steps involved in debugging are:
 Problem identification and report preparation.
 Assigning the report to a software engineer to verify that the defect is genuine.
 Defect Analysis using modelling, documentation, finding and testing candidate flaws, etc.
 Defect Resolution by making required changes to the system.
 Validation of corrections.
The debugging process will always have one of two outcomes:
1. The cause will be found and corrected.
2. The cause will not be found.

Debugging Approaches/Strategies:
1. Brute Force: Study the system for an extended period in order to understand it. This
helps the debugger construct different representations of the system being debugged,
depending on the need. The system is also studied actively to find recent changes
made to the software.
2. Backtracking: Backward analysis of the problem which involves tracing the program
backward from the location of the failure message in order to identify the region of faulty
code. A detailed study of the region is conducted to find the cause of defects.
3. Forward analysis: Trace the program forward using breakpoints or print statements at
different points in the program and study the results. The region where the wrong outputs
are obtained is the region to focus on to find the defect.
4. Using past experience: Debug the software by drawing on experience with problems of a
similar nature. The success of this approach depends on the expertise of the debugger.
5. Cause elimination: This approach uses binary partitioning. Data related to the error
occurrence are organized to isolate potential causes.
6. Static analysis: Analysing the code without executing it to identify potential bugs or
errors. This approach involves analysing code syntax, data flow, and control flow.
7. Dynamic analysis: Executing the code and analysing its behaviour at runtime to identify
errors or bugs. This approach involves techniques like runtime debugging and profiling.
8. Collaborative debugging: Involves multiple developers working together to debug a
system. This approach is helpful in situations where multiple modules or components are
involved, and the root cause of the error is not clear.
9. Logging and Tracing: Using logging and tracing tools to identify the sequence of events
leading up to the error. This approach involves collecting and analysing logs and traces
generated by the system during its execution.
10. Automated Debugging: The use of automated tools and techniques to assist in the
debugging process. These tools can include static and dynamic analysis tools, as well as
tools that use machine learning and artificial intelligence to identify errors and suggest fixes.
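The binary partitioning mentioned under cause elimination can be sketched as a bisection over an ordered list of versions. This is a minimal illustration under two assumptions: at least one version is bad, and every version after the first bad one is also bad. The names `first_bad` and `is_bad` are hypothetical.

```python
def first_bad(items, is_bad):
    """Return the first item for which is_bad(item) is True, by bisection."""
    lo, hi = 0, len(items) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid        # the first bad item is at mid or earlier
        else:
            lo = mid + 1    # the first bad item is after mid
    return items[lo]


versions = list(range(1, 17))          # 16 versions, numbered 1 to 16
checks = []


def is_bad(version):
    checks.append(version)             # record how many checks were needed
    return version >= 11               # pretend the bug appeared in version 11


print(first_bad(versions, is_bad), len(checks))  # prints "11 4"
```

Each check halves the remaining candidates, so 16 versions need only 4 checks instead of up to 16.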
Debugging Tools:
A debugging tool is a computer program used to test and debug other programs. Many
debuggers, such as gdb and dbx, are available and offer console-based command-line
interfaces. Examples of automated debugging tools include code-based tracers, profilers,
interpreters, etc. Some of the widely used debuggers are:
 Radare2
 WinDbg
 Valgrind

Difference between Debugging and Testing:

 Debugging is different from testing. Testing focuses on finding bugs, errors, etc whereas debugging
starts after a bug has been identified in the software.

 Testing is used to ensure that the program does what it was supposed to do, with a certain
minimum success rate.

 Testing can be manual or automated. There are several different types of testing: unit testing,
integration testing, alpha and beta testing, etc. Debugging requires a lot of knowledge, skills, and
expertise.

 Debugging can be supported by some automated tools, but it is more of a manual process because
every bug is different and requires a different technique, unlike a pre-defined testing mechanism.

Advantages of Debugging:
Several advantages of debugging in software engineering:

1. Improved system quality: By identifying and resolving bugs, a software system can be
made more reliable and efficient, resulting in improved overall quality.
2. Reduced system downtime: By identifying and resolving bugs, a software system can be
made more stable and less likely to experience downtime, which can result in improved
availability for users.
3. Increased user satisfaction: By identifying and resolving bugs, a software system can
be made more user-friendly and better able to meet the needs of users, which can result in
increased satisfaction.
4. Reduced development costs: Identifying and resolving bugs early in the development
process saves time and resources that would otherwise be spent on fixing bugs later
in the development process or after the system has been deployed.
5. Increased security: By identifying and resolving bugs that could be exploited by
attackers, a software system can be made more secure, reducing the risk of security
breaches.
6. Facilitates change: With debugging, it becomes easy to make changes to the
software as it becomes easy to identify and fix bugs that would have been caused by
the changes.
7. Better understanding of the system: Debugging can help developers gain a better
understanding of how a software system works, and how different components of the
system interact with one another.
8. Facilitates testing: Identifying and resolving bugs makes it easier to test the
software and ensure that it meets the requirements and specifications.

Disadvantages of Debugging

While debugging is an important aspect of software engineering, there are also some disadvantages
to consider:

1. Time-consuming: Debugging can be a time-consuming process, especially if the bug is
difficult to find or reproduce. This can cause delays in the development process and add to
the overall cost of the project.
2. Requires specialized skills: Debugging can be a complex task that requires specialized
skills and knowledge. This can be a challenge for developers who are not familiar with the
tools and techniques used in debugging.
3. Can be difficult to reproduce: Some bugs may be difficult to reproduce, which can
make it challenging to identify and resolve them.
4. Can be difficult to diagnose: Some bugs may be caused by interactions between different
components of a software system, which can make it challenging to identify the root cause
of the problem.
5. Can be difficult to fix: Some bugs may be caused by fundamental design flaws or
architecture issues, which can be difficult or impossible to fix without significant
changes to the software system.
6. Limited insight: In some cases, debugging tools can only provide limited insight into the
problem and may not provide enough information to identify the root cause of the problem.
7. Can be expensive: Debugging can be an expensive process, especially if it requires
additional resources such as specialized debugging tools or additional development
time.

Explain about Bayesian Optimization

 Bayesian optimization is a hyperparameter tuning technique that tries to approximate how the
trained model behaves across different possible hyperparameter values.
 To simplify, Bayesian optimization trains the model with different hyperparameter values and
observes the result obtained for the model with each set of values.
 It does this over and over again, each time selecting hyperparameter values that are slightly different
and can help plot the next relevant segment of the problem space.
 Similar to sampling methods in statistics, the algorithm ends up with a list of possible hyperparameter
value sets and model functions, from which it predicts the optimal function across the entire problem
set.
 Pros: The original study and practical experience from the industry indicate that Bayesian
optimization can result in significantly higher accuracy compared to random search.
 Cons: Like random search, the results are not intuitive and are difficult to improve on, even by
trained operators.
 Unlike grid search and random search, the Bayesian optimization method treats the search for the
optimal hyperparameters as an optimization problem.
 When choosing the next hyperparameter combination, this method considers the previous evaluation
results.
 It then applies a probabilistic function to select the combination that will probably yield the best results.
 This method discovers a fairly good hyperparameter combination in relatively few iterations.
 Data scientists choose a probabilistic model when the objective function is unknown. That is, there is
no analytical expression to maximize or minimize.
 The data scientists apply the learning algorithm to a data set, use the algorithm’s results to define the
objective function, and take the various hyperparameter combinations as the input. The
probabilistic model is based on past evaluation results.
 It estimates the probability of a hyperparameter combination’s objective function result:

P( result | hyperparameters )

This probabilistic model is a “surrogate” of the objective function. The objective function can be, for
instance, the root-mean-square error (RMSE). We calculate the objective function using the training data
with the hyperparameter combination. We try to optimize it (maximize or minimize, depending on the
objective function selected). Evaluating the probabilistic model for a hyperparameter combination is
computationally inexpensive compared to evaluating the objective function, so this method typically updates
and improves the surrogate probability model every time the objective function runs. Better hyperparameter
predictions decrease the number of objective function evaluations we need to achieve a good result.
Gaussian processes, random forest regression, and tree-structured Parzen estimators (TPE) are examples of
surrogate models.
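The loop described above can be sketched in a few lines. This is only an illustration under several assumptions: a single hyperparameter on the range 0 to 5, a made-up `objective` standing in for an expensive training run, a Gaussian process surrogate with a fixed squared-exponential kernel, and a lower confidence bound as the rule for choosing the next point. NumPy is assumed to be installed; real libraries use more refined surrogates and selection rules.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    k_oo = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    mean = k_on.T @ np.linalg.solve(k_oo, y_obs)
    var = 1.0 - np.sum(k_on * np.linalg.solve(k_oo, k_on), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def objective(x):
    """Stand-in for an expensive training run; returns a loss to minimize."""
    return (x - 2.0) ** 2


def bayesian_optimize(n_iter=8, kappa=2.0):
    candidates = np.linspace(0.0, 5.0, 101)   # allowed hyperparameter values
    tried = [0, 100]                          # start from both ends of the range
    losses = [objective(candidates[i]) for i in tried]
    for _ in range(n_iter):
        y = np.array(losses)
        y_norm = (y - y.mean()) / (y.std() or 1.0)
        # Fit the cheap surrogate to all past evaluations.
        mean, std = gp_posterior(candidates[tried], y_norm, candidates)
        # Prefer points that are predicted to be good or are still uncertain.
        acquisition = mean - kappa * std
        acquisition[tried] = np.inf           # never re-evaluate a point
        nxt = int(np.argmin(acquisition))
        tried.append(nxt)
        losses.append(objective(candidates[nxt]))  # the expensive step
    best = int(np.argmin(losses))
    return float(candidates[tried[best]]), float(losses[best]), len(losses)


best_x, best_loss, n_evals = bayesian_optimize()
print(best_x, best_loss, n_evals)
```

Only ten evaluations of the objective are made here, whereas a grid search over the same 101 candidate values would need 101.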
