0% found this document useful (0 votes)

15 views43 pages

Supervised Learning Model Evaluation Techniques

The document discusses various methods for training and evaluating supervised learning models, including holdout, k-fold cross-validation, and bootstrap sampling. It explains the concepts of eager and lazy learning, underfitting and overfitting, and the bias-variance trade-off, emphasizing the importance of model performance evaluation through metrics like accuracy, sensitivity, specificity, precision, and recall. Additionally, it highlights the significance of confusion matrices and Kappa values in assessing model predictions.

Uploaded by

sksasireddy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views43 pages

Supervised Learning Model Evaluation Techniques

Uploaded by

sksasireddy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

UNIT – II

CHAPTER - I

Modelling
&
Evaluation

1
TRAINING A MODEL (FOR
SUPERVISED LEARNING)
1. Holdout Method

2
Drawback

The division of data of different classes into the training

and test data may not be proportionate.

 This problem can be solved by applying stratiﬁed random

sampling.
 By using this technique, the whole data is broken into
several homogenous groups or strata and a random
sample is selected from each such stratum.
 This ensures that the generated random partitions have
equal proportions of each class.
3
K-fold Cross-validation method
A special variant of holdout method, called repeated holdout

 In this technique, several random holdouts are used to measure the

model performance.
 In the end, the average of all performances is taken.
 As multiple holdouts have been drawn, the training and test data are
more likely to contain representative data from all classes and
resemble the original input data closely.
 This process of repeated holdout is the basis of k -fold cross-
validation technique. In k-fold cross-validation, the data set is divided
into k- completely distinct or non-overlapping random partitions
called folds.

4
Overall approach for K-fold cross-validation

5
The value of ‘k’ in k-fold cross-validation can be set to any number.
However, there are two approaches which are extremely popular:
1. 10-fold cross-validation (10-fold CV)
2. Leave-one-out cross-validation (LOOCV)
10-fold cross-validation approach:
In this approach, for each of the 10-folds, each comprising of
approximately 10% of the data, one of the folds is used as the test
data for validating model performance trained based on the
remaining 9 folds (or 90% of the data). This is repeated 10 times,
once for each of the 10 folds being used as the test data and the
remaining folds as the training data.
 The average performance across all folds is being reported.

 Figure depicts the detailed approach of selecting the ‘k’ folds in k-fold cross-
validation.

 As can be observed in the f igure, each of the circles resembles a record in

the input data set whereas the different colours indicate the different
classes that the records belong to.

 The entire data set is broken into ‘k’ folds – out of which one fold is selected
in each iteration as the test data set.

 The fold selected as test data set in each of the ‘k’ iterations is different. As
shown in f igure, the circles resemble the records in the input data set, the
contiguous circles represented as folds are not subsequent records in the
data set. The records in a fold are drawn by using random sampling
technique.
7
Detailed approach for fold selection

8
Leave-one-out cross-validation (LOOCV) is an extreme case of k -fold cross-
validation using one record or data instance at a time as a test data. This is done
to maximize the count of data used to train the model. It is obvious. that the
number of iterations for which it has to be run is equal to the total number of data
in the input data set. Hence, obviously, it is computationally very expensive and
not used much in practice.

 Bootstrap sampling: It is used to identify training and test data sets from the input data
set.
 It uses the technique of Simple Random Sampling with Replacement (SRSWR)
 This technique randomly picks data instances from the input data set, with the
possibility of the same data instance to be picked multiple times.
 This means that from the input data set having ‘n’ data instances, bootstrapping can
create one or more training data sets having ‘ n ’ data instances, some of the data
instances being repeated multiple times. This technique is useful in case of input data
sets of small size, i.e. having very less number of data instances.

9
10
11
Lazy vs. Eager learner
• Eager learning:
 It tries to construct an input- independent target function during the model
training phase.
 It comes up with a trained model at the end of the learning phase.

 Hence, when the test data comes in for classif ication, the eager learner is ready
with the model and doesn’t need to refer back to the training data.
 It takes more time in the learning phase than the lazy learners.

 Some of the algorithms which adopt eager learning approach include Decision
Tree, Support Vector Machine, Neural Network, etc.

12
Lazy learning:
 It doesn’t ‘learn’ anything.

 It uses the training data in exact, and uses the knowledge to classify the
unlabelled test data.
 Since lazy learning uses training data as-is, it is also known as rote learning (i.e.
memorization technique based on repetition).
 Due to its heavy dependency on the given training data instance, it is also known
as instance learning.
 They are also called non-parametric learning.

 Lazy learners take very little time in training because not much of training
actually happens.
 One of the most popular algorithm for lazy learning is k - nearest neighbor.

13
MODEL REPRESENTATION AND INTERPRETABILITY
• The goal of supervised machine learning is to learn or derive a
target function which can best determine the target variable from
the set of input variables.

• A key consideration in learning the target function from the training

data is the extent of generalization. This is because the input data
is just a limited, specif ic view and the new, unknown data in the test
data set may be differing quite a bit from the training data.

• Fitness of a target function approximated by a learning algorithm

determines how correctly it is able to classify a set of data it has
never seen.

14
Underfitting
• If the target function is kept too simple, it may not be able to
capture the essential nuances and represent the underlying data
well.
• A typical case of underfitting may occur when trying to represent a
non-linear data with a linear model as demonstrated by both cases
of underfitting shown in figure.
• Many times underf itting happens due to unavailability of suf ficient
training data. Underf itting results in both poor performance with
train in g data as well as poor gen eralization to test data.
Underfitting can be avoided by
1. using more training data
2. reducing features by effective feature selection
15
Underfitting and Overfitting of models

16
Overﬁtting
 The model has been designed in such a way that it matches the training data too
closely.
 It has an impact on the performance of the model on the test data.
 Overf it ting occur as a result of trying to f it an excessively complex model to
closely match the training data.
 The target function tries to make sure all training data points are correctly
partitioned by the decision boundary.
 This exact nature is not replicated in the unknown test data set. Hence, the
target function results in wrong classif ication in the test data set. Overf it ting
results in good performance with training data set, but poor performance with
test data set. Overf itting can be avoided by 1. using re-sampling techniques like k
-fold cross validation. [Link] back of a validation data set 3. remove the nodes
which have little or no predictive power for the given machine learning problem.

17
Bias – variance trade-off
• In supervised learning, the class value assigned by the learning model
built based on the training data may differ from the actual class value.
This error in learning can be of two types – errors due to ‘bias’ and error
due to ‘variance’.

Errors due to ‘Bias’ :

Arise from simplifying assumptions made by the model to make the target
function less complex or easier to learn. In short, it is due to underﬁtting of
the model. Parametric models generally have high bias making them
easier to understand/interpret and faster to learn. These algorithms have
a poor performance on data sets, which are complex in nature and do not
align with the simplifying assumptions made by the algorithm. Underﬁtting
results in high bias.
18
Errors due to ‘Variance’

• Errors due to variance occur from difference in training data

sets used to train the model.

• Different training data sets are used to train the model.

• Ideally the difference in the data sets should not be signif ic ant
and the model trained using different training data sets should
not be too different.

• However, in case of overf it ting, since the model closely

matches the training data, even a small difference in training
data gets overstated in the model.

19
Bias-variance trade-off

20
• So, the problems in training a model can happen because either (a) the
model is too simple and hence fails to interpret the data grossly or (b)
the model is extremely complex and magniﬁes even small differences in
the training data.

• Increasing the bias will decrease the variance, and Increasing the
variance will decrease the bias.

• Parametric algorithms demonstrate high bias but low variance.

• non-parametric algorithms demonstrate low bias & high variance.

• The goal of supervised machine learning is to achieve a balance

between bias and variance.

• For example, in k -Nearest Neighbors or k- NN, the user conf ig urable

parameter ‘k’ can be used to do a trade-off between bias and variance.
When the value of ‘k’ is decreased, the model becomes simpler to fit and
bias increases. When ‘k’ is increased, the variance increases.
21
EVALUATING PERFORMANCE OF A MODEL
Supervised learning - classification
 In supervised learning, one major task is classification.
 The responsibility of the classif ication model is to assign class label to the
target feature based on the value of the predictor features.
 For example, in the problem of predicting the win/loss in a cricket match, the
classif ier will assign a class value win/loss to target feature based on the
values of other features like
whether the team won the toss,
number of spinners in the team,
number of wins the team had in the tournament, etc.
 To evaluat e t he p er formance of t he mod el, t he numb er of correct
classifications or predictions made by the model has to be recorded.
 A classif ication is said to be correct if, say for example in the given problem,
it has been predicted by the model that the team will win and it has actually
won.
22
The accuracy of the model is calculated based on the number of correct and incorrect
classifications or predictions made by a model.
 If 99 out of 100 times the model has classif ie d correctly, e.g. 99% accuracy in case of a
sports win predictor model may be reasonably good but
 the same number i.e., 99% may not be acceptable problem is to deal with predicting a critical
illness. In this case, even the 1% incorrect prediction may lead to loss of many lives. So the
model performance needs to be evaluated based on the learning problem.
So, let’s start with looking at model accuracy more closely.
There are four possibilities with regards to the cricket match win/loss prediction:
1. the model predicted win and the team won 2. the model predicted win and the team lost
[Link] model predicted loss and the team won [Link] model predicted loss and the team lost.
In this problem, the obvious class of interest is ‘win’.
The f irst case, i.e. the model predicted win and the team won is a case where the model has
correctly classif ied data instances as the class of interest. These cases are referred as True
Positive (TP) cases. The second case, i.e. the model predicted win and the team lost is a case
where the model incorrectly classif ied data instances as the class of interest. These cases are
referred as False Positive (FP) cases.

23
 The third case, i.e. the model predicted loss and the team won is a case
where the model has incorrectly classif ied as not the class of interest.
These cases are referred as False Negative (FN) cases.
 The fourth case, i.e. the model predicted loss and the team lost is a case
where the model has correctly classif ied as not the class of interest.
These cases are referred as True Negative (TN) cases. All these four
cases are depicted in Figure 3.7 .
For any classif ication model, model accuracy is given by total number of
correct classif ications (either as the class of interest, i.e. True Positive or
as not the class of interest, i.e. True Negative) divided by total number of
classiﬁcations done.

24
 A matrix containing correct and incorrect predictions in the form of TPs, FPs,
FNs and TNs is known as confusion matrix.
 The win/loss prediction of cricket match has two classes of interest – win and
loss.
 For that reason it will generate a 2 × 2 confusion matrix.
 For a classif ication problem involving three classes, the confusion matrix would
be 3 × 3, etc.
Let’s assume the confusion matrix of the win/loss prediction of cricket match
problem to be as below:

In context of the above confusion matrix, total count of TPs = 85, count of FPs = 4,
count of FNs = 2 and count of TNs = 9.

25
The percentage of misclassif ic ations is indicated using error rate which is
measured as

In context of the above confusion matrix,

Sometimes, correct prediction, both TPs as well as TNs, may happen by mere
coincidence, but, it should not happen.
Kappa value of a model indicates the adjusted the model accuracy. It is calculated
using the formula below:

26
In context of the above confusion matrix,
total count of TPs = 85, count of FPs = 4,
count of FNs = 2 and count of TNs = 9.

Kappa value can be 1 at the maximum, which represents perfect

agreement between model’s prediction and actual values. 27
In certain learning problems, it is critical to have extremely low number of FN cases,
because, it will have a different consequence. For Ex, cancer detection. In these problems
there are some measures of model performance which are more important than
accuracy. Two such measurements are sensitivity and speciﬁcity of the model.
The sensitivity of a model measures the proportion of TP examples or positive cases
which were correctly classiﬁed. It is measured as

In the context of the above confusion matrix for the cricket match win prediction
problem,

A high value of sensitivity is more desirable than a high value of accuracy.

Specif icity of a model measures the proportion of negative examples which have been
correctly classif ied. In the context, of malignancy prediction of tumours, specif icity gives
the proportion of benign tumours which have been correctly classif ied. In the context of
the above confusion matrix for the cricket match win prediction problem,

A higher value of speciﬁcity will indicate a better model performance.

28
There are two other performance measures of a supervised learning model which
are similar to sensitivity and specif ic ity. These are precision and recall. While
precision gives the proportion of positive predictions which are truly positive, recall
gives the proportion of TP cases over all actually positive cases.

Precision indicates the reliability of a model in predicting a class of interest. When

the model is related to win / loss prediction of cricket, precision indicates how
often it predicts the win correctly. In context of the above confusion matrix for the
cricket match win prediction problem,

A model with higher precision is perceived to be more reliable.

29
Recall indicates the proportion of correct prediction of positives to the total number
of positives. In case of win/loss prediction of cricket, recall resembles what
proportion of the total wins were predicted correctly.

In the context of the above confusion matrix for the cricket match win prediction
problem,

F-measure is another measure of model performance which combines the precision

and recall. It takes the harmonic mean of precision and recall as calculated as

In context of the above confusion matrix for the cricket match win prediction
problem,

30
Supervised learning – regression

A regression model which ensures that the difference

between predicted and actual values is low can be considered
as a good model.
The following figure represents a very simple problem of real
estate value prediction solved using linear regression model.
If ‘area’ is the predictor variable (Input variable say x ) and
‘value’ is the target variable (output variable say y), the linear
regression model can be represented in the form:

31
FIG. 3.9 Error – Predicted vs. actual value

32
 For a certain value of x, say x̂, the value of y is predicted as ŷ whereas the
actual value of y is Y (say).
 The distance between the actual value and the f itted or predicted value,
i.e. ŷ is known as residual.
 The regression model can be considered to be f itted well, if the difference
between actual and predicted value, i.e. the residual value is less.
R-squared is a good measure to evaluate the model f itness. It is also known
as the coefficient of determination, or
for multiple regression, the coefficient of multiple determination.
The R-squared value lies between 0 to 1 (0%–100%) with a larger value
representing a better fit. It is calculated as:

33
Sum of Squares Total (SST) = squared differences of each
observation from the overall mean =

where y ̅ is the mean.

Sum of Squared Errors (SSE) (of prediction) = sum of

the squared residuals =

where ^y is the predicted value of yi and Yi is the actual value of yi.

34
Unsupervised learning - clustering
 Clustering algorithms try to prepare groupings amongst the data sets.
 But, it is quite diﬃcult to evaluate the performance of a clustering algorithm.
 There are two challenges which lie in the process of clustering:
 It is generally not known how many clusters can be formulated from a
particular data set.
 Even if the number of clusters is given, the same number of clusters can be
formed with different groups of data instances.
A clustering algorithm is successful if the clusters identif ie d using the algorithm is
able to achieve the right results in the overall problem domain.

35
popular approaches which are adopted for cluster quality evaluation.

1. Internal evaluation

 The internal evaluation methods generally measure cluster quality based on

homogeneity of data belonging to the same cluster and heterogeneity of data
belonging to different clusters.
 The homogeneity/heterogeneity is decided by some similarity measure.
 For example, silhouette coef fic ient, which is one of the most popular internal
evaluation methods, uses distance (Euclidean or Manhattan distances most
commonly used) between data elements as a similarity measure.
 The value of silhouette width ranges between –1 and +1, with a high value
indicating high intra- cluster homogeneity and inter-cluster heterogeneity.
For a data set clustered into ‘k’ clusters, silhouette width is calculated as:

36
a (i ) is the average distance between the i th data instance and all other data
instances belonging to the same cluster and b ( i ) is the lowest average
distance between the i-the data instance and data instances of all other
clusters.

FIG. Silhouette width calculation

37
Let’s try to understand this in context of the example depicted in f igure. There
are four clusters namely cluster 1, 2, 3, and 4.
Let’s consider an arbitrary data element ‘ i ’ in cluster 1, resembled by the
asterisk.
a(i) is the average of the distances ai1 , ai2 , ..., ain1 of the different data elements
from the i th data element in cluster 1, assuming there are n1 data elements in
cluster 1. Mathematically,

In the same way, let’s calculate the distance of an arbitrary data element ‘i’ in
cluster 1 with the different data elements from another cluster, say cluster 4 and
take an average of all those distances. Hence,

38
where n4 is the total number of elements in cluster 4. In the same way, we can
calculate the values of b12 (average) and b13 (average). b (i) is the minimum of all
these values. Hence, we can say that,
b(i) = minimum [b12(average), b13(average), b14(average)]

2. External evaluation

 In this approach, class label is already known for the data set.
 But, the known class labels are not a given during clustering time.
 The cluster algorithm is assessed based on how close the results are
compared to those known class labels.
For example, purity is one of the most popular measures of cluster algorithms –
evaluates the extent to which clusters contain a single class.

For a data set having ‘ n ’ data instances and ‘ c ’ known class labels which
generates ‘k’ clusters,

purity is measured as:

39
IMPROVING PERFORMANCE OF A MODEL
We have already discussed earlier that the model selection is done one several
aspects:
[Link] of learning the task in hand, i.e. supervised or unsupervised
[Link] of the data, i.e. categorical or numeric
[Link] on the problem domain
[Link] all, experience in working with different models to solve
[Link] of diverse domains

What are the different ways to improve the performance of models?

Model parameter tuning is the process of adjusting the model f itting options.
For example, in the popular classif ication model k-Nearest Neighbour (kNN),
using different values of ‘k’ or the number of nearest neighbours to be
considered, the model can be tuned. In the same way, a number of hidden
layers can be adjusted to tune the performance in neural networks model. Most
machine learning models have at least one parameter which can be tuned.

40
As an alternate approach of increasing the performance of one model, several
models may be combined together. i.e. one model may learn one type data sets
well while struggle with another type of data set. Another model may perform well
with the data set which the f irst one struggled with. This approach of combining
different models with diverse strengths is known as ensemble (depicted in Figure
3.11 ). Ensemble helps in averaging out biases of the different underlying models
and also reducing the variance. Ensemble methods combine weaker learners to
create stronger ones. A performance boost can be expected even if models are
built as usual and then ensembled.

FIG. Ensemble

41
Following are the typical steps in ensemble process:
 Build a number of models based on the training data
 For diversifying the models generated, the training data subset can be varied
using the allocation function.
 Sampling techniques like bootstrapping may be used to generate unique training
data sets.
 Alternatively, the same training data may be used but the models combined are
quite varying, e.g, SVM, neural network, kNN, etc.
 The outputs from the different models are combined using a combination
function.
 A very simple strategy of combining, say in case of a prediction task using
ensemble, can be majority voting of the different models combined.
 For example, 3 out of 5 classes predict ‘win’ and 2 predict ‘loss’ – then the f inal
outcome of the ensemble using majority vote would be a ‘win’.

42
One of the earliest and most popular ensemble models is bootstrap aggregating or
bagging. Bagging uses bootstrap sampling method to generate multiple training data
sets. These training data sets are used to generate (or train) a set of models. Then the
outcomes of the models are combined by majority voting (classiﬁcation) or by average
(regression).
Just like bagging, boosting is another key ensemble- based technique. In this type of
ensemble, weaker learning models are trained on resampled data and the outcomes
are combined using a weighted voting approach based on the performance of different
models. Adaptive boosting or AdaBoost is a special variant of boosting algorithm. It is
based on the idea of generating weak learners and slowly learning.
Random forest is another ensemble-based technique. It is an ensemble of decision
trees – hence the name random forest to indicate a forest of decision trees.

Supervised Learning Model Evaluation Techniques
No ratings yet
Supervised Learning Model Evaluation Techniques
43 pages
Supervised Learning Model Evaluation Techniques
No ratings yet
Supervised Learning Model Evaluation Techniques
43 pages
Supervised Learning Model Evaluation Techniques
No ratings yet
Supervised Learning Model Evaluation Techniques
43 pages
Supervised Learning Model Evaluation Guide
No ratings yet
Supervised Learning Model Evaluation Guide
173 pages
ML - 3-1
No ratings yet
ML - 3-1
31 pages
Model Selection and Evaluation in ML
No ratings yet
Model Selection and Evaluation in ML
48 pages
Model Selection and Evaluation in ML
No ratings yet
Model Selection and Evaluation in ML
52 pages
Model Validation Techniques in ML
100% (2)
Model Validation Techniques in ML
26 pages
Machine Learning Model Selection Guide
No ratings yet
Machine Learning Model Selection Guide
49 pages
Bias-Variance Tradeoff in Machine Learning
No ratings yet
Bias-Variance Tradeoff in Machine Learning
7 pages
Unit 5 Ds
No ratings yet
Unit 5 Ds
14 pages
Evaluating Prediction Accuracy in ML
No ratings yet
Evaluating Prediction Accuracy in ML
52 pages
Model Selection and Evaluation in ML
No ratings yet
Model Selection and Evaluation in ML
28 pages
Train-Test Split in Machine Learning
No ratings yet
Train-Test Split in Machine Learning
10 pages
Unit-3 Preparing To Model & Data Preprocessing
No ratings yet
Unit-3 Preparing To Model & Data Preprocessing
50 pages
ML Unit-4
No ratings yet
ML Unit-4
45 pages
Cross-Validation Techniques in ML
No ratings yet
Cross-Validation Techniques in ML
13 pages
Modelling in Machine Learning
No ratings yet
Modelling in Machine Learning
24 pages
Bias-Variance Tradeoff in ML Models
No ratings yet
Bias-Variance Tradeoff in ML Models
23 pages
Understanding Bias-Variance Tradeoff
No ratings yet
Understanding Bias-Variance Tradeoff
14 pages
Cross-Validation Techniques in ML
No ratings yet
Cross-Validation Techniques in ML
56 pages
FML Unit2
No ratings yet
FML Unit2
6 pages
Understanding Generalization Errors in ML
No ratings yet
Understanding Generalization Errors in ML
9 pages
Evaluating Underfit in Machine Learning
No ratings yet
Evaluating Underfit in Machine Learning
16 pages
Model Training and Evaluation in ML
No ratings yet
Model Training and Evaluation in ML
46 pages
Model Evaluation in Data Science
No ratings yet
Model Evaluation in Data Science
81 pages
Machine Learning System Design Overview
100% (3)
Machine Learning System Design Overview
84 pages
Cross-Validation in Model Evaluation
No ratings yet
Cross-Validation in Model Evaluation
17 pages
Machine Learning Model Evaluation Methods
No ratings yet
Machine Learning Model Evaluation Methods
28 pages
Cross-Validation Techniques in ML
No ratings yet
Cross-Validation Techniques in ML
20 pages
Cross-Validation and Bias-Variance Tradeoff
No ratings yet
Cross-Validation and Bias-Variance Tradeoff
50 pages
Machine Learning Training & Evaluation Guide
No ratings yet
Machine Learning Training & Evaluation Guide
42 pages
Cross-Validation in Machine Learning
No ratings yet
Cross-Validation in Machine Learning
51 pages
AIML Explanations
No ratings yet
AIML Explanations
38 pages
ML Lecture 2 Models in ML
No ratings yet
ML Lecture 2 Models in ML
10 pages
Evaluating Machine Learning Algorithms
100% (2)
Evaluating Machine Learning Algorithms
42 pages
MSDA 3050 Lecture3 S24
No ratings yet
MSDA 3050 Lecture3 S24
44 pages
How to Train and Validate ML Models
No ratings yet
How to Train and Validate ML Models
40 pages
Cross-Validation Techniques in ML
No ratings yet
Cross-Validation Techniques in ML
20 pages
2025 AN2DL 03 NeuralNetworksTraining
No ratings yet
2025 AN2DL 03 NeuralNetworksTraining
41 pages
Machine Learning Model Evaluation Guide
No ratings yet
Machine Learning Model Evaluation Guide
34 pages
Bias-Variance Tradeoff in Machine Learning
No ratings yet
Bias-Variance Tradeoff in Machine Learning
40 pages
Machine Learning Model Validation Techniques
No ratings yet
Machine Learning Model Validation Techniques
54 pages
Model Selection in Machine Learning
No ratings yet
Model Selection in Machine Learning
57 pages
UNIT IV Software Presentation
No ratings yet
UNIT IV Software Presentation
6 pages
Machine Learning Model Selection & Training
No ratings yet
Machine Learning Model Selection & Training
26 pages
Supervised Learning: Overfitting & Techniques
No ratings yet
Supervised Learning: Overfitting & Techniques
22 pages
Model Generalization
No ratings yet
Model Generalization
117 pages
Prediction Error in Model Selection
No ratings yet
Prediction Error in Model Selection
28 pages
ML Question Bank Solutions SOL Full
No ratings yet
ML Question Bank Solutions SOL Full
79 pages
Model Selection and Error Estimation
No ratings yet
Model Selection and Error Estimation
18 pages
Evaluating ML Algorithms and Models
No ratings yet
Evaluating ML Algorithms and Models
21 pages
Neural Networks: Training & Overfitting Solutions
No ratings yet
Neural Networks: Training & Overfitting Solutions
40 pages
Machine Learning Model Training Guide
No ratings yet
Machine Learning Model Training Guide
13 pages
Machine Learning Models Overview
No ratings yet
Machine Learning Models Overview
54 pages
Best Practices in Model Evaluation
No ratings yet
Best Practices in Model Evaluation
17 pages
Understanding Model Evaluation in ML
No ratings yet
Understanding Model Evaluation in ML
29 pages
Model Evaluation and Hyperparameter Tuning
No ratings yet
Model Evaluation and Hyperparameter Tuning
48 pages
Machine Learning Challenges & Solutions
No ratings yet
Machine Learning Challenges & Solutions
26 pages
r23 Mid Format-1
No ratings yet
r23 Mid Format-1
2 pages
ML Unit-Iv-1
No ratings yet
ML Unit-Iv-1
40 pages
Cse (Ai &ML) II-II Result Analysis2025
No ratings yet
Cse (Ai &ML) II-II Result Analysis2025
7 pages
CC For Ai Lab
No ratings yet
CC For Ai Lab
3 pages
III - I Sem Result Analysis New 1
No ratings yet
III - I Sem Result Analysis New 1
38 pages
Cloud Computing UNIT-1
No ratings yet
Cloud Computing UNIT-1
40 pages
AP SSC Vii Fa-4 Social - Revision Assignment-2
No ratings yet
AP SSC Vii Fa-4 Social - Revision Assignment-2
4 pages
Activity Report On Nss Special Camp
No ratings yet
Activity Report On Nss Special Camp
14 pages
AP SSC Vii Fa-4 Social - Revision Assignment-1
No ratings yet
AP SSC Vii Fa-4 Social - Revision Assignment-1
4 pages
Z.p.hs Polycet Set1
No ratings yet
Z.p.hs Polycet Set1
6 pages
AP SSC Vii Fa-4 - Physical Science Assignment-2
No ratings yet
AP SSC Vii Fa-4 - Physical Science Assignment-2
3 pages
Software Engineering Mid Exam Questions
No ratings yet
Software Engineering Mid Exam Questions
1 page
AI Mid Exam Questions for CSE Students
100% (1)
AI Mid Exam Questions for CSE Students
1 page
JNTUA June 2025 Exam Notification
No ratings yet
JNTUA June 2025 Exam Notification
1 page
AR/VR Fundamentals Assignment Questions
No ratings yet
AR/VR Fundamentals Assignment Questions
1 page
CYP Training Program for Yoga Day 2025
No ratings yet
CYP Training Program for Yoga Day 2025
1 page
Workshop Proposal: Generative AI & Engineering
No ratings yet
Workshop Proposal: Generative AI & Engineering
6 pages
JNTUA B.Tech Academic Calendar 2025-26
No ratings yet
JNTUA B.Tech Academic Calendar 2025-26
1 page
Workshop Proposal: Python with SQL
No ratings yet
Workshop Proposal: Python with SQL
6 pages
Student Exam Subject List 2023
No ratings yet
Student Exam Subject List 2023
54 pages
Cloud Computing Course Certificate 2024
No ratings yet
Cloud Computing Course Certificate 2024
1 page
22ade006-Computer Vision
No ratings yet
22ade006-Computer Vision
4 pages
Introduction To Machine Learning: Module - 1 Chapter - 1
No ratings yet
Introduction To Machine Learning: Module - 1 Chapter - 1
95 pages
Student Engagement Prediction Using Deep Learning
No ratings yet
Student Engagement Prediction Using Deep Learning
6 pages
Deep Learning for Image Watermarking
No ratings yet
Deep Learning for Image Watermarking
21 pages
Data Orchestration in DNN Accelerators
No ratings yet
Data Orchestration in DNN Accelerators
166 pages
Explainable AI for Fake News Detection
No ratings yet
Explainable AI for Fake News Detection
5 pages
Deep Learning for Speech Emotion Recognition
No ratings yet
Deep Learning for Speech Emotion Recognition
5 pages
Advanced Machine Learning Concepts
No ratings yet
Advanced Machine Learning Concepts
7 pages
Class 10 AI Exam Questions and Answers
No ratings yet
Class 10 AI Exam Questions and Answers
3 pages
Al3502 - DLV Iat 2 - 1
No ratings yet
Al3502 - DLV Iat 2 - 1
4 pages
Understanding Dimensionality Reduction Techniques
No ratings yet
Understanding Dimensionality Reduction Techniques
13 pages
Assignment - 1: Part A: Regression
No ratings yet
Assignment - 1: Part A: Regression
7 pages
Brain Tumor Detection with CNN
No ratings yet
Brain Tumor Detection with CNN
47 pages
Handout WW Expo 2025 Collins Aerospace Avionics PDF
No ratings yet
Handout WW Expo 2025 Collins Aerospace Avionics PDF
11 pages
Machine Learning Summer Training Report
No ratings yet
Machine Learning Summer Training Report
36 pages
Understanding Machine Learning Basics
No ratings yet
Understanding Machine Learning Basics
10 pages
Real-Time Road Damage Detection Survey
No ratings yet
Real-Time Road Damage Detection Survey
13 pages
FedHealth: Federated Learning for Wearables
No ratings yet
FedHealth: Federated Learning for Wearables
11 pages
AI Terminology Glossary
No ratings yet
AI Terminology Glossary
4 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
29 pages
Intelligent Pavement Crack Detection System
No ratings yet
Intelligent Pavement Crack Detection System
13 pages
Discretizing Time Series with RNNs
No ratings yet
Discretizing Time Series with RNNs
5 pages
Machine Learning Internship Report
No ratings yet
Machine Learning Internship Report
25 pages
Week 2: AI & ML Course Overview
No ratings yet
Week 2: AI & ML Course Overview
29 pages
Plant Leaf Disease Detection Review
No ratings yet
Plant Leaf Disease Detection Review
66 pages
Challenges in Training GANs Explained
No ratings yet
Challenges in Training GANs Explained
2 pages
ML Cross Validation Techniques Explained
No ratings yet
ML Cross Validation Techniques Explained
14 pages
Privacy Preserving
No ratings yet
Privacy Preserving
12 pages
AI Computer Vision Engineer Job Opening
No ratings yet
AI Computer Vision Engineer Job Opening
3 pages
CFP - 7th International Conference On NLP & Big Data (NLPD 2026)
No ratings yet
CFP - 7th International Conference On NLP & Big Data (NLPD 2026)
3 pages

Supervised Learning Model Evaluation Techniques

Uploaded by

Supervised Learning Model Evaluation Techniques

Uploaded by

UNIT – II

The division of data of different classes into the training

 This problem can be solved by applying stratiﬁed random

 In this technique, several random holdouts are used to measure the

 As can be observed in the f igure, each of the circles resembles a record in

• A key consideration in learning the target function from the training

• Fitness of a target function approximated by a learning algorithm

Errors due to ‘Bias’ :

• Errors due to variance occur from difference in training data

• Different training data sets are used to train the model.

• However, in case of overf it ting, since the model closely

• Parametric algorithms demonstrate high bias but low variance.

• non-parametric algorithms demonstrate low bias & high variance.

• The goal of supervised machine learning is to achieve a balance

• For example, in k -Nearest Neighbors or k- NN, the user conf ig urable

In context of the above confusion matrix,

Kappa value can be 1 at the maximum, which represents perfect

A high value of sensitivity is more desirable than a high value of accuracy.

A higher value of speciﬁcity will indicate a better model performance.

Precision indicates the reliability of a model in predicting a class of interest. When

A model with higher precision is perceived to be more reliable.

F-measure is another measure of model performance which combines the precision

A regression model which ensures that the difference

where y ̅ is the mean.

Sum of Squared Errors (SSE) (of prediction) = sum of

where ^y is the predicted value of yi and Yi is the actual value of yi.

 The internal evaluation methods generally measure cluster quality based on

FIG. Silhouette width calculation

purity is measured as:

What are the different ways to improve the performance of models?

You might also like