Matters of Discussion
Machine Learning Experiments:
Evaluating classification model performance
[Review]
Classification model Performance - Evaluating
Predictive model Performance
Techniques to Improve Classification Accuracy
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Performance measures
❖ The performance of the developed model can
be evaluated using a confusion matrix.
=== Confusion Matrix ===
   a    b   <-- classified as
 150   28 |  a = tested_negative
  32   51 |  b = tested_positive

                                Predicted class = Negative   Predicted class = Positive
Actual class = Negative [a]     True Negative (TN)           False Positive (FP)
Actual class = Positive [b]     False Negative (FN)          True Positive (TP)
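As a minimal sketch of how the four cells are counted, the Python function below tallies TN, FP, FN and TP from paired lists of actual and predicted labels. The function name `confusion_counts` and the five toy labels are hypothetical and chosen only for illustration; only the class names are taken from the matrix above.

```python
def confusion_counts(actual, predicted, positive="tested_positive"):
    """Tally the four confusion-matrix cells for a binary classifier."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # actual positive, predicted positive
        elif a == positive:
            fn += 1          # actual positive, predicted negative
        elif p == positive:
            fp += 1          # actual negative, predicted positive
        else:
            tn += 1          # actual negative, predicted negative
    return {"TN": tn, "FP": fp, "FN": fn, "TP": tp}

# Hypothetical toy labels (not the 261 records behind the matrix above).
actual = ["tested_negative", "tested_negative", "tested_positive",
          "tested_positive", "tested_negative"]
predicted = ["tested_negative", "tested_positive", "tested_positive",
             "tested_negative", "tested_negative"]

counts = confusion_counts(actual, predicted)
print(counts)
```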
Data set for building Confusion Matrix Example
TP FP
FN TN
Performance measures
• Performance metrics
Classification methods
Decision Tree,
Naïve Bayes,
K-Nearest Neighbors
These methods have already been discussed.
We now investigate how to estimate the performance of those algorithms
using the performance measures.
Performance measures for Naïve Bayes classification [class-wise]
Comparison Result
Four common Test options
Both training and testing require data.
The following four options are commonly used.
1. Use training set:
➢ You test the model on the same data it
learned from.
➢ Not well accepted, because a model can simply
memorize the training instances (which reappear
in the test) and score well without generalizing.
➢ Rarely used for research.
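The memorization problem can be illustrated with a small Python sketch. `MemorizingClassifier` is a hypothetical lookup-table model written for this example, and the data are made up: the label is 1 when the two features sum to an odd number.

```python
class MemorizingClassifier:
    """Stores every training instance and looks it up at prediction time."""

    def fit(self, X, y):
        self.table = {tuple(x): label for x, label in zip(X, y)}
        # Fallback for unseen instances: the most frequent training label.
        self.default = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.table.get(tuple(x), self.default) for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical training data and previously unseen data.
X_train = [(0, 1), (1, 1), (2, 1), (3, 1), (4, 0), (5, 0), (6, 0)]
y_train = [(a + b) % 2 for a, b in X_train]
X_new = [(7, 1), (8, 1), (9, 0), (10, 1)]
y_new = [(a + b) % 2 for a, b in X_new]

model = MemorizingClassifier().fit(X_train, y_train)
train_acc = accuracy(y_train, model.predict(X_train))  # evaluated on the training set
new_acc = accuracy(y_new, model.predict(X_new))        # evaluated on unseen data
print(train_acc, new_acc)
```

The model is perfect on the data it memorized but much worse on new records, which is why the training-set score says little about real performance.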
2. Supplied test set:
❖ It is an external file that you can use as
the test set.
❖ It can be used when you want or need to test
the trained model against a specific
test set.
3. K-fold cross-validation
❖ The data set is randomly divided into K disjoint sets (folds) of
equal size; in the stratified variant, each fold has roughly the
same class distribution.
❖ For example, with 10 folds the following process is
repeated 10 times: use 9 folds for training and leave
1 fold out for testing, so that each fold serves as the
test set exactly once. The 10 results are then averaged.
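The fold rotation described above can be sketched in plain Python. The helper `k_fold_indices` is a hypothetical name, and this sketch does plain (non-stratified) splitting; keeping the class distribution equal per fold would need an extra grouping step.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # k disjoint, near-equal folds
    for i in range(k):
        test_idx = folds[i]                       # one fold held out for testing
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 20 records, 10 folds: each round trains on 18 records and tests on 2.
splits = list(k_fold_indices(n_samples=20, k=10))
for train_idx, test_idx in splits:
    print(len(train_idx), "training records, test records:", sorted(test_idx))
```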
4. Percentage split:
❖ Splits the data, using x% of it
for learning and the rest for testing.
❖ It is useful when your algorithm is slow,
because the model is trained only once.
❖ A common choice is to train the algorithm
with 67% of the data and test the classifier
with the remaining 33%.
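A minimal sketch of a 67% / 33% percentage split, assuming the records can be shuffled freely. The function name `percentage_split` and the 100 placeholder records are hypothetical.

```python
import random


def percentage_split(records, train_fraction=0.67, seed=0):
    """Shuffle the records and cut them into a training part and a test part."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # shuffle so the split is random
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 placeholder records identified by their index.
train, test = percentage_split(range(100))
print(len(train), "records for learning,", len(test), "records for testing")
```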
Model performance for classification models
A classification model is a machine learning model
that predicts a categorical Y variable:
1. Will the employee leave the organization or stay?
2. Does the patient have cancer or not?
3. Does this customer fall into high risk, medium
risk or low risk?
4. Will the customer repay or default on a loan?
A classification model in which the Y variable can
take only 2 values is called a binary classifier.
CASE: Confusion matrix for customer class prediction
=== Confusion Matrix ===
   a    b   <-- classified as
 150   28 |  a = tested_negative
  32   51 |  b = tested_positive
TN = 150 ; FP = 28
FN = 32 ; TP = 51
Model performance measures
(TN = 150 ; FP = 28 ; FN = 32 ; TP = 51)
1. Accuracy = [TP+TN] / [TP+FP+TN+FN]
Accuracy is the number of correct predictions made by
the model divided by the total number of records. The best
accuracy is 100%, indicating that all the predictions are
correct.
2. Sensitivity or recall = TP / [TP+FN]
Sensitivity (recall or true positive rate) is calculated as
the number of correct positive predictions divided by
the total number of positives.
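Applied to the case counts (TN = 150, FP = 28, FN = 32, TP = 51), the two definitions work out as in the short Python check below. The variable names are illustrative only.

```python
# Counts from the confusion matrix of the case.
TN, FP, FN, TP = 150, 28, 32, 51

accuracy = (TP + TN) / (TP + FP + TN + FN)   # correct predictions / all records
recall = TP / (TP + FN)                      # correct positives / actual positives

print(f"accuracy = {accuracy:.3f}")   # 201 / 261
print(f"recall   = {recall:.3f}")     # 51 / 83
```

So the model is right on about 77.0% of all records but finds only about 61.4% of the actual positives.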
(TN = 150 ; FP = 28 ; FN = 32 ; TP = 51)
3. Specificity = TN / [TN+FP]
Specificity (true negative rate) is calculated as
the number of correct negative predictions
divided by the total number of negatives.
4. Precision = TP / [TP+FP]
Precision (positive predictive value) is
calculated as the number of correct positive
predictions divided by the total number of
positive predictions.
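The same case counts give the following values for specificity and precision. This is a small Python check with illustrative variable names.

```python
# Counts from the confusion matrix of the case.
TN, FP, FN, TP = 150, 28, 32, 51

specificity = TN / (TN + FP)   # correct negatives / actual negatives
precision = TP / (TP + FP)     # correct positives / predicted positives

print(f"specificity = {specificity:.3f}")   # 150 / 178
print(f"precision   = {precision:.3f}")     # 51 / 79
```

Specificity is about 84.3% and precision about 64.6%, so roughly one in three positive predictions is a false alarm.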
5. KS statistic: The KS statistic is a measure of the degree of
separation between the positive and negative score
distributions. A KS value of 100% indicates that the scores
partition the records exactly, such that one group
contains all positives and the other contains all
negatives. In practical situations, a KS value higher than
50% is desirable.
6. ROC chart & area under the curve (AUC)
The ROC chart is a plot of 1-specificity on the X axis and
sensitivity on the Y axis. The area under the ROC curve is a
measure of model performance. The AUC of a random
classifier is 50% and that of a perfect classifier is 100%.
For practical situations, an AUC of over 70% is
desirable.
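Both measures can be computed from the same set of ROC points. The Python sketch below uses eight hypothetical scored records. It builds the ROC points by thresholding at each distinct score, takes the area with the trapezoidal rule, and takes KS as the largest gap between the true positive rate and the false positive rate, which is one common way to compute it for a classifier.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by thresholding at each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))   # x = 1 - specificity, y = sensitivity
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def ks_statistic(points):
    """Largest gap between TPR and FPR over all thresholds."""
    return max(tpr - fpr for fpr, tpr in points)


# Hypothetical records: 1 = positive, 0 = negative, with model scores.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

points = roc_points(labels, scores)
auc_value = auc(points)
ks_value = ks_statistic(points)
print(f"AUC = {auc_value:.4f}, KS = {ks_value:.2f}")
```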
7. Precision vs. recall: Recall (sensitivity) tells
us how the model performs with respect to
false negatives (for example, customers who will
default but are predicted not to),
while precision tells us how the model performs
with respect to false positives.
8. F-measure [a measure of a test's accuracy]
= F1 score = 2 * (Recall * Precision) / (Recall +
Precision)
(F1 score and F score are alternate terms for the F-measure.)
(TN = 150 ; FP = 28 ; FN = 32 ; TP = 51)
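For the case counts, the F1 score can be checked with a few lines of Python. The variable names are illustrative only.

```python
# Counts from the confusion matrix of the case.
TN, FP, FN, TP = 150, 28, 32, 51

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * (recall * precision) / (recall + precision)

print(f"F1 = {f1:.3f}")   # same value as 2*TP / (2*TP + FP + FN) = 102 / 162
```

The F1 score of about 63% lies between precision (about 64.6%) and recall (about 61.4%), as expected for a harmonic mean.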
Performance measures Summary
Rules extracted from classification algorithms
The final classification rules are the actual classifier model used for prediction.
Challenge of Evaluation Metrics
1) Evaluation measures play a crucial role in both
assessing the classification performance and
guiding the classifier modeling.
2) In fact, the use of common metrics in imbalanced
domains can lead to sub-optimal classification
models and might produce misleading conclusions
since these measures are insensitive to skewed
domains.
✓ Skewness is a measure of the asymmetry of a
probability distribution; here it refers to a class
distribution in which one class heavily outnumbers
the other.
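A small Python sketch of why a common metric such as accuracy can mislead on skewed data. The data set is hypothetical (990 negatives, 10 positives), and the "classifier" simply predicts negative for every record.

```python
# Hypothetical skewed data set: 990 negatives (0) and 10 positives (1).
actual = [0] * 990 + [1] * 10
predicted = [0] * 1000            # trivial model: always predicts negative

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.2%}, recall = {recall:.2%}")
```

The accuracy is 99% even though not a single positive case is detected, which is why recall, precision and F1 are also examined in imbalanced domains.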
Techniques to Improve Classification Accuracy
1. Data-Level Improvements
1.1 Data Cleaning
Handle missing values (imputation, removal)
Remove duplicates
Correct labeling errors
Detect and treat outliers
1.2 Handling Class Imbalance
Oversampling (e.g., SMOTE)
Undersampling
Class weighting
Generate synthetic samples
1.3 Data Augmentation (especially for images/text/audio)
Image rotation, flipping, cropping
Text synonym replacement, paraphrasing
Noise injection in audio
1.4 Increase Dataset Size
Collect more data
Use transfer learning
Use pre-trained embeddings
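As one illustration of handling class imbalance (1.2), the Python sketch below does plain random oversampling: it duplicates randomly chosen minority-class records until every class is as large as the biggest one. This is the simplest form of oversampling and not SMOTE, which synthesizes new points instead of copying existing ones. The function name and the toy data are hypothetical. Oversampling is normally applied to the training portion only, so that duplicated records do not leak into the test data.

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Duplicate random minority-class records until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())                 # size of the largest class
    out_records, out_labels = list(records), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(records, labels) if l == cls]
        for _ in range(target - n):               # nothing to add for the largest class
            out_records.append(rng.choice(pool))
            out_labels.append(cls)
    return out_records, out_labels


# Hypothetical data: 9 negative records and 3 positive records.
records = list(range(12))
labels = ["neg"] * 9 + ["pos"] * 3

new_records, new_labels = random_oversample(records, labels)
print(Counter(new_labels))
```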
2. Feature Engineering
2.1 Feature Selection
Remove irrelevant features
Use:
Chi-square test
Information Gain
Recursive Feature Elimination (RFE)
L1 regularization
2.2 Feature Extraction
PCA (Principal Component Analysis)
LDA (Linear Discriminant Analysis)
Autoencoders
2.3 Feature Scaling
Standardization (Z-score normalization)
Min–Max scaling
Robust scaling (for outliers)
2.4 Domain-Specific Features
Use domain knowledge
Create interaction features
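The two most common scaling methods from 2.3 can be sketched with the Python standard library. The function names and the `ages` feature are hypothetical, and both functions assume the feature is not constant (otherwise they would divide by zero).

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)   # population standard deviation
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column.
ages = [20, 30, 40, 50, 60]

z_scores = standardize(ages)
scaled = min_max_scale(ages)
print([round(z, 3) for z in z_scores])
print(scaled)
```

After standardization the feature has mean 0 and standard deviation 1; after min-max scaling it lies in the range 0 to 1.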
3. Model-Level Techniques
3.1 Algorithm Selection
Try multiple models:
Logistic Regression
SVM
k-NN
Decision Trees
Random Forest
Gradient Boosting (XGBoost, LightGBM, CatBoost)
Neural Networks
3.2 Hyperparameter Tuning
Grid Search
Random Search
Bayesian Optimization
Cross-validation tuning
3.3 Ensemble Methods
Bagging
Boosting
Stacking
Voting classifiers
Ensembles often significantly improve accuracy.
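A minimal sketch of a voting classifier, assuming three already-trained models whose predictions on five records are available. The prediction lists are hypothetical, and with three voters and two classes there can be no ties.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model prediction lists by taking the most common label per record."""
    combined = []
    for votes in zip(*predictions_per_model):    # votes of all models for one record
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true labels and predictions from three classifiers.
actual = [1, 0, 1, 1, 0]
model_a = [1, 0, 0, 1, 0]
model_b = [1, 1, 1, 1, 0]
model_c = [0, 0, 1, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
for name, preds in [("A", model_a), ("B", model_b), ("C", model_c), ("vote", ensemble)]:
    print(name, accuracy(actual, preds))
```

In this toy case every individual model makes at least one mistake, but the mistakes fall on different records, so the majority vote is correct everywhere. The gain depends on the models making different errors.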
4. Regularization and Optimization
4.1 Prevent Overfitting
L1/L2 regularization
Dropout (neural networks)
Early stopping
Pruning (decision trees)
4.2 Better Optimization
Learning rate tuning
Adaptive optimizers (Adam, RMSProp)
Batch normalization
5. Evaluation Strategy Improvements
5.1 Proper Validation
k-fold cross-validation
Stratified sampling
Avoid data leakage
5.2 Better Metrics (when accuracy is misleading)
Precision
Recall
F1-score
ROC-AUC
Confusion matrix analysis
Practical Workflow for Improving Accuracy
i. Clean and preprocess data
ii. Perform exploratory data analysis (EDA)
iii. Engineer meaningful features
iv. Train baseline model
v. Tune hyperparameters
vi. Try ensemble methods
vii. Validate properly
viii. Analyze errors and iterate
ACTIVITY-07
SOLVED
In a Covid test of 1000 patients, there were 45 positive tests,
of which 30 patients had covid and 15 falsely tested
positive.
Of the 955 negative tests, 5 were incorrect:
these patients had covid but tested negative.
Draw the confusion matrix and calculate the accuracy,
precision, recall (sensitivity), and F1 score from the matrix.
ACTIVITY-07—cont..
The total of 1000 cases consists of 45 positive
tests (TP + FP), of which 30 are correct and
15 are incorrect. The other 955 negative tests (TN
+ FN) contain 5 incorrect tests and 950 correct
tests.
True Positive (TP) a correct positive test – 30
True Negative (TN) a correct negative test – 950
False Positive (FP) an incorrect positive test – 15
False Negative (FN) an incorrect negative test – 5
ACTIVITY-07—cont..
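Confusion matrix for the test results:
                    Predicted negative   Predicted positive
Actual negative     TN = 950             FP = 15
Actual positive     FN = 5               TP = 30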
ACTIVITY-07—cont..
• Accuracy: the percentage of all predictions
that are correct.
• Precision: the percentage of positive
predictions that are correct.
• Recall: the percentage of actual positive cases
that the test has correctly identified.
• Sensitivity: the same as recall.
• F1 score: a measure that combines precision
and recall equally (their harmonic mean).
ACTIVITY-07—cont..
Accuracy
number of correct predictions / total number of
predictions
= (30 + 950) / (30 + 15 + 950 + 5)
= 980/1000
= 49/50 or 98%
Precision
true positive / (true positive + false positive)
= 30 / (30 + 15)
= 30/45
= 2/3 or 66.7%
ACTIVITY-07—cont..
Recall (and sensitivity)
true positive / (true positive + false negative)
= 30 / (30 + 5)
= 30/35
= 6/7 or 85.7%
F1 score
2 x (precision * recall) / (precision + recall)
= 2 * (0.571/1.524)
= 2 * 0.375
= 0.75 or 75%
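The hand calculations for the activity can be cross-checked with a short Python script. The variable names are illustrative only.

```python
# Counts from the Covid test activity.
TP, TN, FP, FN = 30, 950, 15, 5

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)               # also called sensitivity
f1 = 2 * (precision * recall) / (precision + recall)

print(f"accuracy  = {accuracy:.1%}")
print(f"precision = {precision:.1%}")
print(f"recall    = {recall:.1%}")
print(f"F1 score  = {f1:.1%}")
```

Note how the 98% accuracy hides the fact that one in three positive test results is wrong (precision 66.7%).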
ACTIVITY-13
Explore a classification problem case by
considering any real-world domain application,
formulate a confusion matrix through scenario
assumption for the classifier model, and
investigate the various parameters to measure
the performance of the classifier model.
Extra Query
1. Investigate the four test options to perform
training and testing for Machine learning
algorithms based on the dataset. What do the
four test options mean, and when do you use
them?
2. Investigate the significant challenges in the
context of the performance measures of the
classifier models that are connected to the real-
world application scenario.
Cheers For the Great Patience!
Query Please?