Module 4

The document provides an overview of classification and prediction techniques in machine learning, detailing their definitions, examples, and common algorithms. It outlines the process of solving classification problems, including data preparation, model selection, training, evaluation, and tuning, as well as various evaluation metrics and techniques. Additionally, it discusses decision tree construction, Naive Bayes classifiers, and their applications in real-world scenarios.

Uploaded by

rajeshshikari.09

Module-iv

Classification And Prediction

Problem Definition:
Classification maps data into predefined categorical labels
(e.g., safe/risky, spam/not-spam).
Prediction builds continuous-valued functions to
forecast unknown or missing numerical values (e.g.,
predicting income, sales, or temperature).
Both techniques use labelled training data to predict new,
unseen data.
Classification vs. Prediction: Definitions
 Classification (Categorical Prediction): A supervised
learning process that finds a model (classifier) to
categorise data objects into discrete classes.
o Goal: Assign a class label to new data.
o Examples: Email spam detection, loan approval
classification, medical diagnosis (disease/no-
disease).
 Prediction (Numerical Prediction/Regression): A
process used to predict missing or unavailable numerical
data values for new observations.
o Goal: Estimate a continuous value.
o Examples: Predicting customer spending,
forecasting stock prices, estimating product
demand.
Usage Examples
 Classification:
o Finance: Determining if a loan application is "safe"
or "risky".
o Healthcare: Classifying tumour cells as
"malignant" or "benign".
o Marketing: Segmenting customers into "high-
value" or "low-value" groups.
 Prediction:
o Finance: Predicting the dollar amount a customer
will spend.
o Sales: Forecasting future sales trends based on
historical data.
o Technology: Estimating the remaining lifespan of a
machine component.
Synonyms and Key Terms
 Classification: Categorisation, Supervised Learning
(Classification), Class Label Prediction, Discrete
Classification.
 Prediction: Numerical Prediction, Regression Analysis,
Forecasting, Continuous Value Estimation.
Process Flow
1. Learning Step (Training): Algorithms analyse
historical data to build a model (e.g., decision trees,
neural networks).
2. Classification/Prediction Step (Testing): The model is
used to predict or classify new data, with accuracy
determined by comparing predicted results to known,
actual labels.
Common Algorithms:
 Classification: Decision Trees, Naïve Bayes, Support
Vector Machines (SVM), k-Nearest Neighbours (k-NN).
 Prediction: Linear Regression, Nonlinear Regression,
Artificial Neural Networks (ANN).
General Approaches for solving a classification problem
Solving a classification problem involves a structured, two-step
process: building a model using labelled training data and
subsequently using that model to predict class labels for unseen
data. The general approach requires data preparation, algorithm
selection (e.g., decision trees, SVM), model training, evaluation
(using accuracy, precision, F1-score), and tuning.
Key Steps for Solving a Classification Problem
 Data Preparation & Cleaning: Gather, clean, and pre-
process data by handling missing values and scaling
numerical features.
 Feature Selection & Engineering: Identify the most
relevant variables influencing the outcome.
 Data Splitting: Divide the dataset into training (e.g., 70-
80%) and testing (e.g., 20-30%) sets to ensure the model
generalises well.
 Model Selection: Choose an algorithm based on data
type and complexity. Common techniques include:
o Decision Trees: Creates a tree-based model for
decision-making.
o Logistic Regression: Suitable for binary
classification.
o K-Nearest Neighbours (KNN): A lazy learner that
classifies based on nearest neighbours.
o Support Vector Machines (SVM): Find the
optimal hyperplane for separation.
o Naive Bayes: Probabilistic classifier based on
Bayes' theorem.
 Model Training: The algorithm learns patterns from the
training data.
 Model Evaluation: Test the model on unseen data using
a confusion matrix to assess accuracy, precision, recall,
and F1 Score.
 Hyperparameter Tuning: Optimise model parameters
(e.g., via cross-validation) to improve performance.
Common Approaches
 Supervised Learning: The most common approach,
using labelled datasets.
 Lazy vs. Eager Learners: Lazy learners (e.g., KNN)
store data and wait for a test instance, while eager
learners (e.g., Decision Trees) build a model before
receiving test data.
 Binary vs. Multiclass/Multi-label: Selecting the
appropriate approach (e.g., binary for yes/no, multiclass
for multiple categories).

Evaluation of Classifiers
Classifier evaluation measures how well a classification model
predicts class labels on unseen data. Different problems care
about different kinds of errors, so there is no single
“best” metric.
Confusion Matrix (the foundation)
For binary classification:

                    Predicted Positive      Predicted Negative
Actual Positive     True Positive (TP)      False Negative (FN)
Actual Negative     False Positive (FP)     True Negative (TN)

Almost every evaluation metric comes from these four
numbers.
Common Evaluation Metrics
1. Accuracy

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Good when: classes are balanced
Bad when: classes are imbalanced (e.g., fraud detection)

2. Precision

$$\text{Precision} = \frac{TP}{TP + FP}$$

Answers: “When the model predicts positive, how often is it correct?”
Important when: false positives are costly (e.g., spam filtering)

3. Recall (Sensitivity / True Positive Rate)

$$\text{Recall} = \frac{TP}{TP + FN}$$

Answers: “How many actual positives did we catch?”
Important when: false negatives are costly (e.g., disease detection)

4. F1-Score

$$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Use when: you need a balance between precision and recall

5. Specificity (True Negative Rate)

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Often used alongside recall in medical or risk-sensitive domains.
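The five metrics above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function name `classification_metrics` and the example counts are illustrative assumptions, not taken from the notes.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
    }


# Hypothetical imbalanced case: only 10 positives among 1000 examples.
m = classification_metrics(tp=5, fp=5, fn=5, tn=985)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case accuracy is high while precision and recall are only 0.5, which illustrates why accuracy alone can mislead on imbalanced data.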
Threshold-Based Evaluation
Many classifiers output probabilities, not labels.
ROC Curve & AUC
 ROC curve: plots Recall (TPR) vs False Positive Rate
 AUC: probability that a random positive is ranked higher
than a random negative
Good for: overall ranking quality
Limitation: can be misleading with heavy class
imbalance
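The ranking interpretation of AUC can be checked by brute force on a small example. This is a sketch under the assumption that labels are 1 for positive and 0 for negative and that ties count as half a win; the scores below are invented for illustration.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # hypothetical model outputs
labels = [1, 1, 1, 0, 0, 0]
auc = auc_by_ranking(scores, labels)
print(round(auc, 3))
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic and is meant only to make the definition concrete.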
Precision–Recall Curve
 Focuses on positive class performance
 More informative than ROC for imbalanced datasets
Multi-Class Classification
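Metrics can be averaged:
 Macro average: unweighted mean of the per-class metrics, so
all classes count equally
 Micro average: pools the TP, FP and FN counts over all classes,
so every instance counts equally and frequent classes dominate
 Weighted average: mean of the per-class metrics, weighted by
the number of true instances of each class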
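The difference between the averaging schemes is easiest to see on per-class precision. The sketch below uses an invented two-class example; the function name `averaged_precision` is an assumption made for illustration.

```python
from collections import Counter


def averaged_precision(y_true, y_pred):
    """Return (macro, micro, weighted) precision for a multi-class problem."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1  # predicted p, but the true class was different
    support = Counter(y_true)  # number of true instances per class
    per_class = {
        c: tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        for c in classes
    }
    macro = sum(per_class.values()) / len(classes)
    micro = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return macro, micro, weighted


# Hypothetical data: class "a" is frequent, class "b" is rare.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 6 + ["b"] * 4
macro, micro, weighted = averaged_precision(y_true, y_pred)
print(macro, micro, weighted)
```

With these labels the per-class precisions are 1.0 for "a" and 0.5 for "b", giving a macro average of 0.75, a micro average of 0.8 and a weighted average of 0.9.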
Evaluation Strategies (How you test)
Train/Test Split
Simple, fast, but noisy.
Cross-Validation
 k-fold CV is the standard
 Reduces variance in evaluation
Stratified Sampling
Preserves class proportions — very important for imbalanced
data.
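A stratified train/test split can be sketched by splitting each class separately. The helper name `stratified_split`, the 25% test fraction and the toy labels are assumptions for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Hypothetical imbalanced labels: 8% of the examples are "fraud".
labels = ["fraud"] * 8 + ["ok"] * 92
train_idx, test_idx = stratified_split(labels)
n_fraud_test = sum(labels[i] == "fraud" for i in test_idx)
print(len(train_idx), len(test_idx), n_fraud_test)
```

The test set ends up with 2 of the 8 fraud cases, matching the 8% share of the full dataset, whereas a purely random split could by chance contain none.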
Classification techniques
1. Linear Classification Techniques
Logistic Regression
 Models probability using a sigmoid function
 Fast, interpretable, strong baseline
 Works best when the classes are close to linearly separable
Linear Discriminant Analysis (LDA)
 Models class means and a shared covariance matrix
 Assumes each class is normally distributed
 Good when data is well-behaved and low-dimensional
2. Distance-Based Techniques
K-Nearest Neighbors (k-NN)
 Classifies based on the majority vote of nearest points
 No training phase (lazy learner)
 Sensitive to noise and feature scaling
3. Tree-Based Techniques
Decision Trees
 Uses feature splits to classify data
 Easy to interpret
 Prone to overfitting
Random Forest
 Ensemble of decision trees
 Reduces overfitting via bagging
 Strong general-purpose classifier
Gradient Boosting (XGBoost, LightGBM, CatBoost)
 Trees built sequentially to fix previous errors
 Often very accurate, especially on tabular data
 Needs careful tuning
4. Probabilistic Techniques
Naive Bayes
 Based on Bayes’ theorem
 Assumes feature independence
 Very fast, works well for text classification
5. Margin-Based Techniques
Support Vector Machines (SVM)
 Finds maximum-margin hyperplane
 Works well in high-dimensional spaces
 Kernel trick handles non-linear data
6. Neural Network-Based Techniques
Artificial Neural Networks (ANN)
 Multi-layer perceptrons for classification
 Handles complex patterns
 Usually needs large datasets
Deep Learning Models
 CNNs → images
 RNNs / Transformers → text & sequences
 State-of-the-art but computationally expensive
7. Rule-Based & Other Techniques
Rule-Based Classifiers
 IF–THEN rules
 Easy to understand
 Limited flexibility
Ensemble Methods
 Combine multiple classifiers
 Examples: Voting, Stacking, Bagging, Boosting
8. Binary vs Multi-Class Techniques
 Inherently binary classifiers: Logistic Regression, SVM
(k-NN, decision trees and Naive Bayes handle several classes directly)
 Multi-class strategies: One-vs.-Rest (OvR), One-vs.-One (OvO),
Softmax
Quick Comparison Table

Technique             Interpretability   Speed              Accuracy
Logistic Regression   High               Fast               Medium
k-NN                  Medium             Slow (test time)   Medium
Decision Tree         High               Fast               Medium
Random Forest         Medium             Medium             High
SVM                   Low                Medium             High
Neural Networks       Low                Slow               Very High

What is Decision Tree Construction?

Decision tree construction is the process of building a tree-
like model by repeatedly splitting the dataset based on
feature values, yielding subsets as pure as possible.
Main Steps in Decision Tree Construction
1. Start with the Root Node
 The entire dataset is placed at the root.
 Choose the best attribute to split the data.
2. Select the Best Splitting Attribute
The best attribute is chosen using an attribute selection
measure:
Common Measures:
 Information Gain (ID3)
 Gain Ratio (C4.5)
 Gini Index (CART)
These measures evaluate how well a feature separates the
classes.
3. Split the Dataset
 The dataset is divided into subsets based on the selected
attribute.
 Each branch represents an outcome of the test.
Example:
Is Outlook = Sunny?
├── Yes
└── No
4. Create Decision Nodes or Leaf Nodes
 If all instances in a subset belong to the same class,
create a leaf node.
 If not, repeat the process recursively on that subset.
5. Stopping Conditions
Tree growth stops when:
 All instances belong to the same class
 No attributes are left for splitting
 A predefined depth is reached
 Information gain becomes negligible
6. Pruning (Avoid Overfitting)
Pruning limits or removes branches that do not improve generalisation.
Types of Pruning:
 Pre-pruning (Early stopping)
Stops tree growth early
 Post-pruning
Removes branches after full tree construction
Attribute Selection Measures (Key Formulas)
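Entropy

$$Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

where $p_i$ is the proportion of instances in $S$ that belong to class $i$ and $c$ is the number of classes.

Information Gain

$$IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)$$

where $S_v$ is the subset of $S$ in which attribute $A$ takes the value $v$.

Gini Index

$$Gini(S) = 1 - \sum_{i=1}^{c} p_i^2$$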
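The three measures can be computed in a few lines. This sketch assumes base-2 logarithms for entropy and uses a tiny invented dataset (the `Outlook` and `Windy` rows are hypothetical, not data from the notes).

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes present in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini(S) = 1 - sum(p_i ** 2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the whole set minus the weighted entropy of each subset."""
    n = len(labels)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(y)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Hypothetical toy data: Outlook separates the classes, Windy does not.
rows = [
    {"Outlook": "Sunny", "Windy": True},
    {"Outlook": "Sunny", "Windy": False},
    {"Outlook": "Rain", "Windy": True},
    {"Outlook": "Rain", "Windy": False},
]
labels = ["No", "No", "Yes", "Yes"]
print(entropy(labels), gini(labels))
print(information_gain(rows, labels, "Outlook"))
print(information_gain(rows, labels, "Windy"))
```

A perfectly separating attribute recovers the full entropy of the set as gain, while an uninformative attribute yields a gain of zero.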

Algorithms for Decision Tree Construction

Algorithm   Measure Used
ID3         Information Gain
C4.5        Gain Ratio
CART        Gini Index

Advantages of Decision Tree Construction

 Easy to understand and interpret
 Handles both categorical and numerical data
 Requires little data preprocessing
Limitations
 Sensitive to noisy data
 Can overfit without pruning
 Small data changes may create different trees
Simple Flow of Construction
Start
↓
Select the best attribute
↓
Split data
↓
Pure node?
├─ Yes → Leaf node
└─ No → Repeat
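The construction flow can be sketched as a short recursive function in the style of ID3 (information gain on categorical attributes). This is an illustrative sketch, not a full implementation: it has no pruning, it assumes every attribute value seen at prediction time also appeared in training, and the pass/fail records used here are a small example dataset with `Hours` and `Attendance` attributes.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain."""
    def gain(attr):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        remainder = sum(len(g) / len(labels) * entropy(g)
                        for g in groups.values())
        return entropy(labels) - remainder
    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    if len(set(labels)) == 1:        # pure node -> leaf
        return labels[0]
    if not attributes:               # nothing left to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in sorted({row[attr] for row in rows}):
        pairs = [(r, y) for r, y in zip(rows, labels) if r[attr] == value]
        branches[value] = build_tree([r for r, _ in pairs],
                                     [y for _, y in pairs], remaining)
    return {attr: branches}


def classify(tree, row):
    """Follow branches until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


records = [("High", "Good", "Pass"), ("High", "Good", "Pass"),
           ("High", "Poor", "Pass"), ("Low", "Good", "Pass"),
           ("Low", "Poor", "Fail"), ("Low", "Poor", "Fail"),
           ("High", "Poor", "Pass"), ("Low", "Good", "Pass"),
           ("Low", "Poor", "Fail"), ("High", "Good", "Pass")]
rows = [{"Hours": h, "Attendance": a} for h, a, _ in records]
labels = [r for _, _, r in records]
tree = build_tree(rows, labels, ["Hours", "Attendance"])
print(tree)
print(classify(tree, {"Hours": "High", "Attendance": "Poor"}))
```

The recursion mirrors the flow above: select the best attribute, split, and stop when a node is pure or no attributes remain.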

Naive Bayes Classifier

The Naive Bayes Classifier is a probabilistic machine learning
algorithm used mainly for classification problems like spam detection,
sentiment analysis, and document classification.

It is based on Bayes’ Theorem and assumes that features are
independent of each other (that’s why it’s called “naive”).
1. Bayes’ Theorem

Bayes’ theorem formula:

$$P(C \mid X) = \frac{P(X \mid C) \cdot P(C)}{P(X)}$$

Where:

 P(C|X) → Posterior probability (probability of class C given data X)
 P(X|C) → Likelihood (probability of data X given class C)
 P(C) → Prior probability of class C
 P(X) → Evidence (probability of data X)

In simple words:

It calculates the probability that a data point belongs to a particular
class.

Why “Naive”?
Because it assumes:

All features are independent of each other.

Example:
If you're classifying whether an email is spam:

 Word “free”
 Word “offer”
 Word “win”

Naive Bayes assumes these words occur independently of one another
given the class. That is not fully true in real life, but the model
often still works well in practice.

Types of Naive Bayes

1. Gaussian Naive Bayes

o Used when features are continuous
o Assumes data follows a normal distribution
2. Multinomial Naive Bayes
o Used for text classification
o Based on word counts
3. Bernoulli Naive Bayes
o Used for binary features (0/1, yes/no)

Working Steps
1. Calculate prior probabilities for each class
2. Calculate likelihood probabilities for each feature
3. Apply Bayes theorem
4. Choose class with highest posterior probability

Applications
 Email spam detection
 Sentiment analysis
 Document classification
 Medical diagnosis
 Student performance prediction

Advantages
 Simple and fast
 Works well with large datasets
 Performs well in text classification
 Requires relatively little training data
Disadvantages
 Assumes feature independence
 Not good when features are highly correlated
Example
Suppose we want to classify a message as Spam or Not Spam.

If:

 P(Spam) = 0.4
 P(Free | Spam) = 0.8
 P(Free | NotSpam) = 0.1

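Then the model compares the class scores P(Class) × P(Free | Class) and
chooses the higher one. With P(NotSpam) = 1 − 0.4 = 0.6, the scores are
0.4 × 0.8 = 0.32 for Spam and 0.6 × 0.1 = 0.06 for Not Spam, so a
message containing “Free” is classified as Spam.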

Problem

Suppose we want to classify whether a student will pass or fail based
on two features:

 Hours Studied (Low / High)

 Attendance (Poor / Good)

We are given training data of 10 students:

Student   Hours   Attendance   Result
1         High    Good         Pass
2         High    Good         Pass
3         High    Poor         Pass
4         Low     Good         Pass
5         Low     Poor         Fail
6         Low     Poor         Fail
7         High    Poor         Pass
8         Low     Good         Pass
9         Low     Poor         Fail
10        High    Good         Pass

Now classify a new student:

Hours = High, Attendance = Poor

Step 1: Calculate Prior Probabilities

Total students = 10

Number of Pass = 7
Number of Fail = 3

P(Pass) = 7/10 = 0.7

P(Fail) = 3/10 = 0.3

Step 2: Calculate Likelihood Probabilities

For Class = Pass

Among 7 Pass students:

 High Hours = 5
 Low Hours = 2
 Poor Attendance = 2
 Good Attendance = 5

So,

P(High | Pass) = 5/7

P(Poor | Pass) = 2/7
For Class = Fail

Among 3 fail students:

 High Hours = 0
 Low Hours = 3
 Poor Attendance = 3
 Good Attendance = 0

So,

P(High | Fail) = 0/3 = 0

P(Poor | Fail) = 3/3 = 1

Step 3: Apply Naive Bayes Formula

We calculate a score proportional to P(Pass | High, Poor):

P(Pass) × P(High | Pass) × P(Poor | Pass)
= 0.7 × (5/7) × (2/7)
= 0.7 × 0.714 × 0.286
≈ 0.143

Now for Fail, the score proportional to P(Fail | High, Poor) is:

P(Fail) × P(High | Fail) × P(Poor | Fail)
= 0.3 × 0 × 1
= 0

Step 4: Compare Probabilities

Score for Pass ≈ 0.143
Score for Fail = 0

Both scores omit the common denominator P(High, Poor), so they are
compared with each other rather than read as probabilities.
Since 0.143 > 0

Final Prediction = PASS

Important Observation (Zero Probability Problems)

Notice:

P(High | Fail) = 0

Because in training data, no failed student had High hours.

This is called the Zero Frequency Problem.
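The worked example can be reproduced in code using the ten student records from the table. The `alpha` parameter below is an addition not described in these notes: it applies add-alpha (Laplace) smoothing, a commonly used remedy for zero counts, and is included only as a hedged illustration. It assumes each feature has exactly two possible values.

```python
from collections import Counter

# (Hours, Attendance, Result) for the 10 students in the table.
DATA = [
    ("High", "Good", "Pass"), ("High", "Good", "Pass"), ("High", "Poor", "Pass"),
    ("Low", "Good", "Pass"), ("Low", "Poor", "Fail"), ("Low", "Poor", "Fail"),
    ("High", "Poor", "Pass"), ("Low", "Good", "Pass"), ("Low", "Poor", "Fail"),
    ("High", "Good", "Pass"),
]


def nb_scores(hours, attendance, alpha=0.0):
    """Unnormalised scores P(class) * P(hours|class) * P(attendance|class).

    alpha > 0 adds alpha to every count (assumed remedy, two values per
    feature), so that no likelihood is exactly zero.
    """
    class_counts = Counter(result for _, _, result in DATA)
    scores = {}
    for cls, n_cls in class_counts.items():
        n_hours = sum(1 for h, _, r in DATA if r == cls and h == hours)
        n_att = sum(1 for _, a, r in DATA if r == cls and a == attendance)
        prior = n_cls / len(DATA)
        p_hours = (n_hours + alpha) / (n_cls + 2 * alpha)
        p_att = (n_att + alpha) / (n_cls + 2 * alpha)
        scores[cls] = prior * p_hours * p_att
    return scores


raw = nb_scores("High", "Poor")
smoothed = nb_scores("High", "Poor", alpha=1.0)
print(raw)
print(smoothed)
print(max(smoothed, key=smoothed.get))
```

Without smoothing the Fail score is exactly zero regardless of the other evidence; with `alpha=1.0` it becomes small but non-zero, and Pass is still the predicted class.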

Bayesian Belief Network

A Bayesian Belief Network (BBN)—often just called a Bayesian
Network, Bayes Net, or Belief Network—is a type of probabilistic
graphical model. It is used to represent a set of variables and their
conditional dependencies, making it an incredibly powerful tool for
reasoning under uncertainty.

By mapping out cause-and-effect relationships and assigning
probabilities to them, BBNs allow you to calculate the likelihood of
certain events occurring based on the evidence you currently have.

The sections below cover how they work, their core components, and
where they are used.

Core Components

A BBN consists of two main parts: a graphical structure and a set of
probabilities.

 Directed Acyclic Graph (DAG): This is the visual structure of
the network.
o Nodes: Each node represents a random variable. These
variables can be observable quantities (like a sensor
reading), latent variables (like a hidden disease), or
hypotheses.
o Directed Edges (Arrows): These represent conditional
dependencies. If an arrow points from Node $A$ to Node
$B$, $A$ is considered the "parent" of $B$, meaning the
state of $B$ depends directly on the state of $A$.
o Acyclic: The graph cannot have loops. You cannot start at
a node, follow the arrows, and end up back at the same
node.
 Conditional Probability Tables (CPTs): Every node in the
network has an associated CPT.
o If a node has no parents (an independent variable), the
table simply lists its prior probabilities.
o If a node has parents, the table quantifies the probability of
that node being in a specific state given the states of its
parents.

The Mathematical Foundation

BBNs are built on Bayes' Theorem, which provides a mathematical
rule for updating the probability of a hypothesis as more evidence or
information becomes available.

The true power of a BBN lies in how it simplifies complex probability
distributions. Instead of calculating the joint probability of every
single variable interacting with every other variable, a BBN assumes
that a variable is conditionally independent of all its non-descendants,
given its parents.

This allows the network to calculate the full joint probability
distribution using the chain rule for Bayesian networks:

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$$

Where:

 $X_i$ represents a specific node (variable).
 $Parents(X_i)$ represents the parents of that node, i.e., its direct causes.
A Classic Example: The Wet Grass

Imagine you want to know why the grass in your yard is wet. A
simple BBN for this might involve four variables: Cloudy, Rain,
Sprinkler, and Wet Grass.

1. Cloudy is the root node. It has no parents, so its table contains
only its prior probability.
2. Cloudy is a parent to both Rain and Sprinkler. (If it's cloudy,
it's more likely to rain, and you are less likely to turn your
sprinkler on).
3. Both Rain and Sprinkler are parents to Wet Grass.

If you step outside and see that the grass is wet (your evidence), you
can use the network to work backward (infer) and calculate the
probability that it rained versus the probability that the sprinkler was
left on. If you then notice it is cloudy, the network will update the
probabilities, making rain the much more likely cause.
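The inference described above can be sketched by brute-force enumeration over the joint distribution given by the chain rule. The notes specify only the network structure, so every probability in the tables below is an assumed, illustrative value; the variable and function names are likewise hypothetical.

```python
from itertools import product

# Illustrative CPT values (assumptions; the notes give only the structure).
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}        # P(Sprinkler on | Cloudy)
P_RAIN = {True: 0.8, False: 0.2}             # P(Rain | Cloudy)
P_WET = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.9, (False, False): 0.0}  # P(Wet | Sprinkler, Rain)


def bernoulli(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(c, s, r, w):
    """Chain rule: P(C) * P(S|C) * P(R|C) * P(W|S,R)."""
    return (bernoulli(P_CLOUDY, c) * bernoulli(P_SPRINKLER[c], s)
            * bernoulli(P_RAIN[c], r) * bernoulli(P_WET[(s, r)], w))


def posterior(query, evidence):
    """P(query is True | evidence), summing the joint over all assignments."""
    names = ("cloudy", "sprinkler", "rain", "wet")
    num = den = 0.0
    for values in product([True, False], repeat=4):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(*values)
        den += p
        if world[query]:
            num += p
    return num / den


p_rain = posterior("rain", {"wet": True})
p_sprinkler = posterior("sprinkler", {"wet": True})
p_rain_cloudy = posterior("rain", {"wet": True, "cloudy": True})
print(round(p_rain, 3), round(p_sprinkler, 3), round(p_rain_cloudy, 3))
```

Under these assumed tables, rain is already the more likely explanation for wet grass, and additionally observing clouds pushes its probability higher still. Enumeration is exponential in the number of variables, so it is suitable only for very small networks.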

Common Applications
Because BBNs are excellent at handling incomplete data and
modeling complex systems, they are heavily utilized across various
fields:

 Medical Diagnosis: Mapping symptoms to potential diseases to
calculate the most likely diagnosis.
 Robotics and AI: Helping autonomous systems make decisions
when sensor data is noisy or incomplete.
 Spam Filtering: Analyzing the occurrence of certain words to
determine the probability that an email is spam.
 Risk Management: Predicting the likelihood of financial
defaults, equipment failures, or project delays.

K-Nearest Neighbour Classification: Algorithm and Characteristics
K-Nearest Neighbors (KNN) is one of the simplest and most
intuitive supervised machine learning algorithms. It is primarily used
for classification (though it can also be used for regression) and
operates on a straightforward principle: similar things exist in
close proximity.

It works by finding the "k" closest data points (neighbours) to a
given input and makes a prediction based on the majority class (for
classification) or the average value (for regression). Because KNN
makes no assumptions about the underlying data distribution, it is
a non-parametric, instance-based learning method.
K-Nearest Neighbors is also called a lazy learner algorithm because
it does not learn from the training set immediately; instead, it
stores the entire dataset and performs computations only at the time
of classification.
For example, consider data points that belong to two categories,
Category 1 and Category 2:
 KNN assigns the category based on the majority of nearby points.
 The green points represent Category 1 and the red points represent
Category 2.
 The new data point checks its closest neighbours (circled points).
 Since the majority of its closest neighbours are red points (Category
2), KNN predicts that the new data point belongs to Category 2.
[Figure: KNN algorithm working visualization, showing how KNN predicts
the category of a new data point based on its closest neighbours]

KNN works by using proximity and majority voting to make predictions.
What is 'K' in K-Nearest Neighbours?
In the k-Nearest Neighbours algorithm, k is simply a number that tells
the algorithm how many nearby points (neighbours) to look at when it
makes a decision.
Example: Imagine you're deciding which fruit it is based on its shape
and size. You compare it to fruits you already know.
 If k = 3, the algorithm looks at the 3 closest fruits to the new one.
 If 2 of those 3 fruits are apples and 1 is a banana, the algorithm
says the new fruit is an apple because most of its neighbors are
apples.

How to choose the value of k for the KNN Algorithm?

 The value of k in KNN decides how many neighbours the
algorithm looks at when making a prediction.
 Choosing the right k is important for good results.
 If the data has lots of noise or outliers, using a larger k can make
the predictions more stable.
 But if k is too large, the model may become too simple and miss
important patterns; this is called underfitting.
 So k should be picked carefully based on the data.

Statistical Methods for Selecting k

 Cross-Validation: A good way to find the best value of k is k-fold
cross-validation. The dataset is divided into a fixed number of parts
(folds); this number of folds is a separate setting from the number
of neighbours k. For each candidate k, the model is trained on all
but one fold and tested on the remaining fold, and this is repeated
so that each fold is used once for testing. The k value that gives
the highest average accuracy across these tests is usually the best
one to use.
 Elbow Method: In the Elbow Method we draw a graph showing the
error rate or accuracy for different k values. As k increases, the
error usually drops at first, but after a certain point it stops
decreasing quickly. The point where the curve changes direction
and looks like an "elbow" is usually the best choice for k.
 Odd Values for k: It’s a good idea to use an odd number for k,
especially in two-class problems. This helps avoid ties when
deciding which class is the most common among the neighbours.
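Selecting k by cross-validation can be sketched with leave-one-out evaluation on a tiny dataset. The points, labels and candidate values of k below are invented for illustration; one point is deliberately mislabelled to show the effect of noise.

```python
import math
from collections import Counter

# Hypothetical toy data: two clusters plus one mislabelled point at (0.5, 0.5).
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
LABELS = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]


def predict(train_pts, train_lbls, query, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    order = sorted(range(len(train_pts)),
                   key=lambda i: math.dist(train_pts[i], query))
    votes = Counter(train_lbls[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(k):
    """Leave-one-out cross-validation: each point is the test fold once."""
    correct = 0
    for i, (pt, lbl) in enumerate(zip(POINTS, LABELS)):
        rest_pts = POINTS[:i] + POINTS[i + 1:]
        rest_lbls = LABELS[:i] + LABELS[i + 1:]
        correct += predict(rest_pts, rest_lbls, pt, k) == lbl
    return correct / len(POINTS)


accuracy = {k: loo_accuracy(k) for k in (1, 3, 5)}
best_k = max(accuracy, key=accuracy.get)  # first k with the highest accuracy
print(accuracy, best_k)
```

With k = 1 the single mislabelled point misleads all of its neighbours, while k = 3 outvotes it, which is the stabilising effect of a larger k described above.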
Distance Metrics Used in KNN Algorithm

KNN uses a distance metric to identify the nearest neighbours, which
are then used for the classification or regression task. The following
distance metrics are commonly used:

1. Euclidean Distance

Euclidean distance is defined as the straight-line distance between two
points in a plane or space. You can think of it like the shortest path
you would walk if you were to go directly from one point to another.

$$distance(x, X_i) = \sqrt{\sum_{j=1}^{d} (x_j - X_{ij})^2}$$

2. Manhattan Distance
This is the total distance you would travel if you could only move
along horizontal and vertical lines like a grid or city streets. It’s also
called "taxicab distance" because a taxi can only drive along the
grid-like streets of a city.

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
3. Minkowski Distance
Minkowski distance is a family of distances that includes both
Euclidean and Manhattan distances as special cases.

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

From the formula above, when p=2, it becomes the same as the
Euclidean distance formula and when p=1, it turns into the Manhattan
distance formula. Minkowski distance is essentially a flexible formula
that can represent either Euclidean or Manhattan distance depending
on the value of p.
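The relationship between the three distances can be checked with a single function. This is a minimal sketch; the function name `minkowski` and the example points are chosen for illustration.

```python
def minkowski(x, y, p=2):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of features")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a, b = (0, 0), (3, 4)
manhattan = minkowski(a, b, p=1)
euclidean = minkowski(a, b, p=2)
print(manhattan, euclidean)
```

For the points (0, 0) and (3, 4) the Manhattan distance is 3 + 4 = 7 and the Euclidean distance is the familiar 3-4-5 hypotenuse, 5.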
Working of KNN algorithm
The K-Nearest Neighbors (KNN) algorithm operates on the principle of similarity:
it predicts the label or value of a new data point by considering the labels or
values of its K nearest neighbours in the training dataset.
Step 1: Selecting the optimal value of K

 K represents the number of nearest neighbours that are considered
when making a prediction.

Step 2: Calculating distance

 To measure the similarity between the target point and the training
data points, Euclidean distance is widely used. The distance is
calculated between the target point and each data point in the dataset.

Step 3: Finding Nearest Neighbors

 The k data points with the smallest distances to the target point are
the nearest neighbours.

Step 4: Voting for Classification or Taking Average for Regression

 When you want to classify a data point into a category like spam or
not spam, the KNN algorithm looks at the K closest points in the
dataset. These closest points are called neighbours. The algorithm
then looks at which category each neighbour belongs to and picks the
one that appears the most. This is called majority voting.
 In regression, the algorithm still looks for the K closest points, but
instead of voting for a class, it takes the average of the values of
those K neighbours. This average is the predicted value for the new
point.
[Figure: a test point is classified based on its nearest neighbours. As
the test point moves, the algorithm identifies the closest k data
points (k = 5 in this case) and assigns the test point the majority
class label among them, here the grey class.]
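The four steps can be sketched in a few lines of Python. The fruit measurements, prices and function names below are hypothetical and chosen so that two of the three nearest neighbours are apples and one is a banana.

```python
import math
from collections import Counter


def k_nearest(train_points, query, k):
    """Steps 2 and 3: indices of the k training points closest to the query."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    return order[:k]


def knn_classify(train_points, train_labels, query, k=3):
    """Step 4 (classification): majority vote among the k neighbours."""
    votes = Counter(train_labels[i] for i in k_nearest(train_points, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_points, train_values, query, k=3):
    """Step 4 (regression): average of the k neighbours' values."""
    idx = k_nearest(train_points, query, k)
    return sum(train_values[i] for i in idx) / len(idx)


# Hypothetical (shape, size) measurements.
points = [(1.0, 1.0), (2.0, 1.0), (2.0, 2.2), (5.0, 5.0), (6.0, 5.0)]
fruits = ["apple", "apple", "banana", "banana", "banana"]
prices = [10.0, 20.0, 30.0, 100.0, 110.0]

query = (1.8, 1.5)
label = knn_classify(points, fruits, query, k=3)
price = knn_regress(points, prices, query, k=3)
print(label, price)
```

Because every prediction sorts all training points by distance, the cost grows with the size of the training set, which is the "slow at test time" behaviour of a lazy learner.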
Applications of KNN
 Recommendation Systems: Suggests items like movies or
products by finding users with similar preferences.
 Spam Detection: Identifies spam emails by comparing new emails
to known spam and non-spam examples.
 Customer Segmentation: Groups customers by comparing their
shopping behavior to others.
 Speech Recognition: Matches spoken words to known patterns to
convert them into text.
Advantages of KNN
 Simple to use: Easy to understand and implement.
 No training step: No need to train, as it just stores the data and
uses it during prediction.
 Few parameters: Only needs the number of neighbours (k)
and a distance method.
 Versatile: Works for both classification and regression problems.
Disadvantages of KNN
 Slow with large data: Needs to compare every point during
prediction.
 Struggles with many features: Accuracy drops when data has too
many features.
 Can overfit: It can overfit, especially when the data is
high-dimensional or not clean.

Prediction: Accuracy and Error measures

Prediction accuracy and error measures in data mining, such as
RMSE, MAE, and classification accuracy, evaluate model
performance by quantifying the difference between predicted and
actual values. Common techniques like k-fold cross-validation and
confusion matrices (precision, recall, F1-score) are used to assess
robustness, with RMSE often used for numerical prediction and
accuracy for classification.

Key Prediction Accuracy & Error Measures

 Root Mean Square Error (RMSE): The most frequently used metric
to assess numerical prediction errors; it effectively reflects large
errors by squaring them.
 Mean Absolute Error (MAE): Measures the average magnitude of
errors, treating all errors equally.
 Coefficient of Determination (R²): Indicates the fraction of variance
explained by the model, capturing the model's skill.
 Accuracy (Classification): Defined as the percentage of correct
predictions out of all predictions, calculated as
(TP + TN) / (TP + TN + FP + FN).
 Precision & Recall: Precision measures the quality of positive
predictions, while recall measures the proportion of actual positive
cases detected.
 F1-Score: The harmonic mean of precision and recall, balancing
both.
 Log Loss: Measures how well predicted probabilities match the actual
labels, heavily penalizing cases where the correct class is given a
low probability.
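The numerical error measures above can be computed from first principles. This sketch uses made-up actual and predicted values; the function names and the clipping constant `eps` in the log loss are assumptions for illustration.

```python
import math


def regression_errors(actual, predicted):
    """Return (MAE, RMSE, R^2) for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


def log_loss(labels, probs, eps=1e-15):
    """Binary log loss; probs are predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


actual = [3.0, 5.0, 7.0, 9.0]        # hypothetical true values
predicted = [2.0, 5.0, 8.0, 12.0]    # hypothetical model outputs
mae, rmse, r2 = regression_errors(actual, predicted)
print(mae, round(rmse, 4), r2)
print(log_loss([1], [0.9]), log_loss([1], [0.1]))
```

The single error of 3 pulls RMSE (about 1.66) well above MAE (1.25), showing how squaring emphasises large errors, and the log loss is much larger when the correct class receives probability 0.1 than when it receives 0.9.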
Model Evaluation Techniques
 Holdout Method: Randomly splits data into training and test sets to
calculate error.
 K-Fold Cross-Validation: Splits data into k subsets and trains k
times, each time testing on a different subset, to reduce the bias
of the estimate.
 Bootstrapping: Repeatedly samples the dataset with replacement for
estimation.
 Confusion Matrix: A matrix used to visualize the performance of
classification algorithms by comparing predicted and actual classes.
Evaluation Criteria for Models
 Accuracy: The ability of a model to correctly predict class labels or
numeric values.
 Speed: Computational cost to build and use the model.
 Robustness: The capability to make correct predictions on noisy data
or data with missing values.
 Scalability: Performance change relative to the amount of data.
 Interpretability: The ability to understand the reasoning behind the
model's predictions.
Evaluating the accuracy of a classifier or a predictor
Data mining is also referred to as knowledge mining from data,
knowledge extraction, data/pattern analysis, data archaeology, and
data dredging. This section covers techniques to evaluate the
accuracy of classifiers.

Hold Out

In the holdout method, the original dataset is randomly divided into
three subsets:
 The training set is the subset of the dataset used to build
predictive models.
 The validation set is the subset of the dataset used to assess the
performance of the model built in the training phase. It provides a
test platform for fine-tuning the model's parameters and selecting
the best-performing model. Not all modelling algorithms need a
validation set.
 The test set (unseen examples) is the subset of the dataset used to
assess the likely future performance of the model. If a model fits
the training set much better than it fits the test set, overfitting
is the probable cause.
Typically, two-thirds of the data are allocated to the training set
and the remaining one-third is allocated to the test set.
Random Subsampling
Random subsampling is a variation of the holdout method in which the
holdout method is repeated K times. Each repetition randomly splits
the data into a training set and a test set. The model is trained on
the training set, and the mean square error (MSE) is obtained from its
predictions on the test set. Because the MSE depends on the split (a
new split can give a new MSE), relying on a single split is not
recommended. Instead, the overall error is calculated as the average
over the K splits:

$$E = \frac{1}{K} \sum_{i=1}^{K} E_i$$
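Random subsampling can be sketched with a deliberately trivial model. The "predict the training mean" model, the one-third test fraction and the toy values are assumptions made purely to keep the example self-contained.

```python
import random


def mean_predictor_mse(train, test):
    """Fit a trivial 'predict the training mean' model; return its test MSE."""
    mean = sum(train) / len(train)
    return sum((y - mean) ** 2 for y in test) / len(test)


def random_subsampling(values, repeats=10, test_fraction=1 / 3, seed=0):
    """Repeat the holdout split `repeats` times and average the errors."""
    rng = random.Random(seed)
    n_test = max(1, round(len(values) * test_fraction))
    errors = []
    for _ in range(repeats):
        shuffled = values[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        errors.append(mean_predictor_mse(train, test))
    return sum(errors) / len(errors), errors


values = [float(v) for v in range(1, 13)]  # hypothetical target values
overall, per_split = random_subsampling(values)
print(round(overall, 3), [round(e, 2) for e in per_split])
```

The per-split errors vary from one random split to the next, and the overall estimate is their average.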
Cross-Validation

 K-fold cross-validation is used when only a limited amount of data
is available, to obtain a less biased estimate of the performance of
the model.
 Here, we divide the data into K subsets of equal sizes.
 We build models K times, each time leaving out one of the subsets
from the training, and use it as the test set.
 If K equals the sample size, then this is called "Leave-One-Out"
cross-validation.
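The fold construction can be sketched as an index generator. The function name `k_fold_indices` is an assumption; in practice the data would usually be shuffled before the folds are formed.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)

# Leave-One-Out is the special case where k equals the sample size.
loo = list(k_fold_indices(5, 5))
```

Each model is trained on `train_idx` and scored on `test_idx`, and the K scores are then averaged.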

Bootstrapping

 Bootstrapping is a technique for making estimates from the data by
averaging the estimates obtained from many resampled datasets.
 The bootstrapping method involves the iterative resampling of a
dataset with replacement.
 With resampling, instead of estimating a statistic only once on the
complete data, we can do it many times.
 Repeating this multiple times helps to obtain a vector of estimates.
 Bootstrapping can compute variance, expected value, and other
relevant statistics of these estimates.
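A bootstrap estimate of the mean can be sketched with the standard library. The sample values, the number of resamples and the function name are illustrative assumptions.

```python
import random
import statistics


def bootstrap_means(values, n_resamples=1000, seed=0):
    """Resample with replacement and record the mean of each resample."""
    rng = random.Random(seed)
    n = len(values)
    return [statistics.fmean(rng.choices(values, k=n))
            for _ in range(n_resamples)]


sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]    # hypothetical observations
estimates = bootstrap_means(sample)        # the vector of estimates
expected = statistics.fmean(estimates)     # expected value of the estimate
spread = statistics.stdev(estimates)       # its standard error
print(round(expected, 3), round(spread, 3))
```

The spread of the resampled means gives a rough measure of how much the estimate would vary from sample to sample.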
Ensemble methods
Ensemble methods are used in data mining because they can improve the
predictive performance of machine learning models. A single model may
either overfit the training data or underperform on unseen instances.
Ensembles mitigate these problems by aggregating several models so that
their individual errors balance out.
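One simple way to aggregate models is majority voting, sketched below. This is only one aggregation scheme among several, and the three threshold rules, the field names (`income`, `debt`, `age`), and the cut-off values are hypothetical stand-ins for trained classifiers.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Return the label predicted by the largest number of classifiers."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three weak, hand-written rules standing in for trained models (hypothetical).
classifiers = [
    lambda x: "risky" if x["income"] < 30 else "safe",
    lambda x: "risky" if x["debt"] > 50 else "safe",
    lambda x: "risky" if x["age"] < 21 else "safe",
]

applicant = {"income": 25, "debt": 60, "age": 40}
print(majority_vote(classifiers, applicant))  # risky (two of three rules agree)
```

Using an odd number of voters avoids ties in the two-class case; a single rule's mistake is outvoted when the other two agree.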
