AIML unit 4
Supervised Learning – Linear Regression
Introduction
Machine Learning is a branch of Artificial Intelligence that enables computers to
learn from data without being explicitly programmed. One of the most important
categories of machine learning is Supervised Learning.
In supervised learning, the model is trained using labeled data, where both input
and output values are known. The algorithm learns the relationship between them
and predicts outputs for new inputs.
Among all supervised learning algorithms, Linear Regression is one of the
simplest and most widely used methods for prediction and analysis.
1. Supervised Learning
Supervised learning uses training data consisting of:
● Input variables (features)
● Output variables (labels)
The algorithm learns a mapping function from input to output.
Diagram of Supervised Learning
Training Data
+-----------------------+
| Input(X) Output(Y) |
+-----------------------+
|
v
Supervised Learning
Model
|
v
Predicted Output
Example
Hours Studied | Marks
2             | 35
4             | 50
6             | 70
The system learns how marks depend on study hours.
2. What is Linear Regression?
Linear Regression is a supervised learning algorithm used to predict continuous
numerical values.
It finds the best fit straight line between input and output variables.
Examples:
● Predicting house prices
● Predicting salary
● Predicting temperature
● Predicting sales revenue
3. Linear Regression Equation
The mathematical equation of linear regression is:
$$y = mx + b$$
Where:
● $y$ = Predicted output
● $x$ = Input feature
● $m$ = Slope of line
● $b$ = Intercept
4. Best Fit Line
The main goal of linear regression is to find the line that best represents the data.
Diagram of Best Fit Line
Price(Y)
  ^
  |
  |                            *
  |                        *
  |                    *
  |                *
  |            *
  |        *
  |    *
  |__________________________________> Size(X)
  -------- Best Fit Line --------
Explanation
● Dots represent actual data points.
● Straight line represents predicted relationship.
● The line minimizes overall prediction error.
5. Working of Linear Regression
The working process includes:
1. Collect training data
2. Analyze relationship between variables
3. Fit a straight line
4. Predict future values
6. Types of Linear Regression
(a) Simple Linear Regression
Uses only one independent variable.
Example
● Experience → Salary
Equation:
$$y = \beta_0 + \beta_1 x$$
Where:
● $\beta_0$ = intercept
● $\beta_1$ = coefficient (slope) of the input $x$
(b) Multiple Linear Regression
Uses multiple independent variables.
Example
● House size
● Number of rooms
● Location
to predict house price.
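Equation:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
where $\beta_0$ is the intercept and $\beta_1, \dots, \beta_n$ are the coefficients of the $n$ input variables.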
7. Cost Function in Linear Regression
The algorithm tries to minimize prediction error using a Cost Function.
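The most commonly used cost function is Mean Squared Error (MSE).
$$J = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$
Where:
● $J$ = cost/error
● $\hat{y}_i$ = predicted value
● $y_i$ = actual value
● $n$ = number of training examples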
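As a minimal sketch of this cost function, the Python snippet below computes Mean Squared Error as the average of the squared differences between predicted and actual values. The function name `mean_squared_error` and the sample predictions are illustrative assumptions, not part of these notes.

```python
def mean_squared_error(predicted, actual):
    """Average of squared differences between predicted and actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / len(actual)


# Hypothetical predictions compared with the marks 35, 50 and 70
print(mean_squared_error([36, 50, 68], [35, 50, 70]))
```

A perfect prediction gives a cost of 0, and squaring makes large mistakes count much more than small ones.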
8. Error Representation
The difference between actual value and predicted value is called error.
Error Diagram
            Actual Point
                 *
                 |
          Error  |
                 |
    -------------*------------------
                 Regression Line
Smaller error means better prediction.
9. Gradient Descent
Gradient Descent is an optimization technique used to minimize error.
It updates the slope and intercept repeatedly until the minimum error is obtained.
10. Gradient Descent Flowchart
Start
|
Initialize m and b
|
Predict Output
|
Calculate Error
|
Update Parameters
|
Is Error Minimum?
| \
No Yes
| \
Repeat Stop
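The flowchart can be sketched in Python roughly as follows, assuming Mean Squared Error as the cost and a line of the form y = m*x + b. The learning rate, the iteration count, and the function name `gradient_descent` are assumptions chosen for this small example; the sketch also stops after a fixed number of iterations rather than testing whether the error is minimal.

```python
def gradient_descent(xs, ys, learning_rate=0.02, iterations=10000):
    """Fit y = m*x + b by repeatedly stepping against the MSE gradient."""
    m, b = 0.0, 0.0                      # initialize m and b
    n = len(xs)
    for _ in range(iterations):
        # predict output and calculate the error for every training point
        errors = [(m * x + b) - y for x, y in zip(xs, ys)]
        # gradients of the MSE with respect to m and b
        grad_m = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # update parameters
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


hours = [2, 4, 6]
marks = [35, 50, 70]
m, b = gradient_descent(hours, marks)
print(f"marks = {m:.2f} * hours + {b:.2f}")
```

On the study-hours data from Section 1 this settles near a slope of 8.75 and an intercept of about 16.67, which is the least-squares line for those three points. A learning rate that is too large would make the updates diverge instead.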
11. Assumptions of Linear Regression
Linear regression works properly when the following assumptions are satisfied:
1. Linear relationship exists
2. Data points are independent
3. Errors are normally distributed
4. Constant variance exists
5. No extreme outliers
12. Advantages of Linear Regression
● Simple and easy to implement
● Fast training process
● Easy interpretation
● Works well for linear data
● Useful for trend analysis
13. Disadvantages of Linear Regression
● Cannot handle complex nonlinear data
● Sensitive to outliers
● Assumes linear relationship
● Accuracy decreases with noisy data
14. Applications of Linear Regression
Linear Regression is widely used in real-world applications.
Applications
1. House price prediction
2. Weather forecasting
3. Stock market prediction
4. Sales forecasting
5. Risk analysis
6. Salary estimation
15. Real-Time Example
Suppose a company wants to predict employee salary based on years of
experience.
Experience (years) | Salary
1                  | 30000
2                  | 40000
3                  | 50000
4                  | 60000
The regression equation becomes:
Salary = 10000 × Experience + 20000
If experience = 5 years:
Salary = 10000(5) + 20000
= 70000
Predicted Salary = ₹70,000
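To check this result, the sketch below fits the line with the standard closed-form least-squares formulas for one input feature (slope = sum of co-deviations divided by sum of squared deviations of x). The function name `fit_line` is an illustrative assumption.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sum of products of deviations, and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


experience = [1, 2, 3, 4]
salary = [30000, 40000, 50000, 60000]
slope, intercept = fit_line(experience, salary)
predicted = slope * 5 + intercept
print(slope, intercept, predicted)
```

The fitted slope and intercept reproduce the equation Salary = 10000 × Experience + 20000, and the prediction for 5 years is 70000.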
16. Linear Regression Architecture
Overall Process Diagram
Input Data
|
v
Linear Regression
Model
|
v
Best Fit Equation
|
v
Predicted Output
17. Conclusion
Linear Regression is one of the most important supervised learning algorithms
used for predicting continuous numerical values. It establishes a linear relationship
between input and output variables using a best fit line. The algorithm minimizes
prediction error using cost functions and optimization techniques like gradient
descent.
Due to its simplicity, efficiency, and interpretability, linear regression is widely
used in machine learning, statistics, economics, and business analytics.
----------------------------------------------------------
Logistic Regression
Introduction
Logistic Regression is a supervised learning classification algorithm used to
predict categorical outcomes. Unlike Linear Regression, which predicts continuous
values, Logistic Regression predicts the probability of a class.
It is mainly used for:
● Binary classification
● Probability estimation
● Decision making
Examples:
● Email spam detection
● Disease prediction
● Pass or fail prediction
● Fraud detection
1. Supervised Learning in Logistic Regression
In supervised learning, the model is trained using labeled data.
Structure
Input Data → Logistic Regression Model → Predicted Class
Example
Study Hours | Result
2           | Fail
4           | Fail
6           | Pass
8           | Pass
The algorithm learns the relationship between study hours and exam result.
2. What is Logistic Regression?
Logistic Regression predicts the probability that an input belongs to a particular
category.
Output values are generally:
● 0 → Negative class
● 1 → Positive class
Example:
● 0 = No disease
● 1 = Disease
3. Sigmoid Function
Logistic Regression uses the Sigmoid Function to convert linear output into
probability.
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Where:
● $\sigma(z)$ = probability output
● $e$ = exponential constant (approximately 2.718)
● $z$ = linear equation
4. Linear Equation in Logistic Regression
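The linear equation is:
$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
where $\beta_0$ is the intercept and $\beta_1, \dots, \beta_n$ are the weights of the input features.
This value is passed into the sigmoid function.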
5. Sigmoid Curve Diagram
Sigmoid Function Curve
Probability
1.0 |                            ********
    |                        ****
0.8 |                     ***
    |                  ***
0.6 |               ***
    |            ***
0.5 |----------***
    |       ***
0.3 |    ***
    |  ***
0.0 |****____________________________
          Input Value (z)
Explanation
● Output ranges between 0 and 1
● Used for probability prediction
● S-shaped curve
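A minimal Python sketch of the sigmoid function, 1 / (1 + e^(-z)), is given below. The two-branch form is an implementation choice assumed here to avoid overflow for very negative inputs; it is mathematically the same function.

```python
import math


def sigmoid(z):
    """Map any real number z to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z this equivalent form avoids overflow in exp(-z)
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


for z in (-4, 0, 4):
    print(z, round(sigmoid(z), 3))
```

An input of 0 maps to exactly 0.5, large positive inputs approach 1, and large negative inputs approach 0, which gives the S-shaped curve.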
6. Working of Logistic Regression
The steps involved are:
1. Collect training data
2. Apply linear equation
3. Pass result through sigmoid function
4. Generate probability
5. Classify output
7. Classification Process
Diagram
Input Features
|
v
Linear Equation
|
v
Sigmoid Function
|
v
Probability Output
|
v
Class Prediction
8. Decision Boundary
A threshold value is used for classification.
Usually:
● Probability ≥ 0.5 → Class 1
● Probability < 0.5 → Class 0
Decision Boundary Diagram
Class 1
^
|
| ********
0.5|-------------|--------------
| |
| |
+---------------------------->
Threshold
(0.5)
Class 0
9. Cost Function
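Logistic Regression uses Log Loss, also called Cross-Entropy Loss.
$$J = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
Where:
● $y_i$ = actual output (0 or 1)
● $\hat{y}_i$ = predicted probability
● $n$ = number of training examples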
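The following sketch computes log loss for 0/1 labels and predicted probabilities. The clipping constant `eps` and the function name `log_loss` are assumptions added for the example; clipping only prevents taking the logarithm of zero.

```python
import math


def log_loss(actual, predicted, eps=1e-15):
    """Average cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(actual, predicted):
        # Clip so that log(0) is never evaluated
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(actual)


confident_and_right = log_loss([1, 0], [0.9, 0.1])
confident_and_wrong = log_loss([1, 0], [0.1, 0.9])
print(confident_and_right, confident_and_wrong)
```

The loss is small when the predicted probability agrees with the actual class and grows sharply for confident wrong predictions.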
10. Gradient Descent
Gradient Descent is used to minimize the cost function.
Steps
1. Initialize parameters
2. Compute predictions
3. Calculate error
4. Update weights
5. Repeat until minimum error
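These steps can be sketched for a single input feature as shown below, using the study-hours table from Section 1 with Fail = 0 and Pass = 1. The learning rate, the fixed iteration count, and the helper names `train_logistic` and `predict` are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, iterations=20000):
    """Gradient descent on log loss for one feature: p = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0                                    # 1. initialize parameters
    n = len(xs)
    for _ in range(iterations):                        # 5. repeat
        preds = [sigmoid(w * x + b) for x in xs]       # 2. compute predictions
        errors = [p - y for p, y in zip(preds, ys)]    # 3. calculate error
        # 4. update weights using the gradient of the log loss
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


hours = [2, 4, 6, 8]
passed = [0, 0, 1, 1]          # 0 = Fail, 1 = Pass
w, b = train_logistic(hours, passed)


def predict(hours_studied, threshold=0.5):
    probability = sigmoid(w * hours_studied + b)
    return "Pass" if probability >= threshold else "Fail"


print(predict(3), predict(7))
```

Because this tiny dataset is perfectly separable, the weights keep growing slowly with more iterations; real implementations usually add regularization or a stopping rule.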
11. Flowchart of Logistic Regression
Start
|
Input Training Data
|
Apply Linear Equation
|
Apply Sigmoid Function
|
Calculate Error
|
Update Weights
|
Prediction Output
|
Stop
12. Types of Logistic Regression
(a) Binary Logistic Regression
Used for two classes.
Example:
● Yes / No
● Pass / Fail
(b) Multinomial Logistic Regression
Used for more than two categories.
Example:
● Predicting grades:
○ A, B, C
(c) Ordinal Logistic Regression
Used for ordered categories.
Example:
● Low, Medium, High
13. Advantages of Logistic Regression
● Simple and easy to implement
● Efficient for classification
● Provides probability output
● Fast training
● Works well with linearly separable data
14. Disadvantages of Logistic Regression
● Cannot model complex nonlinear relationships
● Sensitive to outliers
● Requires large datasets for better accuracy
● Assumes a linear relationship between the input variables and the log-odds of the outcome
15. Applications of Logistic Regression
Real-World Applications
1. Spam email detection
2. Medical diagnosis
3. Credit card fraud detection
4. Customer churn prediction
5. Sentiment analysis
6. Admission prediction
16. Real-Time Example
Suppose a student’s exam result depends on study hours.
Hours Studied | Result
1             | Fail
2             | Fail
5             | Pass
7             | Pass
If the predicted probability is:
P(Pass) = 0.85
Since probability > 0.5,
Prediction = Pass
17. Comparison: Linear vs Logistic Regression
Feature      | Linear Regression | Logistic Regression
Output       | Continuous values | Categorical values
Purpose      | Prediction        | Classification
Graph        | Straight line     | Sigmoid curve
Output Range | Any value         | 0 to 1
18. Conclusion
Logistic Regression is an important supervised learning algorithm mainly used for
classification problems. It predicts probabilities using the sigmoid function and
classifies data into categories using a decision boundary.
Because of its simplicity, efficiency, and interpretability, Logistic Regression is
widely used in machine learning, healthcare, finance, marketing, and data
analytics.
----------------------------------------------------------
Decision Trees: Classification and Regression Trees (CART)
Introduction
Decision Trees are one of the most popular supervised learning algorithms used for
both:
● Classification problems
● Regression problems
A Decision Tree works like a flowchart where:
● Each internal node represents a test condition
● Each branch represents the outcome
● Each leaf node represents the final prediction
The most common form is called:
CART – Classification and Regression Trees
1. What is a Decision Tree?
A Decision Tree divides data into smaller subsets based on conditions.
It creates a tree-like structure for decision making.
Basic Structure
                 Root Node
                     |
          -----------------------
          |                     |
     Condition 1           Condition 2
          |                     |
      ---------             ----------
      |       |             |        |
     Leaf    Leaf          Leaf     Leaf
2. Components of Decision Tree
(a) Root Node
● Starting point of the tree
● Represents the complete dataset
(b) Decision Node
● Splits data into branches
(c) Leaf Node
● Final output or prediction
(d) Branch
● Connection between nodes
3. Working of Decision Tree
The algorithm:
1. Selects best feature
2. Splits dataset
3. Repeats splitting recursively
4. Stops when condition is satisfied
4. Classification Tree
A Classification Tree is used when output is categorical.
Examples
● Yes / No
● Spam / Not Spam
● Pass / Fail
5. Classification Tree Diagram
Example: Student Pass Prediction
            Study Hours?
                 |
        --------------------
        |                  |
    > 5 Hours         <= 5 Hours
        |                  |
       Pass               Fail
Explanation
● If study hours > 5 → Pass
● Otherwise → Fail
6. Regression Tree
Regression Trees are used when output is continuous numerical values.
Examples
● House price prediction
● Temperature prediction
● Salary estimation
7. Regression Tree Diagram
Example: House Price Prediction
             House Size?
                  |
        --------------------
        |                  |
  > 1500 sq. ft      <= 1500 sq. ft
        |                  |
   Price = 20L        Price = 10L
8. CART (Classification and Regression Trees)
CART is a machine learning algorithm that supports:
● Classification
● Regression
It uses:
● Gini Index for classification
● Mean Squared Error (MSE) for regression
9. Gini Index in Classification
The Gini Index measures impurity in the dataset.
$$Gini = 1 - \sum_{i=1}^{k} p_i^2$$
Where:
● $p_i$ = probability of class $i$
● $k$ = number of classes
Properties
● Gini = 0 → Pure node
● Higher Gini → More impurity
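As a small illustration, the Gini Index (1 minus the sum of squared class proportions) can be computed from a list of class labels as sketched below. The function name `gini_index` and the sample labels are assumptions.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


print(gini_index(["Pass", "Pass", "Pass"]))           # pure node
print(gini_index(["Pass", "Fail", "Pass", "Fail"]))   # evenly mixed node
```

A node with a single class scores 0, while a two-class node split half and half scores 0.5, the maximum for two classes.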
10. Entropy in Decision Trees
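Entropy measures randomness in data.
$$Entropy = -\sum_{i=1}^{k} p_i \log_2 p_i$$
where $p_i$ is the probability of class $i$ and $k$ is the number of classes.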
Interpretation
● Lower entropy → Better split
● Higher entropy → More disorder
11. Information Gain
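Information Gain measures reduction in entropy after splitting.
$$IG = Entropy(parent) - \sum_{j} \frac{n_j}{n} \, Entropy(child_j)$$
where $n_j$ is the number of samples in child node $j$ and $n$ is the number of samples in the parent node.
The feature with highest information gain is selected for splitting.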
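The sketch below computes entropy (with base-2 logarithms) and the information gain of a split, taken as the parent's entropy minus the size-weighted entropy of the child nodes. The function names and the Pass/Fail labels are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    proportions = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in proportions)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["Fail", "Fail", "Pass", "Pass"]
split = [["Fail", "Fail"], ["Pass", "Pass"]]   # e.g. hours <= 5 vs hours > 5
print(entropy(parent), information_gain(parent, split))
```

A split that separates the classes completely removes all of the parent's entropy, so its information gain equals the parent's entropy.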
12. Decision Tree Building Process
Flowchart
Start
|
Input Dataset
|
Select Best Feature
|
Split Dataset
|
Create Branches
|
Repeat Recursively
|
Leaf Node Reached
|
Stop
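The "Select Best Feature" and "Split Dataset" steps can be sketched for one numerical feature as below: every midpoint between neighbouring values is tried as a threshold, and the one with the lowest weighted Gini impurity is kept. This is a simplified sketch with assumed names (`best_threshold`); a full tree would repeat the same search recursively on each side of the split.

```python
from collections import Counter


def gini(labels):
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Try midpoints between sorted values; keep the lowest weighted Gini."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no split between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


hours = [2, 4, 6, 8]
result = ["Fail", "Fail", "Pass", "Pass"]
threshold, impurity = best_threshold(hours, result)
print(threshold, impurity)
```

For this data the chosen threshold lies midway between 4 and 6 hours, where both sides of the split are pure.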
13. Tree Splitting Example
Weather Prediction
             Weather?
           /    |    \
       Sunny  Rainy  Cloudy
         |      |       |
       Play  No Play   Play
14. Advantages of Decision Trees
1. Simple and easy to understand
2. Easy visualization
3. Handles both numerical and categorical data
4. Requires less data preprocessing
5. Supports classification and regression
15. Disadvantages of Decision Trees
1. Overfitting problem
2. Sensitive to noisy data
3. Large trees become complex
4. Lower accuracy compared to ensemble methods
16. Overfitting in Decision Trees
Overfitting occurs when the model fits the training data too closely, including its noise, and therefore performs poorly on new, unseen data.
Overfitting Diagram
Training Accuracy → High
Testing Accuracy → Low
Solution
● Tree pruning
● Limiting depth
● Minimum samples split
17. Pruning in Decision Trees
Pruning removes unnecessary branches.
Pruning Diagram
Before Pruning        After Pruning
      A                     A
     / \                   / \
    B   C                 B   C
   / \
  D   E
Pruning improves generalization.
18. Applications of Decision Trees
Real-World Applications
1. Medical diagnosis
2. Fraud detection
3. Loan approval
4. Customer segmentation
5. Weather prediction
6. Recommendation systems
19. Comparison: Classification vs Regression Trees
Feature  | Classification Tree | Regression Tree
Output   | Categorical         | Continuous
Example  | Pass/Fail           | Price Prediction
Criteria | Gini / Entropy      | MSE
Goal     | Class prediction    | Numerical prediction
20. Real-Time Example
Loan Approval System
              Income?
                 |
         ----------------
         |              |
        High           Low
         |              |
   Credit Score?      Reject
         |
    ------------
    |          |
   Good       Bad
    |          |
  Approve    Reject
21. Conclusion
Decision Trees are powerful supervised learning algorithms used for both
classification and regression tasks. CART (Classification and Regression Trees)
builds a tree-like structure using recursive splitting techniques.
Because of their simplicity, interpretability, and ability to handle different types of
data, Decision Trees are widely used in machine learning, business analytics,
healthcare, and finance.
----------------------------------------------------------
Neural Networks and Multilayer Perceptron (MLP)
Introduction
Artificial Neural Networks (ANNs) are computational models inspired by the
structure and working of the human brain. Neural networks are widely used in
machine learning and artificial intelligence for solving complex problems such as:
● Image recognition
● Speech recognition
● Pattern classification
● Prediction systems
A Multilayer Perceptron (MLP) is one of the most important types of neural
networks.
1. What is a Neural Network?
A Neural Network consists of interconnected processing units called neurons.
Each neuron:
● Receives input
● Processes data
● Produces output
2. Biological Neuron vs Artificial Neuron
Biological Neuron
Dendrites → Cell Body → Axon
Artificial Neuron
Inputs → Weights → Summation → Activation Function → Output
3. Structure of Artificial Neural Network
A neural network contains:
1. Input Layer
2. Hidden Layer(s)
3. Output Layer
4. Neural Network Architecture
Basic ANN Diagram
Input Layer Hidden Layer Output Layer
x1 ---------\
             \
x2 ----------( O ) --------\
             /              \
x3 ---------/              ( O ) ---- Output
Explanation
● Inputs are fed into the network
● Hidden layers process information
● Output layer gives prediction
5. Artificial Neuron Model
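The neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function.
$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$
Where:
● $x_i$ = input
● $w_i$ = weight
● $b$ = bias
● $f$ = activation function
● $y$ = output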
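A single artificial neuron can be sketched in a few lines of Python: multiply each input by its weight, add the bias, and pass the sum through an activation function. The function name `neuron_output` and the sample weights are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of inputs plus bias, passed through an activation function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


# Hypothetical inputs and weights: 1.0*0.5 + 2.0*(-0.25) + 3.0*0.0 + 0.0 = 0.0
print(neuron_output([1.0, 2.0, 3.0], [0.5, -0.25, 0.0], 0.0, sigmoid))
```

Passing the activation as a parameter makes it easy to swap sigmoid for another function such as ReLU without changing the neuron itself.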
6. Working of Neural Network
Steps
1. Receive inputs
2. Multiply inputs with weights
3. Add bias
4. Apply activation function
5. Generate output
7. Activation Functions
Activation functions introduce nonlinearity.
Common Activation Functions
1. Sigmoid
2. ReLU
3. Tanh
4. Softmax
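The four functions listed above can be sketched in plain Python as follows. Subtracting the largest score inside `softmax` is an implementation detail assumed here for numerical stability; it does not change the result.

```python
import math


def sigmoid(z):
    """Squashes z into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Keeps positive values and replaces negative values with 0."""
    return max(0.0, z)


def tanh(z):
    """Squashes z into the range -1 to 1."""
    return math.tanh(z)


def softmax(scores):
    """Turns a list of scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-3.0), relu(2.5))
print(softmax([1.0, 2.0, 3.0]))
```

Sigmoid and softmax are typically used at the output layer for binary and multiclass classification respectively, while ReLU and tanh are common in hidden layers.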
8. Sigmoid Activation Function
The sigmoid function converts output into values between 0 and 1.
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
9. Sigmoid Function Diagram
Output
1.0 |                        *****
    |                     ***
0.8 |                  ***
    |               ***
0.5 |-------------***
    |          ***
0.2 |       ***
    |    ***
0.0 |***______________________
              Input
10. ReLU Activation Function
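ReLU (Rectified Linear Unit) is widely used in deep learning. It outputs the input itself when the input is positive and 0 otherwise.
$$f(x) = \max(0, x)$$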
Advantages
● Faster computation
● Reduces vanishing gradient problem
11. Multilayer Perceptron (MLP)
A Multilayer Perceptron is a feedforward neural network containing:
● One input layer
● One or more hidden layers
● One output layer
MLP uses supervised learning.
12. MLP Architecture
Multilayer Perceptron Diagram
Input Layer Hidden Layer 1 Hidden Layer 2 Output
x1 --------\
            ( O ) ----\
x2 --------/           \
                        ( O ) ----\
x3 --------\           /           \
            ( O ) ----/             (O)
x4 --------/
13. Feedforward Process
In an MLP, data moves in one direction only:
● Input → Hidden → Output
For this reason an MLP is called a feedforward neural network.
14. Backpropagation
Backpropagation is used for training MLP.
It adjusts weights to reduce prediction error.
15. Backpropagation Flow Diagram
Input Data
|
Forward Propagation
|
Prediction
|
Calculate Error
|
Backward Propagation
|
Update Weights
|
Repeat
16. Error Function
The network minimizes prediction error using loss functions.
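Mean Squared Error
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Where:
● $y_i$ = actual value
● $\hat{y}_i$ = predicted value
● $n$ = number of training examples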
17. Learning Process in MLP
Training Steps
1. Initialize weights
2. Perform forward propagation
3. Calculate error
4. Apply backpropagation
5. Update weights
6. Repeat until minimum error
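The training steps can be sketched for a very small MLP with two inputs, two sigmoid hidden neurons, and one sigmoid output, trained on a single example with squared error. The network size, starting weights, learning rate, and function names are all assumptions chosen to keep the example short; they are not a recommended configuration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Forward propagation through one hidden layer and one output neuron."""
    W1, b1, W2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output


def loss(x, target, params):
    """Squared error for a single training example."""
    return (forward(x, params)[1] - target) ** 2


def gradients(x, target, params):
    """Backpropagation: chain rule applied from the output back to the inputs."""
    W1, b1, W2, b2 = params
    hidden, output = forward(x, params)
    # error signal at the output neuron (derivative of loss w.r.t. its input)
    delta_out = 2.0 * (output - target) * output * (1.0 - output)
    grad_W2 = [delta_out * h for h in hidden]
    # error signal passed back to each hidden neuron
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(W2, hidden)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    return grad_W1, delta_hidden, grad_W2, delta_out


def train_step(x, target, params, learning_rate=0.5):
    """Update every weight and bias one step against its gradient."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = gradients(x, target, params)
    W1 = [[w - learning_rate * g for w, g in zip(row, grow)]
          for row, grow in zip(W1, gW1)]
    b1 = [b - learning_rate * g for b, g in zip(b1, gb1)]
    W2 = [w - learning_rate * g for w, g in zip(W2, gW2)]
    b2 = b2 - learning_rate * gb2
    return W1, b1, W2, b2


# 1. initialize weights (fixed hypothetical values instead of random ones)
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.5, -0.5], 0.0)
x, target = [1.0, 0.0], 1.0
before = loss(x, target, params)
for _ in range(100):            # 6. repeat
    params = train_step(x, target, params)   # steps 2 to 5
after = loss(x, target, params)
print(before, after)
```

The loss after the updates is lower than before them, which is the behaviour the training loop is meant to produce. Practical networks train on many examples, use random initial weights, and rely on a library for the gradient calculations.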
18. Advantages of Neural Networks
1. Learns complex patterns
2. High prediction accuracy
3. Handles nonlinear problems
4. Supports parallel processing
5. Adaptive learning capability
19. Disadvantages of Neural Networks
1. Requires large datasets
2. High computational cost
3. Training takes more time
4. Difficult to interpret
5. Risk of overfitting
20. Applications of Neural Networks
Real-World Applications
1. Image recognition
2. Speech recognition
3. Medical diagnosis
4. Fraud detection
5. Self-driving cars
6. Natural language processing
21. Difference Between Single Layer and Multilayer Perceptron
Feature          | Single Layer Perceptron | Multilayer Perceptron
Hidden Layer     | No                      | Yes
Complexity       | Simple                  | Complex
Learning Ability | Linear problems         | Nonlinear problems
Accuracy         | Lower                   | Higher
22. Real-Time Example
Handwritten Digit Recognition
Input Image
|
Neural Network
|
Hidden Layers
|
Predicted Digit
The network learns patterns of handwritten digits and predicts the correct number.
23. Conclusion
Neural Networks are powerful machine learning models inspired by the human
brain. A Multilayer Perceptron (MLP) is a feedforward neural network with
one or more hidden layers, capable of solving complex nonlinear problems.
Using activation functions, forward propagation, and backpropagation, MLPs can
learn patterns from data effectively. Neural networks are widely used in artificial
intelligence, healthcare, robotics, finance, and computer vision applications.
----------------------------------------------------------
Support Vector Machines (SVM): Linear and Nonlinear Kernel Functions
Introduction
Support Vector Machine (SVM) is a powerful supervised learning algorithm used
for:
● Classification
● Regression
● Pattern recognition
SVM is mainly used for solving classification problems by finding the best
separating boundary between classes.
Examples:
● Face recognition
● Text classification
● Spam detection
● Image classification
1. What is Support Vector Machine?
Support Vector Machine finds the optimal hyperplane that separates data into
different classes.
The goal is to maximize the margin between classes.
2. Basic Concept of SVM
Hyperplane
A hyperplane is a decision boundary separating different classes.
For 2D data:
● Hyperplane is a straight line
For 3D data:
● Hyperplane is a plane
3. SVM Classification Diagram
Class A o o o
\
\
---------------\---------------- Hyperplane
\
\
Class B x x x
Explanation
● Circles and crosses are two classes
● SVM finds the best boundary between them
4. Support Vectors
Support vectors are the data points closest to the hyperplane.
They are very important because they determine the position of the hyperplane.
5. Margin in SVM
Margin is the distance between:
● Hyperplane
● Closest data points
SVM tries to maximize this margin.
Margin Diagram
o o o Margin x x x
|<------------------->|
---------------- Hyperplane ----------------
Larger margin → Better generalization
6. Linear SVM
Linear SVM is used when data is linearly separable.
Linear Separation Diagram
o o o
----------------------- Straight Line
x x x
The data can be separated using a straight line.
7. Linear SVM Equation
The hyperplane equation is:
$$w \cdot x + b = 0$$
Where:
● $w$ = weight vector
● $x$ = input vector
● $b$ = bias
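Once a weight vector and bias are known, classification with a linear SVM reduces to checking the sign of w · x + b. The sketch below assumes a hypothetical, already-trained hyperplane; finding w and b (the training step) is not shown, and the class names and helper functions are illustrative.

```python
import math


def decision_value(w, x, b):
    """w . x + b: zero on the hyperplane, positive on one side, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    return "Class A" if decision_value(w, x, b) >= 0 else "Class B"


def distance_to_hyperplane(w, x, b):
    """Perpendicular distance from point x to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, x, b)) / norm


w, b = [1.0, 1.0], -4.0         # hypothetical hyperplane: x1 + x2 - 4 = 0
print(classify(w, [3.0, 3.0], b), classify(w, [1.0, 1.0], b))
print(distance_to_hyperplane(w, [3.0, 3.0], b))
```

The distance function shows what the margin measures: the support vectors are the training points for which this distance is smallest.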
8. Nonlinear SVM
Sometimes data cannot be separated using a straight line.
In such cases, SVM uses Kernel Functions.
9. Nonlinear Data Diagram
ooo
o o
xx
x x
Straight line cannot separate the classes properly.
10. Kernel Functions
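Kernel functions implicitly map low-dimensional data into a higher-dimensional space
where separation becomes easier. The kernel computes the similarity between two points
as if they were already in that space, so the mapped coordinates never have to be
calculated explicitly.
This technique is called the Kernel Trick.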
11. Kernel Trick Diagram
Nonlinear Data
|
v
Kernel Transformation
|
v
Higher Dimensional Space
|
v
Linear Separation
12. Types of Kernel Functions
(a) Linear Kernel
Used for linearly separable data.
Equation:
$$K(x_i, x_j) = x_i \cdot x_j$$
where $x_i$ and $x_j$ are two input vectors.
Features
● Simple
● Fast
● Suitable for text classification
13. Polynomial Kernel
Used for curved decision boundaries.
Equation:
$$K(x_i, x_j) = (x_i \cdot x_j + c)^d$$
Where:
● $c$ = constant
● $d$ = polynomial degree
14. Polynomial Kernel Diagram
ooo
----------- Curved Boundary -----------
xxx
15. Radial Basis Function (RBF) Kernel
Most commonly used nonlinear kernel.
Equation:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$
Where:
● $\gamma$ controls influence of data points
Advantages
● Handles complex data
● High accuracy
16. Sigmoid Kernel
Inspired by neural networks.
Equation:
$$K(x_i, x_j) = \tanh(\alpha \, x_i \cdot x_j + c)$$
Where:
● $\alpha$ = slope parameter
● $c$ = constant
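The four kernels (linear, polynomial, RBF, and sigmoid) can be written as small Python functions that each take two input vectors. The default parameter values below are arbitrary assumptions for illustration, not recommended settings.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_kernel(a, b):
    return dot(a, b)


def polynomial_kernel(a, b, c=1.0, degree=2):
    return (dot(a, b) + c) ** degree


def rbf_kernel(a, b, gamma=0.5):
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(a, b, alpha=0.1, c=0.0):
    return math.tanh(alpha * dot(a, b) + c)


p, q = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(p, q), polynomial_kernel(p, q))
print(rbf_kernel(p, q), sigmoid_kernel(p, q))
```

Each function returns a single similarity score. The RBF kernel gives 1 for identical points and falls toward 0 as points move apart, with a larger gamma making the fall-off faster.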
17. Working of SVM
Steps
1. Input training data
2. Select kernel function
3. Find support vectors
4. Construct optimal hyperplane
5. Predict new data classes
18. SVM Training Flowchart
Start
|
Input Dataset
|
Choose Kernel Function
|
Find Optimal Hyperplane
|
Identify Support Vectors
|
Classify Data
|
Stop
19. Advantages of SVM
1. High accuracy
2. Effective in high-dimensional data
3. Works for linear and nonlinear problems
4. Memory efficient
5. Good generalization capability
20. Disadvantages of SVM
1. Slow for large datasets
2. Complex parameter tuning
3. Difficult interpretation
4. Sensitive to noise
21. Applications of SVM
Real-World Applications
1. Face detection
2. Image classification
3. Handwriting recognition
4. Bioinformatics
5. Spam filtering
6. Medical diagnosis
22. Comparison: Linear vs Nonlinear SVM
Feature         | Linear SVM         | Nonlinear SVM
Data Type       | Linearly separable | Nonlinear data
Boundary        | Straight line      | Curved boundary
Complexity      | Simple             | Complex
Kernel Required | No                 | Yes
23. Real-Time Example
Email Spam Detection
Input Email
|
Feature Extraction
|
Support Vector Machine
|
Spam / Not Spam
SVM classifies emails based on learned patterns.
24. Conclusion
Support Vector Machines are powerful supervised learning algorithms used for
both linear and nonlinear classification problems. SVM works by finding the
optimal hyperplane that maximizes the margin between classes.
For nonlinear data, kernel functions such as Polynomial, RBF, and Sigmoid
kernels help transform data into higher-dimensional space for effective separation.
Due to their high accuracy and strong performance, SVMs are widely used in
machine learning, computer vision, healthcare, and text classification systems.
----------------------------------------------------------
K-Nearest Neighbours (KNN)
Introduction
K-Nearest Neighbours (KNN) is one of the simplest and most widely used
supervised learning algorithms. It is mainly used for:
● Classification
● Regression
● Pattern recognition
KNN predicts the output based on the nearest neighboring data points.
Examples:
● Recommendation systems
● Handwriting recognition
● Medical diagnosis
● Image classification
1. What is K-Nearest Neighbours?
KNN is a supervised learning algorithm that classifies new data points based on
similarity with existing data.
The algorithm stores all training data and predicts the class of new data using
nearest neighbors.
2. Basic Idea of KNN
The new data point is assigned to the class most common among its nearest
neighbors.
● K represents the number of nearest neighbors.
● Distance is used to identify similarity.
3. KNN Working Principle
Steps
1. Choose value of K
2. Calculate distance from new point to all training points
3. Select K nearest neighbors
4. Find majority class
5. Assign class to new data point
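The five steps above can be sketched as a short function using Euclidean distance and majority voting. The function name `knn_classify` and the sample points are assumptions for illustration; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(training_points, training_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    distances = []
    for point, label in zip(training_points, training_labels):
        # distance from the new point to every training point
        distances.append((math.dist(point, new_point), label))
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]   # K nearest neighbors
    return Counter(nearest_labels).most_common(1)[0][0]      # majority class


points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, (2, 2)))
print(knn_classify(points, labels, (6, 5)))
```

Note that all the work happens at prediction time: the training data is only stored, and every new point is compared against all of it.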
4. KNN Classification Diagram
Class A (o) Class B (x)
o o
● ← New Point
x x
If most nearby points are from Class B, the new point is classified as Class B.
5. Choosing the Value of K
● Small K → Sensitive to noise
● Large K → More stable, but may smooth over genuine class boundaries
Common values (odd, so that a two-class vote cannot end in a tie):
● 3
● 5
● 7
6. Example of KNN
Suppose:
● K=3
● Among nearest neighbors:
○ 2 belong to Class A
○ 1 belongs to Class B
Then:
● New point → Class A
7. Distance Metrics in KNN
KNN uses distance measures to identify nearest neighbors.
8. Euclidean Distance
Most commonly used distance metric.
$$d(A, B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
Where:
● $A(x_1, y_1)$, $B(x_2, y_2)$ = data points
9. Euclidean Distance Diagram
A(x1,y1) *
\
\
\ Distance
\
*
B(x2,y2)
10. Manhattan Distance
Measures distance along grid paths.
$$d(A, B) = |x_2 - x_1| + |y_2 - y_1|$$
11. Minkowski Distance
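Generalized distance metric.
$$d(A, B) = \left( |x_2 - x_1|^p + |y_2 - y_1|^p \right)^{1/p}$$
With $p = 1$ it reduces to Manhattan distance, and with $p = 2$ to Euclidean distance.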
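The three distance metrics can be compared side by side with the sketch below, which works for points with any number of coordinates. The function names are illustrative.

```python
def euclidean(a, b):
    """Straight-line distance between points a and b."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


def manhattan(a, b):
    """Sum of absolute coordinate differences (distance along grid paths)."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def minkowski(a, b, p):
    """General form: p = 1 gives Manhattan distance, p = 2 gives Euclidean."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)


A, B = (0, 0), (3, 4)
print(euclidean(A, B), manhattan(A, B))
print(minkowski(A, B, 1), minkowski(A, B, 2))
```

For the points (0, 0) and (3, 4), the straight-line distance is 5 while the grid-path distance is 7, and the Minkowski distance matches each of them for the corresponding value of p.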
12. KNN Algorithm Flowchart
Start
|
Load Training Data
|
Choose Value of K
|
Calculate Distance
|
Find K Nearest Neighbors
|
Majority Voting
|
Predict Class
|
Stop
13. KNN for Classification
Used when output is categorical.
Examples
● Spam / Not Spam
● Pass / Fail
● Disease / No Disease
14. KNN for Regression
KNN can also predict continuous values.
In regression:
● Average value of neighbors is calculated.
Example
● Predicting house prices
15. Decision Boundary in KNN
KNN creates nonlinear decision boundaries.
Diagram
ooooooo
ooooooo
------ Curved Boundary ------
xxxxxxx
xxxxxxx
16. Advantages of KNN
1. Simple and easy to understand
2. No training phase required
3. Handles multiclass problems
4. Works well with small datasets
5. Effective for nonlinear data
17. Disadvantages of KNN
1. Slow for large datasets
2. High memory usage
3. Sensitive to noise
4. Requires feature scaling
5. Choosing optimal K is difficult
18. Feature Scaling in KNN
Since KNN uses distance calculations, features should be normalized.
Common scaling methods:
● Min-Max Scaling
● Standardization
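Both scaling methods can be sketched for a single feature column as follows. Min-max scaling maps values to the range 0 to 1, and standardization shifts them to mean 0 and standard deviation 1. The function names and the sample weights are assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]      # constant feature: nothing to scale
    return [(v - lowest) / (highest - lowest) for v in values]


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


weights_in_grams = [100, 150, 200]
print(min_max_scale(weights_in_grams))
print(standardize(weights_in_grams))
```

Without scaling, a feature measured in large units (such as weight in grams) would dominate the distance calculation over a feature with a small numeric range.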
19. Real-Time Example
Fruit Classification
Weight + Color
|
v
KNN Algorithm
|
v
Apple / Orange
The algorithm compares features with nearby fruits and predicts the class.
20. Applications of KNN
Real-World Applications
1. Recommendation systems
2. Face recognition
3. Medical diagnosis
4. Pattern recognition
5. Image classification
6. Credit scoring
21. Comparison: KNN Classification vs Regression
Feature    | Classification | Regression
Output     | Category       | Continuous value
Prediction | Majority vote  | Average value
Example    | Spam detection | Price prediction
22. Practical Example
Suppose K = 5:
Neighbor | Class
1        | A
2        | A
3        | B
4        | A
5        | B
Majority class = A
Therefore:
● New point → Class A
23. Conclusion
K-Nearest Neighbours (KNN) is a simple yet powerful supervised learning
algorithm used for classification and regression tasks. It predicts outputs by
analyzing the nearest neighboring data points using distance metrics such as
Euclidean distance.
Because of its simplicity and effectiveness, KNN is widely used in
recommendation systems, healthcare, image recognition, and pattern classification
applications.
----------------------------------------------------------
Ensemble Learning: Bagging and Boosting
Introduction
Ensemble Learning is a machine learning technique in which multiple models are
combined to improve prediction accuracy and performance.
Instead of using a single model, ensemble learning combines several weak learners
to create a strong learner.
Ensemble methods are widely used in:
● Classification
● Regression
● Fraud detection
● Recommendation systems
● Medical diagnosis
The two major ensemble techniques are:
1. Bagging
2. Boosting
1. What is Ensemble Learning?
Ensemble learning combines predictions from multiple machine learning models to
produce better results.
Basic Idea
Model 1 \
Model 2 \
Model 3 ---> Combined Prediction
Model 4 /
Model 5 /
Advantages
● Higher accuracy
● Better generalization
● Reduced overfitting
● Improved robustness
2. Types of Ensemble Learning
Main ensemble techniques:
1. Bagging
2. Boosting
3. Stacking
This topic focuses on:
● Bagging
● Boosting
3. Bagging (Bootstrap Aggregating)
Bagging improves performance by training multiple models independently on
random subsets of data.
The final output is obtained by:
● Majority voting (classification)
● Averaging (regression)
4. Working of Bagging
Steps
1. Create multiple random datasets using bootstrap sampling
2. Train separate models
3. Combine outputs
4. Produce final prediction
5. Bagging Architecture Diagram
Original Dataset
|
-----------------------------------
| | |
Sample 1 Sample 2 Sample 3
| | |
Model 1 Model 2 Model 3
\ | /
\ | /
Final Prediction
6. Bootstrap Sampling
Bootstrap sampling means selecting random samples from the dataset with
replacement.
Some data points may appear multiple times.
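Bootstrap sampling can be sketched with the standard `random` module: each sample has the same size as the original dataset and is drawn with replacement. The seed and the helper name `bootstrap_sample` are assumptions; the seed only makes the sketch repeatable.

```python
import random


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items at random, with replacement."""
    return [rng.choice(dataset) for _ in dataset]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
data = [1, 2, 3, 4, 5, 6]
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for sample in samples:
    print(sample)
```

Because items are drawn with replacement, a sample usually contains duplicates and leaves out some of the original points, so each model in the ensemble sees a slightly different dataset.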
7. Majority Voting in Bagging
For classification:
● Final class = Most common prediction
Example
Model   | Prediction
Model 1 | A
Model 2 | A
Model 3 | B
Final Output = A
8. Random Forest – Example of Bagging
Random Forest is the most popular bagging-based algorithm.
It trains multiple decision trees on bootstrap samples, considering only a random subset of features at each split, and combines them by:
● Voting for classification
● Averaging for regression
9. Advantages of Bagging
1. Reduces variance
2. Reduces overfitting
3. Improves accuracy
4. Parallel training possible
5. Stable predictions
10. Disadvantages of Bagging
1. Increased computational cost
2. Complex model interpretation
3. Requires more memory
11. Boosting
Boosting is an ensemble technique where models are trained sequentially.
Each new model focuses on correcting errors made by previous models.
12. Working of Boosting
Steps
1. Train first weak learner
2. Identify errors
3. Increase weight of misclassified data
4. Train next learner
5. Combine all learners
13. Boosting Architecture Diagram
Dataset
|
Weak Learner 1
|
Errors Identified
|
Weak Learner 2
|
Errors Corrected
|
Weak Learner 3
|
Final Strong Learner
14. Key Idea of Boosting
Boosting converts weak learners into strong learners by focusing more on difficult
samples.
15. AdaBoost (Adaptive Boosting)
AdaBoost is a popular boosting algorithm.
It assigns higher importance to incorrectly classified points.
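One re-weighting round of the classic (discrete) AdaBoost update can be sketched as below, assuming labels coded as -1 and +1 and a weak classifier whose weighted error lies strictly between 0 and 1. The function name `adaboost_round` and the sample predictions are illustrative assumptions.

```python
import math


def adaboost_round(weights, actual, predicted):
    """One AdaBoost re-weighting step for labels in {-1, +1}."""
    # weighted error of the weak classifier (assumed to be between 0 and 1)
    error = sum(w for w, y, h in zip(weights, actual, predicted) if y != h)
    # alpha is the say this classifier gets in the final vote
    alpha = 0.5 * math.log((1.0 - error) / error)
    # y * h is +1 for correct predictions and -1 for mistakes,
    # so correct points shrink and misclassified points grow
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, actual, predicted)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


actual = [1, 1, -1, -1]
predicted = [1, 1, -1, 1]                # the last point is misclassified
alpha, new_weights = adaboost_round([0.25] * 4, actual, predicted)
print(alpha, new_weights)
```

After the update the single misclassified point carries half of the total weight, so the next weak classifier is pushed to get it right.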
16. AdaBoost Workflow Diagram
Training Data
|
Weak Classifier 1
|
Increase Weight of Errors
|
Weak Classifier 2
|
Increase Weight Again
|
Final Combined Classifier
17. Gradient Boosting
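Gradient Boosting builds trees sequentially. Each new tree is fitted to the errors left by the trees before it
(more precisely, to the negative gradient of the loss), so the ensemble moves step by step toward lower prediction error, in the spirit of gradient descent.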
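A minimal sketch of gradient boosting for regression is shown below, assuming squared-error loss (for which the negative gradient is simply the residual) and depth-1 trees, often called stumps, as weak learners. The number of rounds, the learning rate, and the function names are assumptions; this is far simpler than library implementations such as XGBoost.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split (a depth-1 regression tree) by squared error."""
    best = None
    points = sorted(set(xs))
    for left_x, right_x in zip(points, points[1:]):
        threshold = (left_x + right_x) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


experience = [1, 2, 3, 4]
salary = [30, 40, 50, 60]               # in thousands
model = gradient_boost(experience, salary)
baseline = sum(salary) / len(salary)
initial_mse = sum((y - baseline) ** 2 for y in salary) / len(salary)
final_mse = sum((y - model(x)) ** 2
                for x, y in zip(experience, salary)) / len(salary)
print(initial_mse, final_mse)
```

Each stump alone is a poor predictor, but adding many of them, each correcting what is left over, brings the training error well below that of the starting guess.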
18. XGBoost
XGBoost stands for:
● Extreme Gradient Boosting
Features:
● Fast
● Efficient
● Highly accurate
Widely used in:
● Competitions
● Data science projects
● Industry applications
19. Advantages of Boosting
1. High prediction accuracy
2. Handles complex datasets
3. Reduces bias
4. Effective for classification and regression
20. Disadvantages of Boosting
1. Training is slower
2. Sensitive to noise
3. Risk of overfitting
4. Difficult parameter tuning
21. Difference Between Bagging and Boosting
Feature     | Bagging         | Boosting
Training    | Parallel        | Sequential
Goal        | Reduce variance | Reduce bias
Data Weight | Equal           | Weighted
Overfitting | Less            | Higher risk
Example     | Random Forest   | AdaBoost, XGBoost
22. Bagging vs Boosting Diagram
Bagging:
Model 1
Model 2 ---> Parallel Learning ---> Combined Output
Model 3
Boosting:
Model 1 ---> Model 2 ---> Model 3 ---> Final Output
Sequential Learning
23. Applications of Ensemble Learning
Real-World Applications
1. Fraud detection
2. Medical diagnosis
3. Recommendation systems
4. Image recognition
5. Customer churn prediction
6. Stock market analysis
24. Real-Time Example
Loan Approval Prediction
Applicant Data
|
Ensemble Learning
(Bagging / Boosting)
|
Loan Approved / Rejected
Multiple models improve decision accuracy.
25. Conclusion
Ensemble Learning is a powerful machine learning approach that combines
multiple models to improve performance and prediction accuracy.
● Bagging reduces variance by training models independently.
● Boosting reduces bias by training models sequentially and correcting
previous mistakes.
Popular ensemble methods such as Random Forest, AdaBoost, and XGBoost are
widely used in modern machine learning applications because of their high
accuracy and robustness.
----------------------------------------------------------