New Unit 4 AIML Supervised Learning

The document covers key concepts in supervised learning, focusing on Linear Regression, Logistic Regression, and Decision Trees. It explains how these algorithms work, their applications, advantages, and disadvantages, as well as their mathematical foundations and optimization techniques. Additionally, it introduces Neural Networks and Multilayer Perceptron (MLP) as important models in machine learning.

Uploaded by

kiruthika1991

AIML unit 4

Supervised Learning – Linear Regression

Introduction

Machine Learning is a branch of Artificial Intelligence that enables computers to
learn from data without being explicitly programmed. One of the most important
categories of machine learning is Supervised Learning.

In supervised learning, the model is trained using labeled data, where both input
and output values are known. The algorithm learns the relationship between them
and predicts outputs for new inputs.

Among all supervised learning algorithms, Linear Regression is one of the
simplest and most widely used methods for prediction and analysis.

1. Supervised Learning

Supervised learning uses training data consisting of:

● Input variables (features)

● Output variables (labels)

The algorithm learns a mapping function from input to output.

Diagram of Supervised Learning

Training Data
+-----------------------+
| Input(X) Output(Y) |
+-----------------------+
|
v
Supervised Learning
Model
|
v
Predicted Output

Example
Hours Studied    Marks
2                35
4                50
6                70

The system learns how marks depend on study hours.

2. What is Linear Regression?

Linear Regression is a supervised learning algorithm used to predict continuous
numerical values.

It finds the best fit straight line between input and output variables.

Examples:

● Predicting house prices

● Predicting salary
● Predicting temperature
● Predicting sales revenue

3. Linear Regression Equation

The mathematical equation of linear regression is:

$$\hat{y} = mx + c$$

Where:

● $\hat{y}$ = Predicted output
● x = Input feature
● m = Slope of line
● c = Intercept

4. Best Fit Line

The main goal of linear regression is to find the line that best represents the data.

Diagram of Best Fit Line

Price(Y)
^
|
| *
| *
| *
| *
| *
| *
| *
|__________________________________> Size(X)

-------- Best Fit Line --------

Explanation

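● Dots represent actual data points.
● The straight line represents the predicted relationship.
● The line is chosen to minimize the overall prediction error, measured as the sum of squared differences between actual and predicted values.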

5. Working of Linear Regression

The working process includes:

1. Collect training data
2. Analyze the relationship between variables
3. Fit a straight line
4. Predict future values

6. Types of Linear Regression

(a) Simple Linear Regression

Uses only one independent variable.

Example

● Experience → Salary

Equation:

$$y = \beta_0 + \beta_1 x$$

Where:

● $\beta_0$ = intercept
● $\beta_1$ = coefficient (slope)

(b) Multiple Linear Regression

Uses multiple independent variables.

Example

● House size
● Number of rooms
● Location

to predict house price.

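Equation:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

where $x_1, \dots, x_n$ are the input features and $\beta_1, \dots, \beta_n$ are their coefficients.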

7. Cost Function in Linear Regression

The algorithm tries to minimize prediction error using a Cost Function.

The most commonly used cost function is Mean Squared Error (MSE).

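$$J = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

Where:

● J = cost/error
● $\hat{y}_i$ = predicted value
● $y_i$ = actual value
● n = number of training examples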

8. Error Representation

The difference between actual value and predicted value is called error.

Error Diagram
Actual Point
*
|\
|\
Error | \
| \
| \
| \
*------\----------------
Regression Line

Smaller error means better prediction.

9. Gradient Descent

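Gradient Descent is an iterative optimization technique used to minimize the cost function.

At each step it moves the slope and intercept a small amount (controlled by the learning rate) in the direction that reduces the error, and it repeats until the error stops decreasing.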
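The update rule can be sketched in plain Python. This is a minimal illustration, assuming the MSE cost defined above and the salary data from the Real-Time Example in this section; the learning rate (0.05) and the number of iterations (5000) are illustrative choices, not values given in the text.

```python
def mse(xs, ys, m, c):
    """Mean squared error of the line y_hat = m*x + c."""
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, lr=0.05, epochs=5000):
    """Repeatedly step m and c against the gradient of the MSE."""
    m, c = 0.0, 0.0                      # start from a flat line through the origin
    n = len(xs)
    for _ in range(epochs):
        errors = [m * x + c - y for x, y in zip(xs, ys)]           # y_hat - y
        grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))  # dJ/dm
        grad_c = (2 / n) * sum(errors)                             # dJ/dc
        m -= lr * grad_m
        c -= lr * grad_c
    return m, c

experience = [1, 2, 3, 4]
salary = [30000, 40000, 50000, 60000]
m, c = gradient_descent(experience, salary)
print(round(m), round(c))    # 10000 20000
```

If the learning rate is too large the updates overshoot and diverge; if it is too small, many more iterations are needed.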

10. Gradient Descent Flowchart

11. Assumptions of Linear Regression

Linear regression works properly when the following assumptions are satisfied:

1. A linear relationship exists between the input and output variables
2. Data points are independent of each other
3. Errors (residuals) are normally distributed
4. Errors have constant variance (homoscedasticity)
5. No extreme outliers

12. Advantages of Linear Regression

● Simple and easy to implement

● Fast training process
● Easy interpretation
● Works well for linear data
● Useful for trend analysis

13. Disadvantages of Linear Regression

● Cannot handle complex nonlinear data

● Sensitive to outliers
● Assumes linear relationship
● Accuracy decreases with noisy data

14. Applications of Linear Regression

Linear Regression is widely used in real-world applications.

Applications

1. House price prediction

2. Weather forecasting
3. Stock market prediction
4. Sales forecasting
5. Risk analysis
6. Salary estimation

15. Real-Time Example

Suppose a company wants to predict employee salary based on years of experience.

Experience (years)    Salary (₹)
1                     30000
2                     40000
3                     50000
4                     60000

The regression equation becomes:

Salary = 10000 × Experience + 20000

If experience = 5 years:

Salary = 10000(5) + 20000

= 70000

Predicted Salary = ₹70,000
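For a single feature, the best fit line can also be computed directly with the ordinary least squares formulas, without iterating. A small sketch reproducing the numbers above (the helper name `fit_line` is illustrative):

```python
def fit_line(xs, ys):
    """Closed-form least squares for one feature:
    m = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2),  c = y_mean - m * x_mean
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    m = num / den
    c = y_mean - m * x_mean
    return m, c

experience = [1, 2, 3, 4]
salary = [30000, 40000, 50000, 60000]
m, c = fit_line(experience, salary)
print(m, c)         # 10000.0 20000.0
print(m * 5 + c)    # 70000.0
```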

16. Linear Regression Architecture

Overall Process Diagram

Input Data
|
v
Linear Regression
Model
|
v
Best Fit Equation
|
v
Predicted Output

17. Conclusion

Linear Regression is one of the most important supervised learning algorithms
used for predicting continuous numerical values. It establishes a linear relationship
between input and output variables using a best fit line. The algorithm minimizes
prediction error using cost functions and optimization techniques like gradient
descent.

Due to its simplicity, efficiency, and interpretability, linear regression is widely
used in machine learning, statistics, economics, and business analytics.

—---------------------------------------------------------

Logistic Regression

Introduction

Logistic Regression is a supervised learning classification algorithm used to
predict categorical outcomes. Unlike Linear Regression, which predicts continuous
values, Logistic Regression predicts the probability of a class.

It is mainly used for:

● Binary classification
● Probability estimation
● Decision making

Examples:

● Email spam detection

● Disease prediction
● Pass or fail prediction
● Fraud detection

1. Supervised Learning in Logistic Regression

In supervised learning, the model is trained using labeled data.

Structure

Input Data → Logistic Regression Model → Predicted Class

Example

Study Hours    Result
2              Fail
4              Fail
6              Pass
8              Pass

The algorithm learns the relationship between study hours and exam result.

2. What is Logistic Regression?

Logistic Regression predicts the probability that an input belongs to a particular
category.

Output values are generally:

● 0 → Negative class
● 1 → Positive class

Example:

● 0 = No disease
● 1 = Disease

3. Sigmoid Function

Logistic Regression uses the Sigmoid Function to convert linear output into
probability.

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Where:

● $\sigma(z)$ = probability output
● e = exponential constant (≈ 2.718)
● z = linear equation (the weighted sum of the inputs)
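A direct translation of σ(z) into Python. The two-branch form is a standard rewrite (an implementation choice, not part of the formula) that keeps `math.exp` from overflowing for large negative z.

```python
import math

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # z < 0, so e^z is small and cannot overflow
    return ez / (1.0 + ez)

for z in (-4, 0, 4):
    print(z, round(sigmoid(z), 3))
```

sigmoid(0) is exactly 0.5, which is why 0.5 is the natural threshold between the two classes.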

4. Linear Equation in Logistic Regression

The linear equation is:

$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

This value is passed into the sigmoid function.

5. Sigmoid Curve Diagram

Sigmoid Function Curve

Probability
1.0 |                         ********
    |                     ****
0.8 |                  ***
    |                ***
0.6 |              ***
    |            ***
0.5 |----------***
    |        ***
0.3 |      ***
    |    ***
0.0 |****____________________________
               Input Value (z)

Explanation

● Output ranges between 0 and 1

● Used for probability prediction
● S-shaped curve

6. Working of Logistic Regression

The steps involved are:

1. Collect training data

2. Apply linear equation
3. Pass result through sigmoid function
4. Generate probability
5. Classify output

7. Classification Process

Diagram
Input Features
      |
      v
Linear Equation
      |
      v
Sigmoid Function
      |
      v
Probability Output
      |
      v
Class Prediction

8. Decision Boundary

A threshold value is used for classification.

Usually:

● Probability ≥ 0.5 → Class 1

● Probability < 0.5 → Class 0

Decision Boundary Diagram

Probability
1.0 |
    |          Class 1 (P ≥ 0.5)
0.5 |------------------------------  Threshold (0.5)
    |          Class 0 (P < 0.5)
0.0 +------------------------------>

9. Cost Function

Logistic Regression uses Log Loss or Cross Entropy Loss.

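$$J = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

Where:

● $y_i$ = actual output
● $\hat{y}_i$ = predicted probability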
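The log loss formula above can be computed as follows. This is a sketch; clipping probabilities to [eps, 1 - eps] with eps = 1e-15 is an assumed safeguard against log(0), not part of the formula.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy.

    Each probability is clipped to [eps, 1 - eps] so that log(0) is never evaluated.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Confident, correct predictions cost little; confident, wrong ones cost a lot.
print(log_loss([1, 0], [0.9, 0.1]))   # about 0.105
print(log_loss([1, 0], [0.1, 0.9]))   # about 2.303
```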

10. Gradient Descent

Gradient Descent is used to minimize the cost function.

Steps

1. Initialize parameters
2. Compute predictions
3. Calculate error
4. Update weights
5. Repeat until minimum error

11. Flowchart of Logistic Regression

Start
  |
  v
Input Training Data
  |
  v
Apply Linear Equation
  |
  v
Apply Sigmoid Function
  |
  v
Calculate Error
  |
  v
Update Weights
  |
  v
Prediction Output
  |
  v
Stop

12. Types of Logistic Regression

(a) Binary Logistic Regression

Used for two classes.

Example:

● Yes / No
● Pass / Fail

(b) Multinomial Logistic Regression

Used for more than two categories.

Example:

● Predicting grades:
○ A, B, C

(c) Ordinal Logistic Regression

Used for ordered categories.

Example:

● Low, Medium, High

13. Advantages of Logistic Regression

● Simple and easy to implement

● Efficient for classification
● Provides probability output
● Fast training
● Works well with linearly separable data

14. Disadvantages of Logistic Regression

● Cannot model complex nonlinear relationships
● Sensitive to outliers
● Requires large datasets for better accuracy
● Assumes a linear relationship between the input features and the log-odds of the outcome

15. Applications of Logistic Regression

Real-World Applications

1. Spam email detection

2. Medical diagnosis
3. Credit card fraud detection
4. Customer churn prediction
5. Sentiment analysis
6. Admission prediction

16. Real-Time Example

Suppose a student’s exam result depends on study hours.

Hours Studied    Result
1                Fail
2                Fail
5                Pass
7                Pass

If the predicted probability is:

P(Pass) = 0.85

Since probability > 0.5,

Prediction = Pass
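The whole pipeline (linear equation, sigmoid, log loss, gradient descent, 0.5 threshold) can be sketched on the study-hours table above. This is a minimal sketch: w plays the role of $\beta_1$ and b of $\beta_0$, and the learning rate (0.1) and iteration count (20000) are illustrative assumptions. The 0.85 probability quoted above is the text's example value and is not reproduced here.

```python
import math

def sigmoid(z):
    # Numerically stable form of 1 / (1 + e^(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(xs, ys, lr=0.1, epochs=20000):
    """Fit P(Pass | x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # With a sigmoid output and log loss, dLoss/dz simplifies to (p - y).
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

hours = [1, 2, 5, 7]
passed = [0, 0, 1, 1]        # 0 = Fail, 1 = Pass (table above)
w, b = train_logistic(hours, passed)

for h in (1.5, 6):
    p = sigmoid(w * h + b)
    label = "Pass" if p >= 0.5 else "Fail"   # 0.5 threshold from the decision boundary
    print(f"{h} hours: P(Pass) = {p:.2f} -> {label}")
```

Because this tiny dataset is perfectly separable, the weights keep growing the longer training runs; real implementations usually add regularization to prevent that.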

17. Comparison: Linear vs Logistic Regression

Feature         Linear Regression     Logistic Regression
Output          Continuous values     Categorical values
Purpose         Prediction            Classification
Graph           Straight line         Sigmoid curve
Output Range    Any value             0 to 1

18. Conclusion

Logistic Regression is an important supervised learning algorithm mainly used for
classification problems. It predicts probabilities using the sigmoid function and
classifies data into categories using a decision boundary.

Because of its simplicity, efficiency, and interpretability, Logistic Regression is
widely used in machine learning, healthcare, finance, marketing, and data
analytics.

—---------------------------------------------------------

Decision Trees: Classification and Regression Trees (CART)

Introduction

Decision Trees are one of the most popular supervised learning algorithms used for
both:

● Classification problems
● Regression problems

A Decision Tree works like a flowchart where:

● Each internal node represents a test condition

● Each branch represents the outcome
● Each leaf node represents the final prediction

The most common form is called:

CART – Classification and Regression Trees

1. What is a Decision Tree?

A Decision Tree divides data into smaller subsets based on conditions.

It creates a tree-like structure for decision making.

2. Components of Decision Tree

(a) Root Node

● Starting point of the tree

● Represents the complete dataset

(b) Decision Node

● Splits data into branches

(c) Leaf Node

● Final output or prediction

(d) Branch

● Connection between nodes

3. Working of Decision Tree

The algorithm:

1. Selects best feature

2. Splits dataset
3. Repeats splitting recursively
4. Stops when condition is satisfied

4. Classification Tree

A Classification Tree is used when output is categorical.

Examples

● Yes / No
● Spam / Not Spam
● Pass / Fail

5. Classification Tree Diagram

Example: Student Pass Prediction

        Study Hours > 5?
          /         \
        Yes          No
         |            |
       Pass         Fail

Explanation
● If study hours > 5 → Pass
● Otherwise → Fail

6. Regression Tree

Regression Trees are used when the output is a continuous numerical value.

Examples

● House price prediction

● Temperature prediction
● Salary estimation

7. Regression Tree Diagram

Example: House Price Prediction

8. CART (Classification and Regression Trees)

CART is a machine learning algorithm that supports:

● Classification
● Regression

It uses:

● Gini Index for classification

● Mean Squared Error (MSE) for regression

9. Gini Index in Classification

The Gini Index measures impurity in the dataset.

$$Gini = 1 - \sum_{i=1}^{C} p_i^2$$

Where:

● $p_i$ = probability of class i in the node
● C = number of classes

Properties

● Gini = 0 → Pure node

● Higher Gini → More impurity
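A sketch of how CART could use the Gini index to choose a split on one numeric feature, using the study-hours pass/fail data from earlier in this unit. Trying midpoints between neighbouring values as candidate thresholds is an assumption here (a common practice), and the helper names are illustrative.

```python
def gini(labels):
    """Gini = 1 - sum(p_i^2) over the classes present in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(xs, labels):
    """Try the midpoint between each pair of neighbouring x values and return
    (threshold, weighted Gini) for the split x <= t vs x > t with the lowest impurity."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for x, lab in zip(xs, labels) if x <= t]
        right = [lab for x, lab in zip(xs, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if best is None or score < best[1]:
            best = (t, score)
    return best

hours = [2, 4, 6, 8]
result = ["Fail", "Fail", "Pass", "Pass"]
print(gini(result))                   # 0.5: evenly mixed node
print(best_threshold(hours, result))  # (5.0, 0.0): splitting at 5 hours gives pure nodes
```

The selected threshold matches the "study hours > 5" rule in the classification tree example.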

10. Entropy in Decision Trees

Entropy measures randomness (impurity) in data.

$$Entropy(S) = -\sum_{i=1}^{C} p_i \log_2 p_i$$

Interpretation

● Lower entropy in the resulting subsets → Better split
● Higher entropy → More disorder

11. Information Gain

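Information Gain measures the reduction in entropy after splitting on a feature A.

$$IG(S, A) = Entropy(S) - \sum_{v} \frac{|S_v|}{|S|}\, Entropy(S_v)$$

where $S_v$ is the subset of S in which feature A takes value v.

The feature with the highest information gain is selected for splitting.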
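Entropy and information gain can be computed as in the sketch below (helper names are illustrative). The labels reuse the Play / No Play outcomes of the weather example: a split that separates the classes perfectly gains 1 bit, and a split that leaves both halves mixed gains nothing.

```python
import math

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)) over the classes present."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children if ch)
    return entropy(parent) - weighted

parent = ["Play", "Play", "No Play", "No Play"]
print(entropy(parent))                                                       # 1.0
print(information_gain(parent, [["Play", "Play"], ["No Play", "No Play"]]))  # 1.0
print(information_gain(parent, [["Play", "No Play"], ["Play", "No Play"]]))  # 0.0
```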

12. Decision Tree Building Process

13. Tree Splitting Example

Weather Prediction

            Weather?
          /    |     \
      Sunny  Rainy  Cloudy
        |      |       |
      Play  No Play   Play

14. Advantages of Decision Trees

1. Simple and easy to understand

2. Easy visualization
3. Handles both numerical and categorical data
4. Requires less data preprocessing
5. Supports classification and regression

15. Disadvantages of Decision Trees

1. Overfitting problem
2. Sensitive to noisy data
3. Large trees become complex
4. Lower accuracy compared to ensemble methods

16. Overfitting in Decision Trees

Overfitting occurs when the model learns the training data too closely, including its noise, and therefore performs poorly on new data.

Overfitting Diagram
Training Accuracy → High
Testing Accuracy → Low

Solution

● Tree pruning
● Limiting the maximum depth
● Setting a minimum number of samples required to split a node

17. Pruning in Decision Trees

Pruning removes branches that add little predictive value, making the tree simpler.

Pruning Diagram

Before Pruning          After Pruning

      A                       A
     / \                     / \
    B   C                   B   C
   / \
  D   E

Pruning improves generalization.

18. Applications of Decision Trees

Real-World Applications

1. Medical diagnosis
2. Fraud detection
3. Loan approval
4. Customer segmentation
5. Weather prediction
6. Recommendation systems

19. Comparison: Classification vs Regression Trees

Feature     Classification Tree    Regression Tree
Output      Categorical            Continuous
Example     Pass/Fail              Price Prediction
Criteria    Gini / Entropy         MSE
Goal        Class prediction       Numerical prediction

20. Real-Time Example

Loan Approval System

21. Conclusion

Decision Trees are powerful supervised learning algorithms used for both
classification and regression tasks. CART (Classification and Regression Trees)
builds a tree-like structure using recursive splitting techniques.

Because of their simplicity, interpretability, and ability to handle different types of
data, Decision Trees are widely used in machine learning, business analytics,
healthcare, and finance.

—----------------------------------------------------------
Neural Networks and Multilayer Perceptron (MLP)

Introduction

Artificial Neural Networks (ANNs) are computational models inspired by the
structure and working of the human brain. Neural networks are widely used in
machine learning and artificial intelligence for solving complex problems such as:

● Image recognition
● Speech recognition
● Pattern classification
● Prediction systems

A Multilayer Perceptron (MLP) is one of the most important types of neural
networks.

1. What is a Neural Network?

A Neural Network consists of interconnected processing units called neurons.

Each neuron:

● Receives input
● Processes data
● Produces output

2. Biological Neuron vs Artificial Neuron

Biological Neuron
Dendrites → Cell Body → Axon

Artificial Neuron
Inputs → Weights → Summation → Activation Function → Output

3. Structure of Artificial Neural Network

A neural network contains:

1. Input Layer
2. Hidden Layer(s)
3. Output Layer

4. Neural Network Architecture

Basic ANN Diagram

Input Layer      Hidden Layer      Output Layer

x1 --------\
            \
x2 ----------( O ) --------\
            /               \
x3 --------/                 ( O ) ---- Output

Explanation

● Inputs are fed into the network

● Hidden layers process information
● Output layer gives prediction

5. Artificial Neuron Model

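The neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Where:

● $x_i$ = input
● $w_i$ = weight
● b = bias
● f = activation function
● y = output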

6. Working of Neural Network

Steps
1. Receive inputs
2. Multiply inputs with weights
3. Add bias
4. Apply activation function
5. Generate output
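The five steps above map onto a single function. A minimal sketch; the input values, weights and bias below are hypothetical numbers chosen only to make the arithmetic easy to follow (the weighted sum plus bias is 0.5).

```python
import math

def neuron(inputs, weights, bias, activation):
    """y = f(sum(w_i * x_i) + b)"""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias   # steps 2-3: weighted sum plus bias
    return activation(z)                                      # step 4: activation function

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [1.0, 2.0, 3.0]          # hypothetical inputs
w = [0.5, -0.25, 0.1]        # hypothetical weights
b = 0.2                      # hypothetical bias
print(neuron(x, w, b, sigmoid))   # step 5: output, sigmoid(0.5)
```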

7. Activation Functions

Activation functions introduce nonlinearity.

Common Activation Functions

1. Sigmoid
2. ReLU
3. Tanh
4. Softmax

8. Sigmoid Activation Function

The sigmoid function converts output into values between 0 and 1.

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

9. Sigmoid Function Diagram

Output
1.0 | *****
| ***
0.8 | ***
| ***
0.5 |-------------***
| ***
0.2 | ***
| ***
0.0 |***______________________
Input

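10. ReLU Activation Function

$$f(z) = \max(0, z)$$

ReLU passes positive inputs through unchanged and outputs 0 for negative inputs. It is widely used in deep learning.

Advantages

● Faster computation
● Reduces the vanishing gradient problem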
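The common activation functions listed above can be written in a few lines each (sigmoid follows the formula given earlier). In this sketch, subtracting the largest score inside softmax is a standard numerical safeguard; it does not change the result.

```python
import math

def relu(z):
    return max(0.0, z)

def tanh(z):
    return math.tanh(z)          # output in (-1, 1)

def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1.
    Subtracting the max score first keeps exp() from overflowing."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print([relu(z) for z in (-2.0, 0.0, 3.0)])   # [0.0, 0.0, 3.0]
print(softmax([1.0, 2.0, 3.0]))              # largest score gets the largest probability
```

Softmax is typically used in the output layer for multi-class problems, while ReLU is the usual choice for hidden layers.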

11. Multilayer Perceptron (MLP)

A Multilayer Perceptron is a feedforward neural network containing:

● One input layer

● One or more hidden layers
● One output layer

MLP uses supervised learning.

12. MLP Architecture

Multilayer Perceptron Diagram

Input Layer   Hidden Layer 1   Hidden Layer 2   Output

x1 --------\
            ( O ) ----\
x2 --------/           \
                        ( O ) ----\
x3 --------\           /           \
            ( O ) ----/            ( O )
x4 --------/

13. Feedforward Process

In MLP, data moves only in one direction:

● Input → Hidden → Output

This is called a Feedforward Neural Network.
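A feedforward pass can be sketched as a chain of fully connected layers. To show why a hidden layer matters, the weights here are hand-picked (not learned) so that a 2-2-1 network with ReLU hidden units computes XOR, a problem no single-layer perceptron can solve.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x):
    # Hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
    hidden = dense(x, [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu)
    # Output layer (linear): y = h1 - 2 * h2
    (y,) = dense(hidden, [[1.0, -2.0]], [0.0], lambda z: z)
    return y

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, mlp_forward(x))   # XOR: 0.0, 1.0, 1.0, 0.0
```

Data flows strictly Input → Hidden → Output; in a trained MLP the weights would come from backpropagation instead of being set by hand.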

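14. Backpropagation

Backpropagation is used for training MLPs.

It passes the prediction error backward from the output layer toward the input layer, uses the chain rule to compute how much each weight contributed to the error, and then adjusts the weights (typically with gradient descent) to reduce that error.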

15. Backpropagation Flow Diagram

16. Error Function

The network minimizes prediction error using loss functions.

Mean Squared Error

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

Where:

● $y_i$ = actual value
● $\hat{y}_i$ = predicted value

17. Learning Process in MLP

Training Steps

1. Initialize weights
2. Perform forward propagation
3. Calculate error
4. Apply backpropagation
5. Update weights
6. Repeat until minimum error
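The training steps above can be sketched for the smallest possible case: one sigmoid neuron learning the logical OR function with the MSE loss. The gradient is obtained with the chain rule, the same computation backpropagation repeats layer by layer in a full MLP. The OR data, learning rate (1.0) and epoch count (5000) are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def mse(data, w, b):
    return sum((predict(x, w, b) - y) ** 2 for x, y in data) / len(data)

def train_neuron(data, lr=1.0, epochs=5000):
    w, b = [0.0, 0.0], 0.0                               # 1. initialize weights
    n = len(data)
    for _ in range(epochs):                              # 6. repeat
        gw, gb = [0.0, 0.0], 0.0
        for x, y in data:
            p = predict(x, w, b)                         # 2. forward propagation
            # 3-4. error and chain rule: dE/dz = dE/dp * dp/dz = 2(p - y) * p(1 - p)
            delta = 2 * (p - y) * p * (1 - p)
            gw[0] += delta * x[0]
            gw[1] += delta * x[1]
            gb += delta
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]   # 5. update weights
        b -= lr * gb / n
    return w, b

OR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print("error before:", mse(OR, [0.0, 0.0], 0.0))    # 0.25
w, b = train_neuron(OR)
print("error after:", mse(OR, w, b))
print([round(predict(x, w, b)) for x, _ in OR])      # [0, 1, 1, 1]
```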

18. Advantages of Neural Networks

1. Learns complex patterns

2. High prediction accuracy
3. Handles nonlinear problems
4. Supports parallel processing
5. Adaptive learning capability

19. Disadvantages of Neural Networks

1. Requires large datasets

2. High computational cost
3. Training takes more time
4. Difficult to interpret
5. Risk of overfitting

20. Applications of Neural Networks

Real-World Applications

1. Image recognition
2. Speech recognition
3. Medical diagnosis
4. Fraud detection
5. Self-driving cars
6. Natural language processing

21. Difference Between Single Layer and Multilayer Perceptron

Feature            Single Layer Perceptron        Multilayer Perceptron
Hidden Layer       No                             Yes
Complexity         Simple                         Complex
Learning Ability   Linearly separable problems    Nonlinear problems
Accuracy           Lower                          Higher

22. Real-Time Example

Handwritten Digit Recognition

Input Image
|
Neural Network
|
Hidden Layers
|
Predicted Digit

The network learns patterns of handwritten digits and predicts the correct number.

23. Conclusion

Neural Networks are powerful machine learning models inspired by the human
brain. A Multilayer Perceptron (MLP) is a feedforward neural network with one
or more hidden layers, capable of solving complex nonlinear problems.

Using activation functions, forward propagation, and backpropagation, MLPs can
learn patterns from data effectively. Neural networks are widely used in artificial
intelligence, healthcare, robotics, finance, and computer vision applications.

—----------------------------------------------------------

Support Vector Machines (SVM): Linear and Nonlinear Kernel Functions

Introduction

Support Vector Machine (SVM) is a powerful supervised learning algorithm used
for:

● Classification
● Regression
● Pattern recognition

SVM is mainly used for solving classification problems by finding the best
separating boundary between classes.

Examples:

● Face recognition
● Text classification
● Spam detection
● Image classification

1. What is Support Vector Machine?

Support Vector Machine finds the optimal hyperplane that separates data into
different classes.

The goal is to maximize the margin between classes.

2. Basic Concept of SVM

Hyperplane

A hyperplane is a decision boundary separating different classes.

For 2D data:

● Hyperplane is a straight line

For 3D data:

● Hyperplane is a plane

3. SVM Classification Diagram

Class A o o o
\
\
---------------\---------------- Hyperplane
\
\
Class B x x x

Explanation

● Circles and crosses are two classes

● SVM finds the best boundary between them

4. Support Vectors

Support vectors are the data points closest to the hyperplane.

They are very important because they determine the position of the hyperplane.

5. Margin in SVM

Margin is the distance between:

● Hyperplane
● Closest data points

SVM tries to maximize this margin.

Margin Diagram
o o o Margin x x x
|<------------------->|
---------------- Hyperplane ----------------

Larger margin → Better generalization

6. Linear SVM

Linear SVM is used when data is linearly separable.

Linear Separation Diagram

o o o

----------------------- Straight Line

x x x

The data can be separated using a straight line.

7. Linear SVM Equation

The hyperplane equation is:

$$w \cdot x + b = 0$$

Where:

● w = weight vector
● x = input vector
● b = bias
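Once w and b are known, classification only needs the sign of w · x + b, and the geometric distance of a point from the hyperplane is |w · x + b| / ||w||. In this sketch the hyperplane x1 + x2 - 4 = 0 is hypothetical (not a trained SVM), the +1 / -1 class labels follow a common SVM convention, and assigning points exactly on the plane to +1 is an arbitrary choice.

```python
import math

def decision(w, x, b):
    """w . x + b: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    return 1 if decision(w, x, b) >= 0 else -1

def distance_to_hyperplane(w, x, b):
    """Geometric distance |w . x + b| / ||w||."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision(w, x, b)) / norm

w, b = [1.0, 1.0], -4.0          # hypothetical hyperplane x1 + x2 - 4 = 0
for point in ([1.0, 1.0], [3.0, 3.0]):
    print(point, classify(w, point, b), round(distance_to_hyperplane(w, point, b), 3))
```

The support vectors are the training points with the smallest such distance, and SVM training chooses w and b to make that smallest distance (the margin) as large as possible.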

8. Nonlinear SVM

Sometimes data cannot be separated using a straight line.

In such cases, SVM uses Kernel Functions.

9. Nonlinear Data Diagram

ooo
o o

xx
x x

Straight line cannot separate the classes properly.

10. Kernel Functions

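Kernel functions let SVM separate data as if it had been mapped into a
higher-dimensional space, where separation becomes easier. The kernel computes
the inner product of two points in that space directly, without ever performing
the mapping explicitly. This technique is called the Kernel Trick.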
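The kernel trick can be checked numerically. For 2-D inputs, the degree-2 kernel $(x \cdot z)^2$ equals the ordinary dot product after the explicit mapping $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, because $(x \cdot z)^2 = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2$. The sample points below are arbitrary.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without ever building phi(x)."""
    return dot(x, z) ** 2

x, z = [1.0, 2.0], [3.0, 4.0]
print(dot(phi(x), phi(z)))   # inner product in the 3-D feature space (about 121)
print(poly_kernel(x, z))     # same value from the 2-D inputs: 121.0
```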

11. Kernel Trick Diagram

Nonlinear Data
|
v
Kernel Transformation
|
v
Higher Dimensional Space
|
v
Linear Separation

12. Types of Kernel Functions

(a) Linear Kernel

Used for linearly separable data.

Equation:

$$K(x_i, x_j) = x_i \cdot x_j$$

Features

● Simple
● Fast
● Suitable for text classification

13. Polynomial Kernel

Used for curved decision boundaries.

Equation:

$$K(x_i, x_j) = (x_i \cdot x_j + c)^d$$

Where:

● c = constant
● d = polynomial degree

14. Polynomial Kernel Diagram
ooo

----------- Curved Boundary -----------

xxx

15. Radial Basis Function (RBF) Kernel

Most commonly used nonlinear kernel.

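Equation:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$

Where:

● γ controls the influence of each data point (a larger γ gives each point a shorter reach)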

Advantages

● Handles complex data

● High accuracy

16. Sigmoid Kernel

Inspired by neural networks.

Equation:

$$K(x_i, x_j) = \tanh(\alpha\, x_i \cdot x_j + c)$$

Where:

● α = slope parameter
● c = constant
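The four kernels above translate directly into Python. This is a sketch: the default parameter values (c = 1, d = 2, γ = 0.5, α = 0.01, sigmoid constant 0) are illustrative assumptions, and in practice they are tuned on validation data.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, c=1.0, d=2):
    return (dot(x, z) + c) ** d

def rbf_kernel(x, z, gamma=0.5):
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, z, alpha=0.01, c=0.0):
    return math.tanh(alpha * dot(x, z) + c)

x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z))        # 11.0
print(polynomial_kernel(x, z))    # (11 + 1)^2 = 144.0
print(rbf_kernel(x, x))           # 1.0: identical points are maximally similar
print(rbf_kernel(x, z))           # exp(-0.5 * 8), a small value
print(sigmoid_kernel(x, z))
```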

17. Working of SVM

Steps

1. Input training data

2. Select kernel function
3. Find support vectors
4. Construct optimal hyperplane
5. Predict new data classes

18. SVM Training Flowchart

19. Advantages of SVM

1. High accuracy
2. Effective in high-dimensional data
3. Works for linear and nonlinear problems
4. Memory efficient
5. Good generalization capability

20. Disadvantages of SVM

1. Slow for large datasets

2. Complex parameter tuning
3. Difficult interpretation
4. Sensitive to noise

21. Applications of SVM

Real-World Applications

1. Face detection
2. Image classification
3. Handwriting recognition
4. Bioinformatics
5. Spam filtering
6. Medical diagnosis

22. Comparison: Linear vs Nonlinear SVM

Feature            Linear SVM            Nonlinear SVM
Data Type          Linearly separable    Nonlinear data
Boundary           Straight line         Curved boundary
Complexity         Simple                Complex
Kernel Required    No                    Yes

23. Real-Time Example

Email Spam Detection

Input Email
|
Feature Extraction
|
Support Vector Machine
|
Spam / Not Spam

SVM classifies emails based on learned patterns.

24. Conclusion

Support Vector Machines are powerful supervised learning algorithms used for
both linear and nonlinear classification problems. SVM works by finding the
optimal hyperplane that maximizes the margin between classes.
For nonlinear data, kernel functions such as Polynomial, RBF, and Sigmoid
kernels help transform data into higher-dimensional space for effective separation.
Due to their high accuracy and strong performance, SVMs are widely used in
machine learning, computer vision, healthcare, and text classification systems.

—---------------------------------------------------------

K-Nearest Neighbours (KNN)

Introduction

K-Nearest Neighbours (KNN) is one of the simplest and most widely used
supervised learning algorithms. It is mainly used for:

● Classification
● Regression
● Pattern recognition

KNN predicts the output based on the nearest neighboring data points.

Examples:

● Recommendation systems
● Handwriting recognition
● Medical diagnosis
● Image classification

1. What is K-Nearest Neighbours?

KNN is a supervised learning algorithm that classifies new data points based on
similarity with existing data.

The algorithm stores all training data and predicts the class of new data using
nearest neighbors.

2. Basic Idea of KNN

The new data point is assigned to the class most common among its K nearest
neighbors.

● K represents the number of nearest neighbors.
● Distance is used to identify similarity.

3. KNN Working Principle

Steps

1. Choose the value of K
2. Calculate distance from new point to all training points
3. Select K nearest neighbors
4. Find majority class
5. Assign class to new data point
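The five steps can be sketched directly. The training points are hypothetical, Euclidean distance is used (via `math.dist`, available from Python 3.8), and ties in the vote are broken by whichever label `Counter.most_common` returns first.

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """train: list of (feature_vector, label) pairs."""
    # Step 2: distance from the new point to every training point (Euclidean)
    distances = [(math.dist(x, new_point), label) for x, label in train]
    # Step 3: keep the K closest
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Steps 4-5: majority class among those neighbours
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training points
train = [([1.0, 1.0], "A"), ([1.5, 2.0], "A"), ([2.0, 1.5], "A"),
         ([6.0, 6.0], "B"), ([6.5, 7.0], "B"), ([7.0, 6.5], "B")]
print(knn_classify(train, [2.0, 2.0]))   # A
print(knn_classify(train, [6.0, 6.5]))   # B
```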

4. KNN Classification Diagram

Class A (o) Class B (x)

o o

● ← New Point

x x

If most nearby points are from Class B, the new point is classified as Class B.

5. Choosing the Value of K

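● Small K → Sensitive to noise
● Large K → Smoother, more stable predictions, but class boundaries become blurred (risk of underfitting)

Common values (odd values avoid tied votes in two-class problems):

● 3
● 5
● 7

6. Example of KNN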

Suppose:

● K=3
● Among nearest neighbors:
○ 2 belong to Class A
○ 1 belongs to Class B

Then:

● New point → Class A

7. Distance Metrics in KNN

KNN uses distance measures to identify nearest neighbors.

8. Euclidean Distance

Most commonly used distance metric.

$$d(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$

Where:

● A, B = data points, with coordinates $a_i$ and $b_i$

9. Euclidean Distance Diagram

A(x1,y1) *
\
\
\ Distance
\
*
B(x2,y2)

10. Manhattan Distance

Measures distance along grid paths (the sum of absolute coordinate differences).

$$d(A, B) = \sum_{i=1}^{n} |a_i - b_i|$$

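11. Minkowski Distance

Generalized distance metric.

$$d(A, B) = \left(\sum_{i=1}^{n} |a_i - b_i|^p\right)^{1/p}$$

Setting p = 1 gives Manhattan distance; p = 2 gives Euclidean distance.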
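The three distance metrics in plain Python, using the 3-4-5 right triangle as a check (A and B are arbitrary sample points):

```python
def euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def minkowski(a, b, p):
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1 / p)

A, B = (0, 0), (3, 4)
print(euclidean(A, B))      # 5.0
print(manhattan(A, B))      # 7
print(minkowski(A, B, 1))   # 7.0, same as Manhattan
print(minkowski(A, B, 2))   # 5.0, same as Euclidean
```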

12. KNN Algorithm Flowchart

13. KNN for Classification

Used when output is categorical.

Examples

● Spam / Not Spam

● Pass / Fail
● Disease / No Disease

14. KNN for Regression

KNN can also predict continuous values.

In regression:

● Average value of neighbors is calculated.

Example
● Predicting house prices
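For regression, only the final step changes: the prediction is the average target of the K nearest neighbours instead of a majority vote. A minimal sketch on hypothetical house-size and price values (arbitrary units):

```python
import math

def knn_regress(train, new_point, k=3):
    """train: list of (feature_vector, target). Prediction = mean target of the K nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], new_point))[:k]
    return sum(target for _, target in nearest) / k

# Hypothetical data: house size -> price (arbitrary units)
houses = [([10.0], 50.0), ([12.0], 58.0), ([15.0], 70.0), ([20.0], 95.0), ([25.0], 120.0)]
print(knn_regress(houses, [13.0]))   # mean of 58, 70 and 50
```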

15. Decision Boundary in KNN

KNN creates nonlinear decision boundaries.

Diagram
ooooooo
ooooooo
------ Curved Boundary ------
xxxxxxx
xxxxxxx

16. Advantages of KNN

1. Simple and easy to understand
2. No explicit training phase (the training data is simply stored)
3. Handles multiclass problems
4. Works well with small datasets
5. Effective for nonlinear data

17. Disadvantages of KNN

1. Slow for large datasets

2. High memory usage
3. Sensitive to noise
4. Requires feature scaling
5. Choosing optimal K is difficult

18. Feature Scaling in KNN

Since KNN uses distance calculations, features should be scaled to comparable ranges; otherwise, features with large numeric ranges dominate the distance.

Common scaling methods:

● Min-Max Scaling
● Standardization
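Both scaling methods can be sketched as follows. The fruit weights are hypothetical, the standardization uses the population standard deviation, and returning zeros for a constant feature is an assumed convention to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale to [0, 1] with (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:                        # constant feature: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Zero mean, unit standard deviation: (v - mean) / std."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

weights_g = [120, 150, 180]       # hypothetical fruit weights in grams
print(min_max_scale(weights_g))   # [0.0, 0.5, 1.0]
print(standardize(weights_g))
```

The scaling parameters (min and max, or mean and standard deviation) should be computed on the training data and then reused unchanged for new points.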
19. Real-Time Example

Fruit Classification
Weight + Color
|
v
KNN Algorithm
|
v
Apple / Orange

The algorithm compares features with nearby fruits and predicts the class.

20. Applications of KNN

Real-World Applications

1. Recommendation systems
2. Face recognition
3. Medical diagnosis
4. Pattern recognition
5. Image classification
6. Credit scoring

21. Comparison: KNN Classification vs Regression

Feature       Classification      Regression
Output        Category            Continuous value
Prediction    Majority vote       Average value
Example       Spam detection      Price prediction

22. Practical Example

Suppose K = 5:
Neighbor    Class
1           A
2           A
3           B
4           A
5           B

Majority class = A

Therefore:

● New point → Class A

23. Conclusion

K-Nearest Neighbours (KNN) is a simple yet powerful supervised learning
algorithm used for classification and regression tasks. It predicts outputs by
analyzing the nearest neighboring data points using distance metrics such as
Euclidean distance.

Because of its simplicity and effectiveness, KNN is widely used in
recommendation systems, healthcare, image recognition, and pattern classification
applications.

—------------------------------------------------------------

Ensemble Learning: Bagging and Boosting

Introduction

Ensemble Learning is a machine learning technique in which multiple models are
combined to improve prediction accuracy and performance.
Instead of using a single model, ensemble learning combines several weak learners
to create a strong learner.

Ensemble methods are widely used in:

● Classification
● Regression
● Fraud detection
● Recommendation systems
● Medical diagnosis

The two major ensemble techniques are:

1. Bagging
2. Boosting

1. What is Ensemble Learning?

Ensemble learning combines predictions from multiple machine learning models to
produce better results.

Basic Idea
Model 1 \
Model 2 \
Model 3 ---> Combined Prediction
Model 4 /
Model 5 /

Advantages

● Higher accuracy
● Better generalization
● Reduced overfitting
● Improved robustness

2. Types of Ensemble Learning

Main ensemble techniques:

1. Bagging
2. Boosting
3. Stacking

This topic focuses on:

● Bagging
● Boosting

3. Bagging (Bootstrap Aggregating)

Bagging improves performance by training multiple models independently on
random subsets of data.

The final output is obtained by:

● Majority voting (classification)

● Averaging (regression)

4. Working of Bagging

Steps

1. Create multiple random datasets using bootstrap sampling

2. Train separate models
3. Combine outputs
4. Produce final prediction

5. Bagging Architecture Diagram

6. Bootstrap Sampling

Bootstrap sampling means selecting random samples from the dataset with
replacement.

Some data points may appear multiple times in a sample, while others may not appear at all.
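Bootstrap sampling takes only a few lines with Python's `random` module. In this sketch the data items are placeholders and the fixed seed (42) is there only to make runs repeatable; each sample has the same size as the original data and is drawn with replacement.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random, with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

rng = random.Random(42)      # fixed seed only so that runs are repeatable
data = ["d1", "d2", "d3", "d4", "d5"]
for i in range(3):
    print(f"Model {i + 1} trains on:", bootstrap_sample(data, rng))
```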

7. Majority Voting in Bagging

For classification:

● Final class = Most common prediction

Example
Model Prediction
Model 1 A
Model 2 A
Model 3 B

Final Output = A
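Combining the models' outputs is straightforward: majority voting for classification and averaging for regression. A sketch using the three predictions from the table above; the regression values are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Most common prediction (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Bagging for regression: mean of the models' outputs."""
    return sum(predictions) / len(predictions)

print(majority_vote(["A", "A", "B"]))    # A  (Model 1, Model 2, Model 3 from the table)
print(average([210.0, 190.0, 200.0]))    # 200.0 (hypothetical regression outputs)
```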

8. Random Forest – Example of Bagging

Random Forest is the most popular bagging algorithm.

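It combines multiple decision trees, each trained on a bootstrap sample and considering only a random subset of features at each split, and performs:

● Voting for classification
● Averaging for regression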

9. Advantages of Bagging

1. Reduces variance
2. Reduces overfitting
3. Improves accuracy
4. Parallel training possible
5. Stable predictions

10. Disadvantages of Bagging

1. Increased computational cost

2. Complex model interpretation
3. Requires more memory

11. Boosting

Boosting is an ensemble technique where models are trained sequentially.

Each new model focuses on correcting errors made by previous models.

12. Working of Boosting

Steps

1. Train first weak learner

2. Identify errors
3. Increase weight of misclassified data
4. Train next learner
5. Combine all learners
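Step 3 ("increase weight of misclassified data") can be made concrete with one reweighting round in the style of discrete (two-class) AdaBoost. In this sketch, which samples the weak learner gets wrong is hypothetical, and the formula assumes the weighted error is strictly between 0 and 1.

```python
import math

def adaboost_round(weights, correct):
    """One AdaBoost reweighting step.
    weights: current sample weights (summing to 1)
    correct: True where the weak learner classified sample i correctly
    Returns (alpha, new_weights)."""
    error = sum(w for w, ok in zip(weights, correct) if not ok)    # weighted error
    alpha = 0.5 * math.log((1 - error) / error)                    # this learner's vote weight
    # Misclassified samples are multiplied by e^alpha, correct ones by e^-alpha
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]                     # renormalize to sum to 1

weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]      # hypothetical: the learner missed sample 4
alpha, new_weights = adaboost_round(weights, correct)
print(round(alpha, 3))                        # 0.549
print([round(w, 3) for w in new_weights])     # [0.167, 0.167, 0.167, 0.5]
```

After the update the misclassified sample carries half of the total weight, so the next learner is pushed to get it right; alpha is also used as that learner's weight when the learners are combined.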

13. Boosting Architecture Diagram

14. Key Idea of Boosting

Boosting converts weak learners into strong learners by focusing more on difficult
samples.

15. AdaBoost (Adaptive Boosting)

AdaBoost is a popular boosting algorithm.

It assigns higher importance to incorrectly classified points.

16. AdaBoost Workflow Diagram

17. Gradient Boosting

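Gradient Boosting minimizes prediction errors using gradient descent techniques.

It builds trees sequentially: each new tree is fitted to the negative gradient of the loss with respect to the current predictions (for squared error, simply the residuals), and its output, scaled by a learning rate, is added to the ensemble.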
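A minimal sketch of gradient boosting for regression with squared error, using one-split trees ("stumps") as weak learners. The training data, learning rate (0.5) and number of rounds are illustrative assumptions; real libraries such as XGBoost add deeper trees, regularization and many optimizations on top of this idea.

```python
def fit_stump(xs, residuals):
    """Regression stump: one threshold, predicting the mean residual on each side.
    Picks the midpoint threshold with the lowest squared error."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, rounds=20, lr=0.5):
    """Squared-error gradient boosting with stumps; returns a prediction function."""
    base = sum(ys) / len(ys)           # initial model: the mean target
    trees = []

    def predict(x):
        return base + sum(lr * tree(x) for tree in trees)

    for _ in range(rounds):
        # Negative gradient of (y - F(x))^2 / 2 with respect to F(x) is the residual y - F(x)
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict

def mse(xs, ys, f):
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.5, 3.0, 3.5, 5.0, 5.5]      # hypothetical training data
for rounds in (0, 1, 5, 20):
    model = gradient_boost(xs, ys, rounds=rounds)
    print(rounds, "trees -> training MSE", round(mse(xs, ys, model), 4))
```

Training error falls as trees are added; on real data, the number of rounds and the learning rate are tuned on held-out data to avoid the overfitting risk noted below.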

18. XGBoost

XGBoost stands for:

● Extreme Gradient Boosting

Features:

● Fast
● Efficient
● Highly accurate

Widely used in:

● Competitions
● Data science projects
● Industry applications

19. Advantages of Boosting

1. High prediction accuracy

2. Handles complex datasets
3. Reduces bias
4. Effective for classification and regression

20. Disadvantages of Boosting

1. Training is slower
2. Sensitive to noise
3. Risk of overfitting
4. Difficult parameter tuning

21. Difference Between Bagging and Boosting

Feature        Bagging            Boosting
Training       Parallel           Sequential
Goal           Reduce variance    Reduce bias
Data Weight    Equal              Weighted
Overfitting    Less               Higher risk
Example        Random Forest      AdaBoost, XGBoost

22. Bagging vs Boosting Diagram
Bagging:
Model 1
Model 2 ---> Parallel Learning ---> Combined Output
Model 3

Boosting:
Model 1 ---> Model 2 ---> Model 3 ---> Final Output
Sequential Learning

23. Applications of Ensemble Learning

Real-World Applications

1. Fraud detection
2. Medical diagnosis
3. Recommendation systems
4. Image recognition
5. Customer churn prediction
6. Stock market analysis

24. Real-World Example

Loan Approval Prediction

Applicant Data
|
Ensemble Learning
(Bagging / Boosting)
|
Loan Approved / Rejected

Combining several models usually gives more accurate and more stable decisions than relying on a single model.
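A short calculation shows why voting can help. Assume, purely for illustration, that each approve/reject model is correct with the same probability p and that the models make errors independently. The probability that the majority vote of n models (n odd) is correct is then a binomial tail sum; `majority_vote_accuracy` is a hypothetical helper name.

```python
from math import comb

# Assumption (hypothetical, for illustration): the n models make errors
# independently and each is correct with the same probability p.
def majority_vote_accuracy(p, n):
    """Probability that more than half of n independent models are correct (n odd)."""
    k_min = n // 2 + 1                    # smallest winning majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

for n in (1, 3, 5):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

With p = 0.7 this gives 0.7 for one model, 0.784 for three, and about 0.837 for five. In practice, models trained on the same data make correlated errors, so the real gain is smaller; and if each model is worse than chance (p < 0.5), voting makes the result worse.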

25. Conclusion
Ensemble Learning is a powerful machine learning approach that combines
multiple models to improve performance and prediction accuracy.

● Bagging reduces variance by training models independently.

● Boosting reduces bias by training models sequentially and correcting
previous mistakes.

Popular ensemble methods such as Random Forest, AdaBoost, and XGBoost are
widely used in modern machine learning applications because of their high
accuracy and robustness.

—------------------------------------------------------------
