New Unit 4 AIML Supervised Learning

The document covers key concepts in supervised learning, focusing on Linear Regression, Logistic Regression, and Decision Trees. It explains how these algorithms work, their applications, advantages, and disadvantages, as well as their mathematical foundations and optimization techniques. Additionally, it introduces Neural Networks and Multilayer Perceptron (MLP) as important models in machine learning.

Uploaded by

kiruthika1991
Copyright
© © All Rights Reserved

AIML unit 4

Supervised Learning – Linear Regression

Introduction

Machine Learning is a branch of Artificial Intelligence that enables computers to
learn from data without being explicitly programmed. One of the most important
categories of machine learning is Supervised Learning.

In supervised learning, the model is trained using labeled data, where both input
and output values are known. The algorithm learns the relationship between them
and predicts outputs for new inputs.

Among all supervised learning algorithms, Linear Regression is one of the
simplest and most widely used methods for prediction and analysis.

1. Supervised Learning

Supervised learning uses training data consisting of:

● Input variables (features)


● Output variables (labels)

The algorithm learns a mapping function from input to output.

Diagram of Supervised Learning


Training Data
+-----------------------+
| Input(X) Output(Y) |
+-----------------------+
|
v
Supervised Learning
Model
|
v
Predicted Output

Example
Hours Studied | Marks
2             | 35
4             | 50
6             | 70

The system learns how marks depend on study hours.

2. What is Linear Regression?

Linear Regression is a supervised learning algorithm used to predict continuous
numerical values.

It finds the best fit straight line between input and output variables.

Examples:

● Predicting house prices


● Predicting salary
● Predicting temperature
● Predicting sales revenue

3. Linear Regression Equation

The mathematical equation of linear regression is:

$y = mx + b$

Where:

● $y$ = Predicted output
● $x$ = Input feature
● $m$ = Slope of line
● $b$ = Intercept

4. Best Fit Line

The main goal of linear regression is to find the line that best represents the data.

Diagram of Best Fit Line


Price(Y)
^
|                          *
|                      *
|                  *
|              *
|          *
|      *
|  *
|__________________________________> Size(X)

-------- Best Fit Line --------

Explanation

● Dots represent actual data points.


● Straight line represents predicted relationship.
● The line minimizes overall prediction error.

5. Working of Linear Regression

The working process includes:

1. Collect training data
2. Analyze relationship between variables
3. Fit a straight line
4. Predict future values

6. Types of Linear Regression

(a) Simple Linear Regression

Uses only one independent variable.

Example

● Experience → Salary

Equation:

$y = \beta_0 + \beta_1 x$

Where:

● $\beta_0$ = intercept
● $\beta_1$ = coefficient

(b) Multiple Linear Regression

Uses multiple independent variables.

Example

● House size
● Number of rooms
● Location

to predict house price.

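Equation:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$

where $x_1, \dots, x_n$ are the independent variables, $\beta_0$ is the intercept, and $\beta_1, \dots, \beta_n$ are the coefficients.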

7. Cost Function in Linear Regression

The algorithm tries to minimize prediction error using a Cost Function.

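The most commonly used cost function is Mean Squared Error (MSE).

$J = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$

Where:

● $J$ = cost/error
● $\hat{y}_i$ = predicted value
● $y_i$ = actual value
● $n$ = number of training examples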
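
Mean Squared Error is the average of the squared differences between predicted and actual values. The short sketch below computes it in plain Python. The function name `mean_squared_error` and the predicted marks are assumptions made for illustration; only the actual marks come from the study-hours example.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual_marks = [35, 50, 70]       # marks from the study-hours example
predicted_marks = [34, 52, 69]    # hypothetical model outputs (assumed values)

# Squared errors are 1, 4 and 1, so the mean is 6 / 3
print(mean_squared_error(actual_marks, predicted_marks))  # 2.0
```

A smaller value means the predictions lie closer to the actual values.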

8. Error Representation

The difference between actual value and predicted value is called error.

Error Diagram
Actual Point
*
|\
|\
Error | \
| \
| \
| \
*------\----------------
Regression Line

Smaller error means better prediction.

9. Gradient Descent

Gradient Descent is an optimization technique used to minimize error.

It updates the slope and intercept repeatedly until the minimum error is obtained.

10. Gradient Descent Flowchart


Start
|
Initialize m and b
|
Predict Output
|
Calculate Error
|
Update Parameters
|
Is Error Minimum?
| \
No Yes
| \
Repeat Stop
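
The flowchart can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes Mean Squared Error as the cost, a fixed number of iterations in place of the "Is Error Minimum?" check, and a hand-picked learning rate. The salary values are expressed in thousands (an assumption) so that the chosen learning rate stays stable.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit y = m*x + b by repeatedly stepping against the MSE gradient."""
    m, b = 0.0, 0.0                       # Initialize m and b
    n = len(xs)
    for _ in range(epochs):               # Repeat (fixed count, assumed)
        # Predict output and calculate error for every training point
        errors = [(m * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to m and b
        grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Update parameters
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


experience = [1, 2, 3, 4]
salary_k = [30, 40, 50, 60]               # salary in thousands (assumed scaling)
m, b = gradient_descent(experience, salary_k)
print(round(m, 3), round(b, 3))           # 10.0 20.0
```

If the learning rate is too large the updates diverge, and if it is too small convergence is slow, so in practice this value is tuned.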

11. Assumptions of Linear Regression

Linear regression works properly when the following assumptions are satisfied:

1. A linear relationship exists between input and output
2. Data points are independent
3. Errors are normally distributed
4. Errors have constant variance (homoscedasticity)
5. No extreme outliers

12. Advantages of Linear Regression

● Simple and easy to implement


● Fast training process
● Easy interpretation
● Works well for linear data
● Useful for trend analysis

13. Disadvantages of Linear Regression

● Cannot handle complex nonlinear data


● Sensitive to outliers
● Assumes linear relationship
● Accuracy decreases with noisy data

14. Applications of Linear Regression

Linear Regression is widely used in real-world applications.


Applications

1. House price prediction


2. Weather forecasting
3. Stock market prediction
4. Sales forecasting
5. Risk analysis
6. Salary estimation

15. Real-Time Example

Suppose a company wants to predict employee salary based on years of
experience.

Experience (years) | Salary (₹)
1                  | 30000
2                  | 40000
3                  | 50000
4                  | 60000

The regression equation becomes:

Salary = 10000 × Experience + 20000

If experience = 5 years:

Salary = 10000(5) + 20000
       = 70000

Predicted Salary = ₹70,000
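
The slope and intercept in this example can be checked with the ordinary least squares formulas for a single feature (slope = covariance of x and y divided by variance of x). The sketch below is one possible way to do that; the function name `fit_line` is an assumption for illustration.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


experience = [1, 2, 3, 4]
salary = [30000, 40000, 50000, 60000]

slope, intercept = fit_line(experience, salary)
print(slope, intercept)         # 10000.0 20000.0
print(slope * 5 + intercept)    # 70000.0
```

The fitted line reproduces Salary = 10000 × Experience + 20000 and the prediction of 70000 for 5 years of experience.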

16. Linear Regression Architecture

Overall Process Diagram


Input Data
|
v
Linear Regression
Model
|
v
Best Fit Equation
|
v
Predicted Output

17. Conclusion

Linear Regression is one of the most important supervised learning algorithms
used for predicting continuous numerical values. It establishes a linear relationship
between input and output variables using a best fit line. The algorithm minimizes
prediction error using cost functions and optimization techniques like gradient
descent.

Due to its simplicity, efficiency, and interpretability, linear regression is widely
used in machine learning, statistics, economics, and business analytics.

—---------------------------------------------------------

Logistic Regression

Introduction

Logistic Regression is a supervised learning classification algorithm used to
predict categorical outcomes. Unlike Linear Regression, which predicts continuous
values, Logistic Regression predicts the probability of a class.

It is mainly used for:

● Binary classification
● Probability estimation
● Decision making

Examples:

● Email spam detection


● Disease prediction
● Pass or fail prediction
● Fraud detection

1. Supervised Learning in Logistic Regression

In supervised learning, the model is trained using labeled data.

Structure

Input Data → Logistic Regression Model → Predicted Class

Example

Study Hours | Result
2           | Fail
4           | Fail
6           | Pass
8           | Pass

The algorithm learns the relationship between study hours and exam result.

2. What is Logistic Regression?


Logistic Regression predicts the probability that an input belongs to a particular
category.

Output values are generally:

● 0 → Negative class
● 1 → Positive class

Example:

● 0 = No disease
● 1 = Disease

3. Sigmoid Function

Logistic Regression uses the Sigmoid Function to convert linear output into
probability.

$\sigma(z) = \frac{1}{1 + e^{-z}}$

Where:

● $\sigma(z)$ = probability output
● $e$ = exponential constant (Euler's number, approximately 2.718)
● $z$ = linear equation
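
The sigmoid function maps any real number to a value between 0 and 1 using 1 / (1 + e^(-z)). A minimal sketch follows; the function name `sigmoid` and the sample inputs are chosen only for illustration.

```python
import math


def sigmoid(z):
    """Squash a real-valued score z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-4, 0, 4):
    print(z, round(sigmoid(z), 3))
# -4 0.018
# 0 0.5
# 4 0.982
```

Large negative inputs give values near 0, large positive inputs give values near 1, and an input of 0 gives exactly 0.5. Note that `math.exp` overflows for extremely negative `z`, so numerical libraries use a more careful formulation.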

4. Linear Equation in Logistic Regression

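The linear equation is:

$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$

where $\beta_0$ is the intercept and $\beta_1, \dots, \beta_n$ are the coefficients of the input features $x_1, \dots, x_n$.

This value is passed into the sigmoid function.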

5. Sigmoid Curve Diagram

Sigmoid Function Curve

Probability

1.0 |                              ********
    |                          ****
0.8 |                       ***
    |                    ***
0.6 |                 ***
    |              ***
0.5 |-----------***
    |        ***
0.3 |     ***
    |  ***
0.0 |****____________________________
                Input Value (z)

Explanation

● Output ranges between 0 and 1


● Used for probability prediction
● S-shaped curve

6. Working of Logistic Regression

The steps involved are:

1. Collect training data


2. Apply linear equation
3. Pass result through sigmoid function
4. Generate probability
5. Classify output

7. Classification Process

Diagram
Input Features
      |
Linear Equation
      |
Sigmoid Function
      |
Probability Output
      |
Class Prediction

8. Decision Boundary

A threshold value is used for classification.

Usually:

● Probability ≥ 0.5 → Class 1


● Probability < 0.5 → Class 0

Decision Boundary Diagram

Class 1

^
|

| ********

0.5|-------------|--------------

| |

| |

+---------------------------->

Threshold

(0.5)

Class 0
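
The thresholding step can be sketched as follows. The weight and bias used here are hypothetical values picked so that the boundary falls at 5 study hours; they are not learned from the data in these notes.

```python
import math


def predict_class(hours, weight=1.0, bias=-5.0, threshold=0.5):
    """Return (probability, class) for one input.

    weight and bias are assumed values for illustration only.
    """
    z = weight * hours + bias                  # linear equation
    probability = 1.0 / (1.0 + math.exp(-z))   # sigmoid function
    label = 1 if probability >= threshold else 0
    return probability, label


for hours in (2, 5, 8):
    probability, label = predict_class(hours)
    print(hours, round(probability, 3), "Pass" if label == 1 else "Fail")
# 2 0.047 Fail
# 5 0.5 Pass
# 8 0.953 Pass
```

Raising the threshold makes the model more cautious about predicting Class 1, and lowering it has the opposite effect.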

9. Cost Function

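Logistic Regression uses Log Loss or Cross Entropy Loss.

$J = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$

Where:

● $y_i$ = actual output
● $\hat{y}_i$ = predicted probability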
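
Log loss penalizes confident wrong predictions much more heavily than cautious ones. The sketch below computes binary cross entropy in plain Python. The function name `log_loss`, the clipping constant `eps`, and the sample probabilities are assumptions for illustration.

```python
import math


def log_loss(actual, predicted_probs, eps=1e-15):
    """Binary cross entropy averaged over all examples."""
    total = 0.0
    for y, p in zip(actual, predicted_probs):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(actual)


labels = [1, 0]
good = log_loss(labels, [0.9, 0.1])   # confident and correct
bad = log_loss(labels, [0.1, 0.9])    # confident and wrong
print(round(good, 4), round(bad, 4))  # 0.1054 2.3026
```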

10. Gradient Descent

Gradient Descent is used to minimize the cost function.

Steps

1. Initialize parameters
2. Compute predictions
3. Calculate error
4. Update weights
5. Repeat until minimum error

11. Flowchart of Logistic Regression


Start
  |
Input Training Data
  |
Apply Linear Equation
  |
Apply Sigmoid Function
  |
Calculate Error
  |
Update Weights
  |
Prediction Output
  |
Stop

12. Types of Logistic Regression

(a) Binary Logistic Regression

Used for two classes.

Example:

● Yes / No
● Pass / Fail
(b) Multinomial Logistic Regression

Used for more than two categories.

Example:

● Predicting grades:
○ A, B, C

(c) Ordinal Logistic Regression

Used for ordered categories.

Example:

● Low, Medium, High

13. Advantages of Logistic Regression

● Simple and easy to implement


● Efficient for classification
● Provides probability output
● Fast training
● Works well with linearly separable data

14. Disadvantages of Logistic Regression

● Cannot model complex nonlinear relationships


● Sensitive to outliers
● Requires large datasets for better accuracy
● Assumes linear relationship between variables

15. Applications of Logistic Regression

Real-World Applications

1. Spam email detection


2. Medical diagnosis
3. Credit card fraud detection
4. Customer churn prediction
5. Sentiment analysis
6. Admission prediction

16. Real-Time Example

Suppose a student’s exam result depends on study hours.

Hours Studied | Result
1             | Fail
2             | Fail
5             | Pass
7             | Pass

If the predicted probability is:

P(Pass) = 0.85

Since probability > 0.5,

Prediction = Pass

17. Comparison: Linear vs Logistic Regression

Feature      | Linear Regression | Logistic Regression
Output       | Continuous values | Categorical values
Purpose      | Prediction        | Classification
Graph        | Straight line     | Sigmoid curve
Output Range | Any value         | 0 to 1

18. Conclusion

Logistic Regression is an important supervised learning algorithm mainly used for
classification problems. It predicts probabilities using the sigmoid function and
classifies data into categories using a decision boundary.

Because of its simplicity, efficiency, and interpretability, Logistic Regression is
widely used in machine learning, healthcare, finance, marketing, and data
analytics.

—---------------------------------------------------------

Decision Trees: Classification and Regression Trees (CART)

Introduction

Decision Trees are one of the most popular supervised learning algorithms used for
both:

● Classification problems
● Regression problems

A Decision Tree works like a flowchart where:

● Each internal node represents a test condition


● Each branch represents the outcome
● Each leaf node represents the final prediction

The most common form is called:

CART – Classification and Regression Trees


1. What is a Decision Tree?

A Decision Tree divides data into smaller subsets based on conditions.

It creates a tree-like structure for decision making.

Basic Structure
                Root Node
                    |
        -----------------------
        |                     |
   Condition 1           Condition 2
        |                     |
    ---------             ----------
    |       |             |        |
   Leaf    Leaf          Leaf     Leaf

2. Components of Decision Tree

(a) Root Node

● Starting point of the tree


● Represents the complete dataset

(b) Decision Node

● Splits data into branches

(c) Leaf Node


● Final output or prediction

(d) Branch

● Connection between nodes

3. Working of Decision Tree

The algorithm:

1. Selects best feature


2. Splits dataset
3. Repeats splitting recursively
4. Stops when condition is satisfied

4. Classification Tree

A Classification Tree is used when output is categorical.

Examples

● Yes / No
● Spam / Not Spam
● Pass / Fail

5. Classification Tree Diagram

Example: Student Pass Prediction


            Study Hours?
                 |
       --------------------
       |                  |
   > 5 Hours         <= 5 Hours
       |                  |
     Pass               Fail

Explanation
● If study hours > 5 → Pass
● Otherwise → Fail

6. Regression Tree

Regression Trees are used when output is continuous numerical values.

Examples

● House price prediction


● Temperature prediction
● Salary estimation

7. Regression Tree Diagram

Example: House Price Prediction


            House Size?
                 |
       --------------------
       |                  |
  > 1500 sq.ft      <= 1500 sq.ft
       |                  |
  Price = 20L        Price = 10L

8. CART (Classification and Regression Trees)

CART is a machine learning algorithm that supports:

● Classification
● Regression

It uses:

● Gini Index for classification


● Mean Squared Error (MSE) for regression

9. Gini Index in Classification


The Gini Index measures impurity in the dataset.

$Gini = 1 - \sum_{i=1}^{C} p_i^2$

Where:

● $p_i$ = probability of class $i$
● $C$ = number of classes

Properties

● Gini = 0 → Pure node


● Higher Gini → More impurity
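
The Gini Index of a node is 1 minus the sum of the squared class proportions in that node. The sketch below computes it for a list of labels; the function name `gini_index` and the sample labels are assumptions for illustration.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini_index(["Pass", "Pass", "Pass"]))           # 0.0  (pure node)
print(gini_index(["Pass", "Fail", "Pass", "Fail"]))   # 0.5  (evenly mixed)
```

For two classes the maximum value is 0.5, reached when both classes are equally frequent.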

10. Entropy in Decision Trees

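Entropy measures randomness in data.

$H(S) = -\sum_{i=1}^{C} p_i \log_2 p_i$

where $p_i$ is the proportion of samples in class $i$ and $C$ is the number of classes.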

Interpretation

● Lower entropy → Better split


● Higher entropy → More disorder

11. Information Gain

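Information Gain measures reduction in entropy after splitting.

$IG(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|} H(S_v)$

where $S$ is the dataset before the split, $S_v$ is the subset with value $v$ of feature $A$, and $H$ is the entropy.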

The feature with highest information gain is selected for splitting.
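
Entropy and information gain can be computed directly from class counts. The sketch below uses the standard definitions (entropy as the negative sum of p × log2 p, and gain as parent entropy minus the size-weighted entropy of the child subsets). The function names and the sample split are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


results = ["Fail", "Fail", "Pass", "Pass"]
perfect_split = [["Fail", "Fail"], ["Pass", "Pass"]]
useless_split = [["Fail", "Pass"], ["Fail", "Pass"]]

print(entropy(results))                           # 1.0
print(information_gain(results, perfect_split))   # 1.0
print(information_gain(results, useless_split))   # 0.0
```

A split that separates the classes completely removes all entropy, while a split that leaves each branch as mixed as the parent gains nothing.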

12. Decision Tree Building Process

Flowchart
Start
|
Input Dataset
|
Select Best Feature
|
Split Dataset
|
Create Branches
|
Repeat Recursively
|
Leaf Node Reached
|
Stop

13. Tree Splitting Example

Weather Prediction
            Weather?
          /    |    \
     Sunny   Rainy   Cloudy
       |       |       |
     Play   No Play   Play

14. Advantages of Decision Trees

1. Simple and easy to understand


2. Easy visualization
3. Handles both numerical and categorical data
4. Requires less data preprocessing
5. Supports classification and regression

15. Disadvantages of Decision Trees

1. Overfitting problem
2. Sensitive to noisy data
3. Large trees become complex
4. Lower accuracy compared to ensemble methods

16. Overfitting in Decision Trees

Overfitting occurs when the model fits the training data too closely, including its noise, and therefore performs poorly on new data.

Overfitting Diagram
Training Accuracy → High
Testing Accuracy → Low

Solution

● Tree pruning
● Limiting depth
● Minimum samples split

17. Pruning in Decision Trees

Pruning removes unnecessary branches.

Pruning Diagram
Before Pruning        After Pruning

      A                     A
     / \                   / \
    B   C                 B   C
   / \
  D   E

Pruning improves generalization.

18. Applications of Decision Trees

Real-World Applications

1. Medical diagnosis
2. Fraud detection
3. Loan approval
4. Customer segmentation
5. Weather prediction
6. Recommendation systems

19. Comparison: Classification vs Regression Trees


Feature  | Classification Tree | Regression Tree
Output   | Categorical         | Continuous
Example  | Pass/Fail           | Price Prediction
Criteria | Gini / Entropy      | MSE
Goal     | Class prediction    | Numerical prediction

20. Real-Time Example

Loan Approval System


              Income?
                 |
         ----------------
         |              |
       High            Low
         |              |
   Credit Score?      Reject
         |
    ------------
    |          |
   Good       Bad
    |          |
 Approve     Reject
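
A trained decision tree is equivalent to a set of nested conditions. The loan approval tree can be written by hand as the sketch below; the function name `approve_loan` and the string values are assumptions for illustration, and a real tree would learn its conditions from data.

```python
def approve_loan(income, credit_score):
    """Follow the loan approval tree from the root to a leaf."""
    if income == "High":              # root node test
        if credit_score == "Good":    # decision node test
            return "Approve"          # leaf
        return "Reject"               # leaf
    return "Reject"                   # leaf for low income


print(approve_loan("High", "Good"))   # Approve
print(approve_loan("High", "Bad"))    # Reject
print(approve_loan("Low", "Good"))    # Reject
```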

21. Conclusion

Decision Trees are powerful supervised learning algorithms used for both
classification and regression tasks. CART (Classification and Regression Trees)
builds a tree-like structure using recursive splitting techniques.

Because of their simplicity, interpretability, and ability to handle different types of
data, Decision Trees are widely used in machine learning, business analytics,
healthcare, and finance.

—----------------------------------------------------------
Neural Networks and Multilayer Perceptron (MLP)

Introduction

Artificial Neural Networks (ANNs) are computational models inspired by the
structure and working of the human brain. Neural networks are widely used in
machine learning and artificial intelligence for solving complex problems such as:

● Image recognition
● Speech recognition
● Pattern classification
● Prediction systems

A Multilayer Perceptron (MLP) is one of the most important types of neural
networks.

1. What is a Neural Network?

A Neural Network consists of interconnected processing units called neurons.

Each neuron:

● Receives input
● Processes data
● Produces output

2. Biological Neuron vs Artificial Neuron

Biological Neuron
Dendrites → Cell Body → Axon

Artificial Neuron
Inputs → Weights → Summation → Activation Function → Output

3. Structure of Artificial Neural Network

A neural network contains:


1. Input Layer
2. Hidden Layer(s)
3. Output Layer

4. Neural Network Architecture

Basic ANN Diagram


Input Layer Hidden Layer Output Layer

x1 --------\
\
x2 ----------( O ) --------\
/ \
x3 ---------/ ( O ) ---- Output

Explanation

● Inputs are fed into the network


● Hidden layers process information
● Output layer gives prediction

5. Artificial Neuron Model

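The neuron computes the weighted sum of its inputs, adds a bias, and applies an activation function.

$y = f\left( \sum_{i=1}^{n} w_i x_i + b \right)$

Where:

● $x_i$ = input
● $w_i$ = weight
● $b$ = bias
● $f$ = activation function
● $y$ = output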

6. Working of Neural Network

Steps
1. Receive inputs
2. Multiply inputs with weights
3. Add bias
4. Apply activation function
5. Generate output
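
The five steps above can be sketched for a single neuron as follows. Sigmoid is used as the activation function here only as an example, and the inputs, weights and bias are hypothetical values.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias   # steps 1 to 3
    return 1.0 / (1.0 + math.exp(-z))                        # steps 4 and 5


# 1.0*0.5 + 2.0*(-0.25) + 0.0 = 0, and sigmoid(0) = 0.5
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))   # 0.5
```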

7. Activation Functions

Activation functions introduce nonlinearity.

Common Activation Functions

1. Sigmoid
2. ReLU
3. Tanh
4. Softmax

8. Sigmoid Activation Function

The sigmoid function converts output into values between 0 and 1.

$\sigma(x) = \frac{1}{1 + e^{-x}}$

9. Sigmoid Function Diagram


Output
1.0 |                           *****
    |                        ***
0.8 |                     ***
    |                  ***
0.5 |---------------***
    |            ***
0.2 |         ***
    |      ***
0.0 |***______________________
              Input

10. ReLU Activation Function

ReLU (Rectified Linear Unit) is widely used in deep learning.

$f(x) = \max(0, x)$

Advantages

● Faster computation
● Reduces vanishing gradient problem

11. Multilayer Perceptron (MLP)

A Multilayer Perceptron is a feedforward neural network containing:

● One input layer


● One or more hidden layers
● One output layer

MLP uses supervised learning.

12. MLP Architecture

Multilayer Perceptron Diagram


Input Layer Hidden Layer 1 Hidden Layer 2 Output

x1 --------\
( O ) ----\
x2 --------/ \
( O ) ----\
x3 --------\ / \
( O ) ----/ (O)
x4 --------/

13. Feedforward Process

In MLP, data moves only in one direction:

● Input → Hidden → Output

This is called Feedforward Neural Network.
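
A feedforward pass through a small MLP can be sketched as below. The network has two inputs, one hidden layer of two ReLU neurons, and one sigmoid output neuron. All weights and biases are hand-picked, hypothetical values (not trained) chosen so that the network behaves like XOR, a problem a single-layer perceptron cannot solve.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]


def mlp_forward(inputs):
    """Input -> Hidden -> Output, with assumed (untrained) weights."""
    hidden = dense(inputs, [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], relu)
    output = dense(hidden, [[1.0, 1.0]], [-0.5], sigmoid)
    return output[0]


for point in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(point, round(mlp_forward(point), 3))
# [0, 0] 0.378
# [0, 1] 0.622
# [1, 0] 0.622
# [1, 1] 0.378
```

Outputs above 0.5 occur only when the two inputs differ, which is the XOR pattern.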

14. Backpropagation
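Backpropagation is the algorithm used for training an MLP.

It propagates the output error backward through the layers, computes how much each weight contributed to that error, and adjusts the weights to reduce prediction error.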

15. Backpropagation Flow Diagram


Input Data
|
Forward Propagation
|
Prediction
|
Calculate Error
|
Backward Propagation
|
Update Weights
|
Repeat

16. Error Function

The network minimizes prediction error using loss functions.

Mean Squared Error

$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

Where:

● $y_i$ = actual value
● $\hat{y}_i$ = predicted value

17. Learning Process in MLP

Training Steps

1. Initialize weights
2. Perform forward propagation
3. Calculate error
4. Apply backpropagation
5. Update weights
6. Repeat until minimum error

18. Advantages of Neural Networks

1. Learns complex patterns


2. High prediction accuracy
3. Handles nonlinear problems
4. Supports parallel processing
5. Adaptive learning capability

19. Disadvantages of Neural Networks

1. Requires large datasets


2. High computational cost
3. Training takes more time
4. Difficult to interpret
5. Risk of overfitting

20. Applications of Neural Networks

Real-World Applications

1. Image recognition
2. Speech recognition
3. Medical diagnosis
4. Fraud detection
5. Self-driving cars
6. Natural language processing

21. Difference Between Single Layer and Multilayer Perceptron


Feature          | Single Layer Perceptron | Multilayer Perceptron
Hidden Layer     | No                      | Yes
Complexity       | Simple                  | Complex
Learning Ability | Linear problems         | Nonlinear problems
Accuracy         | Lower                   | Higher

22. Real-Time Example

Handwritten Digit Recognition


Input Image
|
Neural Network
|
Hidden Layers
|
Predicted Digit

The network learns patterns of handwritten digits and predicts the correct number.

23. Conclusion

Neural Networks are powerful machine learning models inspired by the human
brain. A Multilayer Perceptron (MLP) is a feedforward neural network with
one or more hidden layers capable of solving complex nonlinear problems.

Using activation functions, forward propagation, and backpropagation, MLPs can
learn patterns from data effectively. Neural networks are widely used in artificial
intelligence, healthcare, robotics, finance, and computer vision applications.

—----------------------------------------------------------

Support Vector Machines (SVM): Linear and Nonlinear Kernel Functions

Introduction

Support Vector Machine (SVM) is a powerful supervised learning algorithm used for:

● Classification
● Regression
● Pattern recognition

SVM is mainly used for solving classification problems by finding the best
separating boundary between classes.

Examples:

● Face recognition
● Text classification
● Spam detection
● Image classification

1. What is Support Vector Machine?

Support Vector Machine finds the optimal hyperplane that separates data into
different classes.

The goal is to maximize the margin between classes.

2. Basic Concept of SVM

Hyperplane

A hyperplane is a decision boundary separating different classes.

For 2D data:

● Hyperplane is a straight line

For 3D data:

● Hyperplane is a plane

3. SVM Classification Diagram


Class A o o o
\
\
---------------\---------------- Hyperplane
\
\
Class B x x x

Explanation

● Circles and crosses are two classes


● SVM finds the best boundary between them

4. Support Vectors

Support vectors are the data points closest to the hyperplane.

They are very important because they determine the position of the hyperplane.

5. Margin in SVM

Margin is the distance between:

● Hyperplane
● Closest data points

SVM tries to maximize this margin.

Margin Diagram
o o o Margin x x x
|<------------------->|
---------------- Hyperplane ----------------

Larger margin → Better generalization

6. Linear SVM

Linear SVM is used when data is linearly separable.

Linear Separation Diagram


o o o

----------------------- Straight Line

x x x

The data can be separated using a straight line.

7. Linear SVM Equation

The hyperplane equation is:

$w \cdot x + b = 0$

Where:

● $w$ = weight vector
● $x$ = input vector
● $b$ = bias
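
Once the weight vector and bias are known, a linear SVM classifies a point by checking which side of the hyperplane it falls on, that is, the sign of w · x + b. The sketch below shows only this prediction step; the weights, bias and class names are hypothetical, and finding the maximum-margin hyperplane requires an optimization step that is not shown.

```python
def svm_predict(w, x, b):
    """Classify x by the sign of the decision score w . x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "Class A" if score >= 0 else "Class B"


w = [1.0, -1.0]   # assumed weight vector
b = 0.0           # assumed bias

print(svm_predict(w, [3.0, 1.0], b))   # Class A  (score =  2.0)
print(svm_predict(w, [1.0, 3.0], b))   # Class B  (score = -2.0)
```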

8. Nonlinear SVM

Sometimes data cannot be separated using a straight line.

In such cases, SVM uses Kernel Functions.

9. Nonlinear Data Diagram


ooo
o o

xx
x x

Straight line cannot separate the classes properly.

10. Kernel Functions

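Kernel functions implicitly map low-dimensional data into a higher-dimensional space
where separation becomes easier. The kernel computes the similarity (inner product) of
two points in that space directly, without building the transformed features.
This technique is called the Kernel Trick.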

11. Kernel Trick Diagram


Nonlinear Data
|
v
Kernel Transformation
|
v
Higher Dimensional Space
|
v
Linear Separation

12. Types of Kernel Functions

(a) Linear Kernel

Used for linearly separable data.

Equation:

$K(x_i, x_j) = x_i \cdot x_j$

Features

● Simple
● Fast
● Suitable for text classification

13. Polynomial Kernel

Used for curved decision boundaries.

Equation:

$K(x_i, x_j) = (x_i \cdot x_j + c)^d$

Where:

● $c$ = constant
● $d$ = polynomial degree

14. Polynomial Kernel Diagram
ooo

----------- Curved Boundary -----------

xxx

15. Radial Basis Function (RBF) Kernel

Most commonly used nonlinear kernel.

Equation:

$K(x_i, x_j) = \exp\left( -\gamma \lVert x_i - x_j \rVert^2 \right)$

Where:

● $\gamma$ controls influence of data points

Advantages

● Handles complex data


● High accuracy

16. Sigmoid Kernel

Inspired by neural networks.

Equation:

$K(x_i, x_j) = \tanh(\alpha \, x_i \cdot x_j + c)$

Where:

● $\alpha$ = slope parameter
● $c$ = constant
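
The four kernels can be written as small functions that take two points and return a similarity value. The sketch below uses the commonly stated forms (dot product; dot product plus a constant raised to a degree; exponential of negative gamma times squared distance; tanh of a scaled dot product plus a constant). The default parameter values are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a, b):
    return dot(a, b)


def polynomial_kernel(a, b, c=1.0, d=2):
    return (dot(a, b) + c) ** d


def rbf_kernel(a, b, gamma=0.5):
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(a, b, alpha=0.1, c=0.0):
    return math.tanh(alpha * dot(a, b) + c)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))       # 11.0
print(polynomial_kernel(u, v))   # 144.0
print(rbf_kernel(u, u))          # 1.0  (a point is maximally similar to itself)
print(rbf_kernel(u, v) < 1.0)    # True (similarity drops with distance)
```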

17. Working of SVM

Steps

1. Input training data


2. Select kernel function
3. Find support vectors
4. Construct optimal hyperplane
5. Predict new data classes

18. SVM Training Flowchart


Start
|
Input Dataset
|
Choose Kernel Function
|
Find Optimal Hyperplane
|
Identify Support Vectors
|
Classify Data
|
Stop

19. Advantages of SVM

1. High accuracy
2. Effective in high-dimensional data
3. Works for linear and nonlinear problems
4. Memory efficient
5. Good generalization capability

20. Disadvantages of SVM

1. Slow for large datasets


2. Complex parameter tuning
3. Difficult interpretation
4. Sensitive to noise

21. Applications of SVM


Real-World Applications

1. Face detection
2. Image classification
3. Handwriting recognition
4. Bioinformatics
5. Spam filtering
6. Medical diagnosis

22. Comparison: Linear vs Nonlinear SVM


Feature         | Linear SVM         | Nonlinear SVM
Data Type       | Linearly separable | Nonlinear data
Boundary        | Straight line      | Curved boundary
Complexity      | Simple             | Complex
Kernel Required | No                 | Yes

23. Real-Time Example

Email Spam Detection


Input Email
|
Feature Extraction
|
Support Vector Machine
|
Spam / Not Spam

SVM classifies emails based on learned patterns.

24. Conclusion

Support Vector Machines are powerful supervised learning algorithms used for
both linear and nonlinear classification problems. SVM works by finding the
optimal hyperplane that maximizes the margin between classes.
For nonlinear data, kernel functions such as Polynomial, RBF, and Sigmoid
kernels help transform data into higher-dimensional space for effective separation.
Due to their high accuracy and strong performance, SVMs are widely used in
machine learning, computer vision, healthcare, and text classification systems.

—---------------------------------------------------------

K-Nearest Neighbours (KNN)

Introduction

K-Nearest Neighbours (KNN) is one of the simplest and most widely used
supervised learning algorithms. It is mainly used for:

● Classification
● Regression
● Pattern recognition

KNN predicts the output based on the nearest neighboring data points.

Examples:

● Recommendation systems
● Handwriting recognition
● Medical diagnosis
● Image classification

1. What is K-Nearest Neighbours?

KNN is a supervised learning algorithm that classifies new data points based on
similarity with existing data.

The algorithm stores all training data and predicts the class of new data using
nearest neighbors.

2. Basic Idea of KNN


The new data point is assigned to the class most common among its nearest
neighbors.

● K represents the number of nearest neighbors.


● Distance is used to identify similarity.

3. KNN Working Principle

Steps

1. Choose value of K
2. Calculate distance from new point to all training points
3. Select K nearest neighbors
4. Find majority class
5. Assign class to new data point
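
The five steps above can be sketched in plain Python as follows. Euclidean distance is assumed as the similarity measure (via `math.dist`, available from Python 3.8), and the training points are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training, new_point, k=3):
    """Predict the majority class among the k nearest training points."""
    # Step 2: distance from the new point to every training point
    by_distance = sorted(training, key=lambda item: math.dist(item[0], new_point))
    # Step 3: keep the k nearest neighbors
    nearest_labels = [label for _, label in by_distance[:k]]
    # Steps 4 and 5: majority class becomes the prediction
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: (point, class)
training = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]

print(knn_classify(training, (2, 2)))   # A
print(knn_classify(training, (6, 5)))   # B
```

Every prediction scans the whole training set, which is why KNN becomes slow on large datasets.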

4. KNN Classification Diagram


Class A (o) Class B (x)

o o

● ← New Point

x x

If most nearby points are from Class B, the new point is classified as Class B.

5. Choosing the Value of K

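● Small K → Sensitive to noise (may overfit)
● Large K → Smoother, more stable predictions, but may blur class boundaries (underfit)

Common values (odd numbers help avoid tied votes in two-class problems):

● 3
● 5
● 7

6. Example of KNN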

Suppose:

● K=3
● Among nearest neighbors:
○ 2 belong to Class A
○ 1 belongs to Class B

Then:

● New point → Class A

7. Distance Metrics in KNN

KNN uses distance measures to identify nearest neighbors.

8. Euclidean Distance

Most commonly used distance metric.

$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$

Where:

● $(x_1, y_1)$, $(x_2, y_2)$ = data points

9. Euclidean Distance Diagram


A(x1,y1) *
\
\
\ Distance
\
*
B(x2,y2)

10. Manhattan Distance

Measures distance along grid paths.

$d = |x_2 - x_1| + |y_2 - y_1|$

11. Minkowski Distance


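Generalized distance metric.

$d = \left( \sum_{i=1}^{n} |a_i - b_i|^p \right)^{1/p}$

where $a$ and $b$ are data points with $n$ features. With $p = 1$ it gives the Manhattan distance, and with $p = 2$ it gives the Euclidean distance.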
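
The relationship between the three metrics can be checked with one small function: the Minkowski distance sums the absolute coordinate differences raised to the power p and then takes the p-th root. The function name and the sample points are assumptions for illustration.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance between two points of equal dimension."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


a, b = (1, 2), (4, 6)   # coordinate differences are 3 and 4

print(minkowski_distance(a, b, 1))   # 7.0  (Manhattan: 3 + 4)
print(minkowski_distance(a, b, 2))   # 5.0  (Euclidean: sqrt(9 + 16))
```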

12. KNN Algorithm Flowchart


Start
|
Load Training Data
|
Choose Value of K
|
Calculate Distance
|
Find K Nearest Neighbors
|
Majority Voting
|
Predict Class
|
Stop
13. KNN for Classification

Used when output is categorical.

Examples

● Spam / Not Spam


● Pass / Fail
● Disease / No Disease

14. KNN for Regression

KNN can also predict continuous values.

In regression:

● Average value of neighbors is calculated.

Example
● Predicting house prices
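
KNN regression replaces the majority vote with an average. The sketch below predicts a house price from house size using a single feature; the data values (size, price in lakhs) are hypothetical.

```python
def knn_regress(training, new_x, k=3):
    """Average the target values of the k nearest training points."""
    nearest = sorted(training, key=lambda item: abs(item[0] - new_x))[:k]
    return sum(value for _, value in nearest) / k


# Hypothetical (house size, price in lakhs) pairs
houses = [(800, 8.0), (1000, 10.0), (1200, 12.0), (2000, 20.0)]

# Nearest sizes to 1100 are 1000, 1200 and 800, so the mean is (10 + 12 + 8) / 3
print(knn_regress(houses, 1100))   # 10.0
```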

15. Decision Boundary in KNN

KNN creates nonlinear decision boundaries.

Diagram
ooooooo
ooooooo
------ Curved Boundary ------
xxxxxxx
xxxxxxx

16. Advantages of KNN

1. Simple and easy to understand


2. No training phase required
3. Handles multiclass problems
4. Works well with small datasets
5. Effective for nonlinear data

17. Disadvantages of KNN

1. Slow for large datasets


2. High memory usage
3. Sensitive to noise
4. Requires feature scaling
5. Choosing optimal K is difficult

18. Feature Scaling in KNN

Since KNN uses distance calculations, features should be normalized.

Common scaling methods:

● Min-Max Scaling
● Standardization
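
Both scaling methods are simple to express in code. Min-Max Scaling maps values to the range 0 to 1, and Standardization shifts values to mean 0 and scales them to standard deviation 1. This sketch assumes the values are not all identical (otherwise the denominators would be zero); the sample weights are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum is 0 and the maximum is 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]


fruit_weights = [100, 150, 200]        # hypothetical weights in grams

print(min_max_scale(fruit_weights))    # [0.0, 0.5, 1.0]
print(standardize(fruit_weights)[1])   # 0.0  (the middle value equals the mean)
```

Without scaling, a feature measured in large units would dominate the distance calculation.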
19. Real-Time Example

Fruit Classification
Weight + Color
|
v
KNN Algorithm
|
v
Apple / Orange

The algorithm compares features with nearby fruits and predicts the class.

20. Applications of KNN

Real-World Applications

1. Recommendation systems
2. Face recognition
3. Medical diagnosis
4. Pattern recognition
5. Image classification
6. Credit scoring

21. Comparison: KNN Classification vs Regression


Feature    | Classification | Regression
Output     | Category       | Continuous value
Prediction | Majority vote  | Average value
Example    | Spam detection | Price prediction

22. Practical Example

Suppose K = 5:
Neighbor | Class
1        | A
2        | A
3        | B
4        | A
5        | B

Majority class = A

Therefore:

● New point → Class A

23. Conclusion

K-Nearest Neighbours (KNN) is a simple yet powerful supervised learning
algorithm used for classification and regression tasks. It predicts outputs by
analyzing the nearest neighboring data points using distance metrics such as
Euclidean distance.

Because of its simplicity and effectiveness, KNN is widely used in
recommendation systems, healthcare, image recognition, and pattern classification
applications.

—------------------------------------------------------------

Ensemble Learning: Bagging and Boosting

Introduction

Ensemble Learning is a machine learning technique in which multiple models are
combined to improve prediction accuracy and performance.

Instead of using a single model, ensemble learning combines several weak learners
to create a strong learner.

Ensemble methods are widely used in:

● Classification
● Regression
● Fraud detection
● Recommendation systems
● Medical diagnosis

The two major ensemble techniques are:

1. Bagging
2. Boosting

1. What is Ensemble Learning?

Ensemble learning combines predictions from multiple machine learning models to
produce better results.

Basic Idea
Model 1 \
Model 2 \
Model 3 ---> Combined Prediction
Model 4 /
Model 5 /

Advantages

● Higher accuracy
● Better generalization
● Reduced overfitting
● Improved robustness

2. Types of Ensemble Learning

Main ensemble techniques:


1. Bagging
2. Boosting
3. Stacking

This topic focuses on:

● Bagging
● Boosting

3. Bagging (Bootstrap Aggregating)

Bagging improves performance by training multiple models independently on
random subsets of the data (bootstrap samples drawn with replacement).

The final output is obtained by:

● Majority voting (classification)


● Averaging (regression)

4. Working of Bagging

Steps

1. Create multiple random datasets using bootstrap sampling


2. Train separate models
3. Combine outputs
4. Produce final prediction

5. Bagging Architecture Diagram


Original Dataset
|
-----------------------------------
| | |
Sample 1 Sample 2 Sample 3
| | |
Model 1 Model 2 Model 3
\ | /
\ | /
Final Prediction

6. Bootstrap Sampling

Bootstrap sampling means selecting random samples from the dataset with
replacement.

Some data points may appear multiple times.
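
Bootstrap sampling can be sketched with Python's `random` module. Each sample has the same size as the original dataset and is drawn with replacement, so duplicates are expected. The function name, the seed and the toy dataset are assumptions for illustration.

```python
import random


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items at random, with replacement."""
    return [rng.choice(dataset) for _ in dataset]


rng = random.Random(0)          # fixed seed so runs are repeatable
data = [1, 2, 3, 4, 5]

samples = [bootstrap_sample(data, rng) for _ in range(3)]
for sample in samples:
    print(sample)               # each sample may repeat or omit some points
```

In bagging, one model is trained on each of these samples.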

7. Majority Voting in Bagging

For classification:

● Final class = Most common prediction

Example
Model   | Prediction
Model 1 | A
Model 2 | A
Model 3 | B

Final Output = A
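
Combining the model outputs takes only a few lines. The sketch below shows majority voting for classification and averaging for regression; the function names and the regression values are assumptions for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Most common class label among the model predictions."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Mean of the numeric model predictions."""
    return sum(predictions) / len(predictions)


print(majority_vote(["A", "A", "B"]))   # A
print(average([10.0, 12.0, 14.0]))      # 12.0
```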

8. Random Forest – Example of Bagging

Random Forest is the most popular bagging algorithm.

It combines multiple decision trees and performs:

● Voting for classification


● Averaging for regression

9. Advantages of Bagging

1. Reduces variance
2. Reduces overfitting
3. Improves accuracy
4. Parallel training possible
5. Stable predictions

10. Disadvantages of Bagging

1. Increased computational cost


2. Complex model interpretation
3. Requires more memory

11. Boosting

Boosting is an ensemble technique where models are trained sequentially.

Each new model focuses on correcting errors made by previous models.

12. Working of Boosting

Steps

1. Train first weak learner


2. Identify errors
3. Increase weight of misclassified data
4. Train next learner
5. Combine all learners

13. Boosting Architecture Diagram


Dataset
|
Weak Learner 1
|
Errors Identified
|
Weak Learner 2
|
Errors Corrected
|
Weak Learner 3
|
Final Strong Learner

14. Key Idea of Boosting

Boosting converts weak learners into strong learners by focusing more on difficult
samples.

15. AdaBoost (Adaptive Boosting)

AdaBoost is a popular boosting algorithm.

It assigns higher importance to incorrectly classified points.
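
One reweighting round of the classic (discrete) AdaBoost algorithm can be sketched as follows. The weak classifier itself is not shown: the sketch only takes a list saying which samples it classified correctly. It assumes the weighted error is strictly between 0 and 1; the function name and the sample inputs are assumptions for illustration.

```python
import math


def adaboost_round(weights, correct):
    """Return the classifier weight (alpha) and the updated sample weights.

    weights: current sample weights, summing to 1
    correct: list of booleans, True where the weak classifier was right
    """
    # Weighted error of the weak classifier
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    # Importance of this classifier in the final combination
    alpha = 0.5 * math.log((1 - error) / error)
    # Shrink weights of correct samples, grow weights of misclassified ones
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = adaboost_round(weights, [True, True, True, False])
print([round(w, 3) for w in new_weights])   # [0.167, 0.167, 0.167, 0.5]
```

After the update the single misclassified point carries half of the total weight, so the next weak classifier is pushed to get it right.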

16. AdaBoost Workflow Diagram


Training Data
|
Weak Classifier 1
|
Increase Weight of Errors
|
Weak Classifier 2
|
Increase Weight Again
|
Final Combined Classifier

17. Gradient Boosting

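Gradient Boosting minimizes a loss function step by step: each new model is fitted to the negative gradient of the loss of the current ensemble (for squared error, this is the residual error).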

It builds trees sequentially.

18. XGBoost

XGBoost stands for:

● Extreme Gradient Boosting


Features:

● Fast
● Efficient
● Highly accurate

Widely used in:

● Competitions
● Data science projects
● Industry applications

19. Advantages of Boosting

1. High prediction accuracy


2. Handles complex datasets
3. Reduces bias
4. Effective for classification and regression

20. Disadvantages of Boosting

1. Training is slower
2. Sensitive to noise
3. Risk of overfitting
4. Difficult parameter tuning

21. Difference Between Bagging and Boosting


Feature     | Bagging         | Boosting
Training    | Parallel        | Sequential
Goal        | Reduce variance | Reduce bias
Data Weight | Equal           | Weighted
Overfitting | Less            | Higher risk
Example     | Random Forest   | AdaBoost, XGBoost

22. Bagging vs Boosting Diagram
Bagging:
Model 1
Model 2 ---> Parallel Learning ---> Combined Output
Model 3

Boosting:
Model 1 ---> Model 2 ---> Model 3 ---> Final Output
Sequential Learning

23. Applications of Ensemble Learning

Real-World Applications

1. Fraud detection
2. Medical diagnosis
3. Recommendation systems
4. Image recognition
5. Customer churn prediction
6. Stock market analysis

24. Real-Time Example

Loan Approval Prediction


Applicant Data
|
Ensemble Learning
(Bagging / Boosting)
|
Loan Approved / Rejected

Multiple models improve decision accuracy.

25. Conclusion
Ensemble Learning is a powerful machine learning approach that combines
multiple models to improve performance and prediction accuracy.

● Bagging reduces variance by training models independently.


● Boosting reduces bias by training models sequentially and correcting
previous mistakes.

Popular ensemble methods such as Random Forest, AdaBoost, and XGBoost are
widely used in modern machine learning applications because of their high
accuracy and robustness.

—------------------------------------------------------------
