Unit 3

The document provides an overview of supervised learning, including core algorithms such as Linear Regression, Logistic Regression, Decision Trees, and Random Forest, along with their applications in predicting equipment failure. It outlines key concepts, types of machine learning, and the workflow for building and deploying supervised models. Additionally, it discusses the importance of model evaluation metrics and techniques to prevent overfitting.

Uploaded by

ca8ham

Supervised Learning
• Core algorithms: Linear Regression, Logistic Regression,
Decision Trees, Random Forest
• Applications: Predicting equipment failure or component
lifespan

Supervised Learning and Artificial Intelligence
An in-depth exploration of supervised learning, its place in artificial intelligence, and key machine learning components.
Learning Objectives
1 Define Core Concepts
Distinguish between AI, ML, and Deep Learning.

2 Identify ML Paradigms
Understand different types of machine learning approaches.

3 Grasp Supervised Learning
Explore its theory and practical applications.

4 Discover Algorithms & Metrics
Common algorithms and how to evaluate model performance.

5 Model Workflow
Steps to build and deploy supervised models.
Artificial Intelligence (AI)

▪ Artificial Intelligence refers to the design of systems capable of performing tasks that typically require human intelligence. These include logical reasoning, decision-making, pattern recognition, and language understanding.

▪ AI systems can operate autonomously, adapt to new data, and optimise performance with minimal human intervention, revolutionising industries from healthcare to finance.

Machine Learning (ML)
▪ Machine Learning is a subfield of AI that focuses on algorithms and models that learn from data. Rather than following explicit instructions, ML systems identify patterns and make predictions or decisions based on historical data.

▪ This data-driven approach enables flexible and scalable problem-solving across a range of domains, from predictive analytics to natural language processing.

AI, ML, and Deep Learning: A Hierarchical View
Deep Learning
Layered neural networks for complex patterns

Machine Learning
Algorithms that learn from data

Artificial Intelligence
Broad goal of mimicking intelligence

These distinctions define different levels of abstraction and algorithmic depth, with each layer building upon the last to achieve more sophisticated intelligent behaviors.

Types of Machine Learning

▪ Supervised Learning: Trained on labeled datasets with input-output pairs.

▪ Unsupervised Learning: Identifies structure in unlabeled data via clustering or dimensionality reduction.

▪ Reinforcement Learning: Learns optimal actions by interacting with an environment for rewards.

Each paradigm has distinct use-cases and assumptions regarding data availability and learning objectives, driving diverse applications across various industries.

Supervised Learning
▪ Supervised learning involves learning a function from input-output pairs. The model is trained on a dataset where each input is associated with a correct output label.

▪ The objective is to generalise from the training data to accurately predict outputs for new, unseen inputs. This approach is used for both classification (categorical output) and regression (continuous output), making it versatile for various prediction tasks.

Key Elements of Supervised Learning
▪ Features (X): Input variables describing each data instance.

▪ Labels (Y): Known output values for each instance.

▪ Training Set: Data subset used to train the model.

▪ Test Set: Separate data for evaluating model performance.

▪ Loss Function: Quantifies prediction error.

▪ Optimization Algorithm: Adjusts parameters to minimise loss.
Supervised Learning Algorithms
▪ Linear & Logistic Regression: Modelling linear relationships and predicting binary/multi-class outcomes.

▪ Decision Trees & Random Forests: Tree-based models for classification and regression, ensemble methods for enhanced accuracy.

▪ SVM & k-NN: Support Vector Machines for clear class separation and k-Nearest Neighbors for similarity-based predictions.

▪ Neural Networks: Layered architectures for complex nonlinear relationships.

Linear & Logistic Regression

Linear Regression
A supervised learning algorithm used for predicting a continuous output variable. It models the relationship between independent variables and a dependent variable by fitting the best linear equation to observed data.

Ideal for tasks like predicting house prices or stock trends.

It assumes that there is a linear relationship between the input and output, meaning the output changes at a constant rate as the input changes. This relationship is represented by a straight line.

For example, suppose we want to predict a student's exam score based on how many hours they studied. We observe that as students study more hours, their scores go up. In this example:
•Independent variable (input): Hours studied, because it is the factor we control or observe.
•Dependent variable (output): Exam score, because it depends on how many hours were studied.
Mathematics of Linear Regression
Linear regression involves fitting a straight line (the regression line) to a set of data points in
such a way that the sum of the squared differences between the observed and predicted
values is minimized. The equation of the regression line is typically represented as:
Y = mx + b
where (Y) is the dependent variable, (x) is the independent variable, (m) is the slope of the
line, and (b) is the y-intercept.
The goal is to find the values of (m) and (b) that best fit the data, often using methods like
the least squares approach.
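As a minimal sketch of the least squares approach, the slope and intercept of Y = mx + b can be computed in closed form: m is the covariance of x and Y divided by the variance of x, and b follows from the means. The function name `fit_line` and the hours/scores data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b that minimise the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b


# Hypothetical data: hours studied and exam scores (illustrative only).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 63, 70]

m, b = fit_line(hours, scores)
predicted = m * 6 + b  # predicted score for 6 hours of study
print(f"m = {m:.2f}, b = {b:.2f}, prediction for 6 hours = {predicted:.1f}")
```

For this toy data the fitted line is Y = 4.5x + 46.5, so each extra hour of study adds 4.5 points to the predicted score.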
House Price Prediction using Linear Regression

House price prediction using linear regression follows this sequence of steps:

•Data Collection
•Data Pre-processing
•Feature Engineering
•Model Selection
•Model Training
•Model Prediction
•Model Evaluation

•Libraries Required

NumPy: For numerical operations and array handling
pandas: To manipulate and analyze structured data efficiently
Matplotlib: For creating visualizations and plots
Seaborn: To enhance the aesthetics of our visualizations; built on top of Matplotlib
scikit-learn: A comprehensive machine learning library for model building and evaluation
Logistic Regression

▪ Primarily used for binary or multi-class classification problems. It estimates the probability of an instance belonging to a particular class by applying a sigmoid function to a linear equation. Common applications include spam detection and disease prediction.
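The idea of applying a sigmoid to a linear equation can be sketched as follows. The weights and bias here are hand-picked, hypothetical values rather than parameters learned from data, and the 0.5 decision threshold is a common default, not something the slides specify.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class: sigmoid of the linear combination."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 label using a decision threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical parameters (assumed for illustration, not trained).
weights = [0.8, -0.4]
bias = -0.2

p = predict_proba([2.0, 1.0], weights, bias)  # linear part z = 1.0
print(f"probability = {p:.3f}, class = {predict_class([2.0, 1.0], weights, bias)}")
```

In practice the weights would be fitted by minimising a loss function on labeled training data.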

Refer to Logistic [Link] for an implementation example

Classification & Logistic Regression: Key Concepts

▪ Binary Classification: The simplest type, where each input is assigned to one of two classes. The image typically shows a clear division between two groups.

▪ Multi-class Classification: The task involves categorising inputs into one of three or more distinct classes. The visual often depicts several separated groups with distinct labels.

▪ Multi-label Classification: Unlike the other two, here each input can belong to multiple classes simultaneously. The image might show overlapping labels or sets associated with each input to indicate multiple simultaneous classifications.
Decision Trees & Random Forests

Decision Trees

▪ A supervised learning algorithm that creates a tree-like model of decisions and their possible consequences.

▪ It splits the dataset into smaller and smaller subsets while an associated decision tree is incrementally developed. Decision trees are used for both classification and regression tasks.

▪ Decision Trees are intuitive and easy to interpret, but a single tree can be prone to overfitting, especially with complex data, leading to poor generalisation on new data.

▪ Explanation of each path:

▪ If you have time and the weather is good, check your motivation:

▪ High motivation → Go to Gym

▪ Low motivation → Stay Home

▪ If you have time and the weather is bad, you still go to the gym.

▪ If you don’t have time, you stay home.

Decision trees are of two main types:
Classification tree analysis is when the predicted outcome is the categorical class to which the data belongs.

Regression tree analysis is when the predicted outcome can be considered a real number (e.g. the price of a house, or a patient's length of stay in a hospital).
Regression tree
Complete Algorithm Process for Decision Tree (ID3)
▪ Start with the entire dataset as the root.
▪ Calculate the entropy (impurity) at the current node.
▪ For each feature, compute the information gain when splitting on that
feature.
▪ Choose the feature with the highest information gain to split the data.
▪ Create child nodes for each split branch.
▪ Repeat the process recursively for each child node until stopping criteria are
met (e.g., pure nodes or max depth reached).
▪ The result is a tree that classifies data by traversing through feature-based
splits.
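The entropy and information gain steps above can be sketched in a few lines. This covers only the split-selection part of ID3, not the full recursive tree construction. The rows are a hypothetical dataset loosely modelled on the gym example, and the helper names are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    groups = {}
    # Group the labels by the value this feature takes in each row.
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def best_feature(rows, labels, features):
    """Pick the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical training rows (made up to mirror the gym example).
rows = [
    {"has_time": "yes", "weather": "good", "motivation": "high"},
    {"has_time": "yes", "weather": "good", "motivation": "low"},
    {"has_time": "yes", "weather": "bad", "motivation": "high"},
    {"has_time": "yes", "weather": "bad", "motivation": "low"},
    {"has_time": "no", "weather": "good", "motivation": "high"},
    {"has_time": "no", "weather": "good", "motivation": "low"},
    {"has_time": "no", "weather": "bad", "motivation": "high"},
    {"has_time": "no", "weather": "bad", "motivation": "low"},
]
labels = ["gym", "home", "gym", "gym", "home", "home", "home", "home"]
features = ["has_time", "weather", "motivation"]

root = best_feature(rows, labels, features)
print("Root split:", root)
```

On this data the first split falls on `has_time`, because every "no" row leads to the same outcome and so that branch becomes a pure node.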
ROC Curve
▪ The ROC (Receiver Operating Characteristic) curve illustrates the trade-off between the True Positive Rate (Recall) and the False Positive Rate at various classification thresholds.

▪ The curve plots sensitivity (TPR) on the y-axis against 1 - specificity (FPR) on the x-axis.

▪ A point is plotted for each threshold setting.

▪ The area under the curve (AUC) indicates overall model performance: a higher AUC means a better ability to distinguish between classes.
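A minimal sketch of how ROC points and the AUC can be computed from predicted scores: one (FPR, TPR) point per threshold, then the trapezoidal rule for the area. The labels and scores are made-up illustration data, and the function names are assumptions.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; an instance counts as predicted
    # positive when its score is at or above the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical true labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
area = auc(points)
print(f"AUC = {area:.3f}")
```

An AUC of 0.5 corresponds to random guessing, while 1.0 means the scores separate the two classes perfectly.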
Random Forests

▪ An ensemble learning method that operates by constructing a multitude of decision trees during training. For classification tasks, the output is the class selected by most trees (voting), and for regression, it's the mean prediction of the individual trees.

▪ This approach reduces the overfitting common in single decision trees and generally provides higher accuracy and stability, making random forests robust across a variety of complex datasets.

Refer to Random [Link] for an implementation example

Random Forest Ensemble

▪ An ensemble of multiple decision trees, each trained on a different bootstrapped sample.

▪ At each split in a tree, only a random subset of features is considered to further inject diversity.

▪ To classify a new instance, each tree provides a prediction (vote).

▪ The final output is determined by majority voting across all trees.

▪ This ensemble approach reduces overfitting and improves prediction accuracy compared to a single decision tree.
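The voting step can be sketched on its own. Here each "tree" is a hypothetical hand-written rule standing in for a trained decision tree, and the feature names and thresholds are made up, so this only illustrates how the votes are combined.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common prediction among the trees."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, instance):
    """Collect one vote per tree and return the majority class."""
    return majority_vote([tree(instance) for tree in trees])


# Hypothetical stand-ins for trained trees (1 = failure, 0 = no failure).
def tree_temperature(x):
    return 1 if x["temperature"] > 90 else 0


def tree_vibration(x):
    return 1 if x["vibration"] > 8 else 0


def tree_hours(x):
    return 1 if x["running_hours"] > 5000 else 0


trees = [tree_temperature, tree_vibration, tree_hours]
instance = {"temperature": 95, "vibration": 9, "running_hours": 1200}

prediction = forest_predict(trees, instance)  # votes are [1, 1, 0]
print("Forest prediction:", prediction)
```

Using an odd number of trees avoids ties in binary classification.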
Bootstrap Sampling for Random Forest

▪ Randomly sample data points from the original dataset with replacement to create multiple bootstrapped subsets.

▪ Each bootstrapped sample is used to train an individual decision tree.

▪ Because of sampling with replacement, some data points may appear multiple times, while others may be left out.

▪ This randomness leads to diverse trees, improving overall model robustness.
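Sampling with replacement can be sketched with the standard library alone. The function name `bootstrap_sample` and the toy dataset are assumptions for illustration. The points that a given sample never draws are often called "out-of-bag" points, a term the slides do not use.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement; also return the left-out items."""
    n = len(data)
    indices = [rng.randrange(n) for _ in range(n)]
    chosen = set(indices)
    sample = [data[i] for i in indices]
    # Items whose index was never drawn are left out of this sample.
    out_of_bag = [data[i] for i in range(n) if i not in chosen]
    return sample, out_of_bag


# Toy dataset of ten distinct data points.
data = list(range(10))
rng = random.Random(0)  # fixed seed so runs are repeatable

sample, out_of_bag = bootstrap_sample(data, rng)
print("sample:    ", sample)
print("out of bag:", out_of_bag)
```

Calling the function once per tree gives every tree a slightly different view of the data, which is the source of the diversity described above.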
SVM & k-NN
Support Vector Machines (SVM)
Support Vector Machines (SVM) are powerful supervised learning algorithms, mainly used for classification tasks. SVMs work by finding the optimal hyperplane that best separates data points into different classes, maximising the margin between the closest data points of different classes. This makes them highly effective for tasks requiring clear class separation, even in high-dimensional spaces, such as image recognition and text categorization.

k-Nearest Neighbors (k-NN)
k-Nearest Neighbors (k-NN) is a non-parametric, instance-based supervised learning algorithm. It classifies a new data point based on the majority class among its 'k' nearest neighbours in the feature space. For regression, it predicts the average value of its 'k' nearest neighbours. k-NN is simple to understand and implement, making it suitable for recommendation systems, pattern recognition, and anomaly detection.
➢ A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane.

➢ In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.

➢ The SVM algorithm works by finding the hyperplane that gives the largest minimum distance to the training examples, that is, the hyperplane that maximises the margin.
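The "largest minimum distance" criterion can be illustrated without training anything. The sketch below compares two hand-picked candidate hyperplanes, both of which separate the same toy points, by their minimum distance to the data. It does not implement SVM training, and the points and hyperplane parameters are hypothetical.

```python
import math


def distance_to_hyperplane(point, w, b):
    """Distance from a point to the hyperplane w.x + b = 0."""
    value = sum(wi * xi for wi, xi in zip(w, point)) + b
    return abs(value) / math.sqrt(sum(wi * wi for wi in w))


def min_distance(points, w, b):
    """Smallest distance from any training point to the hyperplane."""
    return min(distance_to_hyperplane(p, w, b) for p in points)


# Toy 2-D training points: the first two belong to one class, the last two to the other.
points = [(1, 1), (2, 1), (4, 4), (5, 4)]

# Two candidate separating lines (assumed for illustration).
diagonal = ((1, 1), -5)    # x + y = 5
vertical = ((1, 0), -3.5)  # x = 3.5

margin_diagonal = min_distance(points, *diagonal)
margin_vertical = min_distance(points, *vertical)
print(f"diagonal: {margin_diagonal:.3f}, vertical: {margin_vertical:.3f}")
```

Both lines classify the four points correctly, but the diagonal line keeps every point farther away, so it is the better candidate under the SVM criterion.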
The K-nearest neighbor (KNN) algorithm is a supervised machine learning algorithm.

It classifies a new data point into a particular category based on its similarity to the other data points in that category.
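A minimal k-NN classifier can be sketched as follows, assuming Euclidean distance as the similarity measure and k = 3. Both are common choices but are assumptions here, as are the toy points and labels.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    # Pair each training point's distance to the query with its label.
    distances = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical 2-D training data forming two groups.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

label_near_a = knn_classify(train_points, train_labels, (2, 2))
label_near_b = knn_classify(train_points, train_labels, (6, 5))
print(label_near_a, label_near_b)
```

There is no training step: the algorithm stores the data and does all its work at prediction time, which is why it is described as instance-based.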
Neural Networks

▪ Neural Networks are a subset of machine learning, inspired by the structure and function of the human brain.

▪ They consist of interconnected nodes, or 'neurons', organised in layers: an input layer, one or more hidden layers, and an output layer.

▪ These networks are highly effective at identifying complex, non-linear relationships within data, making them suitable for tasks such as image recognition, natural language processing, and advanced predictive modelling.

▪ Their ability to learn representations from raw data autonomously distinguishes them from simpler algorithms.
Overfitting in Machine Learning
What is Overfitting?

▪ Occurs when a model learns the training data too well, including noise and random fluctuations

▪ The model performs excellently on training data but poorly on new, unseen data

▪ It memorises specific patterns rather than learning general relationships

Model Complexity Comparison:

▪ Simple Model (Underfitting): Too basic, misses important patterns, poor performance on both training and test data

▪ Optimal Model: Balances complexity, captures true patterns, good performance on both training and test data

▪ Complex Model (Overfitting): Too complicated, fits noise in training data, excellent training performance but poor test performance
Overfitting in Machine Learning

Key Indicators:

▪ Large gap between training accuracy and validation/test accuracy

▪ Training error continues to decrease while validation error starts to increase

▪ The model has too many parameters relative to the amount of training
data

Prevention Methods:

▪ Cross-validation to monitor performance on unseen data

▪ Regularization techniques (Ridge, Lasso)

▪ Pruning in decision trees

▪ Early stopping in iterative algorithms

▪ Using more training data when possible

Confusion Matrix

▪ True Positive (TP): Correctly predicted positive cases.

▪ True Negative (TN): Correctly predicted negative cases.
▪ False Positive (FP): Incorrectly predicted as positive (Type I error).
▪ False Negative (FN): Incorrectly predicted as negative (Type II error).

This matrix is the foundation for calculating common evaluation metrics like accuracy, precision, recall, and F1 score, helping to assess the performance of a classifier.
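The four counts and the metrics built on them can be sketched as follows. The labels are made-up illustration data with 1 as the positive class, and the function names are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP and FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1, guarding against division by zero."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Hypothetical true and predicted labels.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy, precision, recall, f1 = metrics(tp, tn, fp, fn)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision measures how many predicted positives were correct, while recall measures how many actual positives were found. F1 is their harmonic mean.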
Supervised Learning Applications: Predicting Equipment Failure

A manufacturing plant uses sensors to monitor equipment such as motors, pumps, and turbines. The data collected includes:

•Temperature (°C)
•Vibration levels (Hz)
•Pressure (Pa)
•Running hours
•Lubricant quality index

Historical data is labeled as:

•0 = No Failure
•1 = Failure Occurred
Steps to Predict Equipment Failure using Random Forest
•Collect historical data from sensors: temperature, vibration, pressure,
running hours, and failure status (0 = No Failure, 1 = Failure).
•Prepare the dataset by separating features (Temperature, Vibration,
Pressure, Running_Hours) and target (Failure).
•Split the data into training and testing sets to evaluate performance.
•Initialize the Random Forest Classifier with parameters
like n_estimators (number of trees) and random_state.
•Train the model by calling the classifier's fit method on the training set (X_train, y_train).
•Predict failures by calling its predict method on the test set (X_test) and on new equipment data.
•Evaluate performance using metrics such as Accuracy, Confusion Matrix,
and Classification Report.
•Interpret results to decide maintenance actions before failure occurs.
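The steps above can be sketched with scikit-learn. This is not the attached program: the sensor readings are synthetic, the labelling rule (failure when temperature exceeds 90 or vibration exceeds 8) is invented so the model has something to learn, and the variable name `model` and all parameter values are assumptions.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical sensor data (not real measurements).
rng = random.Random(42)
X, y = [], []
for _ in range(300):
    temperature = rng.uniform(60, 100)     # degrees C
    vibration = rng.uniform(0, 10)
    pressure = rng.uniform(1e5, 3e5)       # Pa
    running_hours = rng.uniform(0, 8000)
    X.append([temperature, vibration, pressure, running_hours])
    # Invented rule: 1 = Failure, 0 = No Failure.
    y.append(1 if (temperature > 90 or vibration > 8) else 0)

# Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Initialise and train the classifier.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict on the test set and evaluate.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
report = classification_report(y_test, y_pred, labels=[0, 1], zero_division=0)
print("Accuracy:", accuracy)
print(cm)
print(report)

# Predict for a new, hypothetical equipment reading.
new_reading = [[95.0, 9.2, 2.1e5, 6400.0]]
new_prediction = model.predict(new_reading)[0]
print("Predicted failure status:", new_prediction)
```

With real plant data, the features and labels would come from the historical sensor logs, and the predicted failure status would feed into the maintenance decision.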
A program on the same topic is attached.
Thanks
