Unit 3

The document provides an overview of supervised learning, including core algorithms such as Linear Regression, Logistic Regression, Decision Trees, and Random Forest, along with their applications in predicting equipment failure. It outlines key concepts, types of machine learning, and the workflow for building and deploying supervised models. Additionally, it discusses the importance of model evaluation metrics and techniques to prevent overfitting.

Uploaded by

ca8ham
Copyright
© All Rights Reserved

Supervised Learning

• Core algorithms: Linear Regression, Logistic Regression,
Decision Trees, Random Forest
• Applications: Predicting equipment failure or component
lifespan

Supervised Learning and Artificial
Intelligence
An in-depth exploration of supervised learning, its place in artificial intelligence, and key machine learning components.
Learning Objectives
1 Define Core Concepts
Distinguish between AI, ML, and Deep Learning.

2 Identify ML Paradigms
Understand different types of machine learning approaches.

3 Grasp Supervised Learning


Explore its theory and practical applications.

4 Discover Algorithms & Metrics


Common algorithms and how to evaluate model performance.

5 Model Workflow
Steps to build and deploy supervised models.
Artificial Intelligence (AI)

▪ Artificial Intelligence refers to the design of systems capable of performing tasks that typically require human intelligence. These include logical reasoning, decision-making, pattern recognition, and language understanding.

▪ AI systems can operate autonomously, adapt to new data, and optimise performance with minimal human intervention, revolutionising industries from healthcare to finance.


Machine Learning (ML)
▪ Machine Learning is a subfield of AI that focuses on algorithms and models that learn from data. Rather than following explicit instructions, ML systems identify patterns and make predictions or decisions based on historical data.

▪ This data-driven approach enables flexible and scalable problem-solving across a range of domains, from predictive analytics to natural language processing.


AI, ML, and Deep Learning: A Hierarchical View
Deep Learning
Layered neural networks for complex patterns

Machine Learning
Algorithms that learn from data

Artificial Intelligence
Broad goal of mimicking intelligence

These distinctions define different levels of abstraction and algorithmic depth, with each layer building upon the last to

achieve more sophisticated intelligent behaviors.


Types of Machine Learning

Supervised Learning
Trained on labeled datasets with input-output pairs.

Unsupervised Learning
Identifies structure in unlabeled data via clustering or dimensionality reduction.

Reinforcement Learning
Learns optimal actions by interacting with an environment for rewards.

Each paradigm has distinct use-cases and assumptions regarding data availability and learning objectives, driving diverse

applications across various industries.


Supervised Learning
▪ Supervised learning involves learning a function from input-output pairs. The model is trained on a dataset where each input is associated with a correct output label.

▪ The objective is to generalise from the training data to accurately predict outputs for new, unseen inputs. This approach is used for both classification (categorical output) and regression (continuous output), making it versatile for various prediction tasks.


Key Elements of Supervised Learning
Features (X)
Input variables describing each data instance.

Labels (Y)
Known output values for each instance.

Training Set
Data subset used to train the model.

Test Set
Separate data for evaluating model performance.

Loss Function
Quantifies prediction error.

Optimization Algorithm
Adjusts parameters to minimise loss.
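To make these elements concrete, here is a minimal sketch in plain Python. The toy data, the learning rate, and the names `mse_loss` and `gradient_step` are illustrative assumptions and do not come from the slides.

```python
# Toy data (assumed): one feature x and a label y generated from y = 3x
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [3.0, 6.0, 9.0, 12.0, 15.0, 18.0]

# Training set and test set: hold out the last two instances
X_train, Y_train = X[:4], Y[:4]
X_test, Y_test = X[4:], Y[4:]

def mse_loss(w, xs, ys):
    """Loss function: mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_step(w, xs, ys, lr=0.01):
    """Optimization algorithm: one gradient-descent update of w."""
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return w - lr * grad

w = 0.0  # initial parameter value
for _ in range(500):
    w = gradient_step(w, X_train, Y_train)

print("learned w:", round(w, 3))
print("test loss:", round(mse_loss(w, X_test, Y_test), 6))
```

The test loss is computed on instances the model never saw during training, which is what the separate test set is for.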
Supervised Learning Algorithms
Linear & Logistic Regression
Modelling linear relationships and predicting binary/multi-class outcomes.

Decision Trees & Random Forests
Tree-based models for classification and regression, ensemble methods for enhanced accuracy.

SVM & k-NN
Support Vector Machines for clear class separation and k-Nearest Neighbors for similarity-based predictions.

Neural Networks
Layered architectures for complex nonlinear relationships.


Linear & Logistic Regression

Linear Regression
A supervised learning algorithm used for predicting a continuous

output variable. It models the relationship between independent

variables and a dependent variable by fitting the best linear equation to

observed data.

Ideal for tasks like predicting house prices or stock trends.


It assumes that there is a linear relationship
between the input and output, meaning the output
changes at a constant rate as the input changes.
This relationship is represented by a straight line.

For example, suppose we want to predict a student's exam score based on how many hours they studied. We observe that as students study more hours, their scores go up. In this example:
•Independent variable (input): Hours studied, because it is the factor we control or observe.
•Dependent variable (output): Exam score, because it depends on how many hours were studied.
Mathematics of Linear Regression
Linear regression involves fitting a straight line (the regression line) to a set of data points in
such a way that the sum of the squared differences between the observed and predicted
values is minimized. The equation of the regression line is typically represented as:
Y = mx + b
where (Y) is the dependent variable, (x) is the independent variable, (m) is the slope of the
line, and (b) is the y-intercept.
The goal is to find the values of (m) and (b) that best fit the data, often using methods like
the least squares approach.
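For a single input variable, the least squares solution has a closed form: m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - m·x̄. The sketch below applies it in plain Python. The function name `fit_line` and the hours/scores data are hypothetical, chosen only to illustrate the calculation.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b minimising the sum of squared errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

m, b = fit_line(hours, scores)
predicted = m * 6 + b  # predicted score for 6 hours of study
print(f"Y = {m:.2f}x + {b:.2f}; prediction for 6 hours: {predicted:.1f}")
```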
House Price Prediction using Linear Regression

House price prediction using linear regression follows this sequence of steps:

•Data Collection
•Data Pre-processing
•Feature Engineering
•Model Selection
•Model Training
•Model Prediction
•Model Evaluation

•Libraries Required

NumPy: For numerical operations and array handling
Pandas: To manipulate and analyze structured data efficiently
Matplotlib: For creating visualizations and plots
Seaborn: To enhance the aesthetics of our visualizations; built on top of Matplotlib
scikit-learn: A comprehensive machine learning library for model building and evaluation
Logistic Regression

▪ Primarily used for binary or multi-class classification problems. It

estimates the probability of an instance belonging to a particular

class by applying a sigmoid function to a linear equation. Common

applications include spam detection and disease prediction.
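
The idea of applying a sigmoid to a linear equation can be sketched in a few lines of Python. The weight `w`, the bias `b`, and the 0.5 threshold below are hand-picked assumptions for illustration, not values learned from data.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of the positive class for a single feature x."""
    return sigmoid(w * x + b)

def predict_class(x, w, b, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold."""
    return 1 if predict_proba(x, w, b) >= threshold else 0

# Hypothetical, hand-picked parameters (not learned from data)
w, b = 1.5, -4.0
for x in (1.0, 3.0, 5.0):
    print(x, round(predict_proba(x, w, b), 3), predict_class(x, w, b))
```

In a real model, `w` and `b` would be fitted by minimising a loss on labeled training data.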

Refer to Logistic [Link] for an implementation example


Classification & Logistic Regression: Key Concepts

▪ Binary Classification: The simplest type, where each input is assigned


to one of two classes. The image typically shows a clear division
between two groups.

▪ Multi-class Classification: The task involves categorising inputs into one


of three or more distinct classes. The visual often depicts several
separated groups with distinct labels.

▪ Multi-label Classification: Unlike the other two, here each input can
belong to multiple classes simultaneously. The image might show
overlapping labels or sets associated with each input to indicate
multiple simultaneous classifications.
Decision Trees & Random Forests

Decision Trees

▪ A supervised learning algorithm that creates a tree-like model of

decisions and their possible consequences.

▪ It splits the dataset into smaller and smaller subsets while incrementally building the associated decision tree. Decision trees are used for both classification and regression tasks.

▪ Decision Trees are intuitive and easy to interpret, but a single

tree can be prone to overfitting, especially with complex data,

leading to poor generalisation on new data.

Refer to Decision [Link] for an implementation example


Decision Tree
▪ Start
▪ ▼
▪ Decision: Time Available?
▪ ├─ Yes Decision: Weather
▪ │ ├─ Good Decision: Motivation?
▪ │ │ ├─ High Go to Gym
▪ │ │ └─ Low Stay Home
▪ │ └─ Bad Go to Gym
▪ └─ No Stay Home

▪ Explanation of each path:

▪ If you have time and the weather is good, check your motivation:

▪ High motivation → Go to Gym

▪ Low motivation → Stay Home

▪ If you have time and the weather is bad, you still go to the gym.

▪ If you don’t have time, you stay home.
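
The same tree can be written as nested conditions, which is how a trained decision tree is evaluated at prediction time. The function name `gym_decision` and the string values for weather and motivation are assumptions made for this sketch.

```python
def gym_decision(has_time, weather, motivation):
    """Follow the tree from the root to a leaf and return the action."""
    if not has_time:
        return "Stay Home"
    if weather == "bad":
        return "Go to Gym"
    # Weather is good: the outcome depends on motivation
    return "Go to Gym" if motivation == "high" else "Stay Home"

print(gym_decision(True, "good", "high"))
print(gym_decision(True, "good", "low"))
print(gym_decision(True, "bad", "low"))
print(gym_decision(False, "good", "high"))
```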


Decision trees are of two main types:
Classification tree analysis is when the predicted outcome is the categorical class to which the data belongs.

Regression tree analysis is when the predicted outcome can be considered a real number (e.g. the price of a house, or a patient's length of stay in a hospital).
Regression tree
Complete Algorithm Process for
Decision Tree (ID3)
▪ Start with the entire dataset as the root.
▪ Calculate the entropy (impurity) at the current node.
▪ For each feature, compute the information gain when splitting on that
feature.
▪ Choose the feature with the highest information gain to split the data.
▪ Create child nodes for each split branch.
▪ Repeat the process recursively for each child node until stopping criteria are
met (e.g., pure nodes or max depth reached).
▪ The result is a tree that classifies data by traversing through feature-based
splits.
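The entropy and information gain steps can be sketched as follows. This covers only the split selection at one node, not the full recursive algorithm, and the toy rows and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_feature(rows, labels, features):
    """ID3 split choice: the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))

# Hypothetical toy dataset
rows = [
    {"time": "yes", "weather": "good"},
    {"time": "yes", "weather": "bad"},
    {"time": "no", "weather": "good"},
    {"time": "no", "weather": "bad"},
]
labels = ["gym", "gym", "home", "home"]
print(best_feature(rows, labels, ["time", "weather"]))
```

Here splitting on "time" produces two pure child nodes, so it removes all of the entropy, while splitting on "weather" removes none.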
ROC Curve
▪ The "ROC Curve" (Receiver Operating Characteristic curve) image illustrates
the trade-off between the True Positive Rate (Recall) and False Positive
Rate at various classification thresholds:

▪ The curve plots sensitivity (TPR) on the y-axis against (1 - specificity) or FPR
on the x-axis.

▪ A point is plotted for each threshold setting.

▪ The area under the curve (AUC) is highlighted to indicate overall model performance: a higher AUC means better ability to distinguish between classes.
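A minimal sketch of how the ROC points and the AUC can be computed by hand is given below. The labels and scores are hypothetical, and the code assumes binary labels (1 = positive, 0 = negative) with at least one instance of each class.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, from (0, 0) to (1, 1)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step through the distinct scores
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical labels and predicted probabilities
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(round(auc(roc_points(y_true, scores)), 3))
```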
Random Forests

▪ An ensemble learning method that operates by constructing a


multitude of decision trees during training. For classification tasks,
the output is the class selected by most trees (voting), and for
regression, it's the mean prediction of the individual trees.

▪ This approach reduces the overfitting common in single decision trees and typically improves accuracy and stability, making random forests robust across a wide range of datasets.

Refer to Random [Link] for an implementation example


Random Forest Ensemble

▪ An ensemble of multiple decision trees, each trained on a


different bootstrapped sample.

▪ At each split in a tree, only a random subset of features is


considered to further inject diversity.

▪ To classify a new instance, each tree provides a prediction


(vote).

▪ The final output is determined by majority voting across all


trees.

▪ This ensemble approach reduces overfitting and improves


prediction accuracy compared to a single decision tree.
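The aggregation step can be sketched on its own, separately from tree training. The tree outputs below are hypothetical. The mean is included because regression forests average the trees' predictions instead of voting.

```python
from collections import Counter

def majority_vote(predictions):
    """Classification: return the class predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

def mean_prediction(predictions):
    """Regression: return the average of the trees' predictions."""
    return sum(predictions) / len(predictions)

# Hypothetical outputs of five trees for one new instance
tree_votes = ["failure", "no failure", "failure", "failure", "no failure"]
print(majority_vote(tree_votes))

tree_values = [10.0, 12.0, 11.0, 13.0, 9.0]
print(mean_prediction(tree_values))
```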
Bootstrap Sampling for Random Forest

▪ Randomly sample data points from the original dataset with


replacement to create multiple bootstrapped subsets.

▪ Each bootstrapped sample is used to train an individual decision


tree.

▪ Because of sampling with replacement, some data points may


appear multiple times, while others may be left out.

▪ This randomness leads to diverse trees, improving overall model


robustness.
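Sampling with replacement can be sketched with the standard library alone. The ten-row dataset and the seed are stand-ins chosen for this example. The rows a sample leaves out are often called out-of-bag rows.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
dataset = list(range(10))  # stand-in for ten training rows

for tree_index in range(3):
    sample = bootstrap_sample(dataset, rng)
    left_out = sorted(set(dataset) - set(sample))
    print(f"tree {tree_index}: sample={sample}, left out={left_out}")
```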
SVM & k-NN
Support Vector Machines (SVM)
Support Vector Machines (SVM) are powerful supervised learning algorithms, mainly used for classification tasks. SVMs work by finding the optimal hyperplane that best separates data points into different classes, maximising the margin between the closest data points of different classes. This makes them highly effective for tasks requiring clear class separation, even in high-dimensional spaces, such as image recognition and text categorization.

k-Nearest Neighbors (k-NN)
k-Nearest Neighbors (k-NN) is a non-parametric, instance-based supervised learning algorithm. It classifies a new data point based on the majority class among its 'k' nearest neighbours in the feature space. For regression, it predicts the average value of its 'k' nearest neighbours. k-NN is simple to understand and implement, making it suitable for recommendation systems, pattern recognition, and anomaly detection.
➢ A Support Vector Machine (SVM) is a
discriminative classifier formally defined by a
separating hyperplane.

➢ In other words, given labeled training data


(supervised learning), the algorithm outputs
an optimal hyperplane which categorizes new
examples.
The operation of the SVM algorithm is based on finding the hyperplane that gives the largest
minimum distance to the training examples.
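The "largest minimum distance" criterion can be illustrated by comparing two candidate separating lines. This sketch does not train an SVM; it only measures the quantity an SVM maximises. The points and both candidate lines are hypothetical.

```python
import math

def min_distance(w, b, points):
    """Smallest distance from any point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(sum(wi * xi for wi, xi in zip(w, p)) + b) / norm
               for p in points)

# Hypothetical 2-D training examples from two classes
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]

# Two candidate separating lines (both split the classes correctly)
candidates = {
    "x + y - 5 = 0": ((1.0, 1.0), -5.0),
    "y - 1.5 = 0": ((0.0, 1.0), -1.5),
}
for name, (w, b) in candidates.items():
    print(name, round(min_distance(w, b, points), 3))
```

Of these two candidates, the first keeps every training example further away and would be preferred. An actual SVM solves an optimisation problem to find the best such hyperplane among all possible ones.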
The k-nearest neighbors (k-NN) algorithm is a supervised machine learning algorithm.

It classifies a new data point into a particular category based on its similarity to the other data points in that category.
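A minimal k-NN classifier fits in a few lines of plain Python. Euclidean distance, k = 3, and the toy points below are assumptions made for this sketch; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest neighbours."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D training data
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(train_points, train_labels, (2, 2)))
print(knn_classify(train_points, train_labels, (6, 5)))
```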
Neural Networks

▪ Neural Networks are a subset of machine learning, inspired by the


structure and function of the human brain.
▪ They consist of interconnected nodes, or 'neurons', organised in layers:
an input layer, one or more hidden layers, and an output layer.

▪ These networks are highly effective at identifying complex, non-linear relationships within data, making them suitable for tasks such as image recognition, natural language processing, and advanced predictive modelling.
▪ Their ability to learn representations from raw data autonomously
distinguishes them from simpler algorithms.
Overfitting in Machine Learning
What is Overfitting?

▪ Occurs when a model learns the training data too well, including noise and
random fluctuations

▪ The model performs excellently on training data but poorly on new,


unseen data

▪ It memorises specific patterns rather than learning general relationships

Model Complexity Comparison:

▪ Simple Model (Underfitting): Too basic, misses important patterns, poor


performance on both training and test data

▪ Optimal Model: Balances complexity, captures true patterns, good


performance on both training and test data

▪ Complex Model (Overfitting): Too complicated, fits noise in training data,


excellent training performance but poor test performance
Overfitting in Machine Learning

Key Indicators:

▪ Large gap between training accuracy and validation/test accuracy

▪ Training error continues to decrease while validation error starts to


increase

▪ The model has too many parameters relative to the amount of training
data

Prevention Methods:

▪ Cross-validation to monitor performance on unseen data

▪ Regularization techniques (Ridge, Lasso)

▪ Pruning in decision trees

▪ Early stopping in iterative algorithms

▪ Using more training data when possible
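
As one illustration of regularization, the sketch below fits a line with a Ridge (L2) penalty on the slope. For a single feature with an unpenalised intercept, the penalised slope is Σ(x - x̄)(y - ȳ) / (Σ(x - x̄)² + λ). The function name `fit_line_ridge`, the data, and the λ values are assumptions chosen for this example.

```python
def fit_line_ridge(xs, ys, lam=0.0):
    """Fit y = m*x + b with an L2 penalty lam * m**2 on the slope."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / (sxx + lam)  # lam = 0 recovers ordinary least squares
    b = y_mean - m * x_mean
    return m, b

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
for lam in (0.0, 1.0, 10.0):
    m, b = fit_line_ridge(xs, ys, lam)
    print(f"lambda={lam}: slope={m:.3f}, intercept={b:.3f}")
```

A larger penalty shrinks the slope towards zero, which makes the model less sensitive to noise in the training data at the cost of some bias.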


Confusion Matrix

▪ True Positive (TP): Correctly predicted positive cases.


▪ True Negative (TN): Correctly predicted negative cases.
▪ False Positive (FP): Incorrectly predicted as positive (Type I error).
▪ False Negative (FN): Incorrectly predicted as negative (Type II error).

This matrix is the foundation for calculating common evaluation metrics


like accuracy, precision, recall, and F1 score, helping to assess the
performance of a classifier.
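The four counts and the metrics derived from them can be computed directly. The labels and predictions below are hypothetical, and the sketch assumes at least one actual positive and one predicted positive so that no division by zero occurs.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical true labels and classifier predictions
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print("TP, TN, FP, FN:", counts)
print("accuracy, precision, recall, F1:", metrics(*counts))
```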
Supervised learning Applications: Predicting equipment failure

A manufacturing plant uses sensors to monitor equipment such as motors, pumps,


and turbines. The data collected includes:

•Temperature (°C)
•Vibration levels (Hz)
•Pressure (Pa)
•Running hours
•Lubricant quality index

Historical data is labeled as:

•0 = No Failure
•1 = Failure Occurred
Steps to Predict Equipment Failure using Random Forest
•Collect historical data from sensors: temperature, vibration, pressure,
running hours, and failure status (0 = No Failure, 1 = Failure).
•Prepare the dataset by separating features (Temperature, Vibration,
Pressure, Running_Hours) and target (Failure).
•Split the data into training and testing sets to evaluate performance.
•Initialize the Random Forest Classifier with parameters
like n_estimators (number of trees) and random_state.
•Train the model on the training set by calling fit(X_train, y_train).
•Predict failures on the test set by calling predict(X_test), and for new equipment
data.
•Evaluate performance using metrics such as Accuracy, Confusion Matrix,
and Classification Report.
•Interpret results to decide maintenance actions before failure occurs.
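The steps above can be sketched with scikit-learn as follows. This is not the program attached to the slides. The sensor data is synthetic, the rule used to label failures is an assumption invented for the example, and the variable name `model` and the parameter values are illustrative choices.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical sensor data (assumption, not real plant data)
rng = random.Random(42)
X, y = [], []
for _ in range(400):
    temperature = rng.uniform(40, 120)      # °C
    vibration = rng.uniform(5, 60)          # Hz
    pressure = rng.uniform(1e5, 5e5)        # Pa
    running_hours = rng.uniform(0, 10000)
    # Assumed labelling rule: hot, strongly vibrating, heavily used equipment fails
    risk = (temperature > 95) + (vibration > 40) + (running_hours > 7000)
    X.append([temperature, vibration, pressure, running_hours])
    y.append(1 if risk >= 2 else 0)

# Split into training and testing sets, keeping the class balance in both
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Initialize and train the Random Forest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict on the test set and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print("Accuracy:", accuracy)
print("Confusion matrix:\n", cm)
print(classification_report(y_test, y_pred, labels=[0, 1], zero_division=0))

# Predict for one new (hypothetical) sensor reading
new_reading = [[105.0, 48.0, 3.0e5, 8200.0]]
print("Predicted failure:", model.predict(new_reading)[0])
```

With real plant data, the feature columns would come from the sensor logs and the labels from maintenance records. A predicted 1 for a new reading would then be a prompt to schedule inspection before the failure occurs.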
Program attached, on the same topic
Thanks
