0% found this document useful (0 votes)

6 views76 pages

Chapter 2 - Supervised Learning

This document provides an overview of supervised learning, focusing on its definition, tasks, and types, including classification and regression. It details regression analysis, including simple and multiple linear regression, polynomial regression, and the associated cost functions and optimization techniques like gradient descent. Additionally, it discusses the concepts of overfitting, underfitting, regularization techniques, and evaluation metrics for regression models.

Uploaded by

maamiyee1210

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

6 views76 pages

Chapter 2 - Supervised Learning

Uploaded by

maamiyee1210

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Fundamentals of Machine

Learning
Chapter 2: Supervised Learning

Target Group: G3SE A & B

Instructor: Melaku M.
What is Supervised learning?
❖ Approach:
• Given a collection of records (training set)
• each record contains a set of attributes, one of the attributes is the
class attribute (label).
• Dataset D={(x1,y1),(x2,y2),..., (xn,yn)}

❖ Task: Find a model for class attribute as a function of the values of other
attributes.

❖ Goal: Previously unseen records should be assigned a class as accurately as

possible.
Examples of supervised learning Tasks
❖ Classifying credit card transactions as legitimate or fraudulent
• Attributes: your age, income, debts, …
• Class: are you getting credit by your bank?
❖ Marketing
• Attributes: previously bought products, browsing behavior
• Class: are you a target customer for a new product?
❖ SPAM Detection: Categorizing email messages
• Attributes: words and header fields of an e-mail
• Class: regular e-mail or spam e-mail?
❖ Identifying tumor cells: Predicting tumor cells as benign or malignant
• Attributes: features extracted from x-rays or MRI scans
• Class: malignant or benign cells
❖ Categorizing news stories as finance, weather, entertainment, sports, etc.
Supervised learning
There are two major types of supervised machine learning problems.
• Classification
• Regression
1. Regression
❖Regression Analysis is a statistical approach for modelling the relationships
between the dependent variables and one or more independent variables.

❖In Regression, we plot a graph between the variables, which best fit the given
data points. In naïve words, “Regression shows a line or curve that passes
through all the data points on a target-predictor graph in such a way that the
vertical distance between the data points and the regression line is minimum”.

❖Regression is principally used for prediction, forecasting, time series modeling,

and determining the causal-effect relationship between variables.
Predicting House Prices
Problem: Given features like the size of a house, and the number of
bedrooms of the house, predict its price.
Independent variable Dependent variable

Size (sq ft) Bedrooms Price ($)

1914 4 245679.32
2859 3 338129.41
1465 2 192672.58
1297 5 194902.16
2433 1 251292.31
Types of Regression models
❖There are several types of regression techniques, each suited for
different types of data and different types of relationships.

❖The main types of regression techniques are:

➢Linear Regression
➢Polynomial Regression
Linear Regression
❖This is the simplest and most common type of regression algorithm.

❖It models the relationship/connection between a dependent variable (what

you are trying to predict) and one or more independent variables (the
features that influence the dependent variable) using a linear equation.

❖Linear regression is a regression model that uses a straight line to

describe the relationship between variables.

❖The goal of the algorithm is to find the best Fit Line that passes through
most of data points.
Linear Regression

In the figure given above, on X-axis is the independent variable and on Y-axis is
the dependent variable(target). The regression line (a sloped straight line) is
referred to as the best-fit line for a model. And our main objective in this
algorithm is to find this best fit line based on the given data points.
Types of Linear Regression
1) Simple Linear Regression

❖In a simple linear regression, there is one independent variable(x) and one
dependent variable(y).

❖The model estimates the regression coefficients(slope and intercept) of

the line of best fit, which represents the relationship between the
variables.

❖The goal of the linear regression algorithm is to get the best values
for B0 and B1 to find the best fit line.
1) Simple Linear Regression

❖The equation for simple linear regression

y=mx+b => ŷ = B0+B1x

❖where,
y = predicted value. It represents the value that the model tries to predict.
x = feature value
B0 = bias term or intercept of the line.
B1 = the feature weights or regression coefficient. Slope of best-fit line.
❖B0, B1 are known as Model Parameters.
2) Multiple Linear Regression
❖Multiple linear regression is a technique to model the relationship between
a single dependent variable and multiple independent variables.
❖The equation for multiple linear regression is:
ŷ=𝛽0+𝛽1x1+ 𝛽2x2 +……… 𝛽nxn
where:
❖ŷ is the predicted value
❖n is the number of features
❖xi ith feature value
❖𝛽0 is the intercept/bias term
❖𝛽i, ith feature weight value
Hypothesis function in Linear Regression
❖Let us assume that our independent feature is the experience i.e. X and the
respective salary Y is the dependent variable. Let’s assume there is a linear
relationship between X and Y then the salary can be predicted using:

Ŷ= B0 + B1 X Or ŷi= B0 + B1 xi

𝑦𝑖 𝜖 𝑌(𝑖=1,2,⋯,𝑛) are labels to data

𝑥𝑖 𝜖 𝑋(𝑖=1,2,⋯,𝑛) are the input independent training data

ŷi 𝜖 Ŷ(𝑖=1,2,⋯,𝑛) are the predicted values.

❖Once we find the best B0 and B1 values, we get the best-fit line.
This algorithm explains the linear relationship between the dependent variable y
and the independent(predictor) variable X using a straight-line Y= B0 + B1 X.

▪ Red line is the best fit line predicted

by the model.
▪ Blue dots are the data points
▪ b1 is the slope of the x variable.
Random error

The best fit line should have the least error means the error between predicted values and actual
values should be minimized.
Cont’d
❖The vertical distance between the data point and the regression line is
known as error or residual. Each data point has one residual and the sum of
all the differences is known as the Sum of Residuals/Errors.

❖Mathematical Approach:

❖Residual/Error = Actual values – Predicted Values

❖Sum of Residuals/Errors = Sum(Actual- Predicted Values)

❖Square of Sum of Residuals/Errors = (Sum(Actual- Predicted Values))2

Cost Function
❖The cost function is nothing but the error or difference between the actual value yi and
the predicted value Y.
❖In Linear Regression, the Mean Squared Error (MSE) cost function is employed, which
calculates the average of the squared errors between the actual values ŷ and the
predicted values yi.
❖It helps us to figure out optimal/best values for B0 and B1, which provides the best fit line
for the given data points.
• Our MSE tells us how well (or rather how poorly) our function fits our data. This means
that we want to make our MSE as low as possible.
• Our MSE is just a function, we can compute the minimum of it directly by taking the first
derivative of the MSE.
Cost Function
➢Utilizing the MSE function, the iterative process of gradient descent is applied to
update/adjust the values of B0 and B1 .This ensures that the MSE value converges
to the global minima, signifying the most accurate fit of the linear regression line
to the dataset.
➢This process involves continuously adjusting the parameters based on the
gradients calculated from the MSE. The final result is a linear regression line that
minimizes the overall squared differences between the predicted and actual
values, providing an optimal representation of the underlying relationship in the
data.
Gradient Descent
❖Gradient Descent is a generic optimization algorithm capable of
finding optimal solutions to a wide range of problems. The general idea
of Gradient Descent is to tweak model parameters iteratively in order to
minimize a cost function, to reach the optimal solution.
❖A linear regression model can be trained using the optimization
algorithm gradient descent by iteratively modifying the model’s
parameters to reduce the mean squared error (MSE) of the model on a
training dataset.
Gradient Descent
❖To update B0 and B1, we take gradients from the MSE cost function. To find these
gradient, we compute the partial derivatives of the cost function (MSE) with respect
to both m and c.

❖These gradients tell us how much we should adjust B0 and B1 to reduce the error.
❖Initially, it starts with random B0 and B1 values and then iteratively update the
values, reaching minimum cost.
❖A gradient is nothing but a derivative that defines the effects on outputs of the
function with a little bit of variation in inputs.
How Linear regression works
1. Initialize parameters: Start with initial values for the slope (m) and y-intercept (c).
2. Calculate predictions: Use your current parameter values (m and c) to predict the
dependent variable (ŷ) for each data point.
3. Calculate gradients: Compute the partial derivatives of the cost function (MSE) with
respect to both m and c. These gradients indicate the direction of the steepest descent in
the cost function landscape.
4. Update parameters: Use the learning rate (α) to determine the step size for the update.
Subtract the learning rate multiplied by the gradients from the current values of m and c.
This moves the parameters in the direction that minimizes the cost function.
5. Repeat steps 2-4: Continue iterating until the cost function converges (stops decreasing
significantly) or reaches a certain threshold.
Impact of different values for learning rate
Polynomial regression
❖Polynomial regression helps to build a model over non-linear data.
❖A simple linear regression algorithm only works when the relationship
between the data is linear. But suppose we have non-linear data, then
linear regression will not be able to draw a best-fit line.

Consider the diagram, which has a

non-linear relationship, and you can
see the linear regression results on it,
which does not perform well,
meaning it does not come close to
reality.
Polynomial regression
❖Polynomial Regression is a regression algorithm that models the
relationship between a dependent variable(y) and independent
variable(x) as nth degree polynomial.
❖The Polynomial Regression equation is given below:

❖Polynomial regression is a form of Linear regression where only

due to the Non-linear relationship between dependent and
independent variables, we add some polynomial terms to linear
regression to convert it into Polynomial regression.
Types of Polynomial Regression
Linear Regression Vs. Polynomial Regression
• We build our model and realize that it performs badly. We examine the
difference between the actual value and the best fit line we predicted,
and it appears that the true value has a curve on the graph, but our line is
nowhere near cutting the mean of the points. This is where polynomial
regression comes into play; it predicts the best-fit line that matches the
pattern of the data (curve).
Math Behind Polynomial Regression!
Cost Function
❖A common cost function employed in polynomial regression is the
mean squared error (MSE). It calculates the average of the squared
differences between the predicted values and the actual values (y).

❖Cost = (1 / n) * sum((yi - y_predi) ^ 2)

❖Where:
❖n represents the total number of training examples.
❖The summation calculates the squared error for each data point.

❖So, the equation can also be written as:

Cost = 1/n*sum(square(yi - (b0 + b1x + b2x2 + b3x3.....))
Gradient Descent for Polynomial Regression
• Gradient descent is an optimization algorithm used to find the values of
parameters (coefficients) of a function that minimizes a cost function.

The values of slope (m) and slope-intercept (b) will be set to 0 at the start of
the function, and the learning rate (α) will be introduced. The learning rate
(α) is set to an extremely low number, perhaps between 0.01 and 0.0001.
The learning rate is a tuning parameter in an optimization algorithm that
sets the step size at each iteration as it moves toward the cost function’s
minimum.
Overfitting Vs Under-fitting
❖In linear regression, when analyzing a dataset linearly, we encounter an under-
fitting problem.
• Model is unable to capture data patterns due to the model’s simple assumptions in
fitting the data.
❖In polynomial regression one thing that we face is the problem of overfitting
this happens because while we increase the order/degree of the polynomial
regression to achieve better and better performance, model gets overfit on the
data and does not perform on the new data points.
• Model memorizes the training data too well and loses generalizability.
❖Due to this reason only while using the polynomial regression, do we try to
penalize the weights of the model to regularize the effect of the overfitting
problem.
Overfitting Vs Under-fitting
Regularization Technique
• While developing machine learning models you must have encountered a
situation in which the training accuracy of the model is high but the
validation accuracy is low. This is the case which is popularly known as
overfitting.

• Regularization in ML which helps us to solve the problem of overfitting.

• Regularization is implemented by adding a “penalty” term to the best fit

derived from the trained data, in order to achieve a lesser variance with the
tested data and also restricts the influence of feature variables over the
output variable by compressing their coefficients.
Regularization in Machine Learning
❖Regularization is a technique used to prevent overfitting by adding a
penalty term to the loss function, discouraging the model from assigning
too much importance to individual features or coefficients.

❖ The commonly used regularization techniques are:

▪ Lasso Regularization – L1 Regularization
▪ Ridge Regularization – L2 Regularization
Evaluation metrics for regression models
• To build and deploy a generalized model we need to evaluate the model
on different metrics which helps us to better optimize the performance,
fine-tune it, and obtain a better result.
• Some of the Evaluation metrics used for Regression analysis are:
1. Mean Absolute Error (MAE)

• MAE is simplest and easily comprehensible loss

functions, which calculates the absolute difference
between actual and predicted values. We aim to get a
minimum MAE because this is a loss.
Evaluation metrics for regression models

2. Mean Squared Error (MSE) :

MSE is a most used and very simple metric
MSE represents the squared distance between actual and
predicted values.

3. Root Mean Squared Error (RMSE) :

• As RMSE is clear by the name itself, that it is a

simple square root of mean squared error.
R2 score is a metric that tells the
performance/accuracy of your model. R² Score usually
ranges from 0 to 1. SST is the total sum of squares
4. R-squared (R²) Score: and SSR is the total sum of squares of residuals.
Classification
✓Given: Dtrain= (x1, y1), . . . , (xn, yn) / xi ∈ Rd and yi is the label

➢Goal:
✓Learn a function to predict y given x
– y is categorical/discrete == classification
✓In other word, Build a model that can accurately predict the
class labels of new, unseen instances.
Types of classification tasks
• Types of classification tasks based on the nature of the class labels.
• Binary Classification: The simplest case, where there are only two
classes (e.g., classifying an email as spam/not spam(ham)).
• Multi-Class Classification: More than two classes (e.g., classifying
images as cat, dog, or horse. Predicting handwritten digits[0-9]
where there are 10 classes( response variable can have any value
ranging from 0 to 9).
• Multi-Label Classification: A data point can belong to multiple
classes simultaneously. For example, an image can contain a cat, a
dog, and a person. predicting the presence of multiple diseases in a
patient based on medical records.
Classification problem 1: Predicting the Weather
• Suppose we take a real-world example of predicting the weather.
• Let’s keep it simple and say we are trying to predict if the weather is sunny
or rainy based on multiple input data samples consisting of attributes or
features like humidity, temperature, pressure, and precipitation.
• Input features = { humidity, temperature, pressure, and precipitation}
• Output labels = {Sunny, runny}

• Since the prediction can be either sunny or rainy, there are a total of two
distinct classes in total; hence this problem can also be termed as a binary
classification problem.
Figure. Supervised learning: binary classification for weather prediction
Classification problem 2: Iris plant classification
Cont.… the data:
Cont.…
Cont.…
Classification has a vast array of real-world applications
1) Fraud Detection: Classifying financial transactions as fraudulent or legitimate based on
patterns, anomalies, or historical data.
2) Sentiment analysis: Categorizing customer product reviews as positive, negative, or
neutral.
3) Image recognition: Classifying images into various categories like cat, dog, horse etc.
4) Object recognition: Identifying objects in an image, like a picture containing a cat, a
person, and a tree.
5) News article/document classification: Categorizing a news article by topic, such as
politics, sports, and technology (based on their content).
6) Cybersecurity threat detection: Identifying malicious network activity from a vast
amount of regular network traffic.
7) Credit Risk Assessment: Classifying loan applicants as low risk or high risk based on their
credit history, income, and other relevant factors.
8) Customer Churn Prediction: Classifying customers as likely to churn(cancel service) or
not churn from a subscription or service based on their behavior, usage patterns, or
demographics.
Typical classification models
Typical classification models include the following major types of
methods; however, the list is not exhaustive.
• Linear models like logistic regression, Naïve Bayes, and support vector
machines
• Non-parametric models like K-nearest neighbors(KNN)
• Tree based methods like decision trees
• Ensemble methods like random forests (bagging) and gradient boosted
machines (boosting)
• Neural networks (MLPs)
DecisionTree
• Decision Trees are versatile supervised machine Learning algorithms that
can perform both classification and regression tasks. They are powerful
algorithms, capable of fitting complex datasets.
Sample Dataset

• Columns denote features

X i, (xi, y i )
• Rows denote labeled
instances (xi, y i )
• Class label denotes
whether customer buys a
computer.
Decision Tree Induction

• Decision tree induction is the

learning of decision trees from
class-labeled training tuples.
• To build optimal decision tree
select an attribute that produces
purest subset of records.

select best attribute

that splits this data set.
Examples of Decision Tree Induction:

• A decision tree is grown by recursively partitioning the training records

successively into purer subsets.
Cont’d

Purest record because all instances

belongs to the same class

yes
all examples are yes
Cont’d

select best attribute

that splits this data set.

Yes
all examples are yes,
No further split is required
Decision Tree Induction Algorithm: Example

yes

no yes
all examples are no all examples are yes
Decision Tree Induction Algorithm: Example

select best attribute that splits this data set.

Decision Tree Induction Algorithm: Example
Training and Testing Decision Tree Model
Training Data Model: Decision Tree

Test Data

Apply Model
Tree-based Models is Easy to Interpret
• A path from root to a leaf node Set of rules can be
corresponds to a rule visualized as a tree.
• E.g., if age<=30 and student=no then
target value, buys=no

✓ Each path from the tree root to a

leaf corresponds to a conjunction
of attribute tests.
✓ The instance (age <=30, income =
low, student = no, credit_rating =
fair) is classified as a negative
instance.
Decision Tree Induction Algorithm: Example 2
Features Class/Label

.
Build a decision tree model for the given
dataset

Given : <outlook=sunny,
temperature=hot, humidity=high,
wind=weak> ?

Predict, if there will be a match?

Many possible Decision Trees
There could be more than one tree that fits the samedata!

Which Decision Tree is the Best?

Decision Tree Induction Algorithms
❖ How to learn a decision tree from training data?

❖ Which feature should be used to break the dataset?

✓ finding an optimal decision tree is NP-hard

✓ tree building algorithms thus use a greedy, top-down, recursive partitioning

strategy to induce a reasonable solution
Number of Algorithms:
• Hunt’s: Hunt's Algorithm (1966)
• Quinlan’s: Iterative Dichotomizer3 (ID3) (1975) uses Entropy
– C4.5 / 4.8 / 5.0 (1993)(Successor of ID3) uses Entropy
• Brieman’s: CART: Classification and Regression Trees (1984) uses Gini
Design Issues for Decision Tree induction
1. How should training records be split?
❖How to specify the attribute test condition (Splitting Criterion)?
◆ Depending on attribute types(Nominal, Ordinal, Continuous)
❖How to determine the best split?
• Measure for evaluating the goodness of a test condition
• Different purity measures can be used:
• information gain, , Gini index, gain ratio misclassification error, …
2. When should the splitting procedure stop?
–Stop splitting if all the records belong to the same class.
• Early termination depending on the results of a statistical test, b/c fully
grown trees might overfit training data
How to Determine the Best Split?

Before splitting the dataset contains:

❖ 10 records of class C0 and
❖ 10 records of class C1

Which attribute test is the best?

How to determine the Best Split?

❖ Key problem: choosing which attribute to split a given set of training records.
❖ Greedy approach:
▪ Nodes with purer/homogeneous class distribution are preferred
▪ Need a measure of node impurity:

Non-homogeneous Homogeneous
High degree of node impurity Low degree of node impurity
How to determine the Best Split?

Node Impurity
Very impure group Less impure Minimum impurity

❖ Common measures of node impurity

1. GINI Index
2. Information gain:-follows max-gain

• The ID3 algorithm uses the Max-Gain method of selecting the

best attribute
Splitting Based on Information Gain
Weighted impurity measure of children
❖ Information Gain:

Parent Node p is split into k partitions; ni is

number of records in partition i
❖ Information gain measures the entropy reduction of a split

❖ We choose the split with the largest reduction (maximal GAIN)

❖ Disadvantage: Tends to prefer splits that result in large number of
partitions, each being small but pure (split by ID attribute?)
How to determine the Best Split?
Splitting Based on Information gain
❖ Information gain relies on the entropy of each node
❖ Entropy of a given node t:

p( j | t) is the relative frequency of class j at node t

❖ Entropy measures homogeneity of a node(impurity of a node)

Splitting Based on Information Gain
Calculating Information Gain
Parametric and non-parametric methods
These methods approach learning from data in fundamentally different ways:
Parametric methods:

❖Parametric methods make assumptions about the underlying statistical

distribution of your data and functional forms(eg. Y=mx+b).

❖Parametric method assumes/follows a specific predefined form for model,

and summarizes the given data with a fixed number of parameters (independent
of the number of training examples).

❖Imagine a straight line for linear regression - that's the form, and the parameters
are the slope and intercept.
Parametric and non-parametric methods
Parametric methods:
❖The algorithms involve two steps:
• Select a form for the function.
• Learn the coefficients for the function from the training data.
A. Linear Regression: It assumes a linear relationship between the input
variables and the target variable and estimates the coefficients of the
linear equation.
B. Naive Bayes: It assumes that the features are conditionally
independent given the class label and estimates the class probabilities
using Bayes' theorem.
Non-parametric Methods
❖These methods make minimal assumptions about the data's underlying
structure. They learn the model(functional form) directly from the data,
allowing for more complex relationships.
❖Non-parametric methods don't rely on predefined forms, and often have a
flexible number of parameters that grow with the amount of training data.
The model complexity can grow with more data, making them adaptable.
❖If your data is complex or you're unsure about its distribution(no prior
knowledge), non-parametric methods offer more flexibility.
Choosing the right method:
The best method depends on your specific problem and data:
❖If you have a strong understanding of your data and can make
reasonable assumptions about its distribution, parametric methods
can be a good choice, especially when data is limited.

Supervised Learning: Linear Regression Guide
No ratings yet
Supervised Learning: Linear Regression Guide
31 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
9 pages
Linear Regression in Machine Learning
No ratings yet
Linear Regression in Machine Learning
10 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
19 pages
Unit 3
No ratings yet
Unit 3
103 pages
Comprehensive Linear Regression Guide
No ratings yet
Comprehensive Linear Regression Guide
17 pages
Supervised Learning: Linear Regression Guide
No ratings yet
Supervised Learning: Linear Regression Guide
34 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
19 pages
Basic Interview Question of Linear Regression
No ratings yet
Basic Interview Question of Linear Regression
9 pages
Understanding Linear Regression Basics
100% (1)
Understanding Linear Regression Basics
8 pages
Supervised Learning and Regression Analysis
No ratings yet
Supervised Learning and Regression Analysis
20 pages
Machine Learning Module 2 Notes
No ratings yet
Machine Learning Module 2 Notes
12 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
11 pages
Linear Regression - pptx-3
No ratings yet
Linear Regression - pptx-3
17 pages
Introduction to Linear Regression in ML
No ratings yet
Introduction to Linear Regression in ML
4 pages
Supervised Learning: Regression & Classification
No ratings yet
Supervised Learning: Regression & Classification
136 pages
Machine Learning: Linear Regression Insights
No ratings yet
Machine Learning: Linear Regression Insights
24 pages
Supervised Learning in AI & ML
No ratings yet
Supervised Learning in AI & ML
33 pages
Linear Regression in AI Fundamentals
No ratings yet
Linear Regression in AI Fundamentals
22 pages
Regression in Machine Learning
No ratings yet
Regression in Machine Learning
13 pages
Supervised Learning: Regression Models
No ratings yet
Supervised Learning: Regression Models
123 pages
Understanding Regression in Machine Learning
No ratings yet
Understanding Regression in Machine Learning
76 pages
Statistical Decision Theory & Linear Regression
No ratings yet
Statistical Decision Theory & Linear Regression
16 pages
Supervised Learning LR MLR LogisticR
No ratings yet
Supervised Learning LR MLR LogisticR
80 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
10 pages
Linear Regression in Machine Learning
No ratings yet
Linear Regression in Machine Learning
17 pages
Understanding Linear Regression in ML
No ratings yet
Understanding Linear Regression in ML
26 pages
Intro ML
No ratings yet
Intro ML
66 pages
Supervised Learning: Linear Regression Guide
No ratings yet
Supervised Learning: Linear Regression Guide
24 pages
Understanding Linear Regression in ML
No ratings yet
Understanding Linear Regression in ML
17 pages
Supervised Learning in Machine Learning
No ratings yet
Supervised Learning in Machine Learning
79 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
15 pages
Linear Regression in Machine Learning
No ratings yet
Linear Regression in Machine Learning
54 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
20 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
83 pages
Decision Tree
No ratings yet
Decision Tree
74 pages
Understanding Linear and Polynomial Regression
No ratings yet
Understanding Linear and Polynomial Regression
32 pages
Machine Learning - Unit 2
No ratings yet
Machine Learning - Unit 2
41 pages
Linear Regression in Machine Learning
No ratings yet
Linear Regression in Machine Learning
13 pages
Linear Regression Explained: Concepts & Code
No ratings yet
Linear Regression Explained: Concepts & Code
7 pages
Supervised Learning: Classification & Regression
No ratings yet
Supervised Learning: Classification & Regression
307 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
36 pages
Applications of Regression Analysis
No ratings yet
Applications of Regression Analysis
98 pages
Linear Regression Basics Explained
No ratings yet
Linear Regression Basics Explained
6 pages
Linear Regression Basics in ML
No ratings yet
Linear Regression Basics in ML
1 page
Unit II 2
No ratings yet
Unit II 2
79 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
29 pages
Lecture Slide 04 - Regression
No ratings yet
Lecture Slide 04 - Regression
25 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
130 pages
Machine Learning: Linear Regression Guide
No ratings yet
Machine Learning: Linear Regression Guide
36 pages
Salary Prediction via Linear Regression
No ratings yet
Salary Prediction via Linear Regression
11 pages
Understanding Simple Linear Regression
No ratings yet
Understanding Simple Linear Regression
47 pages
Regression Model Performance Metrics
No ratings yet
Regression Model Performance Metrics
11 pages
Supervised Learning: Regression & Classification
No ratings yet
Supervised Learning: Regression & Classification
30 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
24 pages
Lossy vs Lossless Compression Explained
No ratings yet
Lossy vs Lossless Compression Explained
12 pages
Compound Nouns
No ratings yet
Compound Nouns
1 page
Positive Discipline in Teaching
No ratings yet
Positive Discipline in Teaching
33 pages
Motor and Generator Principles Explained
No ratings yet
Motor and Generator Principles Explained
2 pages
Ethiopia Early Learning System Report
No ratings yet
Ethiopia Early Learning System Report
80 pages
Montalvo Salon & Spa Work Plan Analysis
No ratings yet
Montalvo Salon & Spa Work Plan Analysis
23 pages
Non-Digital Alternatives in Education
No ratings yet
Non-Digital Alternatives in Education
43 pages
Math5 - Q3 - M19 - Measuring The Circumference of A Circle Using Appropriate Tools
No ratings yet
Math5 - Q3 - M19 - Measuring The Circumference of A Circle Using Appropriate Tools
17 pages
Ceniza Pearl Corp Income Statement 2024
No ratings yet
Ceniza Pearl Corp Income Statement 2024
42 pages
Verifying Bernoulli's Law with Autodesk CFD
No ratings yet
Verifying Bernoulli's Law with Autodesk CFD
8 pages
SPN 4364 FMI 18 Diagnostic Update
No ratings yet
SPN 4364 FMI 18 Diagnostic Update
16 pages
Tennis Speed and Agility Training Guide
No ratings yet
Tennis Speed and Agility Training Guide
3 pages
Enphase Technology Overview and Impact
No ratings yet
Enphase Technology Overview and Impact
31 pages
GK Logistics Vietnam Insights
No ratings yet
GK Logistics Vietnam Insights
7 pages
Gestalt Therapy Techniques for Depression
100% (1)
Gestalt Therapy Techniques for Depression
26 pages
14LK14 PDF
No ratings yet
14LK14 PDF
28 pages
Computer Science Graduate with AI/ML Skills
No ratings yet
Computer Science Graduate with AI/ML Skills
1 page
Pharmacy Interview Self-Introduction Guide
No ratings yet
Pharmacy Interview Self-Introduction Guide
19 pages
Android Backup File Viewer Settings
No ratings yet
Android Backup File Viewer Settings
5 pages
Surface Tension Experiment Overview
No ratings yet
Surface Tension Experiment Overview
11 pages
Polycab Cable Current Rating Chart
No ratings yet
Polycab Cable Current Rating Chart
2 pages
Phonetics and Grammar Exercises for G8
No ratings yet
Phonetics and Grammar Exercises for G8
6 pages
Burt Labeler Conversion Kit Overview
No ratings yet
Burt Labeler Conversion Kit Overview
1 page
Multicultural London English Explained
No ratings yet
Multicultural London English Explained
23 pages
Grade 9 English Weekly Lesson Plan
No ratings yet
Grade 9 English Weekly Lesson Plan
3 pages
Retail Deployment Guide: Microsoft Dynamics AX 2012 Feature Pack
No ratings yet
Retail Deployment Guide: Microsoft Dynamics AX 2012 Feature Pack
39 pages
Roki 2
No ratings yet
Roki 2
2 pages
Operations Improvement Strategies Explained
No ratings yet
Operations Improvement Strategies Explained
41 pages
NLP for Cross-Domain Ambiguity Detection
No ratings yet
NLP for Cross-Domain Ambiguity Detection
40 pages
Grant Application Cover Letter Guide
100% (1)
Grant Application Cover Letter Guide
7 pages

Chapter 2 - Supervised Learning

Uploaded by

Chapter 2 - Supervised Learning

Uploaded by

Fundamentals of Machine

Target Group: G3SE A & B

❖ Goal: Previously unseen records should be assigned a class as accurately as

❖Regression is principally used for prediction, forecasting, time series modeling,

Size (sq ft) Bedrooms Price ($)

❖The main types of regression techniques are:

❖It models the relationship/connection between a dependent variable (what

❖Linear regression is a regression model that uses a straight line to

❖The model estimates the regression coefficients(slope and intercept) of

❖The equation for simple linear regression

𝑦𝑖 𝜖 𝑌(𝑖=1,2,⋯,𝑛) are labels to data

ŷi 𝜖 Ŷ(𝑖=1,2,⋯,𝑛) are the predicted values.

▪ Red line is the best fit line predicted

❖Residual/Error = Actual values – Predicted Values

❖Sum of Residuals/Errors = Sum(Actual- Predicted Values)

❖Square of Sum of Residuals/Errors = (Sum(Actual- Predicted Values))2

Consider the diagram, which has a

❖Polynomial regression is a form of Linear regression where only

❖Cost = (1 / n) * sum((yi - y_predi) ^ 2)

❖So, the equation can also be written as:

• Regularization in ML which helps us to solve the problem of overfitting.

• Regularization is implemented by adding a “penalty” term to the best fit

❖ The commonly used regularization techniques are:

• MAE is simplest and easily comprehensible loss

2. Mean Squared Error (MSE) :

3. Root Mean Squared Error (RMSE) :

• As RMSE is clear by the name itself, that it is a

• Columns denote features

• Decision tree induction is the

select best attribute

• A decision tree is grown by recursively partitioning the training records

Purest record because all instances

select best attribute

that splits this data set.

select best attribute that splits this data set.

✓ Each path from the tree root to a

Predict, if there will be a match?

Which Decision Tree is the Best?

❖ Which feature should be used to break the dataset?

✓ finding an optimal decision tree is NP-hard

✓ tree building algorithms thus use a greedy, top-down, recursive partitioning

Before splitting the dataset contains:

Which attribute test is the best?

❖ Common measures of node impurity

• The ID3 algorithm uses the Max-Gain method of selecting the

Parent Node p is split into k partitions; ni is

❖ We choose the split with the largest reduction (maximal GAIN)

p( j | t) is the relative frequency of class j at node t

❖ Entropy measures homogeneity of a node(impurity of a node)

❖Parametric methods make assumptions about the underlying statistical

❖Parametric method assumes/follows a specific predefined form for model,

You might also like