Unit 3
Forecasting Numeric Data: Regression Methods
Supervised learning has two types:
• Classification: It predicts the class of an observation
based on the independent input variables. A class is a
categorical (discrete) value, for example, whether the
image of an animal shows a cat or a dog.
• Regression: It predicts a continuous output variable
based on the independent input variables, for example,
predicting house prices from parameters such as house
age, distance from the main road, location, and area.
Variables
• An independent variable is the condition that you
change in an experiment. In other words, it is the
variable you control. It is independent because its
value does not depend on and is not affected by
the state of any other variable in the experiment.
• The dependent variable is the condition that you
measure in an experiment. You are assessing how
it responds to a change in the independent
variable, so you can think of it as depending on
the independent variable.
Example of dependent & independent variable
• In a study to determine whether the
amount of time a student sleeps affects
test scores, the independent variable is
the amount of time spent sleeping while
the dependent variable is the test score.
More Examples
• predicting sale price depending upon vehicle
age
• each year of job experience may be worth an
additional $1,000 in yearly salary
• body weight is a function of one's calorie
intake
How to Plot Variables on a Graph
• There is a standard method for graphing independent
and dependent variables. The x-axis is the
independent variable, while the y-axis is the
dependent variable. You can use the
DRY MIX acronym to remember how to graph
variables:
DRY MIX
• D = dependent variable
R = responding variable
Y = graph on the vertical or y-axis
• M = manipulated variable
I = independent variable
X = graph on the horizontal or x-axis
Regression
• It is a statistical method that helps us understand and
predict the relationship between variables.
• Describes how one variable changes as another
variable changes.
• Dependent variable (y): the variable we are trying to
predict or explain.
• Independent variable (x): the variable used to predict or
explain the changes in the dependent variable.
• Mathematical relationships help us to
understand many aspects of everyday life.
• When such relationships are expressed with
exact numbers, we gain additional clarity.
[Figure: plot of MARKS against HOURS]
Regression
• Linear Regression
• Logistic Regression
Linear Regression types
• Simple linear regression
• Multiple linear regression
• Polynomial regression
Simple Linear Regression
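This is the simplest form of linear
regression, and it involves only one
independent variable and one dependent
variable. The equation for simple linear
regression is:
$y = mx + c$
where y is the dependent variable, x is the
independent variable, m is the slope, and c is
the intercept.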
Least Squares Method
The least squares method finds the best-fitting line
through a set of data points by minimizing the sum of
the squared vertical distances (residuals) between the
points and the line.
Example
• Find the line of best fit for the following
data points using the Least Square method:
(x,y) = (1,3), (2,4), (4,8), (6,10), (8,15).
The slope of the line of best fit is calculated
from the formula:
$m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
where $\bar{x} = 21/5 = 4.2$ and $\bar{y} = 40/5 = 8$ are the means of the
x and y values.
$m = 55/32.8 = 1.68$ (rounded to 2 decimal places)
The intercept is then calculated from the formula:
$c = \bar{y} - m\bar{x}$
$c = 8 - 1.68 \times 4.2 = 0.94$
Thus, the equation of the line of best fit is
$y = 1.68x + 0.94$.
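The calculation above can be checked with a short script. The following is a minimal Python sketch using only the standard library; the function name least_squares_fit is illustrative and does not come from the slides.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations and sum of squared x deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Data points from the example: (1,3), (2,4), (4,8), (6,10), (8,15)
xs = [1, 2, 4, 6, 8]
ys = [3, 4, 8, 10, 15]

m, c = least_squares_fit(xs, ys)
print(f"y = {m:.2f}x + {c:.2f}")
```

The script keeps the unrounded slope (about 1.6768), so it reports an intercept of about 0.96. The value 0.94 in the worked example comes from rounding m to 1.68 before computing c.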
Question 1: Predict the price of a pizza
whose diameter is 20 inches.
Solution
Example
Question 2: Find the weight of a person whose
height is 6.3.

Height    Weight
5.1       63
5.5       66
5.8       69
6.1       72
6.4       75
6.7       78
6.4       75
6.1       72
5.10      69
5.7       66
A well-fitting model should have residuals that are symmetrically distributed around
zero, which is generally the case here.
• 24.515 is the intercept (c). It means that when height (x) is 0, the
predicted weight (y) is 24.515 units.
• 7.807 is the slope (m). It means that for each 1-unit increase in height, the
predicted weight increases by approximately 7.807 units.
• 1.313 is the standard error of the slope. It indicates how much uncertainty
there is in this estimate.
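The three figures quoted above can be reproduced from the height and weight table. The sketch below assumes they come from an ordinary least-squares fit of weight on height, and that the table entry 5.10 is read as the number 5.1. The variable names are illustrative.

```python
import math

# Height/weight pairs from the Question 2 table (5.10 is read as 5.1)
heights = [5.1, 5.5, 5.8, 6.1, 6.4, 6.7, 6.4, 6.1, 5.10, 5.7]
weights = [63, 66, 69, 72, 75, 78, 75, 72, 69, 66]

n = len(heights)
x_mean = sum(heights) / n
y_mean = sum(weights) / n
sxx = sum((x - x_mean) ** 2 for x in heights)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(heights, weights))

slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Residuals: observed weight minus predicted weight
residuals = [y - (intercept + slope * x) for x, y in zip(heights, weights)]

# Standard error of the slope, using n - 2 degrees of freedom
residual_variance = sum(r ** 2 for r in residuals) / (n - 2)
slope_se = math.sqrt(residual_variance / sxx)

# Prediction asked for in Question 2
predicted = intercept + slope * 6.3

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, slope SE = {slope_se:.3f}")
print(f"predicted weight at height 6.3 = {predicted:.1f}")
```

Under these assumptions the fit gives an intercept of about 24.515, a slope of about 7.807 with a standard error of about 1.313, and a predicted weight of about 73.7 for a height of 6.3.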
Multiple linear regression
• Most real-world analyses have more than one
independent variable. Therefore, it is likely
that you will be using multiple linear
regression for most numeric prediction tasks.
The strengths and weaknesses of multiple
linear regression are shown in the following
table:
• It is the most common form of Linear
Regression. Multiple Linear Regression
basically describes how a single response
variable Y depends linearly on a number of
predictor variables.
• The basic examples where Multiple Regression
can be used are as follows:
• The selling price of a house can depend on the
desirability of the location, the number of
bedrooms, the number of bathrooms, the year
the house was built, the square footage of the
lot, and a number of other factors.
• The height of a child can depend on the height
of the mother, the height of the father,
nutrition, and environmental factors.
Example
Consider the following dataset: x1, x2 are independent variables and y is dependent
variable
The matrices for Y and X are given as:
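The coefficients are estimated as:
$\hat{\beta} = (X^T X)^{-1} X^T Y$
The multiple linear regression equation has the general form:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$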
Example
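Since the dataset for this example is not reproduced here, the sketch below uses a small made-up dataset (hypothetical values, chosen so that y = 1 + 2·x1 + 3·x2 exactly) to show how the coefficients of a model with two independent variables can be estimated. It solves the normal equations (XᵀX)β = XᵀY by Gaussian elimination, which is equivalent to computing β = (XᵀX)⁻¹XᵀY. All function names are illustrative.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Move the row with the largest pivot into place
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


# Hypothetical data (not from the slides): y = 1 + 2*x1 + 3*x2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]

# Design matrix with a leading column of ones for the intercept
X = [[1.0, a, b] for a, b in zip(x1, x2)]
Xt = transpose(X)
XtX = matmul(Xt, X)
XtY = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]

beta = solve(XtX, XtY)
print("intercept, coefficient of x1, coefficient of x2:",
      [round(b, 6) for b in beta])
```

Because the made-up data follow the equation exactly, the recovered coefficients should be very close to 1, 2 and 3. With real data the fit will not be exact, and the coefficients are the values that minimize the sum of squared residuals.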
Correlation
• The correlation between two variables is a number
that indicates how closely their relationship follows
a straight line. Without additional qualification,
correlation typically refers to Pearson's correlation
coefficient, which was developed by the
mathematician Karl Pearson. The correlation
ranges between -1 and +1. The extreme values
indicate a perfectly linear relationship, while a
correlation close to zero indicates the absence of a
linear relationship.
E.g. study hours vs. marks
E.g. price and demand of a product
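Pearson's correlation coefficient can be computed directly from its definition: the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The sketch below is a minimal standard-library version. The function name pearson_r is illustrative, and the first dataset reuses the points from the least squares example earlier in this unit.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient between xs and ys."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Points from the least squares example: (1,3), (2,4), (4,8), (6,10), (8,15)
xs = [1, 2, 4, 6, 8]
ys = [3, 4, 8, 10, 15]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}")

# A perfectly linear decreasing relationship gives a correlation of -1
r_negative = pearson_r(xs, [-2 * x for x in xs])
print(f"r = {r_negative:.4f}")
```

For the example points the coefficient is about 0.99, which indicates a strong positive linear relationship. The second dataset lies exactly on a decreasing line, so its coefficient is -1 up to floating-point rounding.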