MODULE 2
REGRESSION
Regression
Regression is a statistical technique used for modeling the
relationship between one or more independent variables (also
known as predictors or features) and a dependent variable (also
known as the response or outcome).
The goal of regression analysis is to understand how changes in the
independent variables are associated with changes in the dependent
variable.
It allows you to make predictions, understand trends, and quantify
the strength and nature of the relationships between variables.
In essence, regression helps you find a mathematical equation that best fits the data
and represents the underlying relationship between the variables.
This equation can then be used to predict the value of the dependent variable for
new or unseen values of the independent variables.
There are various types of regression techniques, including simple linear regression,
multiple linear regression, logistic regression, and polynomial regression.
Regression analysis is widely used in fields such as economics, social sciences,
biology, finance, engineering, and machine learning to uncover patterns, make
predictions, and gain insights into how variables are related to each other.
Different regression models
Different regression models are defined based on the type of functions used to represent the relation
between the dependent variable y and the independent variables.
1. Simple linear regression
There is only one independent variable x and the relation between x and y is modeled by the
relation
y = a + bx
2. Multiple regression
Let there be more than one independent variable, say x_1, x_2, ..., x_n, and let the relation between y and the
independent variables be modeled as
y = α_0 + α_1 x_1 + ⋯ + α_n x_n
Then it is a case of multiple linear regression, or multiple regression.
3. Polynomial regression
Let there be only one variable x and let the relation between x and y be modeled as
y = a_0 + a_1 x + a_2 x^2 + ⋯ + a_n x^n
for some positive integer n > 1.
4. Logistic regression
Logistic regression is used when the dependent variable is binary (0/1,
True/False, Yes/No) in nature. Even though the output is a binary variable, what is being
sought is a probability function which may take any value from 0 to 1.
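The four model forms above can be written as small functions. This is a minimal sketch: the function and parameter names are illustrative, and the logistic form uses the standard sigmoid of a + bx, which is an assumption since the text above does not spell out the probability function.

```python
import math

def simple_linear(x, a, b):
    """Simple linear regression: y = a + b*x."""
    return a + b * x

def multiple_linear(xs, alphas):
    """Multiple regression: y = alpha_0 + alpha_1*x_1 + ... + alpha_n*x_n.
    `alphas` has one more entry than `xs`; the intercept comes first."""
    return alphas[0] + sum(a * x for a, x in zip(alphas[1:], xs))

def polynomial(x, coeffs):
    """Polynomial regression: y = a_0 + a_1*x + ... + a_n*x**n."""
    return sum(a * x ** k for k, a in enumerate(coeffs))

def logistic(x, a, b):
    """Logistic model with one input: a probability strictly between 0 and 1.
    Assumes the standard logistic (sigmoid) function applied to a + b*x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

print(simple_linear(2.0, 1.0, 3.0))                   # 1 + 3*2 = 7.0
print(multiple_linear([1.0, 2.0], [0.5, 2.0, 3.0]))   # 0.5 + 2*1 + 3*2 = 8.5
print(polynomial(2.0, [1.0, 0.0, 1.0]))               # 1 + 0*2 + 1*4 = 5.0
print(logistic(0.0, 0.0, 1.0))                        # 0.5
```

The first three functions can return any real number, while the logistic function always returns a value between 0 and 1, which is why it suits a binary outcome.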
Criterion for minimisation of error
In regression, the output is the sum of a function f(x) of the input and some random error denoted by ε:
y = f(x) + ε
Here the function f(x) is unknown and we would like to approximate it by some estimator g(x, θ)
containing a set of parameters θ.
We can apply the method of maximum likelihood estimation to estimate the values of the parameter θ.
If the errors are assumed to be independent and normally distributed with zero mean and constant variance,
the values of θ which maximize the likelihood function are the values of θ that minimize the
following sum of squares:
E(θ) = (y_1 − g(x_1, θ))^2 + ⋯ + (y_n − g(x_n, θ))^2
Choosing θ as the value that minimizes E(θ) is known as the ordinary least squares
method.
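The link between maximum likelihood and least squares can be seen in a short derivation. The sketch below assumes that the errors are independent and normally distributed with mean 0 and a fixed variance σ^2 (σ is a symbol introduced here for illustration):

```latex
\begin{aligned}
L(\theta) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{\bigl(y_i - g(x_i,\theta)\bigr)^2}{2\sigma^2}\right) \\
\log L(\theta) &= -\frac{n}{2}\log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \bigl(y_i - g(x_i,\theta)\bigr)^2
  = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{E(\theta)}{2\sigma^2}
\end{aligned}
```

The first term does not depend on θ and the factor 1/(2σ^2) is positive, so maximizing log L(θ) is the same as minimizing E(θ).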
Simple linear regression
Let x be the independent predictor variable and y the dependent variable. Assume that we
have a set of n observed values of x and y: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).
A simple linear regression model defines the relationship between x and y using a line
defined by an equation in the following form:
y = α + βx
In order to determine the optimal estimates of α and β, an estimation method known as
Ordinary Least Squares (OLS) is used.
Ordinary Least Squares (OLS)
In the OLS method, the values of the y-intercept and slope are chosen such that they minimize the
sum of the squared errors; that is, the sum of the squares of the vertical distances between the
predicted y-values and the actual y-values:
E = (y_1 − (α + βx_1))^2 + ⋯ + (y_n − (α + βx_n))^2
Find the values of α and β such that E is minimum.
Using methods of calculus, the values of a and b, which are respectively the
values of α and β for which E is minimum, can be obtained by solving the
following equations
Formulas to find a and b
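For reference, the equations in question take the following standard form for a least-squares line. They come from setting the partial derivatives of E with respect to a and b to zero; the bar notation for the sample means of x and y is introduced here for convenience.

```latex
\begin{aligned}
\sum_{i=1}^{n} y_i &= n\,a + b \sum_{i=1}^{n} x_i \\
\sum_{i=1}^{n} x_i y_i &= a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2
\end{aligned}
\qquad\Longrightarrow\qquad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```

The formula for b requires that the x values are not all equal, otherwise the denominator is zero.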
Obtain a linear regression for the data in the table, assuming
that y is the dependent variable.
The linear regression model for the data is y = 0.785 + 0.425x.
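The closed-form OLS computation can be sketched in a few lines of Python. The five (x, y) pairs below are illustrative values, not copied from the table; they were chosen so that the fit comes out to the same line, y = 0.785 + 0.425x. The function name `fit_simple_linear` is also just an illustrative choice.

```python
def fit_simple_linear(xs, ys):
    """Return (a, b) minimizing sum((y - (a + b*x))**2) over the data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through (x_mean, y_mean).
    a = y_mean - b * x_mean
    return a, b

# Illustrative data (an assumption, not the table from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.00, 2.00, 1.30, 3.75, 2.25]

a, b = fit_simple_linear(xs, ys)
print(f"y = {a:.3f} + {b:.3f}x")  # y = 0.785 + 0.425x
```

Here the mean of x is 3 and the mean of y is 2.06, so b = 4.25 / 10 = 0.425 and a = 2.06 − 0.425 × 3 = 0.785.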
Applications of linear regression with one variable
Example 1: Predicting house prices. Suppose you have a dataset of house prices and
their corresponding areas (in square feet). You want to predict the price of a house based
on its area using linear regression.
Example 2: Exam score vs. study time. Suppose you are analyzing the relationship
between the amount of time students spend studying for an exam and their exam scores.
Example 3: Weight vs. height. The weight of a person is modeled as linearly related to their height.
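As a rough sketch of how Example 1 might look in code, the snippet below fits a line to made-up house data and predicts the price of a new house. The areas and prices are hypothetical and deliberately lie exactly on a line (price = 50000 + 100 × area) so that the result is easy to check by hand; real data would be noisy.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x, computed from the normal-equation sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Hypothetical data: areas in square feet and prices in arbitrary currency units.
areas = [1000, 1500, 2000, 2500]
prices = [150000, 200000, 250000, 300000]

a, b = fit_line(areas, prices)
predicted = a + b * 1800   # predicted price of an 1800 sq ft house
print(a, b, predicted)     # 50000.0 100.0 230000.0
```

The slope b is read as the estimated change in price per additional square foot, and the fitted equation is then applied to an area that was not in the data.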
Polynomial regression
If the relationship between the dependent and independent variables is linear, then we can
use a straight line to fit the given data. If the relationship is not linear, a polynomial in x can be fitted instead.
Quadratic polynomial Regression
A polynomial regression model defines the relationship between x
and y by an equation in the following form:
y = α_0 + α_1 x + α_2 x^2 + ⋯ + α_k x^k
Let the quadratic polynomial regression model be
y = a_0 + a_1 x + a_2 x^2
The values of a_0, a_1 and a_2 are calculated using the following system
of equations:
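For reference, the standard normal equations for a least-squares quadratic fit to n observations (x_i, y_i) are sketched below. They follow from setting the partial derivatives of the sum of squared errors with respect to a_0, a_1 and a_2 to zero; all sums run over i = 1, ..., n.

```latex
\begin{aligned}
\sum y_i       &= n\,a_0 + a_1 \sum x_i + a_2 \sum x_i^2 \\
\sum x_i y_i   &= a_0 \sum x_i + a_1 \sum x_i^2 + a_2 \sum x_i^3 \\
\sum x_i^2 y_i &= a_0 \sum x_i^2 + a_1 \sum x_i^3 + a_2 \sum x_i^4
\end{aligned}
```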
a_0 = 12.4285714, a_1 = −5.5128571, a_2 = 0.7642857
The required quadratic polynomial model is
y = 12.4285714 − 5.5128571x + 0.7642857x^2
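A quadratic fit of this kind can be sketched in plain Python by building the three normal equations and solving them with Gaussian elimination. The data below are illustrative values, not taken from the text; they were chosen because they reproduce the coefficients quoted above. The helper names `solve_linear_system` and `fit_quadratic` are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] so the inputs are not modified.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a0 + a1*x + a2*x**2 via the normal equations."""
    # S[k] = sum of x**k for k = 0..4 (S[0] is the number of points).
    S = [sum(x ** k for x in xs) for k in range(5)]
    # T[k] = sum of x**k * y for k = 0..2.
    T = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    A = [[S[i + j] for j in range(3)] for i in range(3)]
    return solve_linear_system(A, T)

# Illustrative data (an assumption, not a table from the text).
xs = [3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2.5, 3.2, 3.8, 6.5, 11.5]

a0, a1, a2 = fit_quadratic(xs, ys)
print(round(a0, 7), round(a1, 7), round(a2, 7))  # 12.4285714 -5.5128571 0.7642857
```

Solving the normal equations directly is fine for small examples like this one; for higher degrees or widely spread x values the system becomes ill-conditioned, and a library least-squares routine is usually the safer choice.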
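y = a_0 + a_1 x + a_2 x^2
The coefficients a_0, a_1 and a_2 are calculated using the formula:
a = X^{-1} B
where a is the vector of coefficients (a_0, a_1, a_2), X is the matrix of coefficients of the system of equations and B is the vector of its right-hand sides.
y = −0.75 + 0.95x + 0.75x^2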
Linear regression with multiple variables
In simple linear regression we have one independent variable and one
dependent variable.
A multiple linear regression model involves multiple predictors (independent
variables) and one dependent variable (the response variable).
This is an extension of simple linear regression.
We assume that there are N independent variables x_1, x_2, ⋯, x_N. Let
the dependent variable be y.
Let there also be n observed values of these variables.
The multiple linear regression model defines the relationship between the N
independent variables and the dependent variable by an equation of the
following form:
y = β_0 + β_1 x_1 + ⋯ + β_N x_N
The ordinary least squares method is used to obtain the optimal estimates of
β_0, β_1, ⋯, β_N.
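In matrix notation the OLS estimates have a compact standard form. The sketch below assumes an indexing convention introduced here: x_{ij} is the i-th observed value of the variable x_j, and y_i is the i-th observed value of y. The column of ones in X carries the intercept β_0.

```latex
X = \begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1N} \\
1 & x_{21} & x_{22} & \cdots & x_{2N} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nN}
\end{bmatrix},
\quad
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},
\quad
B = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_N \end{bmatrix},
\qquad
\hat{B} = (X^{T} X)^{-1} X^{T} Y
```

This formula applies when X^T X is invertible, which requires at least N + 1 observations and no predictor that is an exact linear combination of the others.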
Multiple linear regression: example
Fit a multiple linear regression model to the following data:
Here N = 2 (two independent variables) and n = 4 (four observed values).
The multiple linear regression model for this problem has the form
y = β_0 + β_1 x_1 + β_2 x_2.
The regression plane for the data
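A two-predictor fit of the form y = β_0 + β_1 x_1 + β_2 x_2 can be sketched in plain Python by forming the normal equations (X^T X) β = X^T y and solving them. The four observations below are illustrative values with the same shape as the example (two predictors, four observations); they are not taken from the text, and the helper names are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_multiple_linear(rows, ys):
    """Least-squares estimates [b0, b1, ..., bN] for y = b0 + b1*x1 + ... + bN*xN."""
    # Design matrix: the leading 1 in each row carries the intercept b0.
    X = [[1.0] + list(row) for row in rows]
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y.
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    return solve_linear_system(XtX, Xty)

# Illustrative data (an assumption): each row is one observation (x1, x2).
rows = [(1.0, 1.0), (1.0, 2.0), (2.0, 2.0), (0.0, 1.0)]
ys = [3.25, 6.5, 3.5, 5.0]

b0, b1, b2 = fit_multiple_linear(rows, ys)
print(round(b0, 4), round(b1, 4), round(b2, 4))  # 2.0625 -2.375 3.25
```

With two predictors the fitted equation describes a plane in (x_1, x_2, y) space, and the estimates are the coefficients of that plane.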