
Understanding Multiple Linear Regression

Chapter 3 discusses multiple linear regression models, which involve one dependent variable and multiple explanatory variables, allowing for the analysis of complex relationships in economic outcomes. It covers the derivation of Ordinary Least Squares (OLS) estimators, the interpretation of partial regression coefficients, and the importance of the coefficient of determination (R-squared) in assessing model fit. The chapter also includes a numerical example to illustrate the estimation of regression coefficients and the significance testing of individual and joint effects of explanatory variables.

Uploaded by

yimer

Chapter 3. Multiple Linear Regressions
3.1 Introduction
A multiple regression model is a regression model that contains more than two variables.
A simple regression model has two variables (k = 2): one explanatory variable (regressor) and one dependent variable (regressand). The simple model is often inadequate because many factors simultaneously affect a given economic outcome.
• A multiple regression model contains one dependent variable (regressand) and several explanatory variables (regressors). The number of variables is not fixed in advance; it depends on the nature of the problem and on the researcher's decisions.
For example, the following is a four-variable model (k = 4 parameters including the intercept):

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

It has k - 1 = 3 explanatory variables with 3 slope coefficients (excluding the intercept) and one dependent variable.
The coefficients $\beta_1$, $\beta_2$, $\beta_3$ are called partial regression coefficients. Their interpretation differs from the simple regression case: in a multiple regression model, a single coefficient can only be interpreted under ceteris paribus conditions (other things being constant).
- $\beta_1$ measures the change in Y per unit change in $X_1$ while holding the values of $X_2$ and $X_3$ constant. In other words, $\beta_1$ is a partial regression coefficient that measures the effect of the explanatory variable $X_1$ on the dependent variable with the effects of all other variables held constant.
- $\beta_2$ is the partial regression coefficient that measures the effect of the explanatory variable $X_2$ on the dependent variable (Y) with the effects of all other variables ($X_1$ and $X_3$) held constant.
- $\beta_3$ is the partial regression coefficient that measures the effect of the explanatory variable $X_3$ on the dependent variable (Y) with the effects of all other variables ($X_1$ and $X_2$) held constant.
- $\beta_0$ is still the intercept coefficient.
• An important consequence of the ceteris paribus condition is that a single coefficient in a regression model cannot be interpreted without holding the effects of the other variables in the model constant.
• In the multiple regression model, we still use OLS to estimate the parameters.
All the CLRM assumptions we have already discussed are also maintained here. These include:
• Linearity in the parameters of the model
• Zero expected value of the error term: $E(u_i) = 0$
• No serial correlation between error terms: $\operatorname{Cov}(u_i, u_j) = E(u_i u_j) = 0$ for $i \neq j$
• Homoscedasticity: $\operatorname{Var}(u_i) = \sigma^2$, that is, the variance of the error term is constant
• Zero covariance between the error term and each explanatory variable: $\operatorname{Cov}(u_i, X_{1i}) = 0$ and $\operatorname{Cov}(u_i, X_{2i}) = 0$
• No specification bias: the model is correctly specified
3.2. Multiple Regression with three variables
We write the three-variable PRF as:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$

Deriving OLS estimators for the three-variable regression model: to find the OLS estimators, let us first write the SRF corresponding to the PRF as follows:

$$Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat u_i$$

The OLS estimators are chosen so that the residual sum of squares (RSS) is as small as possible. This is done using the first-order conditions (FOCs) of the minimization problem: the partial derivative of the objective function with respect to each of the three parameters must equal zero.
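Problem: $$\min \sum \hat u_i^2 = \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2$$
FOCs are:
1) $\dfrac{\partial \sum \hat u_i^2}{\partial \hat\beta_0} = 0$ 2) $\dfrac{\partial \sum \hat u_i^2}{\partial \hat\beta_1} = 0$ 3) $\dfrac{\partial \sum \hat u_i^2}{\partial \hat\beta_2} = 0$

1) FOC: first take the partial derivative w.r.t. $\hat\beta_0$ (the sum is over the n observations):
$$\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_0} = -2\sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0$$
Dividing both sides by -2:
$$\sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0$$
$$\sum Y_i - n\hat\beta_0 - \hat\beta_1 \sum X_{1i} - \hat\beta_2 \sum X_{2i} = 0 \quad \text{(after summing over n)}$$
$$n\hat\beta_0 = \sum Y_i - \hat\beta_1 \sum X_{1i} - \hat\beta_2 \sum X_{2i}$$
Divide both sides by n (sample size) to get the formula:

$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$$ ………… [1] (estimator of the true $\beta_0$)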
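2) Take the partial derivative w.r.t. $\hat\beta_1$:
$$\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_1} = -2\sum X_{1i}(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0$$
Divide both sides by -2:
$$\sum X_{1i}Y_i - \hat\beta_0 \sum X_{1i} - \hat\beta_1 \sum X_{1i}^2 - \hat\beta_2 \sum X_{1i}X_{2i} = 0$$
(Note: $\sum X_{1i} = n\bar X_1$; substitute for $\hat\beta_0$ from equation [1].)
$$\sum X_{1i}Y_i - (\bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2)\, n\bar X_1 - \hat\beta_1 \sum X_{1i}^2 - \hat\beta_2 \sum X_{1i}X_{2i} = 0$$
Hence, we get:
$$\sum X_{1i}Y_i - n\bar X_1 \bar Y = \hat\beta_1 \left(\sum X_{1i}^2 - n\bar X_1^2\right) + \hat\beta_2 \left(\sum X_{1i}X_{2i} - n\bar X_1 \bar X_2\right)$$
• In deviation form (lowercase letters denote deviations from sample means, e.g. $x_{1i} = X_{1i} - \bar X_1$) we can write the following equation:
$$\hat\beta_1 = \frac{\sum x_{1i}y_i - \hat\beta_2 \sum x_{1i}x_{2i}}{\sum x_{1i}^2}$$ ……………….. [2]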
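3) Take the partial derivative w.r.t. $\hat\beta_2$:
$$\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_2} = -2\sum X_{2i}(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0$$
• Following a procedure similar to the one used above, the estimator of $\beta_2$ is given as follows:
$$\hat\beta_2 = \frac{\left(\sum X_{2i}Y_i - n\bar X_2 \bar Y\right) - \hat\beta_1 \left(\sum X_{1i}X_{2i} - n\bar X_1 \bar X_2\right)}{\sum X_{2i}^2 - n\bar X_2^2}$$ ---- [3]
• The deviation form of the estimator of $\beta_2$ is given as follows:
$$\hat\beta_2 = \frac{\sum x_{2i}y_i - \hat\beta_1 \sum x_{1i}x_{2i}}{\sum x_{2i}^2}$$ ……….. [4]
By substituting equation [4] (for $\hat\beta_2$) into equation [2], we can derive the formula for the point estimate of the partial coefficient $\beta_1$ as follows: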
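Write equation [2] as follows:
$$\hat\beta_1 \sum x_{1i}^2 = \sum x_{1i}y_i - \hat\beta_2 \sum x_{1i}x_{2i}$$
then substitute for $\hat\beta_2$ from equation [4]:
$$\hat\beta_1 \sum x_{1i}^2 = \sum x_{1i}y_i - \frac{\left(\sum x_{2i}y_i - \hat\beta_1 \sum x_{1i}x_{2i}\right)\sum x_{1i}x_{2i}}{\sum x_{2i}^2}$$
Multiplying both sides by $\sum x_{2i}^2$ and collecting the terms in $\hat\beta_1$:
$$\hat\beta_1 \left[\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2\right] = \sum x_{1i}y_i \sum x_{2i}^2 - \sum x_{2i}y_i \sum x_{1i}x_{2i}$$
$$\hat\beta_1 = \frac{\sum x_{1i}y_i \sum x_{2i}^2 - \sum x_{2i}y_i \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}$$
Note: This is the point estimate of the partial coefficient $\beta_1$. This formula is used when $\hat\beta_2$ is not known. However, if $\hat\beta_2$ is known or has already been determined, we can simply substitute its value into equation [2] to get the estimate of $\beta_1$.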
• Now let's go back to equation [4]. Once we have determined the value of $\hat\beta_1$, we can insert it into [4] to get the estimate of the partial coefficient $\beta_2$. Otherwise, following the same steps as above gives a formula for the estimator $\hat\beta_2$:
$$\hat\beta_2 = \frac{\sum x_{2i}y_i \sum x_{1i}^2 - \sum x_{1i}y_i \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}$$
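The closed-form estimators derived above can be sketched in a few lines of plain Python. This is only an illustration under the three-variable setup of this section (intercept plus two regressors); the function name `ols_three_variable` and the variable names are assumptions made for the sketch, and the data are the five observations from the chapter's numerical example.

```python
def ols_three_variable(y, x1, x2):
    """Closed-form OLS for Y = b0 + b1*X1 + b2*X2 + u using deviation sums."""
    n = len(y)
    y_bar, x1_bar, x2_bar = sum(y) / n, sum(x1) / n, sum(x2) / n

    # Deviations from the sample means (the lowercase letters in the text).
    dy = [v - y_bar for v in y]
    d1 = [v - x1_bar for v in x1]
    d2 = [v - x2_bar for v in x2]

    s_x1y = sum(a * b for a, b in zip(d1, dy))
    s_x2y = sum(a * b for a, b in zip(d2, dy))
    s_x1x1 = sum(a * a for a in d1)
    s_x2x2 = sum(a * a for a in d2)
    s_x1x2 = sum(a * b for a, b in zip(d1, d2))

    # Common denominator of both slope formulas.
    det = s_x1x1 * s_x2x2 - s_x1x2 ** 2
    if det == 0:
        raise ValueError("X1 and X2 are perfectly collinear")

    b1 = (s_x1y * s_x2x2 - s_x2y * s_x1x2) / det
    b2 = (s_x2y * s_x1x1 - s_x1y * s_x1x2) / det
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar  # equation [1]
    return b0, b1, b2


# Data from the chapter's numerical example.
income = [30, 20, 36, 24, 40]      # Y, in 1000 birr
education = [4, 3, 6, 4, 8]        # X1, years
experience = [10, 8, 11, 9, 12]    # X2, years
b0, b1, b2 = ols_three_variable(income, education, experience)
print(b0, b1, b2)
```

The denominator is shared by both slope formulas, so it is computed once; it is zero exactly when the two regressors are perfectly collinear, in which case the estimators are not defined.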
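3.2.1 Variances and Standard Errors of OLS Estimators

a) Variance of the residuals or the error terms ($\hat\sigma^2$)

The estimated variance of the error term is computed from the sample data. It is obtained from the RSS divided by the degrees of freedom (n - k). The CLRM also assumes a constant variance of the error term.

$$\hat\sigma^2 = \frac{\sum \hat u_i^2}{n - k} = \frac{\text{RSS}}{n - k}$$

n - k is the degrees of freedom, where n is the sample size and k is the number of variables in the model (equal to the number of estimated parameters including the intercept; k = 3 here).

b) $$\operatorname{Var}(\hat\beta_0) = \hat\sigma^2 \left[\frac{1}{n} + \frac{\bar X_1^2 \sum x_{2i}^2 + \bar X_2^2 \sum x_{1i}^2 - 2\bar X_1 \bar X_2 \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}\right]$$
Hence: $\operatorname{se}(\hat\beta_0) = \sqrt{\operatorname{Var}(\hat\beta_0)}$
c) $$\operatorname{Var}(\hat\beta_1) = \frac{\hat\sigma^2 \sum x_{2i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}$$
Hence: $\operatorname{se}(\hat\beta_1) = \sqrt{\operatorname{Var}(\hat\beta_1)}$
d) $$\operatorname{Var}(\hat\beta_2) = \frac{\hat\sigma^2 \sum x_{1i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}$$
Hence: $\operatorname{se}(\hat\beta_2) = \sqrt{\operatorname{Var}(\hat\beta_2)}$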
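As a rough illustration of how these variance formulas are applied, the sketch below computes the error variance and the three estimator variances from summary quantities. The function name `ols_variances` and its parameter names are hypothetical; the inputs are the sums and means of the chapter's numerical example (RSS = 1.5, n = 5, means 5 and 10, deviation sums 16, 10 and 12).

```python
import math


def ols_variances(rss, n, x1_bar, x2_bar, s_x1x1, s_x2x2, s_x1x2, k=3):
    """Error variance and Var(b0), Var(b1), Var(b2) for the three-variable model.

    s_x1x1, s_x2x2 and s_x1x2 are sums of squares and cross-products in
    deviation form; k counts the estimated parameters including the intercept.
    """
    sigma2 = rss / (n - k)
    det = s_x1x1 * s_x2x2 - s_x1x2 ** 2
    var_b0 = sigma2 * (
        1 / n
        + (x1_bar ** 2 * s_x2x2 + x2_bar ** 2 * s_x1x1
           - 2 * x1_bar * x2_bar * s_x1x2) / det
    )
    var_b1 = sigma2 * s_x2x2 / det
    var_b2 = sigma2 * s_x1x1 / det
    return sigma2, var_b0, var_b1, var_b2


# Summary quantities from the chapter's numerical example.
sigma2, var_b0, var_b1, var_b2 = ols_variances(
    rss=1.5, n=5, x1_bar=5, x2_bar=10, s_x1x1=16, s_x2x2=10, s_x1x2=12
)
std_errors = [math.sqrt(v) for v in (var_b0, var_b1, var_b2)]
print(sigma2, var_b0, var_b1, var_b2)
print([round(se, 3) for se in std_errors])
```

The standard errors are simply the square roots of the variances, which is why they are computed outside the function.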
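3.2.2 Important properties/ Relationships

The SRF is $Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat u_i$. By subtracting $\bar Y$ from both sides we can derive its deviation form.

Note that
[a] $\bar Y = \hat\beta_0 + \hat\beta_1 \bar X_1 + \hat\beta_2 \bar X_2$
Another way of writing the model (in deviation form):

$Y_i = \hat Y_i + \hat u_i$
$Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat u_i$
$y_i = \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \hat u_i$ (deviation form), where lowercase letters denote deviations from sample means, e.g. $y_i = Y_i - \bar Y$.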
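$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i}$
$\bar{\hat Y} = \hat\beta_0 + \hat\beta_1 \bar X_1 + \hat\beta_2 \bar X_2$
It can be proved that $\bar{\hat Y} = \bar Y$.
Hence:
$Y_i = \hat Y_i + \hat u_i$; subtract $\bar Y$ and $\bar{\hat Y}$ from the respective sides to get the deviation form:
$Y_i - \bar Y = \hat Y_i - \bar{\hat Y} + \hat u_i$
[b] $y_i = \hat y_i + \hat u_i$ (in deviation form)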
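Note also that $\hat y_i$ can be written in deviation form as
[c] $\hat y_i = \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i}$ and
$y_i = \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \hat u_i$
Squaring $y_i = \hat y_i + \hat u_i$ and summing over the sample:
$\sum y_i^2 = \sum \hat y_i^2 + 2\sum \hat y_i \hat u_i + \sum \hat u_i^2$
Since $\sum \hat y_i \hat u_i = 0$:
[d] $\sum y_i^2 = \sum \hat y_i^2 + \sum \hat u_i^2$: the total variation of the dependent variable
$\sum y_i^2$ = TSS, $\sum \hat y_i^2$ = ESS, $\sum \hat u_i^2$ = RSS
TSS = ESS + RSS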
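Note: $\sum \hat u_i^2 = \sum \hat u_i \hat u_i$
$= \sum \hat u_i (y_i - \hat\beta_1 x_{1i} - \hat\beta_2 x_{2i}) = \sum \hat u_i y_i$
(since $\sum \hat u_i x_{1i} = 0$ and $\sum \hat u_i x_{2i} = 0$)
[e] Hence: $\sum \hat u_i^2 = \sum \hat u_i y_i$
$= \sum y_i (y_i - \hat\beta_1 x_{1i} - \hat\beta_2 x_{2i})$
$= \sum y_i^2 - \hat\beta_1 \sum x_{1i}y_i - \hat\beta_2 \sum x_{2i}y_i$
Hence, the RSS can be rewritten as:
[f] $\sum \hat u_i^2 = \sum y_i^2 - \hat\beta_1 \sum x_{1i}y_i - \hat\beta_2 \sum x_{2i}y_i$ = RSS
where $\sum y_i^2$ = TSS and
$\hat\beta_1 \sum x_{1i}y_i + \hat\beta_2 \sum x_{2i}y_i = \sum \hat y_i^2$ = ESS
[g] $\sum \hat y_i^2 = \hat\beta_1 \sum x_{1i}y_i + \hat\beta_2 \sum x_{2i}y_i$ = ESS
Hence: $\sum y_i^2 = \sum \hat y_i^2 + \sum \hat u_i^2$
[h] $\sum y_i^2$ = ESS + RSS = TSS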
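The decomposition TSS = ESS + RSS and the orthogonality conditions used in its proof can be checked numerically. The following is a minimal sketch, assuming the data and the OLS estimates (-23.75, -0.25, 5.5) from the chapter's numerical example; the function name `decompose_variation` is hypothetical.

```python
def decompose_variation(y, x1, x2, b0, b1, b2):
    """Return TSS, ESS, RSS and the residuals for a fitted three-variable model."""
    n = len(y)
    y_bar = sum(y) / n
    fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    residuals = [obs - fit for obs, fit in zip(y, fitted)]

    tss = sum((v - y_bar) ** 2 for v in y)
    # For OLS with an intercept the mean of the fitted values equals y_bar,
    # so deviations of the fitted values can be taken around y_bar.
    ess = sum((f - y_bar) ** 2 for f in fitted)
    rss = sum(e ** 2 for e in residuals)
    return tss, ess, rss, residuals


# Data and OLS estimates from the chapter's numerical example.
income = [30, 20, 36, 24, 40]
education = [4, 3, 6, 4, 8]
experience = [10, 8, 11, 9, 12]
tss, ess, rss, residuals = decompose_variation(
    income, education, experience, b0=-23.75, b1=-0.25, b2=5.5
)

# Orthogonality conditions implied by the FOCs.
sum_u = sum(residuals)
sum_u_x1 = sum(e * x for e, x in zip(residuals, education))
sum_u_x2 = sum(e * x for e, x in zip(residuals, experience))
print(tss, ess, rss)
print(sum_u, sum_u_x1, sum_u_x2)
```

The identity holds only when the coefficients are the OLS estimates; with any other coefficient values the cross term $2\sum \hat y_i \hat u_i$ does not vanish.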
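3.2.2 Coefficient of Determination ($R^2$)
Measures the goodness of fit of the regression equation.
The coefficient of determination ($R^2$) measures the proportion of the variation in the dependent variable (Y) that is explained jointly by the explanatory variables in the model (such as X1, X2, etc.). $R^2$ tells us to what extent the explanatory variables of the model explain the variability of the dependent variable. It is computed as:
$$R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}}, \quad \text{and } 0 \le R^2 \le 1$$
$$R^2 = \frac{\sum \hat y_i^2}{\sum y_i^2} = \frac{\hat\beta_1 \sum x_{1i}y_i + \hat\beta_2 \sum x_{2i}y_i}{\sum y_i^2}$$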
Adjusted $R^2$, denoted by $\bar R^2$
Note that $R^2$ never decreases (and usually increases) as the number of explanatory variables increases, because the residual sum of squares cannot rise when a regressor is added. This makes it a poor tool for deciding whether one variable or several variables should be added to a model: the measured goodness of fit depends on the number of independent (explanatory) variables regardless of whether they are important or not.
• To eliminate this dependency, we use the adjusted $R^2$ (denoted by $\bar R^2$), which is computed by adjusting RSS and TSS for their degrees of freedom:
$$\bar R^2 = 1 - \frac{\text{RSS}/(n - k)}{\text{TSS}/(n - 1)}$$
Or equivalently computed as:
$$\bar R^2 = 1 - \frac{\hat\sigma^2}{S_Y^2} \quad \text{or} \quad \bar R^2 = 1 - (1 - R^2)\frac{n - 1}{n - k}$$
where $S_Y^2 = \text{TSS}/(n - 1)$ is the sample variance of Y.
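A short sketch of both measures follows. The helper names `r_squared` and `adjusted_r_squared` are assumptions made for the illustration, k counts the estimated parameters including the intercept, and the inputs (ESS = 270.5, TSS = 272, n = 5, k = 3) come from the chapter's numerical example.

```python
def r_squared(ess, tss):
    """Share of the total variation in Y explained by the regression."""
    return ess / tss


def adjusted_r_squared(r2, n, k):
    """R-squared adjusted for degrees of freedom (k includes the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Sums of squares from the chapter's numerical example.
r2 = r_squared(ess=270.5, tss=272)
r2_adj = adjusted_r_squared(r2, n=5, k=3)
print(round(r2, 6), round(r2_adj, 4))
```

Because (n - 1)/(n - k) is greater than one whenever k > 1, the adjusted value is always below the unadjusted one, and the gap grows as regressors are added to a small sample.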

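Numerical example
Suppose we have the following econometric model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$
$Y$ - annual income in 1000s of birr,
$X_1$ - level of higher education in years,
$X_2$ - work experience in years.
Suppose we have the following data on the variables of the model:

Obs.   Y     X1    X2
1      30    4     10
2      20    3     8
3      36    6     11
4      24    4     9
5      40    8     12
Sum    150   25    50

Raw sums: $\sum X_{1i}Y_i = 812$, $\sum X_{2i}Y_i = 1552$, $\sum X_{1i}X_{2i} = 262$, $\sum X_{1i}^2 = 141$, $\sum X_{2i}^2 = 510$, $\sum Y_i^2 = 4772$
Means: $\bar Y = 30$, $\bar X_1 = 5$, $\bar X_2 = 10$, n = 5

Sums in deviation form: $\sum x_{1i}y_i = 62$, $\sum x_{2i}y_i = 52$, $\sum x_{1i}^2 = 16$, $\sum x_{2i}^2 = 10$, $\sum y_i^2 = 272$ and $\sum x_{1i}x_{2i} = 12$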
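Estimate the partial regression coefficients and the intercept of the model based on the data.
$$\hat\beta_1 = \frac{\sum x_{1i}y_i \sum x_{2i}^2 - \sum x_{2i}y_i \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2} = \frac{(62)(10) - (52)(12)}{(16)(10) - (12)^2} = \frac{-4}{16} = -0.25$$
Interpretation: based on the data, the partial regression coefficient of years spent in higher education ($X_1$) is -0.25. Holding other factors constant (that is, with work experience held constant), a one-year increase in time spent in higher education is associated with a decrease in annual income of birr 250.
$$\hat\beta_2 = \frac{\sum x_{2i}y_i \sum x_{1i}^2 - \sum x_{1i}y_i \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2} = \frac{(52)(16) - (62)(12)}{(16)(10) - (12)^2} = \frac{88}{16} = 5.5$$
• Interpretation: based on the data, the partial regression coefficient of years of experience (the variable $X_2$) is 5.5. Holding other factors constant, a one-year increase in work experience is associated with an increase in annual income of birr 5,500.
$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2 = 30 - (-0.25)(5) - (5.5)(10) = -23.75$$

The estimated sample regression function (SRF) will be:

$$\hat Y_i = -23.75 - 0.25 X_{1i} + 5.5 X_{2i}$$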
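One way to cross-check these estimates is to solve the three normal equations (the FOCs written in terms of the raw sums) directly. The sketch below does this with Cramer's rule and exact fractions; it is an illustrative alternative to the deviation-form formulas, not the chapter's own method, and the names `det3` and `solve_normal_equations` are hypothetical.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve_normal_equations(n, sx1, sx2, sx1x1, sx1x2, sx2x2, sy, sx1y, sx2y):
    """Solve the three normal equations for (b0, b1, b2) by Cramer's rule."""
    a = [
        [n, sx1, sx2],
        [sx1, sx1x1, sx1x2],
        [sx2, sx1x2, sx2x2],
    ]
    rhs = [sy, sx1y, sx2y]
    d = det3(a)
    if d == 0:
        raise ValueError("normal equations are singular (perfect collinearity)")
    solution = []
    for col in range(3):
        # Replace one column of the coefficient matrix with the right-hand side.
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(Fraction(det3(m), d))
    return solution


# Raw sums from the data table of the numerical example.
b0, b1, b2 = solve_normal_equations(
    n=5, sx1=25, sx2=50, sx1x1=141, sx1x2=262, sx2x2=510,
    sy=150, sx1y=812, sx2y=1552,
)
print(float(b0), float(b1), float(b2))
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared with the hand calculation without rounding concerns.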
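a) $$\hat\sigma^2 = \frac{\sum \hat u_i^2}{n - k} = \frac{\sum y_i^2 - \hat\beta_1 \sum x_{1i}y_i - \hat\beta_2 \sum x_{2i}y_i}{n - 3} = \frac{272 - (-0.25)(62) - (5.5)(52)}{5 - 3} = \frac{1.5}{2} = 0.75$$
b) $$\operatorname{Var}(\hat\beta_0) = 0.75\left[\frac{1}{5} + \frac{(5)^2(10) + (10)^2(16) - 2(5)(10)(12)}{(16)(10) - (12)^2}\right]$$
$= 0.75(0.2 + 40.625) = 0.75 \times 40.825 = 30.62$
$\operatorname{se}(\hat\beta_0) = \sqrt{30.62} = 5.533$
c) $\operatorname{Var}(\hat\beta_1) = \dfrac{\hat\sigma^2 \sum x_{2i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2} = \dfrac{(0.75)(10)}{16} = 0.46875$
$\operatorname{Var}(\hat\beta_1) = 0.46875$, then: $\operatorname{se}(\hat\beta_1) = \sqrt{0.46875} = 0.685$
d) $\operatorname{Var}(\hat\beta_2) = \dfrac{\hat\sigma^2 \sum x_{1i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2} = \dfrac{(0.75)(16)}{16} = 0.75$
$\operatorname{Var}(\hat\beta_2) = 0.75$, then: $\operatorname{se}(\hat\beta_2) = \sqrt{0.75} = 0.866$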
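The t-statistics of individual coefficients
The t-statistics of the coefficients are used to test hypotheses about the statistical significance of individual coefficients.
Here, based on the hypothetical data, the t-statistic for each coefficient is computed below, assuming the following null and alternative hypotheses:
$H_0: \beta_0 = 0$, $H_0: \beta_1 = 0$, $H_0: \beta_2 = 0$;
$H_1: \beta_0 \neq 0$; $H_1: \beta_1 \neq 0$; $H_1: \beta_2 \neq 0$
• $t = \dfrac{\hat\beta_0 - 0}{\operatorname{se}(\hat\beta_0)} = \dfrac{-23.75}{5.533} = -4.292$ - the t-statistic for $\hat\beta_0$
• $t = \dfrac{\hat\beta_1 - 0}{\operatorname{se}(\hat\beta_1)} = \dfrac{-0.25}{0.685} = -0.365$ - the t-statistic for $\hat\beta_1$
• $t = \dfrac{\hat\beta_2 - 0}{\operatorname{se}(\hat\beta_2)} = \dfrac{5.5}{0.866} = 6.35$ - the t-statistic for $\hat\beta_2$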
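The three t-ratios can be reproduced with a few lines of Python. This is a minimal sketch that takes the estimates and variances computed earlier in the example as given; the dictionary keys and the helper `t_statistic` are hypothetical names, and 4.303 is the two-tailed t-table value at the 5% level with n - k = 2 degrees of freedom that the chapter uses as its critical value.

```python
import math


def t_statistic(estimate, std_error, hypothesized=0.0):
    """t-ratio for H0: coefficient = hypothesized value."""
    return (estimate - hypothesized) / std_error


# Estimates and variances from the chapter's numerical example.
estimates = {"intercept": -23.75, "education": -0.25, "experience": 5.5}
variances = {"intercept": 30.61875, "education": 0.46875, "experience": 0.75}
T_CRITICAL = 4.303  # two-tailed, alpha = 0.05, df = 2 (value read from a t-table)

results = {}
for name, estimate in estimates.items():
    t_value = t_statistic(estimate, math.sqrt(variances[name]))
    # Reject H0 when the absolute t-ratio exceeds the critical value.
    results[name] = (round(t_value, 3), abs(t_value) > T_CRITICAL)
print(results)
```

Each entry pairs the t-ratio with a flag that is true when the null hypothesis of a zero coefficient is rejected at the 5% level.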
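Compute the $R^2$ and F-statistics from the data
$$R^2 = \frac{\text{ESS}}{\text{TSS}} = \frac{\hat\beta_1 \sum x_{1i}y_i + \hat\beta_2 \sum x_{2i}y_i}{\sum y_i^2} = \frac{(-0.25)(62) + (5.5)(52)}{272} = \frac{270.5}{272} = 0.994485$$
$$\bar R^2 = 1 - (1 - R^2)\frac{n - 1}{n - k} = 1 - (1 - 0.994485)\frac{5 - 1}{5 - 3} = 1 - (0.005515)(2) = 0.9890$$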
Compute F-statistics
The F-statistic is used to test the joint significance of the partial regression coefficients of a model, that is, the overall significance of the model. It helps us determine whether our model is adequate by comparing the explained part of the variation (ESS) with the unexplained part (RSS).
For our three-variable model above, the hypotheses for the F-test are given as:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, where Y is annual income in 1000 birr.
$H_0: \beta_1 = \beta_2 = 0$; $H_1$: at least one of $\beta_1$, $\beta_2$ is not zero
$H_0$ (the null hypothesis) states that both slope coefficients are jointly zero. That means the combined effect of higher education and work experience (X1 and X2) on annual income is statistically insignificant. That is, taken jointly, the two explanatory variables have no significant effect on the dependent variable.
$H_1$ (the alternative hypothesis) asserts the opposite: the combined effect of these variables is significant. That is, education and work experience (X1 and X2) jointly have a significant effect on annual income (Y). First compute the F-statistic from the data as follows:
$$F = \frac{\text{ESS}/(k - 1)}{\text{RSS}/(n - k)} = \frac{270.5/2}{1.5/2} = \frac{135.25}{0.75} = 180.33$$
Given the close relation between F and $R^2$, F can also be computed as follows if we know the $R^2$ of the given model:
$$F = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)} = \frac{0.994485/2}{0.005515/2} = 180.33$$
Decision rule:
If F-statistic > F-critical value, reject $H_0$.
If F-statistic < F-critical value, $H_0$ cannot be rejected.
The F-critical value at $\alpha = 0.05$, df = (2, 2) is 19 (from the F-table).
F-statistic > F-critical: 180.33 > 19, so we reject $H_0$; the coefficients are jointly significant. Our model is adequate to work with.
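Both routes to the F-statistic can be sketched as follows. The function names `f_from_sums` and `f_from_r_squared` are assumptions for the illustration, k counts the parameters including the intercept, and the inputs and the critical value of 19.0 are those used in the chapter's example.

```python
def f_from_sums(ess, rss, n, k):
    """F = [ESS / (k - 1)] / [RSS / (n - k)]."""
    return (ess / (k - 1)) / (rss / (n - k))


def f_from_r_squared(r2, n, k):
    """Equivalent form: F = [R^2 / (k - 1)] / [(1 - R^2) / (n - k)]."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


# Values from the chapter's numerical example.
ess, rss, n, k = 270.5, 1.5, 5, 3
f_sums = f_from_sums(ess, rss, n, k)
f_r2 = f_from_r_squared(ess / (ess + rss), n, k)

F_CRITICAL = 19.0  # alpha = 0.05, df = (2, 2), value read from an F-table
reject_h0 = f_sums > F_CRITICAL
print(round(f_sums, 2), round(f_r2, 2), reject_h0)
```

The two forms agree because dividing both ESS and RSS by TSS turns them into $R^2$ and $1 - R^2$ without changing their ratio.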
Reporting the regression results
$\hat Y_i = -23.75 - 0.25 X_{1i} + 5.5 X_{2i}$
Se = (5.533) (0.685) (0.866)
t = (-4.292) (-0.365) (6.351)
n = 5, $R^2 = 0.9945$, $\bar R^2 = 0.9890$, F = 180.33
Given $\alpha = 0.05$, the two-tailed t-critical value (from the t-table) is 4.303 at df = n - k = 2.
F-critical value = 19.0 at $\alpha = 0.05$, df = (2, 2)
Are the coefficients individually significant?
1. Since |t| = 0.365 < t-critical value (4.303), the coefficient of X1 is statistically insignificant: $\beta_1$ is not statistically different from zero and $H_0$ cannot be rejected. This implies that, in this sample, higher education (X1) has no statistically significant effect on the annual income of individuals.
2. Since |t| = 6.351 > t-critical value (4.303), the coefficient of X2 is statistically significantly different from zero (reject $H_0$). This implies that work experience (X2) has a significant effect on the annual income of individuals (on Y).
3. Since |t| = 4.292 < t-critical value (4.303), the intercept term is statistically insignificant: $\beta_0$ is not statistically different from zero and $H_0$ cannot be rejected, although only marginally.
Overall significance of the model tested via the F-test
$H_0: \beta_1 = \beta_2 = 0$; $H_1$: at least one of $\beta_1$, $\beta_2$ is not zero
F computed from the data = 180.33
F-critical value at $\alpha = 0.05$, df = (2, 2) is 19.0 (from the F-table)
Hence, F-statistic > F-critical (180.33 > 19.0) at the 0.05 level of significance. We conclude that the regression coefficients are jointly statistically significant (statistically significantly different from zero). Our model is adequate to be used for our purpose.
Regressing Y on X1 and X2
Stata 14 output

Source      SS       df    MS
Model       270.5    2     135.25
Residual    1.5      2     0.75
Total       272      4     68
--------------------------------------------------------------------------
y        Coef.     Std. Err.   t       P>|t|    [95% Conf. Interval]
x1       -0.25     0.685       -0.37   0.750    -3.196     2.696
x2       5.5       0.866       6.35    0.024    1.774      9.226
cons     -23.75    5.533       -4.29   0.050    -47.558    0.058
Test of significance …
Statistical software provides p-values (as in the Stata output table) for the t-statistics of the individual coefficients and for the computed F-statistic. We can test hypotheses about individual coefficients (using the t-statistics) and about overall significance (using the F-test) based on their respective p-values, without having to look up critical values in tables.
For example, the p-value of the F-statistic is about 0.0055 (less than 5%), which is a very small probability. It indicates that, if the null hypothesis were true, the probability of getting an F value greater than 180.33 would be very small. We therefore reject the null hypothesis and conclude that the joint effect of the explanatory variables is significant.
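For the degrees of freedom in this example the p-values can be reproduced without statistical tables, because the F distribution with 2 numerator degrees of freedom and the t distribution with 2 degrees of freedom both have simple closed-form tail probabilities. The sketch below relies on those special cases only and is not a general-purpose routine; the function names are hypothetical, and for other degrees of freedom a statistics library would be needed.

```python
import math


def f_pvalue_two_numerator_df(f_value, df_denominator):
    """P(F > f) for an F(2, df_denominator) distribution (closed form)."""
    return (1 + 2 * f_value / df_denominator) ** (-df_denominator / 2)


def t_pvalue_two_df(t_value):
    """Two-sided p-value for a t distribution with exactly 2 degrees of freedom."""
    return 1 - abs(t_value) / math.sqrt(2 + t_value ** 2)


# F-statistic and t-ratios from the chapter's numerical example.
p_f = f_pvalue_two_numerator_df(135.25 / 0.75, df_denominator=2)
p_x1 = t_pvalue_two_df(-0.25 / math.sqrt(0.46875))
p_x2 = t_pvalue_two_df(5.5 / math.sqrt(0.75))
p_cons = t_pvalue_two_df(-23.75 / math.sqrt(30.61875))
print(round(p_f, 4), round(p_x1, 3), round(p_x2, 3), round(p_cons, 3))
```

A coefficient (or the model as a whole) is significant at the 5% level when its p-value is below 0.05, which gives the same decisions as comparing the statistics with their critical values.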
Hypothesis tests using F-statistics
F-statistics are used to test different kinds of hypotheses.
1) Testing the joint significance of the partial regression coefficients (the overall significance of a model). This has already been discussed.
2) Testing linear restrictions imposed on the coefficients.
3) Testing the marginal contribution of a newly introduced variable to the overall improvement of a model.
4) Variable exclusion/inclusion tests: testing whether certain explanatory variables should be kept in or removed from a model.
5) Testing the structural stability of the coefficients (Chow test), typically with time-series data.
