CHAPTER THREE
The Multiple Linear Regression Model
❑ In the previous chapter, we discussed the simple linear
regression in which we considered a dependent variable to be
a function of one independent variable.
❑ In real life, however, a dependent variable is a function of
many explanatory variables.
❑ The logic behind the multiple regression is, therefore, to study
many explanatory variables that explain a dependent variable.
❑ For instance, in demand studies we study the relationship
between quantity demanded of a good and price of the good,
price of substitute goods and the consumer’s income.
❑ The model we assume is:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + U$
where $Y$ is quantity demanded, $X_1$ is the price of the good, $X_2$ is
the price of substitute goods, $X_3$ is the consumer’s income, the $\beta_j$'s
are unknown parameters and $U$ is the disturbance term.
The disturbance term is of similar nature to that in simple
regression, reflecting:
✓the basic random nature of human responses
✓ errors of aggregation
✓ errors of measurement
✓ errors in specification of the mathematical form of the model
and
✓ any other (minor) factors, other than the Xi, that might influence Y.
3.2 ASSUMPTIONS OF MULTIPLE
REGRESSION MODEL
❑ To specify our multiple linear regression model and
proceed with its analysis, some assumptions are required.
❑ These assumptions are the same as in the single-explanatory-variable
model developed earlier, with one addition: the
assumption of no perfect multicollinearity.
These assumptions are:
1. Randomness of the error term: The variable u is a real random variable.
2. Zero mean of the error term: $E(U_i) = 0$
3. Homoscedasticity: The variance of each $U_i$ is the same for all values of the $X_i$,
i.e. $E(U_i^2) = \sigma_u^2$ (constant)
4. Normality of u: The values of each $U_i$ are normally distributed,
i.e. $U_i \sim N(0, \sigma_u^2)$
5. No auto or serial correlation: The values of $U_i$ (corresponding to $X_i$) are
independent of the values of any other $U_j$ (corresponding to $X_j$) for $i \neq j$,
i.e. $E(U_i U_j) = 0$ for $i \neq j$
6. Independence of $U_i$ and $X_i$: Every disturbance term $U_i$ is
independent of the explanatory variables.
This condition is automatically fulfilled if we assume that the
values of the X’s are a set of fixed numbers in all
(hypothetical) samples.
7. No perfect multicollinearity: The explanatory variables are
not perfectly linearly correlated.
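Assumption 7 can be checked numerically before estimation. The Python sketch below is illustrative only: the function name `is_perfectly_collinear` and the tolerance are assumptions, not part of the chapter. For two regressors written as deviations from their means, it tests whether $\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2$ is zero, which happens exactly when the sample correlation between them is +1 or -1.

```python
def is_perfectly_collinear(x1, x2, tol=1e-12):
    """Return True if x1 and x2 are perfectly linearly related in the sample.

    A regressor with no variation at all is also flagged, because it is
    perfectly collinear with the intercept.
    """
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    d1 = [a - m1 for a in x1]          # deviations from the mean of x1
    d2 = [b - m2 for b in x2]          # deviations from the mean of x2
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    det = s11 * s22 - s12 ** 2         # zero under perfect collinearity
    return det <= tol * s11 * s22      # relative tolerance (assumed value)


# Made-up data: the second regressor is an exact linear function of the first.
price = [1, 2, 3, 4, 5]
substitute_price = [2 * p + 3 for p in price]
print(is_perfectly_collinear(price, substitute_price))
```

With data like these the least-squares estimates are not unique, which is why the assumption is needed.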
3.3 A MODEL WITH TWO EXPLANATORY
VARIABLES
Estimation of parameters of the two-explanatory-variable model
Example: Consider the following model:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + U$
where $\beta_0$ is referred to as the intercept and $\beta_1$ and $\beta_2$ are the slopes of
the regression. Note that, for example, $\beta_2$ measures the effect on $Y$
of a unit change in $X_2$ when $X_1$ is held constant.
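The residuals are then
$\varepsilon_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}$
so the sum of squared residuals to be minimized is given by
$\sum \varepsilon_i^2 = \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i})^2$
To find the parameter estimators which minimize the residual
sum of squares:
❑ Differentiate the sum of squared residuals with respect to $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$,
❑ Set the first derivatives to zero, and
❑ Solve the resulting equations for each estimator.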
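Carrying out these three steps gives the normal equations and their solution. The derivation below is the standard least-squares result for two regressors. The lower-case letters $x_1$, $x_2$ and $y$ are a notational assumption: they denote deviations from the sample means, e.g. $x_{1i} = X_{1i} - \bar{X}_1$.

```latex
% First-order conditions (normal equations)
\[
\begin{aligned}
\frac{\partial \sum \varepsilon_i^2}{\partial \hat{\beta}_0}
  &= -2 \sum \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} \right) = 0 \\
\frac{\partial \sum \varepsilon_i^2}{\partial \hat{\beta}_1}
  &= -2 \sum X_{1i} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} \right) = 0 \\
\frac{\partial \sum \varepsilon_i^2}{\partial \hat{\beta}_2}
  &= -2 \sum X_{2i} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} \right) = 0
\end{aligned}
\]

% Solution in deviation form
\[
\hat{\beta}_1 = \frac{\sum x_1 y \sum x_2^2 - \sum x_2 y \sum x_1 x_2}
                     {\sum x_1^2 \sum x_2^2 - \left( \sum x_1 x_2 \right)^2},
\qquad
\hat{\beta}_2 = \frac{\sum x_2 y \sum x_1^2 - \sum x_1 y \sum x_1 x_2}
                     {\sum x_1^2 \sum x_2^2 - \left( \sum x_1 x_2 \right)^2}
\]
\[
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2
\]
```

The common denominator is zero when $X_1$ and $X_2$ are perfectly collinear, which is why the assumption of no perfect multicollinearity is required.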
Note:
❑ In a multiple linear regression analysis involving a large
number of explanatory variables, the computations are
tedious.
❑ Fortunately, a number of computer packages are
readily available for such analysis.
❑ Thus, one does not need to go through the details of the
calculations by hand.
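To see what such a package does internally in the two-regressor case, the following Python sketch applies the standard deviation-form least-squares formulas directly. It is a minimal illustration, not the routine of any particular package: the function name `ols_two_regressors` and the data are made up, and the data are generated without a disturbance so that the true coefficients are known.

```python
def ols_two_regressors(y, x1, x2):
    """OLS estimates (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2 + U."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    dy = [v - my for v in y]           # deviations from the sample means
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("perfect multicollinearity: estimates are not unique")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Made-up data generated exactly as Y = 1 + 2*X1 + 3*X2 (no disturbance),
# so the estimates should recover 1, 2 and 3 up to rounding error.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
b0, b1, b2 = ols_two_regressors(y, x1, x2)
print(b0, b1, b2)
```

With more than two regressors the same idea is usually expressed in matrix form, which is the computation that statistical packages carry out.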
3.3.2 The coefficient of determination and test of model
adequacy
❑ As in simple regression, $R^2$ is viewed as a measure of the
predictive ability of the model over the sample period, or as a
measure of how well the estimated regression fits the data.
Total sum of squares (TSS): $\sum (Y_i - \bar{Y})^2$
Regression (explained) sum of squares (RSS): $\sum (\hat{Y}_i - \bar{Y})^2$
Error (unexplained) sum of squares (ESS): $\sum (Y_i - \hat{Y}_i)^2$
$R^2 = \dfrac{RSS}{TSS} = 1 - \dfrac{ESS}{TSS}$
❑ R-squared is bounded between zero and one (inclusive)
❑ The largest value that R-squared can assume is 1 (in which
case all observations fall on the regression line, plane or
surface).
❑ The smallest it can assume is zero.
❑ For instance, consider the regression of the quantity
demanded of a good on the price of the good, the price of
substitute goods and the consumer’s income.
❑ Suppose the value of R-squared is 0.914.
❑ This indicates that 91.4% of the variation in
demand is attributed to the effect of price, the price of
substitute goods and consumer income.
❑ The remaining 8.6% of the variation in demand is due to
factors which are not included in our model.
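The decomposition behind R-squared can be sketched in a few lines of Python. The function names and the numbers are made up for illustration. The naming follows the chapter: RSS is the regression (explained) sum of squares and ESS is the error (unexplained) sum of squares. The identity TSS = RSS + ESS is assumed to hold, which is the case when the fitted values come from a least-squares regression that includes an intercept.

```python
def sums_of_squares(y, y_hat):
    """Return (TSS, RSS, ESS) for observed values y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((v - y_bar) ** 2 for v in y)              # total variation
    rss = sum((f - y_bar) ** 2 for f in y_hat)          # explained variation
    ess = sum((v - f) ** 2 for v, f in zip(y, y_hat))   # unexplained variation
    return tss, rss, ess


def r_squared(y, y_hat):
    """Coefficient of determination, R^2 = 1 - ESS/TSS."""
    tss, _, ess = sums_of_squares(y, y_hat)
    return 1.0 - ess / tss


# Made-up observations and their least-squares fitted values.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
tss, rss, ess = sums_of_squares(y, y_hat)
print(tss, rss, ess, r_squared(y, y_hat))
```

In this made-up case the fitted values account for about 60% of the variation in y, and the remainder is left in the residuals.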
❑ A small value of R-squared may indicate that
$X_1, X_2, X_3, \dots, X_k$ are poor explanatory variables, in the sense
that variation in them leaves $Y$ unaffected, or
❑ $X_1, X_2, X_3, \dots, X_k$ are relevant variables, but their influence on
$Y$ is weak compared to that of other variables that are omitted
from the regression equation, or
❑ the regression equation is misspecified (for example, the true
relationship is exponential rather than linear).
❑ Thus, a small value of R-squared casts doubt on the usefulness of
the regression equation.
However, we do not pass final judgement on the equation
until it has been subjected to an objective statistical test.
A test of the adequacy of the multiple linear regression model is
conducted by testing the hypothesis
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_1$: $H_0$ is not true (at least one $\beta_j \neq 0$)
❑ The null hypothesis states that all slope coefficients
are zero.
❑ This is equivalent to saying that none of the explanatory
variables explains the dependent variable.
❑ If the null hypothesis is not rejected, then the model is
judged inadequate.
❑ The above test is carried out by means of analysis
of variance.
❑ If the regression model is adequate, the explained
variation (RSS) should be considerably higher than
the unexplained variation (ESS).
❑ This is equivalent to saying that the test statistic for
testing $H_0$ versus $H_1$, given by the variance ratio
$F_{cal} = \dfrac{RSS/(k-1)}{ESS/(n-k)}$
should be large.
❑ In the formula for the test statistic, k is the number of
parameters (regression coefficients) estimated from the
sample data and n is the sample size.
❑ The test statistic is significant (that is, the null hypothesis is
rejected) if it exceeds the critical value from the F-distribution
with (k-1) and (n-k) degrees of freedom at a given
significance level $\alpha$.
❑ Suppose the value of the F-statistic is 63.98 and the p-value,
reported as Prob(F-statistic) in the output, is less than 0.01.
❑ We then reject the null hypothesis at the 1% level of
significance.
❑ We conclude that there is a significant linear relationship
between demand and the explanatory variables: price, price of
substitute goods and consumer income.
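The variance-ratio calculation can be sketched in Python as follows. The function names and the numbers are hypothetical and are not the data behind the F-statistic quoted above. The sketch assumes that k counts every estimated parameter, intercept included, which is what the (n-k) denominator degrees of freedom imply. The critical value is left as an input to be read from an F table.

```python
def f_statistic(rss, ess, n, k):
    """Variance ratio F = [RSS/(k-1)] / [ESS/(n-k)].

    rss: regression (explained) sum of squares
    ess: error (unexplained) sum of squares
    n:   sample size
    k:   number of estimated parameters, intercept included (assumption)
    """
    if k < 2 or n <= k:
        raise ValueError("need k >= 2 and n > k")
    if ess == 0:
        return float("inf")            # perfect fit: nothing left unexplained
    return (rss / (k - 1)) / (ess / (n - k))


def reject_h0(f_cal, f_tab):
    """Decision rule: reject H0 when the calculated F exceeds the tabulated F."""
    return f_cal > f_tab


# Made-up sums of squares for a model with 3 regressors plus an intercept.
f_cal = f_statistic(rss=90.0, ess=10.0, n=25, k=4)
print(f_cal)
```

The call `reject_h0(f_cal, f_tab)` then applies the decision rule once `f_tab` has been looked up for (k-1) and (n-k) degrees of freedom at the chosen significance level.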
Hypothesis Testing in Multiple Regression Model
❑ In multiple regression models we undertake two tests of
significance.
❑ The first is the significance of the individual parameters of the model.
❑ This test is the same as the tests discussed for the simple
regression model.
❑ The second is the overall significance of the model.
TESTS OF INDIVIDUAL SIGNIFICANCE
❑ We can use either the t-test or the standard error test to test a
hypothesis about any individual partial regression coefficient.
❑ To illustrate, consider the following example.
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + U$
(A) $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$
(B) $H_0: \beta_2 = 0$ against $H_1: \beta_2 \neq 0$
❑ The null hypothesis (A) states that, holding $X_2$ constant, $X_1$ has no
(linear) influence on $Y$.
❑ Similarly, hypothesis (B) states that, holding $X_1$ constant, $X_2$ has
no influence on the dependent variable $Y$.
❑ To test these null hypotheses, we will use the following tests:
❑ Standard error test: under this and the following testing method
we test only for $\hat{\beta}_1$. The test for $\hat{\beta}_2$ will be done in the same
way.
If $SE(\hat{\beta}_1) > \tfrac{1}{2}|\hat{\beta}_1|$, we accept the null hypothesis; that is, we
conclude that the estimate $\hat{\beta}_1$ is not statistically
significant.
If $SE(\hat{\beta}_1) < \tfrac{1}{2}|\hat{\beta}_1|$, we reject the null hypothesis; that is, we
conclude that the estimate $\hat{\beta}_1$ is statistically
significant.
Note: The smaller the standard errors, the stronger the evidence
that the estimates are statistically reliable.
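The standard error test compares the standard error of an estimate with half the absolute size of the estimate, which amounts to asking whether the t-ratio exceeds 2 in absolute value. A minimal Python sketch follows. The function name is hypothetical and the numbers are made up.

```python
def standard_error_test(beta_hat, se):
    """Return True if the estimate is judged statistically significant.

    Rule of thumb: significant when SE(beta_hat) < |beta_hat| / 2,
    i.e. when the t-ratio beta_hat / SE exceeds 2 in absolute value.
    """
    return se < abs(beta_hat) / 2


# Made-up estimates with their standard errors.
print(standard_error_test(beta_hat=2.0, se=0.5))    # small SE relative to the estimate
print(standard_error_test(beta_hat=2.0, se=1.5))    # large SE relative to the estimate
```

The rule is a quick approximation. The t-test described next uses the exact tabulated value for the available degrees of freedom.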
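THE STUDENT’S T-TEST: WE COMPUTE THE T-RATIO FOR EACH $\hat{\beta}_i$
$t^* = \dfrac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)} \sim t_{n-k}$
where n is the number of observations and k is the number of parameters.
If we have 3 parameters, the degrees of freedom will be n-3. So, under $H_0: \beta_1 = 0$,
$t^* = \dfrac{\hat{\beta}_1}{SE(\hat{\beta}_1)}$ with n-3 degrees of freedom.
❑ If $|t^*| > t$ (tabulated), we reject the null hypothesis and accept
the alternative one; $\hat{\beta}_1$ is statistically significant.
❑ Thus, the greater the absolute value of $t^*$, the stronger the evidence that $\hat{\beta}_1$
is statistically significant.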
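The t-ratios for the two-regressor model can be computed as in the Python sketch below. It is a hedged illustration: the function name and the data are made up, and the standard errors use the usual textbook formulas $Var(\hat{\beta}_1) = \hat{\sigma}^2 \sum x_2^2 / D$ and $Var(\hat{\beta}_2) = \hat{\sigma}^2 \sum x_1^2 / D$, where $D = \sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2$, lower-case letters denote deviations from means, and $\hat{\sigma}^2$ is the residual sum of squares divided by n-3.

```python
import math


def t_ratios_two_regressors(y, x1, x2):
    """Return ((b1, se1, t1), (b2, se2, t2)) for Y = b0 + b1*X1 + b2*X2 + U."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    dy = [v - my for v in y]
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    # OLS estimates in deviation form
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    # Residual variance with n - 3 degrees of freedom (3 parameters estimated)
    resid = [yi - b0 - b1 * a - b2 * b for yi, a, b in zip(y, x1, x2)]
    sigma2 = sum(e * e for e in resid) / (n - 3)
    se1 = math.sqrt(sigma2 * s22 / det)
    se2 = math.sqrt(sigma2 * s11 / det)
    # t-ratios under H0: beta_i = 0
    return (b1, se1, b1 / se1), (b2, se2, b2 / se2)


# Made-up data: Y = 1 + 2*X1 + 3*X2 plus a small disturbance.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [10, 7, 18, 19, 29]
(b1, se1, t1), (b2, se2, t2) = t_ratios_two_regressors(y, x1, x2)
print(t1, t2)
```

Each t-ratio is then compared in absolute value with the tabulated t value for n-3 degrees of freedom at the chosen significance level.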
TESTING THE OVERALL SIGNIFICANCE OF
A REGRESSION
This test aims at finding out whether the explanatory variables ($X_1, X_2, \dots, X_k$)
actually have any significant influence on the dependent variable. The test
of the overall significance of the regression implies testing the null hypothesis
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
against the alternative hypothesis
$H_1$: not all $\beta_i$’s are zero
If the null hypothesis is true, then there is no linear relationship between $Y$
and the regressors.
The above joint hypothesis can be tested by the analysis of variance (AOV)
technique.
Therefore, to undertake the test, first find the calculated value of F and
compare it with the tabulated F. The calculated value of F can be obtained
by using the following formula:
$F = \dfrac{RSS/(k-1)}{ESS/(n-k)}$
where k – 1 refers to the degrees of freedom of the numerator,
n – k refers to the degrees of freedom of the denominator, and
k is the number of parameters estimated.
Decision Rule: If Fcalculated > Ftabulated (F (k – 1, n – k)), reject
H0; otherwise, you may accept it, where F (k – 1, n – k) is the
critical F value at the $\alpha$ level of significance with (k – 1)
numerator df and (n – k) denominator df.
Note that there is a relationship between the coefficient of
determination $R^2$ and the F test used in the analysis of variance.
When $R^2 = 0$, F is zero. The larger the $R^2$, the greater the F value.
In the limit, when $R^2 = 1$, F is infinite. Thus the F test, which is a
measure of the overall significance of the estimated regression, is
also a test of significance of $R^2$. Testing the null hypothesis that all
slope coefficients are zero is equivalent to testing the null hypothesis
that (the population) $R^2$ is zero.
The F test expressed in terms of $R^2$ is easy to compute:
$F = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$
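The $R^2$ form of the statistic follows from dividing the numerator and the denominator of the variance ratio by TSS, since $R^2 = RSS/TSS$ and $1 - R^2 = ESS/TSS$. The Python sketch below computes F this way and illustrates the limiting cases described above. The function name and the numbers are made up, and k is assumed to count all estimated parameters, intercept included.

```python
import math


def f_from_r_squared(r2, n, k):
    """F = [R^2/(k-1)] / [(1-R^2)/(n-k)] for a regression with k parameters."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R-squared must lie between 0 and 1")
    if k < 2 or n <= k:
        raise ValueError("need k >= 2 and n > k")
    if r2 == 1.0:
        return math.inf                # limiting case: perfect fit
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


# Made-up values: R-squared of 0.9 from 25 observations and 4 parameters.
print(f_from_r_squared(0.0, n=25, k=4))   # no explanatory power gives F = 0
print(f_from_r_squared(0.9, n=25, k=4))
print(f_from_r_squared(1.0, n=25, k=4))   # perfect fit gives an infinite F
```

Because only $R^2$, n and k are needed, this form is convenient when the regression output reports $R^2$ but not the individual sums of squares.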