Regression Analysis
Estimation and Interpretation of Regression Equation
Dummy Independent Variable
Instructor
Taimoor Naseer Waraich
Parallels with Simple Regression
• b0 is still the intercept
• b1 to bk are all called slope parameters
• u is still the error term (or disturbance)
• Still need to make a zero conditional mean assumption, so now
assume that
• E(u|x1,x2, …,xk) = 0
• Still minimizing the sum of squared residuals, so have k+1 first
order conditions
Interpreting Multiple Regression
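$$\hat{y} = \hat{b}_0 + \hat{b}_1 x_1 + \hat{b}_2 x_2 + \dots + \hat{b}_k x_k,$$ so
$$\Delta\hat{y} = \hat{b}_1 \Delta x_1 + \hat{b}_2 \Delta x_2 + \dots + \hat{b}_k \Delta x_k,$$
so holding $x_2, \dots, x_k$ fixed implies that
$$\Delta\hat{y} = \hat{b}_1 \Delta x_1,$$
that is, each $b_j$ has a ceteris paribus interpretation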
A “Partialling Out” Interpretation
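Consider the case where $k = 2$, i.e. $\hat{y} = \hat{b}_0 + \hat{b}_1 x_1 + \hat{b}_2 x_2$; then
$$\hat{b}_1 = \frac{\sum_{i=1}^{n} \hat{r}_{i1}\, y_i}{\sum_{i=1}^{n} \hat{r}_{i1}^{2}},$$
where $\hat{r}_{i1}$ are the residuals from the estimated regression $\hat{x}_1 = \hat{g}_0 + \hat{g}_2 x_2$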
“Partialling Out” continued
• Previous equation implies that regressing y on x1 and x2 gives
same effect of x1 as regressing y on residuals from a regression
of x1 on x2
• This means only the part of xi1 that is uncorrelated with xi2 is
being related to yi, so we’re estimating the effect of x1 on y after
x2 has been “partialled out”
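The partialling-out result can be checked numerically. The following is a minimal sketch using made-up data and only the Python standard library; the variable names and the closed-form two-regressor solution are illustrative choices, not part of the slides.

```python
# Illustrative (made-up) data; x1 and x2 are correlated in this sample.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

def mean(v):
    return sum(v) / len(v)

def cross(a, b):
    """Sum of cross-products of deviations from the sample means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

# (1) Slope on x1 from regressing y on x1 and x2, using the closed-form
#     solution of the normal equations for the two-regressor case.
s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)
b1_multiple = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

# (2) Partialling out: regress x1 on x2, keep the residuals r1,
#     then compute sum(r1 * y) / sum(r1 ** 2).
g2 = s12 / s22
g0 = mean(x1) - g2 * mean(x2)
r1 = [a - (g0 + g2 * b) for a, b in zip(x1, x2)]
b1_partial = sum(r * yi for r, yi in zip(r1, y)) / sum(r * r for r in r1)

print(b1_multiple, b1_partial)  # the two estimates agree up to rounding
```

Both routes give the same slope because the residuals r1 contain only the variation in x1 that x2 cannot explain.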
Simple vs Multiple Reg Estimate
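Compare the simple regression $\tilde{y} = \tilde{b}_0 + \tilde{b}_1 x_1$
with the multiple regression $\hat{y} = \hat{b}_0 + \hat{b}_1 x_1 + \hat{b}_2 x_2$
Generally, $\tilde{b}_1 \neq \hat{b}_1$ unless:
• $\hat{b}_2 = 0$ (i.e. no partial effect of $x_2$), OR
• $x_1$ and $x_2$ are uncorrelated in the sample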
Goodness-of-Fit
We can think of each observation as being made up of an explained part and an unexplained part, $y_i = \hat{y}_i + \hat{u}_i$. We then define the following:
• $\sum (y_i - \bar{y})^2$ is the total sum of squares (SST)
• $\sum (\hat{y}_i - \bar{y})^2$ is the explained sum of squares (SSE)
• $\sum \hat{u}_i^2$ is the residual sum of squares (SSR)
Then SST = SSE + SSR
Goodness-of-Fit (continued)
• How do we think about how well our sample regression line
fits our sample data?
• Can compute the fraction of the total sum of squares (SST) that
is explained by the model, call this the R-squared of regression
• R2 = SSE/SST = 1 – SSR/SST
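As a small illustration of the decomposition and of the two equivalent ways to compute R-squared, here is a sketch for a simple regression on made-up data (the numbers and variable names are assumptions chosen for this example).

```python
# Made-up sample for a simple regression of y on x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(y)
x_bar, y_bar = sum(x) / n, sum(y) / n

# OLS slope and intercept for the one-regressor case.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]             # explained part
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]  # unexplained part (residuals)

sst = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
sse = sum((fi - y_bar) ** 2 for fi in y_hat)   # explained sum of squares
ssr = sum(ui ** 2 for ui in u_hat)             # residual sum of squares

r2_explained = sse / sst
r2_residual = 1 - ssr / sst
print(sst, sse + ssr)             # equal up to rounding
print(r2_explained, r2_residual)  # equal up to rounding
```

The identity SST = SSE + SSR relies on the regression including an intercept, which is the case here.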
More about R-squared
• R2 can never decrease when another independent variable is
added to a regression, and usually will increase
• Because R2 will usually increase with the number of
independent variables, R2 alone is not a good way to compare
models with different numbers of independent variables
Assumptions for Unbiasedness
• Population model is linear in parameters: y = b0 + b1x1 + b2x2 +…+
bkxk + u
• We can use a random sample of size n, {(xi1, xi2,…, xik, yi): i=1, 2, …,
n}, from the population model, so that the sample model is yi = b0 +
b1xi1 + b2xi2 +…+ bkxik + ui
• E(u|x1, x2,… xk) = 0, implying that all of the explanatory variables are
exogenous
• None of the x’s is constant, and there are no exact linear relationships
among them
Omitted Variable Bias
Suppose the true model is given as
$y = b_0 + b_1 x_1 + b_2 x_2 + u$, but we
estimate $\tilde{y} = \tilde{b}_0 + \tilde{b}_1 x_1 + \tilde{u}$, omitting $x_2$
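A short derivation shows where the bias comes from. This sketch assumes the four unbiasedness assumptions listed earlier and introduces one symbol not used in the slides: $\tilde{\delta}_1$, the slope from a simple regression of $x_2$ on $x_1$ in the sample.

```latex
% Algebraic link between the simple and multiple regression slopes
\tilde{b}_1 = \hat{b}_1 + \hat{b}_2\,\tilde{\delta}_1

% Taking expectations conditional on the sample values of x_1 and x_2
E(\tilde{b}_1) = b_1 + b_2\,\tilde{\delta}_1
\quad\Longrightarrow\quad
\mathrm{Bias}(\tilde{b}_1) = E(\tilde{b}_1) - b_1 = b_2\,\tilde{\delta}_1
```

Since $\tilde{\delta}_1$ has the same sign as the sample correlation between $x_1$ and $x_2$, the sign of the bias is the sign of $b_2$ times the sign of that correlation.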
Summary of Direction of Bias
            Corr(x1, x2) > 0    Corr(x1, x2) < 0
b2 > 0      Positive bias       Negative bias
b2 < 0      Negative bias       Positive bias
Omitted Variable Bias Summary
• Two cases where bias is equal to zero
– b2 = 0, that is x2 doesn’t really belong in model
– x1 and x2 are uncorrelated in the sample
• If the correlation between x2 and x1 has the same sign as the
correlation between x2 and y, the bias will be positive
• If the correlation between x2 and x1 has the opposite sign to the
correlation between x2 and y, the bias will be negative
Dummy Variables
• A dummy variable is a variable that takes on the value 1 or 0
• Examples: male (= 1 if male, 0 otherwise), south (= 1 if in
the south, 0 otherwise), etc.
• Dummy variables are also called binary variables, for obvious
reasons
A Dummy Independent Variable
• Consider a simple model with one continuous variable (x) and
one dummy (d)
• y = b0 + d0·d + b1x + u, where d0 is the coefficient on the dummy d
• This can be interpreted as an intercept shift
• If d = 0, then y = b0 + b1x + u
• If d = 1, then y = (b0 + d0) + b1x + u
• The case of d = 0 is the base group
Example of d0 > 0
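[Figure: y plotted against x. Two parallel lines with common slope b1: the d = 1 line, y = (b0 + d0) + b1x, lies a vertical distance d0 above the d = 0 line, y = b0 + b1x, which has intercept b0.]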
Dummies for Multiple Categories
• We can use dummy variables to control for something with
multiple categories
• Suppose everyone in your data is either a HS dropout, HS grad
only, or college grad
• To compare HS and college grads to HS dropouts, include 2
dummy variables
• hsgrad = 1 if HS grad only, 0 otherwise; and colgrad = 1 if
college grad, 0 otherwise
Multiple Categories (cont)
• Any categorical variable can be turned into a set of dummy
variables
• Because the base group is represented by the intercept, if there
are n categories there should be n – 1 dummy variables
• If there are a lot of categories, it may make sense to group
some together
• Example: top 10 ranking, 11 – 25, etc.
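The n – 1 rule can be sketched in a few lines of Python. The helper name `make_dummies` and the sample data are hypothetical, chosen only to illustrate dropping the base group.

```python
def make_dummies(values, base):
    """Build one 0/1 indicator per category, omitting the base group.

    The base group is represented by the intercept, so n categories
    yield n - 1 dummy variables.
    """
    categories = sorted(set(values) - {base})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Hypothetical sample: education level of five people.
education = ["dropout", "hsgrad", "colgrad", "hsgrad", "dropout"]
dummies = make_dummies(education, base="dropout")
print(dummies)
# {'colgrad': [0, 0, 1, 0, 0], 'hsgrad': [0, 1, 0, 1, 0]}
```

Including a dummy for the base group as well would create an exact linear relationship with the intercept, which violates the no-perfect-collinearity assumption.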
Interactions Among Dummies
• Interacting dummy variables is like subdividing the group
• Example: have dummies for male, as well as hsgrad and
colgrad
• Add male*hsgrad and male*colgrad, for a total of 5 dummy
variables –> 6 categories
• Base group is female HS dropouts
• hsgrad is for female HS grads, colgrad is for female college
grads
• The interactions reflect male HS grads and male college grads
More on Dummy Interactions
• Formally, the model is y = b0 + d1·male + d2·hsgrad + d3·colgrad
+ d4·male*hsgrad + d5·male*colgrad + b1x + u; then, for example:
• If male = 0 and hsgrad = 0 and colgrad = 0
• y = b0 + b1x + u
• If male = 0 and hsgrad = 1 and colgrad = 0
• y = (b0 + d2) + b1x + u
• If male = 1 and hsgrad = 0 and colgrad = 1
• y = (b0 + d1 + d3 + d5) + b1x + u
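For reference, substituting each combination of dummy values into the model gives the intercept for all six groups (the slope on x is b1 throughout). This is a worked restatement of the model above, not additional material from the slides.

```latex
\[
\begin{aligned}
\text{female, HS dropout (base group):} &\quad b_0 \\
\text{female, HS grad only:}            &\quad b_0 + d_2 \\
\text{female, college grad:}            &\quad b_0 + d_3 \\
\text{male, HS dropout:}                &\quad b_0 + d_1 \\
\text{male, HS grad only:}              &\quad b_0 + d_1 + d_2 + d_4 \\
\text{male, college grad:}              &\quad b_0 + d_1 + d_3 + d_5
\end{aligned}
\]
```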
Other Interactions with Dummies
• Can also consider interacting a dummy variable, d, with a
continuous variable, x
• y = b0 + d1d + b1x + d2d*x + u
• If d = 0, then y = b0 + b1x + u
• If d = 1, then y = (b0 + d1) + (b1+ d2) x + u
• This is interpreted as a change in the slope
Example of d0 > 0 and d1 < 0
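[Figure: y plotted against x. The d = 0 line is y = b0 + b1x; the d = 1 line is y = (b0 + d0) + (b1 + d1)x, with a higher intercept (d0 > 0) and a flatter slope (d1 < 0). Here d0 and d1 play the roles of d1 and d2 in the model on the previous slide.]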
Testing for Differences Across Groups
• Testing whether a regression function is different for one group
versus another can be thought of as simply testing for the joint
significance of the dummy and its interactions with all other x
variables
• So, you can estimate the model with all the interactions and
without and form an F statistic, but this could be unwieldy
The F-test
• The F-test is an analysis of the variance of a regression
• It can be used to test for the significance of a group of variables
or for a restriction
• The F statistic follows a different distribution from the t
statistic, but it can likewise be used to test at different levels
of significance
• When determining the F-statistic we need to collect either the
residual sum of squares (RSS, called SSR earlier) or the
R-squared statistic
• The formula for the F-test of a group of variables can be
expressed in terms of either the residual sum of squares (RSS)
or the explained sum of squares (ESS, called SSE earlier)
F-test of explanatory power
• This is the F-test for the goodness of fit of a regression and in
effect tests for the joint significance of the explanatory variables.
• It is based on the R-squared statistic.
• It is routinely produced by most computer software packages.
• It follows the F-distribution, which is quite different from the t-distribution
F-test formula
• The formula for the F-test of the goodness
of fit is:
$$F = \frac{R^2 / (k - 1)}{(1 - R^2) / (n - k)} \sim F^{\,k-1}_{\,n-k}$$
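The goodness-of-fit F statistic is simple to compute once R-squared is known. The sketch below assumes k counts all estimated parameters including the intercept (so k – 1 slopes are tested); the function name and the input values are hypothetical.

```python
def f_stat_from_r2(r2, n, k):
    """F statistic for the overall goodness of fit.

    r2 : R-squared of the regression
    n  : number of observations
    k  : number of parameters, assumed here to include the intercept
    """
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Hypothetical inputs: R-squared of 0.5, 60 observations, 4 parameters.
f_value = f_stat_from_r2(r2=0.5, n=60, k=4)
print(round(f_value, 2))  # 18.67
```

The result is compared with the critical value from the F-distribution with k – 1 and n – k degrees of freedom.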
F-distribution
• To find the critical value of the F-distribution, in general you
need to know the number of parameters and the degrees of
freedom.
• The number of parameters is then read across the top of the
table, and the degrees of freedom down the side. Where these
two values intersect, we find the critical value.
F-test critical value
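5% critical values (columns: number of parameters, read across the top; rows: degrees of freedom, read from the side)
         1        2        3        4        5
1    161.4    199.5    215.7    224.6    230.2
2     18.5     19.0     19.2     19.3     19.3
3     10.1      9.6      9.3      9.1      9.0
4      7.7      7.0      6.6      6.4      6.3
5      6.6      5.8      5.4      5.2      5.1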
F-statistic
• When testing for the significance of the goodness of fit, our
null hypothesis is that the coefficients on the explanatory
variables are jointly equal to 0.
• If our F-statistic is below the critical value, we fail to reject the
null and therefore say the goodness of fit is not significant.
Joint Significance
• The F-test is useful for testing a number of hypotheses and is
often used to test for the joint significance of a group of
variables.
• In this type of test, we often refer to ‘testing a restriction’.
• This restriction is that the coefficients on a group of
explanatory variables are jointly equal to 0
F-test for joint significance
• The formula for this test can be viewed as:
$$F = \frac{\text{Improvement in fit} \,/\, \text{Extra degrees of freedom used up}}{\text{Residual sum of squares remaining} \,/\, \text{Degrees of freedom remaining}}$$
F-tests
• The test for joint significance has its own
formula, which takes the following form:
$$F = \frac{(RSS_R - RSS_u) / m}{RSS_u / (n - k)}$$
where
m = number of restrictions
k = number of parameters in the unrestricted model
RSS_u = unrestricted RSS
RSS_R = restricted RSS
Joint Significance of a group of variables
• To carry out this test you need to run two separate OLS
regressions: one with all the explanatory variables included
(the unrestricted equation), the other with the variables whose
joint significance is being tested removed (the restricted equation).
• Then collect the RSS from both equations.
• Put the values into the formula.
• Find the critical value and compare it with the test statistic. The
null hypothesis is that the coefficients on the tested variables
are jointly equal to 0.
Joint Significance
• If we have a model with 3 explanatory variables and wish
to test for the joint significance of 2 of the variables
(x and z), we need to run the following restricted and
unrestricted models:
$y_t = b_0 + b_1 w_t + u_t$ (restricted)
$y_t = b_0 + b_1 w_t + b_2 x_t + b_3 z_t + u_t$ (unrestricted)
Example of the F-test for joint significance
• Given the following model, we wish to test the joint
significance of w and z. Having estimated the unrestricted and
restricted models, we collect their respective RSSs (n = 60).
$y_t = b_0 + b_1 x_t + b_2 w_t + b_3 z_t + u_t$ (unrestricted)
$RSS_u = 0.75$
$y_t = c_0 + c_1 x_t + v_t$ (restricted)
$RSS_R = 1.5$
Joint significance
where:
$b_0$, $c_0$ are constants
$b_1, \dots, b_3$, $c_1$ are slope parameters
$u_t$, $v_t$ are error terms
$x_t$, $w_t$, $z_t$ are explanatory variables
Joint significance
$H_0: b_2 = b_3 = 0$
$H_1$: at least one of $b_2$, $b_3$ is not equal to 0
• As the F statistic is greater than the critical
value (28>3.15), we reject the null
hypothesis and conclude that the variables
w and z are jointly significant and should
remain in the model.
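The test statistic of 28 quoted above can be reproduced from the reported values using F = [(RSS_R – RSS_u)/m] / [RSS_u/(n – k)]. In this sketch the function name is hypothetical, m = 2 because two coefficients are restricted, and k = 4 is taken to be the constant plus three slopes of the unrestricted model.

```python
def f_stat_from_rss(rss_r, rss_u, m, n, k):
    """F statistic for the joint significance of m restricted coefficients.

    rss_r : residual sum of squares of the restricted model
    rss_u : residual sum of squares of the unrestricted model
    m     : number of restrictions
    n     : number of observations
    k     : number of parameters in the unrestricted model
    """
    return ((rss_r - rss_u) / m) / (rss_u / (n - k))

# Values from the example: RSS_R = 1.5, RSS_u = 0.75, n = 60.
f_value = f_stat_from_rss(rss_r=1.5, rss_u=0.75, m=2, n=60, k=4)
print(round(f_value, 2))  # 28.0
```

Because 28 exceeds the 5% critical value of about 3.15, the null hypothesis is rejected.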
Linear Probability Model
• P(y = 1|x) = E(y|x), when y is a binary variable, so we can
write our model as
• P(y = 1|x) = b0 + b1x1 + … + bkxk
• So, the interpretation of bj is the change in the probability of
success when xj changes by one unit, holding the other x's fixed
• The predicted y is the predicted probability of success
• A potential problem is that the predicted probability can fall
outside [0,1]
Linear Probability Model (cont)
• Even without predictions outside of [0,1], we may estimate
effects that imply a change in x changes the probability by
more than +1 or –1, so it is best to consider changes near the mean
• This model will violate the assumption of homoskedasticity,
which affects inference
• Despite drawbacks, it’s usually a good place to start when y is
binary
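Both drawbacks can be seen in a small sketch with one regressor. The coefficients below are hypothetical, not estimates from any data; the variance formula uses the fact that a binary y with P(y = 1|x) = p has Var(y|x) = p(1 – p), which changes with x.

```python
def lpm_probability(b0, b1, x):
    """Predicted probability of success from a linear probability model."""
    return b0 + b1 * x

def lpm_error_variance(p):
    """Var(y | x) for a binary y with P(y = 1 | x) = p."""
    return p * (1 - p)

b0, b1 = 0.1, 0.2  # hypothetical coefficients, for illustration only

for x in [0, 2, 4, 5]:
    p = lpm_probability(b0, b1, x)
    inside = 0 <= p <= 1
    # The variance is only meaningful when p is a valid probability.
    variance = lpm_error_variance(p) if inside else None
    print(x, round(p, 2), inside, variance)
```

At x = 5 the prediction exceeds 1, and for the other values of x the error variance differs across observations, which is the heteroskedasticity noted above.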