Lecture Note - I
Instrumental Variable Regression
Krishna Ram*
April 5, 2020
Last updated on April 16, 2020
* This lecture note is heavily based on Wooldridge (2009), Econometrics, Ch. 15.
1 Introduction
We use the instrumental variable (IV) regression model to address the problem of endogeneity in a regression model. As we all know, when a regression model suffers from endogeneity, OLS gives biased and inconsistent estimates of the parameters. Endogeneity can arise from omitted variables, simultaneous causality, sample selection, functional form misspecification, and measurement error in an explanatory variable. When a regression model suffers from this problem, it violates an important assumption of the classical linear regression model (CLRM), namely the zero conditional mean assumption $E(u_i \mid x_i) = 0$, which implies $\mathrm{Cov}(x_i, u_i) = 0$.
In this lecture note we will see how the IV method can be used to solve the problem of endogeneity. IV estimation is closely tied to the two stage least squares (2SLS) method; with one endogenous regressor and one instrument, the two give the same estimator. 2SLS is second only to OLS in popularity for estimating linear regressions in applied econometrics. We begin with the endogeneity problem in the simple (bivariate) linear regression model and later extend it to the multiple linear regression model.
2 IV in a Simple Linear Regression
Consider the regression equation
$$y_i = \beta_1 + \beta_2 x_i + u_i \tag{1}$$
where $y$ is wage earnings, $x$ is the level of education, and $u$ is the error term.
Now suppose that the above regression equation suffers from omitted variable bias: a variable called ability, which affects wage earnings and is related to education, is missing from the model. Because ability is omitted, its effect is captured by the error term. Since ability is related to education (people with higher ability tend to acquire more education), the error term is correlated with the explanatory variable. In other words, the regression model suffers from endogeneity. If we go ahead with OLS estimation without taking the endogeneity into account, we get biased and inconsistent estimates.
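A small simulation can make the omitted-variable argument concrete. The sketch below uses a hypothetical data-generating process (all coefficients and the seed are made up for illustration, and only the Python standard library is used): ability raises both education and wages, but it is left out of the regression and therefore sits in the error term. Under these assumptions $\mathrm{Var}(x) = 5$ and $\mathrm{Cov}(x, u) = 2$, so the population OLS slope is $0.5 + 2/5 = 0.9$ instead of the true $\beta_2 = 0.5$.

```python
import random

def simulate_wages(n, beta1=1.0, beta2=0.5, seed=0):
    """Hypothetical wage model in which ability is omitted from the regression."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        educ = 12 + 2 * ability + rng.gauss(0, 1)  # more able people acquire more education
        u = ability + rng.gauss(0, 1)               # omitted ability is absorbed by the error
        x.append(educ)
        y.append(beta1 + beta2 * educ + u)
    return x, y

def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x, y = simulate_wages(50_000)
print(f"OLS slope: {ols_slope(x, y):.3f} (true beta2 = 0.5)")
```

With a large sample the printed slope should sit close to 0.9, well above the true value, which is the inconsistency described above.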
The question then arises: what should we do?
One answer is to first look for a proxy variable. If we find a suitable proxy, we can use it in place of ability and go ahead with the OLS procedure to estimate the parameters.
In the above example, if we have an IQ variable (an IQ test score), we can use it as a proxy for the omitted variable. Most of the time, however, we do not have a suitable proxy for an omitted variable. In such a situation, we look for a suitable IV and, once we find one, use it to estimate the parameters. Using either a proxy or an instrumental variable requires some additional information, which comes in the form of a new variable that satisfies certain properties. We will discuss the properties of a proxy variable when we deal with the multiple regression model. So, let's first discuss the properties of an instrumental variable.
Suppose that we have an observable variable $z$ that satisfies these two conditions:
1. $\mathrm{Cov}(z, u) = 0$   (2)
2. $\mathrm{Cov}(z, x) \neq 0$   (3)
Then we call $z$ an instrumental variable for $x$.
The first requirement is that the instrumental variable $z$ must satisfy condition (2): there is no correlation between the error term $u$ and the instrument $z$. We summarize this by saying that $z$ is exogenous in equation (1), and we often call this instrument exogeneity. In the context of an omitted variable, instrument exogeneity means two things. First, $z$ should have no partial effect on $y$ once $x$ and the omitted variable (say $q$; in our example, ability) are included in the model. Second, $z$ should be uncorrelated with the omitted variable $q$. Condition (3) implies that the instrumental variable must be correlated (either positively or negatively) with the endogenous variable $x$. This condition is sometimes referred to as instrument relevance. So, before proceeding further, our immediate task is to check these two conditions. Note that the first condition is not testable, but the second is. The validity of the first condition has to be argued on the basis of economic theory and expert opinion.
To check the second condition, we estimate the coefficient on $z$ in the regression of $x$ on $z$,
$$x_i = \theta_0 + \theta_1 z_i + v_i \tag{4}$$
and test the null hypothesis $H_0: \theta_1 = 0$ against $H_A: \theta_1 \neq 0$. By the definition of the linear projection error, $E(v_i) = 0$ and $v_i$ is uncorrelated with $z_i$. In this simple regression, $\hat\theta_1 = \mathrm{Cov}(z, x)/\mathrm{Var}(z)$ (sample moments), and we can use the usual $t$ statistic to test the null hypothesis. Rejecting $H_0$ is evidence that $z$ is relevant.
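The relevance check can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical first stage with $\theta_1 = 0.8$ and normal errors; the function name `first_stage` is ours, not from the source. It computes $\hat\theta_1 = \mathrm{Cov}(z, x)/\mathrm{Var}(z)$, the usual homoskedastic standard error, and the $t$ statistic, and compares $|t|$ with the large-sample 5% critical value 1.96.

```python
import math
import random

def first_stage(x, z):
    """OLS of x on z (equation 4); returns theta1_hat, its standard error and t statistic."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    theta1 = szx / szz                      # sample Cov(z, x) / sample Var(z)
    theta0 = mx - theta1 * mz
    rss = sum((b - theta0 - theta1 * a) ** 2 for a, b in zip(z, x))
    se = math.sqrt(rss / (n - 2) / szz)     # usual homoskedastic standard error
    return theta1, se, theta1 / se

rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [12 + 0.8 * zi + rng.gauss(0, 2) for zi in z]   # hypothetical first stage, theta1 = 0.8

theta1, se, t = first_stage(x, z)
print(f"theta1_hat = {theta1:.3f}, se = {se:.3f}, t = {t:.2f}")
print("reject H0: theta1 = 0 at 5%" if abs(t) > 1.96 else "cannot reject H0: theta1 = 0")
```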
For the time being, let's suppose that $z$ satisfies both conditions and can be used as an IV for the endogenous variable $x$.
The next question is: how do we use it to estimate the population parameters? In one approach, we substitute equation (4) into equation (1) and use OLS to estimate the parameter. In this case,
$$\hat\beta_2^{IV} = \frac{\mathrm{Cov}(z, y)}{\mathrm{Cov}(x, z)} \tag{5}$$
How? [Try it by yourself.]
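The following sketch is a numerical check of equation (5), not the derivation asked for in the exercise. It assumes a hypothetical data-generating process (ability omitted, an instrument $z$ independent of ability, true $\beta_2 = 0.5$) and compares OLS with the IV ratio. It also carries out the "substitute (4) into (1)" route: regress $y$ on $z$, then divide that slope by the first-stage slope $\hat\theta_1$; the two IV numbers agree up to floating-point rounding.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(2)
z, x, y = [], [], []
for _ in range(50_000):
    ability = rng.gauss(0, 1)                      # omitted variable
    zi = rng.gauss(0, 1)                           # instrument, independent of ability
    xi = 12 + zi + 2 * ability + rng.gauss(0, 1)   # education depends on z and ability
    z.append(zi)
    x.append(xi)
    y.append(1.0 + 0.5 * xi + ability + rng.gauss(0, 1))  # true beta2 = 0.5

beta2_ols = cov(x, y) / cov(x, x)
beta2_iv = cov(z, y) / cov(x, z)                   # equation (5)

# Substitute (4) into (1): regress y on z, then divide by the first-stage slope theta1_hat
pi1 = cov(z, y) / cov(z, z)
theta1 = cov(z, x) / cov(z, z)
beta2_indirect = pi1 / theta1

print(f"OLS: {beta2_ols:.3f}  IV: {beta2_iv:.3f}  indirect: {beta2_indirect:.3f}")
```

In this setup the OLS slope converges to about 0.83, while the IV estimate should be close to 0.5.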
In an alternative approach, we use the 2SLS procedure.
The steps for 2SLS are:
Step 1:
We estimate equation (4) and test $H_0: \theta_1 = 0$. If we reject the null hypothesis, we obtain the predicted values of $x$, say $\hat x_i$.
Step 2:
We use $\hat x$ as the explanatory variable in the structural equation to estimate $\beta_2$:
$$y_i = \beta_1 + \beta_2 \hat x_i + e_i \tag{6}$$
where the error term $e_i = u_i + \beta_2 (x_i - \hat x_i)$ absorbs the first-stage residual.
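The two steps can be sketched directly with two simple OLS regressions. This is a minimal illustration under the same kind of hypothetical data-generating process as before (made-up coefficients, fixed seed); the helper `ols` is ours. It also computes the direct IV ratio from equation (5), so you can see that the two-step slope matches it, as derived below.

```python
import random

def ols(x, y):
    """OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

rng = random.Random(3)
z, x, y = [], [], []
for _ in range(20_000):
    ability = rng.gauss(0, 1)
    zi = rng.gauss(0, 1)
    xi = 12 + zi + 2 * ability + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(1.0 + 0.5 * xi + ability + rng.gauss(0, 1))

# Step 1: regress x on z (equation 4) and form the fitted values x_hat
theta0, theta1 = ols(z, x)
x_hat = [theta0 + theta1 * zi for zi in z]

# Step 2: regress y on x_hat (equation 6)
beta1_2sls, beta2_2sls = ols(x_hat, y)

# Direct IV estimator, equation (5), for comparison
n = len(z)
mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
beta2_iv = (sum((a - mz) * (c - my) for a, c in zip(z, y))
            / sum((a - mz) * (b - mx) for a, b in zip(z, x)))
print(f"2SLS: {beta2_2sls:.4f}  IV: {beta2_iv:.4f}")
```

Only the point estimate should be taken from Step 2 done by hand: an ordinary second-stage regression computes its residuals with $\hat x_i$ instead of $x_i$, so its reported standard errors are not valid for 2SLS.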
where
$$\hat\beta_2^{2SLS} = \frac{\sum_{i=1}^{n} (\hat x_i - \bar{\hat x})(y_i - \bar y)}{\sum_{i=1}^{n} (\hat x_i - \bar{\hat x})^2} \tag{7}$$
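Since
$$\hat x_i = \hat\theta_0 + \hat\theta_1 z_i, \qquad \bar{\hat x} = \hat\theta_0 + \hat\theta_1 \bar z,$$
substituting these into equation (7) gives
$$\hat\beta_2^{2SLS} = \frac{\sum_{i=1}^{n} \big[(\hat\theta_0 + \hat\theta_1 z_i) - (\hat\theta_0 + \hat\theta_1 \bar z)\big](y_i - \bar y)}{\sum_{i=1}^{n} \big[(\hat\theta_0 + \hat\theta_1 z_i) - (\hat\theta_0 + \hat\theta_1 \bar z)\big]^2}$$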
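For convenience, from now on we do not write the summation limits unless required; by default all sums run from $i = 1$ to $n$. Simplifying the above,
$$\hat\beta_2^{2SLS} = \frac{\hat\theta_1 \sum (z_i - \bar z)(y_i - \bar y)}{\hat\theta_1^2 \sum (z_i - \bar z)^2} = \frac{1}{\hat\theta_1} \cdot \frac{\sum (z_i - \bar z)(y_i - \bar y)}{\sum (z_i - \bar z)^2}$$
Substituting $\hat\theta_1 = \sum (z_i - \bar z)(x_i - \bar x) \big/ \sum (z_i - \bar z)^2$ and simplifying,
$$\hat\beta_2^{2SLS} = \frac{\sum (z_i - \bar z)(y_i - \bar y)}{\sum (x_i - \bar x)(z_i - \bar z)}$$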
Hence,
$$\hat\beta_2^{2SLS} = \frac{\mathrm{Cov}(z, y)}{\mathrm{Cov}(x, z)} = \hat\beta_2^{IV}$$
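In matrix form, with $x$, $y$ and $z$ written as $n \times 1$ vectors of deviations from their sample means, we can write
$$\hat\beta_2^{IV} = \frac{(z'z)^{-1} z'y}{(z'z)^{-1} z'x} = (z'x)^{-1} z'y \tag{8}$$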
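Equation (8) uses demeaned vectors. An equivalent way to keep the intercept is to stack a column of ones, $Z = [\iota \; z]$ and $X = [\iota \; x]$, and compute $(Z'X)^{-1} Z'y$; this standard extension is our assumption about how to generalize (8), not something stated in the note. The sketch below solves the resulting $2 \times 2$ system by Cramer's rule in plain Python, on a tiny hand-made data set that is purely hypothetical.

```python
def iv_matrix(x, y, z):
    """(Z'X)^{-1} Z'y with Z = [1, z] and X = [1, x]; returns (beta1_IV, beta2_IV)."""
    n = len(x)
    # Elements of the 2x2 matrix Z'X and the 2x1 vector Z'y
    a11, a12 = n, sum(x)
    a21, a22 = sum(z), sum(zi * xi for zi, xi in zip(z, x))
    b1, b2 = sum(y), sum(zi * yi for zi, yi in zip(z, y))
    det = a11 * a22 - a12 * a21   # zero if the sample covariance of z and x is zero
    # Solve (Z'X) beta = Z'y by Cramer's rule
    beta1 = (b1 * a22 - a12 * b2) / det
    beta2 = (a11 * b2 - b1 * a21) / det
    return beta1, beta2

# Small hand-made data set (hypothetical), just to exercise the formula
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 7.0]
print(iv_matrix(x, y, z))  # (0.075, 1.375)
```

The slope agrees with equation (5): the deviation cross-products are $\sum (z_i - \bar z)(y_i - \bar y) = 11$ and $\sum (z_i - \bar z)(x_i - \bar x) = 8$, giving $11/8 = 1.375$, and the intercept is $\bar y - \hat\beta_2 \bar x = 4.2 - 1.375 \times 3 = 0.075$.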
In finite samples, $\hat\beta^{IV}$ is biased, but it is consistent: $\operatorname{plim} \hat\beta_2^{IV} = \beta_2$ as $n \to \infty$, and it is approximately normally distributed in large samples. Try to work out for yourself why the instrumental variable estimator is not unbiased in small samples.
2.1 Statistical Inference with the IV Estimator
To make inferences from $\hat\beta_2^{IV}$, we need a standard error that can be used to compute $t$ statistics and confidence intervals. For this, along with the assumptions in equations (2) and (3), we impose a homoskedasticity assumption, just as in the case of OLS. Here, however, the homoskedasticity assumption is stated conditional on the instrumental variable $z$, not on the endogenous variable $x$. So we add
$$E(u^2 \mid z) = \sigma^2 = \mathrm{Var}(u) \tag{9}$$
It can be shown that, under (2), (3) and (9), the asymptotic variance of $\hat\beta_2^{IV}$ is
$$\frac{\sigma^2}{n \sigma_x^2 \rho_{x,z}^2} \tag{10}$$
where $\sigma_x^2$ is the population variance of $x$, $\sigma^2$ is the population variance of $u$, and $\rho_{x,z}^2$ is the square of the population correlation between $x$ and $z$, which tells us how highly correlated $x$ and $z$ are. As with the OLS estimator, the asymptotic variance of the IV estimator decreases to zero at the rate $1/n$. All quantities in equation (10) can be consistently estimated from a random sample. To estimate $\sigma_x^2$, we simply compute the sample variance of $x_i$. To estimate $\rho_{x,z}^2$, we can run a regression of $x_i$ on $z_i$ and use its R-squared, $R_{x,z}^2$. Finally, to estimate $\sigma^2$, we use the IV residuals:
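$$\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat u_i^2$$
where
$$\hat u_i = y_i - \hat\beta_1 - \hat\beta_2 x_i$$
and $\hat\beta_1$ and $\hat\beta_2$ are the IV estimates. Note that the residuals use the actual $x_i$, not the first-stage fitted value $\hat x_i$.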
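The estimated variance of the IV estimator is then
$$\widehat{\mathrm{Var}}(\hat\beta_2^{IV}) = \frac{\hat\sigma^2}{SST_x \, R_{x,z}^2} \tag{11}$$
where $SST_x = \sum (x_i - \bar x)^2$ is the total sum of squares of $x$. Since $R_{x,z}^2 < 1$, $\mathrm{Var}(\hat\beta_2^{IV}) > \mathrm{Var}(\hat\beta_2^{OLS})$. This means that the lower the correlation between $x$ and $z$, the higher the variance of $\hat\beta_2^{IV}$. A low correlation between $x$ and $z$ means that we have a weak instrument.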
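Equation (11) can be computed with a few sums. The sketch below is a minimal illustration under a hypothetical data-generating process (made-up coefficients, fixed seed) and the homoskedasticity assumption (9); it uses IV residuals built from the actual $x_i$, takes $R_{x,z}^2$ as the squared sample correlation of $x$ and $z$ (the R-squared of regressing $x$ on $z$), and reports the usual OLS standard error for comparison.

```python
import math
import random

rng = random.Random(4)
n = 5_000
z, x, y = [], [], []
for _ in range(n):
    ability = rng.gauss(0, 1)                      # omitted variable (hypothetical DGP)
    zi = rng.gauss(0, 1)                           # instrument, independent of ability
    xi = 12 + zi + 2 * ability + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(1.0 + 0.5 * xi + ability + rng.gauss(0, 1))

mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
sst_x = sum((a - mx) ** 2 for a in x)
szz = sum((c - mz) ** 2 for c in z)
szx = sum((c - mz) * (a - mx) for c, a in zip(z, x))
szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

# IV estimates and IV residuals (computed with x_i, not x_hat_i)
b2_iv = szy / szx
b1_iv = my - b2_iv * mx
sigma2_iv = sum((b - b1_iv - b2_iv * a) ** 2 for a, b in zip(x, y)) / (n - 2)
r2_xz = szx ** 2 / (szz * sst_x)                   # R^2 from regressing x on z
se_iv = math.sqrt(sigma2_iv / (sst_x * r2_xz))    # square root of equation (11)

# OLS for comparison (consistent only if x were exogenous)
b2_ols = sxy / sst_x
b1_ols = my - b2_ols * mx
sigma2_ols = sum((b - b1_ols - b2_ols * a) ** 2 for a, b in zip(x, y)) / (n - 2)
se_ols = math.sqrt(sigma2_ols / sst_x)

print(f"IV:  {b2_iv:.3f} (se {se_iv:.4f})")
print(f"OLS: {b2_ols:.3f} (se {se_ols:.4f})")
```

The IV standard error is always larger here, for two reasons: OLS minimizes the residual sum of squares, so $\hat\sigma^2$ from IV residuals cannot be smaller, and $R_{x,z}^2 < 1$ inflates the variance further.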
2.1.1 Weak Instrument/Irrelevant Instrument
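1. Irrelevant instrument
An instrument is irrelevant when $\theta_1 = 0$ in equation (4); in practice, we suspect this when we fail to reject $H_0: \theta_1 = 0$ against $H_A: \theta_1 \neq 0$. As we know,
$$\hat\beta_2^{IV} = \frac{\mathrm{Cov}(y, z)}{\mathrm{Cov}(z, x)}$$
For an irrelevant but exogenous instrument, the population counterparts of both the numerator and the denominator of this ratio are zero, since $\mathrm{Cov}(z, y) = \beta_2 \mathrm{Cov}(z, x) + \mathrm{Cov}(z, u) = 0$. Hence $\beta_2$ is not identified.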
2. Weak instrument
An instrument is weak if it is not irrelevant but $\mathrm{Cov}(x_i, z_i)$ is close to zero. In that case:
i) the sampling distribution of $\hat\beta_2^{2SLS}$ is not normal, and
ii) $\hat\beta_2^{2SLS}$ can be severely biased in the direction of the OLS estimator, even in large samples.
Therefore, we should always check whether the IV is relevant enough.
How do we test for a weak IV? We estimate the first-stage (reduced-form) regression, equation (4), and calculate the $F$ statistic for $H_0: \theta_1 = 0$. A common rule of thumb is that if this $F$ statistic is less than 10, the IV is a weak instrument (Stock and Watson, 2018, p. 490).
Stock and Yogo (2005) consider an IV to be weak if the bias of the IV estimator is more than 10 percent of the bias of the OLS estimator (Stock and Watson, 2018, p. 517). Staiger and Stock (1997) show that, in a simple model, $1/F_{first}$ provides an approximate estimate of the finite-sample bias of $\hat\beta_2^{2SLS}$ relative to that of $\hat\beta_2^{OLS}$, where $F_{first}$ is the $F$ statistic from the reduced-form regression, equation (4).
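A minimal sketch of the first-stage $F$ check, assuming homoskedastic errors and two hypothetical first stages (one strong with $\theta_1 = 0.8$, one weak with $\theta_1 = 0.05$). With a single instrument, $F = R^2 / [(1 - R^2)/(n-2)]$, which equals the square of the first-stage $t$ statistic. The printed $1/F$ is only the rough relative-bias heuristic summarized above, and the function name `first_stage_stats` is ours.

```python
import math
import random

def first_stage_stats(x, z):
    """Simple OLS of x on z (equation 4): returns (t statistic for theta1, F statistic, R^2)."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    sxx = sum((b - mx) ** 2 for b in x)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    theta1 = szx / szz
    r2 = szx ** 2 / (szz * sxx)
    # Homoskedastic standard error of theta1_hat; RSS = (1 - R^2) * SST_x
    se = math.sqrt((1 - r2) * sxx / (n - 2) / szz)
    f_stat = r2 / ((1 - r2) / (n - 2))  # one restriction, n - 2 residual degrees of freedom
    return theta1 / se, f_stat, r2

rng = random.Random(5)
z = [rng.gauss(0, 1) for _ in range(500)]
x_strong = [12 + 0.8 * zi + rng.gauss(0, 2) for zi in z]   # hypothetical strong first stage
x_weak = [12 + 0.05 * zi + rng.gauss(0, 2) for zi in z]    # hypothetical weak first stage

for label, xs in (("strong", x_strong), ("weak", x_weak)):
    t_stat, f_stat, _ = first_stage_stats(xs, z)
    verdict = "weak (F < 10)" if f_stat < 10 else "passes the F >= 10 rule of thumb"
    print(f"{label}: F = {f_stat:.1f}, approx. relative bias 1/F = {1 / f_stat:.2f}, {verdict}")
```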
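Consequences of a weak IV
1. A weak IV magnifies any asymptotic bias that arises from a small correlation between $z$ and $u$.
2. $t$ statistics and confidence intervals become unreliable.
3. IV is not necessarily better than OLS.
How? If we allow $\mathrm{Cov}(z, u) \neq 0$, equation (1) implies
$$\operatorname{plim}(\hat\beta_2^{IV}) = \beta_2 + \frac{\mathrm{Cov}(z, u)}{\mathrm{Cov}(x, z)} \tag{12}$$
Since
$$\mathrm{corr}(z, u) = \frac{\mathrm{Cov}(z, u)}{\sigma_z \sigma_u}, \qquad \mathrm{corr}(x, z) = \frac{\mathrm{Cov}(x, z)}{\sigma_x \sigma_z},$$
substituting these into equation (12) gives
$$\operatorname{plim}(\hat\beta_2^{IV}) = \beta_2 + \frac{\mathrm{corr}(z, u)}{\mathrm{corr}(x, z)} \cdot \frac{\sigma_u}{\sigma_x} \tag{13}$$
Equation (13) indicates that even a small $\mathrm{corr}(z, u)$ can cause a large inconsistency when $\mathrm{corr}(x, z)$ is also small. It is often recommended to use OLS rather than IV if the inconsistency caused by a weak IV is larger than that of OLS.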
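How do we know whether the IV bias is greater than the OLS bias? The answer is that we compare the asymptotic bias of each estimator.
For a simple linear regression equation,
$$\operatorname{plim}(\hat\beta_2^{OLS}) = \beta_2 + \frac{\mathrm{Cov}(x, u)}{\mathrm{Var}(x)} \tag{14}$$
Converting the covariance into a correlation term,
$$\operatorname{plim}(\hat\beta_2^{OLS}) = \beta_2 + \mathrm{corr}(x, u) \frac{\sigma_u}{\sigma_x} \tag{15}$$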
So the asymptotic bias of IV is less than that of OLS only if
$$\left| \frac{\mathrm{corr}(z, u)}{\mathrm{corr}(x, z)} \right| < |\mathrm{corr}(x, u)| \tag{16}$$
If $\mathrm{corr}(x, z)$ is small, a seemingly small $\mathrm{corr}(z, u)$ can be magnified and make IV worse than OLS.
For example, let's consider $\mathrm{corr}(x, z) = 0.2$. From equation (16), for the asymptotic bias of the IV estimator to be less than that of OLS, $|\mathrm{corr}(z, u)|$ must be less than one-fifth of $|\mathrm{corr}(x, u)|$. However, in practice:
i) in many situations we get $\mathrm{corr}(x, z) < 0.2$;
ii) as we rarely have information about $\mathrm{corr}(x, u)$ and $\mathrm{corr}(z, u)$, we never know for sure which estimator has the larger asymptotic bias.
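The comparison in (13), (15) and (16) is simple arithmetic, sketched below with purely hypothetical correlations and $\sigma_u = \sigma_x = 1$. With $\mathrm{corr}(x, z) = 0.2$ and $\mathrm{corr}(x, u) = 0.2$, the break-even point is $|\mathrm{corr}(z, u)| = 0.2 \times 0.2 = 0.04$, so the first two values below favour IV and the last two favour OLS.

```python
def iv_asymptotic_bias(corr_zu, corr_xz, sigma_u=1.0, sigma_x=1.0):
    """Equation (13): plim(beta2_IV) - beta2."""
    return (corr_zu / corr_xz) * (sigma_u / sigma_x)

def ols_asymptotic_bias(corr_xu, sigma_u=1.0, sigma_x=1.0):
    """Equation (15): plim(beta2_OLS) - beta2."""
    return corr_xu * (sigma_u / sigma_x)

# Hypothetical values: corr(x, z) = 0.2 as in the example, corr(x, u) = 0.2
corr_xz, corr_xu = 0.2, 0.2
for corr_zu in (0.01, 0.03, 0.05, 0.08):
    iv_bias = iv_asymptotic_bias(corr_zu, corr_xz)
    ols_bias = ols_asymptotic_bias(corr_xu)
    better = "IV" if abs(iv_bias) < abs(ols_bias) else "OLS"
    print(f"corr(z,u)={corr_zu:.2f}: IV bias={iv_bias:.3f}, OLS bias={ols_bias:.3f} -> {better} smaller")
```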
2.2 IV and R-Square
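$$R^2 = 1 - \frac{RSS}{TSS} \tag{17}$$
where, for IV, $RSS$ is computed from the IV residuals $\hat u_i = y_i - \hat\beta_1 - \hat\beta_2 x_i$.
Some important findings:
1. $R^2_{IV} \le R^2_{OLS}$, since OLS minimizes $RSS$.
2. $R^2_{IV}$ may be negative.
3. $R^2_{IV}$ is not very useful.
4. We can't use the $R^2_{IV}$ value to calculate the $F$ statistic for joint restrictions. We will discuss this more when we deal with the multiple regression model.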
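Findings 1 and 2 can be checked on a tiny hand-made data set (four hypothetical observations, chosen so that the IV and OLS slopes differ sharply). Both fits below use the actual $x_i$; the IV fit just plugs in the IV coefficients. Because the IV slope here is $-1$ while the OLS slope is $0.8$, the IV residuals are large enough that $RSS > TSS$, which makes $R^2_{IV}$ negative.

```python
def r_squared(y, fitted):
    """Equation (17): R^2 = 1 - RSS/TSS."""
    my = sum(y) / len(y)
    rss = sum((a - f) ** 2 for a, f in zip(y, fitted))
    tss = sum((a - my) ** 2 for a in y)
    return 1 - rss / tss

# Hand-made toy data (hypothetical), chosen so that the IV and OLS slopes differ sharply
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
z = [1.0, 1.0, 2.0, 1.0]

n = len(x)
mx, my, mz = sum(x) / n, sum(y) / n, sum(z) / n
b2_ols = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b2_iv = (sum((b - mz) * (c - my) for b, c in zip(z, y))
         / sum((b - mz) * (a - mx) for a, b in zip(x, z)))
b1_ols, b1_iv = my - b2_ols * mx, my - b2_iv * mx

# Both fits use the actual x_i; the IV fit simply uses the IV coefficients
r2_ols = r_squared(y, [b1_ols + b2_ols * a for a in x])
r2_iv = r_squared(y, [b1_iv + b2_iv * a for a in x])
print(f"R2_OLS = {r2_ols:.2f}, R2_IV = {r2_iv:.2f}")  # R2_OLS = 0.64, R2_IV = -2.60
```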
Important Notes
1. If the goal is to get a higher $R^2$ value, OLS is appropriate.
2. The IV method is intended to provide better estimates of the parameters when $x$ and $u$ are correlated; $R^2$ is not a factor.
3. A high $R^2$ in OLS is of little use if we cannot estimate the parameters consistently.
4. When $x$ and $u$ are correlated, we cannot decompose the variance of $y$ into $\beta_2^2 \mathrm{Var}(x) + \mathrm{Var}(u)$. Hence, $R^2$ has no natural interpretation.
References
Staiger, D. and Stock, J. H. (1997). Instrumental variables regression with weak instruments. Econometrica, 65(3):557–586.
Stock, J. H. and Yogo, M. (2005). Asymptotic distributions of instrumental variables statistics with many instruments. In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, pages 109–120.
Stock, J. H. and Watson, M. W. (2018). Introduction to Econometrics.
Wooldridge, J. M. (2009). Econometrics. Cengage Learning, New Delhi.