Understanding Endogeneity in Econometrics

This document discusses models with endogenous explanatory variables in econometrics. It defines endogeneity and explains that ordinary least squares estimators are invalid when endogeneity is present. It then introduces instrumental variable estimation as a method to obtain consistent parameter estimates in the presence of endogeneity. Several examples of potential sources of endogeneity, such as omitted variables, measurement error, and simultaneity, are provided. The properties of instrumental variables and instrumental variable estimation techniques like two-stage least squares are also outlined.

Uploaded by

Waqas Khan

Econometrics:

Models with Endogenous Explanatory Variables

Burcu Eke

UC3M
Endogeneity

- Given the following linear regression model:

  $$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$

  - If $E[\varepsilon \mid X_1, X_2, \dots, X_k] = 0$, then every $X_j$ is an exogenous explanatory variable.
  - If, for some reason such as omission of relevant variables, measurement errors, or simultaneity, $X_j$ is correlated with $\varepsilon$, we say that $X_j$ is an endogenous explanatory variable.

Endogeneity

- OLS estimators of the model parameters are invalid (inconsistent, among other problems) when some explanatory variables are endogenous.

- In this topic, we study how to obtain consistent estimators of the model parameters in the presence of endogenous explanatory variables, using instrumental variables and two-stage least squares (2SLS) estimation.

Endogeneity Example 1: Measurement Error in
the Explanatory Variables

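- Recall the model $Y = \beta_0 + \beta_1 X^* + \varepsilon$, where the classical assumptions are satisfied. Hence:

  $$E[\varepsilon \mid X^*] = 0 \;\Rightarrow\; E[Y \mid X^*] = L(Y \mid X^*) = \beta_0 + \beta_1 X^*$$

  In this case, $E[\varepsilon] = 0$ and $C(X^*, \varepsilon) = 0$, so

  $$\beta_0 = E[Y] - \beta_1 E[X^*] \quad \text{and} \quad \beta_1 = \frac{C(Y, X^*)}{V(X^*)}$$
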
Endogeneity: Example 1

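- However, $X^*$ is measured with error. We observe $X$ such that $X = X^* + v_1$, where $v_1$ is the measurement error.

- Substituting $X^* = X - v_1$, we get

  $$Y = \beta_0 + \beta_1 X + \underbrace{\varepsilon - \beta_1 v_1}_{u}$$

- Then $C(X, u) \neq 0 \Rightarrow X$ is endogenous. For example, if $v_1$ is uncorrelated with both $X^*$ and $\varepsilon$, then $C(X, u) = -\beta_1 V(v_1)$.
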
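The effect of measurement error can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the parameter values, the normal distributions, and the helper `cov` are all assumptions chosen for illustration. With $V(X^*) = V(v_1) = 1$ and $\beta_1 = 2$, the OLS slope on the mismeasured regressor should approach $\beta_1 V(X^*) / (V(X^*) + V(v_1)) = 1$ rather than 2.

```python
import random

random.seed(0)
n = 20000
beta0, beta1 = 1.0, 2.0  # assumed true parameters (illustrative)


def cov(a, b):
    """Sample covariance (divisor n) of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


x_star = [random.gauss(0, 1) for _ in range(n)]  # true regressor X*
eps = [random.gauss(0, 1) for _ in range(n)]     # structural error
v1 = [random.gauss(0, 1) for _ in range(n)]      # measurement error
y = [beta0 + beta1 * xs + e for xs, e in zip(x_star, eps)]
x_obs = [xs + v for xs, v in zip(x_star, v1)]    # observed X = X* + v1

# OLS slope = C(Y, X) / V(X), computed with each version of the regressor
b_true_x = cov(x_star, y) / cov(x_star, x_star)
b_obs_x = cov(x_obs, y) / cov(x_obs, x_obs)

print("slope using X* (no measurement error):", round(b_true_x, 3))
print("slope using X  (with measurement error):", round(b_obs_x, 3))
```

The second slope is biased toward zero even in a large sample, which is the inconsistency caused by $C(X, u) \neq 0$.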
Endogeneity Example 2: Omission of
Explanatory Variables

- Recall the case of omitting a relevant variable.

- Let $Y = \beta_0 + \beta_1 X_1 + u$, where $u = \varepsilon + \beta_2 X_2$ and $\beta_2 \neq 0$. This model is misspecified because it omits a relevant variable.

- In general, $C(X_1, u) \neq 0 \Rightarrow X_1$ is endogenous.

Endogeneity Examples

1. Ability is not observed in a wage equation such as

   $$\log(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{ability} + \varepsilon$$

   Since ability is not observable, we are left with the simple regression model

   $$\log(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + u,$$

   where ability is included in the error term $u$. If we estimate this model by OLS, we obtain a biased and inconsistent estimator of $\beta_1$ if $C(\text{educ}, \text{ability}) \neq 0$.
2. Effect of smoking on wages (ignoring the level of education)
3. Effect of smoking on cancer (ignoring the state of physical health)

Endogeneity Example 3: Simultaneity

- It is quite common for the realizations of distinct variables to be economically related.

- This makes the equation for the dependent variable part of a system of simultaneous equations: some of the variables on the right-hand side of the equation of interest appear as dependent variables in other equations, and vice versa.

Endogeneity Example 3a: Market Equilibrium
Model
- Consider the following system:

  $$Y_1 = \gamma_1 Y_2 + \gamma_2 X_1 + u_1 \quad \text{(Demand)}$$
  $$Y_2 = \gamma_3 Y_1 + \gamma_4 X_2 + \gamma_5 X_3 + u_2 \quad \text{(Supply)}$$

- The endogenous variables, $Y_1$ = quantity and $Y_2$ = price, are determined by
  - the exogenous variables: $X_1$ = rent, $X_2$ = salary, and $X_3$ = interest rate;
  - the disturbances: $u_1$ = demand shock and $u_2$ = supply shock.
- The variables $Y_1$ and $Y_2$, which appear on the right-hand side of the supply and demand equations, are not orthogonal to their respective shocks:

  $$E[Y_1 \mid Y_2, X_1] = \gamma_1 Y_2 + \gamma_2 X_1 + \underbrace{E[u_1 \mid Y_2, X_1]}_{\neq 0}$$

Endogeneity Example 3b: Production Function

- If a company is a profit maximizer or a cost minimizer:
  - The quantities of the inputs are determined simultaneously with the level of output.
  - The disturbance, that is, the realization of the technological shocks, is in general correlated with the quantities of the inputs.

Instrumental Variables

- The instrumental variables (IV) approach allows us to obtain consistent estimators of the population parameters when the OLS estimators are inconsistent (in situations such as omission of a relevant variable, measurement error, and simultaneity).

Instrumental Variables

- In general, we have to use the model $Y = \beta_0 + \beta_1 X + \varepsilon$, where $C(X, \varepsilon) \neq 0 \Rightarrow \beta_0$ and $\beta_1$ are not the same as the parameters of the linear projection $L(Y \mid X)$.
  - The OLS estimators ($\hat\beta_0$ and $\hat\beta_1$) of the linear projection of $Y$ on $X$ are inconsistent estimators of $\beta_0$ and $\beta_1$. Here $x_i = X_i - \bar X$ and $y_i = Y_i - \bar Y$.

Instrumental Variables
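$$
\begin{aligned}
\operatorname*{plim}_{n\to\infty} \hat\beta_1
&= \frac{\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_i x_i y_i}{\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_i x_i^2}
 = \frac{\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_i x_i(\beta_1 x_i + \varepsilon_i)}{\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_i x_i^2} \\
&= \beta_1 + \frac{\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_i x_i \varepsilon_i}{\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_i x_i^2}
 = \beta_1 + \frac{C(X, \varepsilon)}{V(X)} \neq \beta_1
\end{aligned}
$$
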
Instrumental Variables: Definition

- In the model

  $$Y = \beta_0 + \beta_1 X + \varepsilon, \tag{1}$$

  where $C(X, \varepsilon) \neq 0$, we need additional information, in the form of additional variables, to obtain consistent estimators of $\beta_0$ and $\beta_1$.

- Suppose we have a new variable $Z$, called the instrumental variable, with the following properties:
  1. It is uncorrelated with the error term: (a) $C(Z, \varepsilon) = 0$;
  2. It is correlated with the endogenous variable $X$: (b) $C(Z, X) \neq 0$.

Instrumental Variables: IV Estimation in the
Simple Model

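- Using $Z$ as an instrument, we can obtain consistent estimators of $\beta_0$ and $\beta_1$.

- From (1) we get $C(Z, Y) = \beta_1 C(Z, X) + C(Z, \varepsilon)$, which, given (a) and with (b) ensuring a nonzero denominator, implies that:

  $$\beta_1 = \frac{C(Z, Y)}{C(Z, X)}$$

  $$\beta_0 = E[Y] - \beta_1 E[X] = E[Y] - \frac{C(Z, Y)}{C(Z, X)} E[X]$$
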
Instrumental Variables: IV Estimation in the
Simple Model

- Given a random sample of size $n$ from the population, and replacing population moments with their sample counterparts in the expressions above (the analogy principle), we get the instrumental variable (IV) estimator:

  $$\tilde\beta_1 = \frac{S_{YZ}}{S_{XZ}} = \frac{\sum_i z_i y_i}{\sum_i z_i x_i}, \qquad \tilde\beta_0 = \bar Y - \tilde\beta_1 \bar X$$

- Here $y_i = Y_i - \bar Y$, $x_i = X_i - \bar X$, and $z_i = Z_i - \bar Z$.

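The IV estimator in the simple model is just a ratio of sample covariances, which can be sketched in a few lines. The data-generating process below is an assumption made for illustration (an unobserved factor `a` enters both $X$ and the error, and `z` is a valid instrument by construction); none of the numbers come from the slides.

```python
import random

random.seed(1)
n = 20000
beta0, beta1 = 1.0, 2.0  # assumed true parameters (illustrative)


def cov(a, b):
    """Sample covariance (divisor n) of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


z = [random.gauss(0, 1) for _ in range(n)]  # instrument: C(Z, eps) = 0
a = [random.gauss(0, 1) for _ in range(n)]  # unobserved factor causing endogeneity
x = [0.8 * zi + ai + random.gauss(0, 1) for zi, ai in zip(z, a)]  # C(Z, X) != 0
eps = [ai + random.gauss(0, 1) for ai in a]                       # C(X, eps) != 0
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps)]

# OLS slope S_XY / S_X^2 versus IV slope S_YZ / S_XZ
b1_ols = cov(x, y) / cov(x, x)
b1_iv = cov(z, y) / cov(z, x)
b0_iv = sum(y) / n - b1_iv * sum(x) / n

print("OLS slope:", round(b1_ols, 3))
print("IV slope: ", round(b1_iv, 3), " IV intercept:", round(b0_iv, 3))
```

In this design the OLS slope converges to $\beta_1 + C(X,\varepsilon)/V(X) = 2 + 1/2.64$, while the IV slope converges to $\beta_1 = 2$.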
Instrumental Variables: Properties of IV
Estimators in the Simple Model
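- Provided that (a) and (b) hold, the IV estimator is a consistent estimator:

$$
\begin{aligned}
\operatorname*{plim}_{n\to\infty} \tilde\beta_1
&= \frac{\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_i z_i y_i}{\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_i z_i x_i}
 = \frac{\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_i z_i(\beta_1 x_i + \varepsilon_i)}{\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_i z_i x_i} \\
&= \beta_1 + \frac{C(Z, \varepsilon)}{C(Z, X)} = \beta_1
\end{aligned}
$$
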
Instrumental Variables: Properties of IV
Estimators in the Simple Model, (a)

- Any instrumental variable must satisfy both properties, (a) and (b). Regarding (a):
  - Condition (a), $C(Z, \varepsilon) = 0$, cannot be verified from the data. We must therefore base our choice of $Z$ on economic behavior or theory $\Rightarrow$ we must be very careful when choosing $Z$.

Instrumental Variables: Properties of IV
Estimators in the Simple Model, (b)

- Condition (b), $C(Z, X) \neq 0$, can be tested using the sample data.
  - The simplest way is to consider the linear projection of $X$ on $Z$, $X = \pi_0 + \pi_1 Z + v$, estimate it by OLS, and perform the following test:

    $$H_0: \pi_1 = 0 \quad \text{vs.} \quad H_1: \pi_1 \neq 0$$

- Note: if $Z = X$, we obtain the OLS estimator. That is, when $X$ is exogenous, it can be used as its own instrument, and the IV estimator is then identical to the OLS estimator.

Instrumental Variables: The Variance of the IV
Estimator

- In general, the IV estimator will have a larger variance than the OLS estimator.

- To see this, let's derive the estimated variance of the IV estimator $\tilde\beta_1$, $S^2_{\tilde\beta_1}$:

  $$S^2_{\tilde\beta_1} \equiv \hat V(\tilde\beta_1) = \frac{\tilde\sigma^2 S_Z^2}{n S_{ZX}^2} = \frac{\tilde\sigma^2}{n\, r_{ZX}^2\, S_X^2}$$

  where $r_{ZX} = \dfrac{S_{ZX}}{S_Z S_X}$ is the sample correlation coefficient between $X$ and $Z$, which measures the degree of linear relationship between $X$ and $Z$ in the sample.

Instrumental Variables: The Variance of the IV
Estimator

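- Recall the estimated variance of the OLS estimator of $\beta_1$, $\hat\beta_1$:

  $$S^2_{\hat\beta_1} = \frac{\hat\sigma^2}{n S_X^2}, \quad \text{where } \hat\sigma^2 = \frac{1}{n}\sum_i \hat\varepsilon_i^2$$

- If $X$ is really exogenous, then the OLS estimators are consistent, and $\operatorname*{plim}_{n\to\infty} \tilde\sigma^2 = \operatorname*{plim}_{n\to\infty} \hat\sigma^2 = \sigma^2$.
  - Since $0 < |r_{ZX}| < 1$, this implies that $S^2_{\tilde\beta_1} > S^2_{\hat\beta_1}$ in large samples, and the difference is larger the lower $|r_{ZX}|$ is.
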
Instrumental Variables: The Variance of the IV
Estimator

- Therefore, if $X$ is exogenous, using the IV estimator instead of the OLS estimator is costly in terms of efficiency.

- The lower the correlation between $Z$ and $X$, the greater the difference between the IV variance and the OLS variance, in favor of OLS.
  - When both $\tilde\beta_1$ and $\hat\beta_1$ are consistent, the estimated IV and OLS variances satisfy, asymptotically,

    $$\operatorname*{plim}_{n\to\infty} \frac{S^2_{\tilde\beta_1}}{S^2_{\hat\beta_1}} = \frac{1}{\rho_{ZX}^2}$$

Instrumental Variables: The Variance of the IV
Estimator

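- This implies that, as $n \to \infty$:

- If $\rho_{ZX} = 1\% = 0.01$, then $V(\tilde\beta_1) = 10000\, V(\hat\beta_1)$, and therefore $\sqrt{V(\tilde\beta_1)} = 100 \sqrt{V(\hat\beta_1)}$.

- If $\rho_{ZX} = 10\% = 0.1$, then $V(\tilde\beta_1) = 100\, V(\hat\beta_1)$, and therefore $\sqrt{V(\tilde\beta_1)} = 10 \sqrt{V(\hat\beta_1)}$.

- Even with a relatively high correlation, $\rho_{ZX} = 50\% = 0.5$, we have $V(\tilde\beta_1) = 4\, V(\hat\beta_1)$, and therefore $\sqrt{V(\tilde\beta_1)} = 2 \sqrt{V(\hat\beta_1)}$.
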
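The arithmetic behind these figures follows directly from the ratio $1/\rho_{ZX}^2$. The short sketch below reproduces it; the function name `iv_to_ols_variance_ratio` is a hypothetical helper introduced only for this illustration.

```python
import math


def iv_to_ols_variance_ratio(rho_zx):
    """Asymptotic V(IV) / V(OLS) = 1 / rho_zx**2 (meaningful only if X is exogenous)."""
    if not 0 < abs(rho_zx) <= 1:
        raise ValueError("rho_zx must be a nonzero correlation in [-1, 1]")
    return 1.0 / rho_zx ** 2


for rho in (0.01, 0.1, 0.5):
    ratio = iv_to_ols_variance_ratio(rho)
    print(f"rho_ZX = {rho}: variance ratio = {ratio:.0f}, "
          f"standard error ratio = {math.sqrt(ratio):.0f}")
```

The standard error ratio is the square root of the variance ratio, which is why a correlation of 0.1 inflates the IV standard error roughly tenfold.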
Instrumental Variables: The Variance of the IV
Estimator

- BUT if $X$ is endogenous, comparing the OLS and IV variances is meaningless, because the OLS estimator is inconsistent.

Instrumental Variables: Inferences with the IV
Estimator

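- Consider the simple model $Y = \beta_0 + \beta_1 X + \varepsilon$.

- Assume that conditional homoscedasticity holds: $V(\varepsilon \mid Z) = \sigma^2 = V(\varepsilon)$.

- Then it can be shown that:

  $$\frac{\tilde\beta_1 - \beta_1}{S_{\tilde\beta_1}} \overset{asy}{\sim} N(0, 1)$$
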
Instrumental Variables: Inferences with the IV
Estimator

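- $S_{\tilde\beta_1}$ is the standard error of the IV estimator:

  $$S^2_{\tilde\beta_1} \equiv \hat V(\tilde\beta_1) = \frac{\tilde\sigma^2 S_Z^2}{n S_{ZX}^2} \;\Rightarrow\; S_{\tilde\beta_1} = \frac{\tilde\sigma\, S_Z}{\sqrt{n}\, |S_{ZX}|}$$

- where $\tilde\sigma^2 = \frac{1}{n}\sum_i \tilde\varepsilon_i^2$, and $\tilde\varepsilon_i = Y_i - (\tilde\beta_0 + \tilde\beta_1 X_i) = y_i - \tilde\beta_1 x_i$.

- This result allows us to construct confidence intervals and perform hypothesis tests.
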
Instrumental Variables: Goodness of fit under
IV Estimation

- Most econometric programs calculate the $R^2$ in IV estimation using the following formula:

  $$R^2 = 1 - \frac{\sum_i \tilde\varepsilon_i^2}{\sum_i y_i^2}$$

- However, when $X$ and $\varepsilon$ are correlated, this formula is not correct. The $R^2$ of the IV estimation:
  - can be negative, if $\sum_i \tilde\varepsilon_i^2 > \sum_i y_i^2$;
  - has no natural interpretation, because if $C(X, \varepsilon) \neq 0$ we cannot decompose the variance of $Y$ as $\beta_1^2 V(X) + V(\varepsilon)$;
  - cannot be used to calculate the $W^0$ test statistic (we have to use the SSR).

IV versus OLS

- If our objective is to maximize the $R^2$, then we should always use OLS.

- But if our goal is to properly estimate the causal effect of $X$ on $Y$, that is, $\beta_1$, then:
  - If $C(X, \varepsilon) = 0$, we should use OLS. (It will be more efficient than any IV estimator with $Z \neq X$.)
  - If $C(X, \varepsilon) \neq 0$, OLS will not be consistent, so using an IV estimator with $Z \neq X$ is appropriate. (The goodness of fit, in this context, is not of interest to us.)

Instrumental Variables: Inadequate Instruments
- The IV estimator is consistent if (a) $C(Z, \varepsilon) = 0$ and (b) $C(Z, X) \neq 0$.

- If these conditions are not satisfied, the IV estimator has an asymptotic bias that can be larger than that of the OLS estimator, especially if $|\rho_{XZ}|$ is small.

- We can see this by comparing the probability limits of both estimators: the IV estimator (allowing for the possibility that $Z$ and $\varepsilon$ are correlated) and the OLS estimator (when $X$ is endogenous).

  $$\operatorname{plim} \tilde\beta_1 = \beta_1 + \frac{C(Z, \varepsilon)}{C(Z, X)}$$

  $$\operatorname{plim} \hat\beta_1 = \beta_1 + \frac{C(X, \varepsilon)}{V(X)}$$

Instrumental Variables: Inadequate Instruments

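- Expressed in terms of correlations and the population standard deviations of $X$ and $\varepsilon$, these are:

  $$\operatorname{plim} \tilde\beta_1 = \beta_1 + \frac{\rho_{Z\varepsilon}}{\rho_{ZX}} \cdot \frac{\sigma_\varepsilon}{\sigma_X}$$

  $$\operatorname{plim} \hat\beta_1 = \beta_1 + \rho_{X\varepsilon} \cdot \frac{\sigma_\varepsilon}{\sigma_X}$$

- Therefore, in terms of asymptotic bias, we prefer the IV estimator to the OLS estimator if

  $$\left| \frac{\rho_{Z\varepsilon}}{\rho_{ZX}} \right| < |\rho_{X\varepsilon}|$$
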
Instrumental Variables: Inadequate Instruments

- When $Z$ and $X$ are not correlated at all, the situation is particularly bad, regardless of whether $Z$ and $\varepsilon$ are correlated or not.

- When $Z$ and $X$ have a very small sample correlation $r_{ZX}$, the problem is very similar:
  - It may look as if $C(Z, X) = 0$.
  - The estimates will be very imprecise and may take implausible values.

Instrumental Variables: Example of Inadequate
Instruments

- Example: effect of the mother's cigarette consumption on the birth weight of the baby.
  - The following example illustrates why we should always check whether the endogenous explanatory variable is correlated with the potential instrument.
  - Our estimation results for the effect of several variables on birth weight, including the mother's cigarette consumption, are presented below:

Instrumental Variables: Example of Inadequate
Instruments

Model 1: OLS, using observations 1-1388
Dependent variable: bwght

            Coefficient   Std. Error   t-ratio   p-value
PACKS       0.0837        0.0175       4.80      0.000
MALE        0.0262        0.0100       2.62      0.009
PARITY      0.0147        0.0054       2.72      0.007
LFAMINC     0.0180        0.0053       3.40      0.001
const       4.6756        0.0205       228.53    0.000

R^2 = 0.0350

Instrumental Variables: Example of Inadequate
Instruments
where

- LBWGHT = logarithm of the baby's birth weight
- MALE = dummy variable that equals 1 if the baby is male and 0 otherwise
- PARITY = birth order of the baby (among siblings)
- LFAMINC = logarithm of family income in thousands of dollars
- PACKS = average number of packs of cigarettes smoked per day during pregnancy

Instrumental Variables: Example of Inadequate
Instruments

- PACKS may be correlated with other health habits and/or with good prenatal care $\Rightarrow$ PACKS and the error term might be correlated.

- A possible instrumental variable for PACKS is the average price of cigarettes per pack, CIGPRICE.
  - Assume that CIGPRICE is uncorrelated with the error term (although the availability of state health care might be correlated with taxes on cigarettes).
  - Economic theory suggests that $C(\text{PACKS}, \text{CIGPRICE}) < 0$.

Instrumental Variables: Example of Inadequate
Instruments

The linear projection of PACKS on CIGPRICE and the other exogenous variables:

Model 1: OLS, using observations 1-1388
Dependent variable: packs

            Coefficient   Std. Error   t-ratio   p-value
const       0.1374        0.1040       1.32      0.187
CIGPRICE    0.0008        0.0008       1.00      0.317
MALE        0.0047        0.0159       0.30      0.766
PARITY      0.0182        0.0089       2.04      0.041
LFAMINC     0.0526        0.0087       6.05      0.000

R^2 = 0.030454

Instrumental Variables: Example of Inadequate
Instruments

- The reduced form results indicate no statistically significant relationship between smoking during pregnancy and the price of cigarettes (that is, the price elasticity of cigarette consumption, an addictive good, is not statistically different from zero).

- Since there is no evidence that CIGPRICE and PACKS are correlated, CIGPRICE does not satisfy condition (b) $\Rightarrow$ CIGPRICE should not be used as an IV.

- But what happens if we use CIGPRICE as an instrument anyway? The IV estimation results are:

Instrumental Variables: Example of Inadequate
Instruments

Model 1: TSLS, using observations 1-1388
Dependent variable: LBWGHT
Instrumented: PACKS
Instruments: CIGPRICE

            Coefficient   Std. Error   t-ratio   p-value
const       4.4679        0.2563       17.43     0.000
PACKS       0.7991        1.1132       0.72      0.474
MALE        0.0298        0.0172       1.73      0.084
PARITY      0.0012        0.0254       0.05      0.961
LFAMINC     0.0636        0.0571       1.12      0.265

R^2 = .     F-statistic = 2.50

Instrumental Variables: Example of Inadequate
Instruments

- The coefficient of PACKS is very large and has the opposite sign to what is expected. Its standard error is also very large.

- But these estimates are meaningless, since CIGPRICE does not satisfy one of the requirements for a valid instrumental variable.

Generalization: The 2SLS Estimator for the
Simple Model
- Let the model be $Y = \beta_0 + \beta_1 X + \varepsilon$, with $C(X, \varepsilon) \neq 0$.

- We have two possible IVs, $Z_1$ and $Z_2$, which satisfy:

  $$C(Z_1, \varepsilon) = 0, \quad C(Z_2, \varepsilon) = 0,$$
  $$C(Z_1, X) \neq 0, \quad C(Z_2, X) \neq 0.$$

- We can get two simple and distinct IV estimators: one with $Z_1$ and another with $Z_2$.

- BUT we can also obtain an IV estimator that uses a linear combination of $Z_1$ and $Z_2$ as the instrument:
  - We get this estimator via two-stage least squares (2SLS) estimation.

Generalization: The 2SLS Estimator for the
Simple Model
- Stage 1: We use OLS to estimate the linear projection of the endogenous explanatory variable $X$ on the instruments $Z_1$ and $Z_2$ (known as the reduced form):

  $$X = \pi_0 + \pi_1 Z_1 + \pi_2 Z_2 + v \tag{2}$$

  - Let $\hat\pi_0$, $\hat\pi_1$ and $\hat\pi_2$ be the OLS estimators of the reduced form.
  - Then the fitted values of $X$ are $\hat X = \hat\pi_0 + \hat\pi_1 Z_1 + \hat\pi_2 Z_2$.
- Stage 2: We use OLS to estimate the regression of $Y$ on $\hat X$ (hence the name):

  $$Y = \beta_0 + \beta_1 \hat X + u \tag{3}$$

  - This is equivalent to estimating $\beta_0$ and $\beta_1$ by IV using the instrument $Z = \hat X$.

Generalization: The 2SLS Estimator for the
Simple Model

- Although the coefficients are the same in both cases, the standard errors reported by the second-stage OLS regression are incorrect.
  - The reason is that the error term of the second stage, $u$, includes $v$, but the standard error should only involve the variance of $\varepsilon$.

- Most econometric packages have special options for IV estimation that carry out 2SLS directly, so we do not need to perform the two stages sequentially.

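The two stages, and the reason the second-stage standard errors are wrong, can be sketched with the standard library alone. Everything below is an illustrative assumption: the simulated data, the parameter values, and the small helper `ols` (normal equations solved by Gaussian elimination). It is not how any particular econometrics package implements 2SLS.

```python
import random


def ols(cols, y):
    """OLS of y on a constant plus the given regressor columns; returns coefficients."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    A = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(k)] for r in range(k)]
    c = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    M = [A[r] + [c[r]] for r in range(k)]
    for p in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(p, k), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        for r in range(p + 1, k):
            f = M[r][p] / M[p][p]
            for s in range(p, k + 1):
                M[r][s] -= f * M[p][s]
    b = [0.0] * k
    for r in reversed(range(k)):  # back substitution
        b[r] = (M[r][k] - sum(M[r][s] * b[s] for s in range(r + 1, k))) / M[r][r]
    return b


random.seed(2)
n = 5000
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
a = [random.gauss(0, 1) for _ in range(n)]  # unobserved factor: makes X endogenous
x = [0.6 * p + 0.6 * q + c + random.gauss(0, 1) for p, q, c in zip(z1, z2, a)]
y = [1.0 + 2.0 * xi + ai + random.gauss(0, 1) for xi, ai in zip(x, a)]  # beta1 = 2

# Stage 1: reduced form of X on the instruments, then fitted values X_hat
pi = ols([z1, z2], x)
x_hat = [pi[0] + pi[1] * p + pi[2] * q for p, q in zip(z1, z2)]

# Stage 2: OLS of Y on X_hat gives the 2SLS coefficients
b0, b1 = ols([x_hat], y)

# Residual variance: the correct version uses the actual X, not X_hat
sigma2_correct = sum((yi - b0 - b1 * xi) ** 2 for yi, xi in zip(y, x)) / n
sigma2_naive = sum((yi - b0 - b1 * xh) ** 2 for yi, xh in zip(y, x_hat)) / n

print("2SLS slope:", round(b1, 3))
print("sigma^2 from 2SLS residuals (actual X):", round(sigma2_correct, 3))
print("sigma^2 from second-stage residuals (X_hat):", round(sigma2_naive, 3))
```

In this design the naive second-stage residual variance is several times larger than the correct one, because the second-stage error also contains $\beta_1 v$.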
Generalization: The 2SLS Estimator for the
Simple Model

- The reduced form (2) decomposes the endogenous explanatory variable into two additive parts:
  - The exogenous part, explained linearly by the instruments: $\pi_0 + \pi_1 Z_1 + \pi_2 Z_2$.
  - The endogenous part, which could not be explained by the instruments, that is, the error term $v$.

Generalization: The 2SLS Estimator,
Interpretation of the Reduced Form
- If the instruments are valid and $V(\varepsilon \mid Z_1, Z_2)$ is homoscedastic, it can be shown that the 2SLS estimators are consistent and asymptotically normal.

- Thus, we can make inferences using an estimator of the population variance,

  $$\tilde\sigma^2 = \frac{1}{n}\sum_i \tilde u_i^2,$$

  where $\tilde u_i$ are the residuals from the 2SLS estimation.

- As with the simple IV estimator, when the instruments are not appropriate (because they are correlated with the error term or only weakly correlated with the endogenous variable), the 2SLS estimators can be worse than the OLS estimators.

Generalization: The 2SLS Estimator for the
Multiple Model

- For simplicity, consider the following linear regression model:

  $$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

  where $E[\varepsilon] = 0$, $C(X_1, \varepsilon) = 0$, $C(X_2, \varepsilon) \neq 0$.

- That is:
  - $X_1$ is an exogenous variable,
  - but $X_2$ is endogenous.

Generalization: The 2SLS Estimator for the
Multiple Model

- Suppose we have an instrumental variable $Z$ such that $C(Z, \varepsilon) = 0$.

- Then the reduced form is $X_2 = \pi_0 + \pi_1 X_1 + \pi_2 Z + v$. For $Z$ to be a valid instrument, we need $\pi_2 \neq 0$; in other words, $Z$ must be correlated with $X_2$ after controlling for $X_1$.

- Very important: notice that the reduced form for the endogenous explanatory variable includes the instruments AND all the exogenous explanatory variables of the model.

Generalization: The 2SLS Estimator for the
Multiple Model

- What if we have one more endogenous variable?

- Suppose $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$, where $X_1$ and $X_2$ are endogenous and $X_3$ is exogenous:

  $$E[\varepsilon] = 0, \quad C(X_1, \varepsilon) \neq 0, \quad C(X_2, \varepsilon) \neq 0, \quad C(X_3, \varepsilon) = 0$$

- In that case, we need at least as many additional exogenous variables, to use as instruments, as there are endogenous explanatory variables.

Generalization: The 2SLS Estimator for the
Multiple Model
- In this case, let $Z_1$ and $Z_2$ be such that $C(Z_1, \varepsilon) = C(Z_2, \varepsilon) = 0$.

- We will have one reduced form equation for each endogenous explanatory variable, in which we use all the exogenous explanatory variables and the instruments:

  $$X_1 = \pi_{10} + \pi_{11} X_3 + \delta_{11} Z_1 + \delta_{12} Z_2 + v_1$$
  $$X_2 = \pi_{20} + \pi_{21} X_3 + \delta_{21} Z_1 + \delta_{22} Z_2 + v_2$$

  where, at least, $\delta_{11} \neq 0$ and $\delta_{22} \neq 0$, OR $\delta_{12} \neq 0$ and $\delta_{21} \neq 0$.

- In general, all instruments are present in the reduced form equation of each of the endogenous explanatory variables.

Test of Endogeneity: Hausman Test

- In practice, there are many situations where we do not know whether an explanatory variable is endogenous. One possibility is to perform a hypothesis test.

- For example, for the model $Y = \beta_0 + \beta_1 X + \varepsilon$, we can consider the following hypotheses:

  $$H_0: C(X, \varepsilon) = 0 \quad \text{(exogenous)}$$
  $$H_1: C(X, \varepsilon) \neq 0 \quad \text{(endogenous)}$$

- How can we perform such a test?

- Suppose we have a valid instrument $Z$, such that $C(Z, \varepsilon) = 0$ and $C(Z, X) \neq 0$.

Test of Endogeneity: Hausman Test

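- Then, from the reduced form $X = \pi_0 + \pi_1 Z + v$, and using $C(Z, \varepsilon) = 0$, we easily get

  $$C(X, \varepsilon) = C(\pi_0 + \pi_1 Z + v, \varepsilon) = C(v, \varepsilon)$$

  $$C(X, \varepsilon) = 0 \iff C(v, \varepsilon) = 0$$

- Therefore, if $H_0: C(X, \varepsilon) = 0$ is true, the coefficient $\alpha$ in the linear projection $\varepsilon = \alpha v + \xi$ should satisfy $\alpha = 0$ or, equivalently, the coefficient of $v$ should be zero in

  $$Y = \beta_0 + \beta_1 X + \alpha v + \xi \tag{4}$$
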
Test of Endogeneity: Hausman Test
- Therefore, if we could estimate (4), we could test $H_0: \alpha = 0$, which is equivalent to $H_0: C(X, \varepsilon) = 0$.

- In practice, $v$ is not observable. It is therefore replaced by the OLS residuals $\hat v$ from the reduced form.

- Hence, the model

  $$Y = \beta_0 + \beta_1 X + \alpha \hat v + \xi_1 \tag{5}$$

  is estimated by OLS, where $\hat v = X - (\hat\pi_0 + \hat\pi_1 Z)$.

- The null hypothesis is that $X$ is exogenous, i.e., $H_0: \alpha = 0$.

- If we reject $\alpha = 0$ in model (5), we conclude that $X$ is endogenous.

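The regression-based test can be sketched as follows. This is a minimal illustration under assumed, simulated data in which $X$ is endogenous by construction; the helpers `ols` and `ssr` are hypothetical, and the test is computed in the $n(SSR_r - SSR_{un})/SSR_{un}$ form with one restriction, compared against the 5% critical value of a $\chi^2_1$ distribution (3.841).

```python
import random


def ols(cols, y):
    """OLS of y on a constant plus the given regressor columns; returns coefficients."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(k)] for r in range(k)]
    c = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    M = [A[r] + [c[r]] for r in range(k)]
    for p in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(p, k), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        for r in range(p + 1, k):
            f = M[r][p] / M[p][p]
            for s in range(p, k + 1):
                M[r][s] -= f * M[p][s]
    b = [0.0] * k
    for r in reversed(range(k)):  # back substitution
        b[r] = (M[r][k] - sum(M[r][s] * b[s] for s in range(r + 1, k))) / M[r][r]
    return b


def ssr(cols, y):
    """Sum of squared OLS residuals of y on a constant and the given columns."""
    b = ols(cols, y)
    return sum(
        (y[i] - b[0] - sum(bj * col[i] for bj, col in zip(b[1:], cols))) ** 2
        for i in range(len(y))
    )


random.seed(3)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
a = [random.gauss(0, 1) for _ in range(n)]  # unobserved factor: X is endogenous
x = [0.8 * zi + ai + random.gauss(0, 1) for zi, ai in zip(z, a)]
y = [1.0 + 2.0 * xi + ai + random.gauss(0, 1) for xi, ai in zip(x, a)]

# Reduced form residuals v_hat = X - (pi0_hat + pi1_hat * Z)
pi0, pi1 = ols([z], x)
v_hat = [xi - pi0 - pi1 * zi for xi, zi in zip(x, z)]

# Restricted model: Y on X. Unrestricted model (5): Y on X and v_hat.
ssr_r = ssr([x], y)
ssr_un = ssr([x, v_hat], y)
w0 = n * (ssr_r - ssr_un) / ssr_un

print("W0 =", round(w0, 1), "(5% critical value of chi2 with 1 d.f.: 3.841)")
print("reject exogeneity of X:", w0 > 3.841)
```

With a single potentially endogenous variable this is equivalent in spirit to the t test on the coefficient of $\hat v$; the SSR form is used here only because it avoids computing coefficient standard errors.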
Test of Endogeneity: Hausman Test

- Generalization: the Hausman test for $r$ potentially endogenous variables would be:
  - Estimate the $r$ reduced forms, one for each of the $r$ variables.
  - Obtain the residuals from each of the reduced forms.
  - Include all $r$ residuals as additional regressors in the main model for $Y$.
  - Test the joint significance of the residuals with the $W^0$ test:

    $$W^0 = n\, \frac{SSR_r - SSR_{un}}{SSR_{un}} \overset{asy}{\sim} \chi^2_r$$

Test of Endogeneity: Hausman Test

- where
  - $SSR_r$ is the sum of squared residuals of the restricted model,
  - $SSR_{un}$ is the sum of squared residuals of the unrestricted model, which includes the residuals from each of the reduced forms as additional regressors,
  - $r$ is the number of potentially endogenous variables.

- If the test concludes in favor of joint significance, then at least one of the explanatory variables is endogenous.

Hausman Test Example

- To illustrate, suppose we have the following model:

  $$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon,$$

  where $X_1$ and $X_2$ are potentially endogenous and $X_3$ is exogenous.

- We need at least two instruments $Z_1$ and $Z_2$ such that $C(Z_1, \varepsilon) = C(Z_2, \varepsilon) = 0$.

- Then we have the following two reduced form equations:

  $$X_1 = \pi_{10} + \pi_{11} X_3 + \delta_{11} Z_1 + \delta_{12} Z_2 + v_1$$
  $$X_2 = \pi_{20} + \pi_{21} X_3 + \delta_{21} Z_1 + \delta_{22} Z_2 + v_2$$

Hausman Test Example

- The hypothesis of exogeneity is now

  $$H_0: C(X_1, \varepsilon) = C(X_2, \varepsilon) = 0$$

- Equivalently, we can use the extended regression with the residuals:

  $$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \alpha_1 \hat v_1 + \alpha_2 \hat v_2 + \xi,$$

  where $\hat v_1$ and $\hat v_2$ are the residuals from the reduced forms for $X_1$ and $X_2$, respectively.

- The null hypothesis can be written as $H_0: \alpha_1 = \alpha_2 = 0$.

- To test this hypothesis with two restrictions, we estimate both the restricted and the unrestricted models, compute their sums of squared residuals, and calculate the test statistic $W^0$, which is approximately distributed as $\chi^2_2$.

Testing Overidentification Restrictions: Sargan
Test
- If we have only one instrumental variable for each endogenous explanatory variable, we say that the model is exactly identified.
  - In this case, we cannot test condition (a), $C(Z, \varepsilon) = 0$.

- BUT if we have more instrumental variables than potentially endogenous explanatory variables, we say that the model is overidentified.
  - In this case, we can test whether any of the IVs is correlated with the error term.

- Suppose we have $r$ potentially endogenous explanatory variables and $q$ instruments, where $q > r$.
  - Then $q - r$ is the number of overidentifying restrictions (the number of extra instruments).

Testing Overidentification Restrictions: Sargan
Test

- We do not observe the errors $u$ of the equation of interest.

- But we can implement a test based on the 2SLS residuals $\tilde u$ (the sample counterparts of $u$).

- The test procedure:
  1. Estimate the equation of interest by 2SLS and obtain the 2SLS residuals $\tilde u$.
  2. Regress $\tilde u$ on all exogenous variables (including the IVs). Obtain the $R^2$ of this regression, say $R_u^2$.
  3. Under the null hypothesis that none of the IVs is correlated with $u$, we have $n R_u^2 \overset{asy}{\sim} \chi^2_{q-r}$.

Testing Overidentification Restrictions: Sargan
Test
- Intuition of the test: the fitted values of this auxiliary regression, $\hat{\tilde u}_i$, have zero mean, and $u$ has variance $\sigma_u^2$. Under conditional homoscedasticity, asymptotically

  $$\sum_i \frac{\hat{\tilde u}_i^2}{\sigma_u^2}$$

  is a sum of squares of $N(0, 1)$ random variables, of which only $q - r$ are independent. Hence, this expression follows an asymptotic $\chi^2$ distribution with $q - r$ degrees of freedom.
- In practice, we replace $\sigma_u^2$ by its estimator

  $$s_u^2 = \frac{1}{n}\sum_i \tilde u_i^2$$

- Therefore, this statistic also has the same distribution:

  $$\frac{\sum_i \hat{\tilde u}_i^2}{\frac{1}{n}\sum_i \tilde u_i^2} = n R_u^2$$

Testing Overidentification Restrictions: Sargan
Test

- If $n R_u^2$ exceeds the critical value of the $\chi^2_{q-r}$ distribution at the chosen significance level, we reject $H_0$ and conclude that at least one of the instruments is not exogenous.

- Note that this test does not identify which instrument is responsible for rejecting the null of no correlation.

- This test is also known as the Hansen-Sargan test.

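The three steps of the procedure can be sketched as follows, for one endogenous regressor and two instruments ($q - r = 1$). The data are simulated under stated assumptions: the parameter `gamma` (hypothetical) controls whether the second instrument enters the error term, so `gamma = 0` gives valid instruments and `gamma = 0.5` gives an invalid one. The helpers `ols`, `sargan_stat` and `simulate` are illustrative names, not part of any library.

```python
import random


def ols(cols, y):
    """OLS of y on a constant plus the given regressor columns; returns coefficients."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(k)] for r in range(k)]
    c = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    M = [A[r] + [c[r]] for r in range(k)]
    for p in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(p, k), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        for r in range(p + 1, k):
            f = M[r][p] / M[p][p]
            for s in range(p, k + 1):
                M[r][s] -= f * M[p][s]
    b = [0.0] * k
    for r in reversed(range(k)):  # back substitution
        b[r] = (M[r][k] - sum(M[r][s] * b[s] for s in range(r + 1, k))) / M[r][r]
    return b


def sargan_stat(y, x, z1, z2):
    """n * R^2 from regressing the 2SLS residuals on a constant, z1 and z2."""
    n = len(y)
    # Step 1: 2SLS (first stage, fitted values, second stage), residuals use actual x
    pi = ols([z1, z2], x)
    x_hat = [pi[0] + pi[1] * p + pi[2] * q for p, q in zip(z1, z2)]
    b0, b1 = ols([x_hat], y)
    u = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]
    # Step 2: auxiliary regression of the residuals on all exogenous variables
    g = ols([z1, z2], u)
    fitted = [g[0] + g[1] * p + g[2] * q for p, q in zip(z1, z2)]
    u_bar = sum(u) / n
    ssr = sum((ui - fi) ** 2 for ui, fi in zip(u, fitted))
    sst = sum((ui - u_bar) ** 2 for ui in u)
    # Step 3: the statistic n * R_u^2
    return n * (1.0 - ssr / sst)


def simulate(n, gamma, seed):
    """Simulated data; gamma != 0 makes z2 correlated with the error (invalid)."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    a = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.6 * p + 0.6 * q + c + rng.gauss(0, 1) for p, q, c in zip(z1, z2, a)]
    eps = [c + gamma * q + rng.gauss(0, 1) for c, q in zip(a, z2)]
    y = [1.0 + 2.0 * xi + e for xi, e in zip(x, eps)]
    return y, x, z1, z2


stat_valid = sargan_stat(*simulate(5000, 0.0, 3))
stat_invalid = sargan_stat(*simulate(5000, 0.5, 3))
print("n*R^2 with valid instruments:  ", round(stat_valid, 3))
print("n*R^2 with an invalid instrument:", round(stat_invalid, 3))
print("5% critical value of chi2 with 1 d.f.: 3.841")
```

With the invalid instrument the statistic is far above the critical value, while with valid instruments it behaves like a draw from a $\chi^2_1$ distribution.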
Example: The Wage Equation

- Let the model be $\ln(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{cap} + \varepsilon$, where $\beta_2 \neq 0$ (i.e., the variable cap, ability, which is unobserved, is a relevant variable).

- If we estimate by OLS the model $\ln(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + u$, with $u = \beta_2\,\text{cap} + \varepsilon$, we will obtain inconsistent estimates.

- If we have an instrumental variable for educ, we can estimate the model by IV.

Example: The Wage Equation

- What conditions must the IV satisfy for our estimators to be consistent?
  1. $C(Z, u) = 0$: the IV should not be correlated with ability or with other unobserved variables that affect wages.
  2. $C(Z, \text{educ}) \neq 0$: the IV should be correlated with education.

- Some examples of possible instruments ($Z$) for education are mother's education, father's education, number of siblings, distance between school and home, etc.

Example: The Wage Equation, OLS estimation

- We have a sample of 336 married women.

- The OLS estimation results are as follows (standard errors in parentheses):

  $$\widehat{\ln(\text{wage})} = \underset{(0.120)}{0.286} + \underset{(0.009)}{0.083}\,\text{educ}$$

- The interpretation is that an additional year of schooling increases wages, on average, by approximately 8.3%.

Example: The Wage Equation, IV estimation (a
single instrument)

- Possible instrument: father's education, educf.

- Reduced form:

  $$\widehat{\text{educ}} = \underset{(0.198)}{9.799} + \underset{(0.021)}{0.282}\,\text{educf}, \quad R^2 = 0.196$$

- The t test statistic for the instrumental variable is $t = \dfrac{0.282}{0.021} \approx 13.52$, that is, we reject $H_0: \pi_1 = 0$.

- Therefore, the education of women (educ) is significantly correlated with the education of their fathers (educf).

Example: The Wage Equation, IV estimation (a
single instrument)

- The IV estimate:

  $$\widehat{\ln(\text{wage})} = \underset{(0.289)}{0.363} + \underset{(0.023)}{0.076}\,\text{educ}$$

- Comparing the OLS and IV results, we see that the OLS estimate is higher, which is consistent with a positive bias due to the omission of ability.

- Notice that the standard errors of the IV estimators are substantially higher than those of the OLS estimators, as the theory suggests (but education remains significant).

Example: The Wage Equation, IV estimation (a
single instrument)

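- Hausman test

- From the reduced form, we get the variable $\hat v$, the residual of the estimated equation, $\hat v = \text{educ} - (9.799 + 0.282\,\text{educf})$, and run the regression $\ln(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + \alpha \hat v + e$, obtaining:

  $$\widehat{\ln(\text{wage})} = \hat\beta_0 + \hat\beta_1\,\text{educ} + \underset{(0.024)}{0.007}\,\hat v$$

- Then we test $H_0: \alpha = 0$ (educ is exogenous), with $t = 0.007/0.024 \approx 0.3 \Rightarrow$ we do not reject $H_0$ (we do not reject exogeneity).
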
Example: The Wage Equation, IV estimation
(multiple instruments)

- Suppose that, in addition to the father's education, we also have the mother's education, educm, as an instrument.
- Now the reduced form is:

  $$\widehat{\text{educ}} = \underset{(0.226)}{8.976} + \underset{(0.025)}{0.183}\,\text{educf} + \underset{(0.026)}{0.183}\,\text{educm}, \quad R^2 = 0.245$$

- The test statistic for the joint significance of educf and educm is $W^0 \approx 243.3$, which has an asymptotic $\chi^2_2$ distribution.
- The IV estimate using the two IVs is:

  $$\widehat{\ln(\text{wage})} = \underset{(0.272)}{0.396} + \underset{(0.023)}{0.074}\,\text{educ}$$

Example: The Wage Equation, IV estimation
(multiple instruments)

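- To implement the Hausman test, we take the residuals $\hat v$ from the reduced form:

  $$\hat v = \text{educ} - (8.976 + 0.183\,\text{educf} + 0.183\,\text{educm})$$

- We then use OLS to estimate the regression model $\ln(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + \alpha \hat v + e$, and obtain:

  $$\widehat{\ln(\text{wage})} = \hat\beta_0 + \hat\beta_1\,\text{educ} + \underset{(0.022)}{0.0107}\,\hat v$$

- Then we test $H_0: \alpha = 0$ (educ is exogenous), with $t = 0.0107/0.022 \approx 0.5 \Rightarrow$ we do not reject $H_0$ (we do not reject exogeneity).
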
Example: The Wage Equation, IV estimation
(multiple instruments)

- Sargan test

- Continuing with the latter case, we have two instruments (educf and educm) for one potentially endogenous variable (educ), so there is $q - r = 1$ overidentifying restriction.

- We can therefore partially assess the validity of the instruments (that is, the null hypothesis of exogeneity) by testing that the IVs are uncorrelated with the error term, using a Sargan test.

Example: The Wage Equation, IV estimation
(multiple instruments)

- To do this, we first calculate the residuals of the 2SLS estimation:

  $$\tilde u = \ln(\text{wage}) - (0.396 + 0.074\,\text{educ})$$

- Then we run the auxiliary regression of the residuals on the exogenous variables and the instruments:

  $$\hat{\tilde u} = \underset{(0.0703)}{0.0054} + \underset{(0.0075)}{0.0020}\,\text{educf} - \underset{(0.0081)}{0.0025}\,\text{educm}, \quad R^2 = 0.0003$$

Example: The Wage Equation, IV estimation
(multiple instruments)

- The test statistic, $n R_u^2 = 0.1008$, is smaller than the critical value of the $\chi^2_1$ distribution. Therefore, we do not reject the null of no correlation between the instruments and the error term of the model.

- That is, there is no evidence against the validity of the instruments.

Final Points

- In practice, it is often difficult to find valid instruments, that is, variables that are not included in the model, are highly correlated with the potentially endogenous explanatory variables, and are not correlated with the error term of the model.

- The problem is that most of the available economic variables are the result of agents' decisions, and therefore their exogeneity is questionable.

Final Points

- Ideally, we would like to use as instrumental variables those variables that are given to the agent (i.e., exogenous). We have seen an example: the price of cigarettes as an instrument for the number of packs of cigarettes consumed.

- The problem is that in many contexts (like the example just mentioned) the quality of the instrument is reduced by its weak correlation with the endogenous explanatory variable it is used for.

Final Points

- EXAMPLE: The availability of information about previous realizations of the variables of interest opens up interesting possibilities for instruments. An endogenous explanatory variable could be instrumented by previous realizations of the same variable (since the previous realizations are given before the current values are realized).
  - For example, in the equation relating consumption to permanent income, we use income because permanent income is not available. But we could also use disposable income from the previous period as an IV.
  - If we analyze this relationship with aggregate time series, we could use lagged disposable income as an instrument.
  - If we analyze the relationship with longitudinal family data, the lagged disposable income of each family could be used as an instrument.

Panel Data Analysis Course Overview
48 pages
Fixed Effects Regression Explained
No ratings yet
Fixed Effects Regression Explained
49 pages
Understanding Heteroskedasticity in Regression
100% (1)
Understanding Heteroskedasticity in Regression
23 pages
Linear Random & Fixed Effects Models
No ratings yet
Linear Random & Fixed Effects Models
62 pages
OLS Asymptotics in Regression Analysis
No ratings yet
OLS Asymptotics in Regression Analysis
8 pages
Understanding Panel Data Models
No ratings yet
Understanding Panel Data Models
46 pages
Koyck Distributed Lag Model Explained
No ratings yet
Koyck Distributed Lag Model Explained
1 page
Econometrics Problem Set 9
No ratings yet
Econometrics Problem Set 9
3 pages
Econometrics Books: Books On-Line Books / Notes
100% (1)
Econometrics Books: Books On-Line Books / Notes
8 pages
Understanding Panel Data Analysis Techniques
No ratings yet
Understanding Panel Data Analysis Techniques
14 pages
Time Series Analysis Concepts Guide
No ratings yet
Time Series Analysis Concepts Guide
3 pages
Time Series Analysis in Econometrics
No ratings yet
Time Series Analysis in Econometrics
31 pages
EC481 Data Analysis Guidelines
No ratings yet
EC481 Data Analysis Guidelines
15 pages
VECM Analysis of Income and Consumption
No ratings yet
VECM Analysis of Income and Consumption
8 pages
Regression Analysis and Outliers Explained
No ratings yet
Regression Analysis and Outliers Explained
8 pages
Qualitative Data in Regression Analysis
No ratings yet
Qualitative Data in Regression Analysis
25 pages
Linear Regression Methods for STARDEX
No ratings yet
Linear Regression Methods for STARDEX
6 pages
ARDL Approach to Cointegration Analysis
100% (1)
ARDL Approach to Cointegration Analysis
43 pages
Econometrics Final Exam Instructions
No ratings yet
Econometrics Final Exam Instructions
9 pages
Elasticity of Substitution Comparison
No ratings yet
Elasticity of Substitution Comparison
8 pages
Understanding Stationarity in Time Series
No ratings yet
Understanding Stationarity in Time Series
45 pages
JTPA Acceptance Analysis: LPM, Probit, Logit
No ratings yet
JTPA Acceptance Analysis: LPM, Probit, Logit
3 pages
Fundamentals of Econometrics Overview
100% (1)
Fundamentals of Econometrics Overview
31 pages
Advanced Macroeconomic Theory Overview
No ratings yet
Advanced Macroeconomic Theory Overview
292 pages
OLS Assumption Violations and Remedies
No ratings yet
OLS Assumption Violations and Remedies
64 pages
Panel Data Modelling Overview
No ratings yet
Panel Data Modelling Overview
211 pages
Understanding the Chow Test for Breaks
No ratings yet
Understanding the Chow Test for Breaks
23 pages
Understanding Instrumental Variable Estimation
No ratings yet
Understanding Instrumental Variable Estimation
30 pages
Estimation of Simultaneous Equations Model
No ratings yet
Estimation of Simultaneous Equations Model
4 pages
Instrumental Variables in Microeconometrics
No ratings yet
Instrumental Variables in Microeconometrics
13 pages
Advanced Econometrics: GLS & IV Estimation
No ratings yet
Advanced Econometrics: GLS & IV Estimation
34 pages
Understanding Instrumental Variables in Econometrics
No ratings yet
Understanding Instrumental Variables in Econometrics
20 pages
Understanding Endogeneity in Econometrics
No ratings yet
Understanding Endogeneity in Econometrics
42 pages
Reducing Income Inequalities in Pakistan
No ratings yet
Reducing Income Inequalities in Pakistan
35 pages
CPEC Book 2016 PDF
No ratings yet
CPEC Book 2016 PDF
183 pages
Overview of Shanghai Cooperation Organization
No ratings yet
Overview of Shanghai Cooperation Organization
20 pages
China-Pakistan Economic Corridor Overview
No ratings yet
China-Pakistan Economic Corridor Overview
1 page
Management Lessons from Gabbar
No ratings yet
Management Lessons from Gabbar
1 page
FDI's Impact on Pakistan's Economic Growth
75% (4)
FDI's Impact on Pakistan's Economic Growth
4 pages
A Tale of Two Cities Overview
No ratings yet
A Tale of Two Cities Overview
537 pages
Goals of Pakistan's Billion Tree Initiative
100% (1)
Goals of Pakistan's Billion Tree Initiative
12 pages
Correlation and Regression Analysis Results
No ratings yet
Correlation and Regression Analysis Results
4 pages
Caloocan City Population Analysis
No ratings yet
Caloocan City Population Analysis
2 pages
Linear Regression Model Overview
No ratings yet
Linear Regression Model Overview
21 pages
Least Squares Method Explained
100% (3)
Least Squares Method Explained
18 pages
Machine Learning for Time Series Course
No ratings yet
Machine Learning for Time Series Course
78 pages
Pengaruh Kualitas, Harga, Promosi pada Pembelian Motor
No ratings yet
Pengaruh Kualitas, Harga, Promosi pada Pembelian Motor
15 pages
Introduction To Econometrics, 5 Edition: Chapter 5: Dummy Variables
No ratings yet
Introduction To Econometrics, 5 Edition: Chapter 5: Dummy Variables
47 pages
Year 11 Mathematics Test on Statistics
No ratings yet
Year 11 Mathematics Test on Statistics
8 pages
Chapter 1 - Multivariate
100% (1)
Chapter 1 - Multivariate
30 pages
Methamphetamine Dependence in Nigerian Students
No ratings yet
Methamphetamine Dependence in Nigerian Students
55 pages
Impact of Crime on Auto Insurance Rates
No ratings yet
Impact of Crime on Auto Insurance Rates
8 pages
Reliable Accuracy Estimates From K-Fold Cross Validation
No ratings yet
Reliable Accuracy Estimates From K-Fold Cross Validation
9 pages
Statistics Review for Final Exam 2011
No ratings yet
Statistics Review for Final Exam 2011
8 pages
McDonald's Waiting Time Analysis
No ratings yet
McDonald's Waiting Time Analysis
3 pages
Critical Success Factors in TPM Implementation
No ratings yet
Critical Success Factors in TPM Implementation
15 pages
Shelf Life Estimation via Linear Regression
No ratings yet
Shelf Life Estimation via Linear Regression
7 pages
Simple Linear Regression Exercises Guide
No ratings yet
Simple Linear Regression Exercises Guide
11 pages
Module 5 - Notes
No ratings yet
Module 5 - Notes
39 pages
Statistics Homework Instructions 2025-2026
No ratings yet
Statistics Homework Instructions 2025-2026
2 pages
Evaluation Metrics for ML Models
No ratings yet
Evaluation Metrics for ML Models
16 pages
Linear Regression Interview Questions Guide
No ratings yet
Linear Regression Interview Questions Guide
59 pages
Probability Formulas for MAT 210
No ratings yet
Probability Formulas for MAT 210
5 pages
Biostatistics Exam Guide for Pharmacy
No ratings yet
Biostatistics Exam Guide for Pharmacy
4 pages
ANOVA and Experimental Design Overview
No ratings yet
ANOVA and Experimental Design Overview
30 pages
Correlation Analysis in Social Sciences
No ratings yet
Correlation Analysis in Social Sciences
5 pages
Effects of Sample Size on Similarity Indices
No ratings yet
Effects of Sample Size on Similarity Indices
7 pages
Diaz Lara2016
No ratings yet
Diaz Lara2016
9 pages
MA611 Statistics and Probabilities Syllabus
No ratings yet
MA611 Statistics and Probabilities Syllabus
3 pages
Central Limit Theorem Applications in Statistics
100% (1)
Central Limit Theorem Applications in Statistics
8 pages
2026 Level I Key Facts and Formulas Sheet
100% (1)
2026 Level I Key Facts and Formulas Sheet
17 pages