National Economic University
Lecture 1
Instrumental Variable Method
Dr. Phung Minh Duc
Contents
1. Endogeneity Problem
2. Instrumental Variables
3. IV Estimation
4. 2SLS Estimation
5. Testing for Endogeneity
6. Testing for instrumental variables
7. Commands on Stata
8. Practice
Endogeneity Problem
$Y = \beta_0 + \beta_1 X + u$
Endogeneity arises when the independent variable is correlated with the
error term:
$\mathrm{cov}(X, u) \neq 0$
Endogeneity is a frequent problem in economics and econometrics.
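In the regression model $Y = \beta_0 + \beta_1 X + u$, we have
$$\mathrm{cov}(Y, X) = \mathrm{cov}(\beta_0 + \beta_1 X + u,\ X) = \beta_1\,\mathrm{cov}(X, X) + \mathrm{cov}(u, X)$$
$$\Rightarrow \beta_1 = \frac{\mathrm{cov}(Y, X)}{\mathrm{var}(X)} - \frac{\mathrm{cov}(u, X)}{\mathrm{var}(X)}$$
In OLS regression, $\hat{\beta}_1 = \dfrac{\widehat{\mathrm{cov}}(Y, X)}{\widehat{\mathrm{var}}(X)}$.
If $\mathrm{cov}(u, X) = 0$ then $\beta_1 = \dfrac{\mathrm{cov}(Y, X)}{\mathrm{var}(X)} \Rightarrow E(\hat{\beta}_1) = \beta_1$
If $\mathrm{cov}(u, X) \neq 0$ then $\beta_1 = \dfrac{\mathrm{cov}(Y, X)}{\mathrm{var}(X)} - \dfrac{\mathrm{cov}(u, X)}{\mathrm{var}(X)} \Rightarrow E(\hat{\beta}_1) \neq \beta_1$: the OLS estimator is biased.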
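The decomposition above can be checked numerically. The following is a minimal sketch on simulated data, not part of the lecture: the coefficients (1 and 2), the strength of the correlation (0.8), the seed and the helper names `cov` and `ols_slope` are all assumptions chosen for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols_slope(y, x):
    """OLS slope of y on x: cov(y, x) / var(x)."""
    return cov(y, x) / cov(x, x)

rng = random.Random(0)                      # fixed seed, illustrative only
n, beta0, beta1 = 20000, 1.0, 2.0           # assumed "true" parameters

u = [rng.gauss(0, 1) for _ in range(n)]
x_exog = [rng.gauss(0, 1) for _ in range(n)]            # cov(X, u) = 0
x_endo = [rng.gauss(0, 1) + 0.8 * ui for ui in u]       # cov(X, u) != 0

y_exog = [beta0 + beta1 * x + ui for x, ui in zip(x_exog, u)]
y_endo = [beta0 + beta1 * x + ui for x, ui in zip(x_endo, u)]

b_exog = ols_slope(y_exog, x_exog)          # close to beta1
b_endo = ols_slope(y_endo, x_endo)          # pushed away from beta1
bias_term = cov(u, x_endo) / cov(x_endo, x_endo)

# Subtracting cov(u, X) / var(X) recovers beta1, as in the formula above.
print(b_exog, b_endo, b_endo - bias_term)
```

In practice $u$ is not observed, so the bias term cannot be computed and subtracted; the simulation only illustrates where the bias comes from.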
Sources of endogeneity
Omitted variable: A relevant variable is not observed and ends up in the
error term. If it is correlated with the independent variables used in the
model, the error term is correlated with them as well.
In the model: $Q = \beta_0 + \beta_1 P + u$, where $Q$ is the yield of rice and $P$ is the
amount of fertilizer used.
$P$ is correlated with the variable $Z$, the "natural quality of the soil",
which also affects the yield. There is often no data for $Z$, so it stays in $u$
and $\mathrm{cov}(P, u) \neq 0$: $P$ is an endogenous variable.
In the model: $\mathit{lwage} = \beta_0 + \beta_1\,\mathit{educ} + u$, where $\mathit{lwage}$ is the logarithm of
the wage and $\mathit{educ}$ is the level of education of the worker.
$\mathit{educ}$ is correlated with the variable $Z$, "intelligence", which also affects
the wage. There is often no data for $Z$, so $\mathrm{cov}(\mathit{educ}, u) \neq 0$ and $\mathit{educ}$ is an
endogenous variable.
Measurement error: Measurement error can cause correlation between
the mismeasured variable and the error term.
In the model: $Y = \beta_0 + \beta_1 X + u$ (1), suppose that $X$ is mismeasured
as $X^*$, that is, $X^* = X + v$. Model (1) becomes
$Y = \beta_0 + \beta_1 X^* + \tilde{u}$, in which $\tilde{u} = u - \beta_1 v$ (2)
The error $v$ enters both $X^*$ and $\tilde{u}$, so $\mathrm{cov}(X^*, \tilde{u}) \neq 0$
and $X^*$ is an endogenous variable. If $X$ is exogenous in (1) and $v$ is uncorrelated
with $X$ and $u$, then $\mathrm{cov}(X^*, \tilde{u}) = -\beta_1\,\mathrm{var}(v)$.
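A small simulation can make the effect of measurement error concrete. This is a sketch under assumed values (true slope 2, unit variances for $X$, $v$ and $u$, all mutually independent); the variable names are hypothetical and the data are simulated, not taken from the lecture.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(1)                      # fixed seed, illustrative only
n, beta0, beta1 = 20000, 1.0, 2.0           # assumed "true" parameters

x = [rng.gauss(0, 1) for _ in range(n)]     # true regressor X
v = [rng.gauss(0, 1) for _ in range(n)]     # measurement error
u = [rng.gauss(0, 1) for _ in range(n)]     # structural error
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

x_star = [xi + vi for xi, vi in zip(x, v)]              # X* = X + v
u_tilde = [ui - beta1 * vi for ui, vi in zip(u, v)]     # error of model (2)

b_true = cov(y, x) / cov(x, x)                  # OLS with the true X
b_star = cov(y, x_star) / cov(x_star, x_star)   # OLS with the mismeasured X*
c_star = cov(x_star, u_tilde)                   # roughly -beta1 * var(v)

print(b_true, b_star, c_star)
```

With these assumed variances the slope on $X^*$ is pulled toward zero, which is the usual attenuation pattern for this kind of measurement error.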
In the model: $\mathit{cons} = \beta_0 + \beta_1\,\mathit{inc} + u$, where $\mathit{cons}$ is the
consumption and $\mathit{inc}$ is the income of a household.
Households usually do not remember their exact income, so $\mathit{inc}$ is often
mismeasured as $\mathit{inc}^*$, that is, $\mathit{inc}^* = \mathit{inc} + v$, so $\mathit{inc}^*$ is an
endogenous variable.
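Simultaneity:
Assume that
$Y = \beta_0 + \beta_1 X + u$ (1)
$X = \alpha_0 + \alpha_1 Y + v$ (2)
We have: $X = \alpha_0 + \alpha_1(\beta_0 + \beta_1 X + u) + v$
If $1 - \alpha_1\beta_1 \neq 0$ then $X = \dfrac{\alpha_0 + \alpha_1\beta_0}{1 - \alpha_1\beta_1} + \dfrac{\alpha_1 u + v}{1 - \alpha_1\beta_1} \Rightarrow \mathrm{cov}(X, u) \neq 0$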
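The last implication can be spelled out in one more step. The derivation below is a sketch that adds one assumption not stated on the slide, namely that the two structural errors are uncorrelated, $\mathrm{cov}(u, v) = 0$.

```latex
\begin{aligned}
\mathrm{cov}(X, u)
  &= \mathrm{cov}\!\left(\frac{\alpha_0 + \alpha_1\beta_0}{1 - \alpha_1\beta_1}
     + \frac{\alpha_1 u + v}{1 - \alpha_1\beta_1},\; u\right) \\
  &= \frac{\alpha_1\,\mathrm{var}(u) + \mathrm{cov}(v, u)}{1 - \alpha_1\beta_1} \\
  &= \frac{\alpha_1\,\mathrm{var}(u)}{1 - \alpha_1\beta_1}
     \qquad \text{(using } \mathrm{cov}(u, v) = 0\text{)}
\end{aligned}
```

Under that assumption the covariance is nonzero whenever $\alpha_1 \neq 0$, that is, whenever $Y$ really feeds back into $X$.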
Solutions for endogeneity
Find and include the unobserved variable in the model
Find and include a proxy variable in the model
Use a fixed-effects estimator with panel data, which eliminates
individual-specific effects
Use the instrumental variable (IV) method, which replaces the endogenous
variable with a predicted value that contains only exogenous information
Instrumental Variable
Instrumental Variable - Definition
An instrumental variable (or instrument, or IV) is a variable used in a
regression model to correct for the endogeneity problem.
$Z$ is a suitable instrument for the endogenous variable $X$ if
two conditions are satisfied:
1. Instrument relevance: $\mathrm{cov}(Z, X) \neq 0$ ($Z$ is correlated with the
endogenous variable $X$, but $Z$ does not itself appear in the model)
2. Instrument exogeneity: $\mathrm{cov}(Z, u) = 0$ ($Z$ is not correlated with the
error term $u$)
Model for log wages ($\mathit{lwage}$) explained by education $\mathit{educ}$, which is endogenous:
$\mathit{lwage} = \beta_0 + \beta_1\,\mathit{educ} + u$
$\mathit{fatheduc}$ (father's education) is considered a good instrument for $\mathit{educ}$ because it has
three properties:
The instrument $\mathit{fatheduc}$ does not appear in the original model
The instrument $\mathit{fatheduc}$ is correlated with the endogenous variable $\mathit{educ}$, so
$\mathrm{cov}(\mathit{fatheduc}, \mathit{educ}) \neq 0$
The instrument $\mathit{fatheduc}$ is uncorrelated with the error term $u$, so
$\mathrm{cov}(\mathit{fatheduc}, u) = 0$
Other potential instruments: mother's education, the number of siblings, the
month of birth, ...
Example
The model: $\mathit{score} = \beta_0 + \beta_1\,\mathit{skipped} + u$, in which $\mathit{score}$ is the final
exam score and $\mathit{skipped}$ is the total number of lectures missed during the
semester.
$\mathit{skipped}$ is correlated with other factors in $u$: more able, highly motivated
students might miss fewer classes, so $\mathit{skipped}$ is an endogenous variable.
Can $\mathit{distance}$ (the distance between living quarters and campus) act as an
instrumental variable for $\mathit{skipped}$?
Relevance: is $\mathrm{cov}(\mathit{distance}, \mathit{skipped}) \neq 0$?
Exogeneity: is $\mathrm{cov}(\mathit{distance}, u) = 0$?
The instrument $Z$ is said to be a strong instrument if it is highly correlated
with the endogenous variable $X$, and a weak instrument otherwise.
An instrumental variable is called a valid instrument if it is both a strong
instrument and exogenous.
Note: The exogeneity condition on the instrumental variable $Z$ concerns
the covariance between $Z$ and the unobserved error term, so it generally cannot be
tested directly. In most cases, the researcher must rely on the economic nature of the
problem to argue that the chosen instrumental variable is exogenous.
IV Estimation
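$Y = \beta_0 + \beta_1 X + u$ (1)
The instrument $Z$ satisfies $\mathrm{cov}(X, Z) \neq 0$ and $\mathrm{cov}(Z, u) = 0$.
$$\begin{cases} E(u) = 0 \\ \mathrm{cov}(Z, u) = 0 \end{cases} \Rightarrow \begin{cases} E(u) = 0 \\ E(Zu) = 0 \end{cases} \Rightarrow \begin{cases} E(Y - \beta_0 - \beta_1 X) = 0 \\ E[Z(Y - \beta_0 - \beta_1 X)] = 0 \end{cases}$$
In the sample:
$$\begin{cases} \dfrac{1}{n}\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \\ \dfrac{1}{n}\sum Z_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \end{cases}$$
Solving these two equations gives the IV estimator:
$$\hat{\beta}_1^{IV} = \frac{\sum (Y_i - \bar{Y})(Z_i - \bar{Z})}{\sum (X_i - \bar{X})(Z_i - \bar{Z})} = \frac{\widehat{\mathrm{cov}}(Y, Z)}{\widehat{\mathrm{cov}}(X, Z)}$$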
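We have:
$$\mathrm{cov}(Y, Z) = \mathrm{cov}(\beta_0 + \beta_1 X + u,\ Z) = \beta_1\,\mathrm{cov}(X, Z) + \mathrm{cov}(u, Z)$$
Because $\mathrm{cov}(u, Z) = 0$ we have:
$$\beta_1 = \frac{\mathrm{cov}(Y, Z)}{\mathrm{cov}(X, Z)}, \quad \text{so} \quad \mathrm{plim}\,\hat{\beta}_1^{IV} = \beta_1$$
The coefficient estimated using the IV formula is therefore consistent. It is generally not
unbiased in finite samples, so the IV estimator is best suited to large samples.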
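The IV formula is a ratio of two sample covariances, which is easy to compute directly. The sketch below uses simulated data in which the instrument is relevant and exogenous by construction; the coefficients 0.7 and 0.8, the seed and the helper `cov` are assumptions for illustration only.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(2)                      # fixed seed, illustrative only
n, beta0, beta1 = 20000, 1.0, 2.0           # assumed "true" parameters

z = [rng.gauss(0, 1) for _ in range(n)]     # instrument, independent of u
u = [rng.gauss(0, 1) for _ in range(n)]     # structural error
# X depends on Z (relevance) and on u (endogeneity)
x = [0.7 * zi + 0.8 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

b_ols = cov(y, x) / cov(x, x)   # inconsistent here because cov(X, u) != 0
b_iv = cov(y, z) / cov(x, z)    # IV estimator: cov(Y, Z) / cov(X, Z)

print(b_ols, b_iv)
```

In this setup the OLS slope settles above the true value while the IV slope stays close to it, but only because the simulated $Z$ satisfies both instrument conditions.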
2SLS Estimation
Step 1: Regress the endogenous variable $X$ on the instrumental
variable $Z$:
$X = \delta_0 + \delta_1 Z + v$
and obtain the fitted values $\hat{X} = \hat{\delta}_0 + \hat{\delta}_1 Z$, which contain only exogenous
information from the instrument $Z$.
Step 2: Regress the dependent variable $Y$ on the predicted value $\hat{X}$:
$Y = \beta_0 + \beta_1 \hat{X} + u$
The coefficient $\beta_1$ estimated with 2SLS is consistent because $\hat{X}$ depends only on
the exogenous instrument $Z$, which is uncorrelated with the error term $u$.
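The two steps can be carried out by hand with two simple regressions. The following sketch, on simulated data with assumed coefficients and hypothetical helper names, also checks that with one endogenous variable and one instrument the second-stage slope coincides with the IV ratio $\widehat{\mathrm{cov}}(Y, Z) / \widehat{\mathrm{cov}}(X, Z)$.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols(y, x):
    """Simple OLS of y on a constant and x. Returns (intercept, slope)."""
    slope = cov(y, x) / cov(x, x)
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return intercept, slope

rng = random.Random(3)                      # fixed seed, illustrative only
n, beta0, beta1 = 5000, 1.0, 2.0            # assumed "true" parameters
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.7 * zi + 0.8 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# Step 1: regress X on Z and keep the fitted values X_hat
d0, d1 = ols(x, z)
x_hat = [d0 + d1 * zi for zi in z]

# Step 2: regress Y on X_hat
b0_2sls, b1_2sls = ols(y, x_hat)

# Direct IV formula for comparison
b1_iv = cov(y, z) / cov(x, z)
print(b1_2sls, b1_iv)
```

Only the point estimate is reproduced this way; the standard errors printed by a manual second-stage regression are not the correct 2SLS standard errors.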
Use 2SLS when endogeneity is present.
2SLS Standard Error:
The standard errors from a manually run second-stage regression need to be corrected,
because they are computed from residuals based on $\hat{X}$ rather than $X$.
In OLS, $\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma^2}{SST_X}$, but in 2SLS, $\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma^2}{SST_X \cdot R^2_{X,Z}}$, in which $\sigma^2$ is the
variance of the error term $u$, $SST_X$ is the total variation in $X$, and
$R^2_{X,Z}$ is the $R^2$ from the regression of $X$ on $Z$.
Because $R^2_{X,Z}$ is less than 1, the variance of a coefficient estimated by
2SLS is higher than the variance of the same coefficient estimated by OLS.
A weaker relationship between $X$ and $Z$ results in a lower $R^2_{X,Z}$, a
higher variance of the 2SLS coefficient, and therefore less statistical significance.
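The two variance formulas differ only by the factor $1 / R^2_{X,Z}$, which a few lines of arithmetic make visible. The function names and the input values below are hypothetical and serve only to illustrate the formulas above.

```python
def var_ols(sigma2, sst_x):
    """Variance of the OLS slope: sigma^2 / SST_X."""
    return sigma2 / sst_x

def var_2sls(sigma2, sst_x, r2_xz):
    """Variance of the 2SLS slope: sigma^2 / (SST_X * R^2_{X,Z})."""
    if not 0 < r2_xz <= 1:
        raise ValueError("R^2 must lie in (0, 1]")
    return sigma2 / (sst_x * r2_xz)

sigma2, sst_x = 1.0, 100.0      # illustrative values, not from the lecture
for r2 in (0.50, 0.25, 0.01):
    ratio = var_2sls(sigma2, sst_x, r2) / var_ols(sigma2, sst_x)
    # ratio is 1 / R^2; the standard error grows by its square root
    print(r2, ratio, ratio ** 0.5)
```

For example, a first-stage $R^2$ of 0.25 quadruples the variance and therefore doubles the standard error relative to OLS.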
Testing for Endogeneity
Multiple regression model
$Y = \beta_0 + \beta_1 X + \beta_2 W + u$ with $H_0: \mathrm{cov}(X, u) = 0$ against $H_1: \mathrm{cov}(X, u) \neq 0$
If $H_0$ is true, then $\hat{\beta}_{OLS}$ and $\hat{\beta}_{IV}$ are both consistent, and OLS is preferred because it is
more efficient (BLUE under the classical assumptions).
If $H_0$ is false, then OLS is inconsistent and 2SLS is consistent.
Testing for Endogeneity of a Single Explanatory Variable
Regress the independent variable $X$ on the instrumental variable $Z$ and the
exogenous independent variable $W$, and obtain the residuals $\hat{v}$.
Add $\hat{v}$ to the structural equation (which still includes $X$) and test the significance of $\hat{v}$
using an OLS regression. If the coefficient on $\hat{v}$ is statistically different from zero,
we conclude that $X$ is indeed endogenous.
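The residual-based test above can be sketched with plain arithmetic. To keep the example short it makes a simplifying assumption: there is no exogenous regressor $W$, so the first stage is $X$ on $Z$ and the augmented regression is $Y$ on $X$ and $\hat{v}$. The data are simulated, and `ols2` is a hypothetical helper written for this illustration.

```python
import random

def center(a):
    m = sum(a) / len(a)
    return [ai - m for ai in a]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def ols2(y, x1, x2):
    """OLS of y on a constant, x1 and x2. Returns (b1, b2, t-stat of b2)."""
    yc, a, b = center(y), center(x1), center(x2)
    saa, sbb, sab = dot(a, a), dot(b, b), dot(a, b)
    say, sby = dot(a, yc), dot(b, yc)
    det = saa * sbb - sab ** 2
    b1 = (sbb * say - sab * sby) / det
    b2 = (saa * sby - sab * say) / det
    resid = [yi - b1 * ai - b2 * bi for yi, ai, bi in zip(yc, a, b)]
    s2 = dot(resid, resid) / (len(y) - 3)       # residual variance
    se_b2 = (s2 * saa / det) ** 0.5
    return b1, b2, b2 / se_b2

rng = random.Random(4)                          # fixed seed, illustrative only
n, beta0, beta1 = 5000, 1.0, 2.0                # assumed "true" parameters
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.7 * zi + 0.8 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# Step 1: first-stage residuals v_hat from regressing X on Z
zc, xc = center(z), center(x)
d1 = dot(xc, zc) / dot(zc, zc)
v_hat = [xi - d1 * zi for xi, zi in zip(xc, zc)]

# Step 2: add v_hat to the structural equation and test its coefficient
b_x, b_v, t_v = ols2(y, x, v_hat)
b_iv = dot(center(y), zc) / dot(xc, zc)         # IV estimate for comparison
print(b_x, b_iv, b_v, t_v)
```

A large absolute t-statistic on $\hat{v}$ points to endogeneity of $X$. As a by-product, the coefficient on $X$ in the augmented regression equals the IV estimate.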
Hausman Test
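The Hausman test compares the difference between $\hat{\beta}_{OLS}$ and $\hat{\beta}_{IV}$:
$H_0: d = \hat{\beta}_{OLS} - \hat{\beta}_{IV} = 0$
$H_1: d = \hat{\beta}_{OLS} - \hat{\beta}_{IV} \neq 0$
Test statistic:
$$H = (\hat{\beta}_{IV} - \hat{\beta}_{OLS})' \left[\mathrm{var}(\hat{\beta}_{IV}) - \mathrm{var}(\hat{\beta}_{OLS})\right]^{-1} (\hat{\beta}_{IV} - \hat{\beta}_{OLS}) \sim \chi^2$$
If $H_0$ is not rejected, then OLS is preferred (it is consistent and more efficient).
If $H_0$ is rejected, then OLS is inconsistent and IV, which remains consistent, should be used.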
Testing for instrumental variables
Testing the exogeneity of the instruments
Consider the model $Y = \beta_0 + \beta_1 X + u$, in which $X$ is the endogenous variable.
IV estimation requires the number of instrumental variables to be greater than or
equal to the number of endogenous independent variables.
If greater: over-identification. The exogeneity of the instruments can be tested.
If equal: exact identification. The model can be estimated, but the exogeneity of the
instruments cannot be tested.
Consider the model $Y = \beta_0 + \beta_1 X + u$, in which $\mathrm{cov}(X, u) \neq 0$ and $Z_1, Z_2$ are
two instrumental variables.
Way 1: (the Sargan test)
Estimate the structural equation by 2SLS using the instruments $Z_1, Z_2$ and obtain
the 2SLS residuals $\hat{u}$.
Regress $\hat{u}$ on all exogenous variables:
$\hat{u} = \alpha_0 + \alpha_1 Z_1 + \alpha_2 Z_2 + v$
Test the hypothesis $H_0: \alpha_1 = \alpha_2 = 0$ (the Sargan statistic is $nR^2$ from this regression).
If the p-value is small (for example, below 0.05), reject $H_0$: at least one of the
instruments $Z_1, Z_2$ is not exogenous.
Way 2: (the Hausman test)
Estimate the structural equation by 2SLS using the instrument $Z_1$ and obtain the
coefficient $\hat{\beta}_1$.
Estimate the structural equation by 2SLS using the instrument $Z_2$ and obtain the
coefficient $\hat{\beta}_1'$.
Test the hypothesis $H_0: d = \hat{\beta}_1 - \hat{\beta}_1' = 0$. If the p-value is small, reject $H_0$: at least
one of the instruments $Z_1, Z_2$ is not exogenous.
Testing for weak instruments
The instrumental variable $Z$ is said to be a strong instrument if it is highly
correlated with the endogenous independent variable, and a weak
instrument otherwise.
Regress the endogenous variable $X$ on the instrumental variables
$Z_1, Z_2$:
$X = \delta_0 + \delta_1 Z_1 + \delta_2 Z_2 + v$
Use the F-statistic for the joint significance of $Z_1$ and $Z_2$ ($H_0: \delta_1 = \delta_2 = 0$) to judge
whether the instruments are weak. As a rule of thumb, if the value of F is less than 10,
the instruments $Z_1$ and $Z_2$ are considered weak.
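The rule of thumb can be illustrated with a first-stage F-statistic computed by hand. The sketch below simplifies the setting to a single instrument and no exogenous regressors, in which case $F = (n - 2)\,R^2 / (1 - R^2)$; the first-stage coefficients 0.7 and 0.02 and the helper names are assumptions for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def first_stage_f(x, z):
    """F-statistic for H0: delta1 = 0 in X = delta0 + delta1 * Z + v."""
    r2 = cov(x, z) ** 2 / (cov(x, x) * cov(z, z))   # first-stage R^2
    return (len(x) - 2) * r2 / (1 - r2)

rng = random.Random(5)                      # fixed seed, illustrative only
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x_strong = [0.7 * zi + rng.gauss(0, 1) for zi in z]     # Z strongly related to X
x_weak = [0.02 * zi + rng.gauss(0, 1) for zi in z]      # Z barely related to X

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)
print(f_strong, f_weak)     # compare each value with the threshold of 10
```

With several instruments or additional exogenous regressors, the F-statistic has to come from the joint test on the instruments in the full first-stage regression instead.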
The general model with endogenous variables
$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \beta_{k+1} W_1 + \dots + \beta_{k+m} W_m + u$ (*)
in which $X_1, \dots, X_k$ are the endogenous variables, $W_1, \dots, W_m$ are the exogenous variables,
and $Z_1, \dots, Z_p$ ($p \geq k$) are the instrumental variables.
Step 1: Regress $X_1$ on all instrumental variables $Z_1, \dots, Z_p$ and all the exogenous
independent variables $W_1, \dots, W_m$ by OLS, and store the fitted values as $\hat{X}_1$. Repeat
the same for the remaining endogenous independent variables, saving the fitted
values $\hat{X}_1, \hat{X}_2, \dots, \hat{X}_k$.
Step 2: Estimate equation (*) by OLS with the exogenous independent variables
$W_1, \dots, W_m$, replacing the endogenous independent variables $X_1, \dots, X_k$ with the fitted
values $\hat{X}_1, \hat{X}_2, \dots, \hat{X}_k$.
Commands on Stata
$Y = \beta_0 + \beta_1 X + \beta_2 W + u$ (*)
in which $X$ is endogenous, $W$ is exogenous, and $Z$ is the instrumental variable
Estimate 2SLS (Stata command names are case-sensitive and must be typed in lowercase):
    ivregress 2sls Y W (X = Z)
    ivregress 2sls Y W (X = Z), small (if the sample is small)
    ivregress 2sls Y W (X = Z), vce(robust) small (if the error variance is not constant)
Test the endogeneity of the variable X:
Way 1:
    reg X Z W
    predict Vhat, residuals
    reg Y X W Vhat
If the p-value of Vhat is small, then X is endogenous
Way 2:
    reg Y X W
    est store ls
    ivregress 2sls Y W (X = Z)
    est store iv
    hausman iv ls, constant sigmamore
If the p-value is small, then X is endogenous
When $X$ is endogenous, $W$ is exogenous, and $Z_1, Z_2$ are the instrumental variables:
Test the exogeneity of the instrumental variables $Z_1, Z_2$:
Way 1:
    ivregress 2sls Y (X = Z1 Z2) W, small
    estat overid
If the p-value is small, then at least one of the instruments $Z_1, Z_2$ is not exogenous
Way 2:
    ivregress 2sls Y (X = Z1) W, small
    est store z1
    ivregress 2sls Y (X = Z2) W, small
    est store z2
    hausman z1 z2, constant sigmamore
If the p-value is small, then at least one of the instruments $Z_1, Z_2$ is not exogenous
Test for weak instruments $Z_1, Z_2$:
    ivregress 2sls Y (X = Z1 Z2) W, small
    estat firststage
If the F-statistic is below 10, the instruments $Z_1, Z_2$ are considered weak
If the F-statistic is above 10, the instruments $Z_1, Z_2$ are considered strong
Practice