
Instrumental Variable Method Overview

The document discusses endogeneity problems in regression analysis and introduces instrumental variable methods as a solution. It defines what constitutes a valid instrumental variable and provides examples of how instrumental variables can be used to address issues like omitted variables, measurement error, and simultaneity that cause endogeneity. Potential instrumental variables are identified based on whether they are correlated with the endogenous explanatory variable but not directly with the error term.

Uploaded by Hưng Phạm
Copyright © All Rights Reserved

National Economic University

Lecture 1
Instrumental Variable Method

Dr. Phung Minh Duc


Contents
1. Endogeneity Problem
2. Instrumental Variables
3. IV Estimation
4. 2SLS Estimation
5. Testing for Endogeneity
6. Testing for instrumental variables
7. Commands on Stata
8. Practice
Endogeneity Problem

$$Y = \beta_0 + \beta_1 X + u$$
• The endogeneity problem arises when an independent (explanatory) variable is correlated with the error term: $\operatorname{cov}(X, u) \neq 0$.
• Endogeneity is a frequent problem in economics and econometrics.
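• In the regression model $Y = \beta_0 + \beta_1 X + u$, we have
$$\operatorname{cov}(Y, X) = \operatorname{cov}(\beta_0 + \beta_1 X + u,\ X) = \beta_1 \operatorname{cov}(X, X) + \operatorname{cov}(u, X)$$
$$\Rightarrow \beta_1 = \frac{\operatorname{cov}(Y, X)}{\operatorname{var}(X)} - \frac{\operatorname{cov}(u, X)}{\operatorname{var}(X)}$$
• In OLS regression, $\hat\beta_1 = \dfrac{\widehat{\operatorname{cov}}(Y, X)}{\widehat{\operatorname{var}}(X)}$.
• If $\operatorname{cov}(u, X) = 0$, then $\beta_1 = \dfrac{\operatorname{cov}(Y, X)}{\operatorname{var}(X)} \Rightarrow E(\hat\beta_1) = \beta_1$.
• If $\operatorname{cov}(u, X) \neq 0$, then $\beta_1 = \dfrac{\operatorname{cov}(Y, X)}{\operatorname{var}(X)} - \dfrac{\operatorname{cov}(u, X)}{\operatorname{var}(X)} \Rightarrow E(\hat\beta_1) \neq \beta_1$.
The OLS estimator is biased, and also inconsistent: $\operatorname{plim} \hat\beta_1 = \beta_1 + \operatorname{cov}(u, X)/\operatorname{var}(X)$.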
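A small simulation can make the bias formula concrete. This is a sketch using only the Python standard library; the data-generating process (the coefficients, the 0.5 loading of $u$ in $X$, the seed and the sample size) is an illustrative assumption, not taken from the lecture. By construction $\operatorname{cov}(X, u) = 0.5$ and $\operatorname{var}(X) = 1.25$, so the OLS slope should settle near $\beta_1 + 0.5/1.25 = 2.4$ rather than $\beta_1 = 2$.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(42)          # fixed seed so the run is reproducible
n = 100_000
beta0, beta1 = 1.0, 2.0          # assumed "true" coefficients

u = [rng.gauss(0, 1) for _ in range(n)]
# X shares a component with u, so cov(X, u) = 0.5 and var(X) = 0.25 + 1 = 1.25
x = [0.5 * ui + rng.gauss(0, 1) for ui in u]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

b1_ols = cov(y, x) / cov(x, x)   # OLS slope: cov(Y, X) / var(X)
plim_ols = beta1 + 0.5 / 1.25    # beta1 + cov(u, X) / var(X)
print(f"OLS slope: {b1_ols:.3f}, predicted limit: {plim_ols:.3f}, true beta1: {beta1}")
```

Increasing `n` moves the OLS slope closer to 2.4, not to 2: the bias does not vanish with more data.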
Sources of endogeneity
• Omitted variables: relevant explanatory variables that are not observed end up in the error term; if they are correlated with the independent variables used in the model, the error term is correlated with those variables.
  ◦ In the model $Q = \beta_0 + \beta_1 P + u$, $Q$ is the rice yield and $P$ is the amount of fertilizer used.
  ◦ $P$ is correlated with the variable $Z$, the "natural quality of the soil", but there is often no data on $Z$, so $\operatorname{cov}(P, u) \neq 0$ and $P$ is an endogenous variable.
  ◦ Another example: in the model $\text{lwage} = \beta_0 + \beta_1\,\text{educ} + u$, lwage is the log of the wage and educ is the worker's level of education.
  ◦ educ is correlated with the variable $Z$, "intelligence", but there is often no data on $Z$, so $\operatorname{cov}(\text{educ}, u) \neq 0$ and educ is an endogenous variable.
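To see how an omitted variable produces $\operatorname{cov}(\text{educ}, u) \neq 0$, the sketch below simulates a hypothetical wage equation in which ability (the unobserved $Z$, called `abil` here) affects both education and wages. All numbers (coefficients, noise levels, seed) are assumptions for illustration. Regressing lwage on educ alone should give a slope near $0.08 + 0.10 \cdot \operatorname{cov}(\text{educ}, \text{abil})/\operatorname{var}(\text{educ}) = 0.08 + 0.10 \cdot 1/2 = 0.13$ instead of the true 0.08.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(7)
n = 100_000
# Hypothetical true model: lwage = 0.5 + 0.08*educ + 0.10*abil + e
b0, b_educ, b_abil = 0.5, 0.08, 0.10

abil = [rng.gauss(0, 1) for _ in range(n)]        # unobserved ability
educ = [12 + a + rng.gauss(0, 1) for a in abil]   # cov(educ, abil) = 1, var(educ) = 2
lwage = [b0 + b_educ * s + b_abil * a + rng.gauss(0, 0.5)
         for s, a in zip(educ, abil)]

# Short regression that omits ability, so abil ends up in the error term u
b_short = cov(lwage, educ) / cov(educ, educ)
# Omitted-variable formula: b_educ + b_abil * cov(educ, abil) / var(educ)
ovb_limit = b_educ + b_abil * 1.0 / 2.0
print(f"short-regression slope: {b_short:.3f}, predicted: {ovb_limit:.3f}, true: {b_educ}")
```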
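• Measurement error: measurement error can cause correlation between the mismeasured variable and the error term.
  ◦ In the model $Y = \beta_0 + \beta_1 X + u$ (1), suppose that $X$ is measured with error as $X^*$, that is, $X^* = X + v$. Model (1) becomes
$$Y = \beta_0 + \beta_1 X^* + \tilde u, \quad \text{where } \tilde u = u - \beta_1 v \quad (2)$$
  ◦ $X^*$ contains $v$ while $\tilde u$ contains $-\beta_1 v$, so the two are correlated: if $v$ is uncorrelated with $X$ and $u$ (classical measurement error), $\operatorname{cov}(X^*, \tilde u) = -\beta_1 \operatorname{var}(v) \neq 0$ whenever $\beta_1 \neq 0$, and $X^*$ is an endogenous variable.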
  ◦ Another example: in the model $\text{cons} = \beta_0 + \beta_1\,\text{inc} + u$ (1), cons is household consumption and inc is household income.
  ◦ Households usually do not remember their exact income, so inc is often measured with error as $\text{inc}^* = \text{inc} + v$, which makes $\text{inc}^*$ an endogenous variable.
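The covariance formula can be checked by simulation. This sketch assumes classical measurement error with $\operatorname{var}(X) = \operatorname{var}(v) = 1$ and $\beta_1 = 2$ (illustrative values only). It computes $\operatorname{cov}(X^*, \tilde u)$, which should be near $-\beta_1 \operatorname{var}(v) = -2$, and the OLS slope on $X^*$, which under these assumptions shrinks toward zero by the factor $\operatorname{var}(X)/(\operatorname{var}(X) + \operatorname{var}(v)) = 1/2$ (the standard attenuation result).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(3)
n = 100_000
beta0, beta1 = 1.0, 2.0                            # assumed true coefficients

x_true = [rng.gauss(0, 1) for _ in range(n)]       # true regressor X, var(X) = 1
v = [rng.gauss(0, 1) for _ in range(n)]            # measurement error, var(v) = 1
x_obs = [xt + vi for xt, vi in zip(x_true, v)]     # observed X* = X + v
u = [rng.gauss(0, 1) for _ in range(n)]
y = [beta0 + beta1 * xt + ui for xt, ui in zip(x_true, u)]

u_tilde = [ui - beta1 * vi for ui, vi in zip(u, v)]  # error term of model (2)
cov_obs = cov(x_obs, u_tilde)                        # theory: -beta1 * var(v) = -2
b1_obs = cov(y, x_obs) / cov(x_obs, x_obs)           # theory: beta1 * 1 / (1 + 1) = 1
print(f"cov(X*, u~) = {cov_obs:.3f}, OLS slope on X* = {b1_obs:.3f}, true beta1 = {beta1}")
```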
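• Simultaneity: assume that
$$Y = \beta_0 + \beta_1 X + u \quad (1)$$
$$X = \alpha_0 + \alpha_1 Y + v \quad (2)$$
Substituting (1) into (2): $X = \alpha_0 + \alpha_1(\beta_0 + \beta_1 X + u) + v$.
If $1 - \alpha_1\beta_1 \neq 0$, then
$$X = \frac{\alpha_0 + \alpha_1\beta_0}{1 - \alpha_1\beta_1} + \frac{\alpha_1 u + v}{1 - \alpha_1\beta_1} \Rightarrow \operatorname{cov}(X, u) \neq 0$$
For example, if $\operatorname{cov}(u, v) = 0$, then $\operatorname{cov}(X, u) = \alpha_1 \operatorname{var}(u)/(1 - \alpha_1\beta_1)$, which is nonzero whenever $\alpha_1 \neq 0$.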
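A sketch that generates data from the two-equation system and checks the covariance formula. The parameter values ($\beta_1 = 0.5$, $\alpha_1 = 0.8$, unit error variances, independent $u$ and $v$) are assumptions chosen for illustration, so $\operatorname{cov}(X, u)$ should be near $0.8/0.6 \approx 1.33$ and OLS on equation (1) should overstate $\beta_1$.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(11)
n = 100_000
beta0, beta1 = 1.0, 0.5      # equation (1): Y = beta0 + beta1*X + u   (assumed values)
alpha0, alpha1 = 2.0, 0.8    # equation (2): X = alpha0 + alpha1*Y + v (assumed values)
d = 1 - alpha1 * beta1       # 0.6 here; must be nonzero for a unique solution

u = [rng.gauss(0, 1) for _ in range(n)]   # var(u) = 1
v = [rng.gauss(0, 1) for _ in range(n)]   # drawn independently of u, so cov(u, v) = 0
# Reduced form for X, obtained by substituting (1) into (2)
x = [(alpha0 + alpha1 * beta0 + alpha1 * ui + vi) / d for ui, vi in zip(u, v)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

cov_xu = cov(x, u)
cov_theory = alpha1 * 1.0 / d            # alpha1 * var(u) / (1 - alpha1*beta1)
b1_ols = cov(y, x) / cov(x, x)           # OLS on equation (1)
print(f"cov(X, u) = {cov_xu:.3f} (theory {cov_theory:.3f}); OLS slope = {b1_ols:.3f}, true beta1 = {beta1}")
```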
Solutions to endogeneity

• Collect data on the omitted variable and include it in the model
• Include a proxy variable for the unobserved variable in the model
• With panel data, use the fixed effects estimator, which eliminates time-invariant individual-specific effects
• Use the instrumental variable (IV) method, which replaces the endogenous variable with a predicted value that contains only exogenous information
Instrumental Variable

Instrumental Variable - Definition


• An instrumental variable (instrument, or IV) is a variable used in a regression model to correct for the endogeneity problem.
• A variable $Z$ can serve as an instrument for the endogenous variable $X$ if two conditions are satisfied:
1. Instrument relevance: $\operatorname{cov}(Z, X) \neq 0$ ($Z$ is correlated with the endogenous variable $X$, but $Z$ does not itself belong in the model)
2. Instrument exogeneity: $\operatorname{cov}(Z, u) = 0$ ($Z$ is uncorrelated with the error term $u$)
Consider a model for log wages (lwage) explained by education (educ), which is endogenous:
$$\text{lwage} = \beta_0 + \beta_1\,\text{educ} + u$$
• fatheduc (father's education) is a good instrument for educ because it has three properties:
  ◦ The instrument fatheduc does not appear in the original model
  ◦ The instrument fatheduc is correlated with the endogenous variable educ: $\operatorname{cov}(\text{fatheduc}, \text{educ}) \neq 0$
  ◦ The instrument fatheduc is uncorrelated with the error term $u$: $\operatorname{cov}(\text{fatheduc}, u) = 0$
• Other potential instruments: mother's education, the number of siblings, the month of birth, …
Example
• The model: $\text{score} = \beta_0 + \beta_1\,\text{skipped} + u$, in which score is the final exam score and skipped is the total number of lectures missed during the semester.
• skipped is correlated with other factors in $u$: more able, highly motivated students might miss fewer classes, so skipped is an endogenous variable.
• Can distance (the distance between living quarters and campus) act as an instrumental variable for skipped?
  ◦ Relevance: is $\operatorname{cov}(\text{distance}, \text{skipped}) \neq 0$?
  ◦ Exogeneity: is $\operatorname{cov}(\text{distance}, u) = 0$?
Instrumental Variable - Definition


• The instrument $Z$ is called a strong instrument if it is highly correlated with the endogenous variable $X$, and a weak instrument otherwise.
• An instrumental variable is called a valid instrument if it is both strong and exogenous.
Note: The exogeneity condition concerns the covariance between $Z$ and the error term $u$, which is unobserved, so it generally cannot be tested directly. In most cases, the researcher must rely on the economic nature of the problem to argue that the chosen instrument is exogenous.
IV Estimation

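$$Y = \beta_0 + \beta_1 X + u \quad (1)$$
The instrument $Z$ satisfies $\operatorname{cov}(X, Z) \neq 0$ and $\operatorname{cov}(Z, u) = 0$. Then
$$\begin{cases} E(u) = 0 \\ \operatorname{cov}(Z, u) = 0 \end{cases} \Rightarrow \begin{cases} E(u) = 0 \\ E(Z u) = 0 \end{cases} \Rightarrow \begin{cases} E(Y - \beta_0 - \beta_1 X) = 0 \\ E[Z(Y - \beta_0 - \beta_1 X)] = 0 \end{cases}$$
• In the sample, replacing expectations with sample averages:
$$\begin{cases} \frac{1}{n}\sum_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0 \\ \frac{1}{n}\sum_i Z_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0 \end{cases}$$
which gives the IV estimator (with $\hat\beta_0^{IV} = \bar Y - \hat\beta_1^{IV} \bar X$):
$$\hat\beta_1^{IV} = \frac{\sum_i (Y_i - \bar Y)(Z_i - \bar Z)}{\sum_i (X_i - \bar X)(Z_i - \bar Z)} = \frac{\widehat{\operatorname{cov}}(Y, Z)}{\widehat{\operatorname{cov}}(X, Z)}$$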
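We have:
$$\operatorname{cov}(Y, Z) = \operatorname{cov}(\beta_0 + \beta_1 X + u,\ Z) = \beta_1 \operatorname{cov}(X, Z) + \operatorname{cov}(u, Z)$$
Because $\operatorname{cov}(u, Z) = 0$:
$$\beta_1 = \frac{\operatorname{cov}(Y, Z)}{\operatorname{cov}(X, Z)} \Rightarrow \operatorname{plim} \hat\beta_1^{IV} = \beta_1$$
Therefore, the coefficient estimated with the IV formula is consistent. It is generally biased in finite samples, so the case for IV rests on its large-sample properties.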
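A minimal sketch of the IV estimator on simulated data, assuming a data-generating process in which $Z$ is relevant ($\operatorname{cov}(X, Z) = 0.8$) and exogenous, while $X$ is endogenous ($\operatorname{cov}(X, u) = 0.5$). The function name `iv_estimates` and all numbers are illustrative. The IV slope should be close to the true $\beta_1 = 2$, while the OLS slope should settle near $2 + 0.5/1.89 \approx 2.26$.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def iv_estimates(y, x, z):
    """Simple IV estimator with one instrument: returns (intercept, slope)."""
    b1 = cov(y, z) / cov(x, z)
    b0 = sum(y) / len(y) - b1 * sum(x) / len(x)
    return b0, b1

rng = random.Random(2024)
n = 100_000
beta0, beta1 = 1.0, 2.0

z = [rng.gauss(0, 1) for _ in range(n)]   # instrument, drawn independently of u
u = [rng.gauss(0, 1) for _ in range(n)]
# X depends on Z (relevance) and on u (endogeneity); var(X) = 0.64 + 0.25 + 1 = 1.89
x = [1 + 0.8 * zi + 0.5 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

b0_iv, b1_iv = iv_estimates(y, x, z)
b1_ols = cov(y, x) / cov(x, x)
print(f"IV slope: {b1_iv:.3f}  OLS slope: {b1_ols:.3f}  true: {beta1}")
```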
2SLS Estimation

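Step 1: Regress the endogenous variable $X$ on the instrumental variable $Z$:
$$X = \delta_0 + \delta_1 Z + v$$
and obtain the fitted values $\hat X = \hat\delta_0 + \hat\delta_1 Z$, which contain only the exogenous information from the instrument $Z$.
Step 2: Regress the dependent variable $Y$ on the fitted values $\hat X$:
$$Y = \beta_0 + \beta_1 \hat X + e, \quad \text{where } e = u + \beta_1 (X - \hat X)$$
The coefficient $\beta_1$ estimated by 2SLS is consistent because $\hat X$ is a function of the exogenous instrument $Z$ only, and is therefore uncorrelated with the error term $u$. With a single instrument, the 2SLS estimate equals the IV estimate $\widehat{\operatorname{cov}}(Y, Z)/\widehat{\operatorname{cov}}(X, Z)$.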
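The two steps can be sketched directly with the standard library. The data-generating process and the helper name `simple_ols` are assumptions for illustration. With one instrument, the second-stage slope should coincide (up to floating-point rounding) with the IV ratio $\widehat{\operatorname{cov}}(Y, Z)/\widehat{\operatorname{cov}}(X, Z)$.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def simple_ols(y, x):
    """OLS of y on a constant and x: returns (intercept, slope)."""
    slope = cov(y, x) / cov(x, x)
    return sum(y) / len(y) - slope * sum(x) / len(x), slope

rng = random.Random(5)
n = 50_000
beta0, beta1 = 1.0, 2.0
z = [rng.gauss(0, 1) for _ in range(n)]   # exogenous instrument
u = [rng.gauss(0, 1) for _ in range(n)]
x = [1 + 0.8 * zi + 0.5 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]  # endogenous X
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# Step 1: first stage, regress X on Z and keep the fitted values
d0, d1 = simple_ols(x, z)
x_hat = [d0 + d1 * zi for zi in z]

# Step 2: regress Y on the fitted values X_hat
b0_2sls, b1_2sls = simple_ols(y, x_hat)

# With a single instrument, 2SLS reproduces the IV ratio
b1_iv = cov(y, z) / cov(x, z)
print(f"2SLS slope: {b1_2sls:.6f}, IV slope: {b1_iv:.6f}")
```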
Use 2SLS when there is endogeneity.


2SLS Standard Error:
• The standard errors reported by the second-stage OLS regression must be corrected: that regression computes residuals with $\hat X$, whereas the 2SLS residuals must use the original $X$, i.e. $\hat u_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i$.
• Under homoskedasticity, approximately: in OLS $\operatorname{var}(\hat\beta_1) = \dfrac{\sigma^2}{SST_X}$, but in 2SLS $\operatorname{var}(\hat\beta_1) = \dfrac{\sigma^2}{SST_X \cdot R^2_{X,Z}}$, in which $\sigma^2$ is the variance of the error term $u$ and $SST_X$ is the total variation in $X$.
• $R^2_{X,Z}$ is the $R^2$ from the regression of $X$ on $Z$. Because $R^2_{X,Z} < 1$ in practice, the variance of the 2SLS coefficient is larger than the variance of the OLS coefficient.
• The weaker the relationship between $X$ and $Z$, the lower $R^2_{X,Z}$ and the higher the variance of the 2SLS coefficient, which makes it less likely to be statistically significant.
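The sketch below illustrates why the second-stage standard errors need correcting. Under an assumed data-generating process, it computes the corrected 2SLS standard error $\sqrt{\hat\sigma^2/(SST_X \cdot R^2_{X,Z})}$ with residuals based on $X$, and the naive one that a plain second-stage regression would report, with residuals based on $\hat X$. It uses the identity $SST_{\hat X} = R^2_{X,Z} \cdot SST_X$, which holds for an OLS first stage with an intercept. The helper names are hypothetical.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def simple_ols(y, x):
    """OLS of y on a constant and x: returns (intercept, slope)."""
    slope = cov(y, x) / cov(x, x)
    return sum(y) / len(y) - slope * sum(x) / len(x), slope

def sst(a):
    """Total sum of squares around the mean."""
    m = sum(a) / len(a)
    return sum((ai - m) ** 2 for ai in a)

rng = random.Random(9)
n = 50_000
beta0, beta1 = 1.0, 2.0
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [1 + 0.8 * zi + 0.5 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

d0, d1 = simple_ols(x, z)                       # first stage
x_hat = [d0 + d1 * zi for zi in z]
b0, b1 = simple_ols(y, x_hat)                   # second stage

# R^2 of the first-stage regression of X on Z
r2_xz = 1 - sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat)) / sst(x)

# Corrected: 2SLS residuals use the ORIGINAL X
resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]
sigma2 = sum(r * r for r in resid) / (n - 2)
se_2sls = (sigma2 / (sst(x) * r2_xz)) ** 0.5

# Naive: second-stage residuals use X_hat (in this simulation they overstate sigma^2)
resid_naive = [yi - b0 - b1 * xh for yi, xh in zip(y, x_hat)]
sigma2_naive = sum(r * r for r in resid_naive) / (n - 2)
se_naive = (sigma2_naive / sst(x_hat)) ** 0.5

print(f"R2(X,Z) = {r2_xz:.3f}, corrected SE = {se_2sls:.5f}, naive SE = {se_naive:.5f}")
```

In this particular simulation the naive error is inflated by $\beta_1 (X - \hat X)$, but in general the naive standard error can be off in either direction.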
Testing for Endogeneity

• Multiple regression model
$$Y = \beta_0 + \beta_1 X + \beta_2 W + u, \quad \text{with } H_0: \operatorname{cov}(X, u) = 0, \quad H_1: \operatorname{cov}(X, u) \neq 0$$
• If $H_0$ is true, then $\hat\beta^{OLS}$ and $\hat\beta^{IV}$ are both consistent, and OLS is preferred because it is more efficient (BLUE under the Gauss-Markov assumptions).
• If $H_0$ is false, then OLS is inconsistent while 2SLS is consistent.
• Testing for endogeneity of a single explanatory variable:
  ◦ Regress the independent variable $X$ on the instrumental variable $Z$ and the exogenous independent variable $W$, and obtain the residuals $\hat v$.
  ◦ Add $\hat v$ to the structural equation (which includes $X$) and test the significance of $\hat v$ in an OLS regression. If the coefficient on $\hat v$ is statistically different from zero, we conclude that $X$ is indeed endogenous.
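A sketch of the regression-based endogeneity test, using only the Python standard library. The small `ols` helper (normal equations solved by Gauss-Jordan elimination, homoskedastic standard errors) is written here for illustration and is not a library API. The data-generating process and seed are assumptions, with $X$ built to be endogenous, so the $t$-statistic on $\hat v$ should be large.

```python
import random

def invert(m):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    k = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [val / p for val in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a]

def ols(y, regressors):
    """OLS of y on a constant plus regressors.
    Returns (coefficients, homoskedastic standard errors, residuals)."""
    n = len(y)
    rows = [[1.0] + [r[i] for r in regressors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * val for c, val in zip(coef, row)) for row, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [(s2 * inv[a][a]) ** 0.5 for a in range(k)]
    return coef, se, resid

rng = random.Random(1)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
w = [rng.gauss(0, 1) for _ in range(n)]   # exogenous regressor
u = [rng.gauss(0, 1) for _ in range(n)]   # structural error
# X loads on u, so it is endogenous by construction
x = [0.8 * zi + 0.5 * wi + 0.6 * ui + rng.gauss(0, 1) for zi, wi, ui in zip(z, w, u)]
y = [1 + 2 * xi + wi + ui for xi, wi, ui in zip(x, w, u)]

# Step 1: regress X on Z and W, keep the residuals v_hat
_, _, v_hat = ols(x, [z, w])
# Step 2: structural equation with X, W and v_hat; t-test on the v_hat coefficient
coef, se, _ = ols(y, [x, w, v_hat])
t_vhat = coef[3] / se[3]
print(f"coefficient on v_hat = {coef[3]:.3f}, t = {t_vhat:.1f}, coefficient on X = {coef[1]:.3f}")
```

A side effect worth noticing: the coefficient on $X$ in the augmented regression equals the 2SLS estimate, which is why this approach is also called the control-function approach.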
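Hausman Test
• The Hausman test compares $\hat\beta^{OLS}$ and $\hat\beta^{IV}$:
$$H_0: d = \hat\beta^{OLS} - \hat\beta^{IV} = 0 \qquad H_1: d = \hat\beta^{OLS} - \hat\beta^{IV} \neq 0$$
• Test statistic:
$$H = (\hat\beta^{IV} - \hat\beta^{OLS})' \left[\operatorname{var}(\hat\beta^{IV}) - \operatorname{var}(\hat\beta^{OLS})\right]^{-1} (\hat\beta^{IV} - \hat\beta^{OLS}) \sim \chi^2_k$$
where the degrees of freedom $k$ equal the number of coefficients being compared.
• If $H_0$ is not rejected, OLS is preferred (it is consistent and more efficient).
• If $H_0$ is rejected, $X$ is endogenous: OLS is inconsistent, and IV (which remains consistent) should be used.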
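With one endogenous regressor and attention on its coefficient, the matrix statistic reduces to $H = (\hat\beta^{IV} - \hat\beta^{OLS})^2 / [\operatorname{var}(\hat\beta^{IV}) - \operatorname{var}(\hat\beta^{OLS})]$, compared with $\chi^2_1$. The function below is a sketch of that scalar case; the function name and the input numbers are hypothetical. It uses the fact that for one degree of freedom the upper-tail probability is $P(\chi^2_1 > h) = \operatorname{erfc}(\sqrt{h/2})$.

```python
import math

def hausman_one_coef(b_iv, b_ols, se_iv, se_ols):
    """Hausman statistic for a single coefficient and its chi-square(1) p-value."""
    var_diff = se_iv ** 2 - se_ols ** 2
    if var_diff <= 0:
        raise ValueError("var(b_IV) - var(b_OLS) must be positive")
    h = (b_iv - b_ols) ** 2 / var_diff
    # For 1 degree of freedom: P(chi2 > h) = P(|N(0,1)| > sqrt(h)) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value

# Hypothetical estimates, not taken from any real data set
h, p = hausman_one_coef(b_iv=0.10, b_ols=0.06, se_iv=0.02, se_ols=0.01)
print(f"H = {h:.2f}, p-value = {p:.4f}")
```

With these made-up numbers, $H = 0.0016/0.0003 \approx 5.33$, above the 5% critical value 3.84, so exogeneity of $X$ would be rejected at the 5% level.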
Testing for instrumental variables

Testing the exogeneity of the instruments

Consider the model $Y = \beta_0 + \beta_1 X + u$, in which $X$ is the endogenous variable.
• An exogeneity test for the instrumental variables can be performed only if the number of instruments is strictly greater than the number of endogenous independent variables:
  ◦ If greater: the model is over-identified, and the extra (over-identifying) instruments can be tested
  ◦ If equal: the model is exactly identified, and instrument exogeneity cannot be tested
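The identification rule can be written as a tiny function. It is an illustrative sketch (the function name is hypothetical). It also returns $q$ = number of instruments minus number of endogenous variables, which is the number of over-identifying restrictions and the degrees of freedom of the Sargan test.

```python
from typing import Tuple

def identification(n_instruments: int, n_endogenous: int) -> Tuple[str, int]:
    """Classify identification; return (status, q = instruments - endogenous)."""
    q = n_instruments - n_endogenous
    if q > 0:
        return "over-identified", q        # q restrictions can be tested
    if q == 0:
        return "exactly identified", 0     # instrument exogeneity cannot be tested
    return "under-identified", q           # too few instruments: IV/2SLS cannot be computed

for p, k in [(2, 1), (1, 1), (1, 2)]:
    print(p, "instrument(s),", k, "endogenous:", identification(p, k))
```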
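Consider the model $Y = \beta_0 + \beta_1 X + u$, in which $\operatorname{cov}(X, u) \neq 0$ and $Z_1, Z_2$ are two instrumental variables.
• Way 1 (the Sargan test):
  ◦ Estimate the structural equation by 2SLS using the instruments $Z_1, Z_2$ and obtain the 2SLS residuals $\hat u$.
  ◦ Regress $\hat u$ on all exogenous variables:
$$\hat u = \alpha_0 + \alpha_1 Z_1 + \alpha_2 Z_2 + v$$
  ◦ Test the hypothesis $H_0: \alpha_1 = \alpha_2 = 0$ using the statistic $nR^2$ from this regression, which is asymptotically $\chi^2_q$ under $H_0$, where $q$ is the number of instruments minus the number of endogenous variables (here $q = 1$). If the p-value is small, at least some of the instruments $Z_1, Z_2$ are not exogenous.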
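A sketch of the Sargan $nR^2$ statistic with one endogenous regressor and two instruments ($q = 1$), with a hand-written `ols` helper (not a library API). The two simulated cases are hypothetical: in one, both instruments are exogenous; in the other, $Z_2$ is built to be correlated with $u$, so the test should reject.

```python
import math
import random

def invert(m):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    k = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [val / p for val in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a]

def ols(y, regressors):
    """OLS of y on a constant plus regressors; returns (coefficients, residuals)."""
    rows = [[1.0] + [r[i] for r in regressors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * val for c, val in zip(coef, row)) for row, yi in zip(rows, y)]
    return coef, resid

def sargan(y, x, z1, z2):
    """Sargan n*R^2 statistic and chi-square(1) p-value (2 instruments, 1 endogenous X)."""
    n = len(y)
    d, _ = ols(x, [z1, z2])                                    # first stage
    x_hat = [d[0] + d[1] * a + d[2] * b for a, b in zip(z1, z2)]
    (b0, b1), _ = ols(y, [x_hat])                              # second stage
    u_hat = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]        # 2SLS residuals use X
    _, e = ols(u_hat, [z1, z2])                                # regress u_hat on instruments
    mean_u = sum(u_hat) / n
    r2 = 1 - sum(ei * ei for ei in e) / sum((ui - mean_u) ** 2 for ui in u_hat)
    stat = max(n * r2, 0.0)                                    # guard against tiny negative rounding
    return stat, math.erfc(math.sqrt(stat / 2))

def simulate(invalid, seed, n=10_000):
    """Hypothetical data: Y = 1 + 2X + u; Z2 is contaminated by u when invalid=True."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) + (0.5 * ui if invalid else 0.0) for ui in u]
    x = [0.7 * a + 0.7 * b + 0.5 * ui + rng.gauss(0, 1) for a, b, ui in zip(z1, z2, u)]
    y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]
    return y, x, z1, z2

s_ok, p_ok = sargan(*simulate(invalid=False, seed=3))
s_bad, p_bad = sargan(*simulate(invalid=True, seed=3))
print(f"valid instruments:   S = {s_ok:.2f}, p = {p_ok:.3f}")
print(f"invalid instruments: S = {s_bad:.2f}, p = {p_bad:.3g}")
```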
• Way 2 (the Hausman test):
  ◦ Estimate the structural equation by 2SLS using only the instrument $Z_1$ and obtain the coefficient $\hat\beta_1$.
  ◦ Estimate the structural equation by 2SLS using only the instrument $Z_2$ and obtain the coefficient $\hat\beta_1'$.
  ◦ Test the hypothesis $H_0: d = \hat\beta_1 - \hat\beta_1' = 0$. If the p-value is small, at least some of the instruments $Z_1, Z_2$ are not exogenous.
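The idea behind Way 2 can be seen by computing the two just-identified IV estimates side by side. In this hypothetical simulation $Z_1$ is exogenous and $Z_2$ is built with $\operatorname{cov}(Z_2, u) = 0.5$ and $\operatorname{cov}(X, Z_2) = 1.125$, so the estimate from $Z_1$ should be near the true value 2, while the estimate from $Z_2$ should drift toward $2 + 0.5/1.125 \approx 2.44$. A formal test also needs the variances of the two estimates.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def iv_slope(y, x, z):
    """Just-identified IV slope with a single instrument z."""
    return cov(y, z) / cov(x, z)

rng = random.Random(8)
n = 20_000
z1 = [rng.gauss(0, 1) for _ in range(n)]        # exogenous instrument
u = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) + 0.5 * ui for ui in u]   # assumed invalid: cov(Z2, u) = 0.5
x = [0.7 * a + 0.7 * b + 0.5 * ui + rng.gauss(0, 1) for a, b, ui in zip(z1, z2, u)]
y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]

b_z1 = iv_slope(y, x, z1)   # consistent for the true slope 2
b_z2 = iv_slope(y, x, z2)   # tends to 2 + cov(Z2, u) / cov(X, Z2)
print(f"IV with Z1: {b_z1:.3f}, IV with Z2: {b_z2:.3f}")
```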
Testing for weak instruments

• The instrumental variable $Z$ is called a strong instrument if it is highly correlated with the endogenous independent variable, and a weak instrument otherwise.
• Regress the endogenous variable $X$ on the instrumental variables $Z_1, Z_2$ (the first-stage regression):
$$X = \delta_0 + \delta_1 Z_1 + \delta_2 Z_2 + v$$
• Use the F-statistic for the joint significance of $Z_1, Z_2$ ($H_0: \delta_1 = \delta_2 = 0$) to decide whether the instruments are weak. As a rule of thumb, if F is less than 10, the instruments $Z_1, Z_2$ are considered weak.
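The first-stage F-statistic can be computed from $R^2$ values. This sketch uses the standard $R^2$ form of the F-test for $q$ exclusion restrictions; the function name and the numbers ($n = 500$, $R^2 = 0.05$ or $0.02$) are hypothetical. With no exogenous $W$ in the first stage, the restricted $R^2$ is 0.

```python
def first_stage_f(r2_ur, r2_r, n, q, k_ur):
    """F-statistic for H0: the q excluded instruments all have zero coefficients.

    r2_ur: R^2 of the unrestricted first stage (instruments + exogenous regressors)
    r2_r:  R^2 of the restricted first stage (exogenous regressors only; 0 if none)
    n:     sample size
    q:     number of instruments being tested
    k_ur:  number of slope coefficients in the unrestricted first stage
    """
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k_ur - 1))

# Hypothetical first stage X = d0 + d1*Z1 + d2*Z2 + v with n = 500 observations
f_a = first_stage_f(r2_ur=0.05, r2_r=0.0, n=500, q=2, k_ur=2)   # about 13.1
f_b = first_stage_f(r2_ur=0.02, r2_r=0.0, n=500, q=2, k_ur=2)   # about 5.1
for f in (f_a, f_b):
    print(f"F = {f:.2f}:", "weak instruments (F < 10)" if f < 10 else "not weak by the F > 10 rule")
```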
The general model with endogenous variables

$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \beta_{k+1} W_1 + \dots + \beta_{k+m} W_m + u \quad (*)$$
in which $X_1, \dots, X_k$ are the endogenous variables, $W_1, \dots, W_m$ are the exogenous variables, and $Z_1, \dots, Z_p$ ($p \geq k$) are the instrumental variables.
Step 1: Regress $X_1$ on all the instrumental variables $Z_1, \dots, Z_p$ and all the exogenous independent variables $W_1, \dots, W_m$ by OLS, and store the fitted values as $\hat X_1$. Repeat for the remaining endogenous independent variables, saving the fitted values $\hat X_1, \hat X_2, \dots, \hat X_k$.
Step 2: Estimate equation (*) by OLS with the exogenous independent variables $W_1, \dots, W_m$, replacing the endogenous independent variables $X_1, \dots, X_k$ with the fitted values $\hat X_1, \hat X_2, \dots, \hat X_k$.
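For reference, the two steps can be written compactly in matrix notation. This is a standard restatement with notation introduced here, not in the slides: $\mathbf X$ stacks a column of ones, $X_1, \dots, X_k$ and $W_1, \dots, W_m$; $\mathbf Z$ stacks a column of ones, $Z_1, \dots, Z_p$ and $W_1, \dots, W_m$; and $P_Z = \mathbf Z(\mathbf Z'\mathbf Z)^{-1}\mathbf Z'$. Because each $W_j$ is a column of $\mathbf Z$, its first-stage fitted value is $W_j$ itself, which matches Step 2. The last equality uses $P_Z' = P_Z$ and $P_Z P_Z = P_Z$.

```latex
\hat{\mathbf X} = P_Z \mathbf X, \qquad
\hat{\boldsymbol\beta}_{2SLS}
  = (\hat{\mathbf X}'\hat{\mathbf X})^{-1}\hat{\mathbf X}'\mathbf Y
  = (\mathbf X' P_Z \mathbf X)^{-1}\mathbf X' P_Z \mathbf Y
```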
Command on Stata

$$Y = \beta_0 + \beta_1 X + \beta_2 W + u \quad (*)$$
where $X$ is the endogenous variable, $W$ is the exogenous variable, and $Z$ is the instrumental variable.
• Estimate by 2SLS:
ivregress 2sls Y W (X = Z)
ivregress 2sls Y W (X = Z), small (small-sample t and F statistics, useful when the sample is small)
ivregress 2sls Y W (X = Z), vce(robust) small (heteroskedasticity-robust standard errors, when the error variance is not constant)
• Test the endogeneity of the variable X:
  ◦ Way 1:
reg X Z W
predict Vhat, residuals
reg Y X W Vhat
If the p-value of Vhat is small, X is endogenous.
  ◦ Way 2:
reg Y X W
est store ls
ivregress 2sls Y W (X = Z)
est store iv
hausman iv ls, constant sigmamore
If the p-value is small, X is endogenous.
Now suppose there are two instrumental variables, $Z_1$ and $Z_2$, for $X$.
• Test the exogeneity of the instrumental variables $Z_1, Z_2$:
  ◦ Way 1:
ivregress 2sls Y W (X = Z1 Z2), small
estat overid
If the p-value is small, at least some of the instruments Z1, Z2 are not exogenous.
  ◦ Way 2:
ivregress 2sls Y W (X = Z1), small
est store z1
ivregress 2sls Y W (X = Z2), small
est store z2
hausman z1 z2, constant sigmamore
If the p-value is small, at least some of the instruments Z1, Z2 are not exogenous.
• Test whether Z1, Z2 are weak instruments:
ivregress 2sls Y W (X = Z1 Z2), small
estat firststage
If the first-stage F-statistic is less than 10, the instruments Z1, Z2 are weak.
If the first-stage F-statistic is greater than 10, the instruments Z1, Z2 are considered strong.
Practice
