Fixed Effects and DiD Methods Explained

The document discusses evidence-based policy analysis methods, focusing on Fixed Effects, Difference-in-Differences, and Synthetic Control Method. It explains the advantages of panel data, the fixed effects model, and the identification assumptions necessary for causal interpretation. Additionally, it provides examples, such as estimating the union wage premium and historical applications like John Snow's cholera study, to illustrate these econometric techniques.


Evidence-Based Policy Analysis

Topic 6: Fixed Effects, Difference-in-Differences and


Synthetic Control Method

Prof. Dr. Michael Kvasnicka


Chair of Applied Economics

Otto-von-Guericke-Universität Magdeburg

1 / 80
Outline

1. Introduction

2. Fixed Effects Model

3. Difference-in-Differences

4. Synthetic Control Method

Readings

2 / 80
1. Introduction
A Brief Review of Panel Data and Fixed-Effects Models

I What is panel data?


I A cross-section (of people, countries, etc.) is observed over time.
I Panel data provide observations on the same units in several time
periods (unlike independently pooled cross sections).
I Panel data often consist of a very large number of cross section units
that are observed over a small number of time periods.

I Advantages of panel data:


I examine issues that cannot be studied using either time series or
cross-sectional data.
I deal with unobserved time-invariant heterogeneity in the micro units
(omitted variable bias!).
I analyze dynamics with only a short time series.
I increase the efficiency of estimation.

3 / 80
1. Introduction
Example: Estimating the Union Wage Premium

I Research question: Does union membership increase wages?

I Suppose that we estimate the following regression:

\[ Y_{it} = \beta_0 + \beta_1 D_{it} + X_{it}'\beta_2 + \varepsilon_{it} \]

where:

$Y_{it}$ = wage rate
$D_{it}$ = 1 if individual i is a member of a union in year t, 0 otherwise
$X_{it}$ = vector of (exogenous and time-varying) control variables

4 / 80
1. Introduction
Example: Estimating the Union Wage Premium

I Suppose, however, that the true model reads:

\[ Y_{it} = \gamma_0 + \gamma_1 D_{it} + X_{it}'\gamma_2 + \gamma_3 M_i + \varepsilon_{it} \]

where Mi is an unobserved and time-invariant individual-specific


characteristic (e.g. ability, motivation).

I Mi is called an (individual-specific) fixed effect.

I Note, if:
1. M has an effect on Y , and
2. M and D are correlated,
⇒ we face classical omitted variable bias (see Topic 3).
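
The two conditions can be read off the standard omitted variable bias formula. The following is a sketch that ignores the controls X for simplicity (an assumption made here, not part of the slide): regressing Y on D alone, the OLS coefficient converges to the true effect plus a bias term.

```latex
\[
\operatorname{plim}\,\hat{\beta}_1
  = \gamma_1 + \gamma_3 \,
    \frac{\operatorname{Cov}(D_{it}, M_i)}{\operatorname{Var}(D_{it})}
\]
```

The bias term vanishes if M has no effect on Y ($\gamma_3 = 0$) or if M and D are uncorrelated, which matches conditions 1 and 2.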

5 / 80
2. Fixed Effects Model

I With panel data, the union effect can be consistently estimated


without actually observing the fixed effect (FE)!

I The fixed effects model uses a transformation to remove the


unobserved fixed effect (prior to estimation).
I There are 3 approaches to estimate the FE model:
1. Time-Demeaning (FE/Within Estimator, Analysis of Covariance)
= estimate in deviations from the mean
2. First-Differencing (First Differences Estimator)
= estimate in first differences
3. Dummy Variables Regression (Dummy Variables Estimator)
= include a dummy for every cross-sectional unit i
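
As a sketch of why the first two transformations remove the fixed effect, start from the model with $M_i$ introduced earlier. Bars denote averages over time for individual i and $\Delta$ denotes the change between t-1 and t; this notation is added here for illustration.

```latex
\begin{align*}
\text{Model:} \quad
  & Y_{it} = \gamma_0 + \gamma_1 D_{it} + X_{it}'\gamma_2 + \gamma_3 M_i + \varepsilon_{it} \\
\text{Individual mean:} \quad
  & \bar{Y}_i = \gamma_0 + \gamma_1 \bar{D}_i + \bar{X}_i'\gamma_2 + \gamma_3 M_i + \bar{\varepsilon}_i \\
\text{Time-demeaning:} \quad
  & Y_{it} - \bar{Y}_i = \gamma_1 (D_{it} - \bar{D}_i) + (X_{it} - \bar{X}_i)'\gamma_2
    + (\varepsilon_{it} - \bar{\varepsilon}_i) \\
\text{First-differencing:} \quad
  & \Delta Y_{it} = \gamma_1 \Delta D_{it} + \Delta X_{it}'\gamma_2 + \Delta \varepsilon_{it}
\end{align*}
```

In both transformed equations the term $\gamma_3 M_i$ has dropped out, so $\gamma_1$ can be estimated by OLS without observing $M_i$.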

6 / 80
2. Fixed Effects Model
Which FE Transformation to Use?

I Which fixed effect estimator should one choose?


I if T = 2, FE, FD and Dummy Variables Estimators are identical.
I if T > 2, the preferred estimator depends on the assumptions about
the errors εit :
In practice, it is more common to assume that the errors εit (and not
their differences ∆εit ) are serially uncorrelated.
In this case, FE is efficient and hence more appropriate than FD
(if instead the differences ∆εit are serially uncorrelated, i.e. εit
follows a random walk, FD is efficient and the better choice).
I if one is interested in the FE (which sometimes is the case), one
would want to use the Dummy Variables Estimator.
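
The T = 2 equivalence can be checked numerically. The sketch below uses a made-up four-worker panel (all numbers are hypothetical) in which the true union effect is 2 and workers with higher fixed effects are more often union members. It compares the within estimator, the first-difference estimator (both without a constant) and pooled OLS.

```python
# Toy panel, invented for illustration: worker -> [(union_t1, wage_t1), (union_t2, wage_t2)].
# Wages are generated as 2 * union + M_i with M_A = 10, M_B = 18, M_C = 5, M_D = 23.
panel = {
    "A": [(0, 10.0), (1, 12.0)],  # joins the union
    "B": [(1, 20.0), (0, 18.0)],  # leaves the union
    "C": [(0, 5.0), (0, 5.0)],    # never a member
    "D": [(1, 25.0), (1, 25.0)],  # always a member
}


def slope(xs, ys):
    """Least-squares slope through the origin: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def demean(values):
    """Subtract the mean of the list from each element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


d_within, y_within = [], []   # deviations from individual means
d_diff, y_diff = [], []       # first differences
d_all, y_all = [], []         # raw levels for pooled OLS

for observations in panel.values():
    d = [obs[0] for obs in observations]
    y = [obs[1] for obs in observations]
    d_within += demean(d)
    y_within += demean(y)
    d_diff.append(d[1] - d[0])
    y_diff.append(y[1] - y[0])
    d_all += d
    y_all += y

fe_estimate = slope(d_within, y_within)
fd_estimate = slope(d_diff, y_diff)
# Pooled OLS with an intercept equals the slope in overall-demeaned data.
pooled_estimate = slope(demean(d_all), demean(y_all))

print(fe_estimate, fd_estimate, pooled_estimate)
```

In this toy data only workers A and B (the switchers) contribute to the FE and FD estimates, and pooled OLS is far from the true effect because it also picks up the fixed effects.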

7 / 80
2. Fixed Effects Model
Estimation of FE Models in Stata

1. Fixed Effects Estimator:


xtreg Y D X, fe

2. First Difference Estimator:


reg D.(Y D X)

3. Dummy Variable Estimator:


* id denotes the unit identifier (variable name assumed)
xi: reg Y D X i.id

8 / 80
2. Fixed Effects Model
Interpretation of FE Models

I Estimates of γ1 are identified from within-unit changes in treatment (D):


I each unit i serves as its own control group
I empirically, need within-unit variation / only ’switchers’ matter

I Controls for (un-)observed time-invariant heterogeneity:


I removes a large number of potential time-invariant confounders
I relies on treatment entering the model linearly and additively

I However, time-varying unobserved heterogeneity (omitted variables)


may also matter:
I what causes the change in treatment?
I may this have an impact on the outcome?

9 / 80
2. Fixed Effects Model
Identifying Assumption

I Conditional Independence Assumption, CIA:


Conditional on removing the (1) time-invariant FEs, and (2)
time-varying observables X , treatment is independent of potential
outcomes:

\[ (Y_{0it}, Y_{1it}) \perp D_{it} \mid M_i, X_{it} \]

i.e., it holds that:

\[ E[Y_{0it} \mid M_i, X_{it}, D_{it} = 1] = E[Y_{0it} \mid M_i, X_{it}, D_{it} = 0] = E[Y_{0it} \mid M_i, X_{it}] \]
\[ E[Y_{1it} \mid M_i, X_{it}, D_{it} = 1] = E[Y_{1it} \mid M_i, X_{it}, D_{it} = 0] = E[Y_{1it} \mid M_i, X_{it}] \]

10 / 80
2. Fixed Effects Model
Identifying Assumption

I Comparisons of avg. outcomes by treatment status then have a


causal interpretation:

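\begin{align*}
  & E[Y_{it} \mid M_i, X_{it}, D_{it} = 1] - E[Y_{it} \mid M_i, X_{it}, D_{it} = 0] \\
  &= E[Y_{1it} \mid M_i, X_{it}, D_{it} = 1] - E[Y_{0it} \mid M_i, X_{it}, D_{it} = 0] \\
  &= E[Y_{1it} \mid M_i, X_{it}] - E[Y_{0it} \mid M_i, X_{it}] \\
  &= E[Y_{1it} - Y_{0it} \mid M_i, X_{it}]
\end{align*}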

11 / 80
2. Fixed Effects Model
Example: Estimating the Union Wage Premium (ctd.)

I Freeman (1984)¹ analyzed union wage effects by comparing OLS
and FE models for a number of datasets:

Survey                                               OLS estimate   FE estimate
May CPS, 1974-75                                     .19            .09
National Longitudinal Survey of Young Men, 1970-78   .28            .19
Michigan PSID, 1970-79                               .23            .14
QES, 1973-77                                         .14            .16
Note: The cross section estimates include controls for demographic and human capital variables.

I The fact that the FE estimates are generally smaller points to a


positive selection bias in the cross section OLS estimates.
¹ Richard B. Freeman (1984), "Longitudinal Analysis of the Effect of Trade Unions", Journal of Labor Economics, 2(1), 1-26.
12 / 80
2. Fixed Effects Model
Caveats of the FE Estimator

I The FE estimator (within estimator) uses only the time variation


within cross-sectional units i.

I As a consequence:
I it discards variation across cross-sections (between variation).
I it does not allow us to estimate the coefficients of time-invariant
regressors (gender, education, ...).
I it does not solve the problem of time-varying omitted variables.
I demeaned regressors may be more susceptible to measurement error.

13 / 80
2. Fixed Effects Model
Caveats of the FE Estimator: Measurement Error

I If OLS estimates > FE estimates, then this may, but need not,
indicate selection bias.

I Alternative: attenuation bias from measurement error (ME)


I economic variables are often persistent ("once a member, always a
member"). ME, however, often changes from year to year (union
status may be misreported/miscoded this year but not next year).
I so, while union status may be misreported/miscoded for only a few
workers in any single year, the observed year-to-year changes in union
status may be mostly noise. I.e., there is more ME in the differenced/
demeaned regressors in a FE regression than in the levels of the
regressors used in an OLS regression.
⇒ as the signal to noise ratio is smaller with FE (as we just use the
deviations from the mean as signal), ME is typically a more important
problem in FE models.
14 / 80
2. Fixed Effects Model
Caveats of the FE Estimator: Measurement Error (An Example)

I Suppose that we have data on two individuals:

Union Status [Actual (Measured)]

Individual   2010   2011   2012   2013
1            1      1      1(0)   1
2            0      0      0      0

I If we ran a pooled OLS regression of wages on union status, 1/8th


of the observations would be mismeasured.

I Suppose you estimate instead a FE model, so identification comes


from changes in union status within individuals:
I individual 2 does not contribute to the FE estimation.
I the variation in individual 1’s union status is only measurement error
⇒ 100% of the variation is noise!
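
The two-individual example can be reproduced in a few lines. This is a minimal sketch: it encodes the table above and compares the share of mismeasured observations in levels with the within-individual variation that an FE regression would use.

```python
# Union status by year (2010-2013), copied from the table above.
actual = {1: [1, 1, 1, 1], 2: [0, 0, 0, 0]}
measured = {1: [1, 1, 0, 1], 2: [0, 0, 0, 0]}  # individual 1 is miscoded in 2012


def within_sum_of_squares(panel):
    """Sum of squared deviations from each individual's own mean."""
    total = 0.0
    for values in panel.values():
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total


n_obs = sum(len(values) for values in actual.values())
n_wrong = sum(
    a != m for person in actual for a, m in zip(actual[person], measured[person])
)
share_mismeasured = n_wrong / n_obs  # mismeasurement in levels (pooled OLS)

true_within = within_sum_of_squares(actual)        # nobody truly switches
observed_within = within_sum_of_squares(measured)  # stems entirely from the 2012 error

print(share_mismeasured, true_within, observed_within)
```

Only one observation in eight is wrong in levels, yet the true within variation is zero, so all of the within variation in the measured data is produced by the single coding error.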
15 / 80
2. Fixed Effects Model
Extension: Two-Way-Effects Model

I A standard extension of the fixed effects model is to allow variation


in the intercept not only over individuals but also over time:

\[ Y_{it} = \gamma_0 + \gamma_1 D_{it} + X_{it}'\gamma_2 + \gamma_3 M_i + \lambda_t + \varepsilon_{it} \]

I The time effects λt shift the intercept over time and affect all
micro-units uniformly.

I Examples:
I business cycle movements
I common trend in wages
I ...

I Usually, a full set of time dummies is included among the controls.
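
Instead of including time dummies, both sets of effects can be removed by a two-way within transformation. The sketch below assumes a balanced panel (every unit observed in every period); the double-dot and bar notation is introduced here for illustration.

```latex
\begin{align*}
\ddot{Y}_{it} &= Y_{it} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot t} + \bar{Y}_{\cdot\cdot} \\
\ddot{Y}_{it} &= \gamma_1 \ddot{D}_{it} + \ddot{X}_{it}'\gamma_2 + \ddot{\varepsilon}_{it}
\end{align*}
```

Here $\bar{Y}_{i\cdot}$ is the average over time for unit i, $\bar{Y}_{\cdot t}$ the average over units in period t, and $\bar{Y}_{\cdot\cdot}$ the overall average. Both $\gamma_3 M_i$ and $\lambda_t$ are eliminated by the transformation.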

16 / 80
2. Fixed Effects Model
Examples of Special Panels (Households, Twins)

I Some important panels used in labor and family economics consider


multiple individuals from the same households.

I Example I: Siblings within a family


I can control for time-invariant family FE (or mother FE)

I Example II: Twins (natural experiment)


I can control for time-invariant individual ability

17 / 80
3. Difference-in-Differences

"Difference-in-differences methods have been an important


tool for empirical researchers since the early 1990s."
Athey and Imbens (2017)², p. 9

I FE strategy requires panel data (repeat obs. on same units).


I Often, however, the regressor of interest varies only at a more
aggregate or group level, such as state or cohort:
I e.g., state policies regarding health care benefits for pregnant workers
may change over time but are fixed across workers within states.
I the source of OVB when evaluating these policies must therefore be
unobserved variables at the state and year level.
I the difference-in-differences (DiD) identification strategy captures
group-level omitted variables by group-level FE.
² Athey, S. and G. W. Imbens (2017), "The State of Applied Econometrics: Causality and Policy Evaluation", Journal of Economic Perspectives, 31(2), 3-32.
18 / 80
3. Difference-in-Differences

I The DiD estimator is a version of FE estimation, in which the


time-invariant factor applies to a group rather than to an individual.

I DiD strategies are simple panel-data methods applied to sets of


group means in cases when certain groups are exposed to a causal
variable of interest (treatment) and others are not.

⇒ Intuitive and transparent approach widely used to estimate the effect


of changes in the economic environment or changes in government
policy!

19 / 80
3. Difference-in-Differences
In a Nutshell

I Idea:
I observed changes over time for the control group provide the
counterfactual trend for the treatment group.

I Main selling point:


I DiD accounts for unobserved differences b/w treatment and control
group as long as they are constant over time.
I DiD estimator allows us to control for unobserved time-invariant
factors that may induce selection bias.

20 / 80
3. Difference-in-Differences
In a Nutshell

I Identifying Assumption:
I outcomes in treatment and control move in parallel in the absence of
treatment (Common Trend Assumption).

! Note: Identification relies on exogenous variation that alters


treatment status.

! Note: Even if pre-trends are the same, one still has to worry about
other policies changing at the same time (”contemporaneous
shocks”).

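I The DiD estimator measures the ATT (the average treatment effect on the
treated); it coincides with the ATE if, for example, treatment effects
are homogeneous.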

21 / 80
3. Difference-in-Differences
In a Nutshell

I Main steps:
I collect data on treated and non-treated before and after treatment.
I compute the differences in the outcome for treated and non-treated
before and after the treatment.
I treatment effect = difference b/w these two differences.
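
Written as a formula, the steps above amount to the following, where $\bar{Y}$ denotes a group average and the subscripts T and C (treated, control) and pre and post are labels introduced here for illustration:

```latex
\begin{align*}
\delta_{DD}
  &= (\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}})
   - (\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}) \\
  &= (\bar{Y}_{T,\text{post}} - \bar{Y}_{C,\text{post}})
   - (\bar{Y}_{T,\text{pre}} - \bar{Y}_{C,\text{pre}})
\end{align*}
```

The second line is a rearrangement of the first: differencing over time first and across groups second gives the same number as the reverse order.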

22 / 80
3. Difference-in-Differences
Example: The Causes of Cholera

I In the early 19th century, it was widely believed that cholera was
caused by miasma (i.e., that it was an airborne disease).
I However, physician John Snow, who was to become the father of
modern epidemiology, correctly believed that it was waterborne.
I In his study on cholera, Snow (1855)³ pioneered the DiD idea:
I Snow studied districts of South London in which 2 private
companies supplied households with water (Lambeth,
Southwark & Vauxhall).
I until 1852: both companies drew their water from a downstream
stretch of the River Thames that was contaminated with sewage from London.
I in 1852: Lambeth moved its water works upriver to an area relatively
free of sewage.
³ John Snow (1855), On the Mode of Communication of Cholera, London: John Churchill, New Burlington Street, England.
23 / 80
3. Difference-in-Differences
Example: The Causes of Cholera

This map was used by Dr. Snow to describe the ”grand experiment” of 1854 comparing cholera mortality among persons
consuming downstream contaminated water (Southwark and Vauxhall Company - blue, but faded to green on the map) versus
upstream cleaner water of the Thames (Lambeth Company - red). The overlapping area (purple but faded to gray-red on the
map) is where John Snow analyzed the results of the natural experiment. [Source: Map 2 in John Snow’s book.]

24 / 80
3. Difference-in-Differences
Example: The Causes of Cholera

I Death rates per 10,000 residents by water company:

                       1849   1853/54   Difference
Lambeth                150    10        -140
Southwark & Vauxhall   125    150       25
Difference             25     -140      -165

⇒ Calculate:
Pre-exper. difference: 150 (treated) - 125 (control) = 25
Post-exper. difference: 10 (treated) - 150 (control) = -140
Difference-in-differences: -140 (post) - 25 (pre) = -165

I Idea:
I pre-exper. difference captures normal difference
I post-exper. difference captures normal difference plus causal effect!
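
The calculation can be sketched in code using the death rates from the table above. The function and variable names are chosen here for illustration. The two functions difference in opposite orders and return the same result.

```python
# Death rates per 10,000 residents, taken from the table above.
rates = {
    ("Lambeth", "pre"): 150,
    ("Lambeth", "post"): 10,
    ("Southwark & Vauxhall", "pre"): 125,
    ("Southwark & Vauxhall", "post"): 150,
}


def did_over_time(rates, treated, control):
    """Change over time for the treated minus change over time for the controls."""
    change_treated = rates[(treated, "post")] - rates[(treated, "pre")]
    change_control = rates[(control, "post")] - rates[(control, "pre")]
    return change_treated - change_control


def did_across_groups(rates, treated, control):
    """Post-period gap between groups minus pre-period gap between groups."""
    gap_post = rates[(treated, "post")] - rates[(control, "post")]
    gap_pre = rates[(treated, "pre")] - rates[(control, "pre")]
    return gap_post - gap_pre


effect_time = did_over_time(rates, "Lambeth", "Southwark & Vauxhall")
effect_groups = did_across_groups(rates, "Lambeth", "Southwark & Vauxhall")
print(effect_time, effect_groups)
```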
25 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

I Banks failed throughout the Great Depression.

I Their demise contributed to the disruption of financial


intermediation.

I Research question: Could a more expansionary monetary policy have


mitigated banking panics?⁴

⁴ Gary Richardson and William Troost (2009), "Monetary Intervention Mitigated Banking Panics during the Great Depression: Quasi-Experimental Evidence from a Federal Reserve District Border, 1929-1933", Journal of Political Economy, 117(6), 1031-1073.
26 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

I US Federal Reserve System is organized into 12 districts, each run


by a regional Federal Reserve Bank.

I Depression-era heads of regional Feds had considerable policy


independence:
I Atlanta Fed (6th district) favoured lending to troubled banks.
I St Louis Fed (8th district) restricted credit in a recession.

I Border between 6th and 8th districts runs through the state of
Mississippi.

27 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

I Treatment: ’Easy money’


I Outcomes: Number of banks (economic activity)
I Treatment group: 6th district (increased lending)
I Control group: 8th district (restricted lending)

28 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

I Naive evaluation I: Compare outcomes of treatment group before


and after the crisis:

\[ Y_{6,1931} - Y_{6,1930} = 121 - 135 = -14 \]

I Naive evaluation II: Compare outcomes of treatment and control


groups after the crisis.

\[ Y_{6,1931} - Y_{8,1931} = 121 - 132 = -11 \]

I These naive comparisons (what’s wrong with them?) are inputs into
the difference-in-differences (DiD) estimator.

29 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

⇒ Difference-in-differences:

1. Compare change in outcomes b/w treatment and control group:

\begin{align*}
\delta_{DD} &= (Y_{6,1931} - Y_{6,1930}) - (Y_{8,1931} - Y_{8,1930}) \\
            &= (121 - 135) - (132 - 165) \\
            &= -14 - (-33) = 19
\end{align*}

2. Compare post-treatment difference in outcomes to the


pre-treatment difference in outcomes:

\begin{align*}
\delta_{DD} &= (Y_{6,1931} - Y_{8,1931}) - (Y_{6,1930} - Y_{8,1930}) \\
            &= (121 - 132) - (135 - 165) \\
            &= -11 - (-30) = 19
\end{align*}

30 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

⇒ Common trend assumption:

I Absent any policy differences, number of banks in the 8th and 6th
district would have evolved in parallel.
I Under this assumption, the trend in 8th district is informative about
what would have happened in 6th district without easy money.
I This assumption cannot be tested (why?).
I But we can check its plausibility by comparing pre-treatment trends
b/w treatment and control.

31 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression
[Figure 5.1: Bank failures in the Sixth and Eighth Federal Reserve Districts. Vertical axis: number of banks in business; horizontal axis: year (1929-1932). Labels: Eighth District, Sixth District, Sixth District counterfactual, Treatment effect.]

Notes: This figure shows the number of banks in operation in Mississippi in
the Sixth and Eighth Federal Reserve Districts in 1930 and 1931. The dashed
line depicts the counterfactual evolution of the number of banks in the Sixth
District if the same number of banks had failed in that district in this period
as did in the Eighth.
From Mastering 'Metrics: The Path from Cause to Effect. © 2015 Princeton University Press. Used by permission. All rights reserved.

! Note: DiD attributes any difference in trends between treatment and control
that occurs at the time of the intervention to that intervention.

32 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression
[Figure 5.2: Trends in bank failures in the Sixth and Eighth Federal Reserve Districts. Vertical axis: number of banks in business; horizontal axis: year (1929-1934). Labels: Eighth District, Sixth District.]

Note: This figure shows the number of banks in operation in Mississippi in
the Sixth and Eighth Federal Reserve Districts between 1929 and 1934.
From Mastering 'Metrics: The Path from Cause to Effect. © 2015 Princeton University Press. Used by permission. All rights reserved.

33 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression
[Figure 5.3: Trends in bank failures in the Sixth and Eighth Federal Reserve Districts, and the Sixth District's DD counterfactual. Vertical axis: number of banks in business; horizontal axis: year (1929-1934). Labels: Eighth District, Sixth District, Sixth District counterfactual.]

Notes: This figure adds DD counterfactual outcomes to the banking data
plotted in Figure 5.2. The dashed line depicts the counterfactual evolution of
the number of banks in the Sixth District if the same number of banks had
failed in that district after 1930 as did in the Eighth.
From Mastering 'Metrics: The Path from Cause to Effect. © 2015 Princeton University Press. Used by permission. All rights reserved.

34 / 80


3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

I We can estimate the DiD estimator in a regression framework.

I Advantages:
1. easy to calculate standard errors
2. we can add additional control variables:
I to increase precision by reducing residual variance
I to control for observable group-specific trends

3. easy to extend to multiple periods


4. we can study treatments with different treatment intensity (rather
than binary treatment)

35 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

I The simple DiD regression is as follows:

\[ Y_{it} = \alpha + \beta\, TREAT_i + \gamma\, POST_t + \delta_{rDD}\, (TREAT_i \times POST_t) + e_{it} \]

where:
I Yit : Outcome of interest for unit i at time t.
I TREATi : Dummy for (units in the) treatment group.
I POSTt : Dummy for observations from the post-treatment period.
I TREATi × POSTt : Interaction term, indicates observations in
treatment group from post-treatment period.

36 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

I The simple DiD regression is as follows:

\[ Y_{dt} = \alpha + \beta\, TREAT_d + \gamma\, POST_t + \delta_{rDD}\, (TREAT_d \times POST_t) + e_{dt} \]

where:
I Ydt : number of banks in district d at time t.
I TREATd : dummy for 6th district.
I POSTt : dummy for observations from 1931 onwards.
I TREATd × POSTt : interaction term, indicates observations for 6th
district from 1931 onwards.

37 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

I The simple DiD regression is as follows:

\[ Y_{dt} = \alpha + \beta\, TREAT_d + \gamma\, POST_t + \delta_{rDD}\, (TREAT_d \times POST_t) + e_{dt} \]

where:
I Ydt : number of banks in district d at time t.
I TREATd : controls for time-constant differences b/w 6th and 8th
district.
I POSTt : controls for changes in the post-treatment period common
to control and treatment group.
I TREATd × POSTt : treatment dummy; δrDD is coefficient of interest.

38 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

I The simple DiD regression is as follows:

\[ Y_{dt} = \alpha + \beta\, TREAT_d + \gamma\, POST_t + \delta_{rDD}\, (TREAT_d \times POST_t) + e_{dt} \]

I Illustration of DiD regression:

                              1930    1931                1931-1930
6th district                  α + β   α + β + γ + δrDD    γ + δrDD
8th district                  α       α + γ               γ
6th district - 8th district   β       β + δrDD            δrDD

! Note: With two periods and two districts, δrDD = δDD .
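
The equality of the regression coefficient and the hand-computed DiD can be checked with the four bank counts used earlier (135 and 121 for the 6th district, 165 and 132 for the 8th). The sketch below solves the OLS normal equations with a small exact solver written for this illustration; it is not the estimation code behind the slides.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def ols(x, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# (TREAT, POST, number of banks): 6th district is treated, 1931 is the post period.
data = [(1, 0, 135), (1, 1, 121), (0, 0, 165), (0, 1, 132)]

# Regressors: constant, TREAT, POST, TREAT x POST.
x = [[1, treat, post, treat * post] for treat, post, _ in data]
y = [banks for _, _, banks in data]

alpha, beta, gamma, delta_rdd = ols(x, y)
print(alpha, beta, gamma, delta_rdd)
```

With four observations and four coefficients the regression fits the cell values exactly, so each coefficient maps onto one entry of the illustration table.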


39 / 80
3. Difference-in-Differences
Example: Monetary Intervention and the Great Depression

I Easy money and real economic activity:


Table 5.1
Wholesale firm failures and sales in 1929 and 1933

                                              1929   1933   Difference (1933–1929)
Panel A. Number of wholesale firms
Sixth Federal Reserve District (Atlanta)      783    641    −142
Eighth Federal Reserve District (St. Louis)   930    607    −323
Difference (Sixth–Eighth)                     −147   34     181

Panel B. Net wholesale sales ($ million)
Sixth Federal Reserve District (Atlanta)      141    60     −81
Eighth Federal Reserve District (St. Louis)   245    83     −162
Difference (Sixth–Eighth)                     −104   −23    81
Notes: This table presents a DD analysis of Federal Reserve liquidity effects on
the number of wholesale firms and the dollar value of their sales, paralleling the
DD analysis of liquidity effects on bank activity in Figure 5.1.
From Mastering 'Metrics: The Path from Cause to Effect. © 2015 Princeton University Press. Used by permission. All rights reserved.

40 / 80


3. Difference-in-Differences
Example: MLDA and Death Rates of Young Adults

I Research question: Does a higher minimum legal drinking age


(MLDA) reduce death rates of young adults in the US?

I Research strategy: Exploit state-level variation in the MLDA from


1970 to 1983.

41 / 80
3. Difference-in-Differences
Example: MLDA and Death Rates of Young Adults

I In a first step, we could study two states, say Alabama and Arkansas
(2-state DiD).

I While Alabama lowered its MLDA to 19 in 1975, the MLDA in


Arkansas remained at 21 over the sample period.

I What would the DiD regression look like?

42 / 80
3. Difference-in-Differences
Example: MLDA and Death Rates of Young Adults

I The simple DiD regression is as follows:

\[ Y_{st} = \alpha + \beta\, TREAT_s + \gamma\, POST_t + \delta_{rDD}\, (TREAT_s \times POST_t) + e_{st} \]

where:
I Yst : Death rates of 18-20-year-olds for state s at time t.
I TREATs : Dummy for Alabama.
I POSTt : Dummy for observations from 1975 onwards.
I TREATs × POSTt : Interaction term, indicates observations for
Alabama from 1975 onwards (what does δrDD measure?).

43 / 80
3. Difference-in-Differences
Example: MLDA and Death Rates of Young Adults

I There are many such MLDA experiments in the data, with various
states switching to lower MLDAs over time.

I We can combine these experiments into one multi-state DiD


regression.

I This multi-state DiD regression uses panel data for 14 years and 51
states (⇒ 14 × 51 = 714 observations).

44 / 80
3. Difference-in-Differences
Example: MLDA and Death Rates of Young Adults

I The multi-state DiD regression is as follows:


\[ Y_{st} = \alpha + \sum_{k=\text{Alaska}}^{\text{Wyoming}} \beta_k\, STATE_{ks} + \sum_{j=1971}^{1983} \gamma_j\, YEAR_{jt} + \delta_{rDD}\, LEGAL_{st} + e_{st} \]

where:
I Yst : Death rates of 18-20-year-olds for state s at time t.
I STATEks : Dummies for each state (state fixed effects).
I YEARjt : Dummies for each year (year fixed effects).
I LEGALst : Treatment variable.
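
A minimal sketch of how such a regression recovers the treatment coefficient, using a made-up balanced panel of three states and three years (all numbers are hypothetical, with a true effect of 5 and no noise). For a balanced panel, OLS with state and year dummies gives the same coefficient as a regression on two-way demeaned data, which is what the code computes.

```python
from fractions import Fraction

states = ["A", "B", "C"]
years = [0, 1, 2]

# Hypothetical ingredients of the outcome: state effects, year effects, treatment.
state_effect = {"A": 10, "B": 20, "C": 30}
year_effect = {0: 0, 1: 1, 2: 3}
legal = {
    ("A", 0): 0, ("A", 1): 0, ("A", 2): 0,  # never treated
    ("B", 0): 0, ("B", 1): 1, ("B", 2): 1,  # treated from year 1
    ("C", 0): 0, ("C", 1): 0, ("C", 2): 1,  # treated from year 2
}
TRUE_DELTA = 5

y = {
    (s, t): state_effect[s] + year_effect[t] + TRUE_DELTA * legal[(s, t)]
    for s in states
    for t in years
}


def two_way_demean(v):
    """Subtract state and year means and add back the grand mean (balanced panel)."""
    state_mean = {s: Fraction(sum(v[(s, t)] for t in years), len(years)) for s in states}
    year_mean = {t: Fraction(sum(v[(s, t)] for s in states), len(states)) for t in years}
    grand_mean = Fraction(sum(v.values()), len(states) * len(years))
    return {
        (s, t): v[(s, t)] - state_mean[s] - year_mean[t] + grand_mean
        for s in states
        for t in years
    }


y_dd = two_way_demean(y)
legal_dd = two_way_demean(legal)

# Slope of demeaned outcome on demeaned treatment.
delta_hat = sum(y_dd[k] * legal_dd[k] for k in y_dd) / sum(
    legal_dd[k] ** 2 for k in legal_dd
)
print(delta_hat)
```

State and year effects drop out of the demeaned outcome, so the estimate equals the effect built into the toy data.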

45 / 80
3. Difference-in-Differences
Example: MLDA and Death Rates of Young Adults

State fixed effects:


I Dummies for each state in the sample except one.
I Replace the single dummy for the treatment group in the two-state
analysis.
I Control for state-specific factors that are constant over time (such
as?).

46 / 80
3. Difference-in-Differences
Example: MLDA and Death Rates of Young Adults

Year fixed effects:


I Dummies for each year in the sample except one.
I Replace the single dummy for the post-treatment period in
the two-state analysis.
I Control for year-specific factors that are common to all states (such
as?).

47 / 80
3. Difference-in-Differences
Example: MLDA and Death Rates of Young Adults

Treatment variable:
I Usually a dummy indicating whether unit/state was exposed to
treatment in a specific year.
I In our example, a continuous measure of the proportion of
18-20-year-olds allowed to drink in state s and year t.
I Variable captures that MLDA varies from 18 to 21.
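
One simple way to construct such a proportion is sketched below. It assumes that the ages 18, 19 and 20 are equally large groups and that the MLDA is constant within a year; both are simplifications made here, and the function name is hypothetical. The variable used in the actual study may be built differently.

```python
from fractions import Fraction


def legal_share(mlda):
    """Share of the ages 18, 19 and 20 that may legally drink under a given MLDA."""
    if mlda not in (18, 19, 20, 21):
        raise ValueError("MLDA is expected to lie between 18 and 21")
    return Fraction(21 - mlda, 3)


# Two-state example from above: Alabama lowered its MLDA to 19 in 1975,
# while the MLDA in Arkansas stayed at 21.
alabama_from_1975 = legal_share(19)
arkansas = legal_share(21)
print(alabama_from_1975, arkansas)
```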

48 / 80
3. Difference-in-Differences
Example: MLDA and Death Rates of Young Adults
Results:

Table 5.2
Regression DD estimates of MLDA effects on death rates

Dependent variable        (1)      (2)      (3)      (4)
All deaths                10.80    8.47     12.41    9.65
                          (4.59)   (5.10)   (4.60)   (4.64)
Motor vehicle accidents   7.59     6.64     7.50     6.46
                          (2.50)   (2.66)   (2.27)   (2.24)
Suicide                   .59      .47      1.49     1.26
                          (.59)    (.79)    (.88)    (.89)
All internal causes       1.33     .08      1.89     1.28
                          (1.59)   (1.93)   (1.78)   (1.45)
State trends              No       Yes      No       Yes
Weights                   No       No       Yes      Yes
Notes: This table reports regression DD estimates of minimum legal
drinking age (MLDA) effects on the death rates (per 100,000) of 18–20-year-olds.
The table shows coefficients on the proportion of legal drinkers
by state and year from models controlling for state and year effects. The
models used to construct the estimates in columns (2) and (4) include
state-specific linear time trends. Columns (3) and (4) show weighted least squares
estimates, weighting by state population. The sample size is 714. Standard
errors are reported in parentheses.
From Mastering 'Metrics: The Path from Cause to Effect. © 2015 Princeton University Press. Used by permission. All rights reserved.

⇒ Legal alcohol access causes about 11 additional deaths per 100,000
18-20-year-olds!

49 / 80
3. Difference-in-Differences
Example: Min. Wages and Employment

I Card and Krueger (1994)⁵ analyze the effect of a rise in the state
minimum wage in New Jersey from $4.25 to $5.05 on April 1, 1992
on employment in fast-food restaurants:
Research question:
”Did the minimum wage increase harm employment in New Jersey?”
Treatment group:
Fast-food restaurants in the US state of New Jersey (NJ).
Control group:
Fast-food restaurants in eastern part of neighbouring Pennsylvania
(PA) where minimum wage remained constant at $4.25.
! Recall: The quality of the comparison group determines the quality
of the evaluation!
⁵ Card, D. and A.B. Krueger (1994), "Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania", American Economic Review, 84(4), 772-793.
50 / 80
3. Difference-in-Differences
Example: Min. Wages and Employment

[Image not reproduced. Source: Wikipedia.]

51 / 80
3. Difference-in-Differences
Example: Min. Wages and Employment

I Data:
Survey of about 400 fast food stores in NJ and PA before / after
the minimum wage increase:
I before: February 1992
I after: November 1992

... collecting information on the number of employees and further


establishment-level characteristics.

I Identification Strategy (DiD):

Compare the change in employment in NJ to the change in


employment in PA b/w February and November 1992.

52 / 80
3. Difference-in-Differences
Example: Min. Wages and Employment

I Findings:

Variable                   PA       NJ       Difference
FTE employment before      23.33    20.44    −2.89
                           (1.35)   (0.51)   (1.44)
FTE employment after       21.17    21.03    −0.14
                           (0.94)   (0.52)   (1.07)
Change in FTE employment   −2.16    0.59     2.76
                           (0.94)   (0.54)   (1.33)
[Source: adapted from Card and Krueger (1994), Table 3.]

⇒ Surprisingly, employment rose in NJ relative to PA after the


minimum wage increase in NJ!
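
The entries of the table can be cross-checked with a short calculation. The sketch below recomputes the DiD from the four rounded means and the standard errors of the two level differences, under the assumption (made here) that the NJ and PA samples are independent. The standard errors in the "Change" row cannot be reconstructed this way, so the t-statistic simply uses the reported values.

```python
import math

# Mean FTE employment and standard errors, copied from the table above.
means = {
    "PA": {"before": 23.33, "after": 21.17},
    "NJ": {"before": 20.44, "after": 21.03},
}
std_errors = {
    "PA": {"before": 1.35, "after": 0.94},
    "NJ": {"before": 0.51, "after": 0.52},
}

change = {state: m["after"] - m["before"] for state, m in means.items()}
did_from_means = change["NJ"] - change["PA"]


def se_of_difference(se_a, se_b):
    """Standard error of a difference between two independent means."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


se_before = se_of_difference(std_errors["PA"]["before"], std_errors["NJ"]["before"])
se_after = se_of_difference(std_errors["PA"]["after"], std_errors["NJ"]["after"])

# t-statistic based on the reported DiD estimate and its reported standard error.
t_stat = 2.76 / 1.33

print(round(did_from_means, 2), round(se_before, 2), round(se_after, 2), round(t_stat, 2))
```

The DiD computed from the rounded means is 2.75 rather than the reported 2.76, presumably because the published means are themselves rounded.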

53 / 80
3. Difference-in-Differences
Example: Min. Wages and Employment

I Regression DiD:

\[ Y_{ist} = \beta_0 + \beta_1 NJ_s + \beta_2 T_t + \beta_3 (T_t \times NJ_s) + \varepsilon_{ist} \]

where:

$Y_{ist}$ = employment in establishment i in state s at time t
$NJ_s$ = dummy, = 1 if obs. is from NJ
$T_t$ = dummy, = 1 if obs. is from November (post treatment)

I Card and Krueger (1994) also add further control variables Xit to the
baseline regression. β3 , however, remains positive and significant.

54 / 80
3. Difference-in-Differences
Example: Min. Wages and Employment

I Is this convincing evidence against standard labor demand theory?


The key identifying assumption is the common trend assumption:
DiD assumes that employment trends are the same in the absence of
treatment:
I although NJ and PA might differ, any differences are assumed to be
time-invariant.
I hence, there must not be any differential trends in NJ and PA.

I How convincing is this identifying assumption?


I let’s look at administrative payroll data for restaurants in PA and a
number of counties in NJ, obtained by Card and Krueger (2000)⁶ in
an update of their original minimum wage study ...
⁶ Card, D. and A.B. Krueger (2000), "Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania: Reply", American Economic Review, 90(5), 1397-1420.
55 / 80
3. Difference-in-Differences
Example: Min. Wages and Employment

I Does PA provide a good measure of counterfactual employment


rates in NJ in the absence of the minimum wage increase? Maybe
not!

[Source: Card and Krueger (2000), p. 1406. As in the original Card and Krueger survey, the administrative data show a slight
decline in employment from February to November 1992 in PA, and little change in NJ over the same period. However, the
data also reveal substantial year-to-year employment variation in other periods. These swings often seem to differ
substantially in the two states. In particular, while employment levels in NJ and PA were similar at the end of 1991,
employment in PA fell relative to employment in NJ over the next 3 years (especially in the 14-county group), mostly before
the 1996 increase in the federal minimum wage (to $4.75, which affected PA, but not NJ). So PA may not provide a very
good measure of counterfactual employment rates in NJ in the absence of a minimum wage change.]

56 / 80
3. Difference-in-Differences
Sensitivity Analysis

1. Look at pre-treatment trends of treatment and control groups:


⇒ They should be parallel!

2. Perform a "placebo" DiD, i.e., use a "fake" treatment date or a "fake" treatment group:


I pretend treatment was given in previous years (e.g. years −2, −1).
I use as a treatment group a population you know was NOT affected.
⇒ DiD estimates should be zero (otherwise trends are not parallel)!

3. Use an outcome variable Y which you know is NOT affected by the


treatment:
I use the same comparison group and treatment year.
⇒ DiD estimate should be zero (otherwise trends may not be parallel)!
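
The placebo check in step 2 can be sketched as follows. The panel is simulated and noise-free, so all numbers are hypothetical; `make_panel` and `did` are illustrative names, and the DiD is computed from group means rather than from a regression.

```python
from statistics import mean


def make_panel(trend_gap=0.0, effect=2.0):
    """Build a noise-free panel of (group, year, outcome) rows.

    Group 1 is treated from year 0 onwards. trend_gap adds a differential
    trend to the treated group (0.0 means parallel trends hold).
    """
    rows = []
    for group in (0, 1):
        for year in range(-3, 2):
            y = 10.0 + 5.0 * group + 1.0 * year   # level gap + common trend
            y += trend_gap * group * year         # differential trend, if any
            y += effect * group * (year >= 0)     # true treatment effect
            rows.append((group, year, y))
    return rows


def did(rows, first_post_year):
    """2x2 DiD on group means, splitting years at first_post_year."""
    def cell(group, post):
        return mean(y for g, t, y in rows
                    if g == group and (t >= first_post_year) == post)
    return (cell(1, True) - cell(1, False)) - (cell(0, True) - cell(0, False))


panel = make_panel()
pre_only = [r for r in panel if r[1] < 0]    # keep pre-treatment years only

actual = did(panel, first_post_year=0)       # real treatment date
placebo = did(pre_only, first_post_year=-1)  # fake treatment date

# Same placebo test when the treated group follows a steeper trend.
bad_panel = [r for r in make_panel(trend_gap=0.5) if r[1] < 0]
placebo_bad = did(bad_panel, first_post_year=-1)

print("actual DiD:", actual)
print("placebo DiD (parallel trends):", placebo)
print("placebo DiD (differential trend):", placebo_bad)
```

With parallel trends the placebo estimate is zero by construction; once a differential trend is added, the placebo estimate moves away from zero, which is the warning sign the slide describes.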

3. Difference-in-Differences
Standard Errors in DiD Strategies

"Most papers that employ Differences-in-Differences (DD)
estimation use many years of data and focus on serially
correlated outcomes [...] because of serial correlation,
conventional DD standard errors may grossly understate the
standard deviation of the estimated treatment effects,
leading to serious overestimation of t-statistics and
significance levels."
Bertrand et al. (2004)⁷, pp. 249, 273

⁷ Bertrand, M., E. Duflo and S. Mullainathan (2004), "How Much Should We Trust
Differences-In-Differences Estimates?", The Quarterly Journal of Economics, 119(1),
249-275.
3. Difference-in-Differences
Standard Errors in DiD Strategies

I Many papers using a DiD strategy use data from many years (not
only 1 pre and 1 post period):
I the variables of interest in many of these setups vary only at the group
level (say, state) and outcome variables are often serially correlated.
⇒ Bertrand, Duflo, and Mullainathan (2004) point out that, as a
consequence, conventional standard errors often severely understate
the standard deviation of the estimators.

I Example: Card and Krueger (1994) minimum wage study
I it is very likely that employment in each state (NJ, PA) is correlated
not only within the state but also over time (serially correlated).

3. Difference-in-Differences
Standard Errors in DiD Strategies
I Bertrand et al. (2004) propose the following solutions:
1. block bootstrapping standard errors (if you analyze states, the block
should be the state, and you would sample whole states with
replacement for the bootstrap).
2. clustering standard errors at the group level (in Stata one would
simply add cl(state) to the regression command if one analyzes
state-level variation).
3. aggregating the data into one pre and one post period. This works
literally only if there is a single treatment date. With staggered
treatment dates one should adopt the following procedure:
   1. regress $Y_{st}$ on state FE, year FE, and relevant covariates.
   2. obtain residuals from the treatment states only and divide them into
   2 groups: pre- and post-treatment.
   3. regress the two groups of residuals on a post dummy.
! Note: once standard errors are treated correctly, the number of groups
that remains can be very small, e.g. Card and Krueger (1994) have only 2 groups.
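
A minimal sketch of the first solution (block bootstrap over states) is given below. Everything in it is an assumption made for illustration: the data are simulated with AR(1) errors, the DiD is computed on pooled group means, and resamples that happen to contain only treated or only control states are simply redrawn. It is not the procedure of any particular paper or package.

```python
import random
from statistics import mean, stdev


def did(states):
    """DiD on pooled means; each state is (treated, pre_outcomes, post_outcomes)."""
    def change(flag):
        pre = [y for tr, ys, _ in states if tr == flag for y in ys]
        post = [y for tr, _, ys in states if tr == flag for y in ys]
        return mean(post) - mean(pre)
    return change(True) - change(False)


def block_bootstrap_se(states, reps=500, seed=0):
    """Bootstrap standard error of the DiD estimate, resampling whole states."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < reps:
        # Draw entire states with replacement, so each state's full time
        # series stays together and serial correlation is preserved.
        sample = rng.choices(states, k=len(states))
        if {tr for tr, _, _ in sample} != {True, False}:
            continue  # pragmatic choice: redraw if one group is missing
        draws.append(did(sample))
    return stdev(draws)


def simulate_states(n_states=20, n_years=5, effect=1.0, seed=1):
    """Hypothetical panel: half the states treated, AR(1) errors per state."""
    rng = random.Random(seed)
    states = []
    for s in range(n_states):
        treated = s < n_states // 2
        e, pre, post = rng.gauss(0, 1), [], []
        for t in range(2 * n_years):
            e = 0.8 * e + rng.gauss(0, 1)   # serially correlated error
            y = 10.0 + e + (effect if treated and t >= n_years else 0.0)
            (post if t >= n_years else pre).append(y)
        states.append((treated, pre, post))
    return states


states = simulate_states()
print("DiD estimate:      ", round(did(states), 3))
print("block-bootstrap SE:", round(block_bootstrap_se(states), 3))
```

The key design choice is that the resampling unit is the state, not the state-year observation; resampling individual observations would break up each state's time series and ignore the serial correlation.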
3. Difference-in-Differences
Example: Fukushima and House Prices

I We next consider an example of a study that makes use of
regression DiD including leads and lags.
I Including leads in the DiD regression model is an easy way to
analyze differences in pre-treatment trends (but it requires multiple
pre-treatment observations):
I To do so, interact dummies for pre-treatment periods with the
treatment group dummy.
I Coefficients on these interactions should be zero!

I Similarly, you can add lags to analyze whether the treatment effect
changes over time after the treatment.

3. Difference-in-Differences
Example: Fukushima and House Prices

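I The estimated regression would be:

\[
Y_{it} = \alpha + T_t \beta + \delta P_i
  + \underbrace{\sum_{\tau=-q}^{-1} \rho_{t=\tau} (T_{t=\tau} \times P_i)}_{=\text{leads}}
  + \underbrace{\sum_{\tau=0}^{m} \rho_{t=\tau} (T_{t=\tau} \times P_i)}_{=\text{lags}}
  + \eta_{it}
\]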

where:
I treatment occurs in year τ = 0
I regression includes q leads
I regression includes m lags
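
The lead and lag coefficients can be illustrated with a small sketch. It rests on two assumptions that are not stated on the slide: there is exactly one omitted reference period (here τ = −q − 1), and the data contain only a treatment group dummy and period dummies. Under these assumptions the model is saturated in group-by-period cells, so each coefficient equals the treated-control gap in period τ minus the gap in the reference period, and can be computed from cell means. The data are hypothetical and noise-free.

```python
from statistics import mean


def event_study(rows, ref_period):
    """Lead/lag coefficients from (treated, period, outcome) rows.

    Each coefficient is the treated-control gap in a period minus the
    gap in the omitted reference period.
    """
    def gap(period):
        treated = mean(y for p, t, y in rows if p == 1 and t == period)
        control = mean(y for p, t, y in rows if p == 0 and t == period)
        return treated - control

    periods = sorted({t for _, t, _ in rows})
    return {t: gap(t) - gap(ref_period) for t in periods if t != ref_period}


# Hypothetical data: q = 2 leads, m = 1 lag, reference period -3.
rows = []
for p in (0, 1):                  # P_i: 0 = control, 1 = treated
    for t in range(-3, 2):        # periods -3, ..., 1; treatment at 0
        effect = {0: 1.0, 1: 3.0}.get(t, 0.0) * p   # effect grows over time
        rows.append((p, t, 5.0 + 2.0 * p + 0.5 * t + effect))

rho = event_study(rows, ref_period=-3)
for t in sorted(rho):
    kind = "lead" if t < 0 else "lag"
    print(f"tau = {t:+d} ({kind}): {rho[t]:.2f}")
```

In this constructed example the lead coefficients are zero (no differential pre-trend) and the lag coefficients trace out a treatment effect that grows after the treatment date.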

3. Difference-in-Differences
Example: Fukushima and House Prices

I Bauer et al. (2017)⁸ analyze the effect of the March 2011
Fukushima accident in Japan on the prices of real estate located
close to German nuclear power plants (NPPs):
I Germany’s coalition government closed 8 of its 17 NPPs in August 2011.
I phase-out of the remaining 9 NPPs by 2022.

I They compare differences in prices of:
I houses located close to an NPP (treatment group) and houses located
farther away from an NPP (control group), before and after Fukushima.

⁸ Bauer, T.K., S. Braun and M. Kvasnicka (2017), "Nuclear Power Plant Closures
and Local Housing Values: Evidence from Fukushima and the German Housing
Market", Journal of Urban Economics, 99, 94-106.
3. Difference-in-Differences
Example: Fukushima and House Prices

[Source: Bauer et al. (2017), p. 96.]


3. Difference-in-Differences
Example: Fukushima and House Prices

[Source: Bauer et al. (2017), p. 99.]


3. Difference-in-Differences
Example: Fukushima and House Prices

I Key assumption:
I Prices of houses in the treatment and control group would have
followed the same time trend in the absence of the Fukushima
accident.

⇒ To test for potential differences in pre-treatment trends:
I they interact dummies for the years −3, −2, and −1 before the
Fukushima accident with the treatment group dummy.

⇒ To test whether the treatment effect changed over time:
I they split the post-Fukushima period into two sub-periods (years +1
and +2 after Fukushima).

3. Difference-in-Differences
Example: Fukushima and House Prices

[Source: Bauer et al. (2017), p. 101.]


3. Difference-in-Differences
Example: Fukushima and House Prices

[Source: Bauer et al. (2017), p. 102.]
3. Difference-in-Differences
Example: Fukushima and House Prices

[Source: Bauer et al. (2017), p. 102.]
4. Synthetic Control Method

"The synthetic control approach [...] is arguably the most
important innovation in the policy evaluation literature in
the last 15 years."
Athey and Imbens (2017)⁹, p. 9

I Treatment and control groups sometimes do not follow parallel
trends:
I In this case, using standard DiD methods leads to biased estimates!

I The synthetic control method (SCM) builds on DiD estimation, but
uses systematically more attractive comparisons:
I the SCM constructs and uses a weighted average of the set of
controls (i.e. a ’synthetic control unit’) that best reproduces the
pre-treatment trend in the outcome of the treated unit/group.
⁹ Athey, S. and G.W. Imbens (2017), "The State of Applied Econometrics:
Causality and Policy Evaluation", Journal of Economic Perspectives, 31(2), 3-32.
4. Synthetic Control Method
Example: Terror and Growth

I Abadie and Gardeazabal (2003)¹⁰ pioneered the SCM when
estimating the effects of the terrorist conflict in the Basque country
on growth using other Spanish regions as a comparison group:
I none of the other Spanish regions followed the same time trend as
the Basque country, so they used a weighted average of other Spanish
regions as a synthetic control group.
I note that Card (1990) implicitly used a very similar approach in his
Mariel boatlift paper investigating the effect of immigration on the
employment of natives.

Basic idea:
I a combination of units (often) provides a better comparison for the
unit exposed to the intervention than any single unit alone.
¹⁰ Abadie, A. and J. Gardeazabal (2003), "The Economic Costs of Conflict: A Case
Study of the Basque Country", American Economic Review, 93(1), 113-132.
4. Synthetic Control Method
Example: Terror and Growth

I The Basque country is different from the rest of Spain:

[Source: Abadie and Gardeazabal (2003), p. 116.]

4. Synthetic Control Method
Example: Terror and Growth

I The implementation of the SCM requires a specific choice
for the weights used to construct the synthetic control unit.

I They have J available control regions (the 16 Spanish regions other
than the Basque country).

I They want to assign a weight to each control region, collected in the
(J × 1) vector $W = (w_1, \dots, w_J)'$. They impose $w_j \geq 0$ and
$\sum_{j=1}^{J} w_j = 1$; this ensures that there is no extrapolation
outside the support of the growth predictors for the control regions.

I The weights are chosen so that the synthetic Basque country most
closely resembles the actual Basque country before terrorism.

4. Synthetic Control Method
Example: Terror and Growth

I Let:
I $X_1$ be a (K × 1) vector of the K pre-terrorism economic growth
predictors in the Basque country (i.e. the values in the previous table:
investment ratio, population density, ...).
I $X_0$ be a (K × J) matrix which contains the values of the same
variables for the J possible control regions.
I $V$ be a (K × K) diagonal matrix with nonnegative components reflecting
the relative importance of the different growth predictors.

I The vector of weights $W^*$ is then chosen to minimize:

\[
(X_1 - X_0 W)' \, V \, (X_1 - X_0 W)
\]

I They choose the matrix $V$ such that the real per capita GDP path
for the Basque country during the 1960s (pre terrorism) is best
reproduced by the resulting synthetic Basque country.
4. Synthetic Control Method
Example: Terror and Growth
I The optimal weights they get are:
I Catalonia: 0.8508
I Madrid: 0.1492
I all other regions: 0

I Alternatively, they could have just chosen the weights $W^*$ to
reproduce only the pre-terrorism growth path for the Basque country
(and not the growth predictors as well). In that case they would
have minimized:

\[
(Z_1 - Z_0 W)' (Z_1 - Z_0 W)
\]

where:
I $Z_1$ is the (10 × 1) vector of pre-terrorism (1960-1969) GDP values for
the Basque country.
I $Z_0$ is the (10 × J) matrix of pre-terrorism (1960-1969) GDP values for
the J potential control regions.
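
The simpler, outcome-only variant can be sketched in a few lines. The sketch minimizes the sum of squared pre-period gaps subject to nonnegative weights that sum to one. Projected gradient descent is used here only because it needs no external libraries; it is an assumption of this example, not the solver used by Abadie and Gardeazabal, and the nested choice of V is not implemented. All data are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1}."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:
            theta = t   # the last j meeting the condition sets the shift
    return [max(x - theta, 0.0) for x in v]


def scm_weights(z1, z0, iters=5000):
    """Minimize (Z1 - Z0 W)'(Z1 - Z0 W) over the simplex.

    z1: list of T pre-period outcomes of the treated unit.
    z0: list of J lists, each with the T pre-period outcomes of one control.
    """
    n_controls, n_periods = len(z0), len(z1)
    # Conservative step size: 1 / (2 * trace(Z0'Z0)).
    step = 1.0 / (2.0 * sum(x * x for col in z0 for x in col))
    w = [1.0 / n_controls] * n_controls   # start from equal weights
    for _ in range(iters):
        resid = [z1[t] - sum(w[j] * z0[j][t] for j in range(n_controls))
                 for t in range(n_periods)]
        grad = [-2.0 * sum(z0[j][t] * resid[t] for t in range(n_periods))
                for j in range(n_controls)]
        w = project_to_simplex([w[j] - step * grad[j]
                                for j in range(n_controls)])
    return w


# Hypothetical pre-period outcomes for three control units (4 periods each).
controls = [[1.0, 2.0, 3.0, 4.0],
            [2.0, 1.0, 2.0, 1.0],
            [5.0, 5.0, 4.0, 6.0]]
# Treated unit built as 0.75 * first control + 0.25 * second control.
treated = [1.25, 1.75, 2.75, 3.25]

weights = scm_weights(treated, controls)
print([round(w, 4) for w in weights])
```

Because the treated unit in this example is an exact convex combination of the first two controls, the procedure should recover weights close to 0.75, 0.25 and 0, mirroring the sparse solution reported on the slide (only two regions with positive weight).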
4. Synthetic Control Method
Example: Terror and Growth

I The synthetic Basque country looks similar to the actual Basque
country:

[Source: Abadie and Gardeazabal (2003), p. 116.]


4. Synthetic Control Method
Example: Terror and Growth

I Constructing the counterfactual using the weights:
I $Y_1$ is a (T × 1) vector whose elements are the values of real per
capita GDP for T years in the Basque country.
I $Y_0$ is a (T × J) matrix whose elements are the values of real per
capita GDP for T years in the control regions.
⇒ they constructed the counterfactual GDP (in the absence of
terrorism) as: $Y_1^* = Y_0 W^*$
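
As a small worked example, the counterfactual path is just a weighted average of the control regions in each year. The weights below are the ones reported above for Catalonia and Madrid; the GDP figures and the function name `synthetic_path` are hypothetical and only serve to show the arithmetic.

```python
def synthetic_path(y0, weights):
    """Return Y0 W: the weighted average of control outcomes in each year.

    y0: list of T rows, each holding the J control-region outcomes of a year.
    """
    return [sum(w * y for w, y in zip(weights, row)) for row in y0]


# Weights as reported in the text; all GDP values below are hypothetical.
weights = [0.8508, 0.1492]   # Catalonia, Madrid
y0 = [[5.0, 6.0],            # rows = years, columns = control regions
      [5.5, 6.4],
      [6.0, 6.9]]
y1 = [5.1, 5.4, 5.7]         # hypothetical actual outcomes, treated region

y1_star = synthetic_path(y0, weights)
gap = [actual - synth for actual, synth in zip(y1, y1_star)]

for year, (synth, g) in enumerate(zip(y1_star, gap)):
    print(f"year {year}: synthetic = {synth:.4f}, gap = {g:+.4f}")
```

The estimated effect in each year is the gap between the actual and the synthetic series.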

4. Synthetic Control Method
Example: Terror and Growth

I Growth in the Basque country with and without terrorism:

[Source: Abadie and Gardeazabal (2003), p. 118.]

4. Synthetic Control Method
Example: Terror and Growth

I Terrorist activity and estimated GDP gap:

[Source: Abadie and Gardeazabal (2003), p. 118.]

Readings

*** [MHE] Angrist, J.D., and J.S. Pischke (2009), Mostly Harmless
Econometrics: An Empiricist’s Companion, Princeton University
Press.
[Chapter 5]

*** [MM] Angrist, J.D., and J.S. Pischke (2014), Mastering ’Metrics:
The Path from Cause to Effect, Princeton University Press.
[Chapter 5]

*** Wooldridge, J.M. (2016), Introductory Econometrics, South-Western
Cengage Learning.
[Chapters 13 and 14]
