
Chap 2 Panel Data

The document discusses panel data models, focusing on fixed effects and difference-in-differences (DiD) methods for estimating causal effects in economics. It explains how fixed effects control for unobserved, time-invariant characteristics and how DiD combines before-and-after comparisons to account for time trends and treatment effects. Additionally, it briefly introduces random effects models and their key assumptions.

Lecture 2: Panel Data Models

Dr. Viet Ha Pham


Faculty of Development Economics
University of Economics and Business, Vietnam National University, Hanoi

Semester II, 2025-2026

1 / 30
Fixed Effects Models

2×2 Difference-in-Differences

Bonus: Random Effects Models

2 / 30
Fixed Effects Models

3 / 30
Panel Data

▶ Multiple units for multiple periods


▶ The same unit is observed in many time periods
▶ Can you give me an example?
▶ Does anyone remember dummy variables? (I am told they are called "biến
giả" in Vietnamese)

4 / 30
Panel Data and Control for the Unobservable
▶ Consider the univariate OLS model:

$$y_{it} = \beta_0 + \beta_1 x_{it} + \varepsilon_{it},$$

▶ There are i = 1, 2, ..., N units and t = 1, 2, ..., T time periods.

▶ OLS estimates are biased if the error term $\varepsilon_{it}$ is correlated with the
explanatory variable $x_{it}$.
▶ With panel data, we can "zoom" into how each unit changes over time.
▶ Doing so controls for anything about a unit that is fixed over time.
▶ Mechanically, we include one dummy variable per unit, equal to 1 for observations
of that unit and 0 otherwise.
▶ The name "fixed" effects also suggests that we are controlling for anything that is
fixed over time.
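The dummy-variable idea can be sketched on simulated data. This is a minimal illustration, not part of the lecture: the panel size, the true slope of 2.0, and the way x is made to depend on the unobserved unit effect are all assumptions chosen so that pooled OLS is visibly biased.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 5                      # hypothetical panel size
beta1 = 2.0                        # hypothetical true slope

unit = np.repeat(np.arange(N), T)  # unit index for each of the N*T rows
a = rng.normal(size=N)             # unobserved, time-invariant unit effect
x = a[unit] + rng.normal(size=N * T)          # x is correlated with a_i
y = 1.0 + beta1 * x + 3.0 * a[unit] + rng.normal(size=N * T)

# Pooled OLS: intercept + x. The omitted a_i ends up in the error term.
X_pooled = np.column_stack([np.ones(N * T), x])
b_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

# Dummy-variable regression: x plus one 0/1 column per unit (no common intercept).
dummies = (unit[:, None] == np.arange(N)[None, :]).astype(float)
X_lsdv = np.column_stack([x, dummies])
b_lsdv = np.linalg.lstsq(X_lsdv, y, rcond=None)[0][0]

print(f"pooled OLS slope:     {b_pooled:.3f}")
print(f"dummy-variable slope: {b_lsdv:.3f}")
```

In this setup the pooled slope lands well above 2.0 because x picks up part of the unit effect, while the dummy-variable slope stays close to the assumed true value.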

5 / 30
Fixed Effects Model
▶ More formally, consider a panel data model with a single explanatory variable:

$$y_{it} = \beta_1 x_{it} + a_i + u_{it},$$

▶ This extends easily to multiple explanatory variables.

▶ The error term $\varepsilon_{it}$ of the earlier OLS model is split into two terms, $a_i$ and $u_{it}$.
▶ $a_i$ captures unobserved, time-invariant characteristics of individual i.
▶ Example:
  ▶ The econometrician observes labor outcomes (wages, output, productivity, ...) of a
  fixed set of workers over 10 years.
  ▶ $a_i$ captures unobserved characteristics of the individuals that do not change over
  time (Can you give me an example?)
  ▶ $a_i$ does not capture characteristics that change over time (Can you give me an
  example?)

6 / 30
Fixed Effects Estimation

▶ Averaging over time for each individual i, we obtain:

$$\bar{y}_i = \beta_1 \bar{x}_i + a_i + \bar{u}_i,$$

where bars denote time averages.

▶ Subtracting the time-averaged equation from the original equation:

$$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i).$$

▶ This is known as the within transformation.

▶ The unobserved effect $a_i$ is eliminated.
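The within transformation can be sketched in a few lines. The simulated panel below is an assumption made for illustration (panel size, true slope of 2.0, and the helper name `demean_by_unit` are all hypothetical); the point is only that demeaning by unit wipes out the unit effect and that OLS on the demeaned data recovers the slope.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 5                      # hypothetical panel size
beta1 = 2.0                        # hypothetical true slope
unit = np.repeat(np.arange(N), T)
a = rng.normal(size=N)             # unobserved unit effect a_i
x = a[unit] + rng.normal(size=N * T)
y = beta1 * x + 3.0 * a[unit] + rng.normal(size=N * T)

def demean_by_unit(v, unit, n_units):
    """Subtract each unit's time average from its observations."""
    sums = np.bincount(unit, weights=v, minlength=n_units)
    counts = np.bincount(unit, minlength=n_units)
    return v - (sums / counts)[unit]

x_w = demean_by_unit(x, unit, N)
y_w = demean_by_unit(y, unit, N)
a_w = demean_by_unit(a[unit], unit, N)   # a_i minus its own time average

# OLS of demeaned y on demeaned x (no intercept needed after demeaning).
b_within = (x_w @ y_w) / (x_w @ x_w)

print("largest |a_i - mean(a_i)|:", np.abs(a_w).max())
print(f"within estimate of the slope: {b_within:.3f}")
```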

7 / 30
Fixed Effects Estimation

▶ Under a strict exogeneity assumption ($u_{it}$ has mean zero conditional on $a_i$ and on
$x_{is}$ in every period s), the fixed effects estimator is unbiased.

▶ The FE estimator allows arbitrary correlation between $a_i$ and the explanatory
variables.
▶ Any explanatory variable that is constant over time is removed by the fixed effects
transformation, so its coefficient cannot be estimated.

8 / 30
Fixed Effects Estimation

▶ The composite error is:

$$\varepsilon_{it} = a_i + u_{it}.$$
▶ Since $a_i$ is constant over time, $\varepsilon_{it}$ is serially correlated within each unit.
▶ Correct standard errors must account for the panel (grouped) structure of the
data.
▶ Manually transforming the data (e.g. demeaning for fixed effects) yields correct
coefficient estimates but not correct standard errors, because a plain OLS routine
does not count the N estimated unit means in its degrees of freedom.
▶ Because of the serial correlation, standard errors also need to be clustered by unit,
allowing error terms to be correlated within each unit over time.
▶ In practice, we use built-in commands and libraries (e.g., xtreg, fe in Stata,
fixest in R, or pyfixest in Python).
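The degrees-of-freedom problem with manual demeaning can be illustrated as follows. This sketch covers only classical (non-clustered) standard errors on a simulated balanced panel; the data-generating process and variable names are assumptions, and clustering is a separate adjustment not shown here.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 100, 4, 1                # hypothetical balanced panel, one regressor
unit = np.repeat(np.arange(N), T)
a = rng.normal(size=N)
x = a[unit] + rng.normal(size=N * T)
y = 2.0 * x + 3.0 * a[unit] + rng.normal(size=N * T)

def demean_by_unit(v):
    """Subtract unit time averages (balanced panel, T periods per unit)."""
    means = np.bincount(unit, weights=v, minlength=N) / T
    return v - means[unit]

x_w, y_w = demean_by_unit(x), demean_by_unit(y)
b = (x_w @ y_w) / (x_w @ x_w)
ssr = np.sum((y_w - b * x_w) ** 2)

# What plain OLS (no intercept) on the demeaned data would use: df = N*T - K.
se_naive = np.sqrt(ssr / (N * T - K) / (x_w @ x_w))
# Fixed effects degrees of freedom: the N unit means were estimated too.
se_fe = np.sqrt(ssr / (N * T - N - K) / (x_w @ x_w))

# Cross-check against the dummy-variable regression (x plus N unit dummies).
D = (unit[:, None] == np.arange(N)[None, :]).astype(float)
X = np.column_stack([x, D])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ coef
sigma2 = resid @ resid / (N * T - X.shape[1])
se_lsdv = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])

print(f"naive SE: {se_naive:.4f}  FE SE: {se_fe:.4f}  dummy-variable SE: {se_lsdv:.4f}")
```

The naive standard error is too small by the factor sqrt((NT − K) / (NT − N − K)), which matters most when T is small.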

9 / 30
Question

▶ In a fixed effects regression, the fixed effect term plays a role similar to a familiar
component in a standard OLS regression.
▶ What is this component, and in what sense are they similar?

10 / 30
Answer

▶ A fixed effects model can be viewed as an OLS regression in which each unit has
its own intercept, capturing all time-invariant differences across units.
▶ Fun fact: The level of each $a_i$ has no meaning on its own, because the fixed effects
are identified only up to an additive constant; only differences between them are pinned down.
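One way to see the additive-constant point is to write the model with an explicit intercept $\beta_0$, as in the earlier OLS model, and shift every fixed effect by an arbitrary constant $c$ (the symbol $c$ is introduced here only for illustration):

```latex
\begin{aligned}
y_{it} &= \beta_0 + \beta_1 x_{it} + a_i + u_{it} \\
       &= (\beta_0 - c) + \beta_1 x_{it} + (a_i + c) + u_{it}
\end{aligned}
```

Both lines describe the data equally well for any $c$, so the data cannot distinguish $a_i$ from $a_i + c$; differences such as $a_i - a_j$ are unaffected by $c$.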

11 / 30
One More Question :)

How do we interpret the results of an FE model once we have estimated it?

12 / 30
Answer :)

How do we interpret the results of an FE model once we have estimated it?


▶ Answer: The coefficient describes how, within the same unit, variation in X over time is related to variation in Y.

13 / 30
2×2 Difference-in-Differences

14 / 30
Motivation

▶ One of the most widely used empirical methods in applied economics (and other
fields).
▶ Goal: estimate the causal effect of a treatment when randomized experiments are
not available.

15 / 30
Classic Example (Snow (1856))
▶ Prevailing belief at the time: cholera was caused by bad air (the miasma theory).

Figure: An 1831 color lithograph by Robert Seymour depicts cholera as a deadly cloud.

16 / 30
Classic Example (Snow (1856))

▶ John Snow's hypothesis: cholera is transmitted through contaminated drinking
water.
▶ Setting: Cholera outbreak in London, 1854. London households received water
from different private water companies. One water company moved its water
intake upstream, higher up the Thames, in 1849.
▶ Control group: Households supplied by a water company drawing water
from downstream on the Thames.
▶ Treatment group: Households supplied by the company drawing water upstream.
▶ Key insight: Households were geographically mixed, but water sources differed.
▶ Outcome: Cholera mortality rates. Compare changes in mortality:
  ▶ Before vs. after 1849,
  ▶ Between households with contaminated vs. clean water sources.

17 / 30
Classic Example (Snow (1856))

18 / 30
Motivation and Introduction

▶ Many policy questions involve outcomes observed for the same units over time.
▶ Simple before–after comparisons are misleading due to time trends.
▶ Simple treated-control comparisons are misleading due to permanent differences.
▶ Difference-in-Differences combines both comparisons.
▶ DiD is a panel data method that removes:
▶ time-invariant differences across units
▶ common shocks over time

19 / 30
Difference-in-Differences: The 2×2 Panel Setup

▶ Observe an outcome $Y_{it}$ for two groups over two periods.

▶ Groups:
  ▶ Treated group ($D_i = 1$),
  ▶ Control group ($D_i = 0$).
▶ Periods:
  ▶ Pre-treatment (t = 0),
  ▶ Post-treatment (t = 1).
▶ Parallel trends assumption: Had no treatment occurred, the gap between
treated and untreated groups would have remained constant.
▶ The policy impact is identified by comparing the change in $Y_{it}$ over time across the two groups.

20 / 30
Difference-in-Differences: The 2×2 Panel Setup

▶ Observe an outcome $Y_{it}$ for two groups over two periods.

▶ For the treated group ($D_i = 1$), the change in outcomes is:

$$\Delta Y_T = \bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}.$$

▶ For the control group ($D_i = 0$), the change in outcomes is:

$$\Delta Y_C = \bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}.$$

▶ The Difference-in-Differences estimator is:

$$\widehat{\text{DiD}} = \Delta Y_T - \Delta Y_C.$$
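The arithmetic of the estimator can be sketched with four group means. The function name `did_2x2` and the numbers passed to it are hypothetical, chosen only to make the subtraction easy to follow.

```python
def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Return (change in treated, change in control, DiD) from four group means."""
    delta_t = treated_post - treated_pre      # before-after change, treated group
    delta_c = control_post - control_pre      # before-after change, control group
    return delta_t, delta_c, delta_t - delta_c

# Hypothetical group means, for illustration only.
delta_t, delta_c, did = did_2x2(treated_pre=10.0, treated_post=15.0,
                                control_pre=8.0, control_post=10.0)
print(delta_t, delta_c, did)   # 5.0 2.0 3.0
```

With these made-up numbers, the treated group rises by 5 and the control group by 2, so 3 of the treated group's increase is attributed to the treatment under parallel trends.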

21 / 30
DiD Graphics

22 / 30
DiD estimation

▶ The 2×2 Difference-in-Differences regression can be written as:

$$Y_{it} = \beta_0 + \beta_1 \text{Treat}_i + \beta_2 \text{Post}_t + \beta_3 (\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it}.$$

▶ $\text{Treat}_i$: indicator for units that are ever treated.

▶ $\text{Post}_t$: indicator for post-treatment periods.
▶ $\beta_3$: the DiD estimator, identified from within-unit changes over time.
▶ DiD is a building block for many modern panel data methods.

23 / 30
Understanding 2x2 DiD

             Treatment = 0    Treatment = 1         Difference
Post = 0     β0               β0 + β1               β1
Post = 1     β0 + β2          β0 + β1 + β2 + β3     β1 + β3
Difference   β2               β2 + β3               β3
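The table can be checked numerically: in a regression on Treat, Post, and their interaction, the interaction coefficient equals the difference-in-differences of the four cell means. The simulated data below are an assumption made for illustration (observations are drawn independently into the four cells, and the true effect is set to 3.0).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400                                   # hypothetical number of observations
treat = rng.integers(0, 2, size=n)        # 1 if the unit is in the treated group
post = rng.integers(0, 2, size=n)         # 1 if the observation is post-treatment
# Hypothetical data-generating process with a true effect of 3.0.
y = 1.0 + 2.0 * treat + 0.5 * post + 3.0 * treat * post + rng.normal(size=n)

# OLS with intercept, Treat, Post, and Treat x Post.
X = np.column_stack([np.ones(n), treat, post, treat * post])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

def cell_mean(d, p):
    """Mean outcome for Treatment = d, Post = p."""
    return y[(treat == d) & (post == p)].mean()

did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(f"beta3 from regression: {b3:.4f}")
print(f"DiD from cell means:   {did_means:.4f}")
```

The two numbers agree up to floating-point error because the regression has one parameter per cell; the other coefficients likewise match the table (for example, the intercept equals the mean of the Treatment = 0, Post = 0 cell).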

24 / 30
Questions, comments, and discussions are welcome.
Email: viethap@[Link]

25 / 30
Bonus: Random Effects Models

26 / 30
Random Effects Models

We start from the same panel data model as in Fixed Effects:

$$y_{it} = \beta x_{it} + a_i + u_{it},$$

where:
▶ $a_i$ is an individual-specific effect
▶ $u_{it}$ is an idiosyncratic error term

27 / 30
Key Assumption (Random Effects)

The crucial assumption in the Random Effects (RE) model is:

$$E[a_i \mid x_{i1}, x_{i2}, \ldots, x_{iT}] = 0.$$

▶ The individual effect $a_i$ is uncorrelated with the regressors in every period

▶ This is stronger than the Fixed Effects assumption, which allows $a_i$ to be correlated with the regressors
▶ We can estimate $\beta$ using a single cross-section of the data, or using pooled OLS
▶ But we can do better by using the structure of the error term

28 / 30
Serial Correlation in Random Effects

Consider the panel model:

$$y_{it} = \beta x_{it} + a_i + u_{it}.$$
▶ The composite error is:
$$v_{it} = a_i + u_{it}.$$
▶ Because $a_i$ is common across time for unit i, the composite errors are serially
correlated:
$$\mathrm{Corr}(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}, \quad t \neq s.$$
▶ Pooled OLS ignores this correlation:
  ▶ Coefficients are unbiased
  ▶ Standard errors and test statistics are incorrect
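The correlation formula can be checked by simulation. This is a sketch under assumed values: the standard deviations 1.5 and 1.0, the number of units, and the normal distributions are hypothetical choices, with the individual effect and the idiosyncratic errors drawn independently.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 20000                                 # hypothetical number of units
sigma_a, sigma_u = 1.5, 1.0               # hypothetical standard deviations

a = rng.normal(scale=sigma_a, size=N)            # individual effect a_i
v_t = a + rng.normal(scale=sigma_u, size=N)      # composite error in period t
v_s = a + rng.normal(scale=sigma_u, size=N)      # composite error in period s != t

corr_sim = np.corrcoef(v_t, v_s)[0, 1]
corr_formula = sigma_a**2 / (sigma_a**2 + sigma_u**2)

print(f"simulated correlation: {corr_sim:.3f}")
print(f"formula:               {corr_formula:.3f}")
```

The correlation comes entirely from the shared $a_i$: the larger its variance relative to that of $u_{it}$, the closer the correlation is to 1.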

29 / 30
Quasi-Demeaning Transformation

The GLS-transformed equation is:

$$y_{it} - \lambda \bar{y}_i = \beta_0 (1 - \lambda) + \beta (x_{it} - \lambda \bar{x}_i) + (v_{it} - \lambda \bar{v}_i),$$

where bars denote time averages and $\beta_0$ is the intercept, if the model includes one.

▶ This is called quasi-demeaning
▶ Compare:
  ▶ Fixed Effects: subtract the full time average
  ▶ Random Effects: subtract a fraction of the time average
▶ The fraction $\lambda$ depends on $\sigma_a^2$, $\sigma_u^2$, and T.
▶ GLS = pooled OLS on the transformed equation
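The slides do not state the formula for λ. The sketch below assumes the standard textbook expression λ = 1 − sqrt(σ_u² / (σ_u² + T σ_a²)); the function name `re_lambda` and the input values are hypothetical.

```python
import math

def re_lambda(sigma_a2, sigma_u2, T):
    """Quasi-demeaning weight, assuming
    lambda = 1 - sqrt(sigma_u^2 / (sigma_u^2 + T * sigma_a^2))."""
    return 1.0 - math.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))

# Hypothetical variance values and panel lengths, for illustration only.
for sigma_a2, T in [(0.0, 5), (1.0, 5), (1.0, 50), (100.0, 5)]:
    lam = re_lambda(sigma_a2, sigma_u2=1.0, T=T)
    print(f"sigma_a^2={sigma_a2:6.1f}  T={T:3d}  lambda={lam:.3f}")
```

Under this formula, λ = 0 when there is no individual effect (quasi-demeaning reduces to pooled OLS), and λ approaches 1 as T or σ_a² grows (quasi-demeaning approaches the fixed effects transformation).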

30 / 30
Snow, J. (1856). Cholera and the water supply in the south districts of London in 1854.
Journal of Public Health, and Sanitary Review, 2(7):239.
