06 Panel Data Approaches

The document discusses panel data approaches for causal analysis, emphasizing the advantages of repeated observations over time to control for unobserved heterogeneity. It outlines key terminology related to panel data, including fixed and random effects models, and various estimation methods such as within estimator and first-difference specifications. The document highlights the importance of understanding assumptions related to exogeneity and error components in panel data analysis.

EC 402 : Lec-06

Panel Data Approaches

Sugata Bag

Delhi School of Economics

March 18, 2024

Panel Data Approach To Causal Analysis – Motivation
- Reference: Mixtape. Ch-7 (Panel Data); Wooldridge (2010) – ch. 10
- For better comprehension, please try working with the data exercises in the chapter

- Selection on observables is rarely convincing in cross-sectional data


- Treatments are not randomly assigned
- Self-selection is complex; too many unknown confounders

- Does observing a single unit of analysis at multiple time points help?

- To allow for selection on (some) unobservables, we can leverage repeated observations for the same unit over time using panel data
- IQ: How do outcomes change when treatment changes?
- Assuming POs to be stable over time is heroic.
- But if POs evolve in a predictable way, then a panel may help in constructing counterfactuals
- This doesn't resolve the fundamental problem of causal inference, but it helps control
for confounders that are time-invariant

Panel – Terminology
- A panel combines cross-sectional units with repeated observations over time periods.
- Let i denote the cross-section unit (firm, individual etc.), where i = 1, ..., n, and let t denote the
time period, where t = 1, ..., T. The total sample size is therefore n · T (for a balanced panel).

- Longitudinal data is another term for panel data.

- Aside: A repeated cross-section, also known as a pooled cross-section or pseudo-panel,
is not a panel, since we don't observe the same individuals in multiple time periods.
- The idea is to use repeated cross-section data to construct a panel.
- The text-book example of this is to use cohorts (a la Deaton)
- E.g. take all 20 − 25 year olds in data set from T = 0, T = 1, .. etc., and average over
outcomes for them.
- Treat these averages for each cohort as if they were a panel.
- This type of cohort analysis is commonly used to examine questions of how outcomes
change for a cohort and what factors seem to be driving these changes. E.g. NFHS data.

Panel – Terminology ... (2)
- Fixed panels refer to the same set of individuals/units observed throughout the
period under consideration. E.g. IHDS-I & II (85%)
In contrast, there exist rotating panels: e.g. Cost of Cultivation Studies, and PLFS.

- Balanced panel if each of n individuals is observed all T times


- Unbalanced panel when some individuals are not observed in every period.
- Sometimes unbalanced panels result from sampling designs, and sometimes they are a
result of entry/exit or birth/death

- Sparse Panel – there is very little overlap between (i, j ). Think about matched
firm-worker datasets (nobody works at every firm!)

- A Wide Panel has many individuals (n >> T); a Long Panel has many time periods (T >> n).
Note that the asymptotic properties of an estimator can be different when n → ∞ as
opposed to a situation when T → ∞

Panel – Terminology ...Some Comments
- Rotating panels allow analysis of both short-term and long-term effects.
- E.g. a sample, which was randomly selected in year 1 to be representative of the
underlying population, may no longer be representative of the population say 10 years
later (because of in/out migration; changing cropping patterns etc.).
- Estimation methods include fixed effects, random effects, and pooled-OLS.
- I will not deal with pooled OLS (P-OLS) and will focus on the other estimators; please see Cunningham for P-OLS

- When panels are unbalanced, need to think about why they are so, and its implications
for estimation (e.g. attrition).
- Even for balanced panels, the usual case is that n >> T so there is greater emphasis on
the cross-sectional variation, relative to variation over time.
- The error mechanisms that are relevant in longer-term dynamic models (with a large T)
are not as relevant here. E.g. country-time panels meet the n > T criterion, but are not
necessarily short, though they are wide.
- This matters for inference; but we are not covering these aspects

Basic Panel Set up
Often interested in a regression of the form:

Yit = β′i Xit + αi + ϵit ; i = 1, . . . , n; t = 1, . . . , T

- β i is of interest.
- Constant effect: when β i = β for all i. → (our maintained assumption throughout)
- αi is individual-specific additive “unobserved heterogeneity” but time-invariant.
- Denote: Xi = (Xi1, ..., XiT), Yi = (Yi1, ..., YiT), ϵi = (ϵi1, ..., ϵiT).
- Xi may contain a treatment/policy variable; we may be particularly interested in the coefficient on that variable.
- Every (i, t) is observed, resulting in a balanced panel (maintained assumption);
with missing data, the panel is unbalanced.
- Precludes Xit ≡ Yi,t−1 as one of the RHS variables; see "dynamic panel" models.

Basic Panel Set up ... (2)
Alternatively, include some Z :

Yit = β′i Xit + γ′ Zi + αi + ϵit ; i = 1, . . . , n; t = 1, . . . , T

where Z :: a constant term + a set of individual or group specific terms, such as sex, race,
location, industry etc. characteristics (maybe unobserved) , but are constant over time.
- Full homogeneity (Pooled): β i = β & αi = α for all i.
- Individual Effects: βi = β for all i, but αi varies across i.
- Full heterogeneity: if (β i , αi ) are all different (potentially).

- With observations over time periods for the same individual the assumption that ϵit is
I.I.D. is unrealistic → will need to adjust standard errors.
- Why? Autocorrelation –This year’s outcome is likely related to last year’s outcome...
- Despite that, with repeated observations of an individual we can control for a great
deal of unobserved heterogeneity or omitted variables.
Assumptions
- ✓ Strict Exogeneity: Current disturbances are uncorrelated with the independent
variables in every time period and with the unit-specific heterogeneity:

E[ϵit | xi1, xi2, ..., xiT, αi] = E[ϵit | Xi, αi] = 0

Equivalently, E[Yit | xi1, ..., xiT, αi] = E[Yit | xit, αi] = xit′β + αi.
This implies that there exist no time-varying confounders.

- Weak Exogeneity: A less restrictive assumption that conditions only on the current regressors:

E[ϵit | xit, αi] = 0, i.e. E[Yit | xit, αi] = xit′β + αi

This implies only contemporaneous exogeneity.

- Aside: Strict exogeneity does not hold for dynamic models like the one below:

Yit = wit′β + λYi,t−1 + ci + ϵit, where Xit = (wit, Yi,t−1)

- ✓ Error var-cov: E(ϵit ϵjs | xi1, ..., xiT) = σ_ϵ² if i = j and t = s, and zero otherwise.

1. One-way Error Components
The model is given by:

$$Y_{it} = X_{it}'\beta + \alpha + u_{it}, \qquad u_{it} = c_i + \epsilon_{it} \quad \text{(composite error)}$$

- where one or more of the β are the parameters of interest. The sample size is nT .
We also assume that the X include time-invariant variables (e.g. gender, caste etc.).
- The one-way error-components structure implies uit = ci + ϵit, where ci varies by unit but
is fixed across time (e.g. innate ability; managerial efficiency etc.).
- The choice of estimation method depends on whether these unobserved
time-invariant factors (i.e. ci or αi ) are correlated with the X or not.
(When correlated, then use fixed effects specification.)
- Let αi = α + ci , we have:

Yit = Xit′ β + αi + ϵit , where ϵit = idiosyncratic error

Key Concept – Random Effect Vs. Fixed Effect
- The main assumption is that there are time-invariant unobserved effects; these
unobserved effects only vary with i.

- Fixed Effects Model: the αi can be thought of as unit-specific terms that must
be estimated; they capture any time-invariant, unit-specific omitted variables.
- If selection on αi is allowed, E[αi | Xi] ≠ 0 (i.e. ≠ const) ⇒ αi is a fixed effect

- Random Effects Model: when the ci are uncorrelated with the x, and thus so are the αi.
These αi come from a distribution whose parameters need to be estimated. Random
effects influence estimation through the covariance structure of the error term.
- If E[αi | Xi] = const (i.e. no selection on unobservables) ⇒ αi is a random effect

- These labels are not about whether αi is stochastic (think of a random sample
{(αi, Xi, ϵi, Yi)}, i = 1, ..., n)

- Given the context of impact assessment and causal inference more generally, it is the
correlated case (i.e. FE) that appears most often

Random Vs. Fixed Effects ... (2)

- FE Model: To estimate β, remove αi in different ways:


1. Within Estimator / "FE regression": apply the within transformation to the variables
for each cross-sectional unit, i.e. demean and estimate using least squares
(equivalent to a dummy-variable regression)
2. Least Squares Dummy Variables (LSDV), i.e. include a dummy for each unit
3. First differences
4. Long differences
For T = 2, the first three methods are equivalent

- Random effect model allows for more methods


- “Pooled OLS” regression of Yit on Xit across i, t identifies β
- “Random effects” Generalized Least Squares estimator

1.1. Within Estimator FE – by Dummy Variables Regression
- View {αi} as a set of nuisance parameters multiplying dummies for each unit:

$$Y_{it} = \beta' X_{it} + \sum_{j=1}^{I} \alpha_j W_{ji} + \epsilon_{it}, \qquad W_{ji} \equiv 1[i = j] \ \text{(indicator)} \qquad (*)$$

Note: don’t write e.g. Yit = β′ Xit + ∑j αj + ϵit ,

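- By FWL, OLS estimation of (*) is identical to OLS after the within transformation:

$$\dot{Y}_{it} = \beta' \dot{X}_{it} + \dot{\epsilon}_{it}, \quad \text{where } \dot{V}_{it} = V_{it} - \bar{V}_i = V_{it} - \frac{1}{T}\sum_{s=1}^{T} V_{is} \qquad (**)$$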

Exercise: prove this

- By strict exogeneity, $E[\dot{X}_{it}\dot{\epsilon}_{it}] = E\big[(X_{it} - \tfrac{1}{T}\sum_{s=1}^{T} X_{is})(\epsilon_{it} - \tfrac{1}{T}\sum_{s=1}^{T}\epsilon_{is})\big] = 0$

- This is faster than using Ind. dummies! Use “reghdfe” in STATA, “fixest” in R.
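
The FWL equivalence between (*) and (**) can be checked numerically. The following is a minimal sketch on simulated data; the panel dimensions, the true coefficient 2.0 and all variable names are illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 50, 4                                  # hypothetical panel dimensions
unit = np.repeat(np.arange(n), T)             # unit index i for every row
alpha = rng.normal(size=n)                    # unit effects alpha_i
x = rng.normal(size=n * T) + alpha[unit]      # regressor correlated with alpha_i
y = 2.0 * x + alpha[unit] + rng.normal(size=n * T)

# (*) LSDV: regress y on x and a full set of unit dummies (no overall intercept)
D = (unit[:, None] == np.arange(n)[None, :]).astype(float)
beta_lsdv = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

# (**) Within: subtract unit means, then OLS without an intercept
def demean(v):
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]

x_dot, y_dot = demean(x), demean(y)
beta_within = (x_dot @ y_dot) / (x_dot @ x_dot)

print(beta_lsdv, beta_within)                 # the two estimates coincide
```

The dummy regression inverts an (n + 1)-column design matrix, while the within version needs only group means, which is why the latter scales better as n grows.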
1.1. Within Estimator FE – Simpler Retake
- The idea is to take average across T for each i to obtain ȳi , x̄i etc. Then subtract the
within-unit average from each observation. [See, MixTape how to do it.] That is:

ỹit = yit − ȳi
x̃it = xit − x̄i
α̃i = αi − ᾱi = αi − αi = 0

- The transformed regression is:

ỹit = x̃it′β + ϵ̃it

- The Within estimator of the fixed effects models involves running a pooled-OLS
regression on these transformed regressors.

- Note that there is no intercept in the X . And that the time-invariant αi are gone.
1.2. LSDV
- The model is given by
Yit = Xit′ β + αi + ϵit

- The dummy variable formulation has a set of n dummy variables (suppressing the
overall intercept), making the αi an additional set of parameters to be estimated.

- The within (or FE) estimator and LSDV estimators of β are numerically equivalent,
irrespective of T .

- The structure of the variance-covariance matrix of the errors is important.
OLS can be used with LSDV if E[ϵϵ′ | X] = σ² I.
- But if errors are (auto-)correlated and/or heteroskedastic, then one
needs to use either GLS or FGLS.

- Since by construction we have T > 1, first differences are another way to remove
unit-specific, time-invariant unobserved heterogeneity.
(Next slide)
1.3 First-Difference (FD) Specification
- The model is given by Yit = Xit′ β + αi + ϵit ,
- The first difference (over time) is given by

Yi,t − Yi,t−1 = (Xi,t − Xi,t−1)′β + (αi − αi) + (ϵi,t − ϵi,t−1),

or, ∆Yi,t = ∆Xi,t′β + ∆ϵi,t,
where ∆ refers to the first difference over time.
Once again, the αi are differenced out, with E[∆Xit · ∆ϵit] = 0 by strict exogeneity.
- When T = 2 and we have a balanced (linear) model, the above
three estimation methods are equivalent.
- When T > 2, OLS on the first differences will generally not yield numerically the
same estimates as the FE/Within and LSDV estimators; which of them is more efficient
depends on the structure of the error variance-covariance matrix.
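
A quick numerical illustration of the last two points, as a sketch on simulated balanced panels. The data-generating process, the coefficient 1.5 and the helper name `fe_and_fd` are assumptions made for the example.

```python
import numpy as np

def fe_and_fd(T, n=200, seed=1):
    """Return (within estimate, first-difference estimate) on a simulated panel."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=(n, 1))                  # unit effects
    x = rng.normal(size=(n, T)) + alpha              # regressor correlated with alpha_i
    y = 1.5 * x + alpha + rng.normal(size=(n, T))
    # Within (FE): demean by unit, pooled OLS without intercept
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    b_fe = (xd * yd).sum() / (xd ** 2).sum()
    # First differences: difference over time, pooled OLS without intercept
    dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)
    b_fd = (dx * dy).sum() / (dx ** 2).sum()
    return b_fe, b_fd

b2 = fe_and_fd(T=2)   # identical up to floating-point error
b3 = fe_and_fd(T=3)   # both consistent, but numerically different
print(b2, b3)
```

With T = 2 the demeaned observations are just ±∆/2, so the two OLS ratios are algebraically the same; with T = 3 that identity no longer holds.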

1.4 Long-Difference (LD) Specification

- Long-Difference Specification – T >> 2, for h > 1,

Yit − Yi,t −h = β′ (Xit − Xi,t −h ) + (ε it − ε i,t −h ), t = h + 1, . . . , T

with E [(Xit − Xi,t −h ) · (ε it − ε i,t −h )] = 0.

- Exercise: With T = 2, the OLS estimators from the FE and FD equations are identical, even in
unbalanced panels.

- Note: irrespective of the estimator, the endogeneity arising from the correlation between ci and Xit is taken
care of, and (provided strict exogeneity holds) the estimated β can be interpreted as causal. However, a potential
disadvantage is that taking care of time-invariant unobserved heterogeneity also means that coefficients
associated with any time-invariant observed variables (e.g. gender, caste etc.) cannot be estimated.

Asymptotic Sequences

- Conventional asymptotic: short panel


- A growing sample of I units for a fixed number of periods, T

- Alternative asymptotic: long panel


- The sample grows by increasing both I and T (at the same or different rates)

- More appropriate when I ≈ T

Cluster-robust Inference (1)

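- Heteroskedasticity-robust SEs rely on

$$\mathrm{Var}\Big[\sum_{it} X_{it}\epsilon_{it}\Big] = E\Big[\sum_{it} X_{it}X_{it}'\epsilon_{it}^2\Big], \quad \text{as } E[X_{it}X_{js}'\epsilon_{it}\epsilon_{js}] = 0 \text{ for } it \neq js$$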

- Fails in panel data because of serial correlation


- Serial correlation in ϵit is prevalent
- In short panels, ϵ̇it (note the dot) are serially correlated, even if ϵit are not
- ∆ϵit is correlated with ∆ϵi,t −1 , unless ϵit follows a random walk

- Solution: cluster-robust (“clustered”) SEs

Cluster-robust Inference (2)

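- For Pooled OLS of Yit on Xit without FEs:

$$\hat\beta = \Big(\sum_i\sum_t X_{it}X_{it}'\Big)^{-1}\Big(\sum_i\sum_t X_{it}Y_{it}\Big)$$

$$\sqrt{I}\,(\hat\beta - \beta) = \underbrace{\Big(\frac{1}{I}\sum_i\Big(\sum_t X_{it}X_{it}'\Big)\Big)^{-1}}_{\overset{p}{\to}\ E[\sum_t X_{it}X_{it}']^{-1}} \cdot \underbrace{\frac{1}{\sqrt{I}}\sum_i\Big(\sum_t X_{it}\epsilon_{it}\Big)}_{\overset{D}{\to}\ N(0,\ \mathrm{Var}[\sum_t X_{it}\epsilon_{it}])}$$

$$I\cdot\widehat{\mathrm{Var}}(\hat\beta) = \Big(\frac{1}{I}X'X\Big)^{-1}\cdot\Big(\frac{1}{I-\dim(X)}\sum_i\Big[\sum_t X_{it}\hat\epsilon_{it}\Big]\Big[\sum_t X_{it}\hat\epsilon_{it}\Big]'\Big)\cdot\Big(\frac{1}{I}X'X\Big)^{-1}$$
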

- Note: I = n =cross-section dimension
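
The sandwich formula above can be computed directly. This is a minimal sketch for pooled OLS that uses the same I − dim(X) degrees-of-freedom correction as the slide; the function name `cluster_robust_se` and the simulated data are assumptions for illustration only.

```python
import numpy as np

def cluster_robust_se(X, y, cluster):
    """Pooled OLS with cluster-robust SEs; each cluster plays the role of a unit i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    ids, idx = np.unique(cluster, return_inverse=True)
    I, k = len(ids), X.shape[1]
    # Within-cluster score sums: s_i = sum_t X_it * e_it  (one row per cluster)
    scores = np.zeros((I, k))
    np.add.at(scores, idx, X * resid[:, None])
    meat = scores.T @ scores / (I - k)
    bread = np.linalg.inv(X.T @ X / I)
    V = bread @ meat @ bread / I               # estimate of Var(beta_hat)
    return beta, np.sqrt(np.diag(V))

# Illustrative data: 100 units, 5 periods, unit effect left in the error term
rng = np.random.default_rng(0)
n, T = 100, 5
unit = np.repeat(np.arange(n), T)
x = rng.normal(size=n * T)
y = 1.0 + 2.0 * x + rng.normal(size=n)[unit] + rng.normal(size=n * T)
X = np.column_stack([np.ones(n * T), x])
beta_hat, se_hat = cluster_robust_se(X, y, unit)
print(beta_hat, se_hat)
```

When every observation is its own cluster, this collapses to the usual heteroskedasticity-robust (HC1-type) variance, which is a convenient sanity check.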

Cluster-robust inference (3)
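- Alternative derivation for the sandwich "meat":

$$\mathrm{Var}\Big(\sum_i\Big(\sum_t X_{it}\epsilon_{it}\Big)\Big) = \sum_{it}\sum_{js} E[X_{it}X_{js}'\epsilon_{it}\epsilon_{js}] = \sum_i\sum_{t,s=1}^{T} E[X_{it}X_{is}'\epsilon_{it}\epsilon_{is}]$$

because all terms with i ≠ j are zero.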

- With FEs, same with (Xit , ϵit ) replaced by (Ẋit , ϵ̇it ).

- In FDs, replaced by (∆Xit , ∆ϵit ).

- Warning: the approximation requires I to be large. Not enough to have large IT .

Tips: Choosing between estimators
Efficiency:
- FE estimator is efficient when ϵit are serially uncorrelated.

- FD estimator is efficient when ϵit follow a random walk.

- It’s not about persistence in the data (which could come from αi ) but about differential
persistence of observations close in time.

Robustness to model misspecification:


- E.g. violations of strict exogeneity, measurement error.

Fads: FE estimation is more popular; it appears that you've controlled for more.
- False; and FD is more transparent (especially in more complex situations).

- If you use an FE specification, always rewrite it in FD.

2. Two-way fixed effects (TWFE)
- The LSDV approach can be extended to include a set of time-specific effects as well.
Besides αi , we include (additive) period effects γt to capture shocks that affect all
units:
Yit = β′ Xit + αi + γt + ϵit (#)
Unlike αi , period-FEs are non-stochastic parameters (on period dummies).

Thus, Eqn.(#) requires an innocuous normalization on {αi } or {γt }.


- Note that suppressing the overall intercept will not suffice:

- Either one must only include (T − 1) time dummies;

- Or alternatively, include a full set of n and T dummy variables but impose the
restriction that ∑i αi = ∑t γt = 0.

- With only balanced data, least squares estimation can be undertaken

2. Two-way fixed effects (TWFE) ...(2)


- FE estimator: In balanced panels, OLS from double-differenced specification:

$$\ddot{Y}_{it} = \beta'\ddot{X}_{it} + \ddot{\epsilon}_{it}, \quad \text{where } \ddot{V}_{it} = (V_{it} - \bar{V}_{i\cdot}) - (\bar{V}_{\cdot t} - \bar{\bar{V}})$$

Follows from FWL, because unit and period dummies are exactly orthogonal.
- Regress $\ddot{Y}_{it}$ on $\ddot{X}_{it}$
- These formulations are often seen in DiD specifications, but are not confined to them.
- See MixTape for calculations of $\ddot{V}_{it}$, $\bar{V}_{i\cdot}$, $\bar{V}_{\cdot t}$, $\bar{\bar{V}}$

- FD estimator: OLS from ∆Yit = β′∆Xit + ∆γt + ∆ϵit, with period FEs ∆γt.
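
The double-demeaning result for balanced panels can be verified against the explicit dummy regression. A minimal sketch follows; the simulated design, the coefficient 0.7 and the variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 30, 5                                   # balanced panel, hypothetical sizes
alpha = rng.normal(size=(n, 1))                # unit effects alpha_i
gamma = rng.normal(size=(1, T))                # period effects gamma_t
x = rng.normal(size=(n, T)) + alpha + gamma
y = 0.7 * x + alpha + gamma + rng.normal(size=(n, T))

def double_demean(v):
    # V_it - Vbar_i. - Vbar_.t + grand mean
    return (v - v.mean(axis=1, keepdims=True)
              - v.mean(axis=0, keepdims=True) + v.mean())

xdd, ydd = double_demean(x), double_demean(y)
beta_dd = (xdd * ydd).sum() / (xdd ** 2).sum()

# Same model via dummies: n unit dummies plus (T - 1) period dummies
unit = np.repeat(np.arange(n), T)              # matches row-major ravel() order
period = np.tile(np.arange(T), n)
D_unit = (unit[:, None] == np.arange(n)).astype(float)
D_time = (period[:, None] == np.arange(1, T)).astype(float)
Z = np.column_stack([x.ravel(), D_unit, D_time])
beta_dummy = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0][0]

print(beta_dd, beta_dummy)                     # equal in a balanced panel
```

Dropping one period dummy is the normalization discussed on the previous slide; in unbalanced panels the simple double-demeaning shortcut would no longer reproduce the dummy regression.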

3. Random Effect Specification
The formulation here is αi = α + ui so that

yit = xit′ β + α + ui + ε it

Note that the ui are random but time invariant. As with pooled OLS, a random effects
analysis puts ui into the error term.
We need the following set of assumptions to estimate the model.
Assumptions:
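1. $E[\varepsilon_{it} \mid X] = E[u_i \mid X] = 0$
2. $E[\varepsilon_{it}^2 \mid X] = \sigma_\varepsilon^2$; $E[u_i^2 \mid X] = \sigma_u^2$
3. $E[\varepsilon_{it} u_j \mid X] = 0 \ \ \forall\, i, j, t$
4. $E[\varepsilon_{it}\varepsilon_{js} \mid X] = 0$ if $t \neq s$ or $i \neq j$
5. $E[u_i u_j \mid X] = 0 \ \ \forall\, i \neq j$
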
3. Random Effect Specification...II
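Say, $\nu_{it} = u_i + \varepsilon_{it}$. This version is often called the error component model. The assumptions
above imply (suppressing the conditioning on X), for the T observations on unit i:
1. $E[\nu_{it}^2] = E[u_i^2 + \varepsilon_{it}^2 + 2u_i\varepsilon_{it}] = \sigma_u^2 + \sigma_\varepsilon^2$
2. $E[\nu_{it}\nu_{is}] = E[(u_i + \varepsilon_{it})(u_i + \varepsilon_{is})] = \sigma_u^2 \ \ \forall\, t \neq s$
3. $E[\nu_{it}\nu_{js}] = 0 \ \ \forall\, t, s$ if $i \neq j$
This implies that the error covariance is non-spherical: the composite errors are equicorrelated within each unit.

$$\Sigma = \begin{pmatrix} \sigma_u^2 + \sigma_\varepsilon^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2 + \sigma_\varepsilon^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 + \sigma_\varepsilon^2 \end{pmatrix} = \sigma_\varepsilon^2 I_T + \sigma_u^2\, i_T i_T'$$

where $i_T$ is a T × 1 vector of ones. Because the cross-sectional units i and j are independent,

$$\Omega = I_n \otimes \Sigma \ \Rightarrow\ \hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y = \hat\beta_{RE}$$
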
3. Random Effect Specification...III
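Given that $\sigma_\varepsilon$ and $\sigma_u$ are known, define

$$\theta = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T\sigma_u^2}}$$

and transform the data so that

$$\Sigma^{-1/2}\, y_i = \frac{1}{\sigma_\varepsilon}\begin{pmatrix} y_{i1} - \theta\bar{y}_i \\ y_{i2} - \theta\bar{y}_i \\ \vdots \\ y_{iT} - \theta\bar{y}_i \end{pmatrix}$$

and similarly for the X's. Running OLS on this transformed model is equivalent to GLS.
Note 1: when θ = 1 we are back to the fixed effects (within) version.
Note 2: if Ω is unknown, we need to use FGLS.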
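
The claim that OLS on the quasi-demeaned data reproduces GLS can be checked numerically. This sketch treats the variance components as known, as on the slide; the simulated data and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 40, 4
sigma_u, sigma_e = 1.0, 0.5                      # variance components, treated as known
u = sigma_u * rng.normal(size=(n, 1))            # random effects u_i
x = rng.normal(size=(n, T))                      # regressor uncorrelated with u_i
y = 1.0 + 2.0 * x + u + sigma_e * rng.normal(size=(n, T))

theta = 1.0 - sigma_e / np.sqrt(sigma_e**2 + T * sigma_u**2)

def quasi_demean(v):
    # v_it - theta * vbar_i (the common 1/sigma_e scaling does not affect OLS)
    return v - theta * v.mean(axis=1, keepdims=True)

# The intercept column is transformed too: it becomes (1 - theta)
X_star = np.column_stack([quasi_demean(np.ones((n, T))).ravel(),
                          quasi_demean(x).ravel()])
y_star = quasi_demean(y).ravel()
beta_qd = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

# Direct GLS with Omega = I_n kron Sigma
Sigma = sigma_e**2 * np.eye(T) + sigma_u**2 * np.ones((T, T))
Omega_inv = np.kron(np.eye(n), np.linalg.inv(Sigma))
X = np.column_stack([np.ones(n * T), x.ravel()])
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y.ravel())

print(beta_qd, beta_gls)                         # the two coefficient vectors coincide
```

In practice the variance components are unknown and would be replaced by estimates, which gives the FGLS version mentioned in Note 2.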

4.1. Hausman’s test for FE vs RE
- Under the null hypothesis that the covariates Xit and the αi are uncorrelated, both the
FE and RE are consistent.
But FE may be inefficient relative to RE (reasons not yet covered).

- Then under H0, we have $(\hat\beta_{FE} - \hat\beta_{RE})'(V_{FE} - V_{RE})^{-1}(\hat\beta_{FE} - \hat\beta_{RE}) \sim \chi^2_K$.


A rejection of H0 would favour the FE,
while failure to reject the H0 will lead to use of the RE.

- Stata commands:
xtreg depvar indepvars, fe      // fixed effects
estimates store fe
xtreg depvar indepvars, re      // random effects
estimates store re
hausman fe re                   // Hausman test
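
For readers who want to see the arithmetic behind the test, here is a minimal sketch of the Hausman statistic itself. The function name `hausman_statistic` and the numbers in the usage lines are made up for illustration; in practice the coefficient vectors and covariance matrices come from the FE and RE fits, restricted to the K coefficients that both models estimate.

```python
import numpy as np

def hausman_statistic(b_fe, b_re, V_fe, V_re):
    """Return (H, K): the Hausman statistic and its degrees of freedom."""
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V_diff = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    H = float(d @ np.linalg.solve(V_diff, d))    # d' (V_FE - V_RE)^{-1} d
    return H, d.size

# Hypothetical estimates, only to show the call
H, K = hausman_statistic([1.0, 2.0], [0.9, 2.1],
                         np.diag([0.02, 0.02]), np.diag([0.01, 0.01]))
print(H, K)   # H is 2.0 (up to rounding) with K = 2 degrees of freedom
```

H is then compared with a χ² critical value with K degrees of freedom. In finite samples V_FE − V_RE need not be positive definite, in which case the statistic is not well behaved.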

Aside Attrition and unbalanced panels (See, Wooldridge 2010)
Panel data are often used, either with or without an RCT (especially longer panels). But
panels (even within an RCT) are often unbalanced. This "lack of balance" may be due to
attrition, which can be characterized as a type of selection, one that occurs over time.
- There may be attrition from unit non-response
- Attrition that is an absorbing state: once the unit exits the panel, it does not reappear

- Wave non-response: Some units may come in and go out of a particular year or wave of
the panel (for panels with more than 2 time periods).

- Studies look at absorbing state and wave non-response separately

- Additionally, there may also be item non-response. Generally, multiple imputation
techniques are used to fill in the missing information

- Will estimates still be unbiased with unit (both absorbing state and wave)
non-response? Not necessarily!

Aside-1. Characterizing missingness
Need to characterize the pattern of missingness in the data.
(Whether there is selection (on un/observables) or not)

- MCAR: Missing completely at random. [:≡ No selection]
The missing data are a random subsample of the underlying population. This has no
implications for bias; only efficiency is affected.

- MAR: Missing at random. [:≡ selection on observables]
Conditional on observables, the non-response/response is random. Differences in
response are explainable by observed characteristics; should be taken into account.

- NMAR: Not missing at random. Systematic differences remain after conditioning on
observables. This is non-ignorable missingness; should be taken into account.

Example of bias with unit attrition (need pics)

Stylized figure – Attrition bias due to unit non-response

In a first-difference framework, unit attrition leads to patterned bias due to a non-MCAR
situation, despite fixed effects.
Aside-2. Testing for non-random selection in unbalanced panels
More formally, if the model takes the following form:

yit = zi′ α + xit′ β + ui + ϵit

Under unit non-response, let rit = 1 if (yit , xit , zi ) is observed in period t and zero
otherwise. Testing if missingness is problematic involves:
- Running a logit or probit of rit on baseline characteristics

- Including functions of rit in the outcome equation and seeing if their inclusion matters.
E.g. the Nijman & Verbeek test involves including 3 additional variables:
(a) whether the unit is present in wave t + 1,
(b) whether the unit is present in the sample for all waves, and
(c) the number of waves for which the unit is present.
Not all of the functions of r can be used in fixed effects, as the time-invariant ones drop out.

Aside-3. Accounting for attrition
Aside 3.1 Panel data, attrition, FE and RE
Let ri ≡ [ri1 , ..., riT ].

- If E[ui + ϵit | zi, xit, ri] = E[ui + ϵit | zi, xit] = 0,
this is a case of MAR (selection on observables) and random effects will be consistent

- If E[ϵit | zi, xit, ui, ri] = E[ϵit | zi, xit, ui] = 0,
this is also a case of MAR (selection on observables) and fixed effects will be
consistent.
Notice this is weaker than the above, because it allows ri to depend on ui.
The idea is that as long as selection into the panel operates through time-invariant factors,
these get differenced out.
Lee bounds for attrition to deal with random assignment. Will do it later

Aside-3. Accounting for attrition
Aside 3.2 Inverse probability weights

A common approach when selection is based on observables is to use inverse probability
weights. First, model selection as usual.
Suppose ω are variables that are strong predictors of attrition. If the information in the
first period is observed for all units, then estimate (notice the subscript on ω)

P(rit = 1 | yit, xit, ωi1) = P(rit = 1 | ωi1), t = 2, ..., T

This is nothing but a selection model. It can be used to obtain the predicted probabilities p̂it;
in a second step, use 1/p̂it as sampling weights in the outcome equation
estimated in first differences:
∆yit = ∆xit′β + ∆ϵit
In time period 3, the same process would be followed, with sampling weights 1/(p̂i2 · p̂i3),
and so on until time period T.
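
The cumulative weighting scheme described above can be sketched as follows. The function name `attrition_weights` and the input layout are assumptions for illustration; the fitted probabilities p̂it would come from the period-by-period selection model (logit or probit), which is not shown here.

```python
import numpy as np

def attrition_weights(p_hat):
    """Cumulative inverse probability weights.

    p_hat[i, k] is the fitted probability that unit i is observed in
    period k + 2 (k = 0 corresponds to t = 2). The returned array holds
    1 / (p_i2 * ... * p_it) for t = 2, ..., T.
    """
    p_hat = np.asarray(p_hat, dtype=float)
    return 1.0 / np.cumprod(p_hat, axis=1)

# Hypothetical fitted probabilities for one unit in periods 2 and 3
w = attrition_weights([[0.8, 0.5]])
print(w)   # weights 1/0.8 = 1.25 for t = 2 and 1/(0.8 * 0.5) = 2.5 for t = 3
```

Units that resemble those who dropped out receive larger weights, so the weighted first-difference regression is rebalanced toward the original sample composition.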

5. Modeling Effect Dynamics

- Distributed lags model can be accommodated with no change:

Yit = β′0 Xit + β′1 Xi,t −1 + ...β′L Xi,t −L + αi + ϵit

- Lagged dependent variable on the RHS: dynamic panel model

Yit = ρYi,t −1 + β′ Xit + αi + ϵit

- Violates strict exogeneity: ϵit can’t be mean-independent of Yi ,s −1 for s = t + 1.

- Then FE estimation is not consistent in short panels: “Nickell bias”.

- Can use Arellano-Bond GMM estimator.
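
A small simulation makes the Nickell bias visible. This is only a sketch under assumed values (ρ = 0.5, five usable periods, no other regressors); it applies the within estimator to a pure autoregressive panel and does not implement the Arellano-Bond estimator.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T, rho = 2000, 5, 0.5                     # many units, short panel (assumed values)
alpha = rng.normal(size=n)                   # unit effects

# Simulate Y_it = rho * Y_i,t-1 + alpha_i + eps_it, starting near the stationary mean
y = np.zeros((n, T + 1))
y[:, 0] = alpha / (1 - rho) + rng.normal(size=n)
for t in range(1, T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=n)

# Within (FE) estimator of rho: demean Y_it and Y_i,t-1 by unit, then OLS
y_lag, y_cur = y[:, :-1], y[:, 1:]
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
rho_fe = (yl * yc).sum() / (yl ** 2).sum()

print(rho_fe)   # noticeably below the true rho = 0.5, even with n = 2000
```

Increasing n does not help here because the bias comes from the demeaned lag being correlated with the demeaned error within each unit; it shrinks only as T grows.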

7. Nonlinear panel data models

- Nonlinear models with fixed effects are much more complicated.
Consider binary choice models:

Pr(Yit = 1 | Xit, αi) = F(β′Xit + αi), where F = probit or logistic

- Likelihood estimation of β along with {αi} results in the incidental parameter
problem: β̂ is inconsistent in short panels.

- For logistic regression (but not probit), conditional logit estimation yields consistent β̂.
- But not for average partial effects, which depend on αi.

- More progress with long panels; see Fernández-Val and Weidner (Annual Review of
Economics, 2018).

FEs beyond panel data

There are other types of data with repeated observations:

Twin studies = repeated observations in the same family.


- E.g. Ashenfelter and Rouse (1998) estimate returns to schooling for twins.

- Family FEs control for genetic differences.

- Warning: why does the treatment (e.g. education) vary between twins?

- Bound and Solon (1999): first-borns have higher weight, IQ, schooling.

FEs beyond panel data (2)

There are other types of data with repeated observations:

- County-level cross-section = repeated observations for the same state.


- State FEs control for additive state-level unobservables.

- In a county-level panel, can include state-by-year FEs:

Yit = β′ Xit + αi + γs (i ),t + ϵit

- Note: $\sum_{s',t'} \gamma_{s't'} \cdot 1[s(i) = s'] \times 1[t = t']$ is correct; $\gamma_{s(i)} \times \delta_t$ is wrong.

- Including γs (i ),t demeans Yit and Xit by state-year-specific means.

FEs beyond panel data (3)

There are other types of data with repeated observations:

Repeated cross-sections / Pseudo Panel:
in each year a new random sample from each state.
- Can’t control for individual heterogeneity.

- But state FEs control for additive state-level unobservables.

- Cluster at the state-level if Xit only varies by state.

- Dyadic data: e.g. how does distance Xij between exporting country i and importing
country j affect log trade flow Yij ? (Gravity equation)
