06 Panel Data Approaches

The document discusses panel data approaches for causal analysis, emphasizing the advantages of repeated observations over time to control for unobserved heterogeneity. It outlines key terminology related to panel data, including fixed and random effects models, and various estimation methods such as within estimator and first-difference specifications. The document highlights the importance of understanding assumptions related to exogeneity and error components in panel data analysis.

EC 402 : Lec-06

Panel Data Approaches

Sugata Bag

Delhi School of Economics

March 18, 2024

1 / 38
Panel Data Approach To Causal Analysis – Motivation
- Reference: Mixtape. Ch-7 (Panel Data); Wooldridge (2010) – ch. 10
- For better comprehension, please try working with the data exercises in the chapter

- Selection on observables is rarely convincing in cross-sectional data

- Treatments are not randomly assigned
- Self-selection is complex; too many unknown confounders

- Does observing a single unit of analysis at multiple time points help?

- To allow for selection on (some) unobservables, we can leverage repeated
observations for the same unit over time using panel data
- IQ: How do outcomes change when treatment changes?
- Assuming potential outcomes (POs) to be stable over time is heroic.
- But if POs evolve in a predictable way, then a panel may help in constructing counterfactuals
- This doesn't resolve the fundamental problem of causal inference, but it helps control
for confounders that are time-invariant
2 / 38
Panel – Terminology
- A panel combines a cross-section with repeated observations over time periods.
- Let i denote the cross-section unit (firm, individual etc.), where i = 1, ..., n, and let t denote the
time period, where t = 1, ..., T. The total sample size is therefore n · T.

- Longitudinal data is another term for panel data

- Aside: Repeated cross-section data, also known as pooled cross-sections or pseudo-panels,
are not a panel, since we don't observe the same individuals in multiple time periods.
- The idea is to use repeated cross-section data to construct a panel.
- The text-book example of this is to use cohorts (a la Deaton)
- E.g. take all 20 − 25 year olds in data set from T = 0, T = 1, .. etc., and average over
outcomes for them.
- Treat these averages for each cohort as if they were a panel.
- This type of cohort analysis is commonly used to examine questions of how outcomes
change for a cohort and what factors seem to be driving these changes. E.g. NFHS data.
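
The cohort construction described above can be sketched in a few lines. This is a minimal illustration, not the procedure used with any particular survey: the records, the 10-year birth bands, and the helper names `cohort_of` and `cohort_means` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical repeated cross-sections: (survey_year, birth_year, outcome).
# A different set of individuals is sampled in each survey year.
records = [
    (2000, 1976, 10.0), (2000, 1978, 12.0), (2000, 1985, 7.0),
    (2005, 1977, 15.0), (2005, 1979, 17.0), (2005, 1986, 9.0),
]


def cohort_of(birth_year):
    """Assign a birth-cohort label (assumed 10-year bands)."""
    start = birth_year - birth_year % 10
    return f"{start}-{start + 9}"


def cohort_means(rows):
    """Average the outcome within each (cohort, survey year) cell."""
    cells = defaultdict(list)
    for year, birth_year, y in rows:
        cells[(cohort_of(birth_year), year)].append(y)
    return {cell: sum(values) / len(values) for cell, values in cells.items()}


# Each cohort now has one "observation" per survey year, as in a panel.
pseudo_panel = cohort_means(records)
```

The cell means can then be treated as a cohort-by-year panel, keeping in mind that each mean is estimated from a different sample of people.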

3 / 38
Panel – Terminology ... (2)
- Fixed panels refer to the same set of individuals/units observed throughout the
period under consideration. E.g. IHDS-I & II (85%)
In contrast, there exist rotating panels: e.g. Cost of Cultivation Studies, and PLFS.

- Balanced panel: each of the n individuals is observed in all T periods

- Unbalanced panel: some individuals are not observed in every period.
- Sometimes unbalanced panels result from sampling designs, and sometimes they are a
result of entry/exit or birth/death

- Sparse Panel – there is very little overlap between (i, j). Think about matched
firm-worker datasets (nobody works at every firm!)

- A Wide Panel has many individuals (n >> T); a Long Panel has many periods (T >> n).
Note that the asymptotic properties of an estimator can be different when n → ∞ as
opposed to a situation when T → ∞

4 / 38
Panel – Terminology ...Some Comments
- Rotating panels allow analysis of both short-term and long-term effects.
- E.g. a sample, which was randomly selected in year 1 to be representative of the
underlying population, may no longer be representative of the population say 10 years
later (because of in/out migration; changing cropping patterns etc.).
- Estimation methods include fixed effects, random effects, and pooled-OLS.
- I will not deal with P-OLS and will focus on the other estimators; please see Cunningham for P-OLS

- When panels are unbalanced, need to think about why they are so, and its implications
for estimation (e.g. attrition).
- Even for balanced panels, the usual case is that n >> T so there is greater emphasis on
the cross-sectional variation, relative to variation over time.
- The error mechanisms that are relevant in longer-term dynamic models (with a large T)
are not as relevant here. E.g. country-time panels meet the n > T criterion, but are not
necessarily short, though they are wide.
- This matters for inference; but we are not covering these aspects

5 / 38
Basic Panel Set up
Often interested in a regression of the form:

$$Y_{it} = \beta_i' X_{it} + \alpha_i + \epsilon_{it}; \quad i = 1, \dots, n; \; t = 1, \dots, T$$

- $\beta_i$ is of interest.
- Constant effect: when $\beta_i = \beta$ for all i. → (our maintained assumption throughout)
- $\alpha_i$ is individual-specific, additive, time-invariant "unobserved heterogeneity".
- Denote: $X_i = (X_{i1}, \dots, X_{iT})$, $Y_i = (Y_{i1}, \dots, Y_{iT})$, $\epsilon_i = (\epsilon_{i1}, \dots, \epsilon_{iT})$.
- $X_i$ may contain a treatment/policy variable; we may be particularly interested in the coefficient on that variable.
- Every (i, t) is observed, resulting in a balanced panel (maintained assumption);
with missing data, the panel is unbalanced.
- Precludes $X_{it} \equiv Y_{i,t-1}$ as one of the RHS variables; see "dynamic panel" models.

6 / 38
Basic Panel Set up ... (2)
Alternatively, include some Z :

Yit = β′i Xit + γ′ Zi + αi + ϵit ; i = 1, . . . , n; t = 1, . . . , T

where Z contains a constant term and a set of individual- or group-specific characteristics, such as sex, race,
location, industry etc. (possibly unobserved), that are constant over time.
- Full homogeneity (Pooled): βi = β and αi = α for all i.
- Individual Effects: βi = β but αi varies across i.
- Full heterogeneity: (βi, αi) are (potentially) all different.

- With observations over time periods for the same individual the assumption that ϵit is
I.I.D. is unrealistic → will need to adjust standard errors.
- Why? Autocorrelation –This year’s outcome is likely related to last year’s outcome...
- Despite that, with repeated observations of an individual we can control for a great
deal of unobserved heterogeneity or omitted variables.
7 / 38
Assumptions
- ✓ Strict Exogeneity: Current disturbances are uncorrelated with the independent
variables in every time period & unit-speciﬁc heterogeneity

E [ϵit |xi1 , xi2 , ...xiT , αi ] = E [ϵit |Xi , αi ] = 0

This implies that there are no time-varying confounders.

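- Weak Exogeneity: a less restrictive assumption that conditions only on the current-period regressors

$$E[\epsilon_{it} \mid x_{it}, \alpha_i] = 0, \quad \text{i.e.} \quad E[Y_{it} \mid x_{it}, \alpha_i] = x_{it}'\beta + \alpha_i$$

This is contemporaneous exogeneity.

- Aside: Strict exogeneity does not hold for dynamic models like the one below

$$Y_{it} = w_{it}'\beta + \lambda Y_{i,t-1} + c_i + \epsilon_{it}, \quad \text{where } X_{it} = (w_{it}, Y_{i,t-1})$$

- ✓ Error var-cov: $E(\epsilon_{it}\epsilon_{js} \mid x_{i1}, \dots, x_{iT}) = \sigma_\epsilon^2$ if i = j and t = s, and zero otherwise.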

8 / 38
1. One-way Error Components
The model is given by:

$$Y_{it} = X_{it}'\beta + \alpha + u_{it}, \qquad u_{it} = c_i + \epsilon_{it} \ \text{(composite error)}$$

- where one or more of the β are the parameters of interest. The sample size is nT.
We also assume that the X include time-invariant variables (e.g. gender, caste etc.).
- The one-way error-components structure implies uit = ci + ϵit, where ci varies by unit but
is fixed across time (e.g. innate ability, managerial efficiency etc.).
- The choice of estimation method depends on whether these unobserved
time-invariant factors (i.e. ci or αi) are correlated with the X or not.
(When they are correlated, use the fixed effects specification.)
- Let αi = α + ci , we have:

Yit = Xit′ β + αi + ϵit , where ϵit = idiosyncratic error

9 / 38
Key Concept – Random Effect Vs. Fixed Effect
- The main assumption is that there are time-invariant unobserved effects; these
unobserved effects only vary with i.

- Fixed Effects Model: these αi can be thought of as a unit-specific term that must
be estimated and that captures any unit-specific (time-invariant) omitted variables.
- If selection on αi is allowed, E[αi | Xi] ≠ 0 (i.e. ≠ const) ⇒ αi is a fixed effect

- Random Effects Model: the ci are uncorrelated with the x, and thus so are the αi.
These αi come from a distribution whose parameters need to be estimated. Random
effects influence estimation through the covariance structure of the error term.
- If E[αi | Xi] = const (i.e. no selection on unobservables) ⇒ αi is a random effect

- These labels are not about whether αi is stochastic (think of a random sample
{(αi, Xi, ϵi, Yi)}, i = 1, ..., n)

- Given the context of impact assessment and causal inference more generally, it is the
correlated case (i.e. FE) that appears most often
10 / 38
Random Vs. Fixed Effects ... (2)

- FE Model: To estimate β, remove αi in different ways:

1. Within Estimator / "FE regression":
numerically equal to the dummy variable regression; apply the within transformation,
i.e. demean the variables for each cross-sectional unit, and estimate using least squares
2. Least Squares Dummy Variables (LSDV), i.e. include a dummy variable for each unit
3. First differences
4. Long differences
For T = 2, the first three methods are equivalent

- The random effects model allows for more methods

- "Pooled OLS" regression of Yit on Xit across i, t identifies β
- "Random effects" Generalized Least Squares estimator

11 / 38
1.1. Within Estimator FE – by Dummy Variables Regression
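- View {αi} as a set of nuisance parameters multiplying dummies for each unit:

$$Y_{it} = \beta' X_{it} + \sum_{j=1}^{I} \alpha_j W_{ji} + \epsilon_{it}, \qquad W_{ji} \equiv 1[i = j] \ \text{(indicator)} \qquad (*)$$

Note: don't write e.g. $Y_{it} = \beta' X_{it} + \sum_j \alpha_j + \epsilon_{it}$

- By FWL, OLS estimation of (*) is identical to OLS after the within transformation:

$$\dot{Y}_{it} = \beta' \dot{X}_{it} + \dot{\epsilon}_{it}, \quad \text{where } \dot{V}_{it} = V_{it} - \bar{V}_i = V_{it} - \frac{1}{T}\sum_{s=1}^{T} V_{is} \qquad (**)$$

Exercise: prove this

- By strict exogeneity,

$$E[\dot{X}_{it}\dot{\epsilon}_{it}] = E\Big[\Big(X_{it} - \frac{1}{T}\sum_{s=1}^{T} X_{is}\Big)\Big(\epsilon_{it} - \frac{1}{T}\sum_{s=1}^{T}\epsilon_{is}\Big)\Big] = 0$$

- This is faster than using individual dummies! Use "reghdfe" in Stata, "fixest" in R.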
12 / 38
1.1. Within Estimator FE – Simpler Retake
- The idea is to take the average across T for each i to obtain ȳi, x̄i etc., then subtract the
within-unit average from each observation. [See MixTape for how to do it.] That is:

$$\tilde{y}_{it} = y_{it} - \bar{y}_i$$

$$\tilde{x}_{it} = x_{it} - \bar{x}_i$$

$$\tilde{\alpha}_i = \alpha_i - \alpha_i = 0$$

- The transformed regression is:

$$\tilde{y}_{it} = \tilde{x}_{it}'\beta + \tilde{\epsilon}_{it}$$

- The Within estimator of the fixed effects model involves running a pooled-OLS
regression on these transformed variables.

- Note that there is no intercept in the X, and that the time-invariant αi are gone.
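
The demeaning steps above can be sketched as follows, for a single regressor. This is an illustrative toy, not the MixTape code: the data are made up so that y = 2x + αi exactly, with αi negatively related to the level of x, and the function names are hypothetical.

```python
from collections import defaultdict


def within_estimator(panel):
    """Slope from OLS of unit-demeaned y on unit-demeaned x (no intercept).

    `panel` is a list of (unit, x, y) rows, one row per unit-period.
    """
    by_unit = defaultdict(list)
    for unit, x, y in panel:
        by_unit[unit].append((x, y))

    sxy = sxx = 0.0
    for rows in by_unit.values():
        x_bar = sum(x for x, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        for x, y in rows:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx


def pooled_ols_slope(panel):
    """Slope from pooled OLS of y on x with a common intercept."""
    n = len(panel)
    x_bar = sum(x for _, x, _ in panel) / n
    y_bar = sum(y for _, _, y in panel) / n
    sxy = sum((x - x_bar) * (y - y_bar) for _, x, y in panel)
    sxx = sum((x - x_bar) ** 2 for _, x, _ in panel)
    return sxy / sxx


# Toy data: y = 2 * x + alpha_i, no idiosyncratic noise.
alpha = {"a": 10.0, "b": 0.0}
xs = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}
panel = [(u, x, 2.0 * x + alpha[u]) for u in xs for x in xs[u]]

beta_within = within_estimator(panel)   # recovers the slope of 2
beta_pooled = pooled_ols_slope(panel)   # contaminated by alpha_i
```

In this toy example pooled OLS even gets the sign wrong, because the unit with the larger αi has the smaller x values; demeaning removes αi and recovers the slope.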
13 / 38
1.2. LSDV
- The model is given by
Yit = Xit′ β + αi + ϵit

- The dummy variable formulation has a set of n dummy variables (suppressing the
overall intercept), making the αi an additional set of parameters to be estimated.

- The within (or FE) estimator and LSDV estimators of β are numerically equivalent,
irrespective of T .

- The structure of the variance-covariance matrix of the errors is important.

OLS can be used with LSDV if E[ϵϵ′ | X] = σ² I.
- But if errors are (auto-)correlated and/or heteroskedastic, then one
needs to use either GLS or FGLS.

- Since by construction we have T > 1, First Differences can be another way to account
for unobserved heterogeneity and account for unit-specific unobserved variables.
(Next slide)
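
As a numerical check of the claim that LSDV and the within estimator give the same β, here is a small sketch with one regressor. The data are hypothetical, and the helper `solve` is a plain Gauss-Jordan routine written for this illustration rather than a library call.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lsdv(panel, units):
    """OLS of y on x plus one dummy per unit (no overall intercept).

    Returns [beta, alpha_1, ..., alpha_n] from the normal equations D'D z = D'y.
    """
    design = [[x] + [1.0 if u == v else 0.0 for v in units] for u, x, _ in panel]
    y = [row[2] for row in panel]
    k = len(units) + 1
    dtd = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    dty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(dtd, dty)


def within_slope(panel, units):
    """Slope from OLS on unit-demeaned data (single regressor)."""
    sxy = sxx = 0.0
    for u in units:
        rows = [(x, y) for v, x, y in panel if v == u]
        x_bar = sum(x for x, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in rows)
        sxx += sum((x - x_bar) ** 2 for x, _ in rows)
    return sxy / sxx


# Hypothetical toy panel: three units, three periods each.
units = ["a", "b", "c"]
panel = [
    ("a", 1.0, 3.1), ("a", 2.0, 4.9), ("a", 4.0, 9.2),
    ("b", 2.0, 1.0), ("b", 3.0, 3.5), ("b", 5.0, 6.8),
    ("c", 0.0, 7.0), ("c", 1.0, 9.4), ("c", 3.0, 12.9),
]

beta_lsdv = lsdv(panel, units)[0]
beta_within = within_slope(panel, units)
```

The two slopes agree up to floating-point error, while LSDV additionally returns the estimated αi as the coefficients on the dummies.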
14 / 38
1.3 First-Difference (FD) Specification
- The model is given by $Y_{it} = X_{it}'\beta + \alpha_i + \epsilon_{it}$.
- The first difference (over time) is given by

$$Y_{i,t} - Y_{i,t-1} = (X_{i,t} - X_{i,t-1})'\beta + (\alpha_i - \alpha_i) + (\epsilon_{i,t} - \epsilon_{i,t-1}),$$

or, $\Delta Y_{i,t} = \Delta X_{i,t}'\beta + \Delta\epsilon_{i,t}$,
where ∆ refers to the first difference over time.
Once again, the αi are differenced out, with $E[\Delta X_{it} \cdot \Delta\epsilon_{it}] = 0$ by strict exogeneity.
- When T = 2 and we have a balanced panel (linear model), the above
three estimation methods are equivalent.
- When T > 2, OLS on the first differences will not, in general, yield numerically the
same estimates as the FE/Within and LSDV estimators; which one is more efficient depends on the
structure of the variance-covariance matrix of the errors.
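
A small numerical sketch of the last two points, with a single regressor and made-up data (the function names and numbers are illustrative assumptions): with two periods the FD and within estimates coincide, and with three periods they differ.

```python
def fd_slope(panel):
    """OLS (no intercept) of first-differenced y on first-differenced x.

    `panel` maps each unit to a time-ordered list of (x, y) pairs.
    """
    sxy = sxx = 0.0
    for rows in panel.values():
        for (x0, y0), (x1, y1) in zip(rows, rows[1:]):
            sxy += (x1 - x0) * (y1 - y0)
            sxx += (x1 - x0) ** 2
    return sxy / sxx


def fe_slope(panel):
    """OLS (no intercept) of unit-demeaned y on unit-demeaned x."""
    sxy = sxx = 0.0
    for rows in panel.values():
        x_bar = sum(x for x, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in rows)
        sxx += sum((x - x_bar) ** 2 for x, _ in rows)
    return sxy / sxx


# Hypothetical balanced panel with T = 3.
panel_t3 = {
    "a": [(1.0, 3.1), (2.0, 4.9), (4.0, 9.2)],
    "b": [(2.0, 1.0), (3.0, 3.5), (5.0, 6.8)],
}
# Keep only the first two periods to get a T = 2 panel.
panel_t2 = {unit: rows[:2] for unit, rows in panel_t3.items()}

fd_t2, fe_t2 = fd_slope(panel_t2), fe_slope(panel_t2)   # coincide
fd_t3, fe_t3 = fd_slope(panel_t3), fe_slope(panel_t3)   # differ
```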

15 / 38
1.4 Long-Difference (LD) Specification

- Long-Difference Specification: with T >> 2, for h > 1,

$$Y_{it} - Y_{i,t-h} = \beta'(X_{it} - X_{i,t-h}) + (\varepsilon_{it} - \varepsilon_{i,t-h}), \quad t = h + 1, \dots, T$$

with $E[(X_{it} - X_{i,t-h}) \cdot (\varepsilon_{it} - \varepsilon_{i,t-h})] = 0$.

- Exercise: With T = 2, the OLS estimators of the FE and FD equations are identical, even in
unbalanced panels.

- Note: irrespective of the estimator, the endogeneity arising out of the correlation between ci and Xit is taken
care of, and, under strict exogeneity, the estimated β can be interpreted as being causal in nature. However, a potential
disadvantage is that taking care of time-invariant unobserved heterogeneity also means that coefficients
associated with any time-invariant observed variables (e.g. gender and caste etc.) cannot be estimated.

16 / 38
Asymptotic Sequences

- Conventional asymptotic: short panel

- A growing sample of I units for a fixed number of periods, T

- Alternative asymptotic: long panel

- The sample grows by increasing both I and T (at the same or different rates)

- More appropriate when I ≈ T

17 / 38
Cluster-robust Inference (1)

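- Heteroskedasticity-robust SEs rely on

$$\operatorname{Var}\Big[\sum_{it} X_{it}\epsilon_{it}\Big] = E\Big[\sum_{it} X_{it}X_{it}'\epsilon_{it}^2\Big], \quad \text{as } E[X_{it}X_{js}'\epsilon_{it}\epsilon_{js}] = 0 \text{ for } it \neq js$$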

- Fails in panel data because of serial correlation

- Serial correlation in ϵit is prevalent
- In short panels, ϵ̇it (note the dot) are serially correlated, even if ϵit are not
- ∆ϵit is correlated with ∆ϵi,t −1 , unless ϵit follows a random walk

- Solution: cluster-robust (“clustered”) SEs

18 / 38
Cluster-robust Inference (2)

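- For Pooled OLS of Yit on Xit without FEs:

$$\hat\beta = \Big(\sum_i\sum_t X_{it}X_{it}'\Big)^{-1}\Big(\sum_i\sum_t X_{it}Y_{it}\Big)$$

$$\sqrt{I}\,(\hat\beta - \beta) = \underbrace{\Big(\frac{1}{I}\sum_i\Big(\sum_t X_{it}X_{it}'\Big)\Big)^{-1}}_{\xrightarrow{p}\ E[\sum_t X_{it}X_{it}']^{-1}} \cdot \underbrace{\frac{1}{\sqrt{I}}\sum_i\Big(\sum_t X_{it}\epsilon_{it}\Big)}_{\xrightarrow{D}\ N(0,\ \operatorname{Var}[\sum_t X_{it}\epsilon_{it}])}$$

$$I \cdot \widehat{\operatorname{Var}}(\hat\beta) = \Big(\frac{1}{I}X'X\Big)^{-1} \cdot \frac{1}{I - \dim(X)}\sum_i\Big[\sum_t X_{it}\hat\epsilon_{it}\Big]\Big[\sum_t X_{it}\hat\epsilon_{it}\Big]' \cdot \Big(\frac{1}{I}X'X\Big)^{-1}$$

- Note: I = n = cross-section dimension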

19 / 38
Cluster-robust inference (3)
- Alternative derivation for the sandwich "meat":

$$\operatorname{Var}\Big(\sum_i\Big(\sum_t X_{it}\epsilon_{it}\Big)\Big) = \sum_{it}\sum_{js} E[X_{it}X_{js}'\epsilon_{it}\epsilon_{js}] = \sum_i\sum_{t,s=1}^{T} E[X_{it}X_{is}'\epsilon_{it}\epsilon_{is}]$$

because all terms with i ≠ j are zero.

- With FEs, the same holds with (Xit, ϵit) replaced by (Ẋit, ϵ̇it).

- In FDs, they are replaced by (∆Xit, ∆ϵit).

- Warning: the approximation requires I to be large. It is not enough to have large IT.
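
To make the difference between the two "meats" concrete, here is a minimal sketch for pooled OLS with one regressor and no intercept. The data are hypothetical, built so that residuals are perfectly correlated within a unit; the I/(I − dim(X)) factor is the small-sample adjustment from the variance formula for pooled OLS, and no adjustment is applied to the heteroskedasticity-robust version.

```python
def pooled_ols_variances(panel):
    """Pooled OLS slope with heteroskedasticity-robust and cluster-robust variances.

    `panel` maps each unit (cluster) to a list of (x, y) pairs.
    """
    sxx = sum(x * x for rows in panel.values() for x, _ in rows)
    sxy = sum(x * y for rows in panel.values() for x, y in rows)
    beta = sxy / sxx

    meat_hc = 0.0   # sum over (i, t) of squared scores
    meat_cl = 0.0   # sum over i of the squared within-unit sum of scores
    for rows in panel.values():
        scores = [x * (y - beta * x) for x, y in rows]
        meat_hc += sum(s * s for s in scores)
        meat_cl += sum(scores) ** 2

    n_units = len(panel)
    var_hc = meat_hc / sxx ** 2
    var_cl = (n_units / (n_units - 1)) * meat_cl / sxx ** 2
    return beta, var_hc, var_cl


# Hypothetical data: y = x + (unit-level shock), shock = +1, -1, 0.
panel = {
    "a": [(1.0, 2.0), (2.0, 3.0)],
    "b": [(1.0, 0.0), (2.0, 1.0)],
    "c": [(1.0, 1.0), (2.0, 2.0)],
}

beta, var_hc, var_cl = pooled_ols_variances(panel)
```

Because the scores within a unit share a sign, summing them before squaring gives a larger variance than squaring them one at a time; ignoring the clustering would understate the uncertainty here.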

20 / 38
Tips: Choosing between estimators
Efficiency:
- The FE estimator is efficient when ϵit are serially uncorrelated.

- The FD estimator is efficient when ϵit follow a random walk.

- It's not about persistence in the data (which could come from αi) but about differential
persistence of observations close in time.

Robustness to model misspecification:

- E.g. violations of strict exogeneity, measurement error.

Fads: FE estimation is more popular; it appears that you've controlled for more.
- False, and FD specifications are more transparent (especially in more complex situations).

- If you use a FE specification, always rewrite it in FD.

21 / 38
2. Two-way fixed effects (TWFE)
- The LSDV approach can be extended to include a set of time-specific effects as well.
Besides αi , we include (additive) period effects γt to capture shocks that affect all
units:
Yit = β′ Xit + αi + γt + ϵit (#)
Unlike αi , period-FEs are non-stochastic parameters (on period dummies).

Thus, Eqn.(#) requires an innocuous normalization on {αi } or {γt }.

- Note that suppressing the overall intercept will not suffice:

- Either one must only include (T − 1) time dummies;

- Or alternatively, include a full set of n and T dummy variables but impose the
restriction that ∑i αi = ∑t γt = 0.

- With only balanced data, least squares estimation can be undertaken

22 / 38
2. Two-way fixed effects (TWFE) ...(2)

- With only balanced panel data, least squares estimation can be undertaken

- FE estimator: in balanced panels, OLS on the double-differenced (double-demeaned) specification:

$$\ddot{Y}_{it} = \beta'\ddot{X}_{it} + \ddot{\epsilon}_{it}, \quad \text{where } \ddot{V}_{it} = (V_{it} - \bar{V}_{i\cdot}) - (\bar{V}_{\cdot t} - \bar{\bar{V}})$$

Follows from FWL, because unit and period dummies are exactly orthogonal.
- Regress $\ddot{Y}_{it}$ on $\ddot{X}_{it}$
- These formulations are often seen in DiD specifications, but are not confined to them.
- See MixTape for calculations of $\ddot{V}_{it}$, $\bar{V}_{i\cdot}$, $\bar{V}_{\cdot t}$, $\bar{\bar{V}}$

- FD estimator: OLS on $\Delta Y_{it} = \beta'\Delta X_{it} + \Delta\gamma_t + \Delta\epsilon_{it}$ with period FEs $\Delta\gamma_t$.
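
The double-demeaning formula for balanced panels can be sketched as below, with one regressor. The numbers are invented so that y = 1.5x + αi + γt holds exactly; `double_demean` and `twfe_slope` are illustrative names, not functions from any package.

```python
def double_demean(v):
    """Return v_it - vbar_i. - vbar_.t + vbar_.. for a balanced n-by-T array."""
    n, t = len(v), len(v[0])
    unit_mean = [sum(row) / t for row in v]
    period_mean = [sum(v[i][j] for i in range(n)) / n for j in range(t)]
    grand_mean = sum(unit_mean) / n
    return [
        [v[i][j] - unit_mean[i] - period_mean[j] + grand_mean for j in range(t)]
        for i in range(n)
    ]


def twfe_slope(x, y):
    """OLS (no intercept) of double-demeaned y on double-demeaned x."""
    xd, yd = double_demean(x), double_demean(y)
    sxy = sum(a * b for rx, ry in zip(xd, yd) for a, b in zip(rx, ry))
    sxx = sum(a * a for rx in xd for a in rx)
    return sxy / sxx


# Hypothetical balanced panel: rows are units, columns are periods.
x = [[1.0, 2.0, 4.0], [0.0, 3.0, 3.0], [2.0, 2.0, 7.0]]
alpha = [5.0, -2.0, 1.0]     # unit effects
gamma = [0.0, 10.0, -4.0]    # period effects
y = [[1.5 * x[i][j] + alpha[i] + gamma[j] for j in range(3)] for i in range(3)]

beta_twfe = twfe_slope(x, y)
```

Both sets of effects drop out after the transformation, so the slope of 1.5 is recovered. The shortcut relies on the panel being balanced; with missing cells the unit and period means no longer separate this cleanly.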

23 / 38
3. Random Eﬀect Speciﬁcation
The formulation here is $\alpha_i = \alpha + u_i$ so that

$$y_{it} = x_{it}'\beta + \alpha + u_i + \varepsilon_{it}$$

Note that the ui are random but time invariant. As with pooled OLS, a random effects
analysis puts ui into the error term.
We need the following set of assumptions to estimate the model.
Assumptions:
1. $E[\varepsilon_{it} \mid X] = E[u_i \mid X] = 0$
2. $E[\varepsilon_{it}^2 \mid X] = \sigma_\varepsilon^2$; $E[u_i^2 \mid X] = \sigma_u^2$
3. $E[\varepsilon_{it} u_j \mid X] = 0 \ \forall\, i, j, t$
4. $E[\varepsilon_{it}\varepsilon_{js} \mid X] = 0$ if $t \neq s$ or $i \neq j$
5. $E[u_i u_j \mid X] = 0 \ \forall\, i \neq j$
24 / 38
3. Random Effect Specification...II
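Say, $v_{it} = u_i + \varepsilon_{it}$. This version is often called the error components model. Note that
these assumptions imply (suppressing the conditioning on X), for the T observations on unit i:
1. $E[v_{it}^2] = E[u_i^2 + \varepsilon_{it}^2 + 2u_i\varepsilon_{it}] = \sigma_u^2 + \sigma_\varepsilon^2$
2. $E[v_{it}v_{is}] = E[(u_i + \varepsilon_{it})(u_i + \varepsilon_{is})] = \sigma_u^2 \ \forall\, t \neq s$
3. $E[v_{it}v_{js}] = 0 \ \forall\, t, s$ if $i \neq j$
This implies that the error covariance matrix is non-spherical (errors are correlated within a unit):

$$\Sigma = \begin{pmatrix} \sigma_u^2 + \sigma_\varepsilon^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2 + \sigma_\varepsilon^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 + \sigma_\varepsilon^2 \end{pmatrix} = \sigma_\varepsilon^2 I_T + \sigma_u^2 i_T i_T'$$

Because the cross-sectional units i and j are independent,

$$\Omega = I_n \otimes \Sigma \ \Rightarrow\ \hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y = \hat\beta_{RE}$$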
25 / 38
3. Random Effect Specification...III
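Given that $\sigma_\varepsilon$ and $\sigma_u$ are known, define

$$\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2}}$$

and transform the data so that

$$\Sigma^{-1/2} y_i = \frac{1}{\sigma_\varepsilon}\begin{pmatrix} y_{i1} - \theta\bar{y}_i \\ y_{i2} - \theta\bar{y}_i \\ \vdots \\ y_{iT} - \theta\bar{y}_i \end{pmatrix}$$

and similarly for the X's. Running OLS on this transformed model is equivalent to GLS.
Note 1: when θ = 1 we are back to the fixed effects version
Note 2: if Ω is unknown, we need to use FGLS.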
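
A short sketch of the quasi-demeaning step, assuming the variance components are known (in practice they would be estimated, which is the FGLS case). The function names and numerical values are illustrative.

```python
import math


def re_theta(sigma_eps2, sigma_u2, n_periods):
    """theta = 1 - sqrt(sigma_eps^2 / (sigma_eps^2 + T * sigma_u^2))."""
    return 1.0 - math.sqrt(sigma_eps2 / (sigma_eps2 + n_periods * sigma_u2))


def quasi_demean(values, theta):
    """Subtract theta times the unit mean from each observation of one unit."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]


theta_no_unit_effect = re_theta(1.0, 0.0, 5)     # sigma_u^2 = 0: no demeaning
theta_example = re_theta(1.0, 3.0, 5)            # 1 - sqrt(1/16)
theta_long_panel = re_theta(1.0, 3.0, 10 ** 6)   # approaches 1 as T grows

y_unit = [1.0, 2.0, 3.0]
y_transformed = quasi_demean(y_unit, theta_example)
```

With θ = 0 the data are left untouched, which corresponds to pooled OLS; as θ approaches 1 the transformation approaches full within-demeaning, which is the fixed effects case in Note 1.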

26 / 38
4.1. Hausman’s test for FE vs RE
- Under the null hypothesis that the covariates Xit and the αi are uncorrelated, both the
FE and RE estimators are consistent.
But FE may be inefficient relative to RE (reasons not yet covered).

- Then under H0, we have $(\hat\beta_{FE} - \hat\beta_{RE})'(V_{FE} - V_{RE})^{-1}(\hat\beta_{FE} - \hat\beta_{RE}) \sim \chi^2_K$.

A rejection of H0 would favour the FE,
while failure to reject H0 would lead to the use of the RE.

- Stata commands (depvar and indepvars are placeholders):
xtreg depvar indepvars, fe   // fixed effects
estimates store fe
xtreg depvar indepvars, re   // random effects
estimates store re
hausman fe re                // Hausman test of FE against RE
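
For intuition, the statistic can be computed by hand in the single-coefficient case (K = 1), where the quadratic form reduces to a ratio of scalars. The estimates and variances below are made-up numbers, and `hausman_scalar` is an illustrative helper rather than a translation of Stata's command.

```python
import math


def hausman_scalar(b_fe, v_fe, b_re, v_re):
    """Hausman statistic and chi-square(1) p-value for a single coefficient."""
    diff_var = v_fe - v_re
    if diff_var <= 0:
        # Can happen in finite samples; the statistic is then not well defined.
        raise ValueError("V_FE - V_RE must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    # For one degree of freedom, P(chi2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical FE and RE estimates with their variances.
h_stat, p_value = hausman_scalar(b_fe=1.2, v_fe=0.05, b_re=0.8, v_re=0.01)
```

Here the statistic is about 4 with a p-value just under 0.05, so at the 5% level the null of no correlation between Xit and αi would be rejected in favour of FE.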

27 / 38
Aside Attrition and unbalanced panels (See, Wooldridge 2010)
Panel data are often used, either with or without an RCT (especially longer panels). But
panels (even within an RCT) are often unbalanced. This "lack of balance" may be due to
attrition, which can be characterized as a type of selection, one that occurs over time.
- There may be attrition from unit non-response
- Attrition that is an absorbing state: once the unit exits the panel it does not reappear

- Wave non-response: some units may come in and go out of a particular year or wave of
the panel (for panels with more than 2 time periods).

- Studies look at absorbing state and wave non-response separately

- Additionally, there may also be item non-response. Generally, multiple imputation
techniques are used to fill in the missing information

- Will estimates still be unbiased with unit (both absorbing state and wave)
non-response? Not necessarily!
28 / 38
Aside-1. Characterizing missingness
Need to characterize the pattern of missingness in the data
(whether there is selection on (un)observables or not).

- MCAR: Missing completely at random. [:≡ No selection]

The missing data are a random subsample of the underlying population. This has no
implications for bias; only efficiency is affected.

- MAR: Missing at random. [:≡ selection on observables]

Conditional on observables, the non-response/response is random. Differences in
response are explainable by observed characteristics; this should be taken into account.

- NMAR: Not missing at random. Systematic differences remain after conditioning on
observables. This is non-ignorable missingness; it should be taken into account.

Example of bias with unit attrition (need pics)

29 / 38
Stylized figure – Attrition bias due to unit non-response

In a first-difference framework, unit attrition leads to patterned bias due to a non-MCAR
situation, despite fixed effects
30 / 38
Aside-2. Testing for non-random selection in unbalanced panels
More formally, if the model takes the following form:

yit = zi′ α + xit′ β + ui + ϵit

Under unit non-response, let rit = 1 if (yit, xit, zi) is observed in period t and zero
otherwise. Testing whether missingness is problematic involves:
- Running a logit or probit of rit on baseline characteristics

- Including functions of rit in the outcome equation and seeing whether their inclusion matters.
E.g. the Nijman & Verbeek test involves including 3 additional variables:
(a) a dummy for whether the unit is present in wave t + 1,
(b) a dummy for whether the unit is present in the sample for all waves, and
(c) the number of waves in which the unit is present.
Not all of the functions of r can be used with fixed effects, as the time-invariant ones drop out.

31 / 38
Aside-3. Accounting for attrition
Aside 3.1 Panel data, attrition, FE and RE
Let ri ≡ [ri1 , ..., riT ].

- If E[ui + ϵit | zi, xit, ri] = E[ϵit + ui | zi, xit] = 0,

this is a case of MAR (selection on observables) and random effects will be consistent

- If E[ϵit | zi, xit, ui, ri] = E[ϵit | zi, xit, ui] = 0,

this is also a case of MAR (selection on observables) and fixed effects will be
consistent.
Notice this is weaker than the condition above, because it allows ri to depend on ui.
The idea is that as long as selection into the panel operates through time-invariant factors, these
get differenced out.
Lee bounds deal with attrition under random assignment. Will do it later

32 / 38
Aside-3. Accounting for attrition
Aside 3.2 Inverse probability weights

A common approach when selection is based on observables is to use inverse probability
weights. First model selection as usual.
Suppose ω are variables that are strong predictors of attrition. If information in the
first period can be observed for all observations, then estimate (notice the subscript on ω)

$$P(r_{it} = 1 \mid y_{it}, x_{it}, \omega_{i1}) = P(r_{it} = 1 \mid \omega_{i1}), \quad t = 2, \dots, T$$

This is nothing but a selection model. It can be used to obtain the predicted probability $\hat{p}_{it}$,
and then, in a second step, $1/\hat{p}_{it}$ is used as a sampling weight in the outcome equation
estimated in first differences:
$$\Delta y_{it} = \Delta x_{it}'\beta + \Delta\epsilon_{it}$$
In time period 3, the same process would be followed, with sampling weights $\frac{1}{\hat{p}_{i2}} \cdot \frac{1}{\hat{p}_{i3}}$,
and so on until time period T.
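
The weight construction can be sketched as a running product of inverse predicted retention probabilities. The probabilities below are invented; in an application they would come from the fitted selection model, and `ipw_weights` is a hypothetical helper name.

```python
def ipw_weights(p_hat):
    """Cumulative inverse-probability weights for one unit.

    `p_hat` lists the predicted probabilities of being observed in
    waves 2, 3, ..., T. The weight for wave t is the product of 1 / p_hat
    over waves 2..t.
    """
    weights = []
    running = 1.0
    for p in p_hat:
        if not 0.0 < p <= 1.0:
            raise ValueError("predicted probabilities must lie in (0, 1]")
        running /= p
        weights.append(running)
    return weights


# Hypothetical unit with p_hat_i2 = 0.8 and p_hat_i3 = 0.5.
weights = ipw_weights([0.8, 0.5])   # [1 / 0.8, 1 / (0.8 * 0.5)]
```

Units that resemble those likely to drop out receive larger weights, so that the retained sample stands in for the original one, provided selection really is on the observables in ω.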

33 / 38
5. Modeling Effect Dynamics

- Distributed lags model can be accommodated with no change:

Yit = β′0 Xit + β′1 Xi,t −1 + ...β′L Xi,t −L + αi + ϵit

- Lagged dependent variable on the RHS: dynamic panel model

Yit = ρYi,t −1 + β′ Xit + αi + ϵit

- Violates strict exogeneity: ϵit can’t be mean-independent of Yi ,s −1 for s = t + 1.

- Then FE estimation is not consistent in short panels: “Nickell bias”.

- Can use Arellano-Bond GMM estimator.
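
A small simulation can illustrate the Nickell bias in a short panel. This is a sketch under assumed settings (ρ = 0.5, four usable periods, standard normal αi and ϵit, no other regressors), not a general result; the function name and all parameter values are illustrative.

```python
import random


def fe_rho_estimate(n_units=2000, n_periods=4, rho=0.5, seed=0):
    """Within (FE) estimate of rho in y_it = rho * y_i,t-1 + alpha_i + eps_it."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 1.0)
        # Initial value centred on the unit's long-run mean alpha / (1 - rho).
        y = alpha / (1.0 - rho) + rng.gauss(0.0, 1.0)
        path = [y]
        for _ in range(n_periods):
            y = rho * y + alpha + rng.gauss(0.0, 1.0)
            path.append(y)
        lagged, current = path[:-1], path[1:]
        lag_bar = sum(lagged) / n_periods
        cur_bar = sum(current) / n_periods
        for y_lag, y_now in zip(lagged, current):
            sxy += (y_lag - lag_bar) * (y_now - cur_bar)
            sxx += (y_lag - lag_bar) ** 2
    return sxy / sxx


rho_true = 0.5
rho_fe = fe_rho_estimate(rho=rho_true)
```

Even with many units the within estimate stays well below the true ρ, because the demeaned lag is correlated with the demeaned error by construction; adding units does not help, only adding periods does.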

34 / 38
7. Nonlinear panel data models

- Nonlinear models with fixed effects are much more complicated.

Consider binary choice models:

Pr(Yit = 1 | Xit, αi) = F(β′Xit + αi), where F = probit or logistic

- Likelihood estimation of β along with {αi} results in the incidental parameter
problem: β̂ is inconsistent in short panels.

- For logistic regression (but not probit), conditional logit estimation yields consistent β̂.
- But not for average partial effects, which depend on αi.

- More progress with long panels; see Fernández-Val and Weidner (Annual Review of
Economics, 2018).

35 / 38
FEs beyond panel data

There are other types of data with repeated observations:

Twin studies = repeated observations in the same family.

- E.g. Ashenfelter and Rouse (1998) estimate returns to schooling for twins.

- Family FEs control for genetic differences.

- Warning: why does the treatment (e.g. education) vary between twins?

- Bound and Solon (1999): first-borns have higher weight, IQ, schooling.

36 / 38
FEs beyond panel data (2)

There are other types of data with repeated observations:

- County-level cross-section = repeated observations for the same state.

- State FEs control for additive state-level unobservables.

- In a county-level panel, can include state-by-year FEs:

$$Y_{it} = \beta'X_{it} + \alpha_i + \gamma_{s(i),t} + \epsilon_{it}$$

- Note: $\sum_{s',t'} \gamma_{s't'} \cdot 1[s(i) = s'] \times 1[t = t']$ is correct; $\gamma_{s(i)} \times \delta_t$ is wrong.

- Including $\gamma_{s(i),t}$ demeans $Y_{it}$ and $X_{it}$ by state-year-specific means.

37 / 38
FEs beyond panel data (3)

There are other types of data with repeated observations:

Repeated cross-sections / Pseudo Panel:
in each year a new random sample is drawn from each state.
- Can't control for individual heterogeneity.

- But state FEs control for additive state-level unobservables.

- Cluster at the state level if Xit only varies by state.

- Dyadic data: e.g. how does distance Xij between exporting country i and importing
country j affect log trade flow Yij? (Gravity equation)

38 / 38
