
CH 4 Panel Data

Chapter Four discusses panel data models, which combine time-series and cross-sectional data to analyze variables over time and across entities. It explains the advantages of using panel data, types of variables, and necessary data formats, as well as two primary estimation techniques: fixed effects and random effects models. The chapter also outlines how to choose between these models using the Hausman test to determine the appropriate approach based on the correlation of unique errors with regressors.

Uploaded by

mohammedfuad578

CHAPTER FOUR

PANEL DATA MODELS

Introduction
• Panel (or longitudinal) data combine time-series and cross-
sectional data in a very specific way.
• Panel data include observations on the same variables from the
same cross-sectional sample from two or more different time
periods.
Panel Data Examples
• Annual unemployment rates of each state over several years
• Quarterly sales of individual stores over several quarters
• Wages for the same worker, working at several different jobs

Reasons for using Panel Data
1. Panel data can take explicit account of individual-specific
heterogeneity.
2. By combining data in two dimensions, panel data gives more
data variation, less collinearity and more degrees of
freedom.
3. Panel data is better suited than cross-sectional data for studying
the dynamics of change. For example, it is well suited to
understanding transition behaviour such as company
bankruptcy or merger.
4. Panel data is better at detecting and measuring effects that
cannot be observed in either cross-section or time-series data.
That is, it allows researchers to avoid omitted variable problems
that otherwise would cause bias in cross-sectional studies.
5. Panel data enables the study of more complex behavioural
models – for example the effects of technological change, or
economic cycles.
Type of variables
• There are four different kinds of variables that we encounter
when we use panel data.
• First, we have variables that can differ between individuals but
don’t change over time, such as gender and ethnicity.
• Second, we have variables that change over time but are the
same for all individuals in a given time period, such as the retail
price index and the national unemployment rate.
• Third, we have variables that vary both over time and between
individuals, such as income and marital status.
• Fourth, we have trend variables that vary in predictable ways
such as an individual’s age.
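
The first three kinds of variables can be told apart mechanically by asking where a variable shows variation. The following is a minimal Python sketch under stated assumptions: the data, the column names (`id`, `year`, `female`, `cpi`, `income`) and the helper functions are all hypothetical and only illustrate the idea. It does not try to detect the fourth kind (trend variables such as age), which would show up here as varying over time and across individuals.

```python
from collections import defaultdict

# Hypothetical long-format panel: one row per (person, year). Values are made up.
rows = [
    {"id": 1, "year": 2020, "female": 1, "cpi": 100.0, "income": 30.0},
    {"id": 1, "year": 2021, "female": 1, "cpi": 104.0, "income": 32.0},
    {"id": 2, "year": 2020, "female": 0, "cpi": 100.0, "income": 41.0},
    {"id": 2, "year": 2021, "female": 0, "cpi": 104.0, "income": 40.0},
]

def constant_within(rows, group_key, var):
    """True if `var` takes a single value inside every group defined by `group_key`."""
    seen = defaultdict(set)
    for r in rows:
        seen[r[group_key]].add(r[var])
    return all(len(values) == 1 for values in seen.values())

def classify(rows, var):
    fixed_over_time = constant_within(rows, "id", var)      # no change within a person
    same_for_everyone = constant_within(rows, "year", var)  # no change within a year
    if fixed_over_time and same_for_everyone:
        return "constant"
    if fixed_over_time:
        return "varies across individuals only"
    if same_for_everyone:
        return "varies over time only"
    return "varies over time and across individuals"

kinds = {v: classify(rows, v) for v in ("female", "cpi", "income")}
print(kinds)
```

In this made-up data `female` lands in the first kind, `cpi` in the second and `income` in the third.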

Data formats

• To estimate an equation using panel data, it’s crucial
that the data be in the right format because regression
packages like Stata and EViews need to identify which
observations belong to which time periods and which
cross-sectional entities.
• Stata, for example, requires that a panel data set
include a date counter and an id number counter, but it
doesn’t require that the data be in any particular order.
• It is important to have an ID variable that distinguishes
one entity from others, such as patient ID, firm ID and
county name.
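
As an illustration of the format requirement, here is a minimal Python sketch, not tied to any particular package, that turns a wide table into the long layout with one row per entity and period. The state codes, years, rates and the names `id`, `year` and `unemp` are assumptions made up for the example.

```python
# Hypothetical wide layout: one row per state, one column per year (values made up).
wide = {
    "AL": {1990: 6.8, 1991: 7.3},
    "TX": {1990: 6.4, 1991: 6.9},
}

# Long layout: one row per (id, year) pair, carrying both identifiers explicitly.
long_rows = [
    {"id": state, "year": year, "unemp": rate}
    for state, by_year in wide.items()
    for year, rate in by_year.items()
]

def keys_are_unique(rows):
    """Each (id, year) combination should identify exactly one observation."""
    keys = [(r["id"], r["year"]) for r in rows]
    return len(keys) == len(set(keys))

# Row order carries no information once every row has an id and a time counter.
shuffled = list(reversed(long_rows))
same_content = sorted((r["id"], r["year"]) for r in shuffled) == \
               sorted((r["id"], r["year"]) for r in long_rows)
print(len(long_rows), keys_are_unique(long_rows), same_content)
```

The uniqueness check is the kind of validation a panel routine needs to pass before it can match observations to entities and periods.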
Data formats

• The use of panel data requires a slight expansion of our notation.


• In the past we’ve used the subscript i to indicate the observation
number in a cross-sectional data set, so Yi indicated Y for the ith
cross-sectional observation.
• Similarly, we’ve used the subscript t to indicate the observation
number in a time-series data set, so Yt indicated Y for the tth time-
series observation.
• In a panel data set, however, variables will have both a cross-
sectional and a time-series component, so we’ll use both
subscripts.
• As a result, Yit indicates Y for the ith cross-sectional and tth time-
series observation.
• This notation expansion also applies to independent variables and
error terms.
Type of Panel Data

• A panel data set contains n entities, each observed in time
periods 1 through T. Thus, the total number of observations in the
panel is nT. Ideally, panel data are measured at regular time
intervals (e.g., year, quarter, and month). A panel may be long or
short, balanced or unbalanced.
• Long versus short panel data: A short panel has many entities
(large n) but few time periods (small T), while a long panel has
many time periods (large T) but few entities (small n).
• Balanced versus unbalanced panel data: In a balanced panel, all
entities have measurements in all time periods, so the total
number of observations is nT. When entities have different numbers
of observations, the panel is unbalanced: in a cross-tabulation of
entities by time periods, some cells are empty. Accordingly, the
total number of observations is less than nT in an unbalanced
panel.
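
The balanced versus unbalanced distinction can be checked by counting observations per entity. The following is a small Python sketch; the function name `describe_panel` and the example identifiers are hypothetical, and the sketch assumes each (entity, period) pair appears at most once.

```python
from collections import Counter

def describe_panel(keys):
    """keys: iterable of (entity_id, period) pairs, one per observation."""
    keys = list(keys)
    entities = {e for e, _ in keys}
    periods = {t for _, t in keys}
    n, T = len(entities), len(periods)
    per_entity = Counter(e for e, _ in keys)
    # Balanced: every entity is observed in all T periods, so obs == n * T.
    balanced = all(count == T for count in per_entity.values())
    return {"n": n, "T": T, "obs": len(keys), "balanced": balanced}

# Three entities observed in two years: balanced, 3 * 2 = 6 observations.
full = [(i, t) for i in (1, 2, 3) for t in (2001, 2002)]
# Drop one entity-year: the panel becomes unbalanced with fewer than nT rows.
gappy = [k for k in full if k != (3, 2002)]

info_full = describe_panel(full)
info_gappy = describe_panel(gappy)
print(info_full, info_gappy)
```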
Panel Data

• Panel data allows you to control for variables you cannot observe or
measure, such as cultural factors or differences in business practices
across companies, or variables that change over time but not across
entities (e.g., national policies, federal regulations, international
agreements). That is, it accounts for individual heterogeneity.

• There are several alternative panel data estimation procedures

• In this chapter we focus on two techniques used to analyze panel data:

1. Fixed effects
2. Random effects
The Fixed Effects Model
• Most researchers use the fixed effects model, which allows
each cross-sectional unit to have a different intercept:
𝑌𝑖𝑡 = 𝛽0 + 𝛽1𝑋𝑖𝑡 + 𝛽2𝐷2𝑖 + … + 𝛽𝑁𝐷𝑁𝑖 + 𝑣𝑖𝑡 (3)
where:
D2 = intercept dummy equal to 1 for the second cross-sectional
entity and 0 otherwise
DN = intercept dummy equal to 1 for the Nth cross-sectional
entity and 0 otherwise
• Note that Y, X, and v have two subscripts!
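
To make the dummy-variable form of Equation 3 concrete, here is a minimal Python sketch that estimates it by OLS for two entities and compares the slope with pooled OLS that omits the dummy. Everything in it is an assumption for illustration: the data are made up (true slope 2, intercepts 0 and 10, no noise), and `solve` and `ols` are hypothetical helpers written with exact fractions so the arithmetic can be followed.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[-1] for row in M]

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Made-up data: two entities, three periods each.
entity = [1, 1, 1, 2, 2, 2]
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 18, 20, 22]   # y = 2*x for entity 1, y = 2*x + 10 for entity 2

# Columns: constant, X, D2 (1 for the second entity, 0 otherwise).
X_lsdv = [[1, xi, 1 if e == 2 else 0] for xi, e in zip(x, entity)]
b0, b1, b2 = ols(X_lsdv, y)

# Pooled OLS without the dummy, for comparison.
p0, p1 = ols([[1, xi] for xi in x], y)
print(b0, b1, b2, p1)
```

With the dummy included, the slope estimate equals the true value of 2 and the dummy coefficient picks up the intercept difference of 10. Without it, the pooled slope is 32/7, because the higher-intercept entity also has the larger X values.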

• One major advantage of the fixed effects model is that it avoids
bias due to omitted variables that don’t change over time
– e.g., race or gender
– Such time-invariant omitted variables often are referred to as
unobserved heterogeneity or a fixed effect
• To understand how this works, consider what Equation 3 would
look like with only two years’ worth of data:
𝑌𝑖𝑡 = 𝛽0 + 𝛽1𝑋𝑖𝑡 + 𝛽2𝐷2𝑖 + 𝑣𝑖𝑡 (4)
• Let’s decompose the error term, vit, into two components, a
classical error term (εit) and the unobserved impact of the time-
invariant omitted variables (ai):
𝑣𝑖𝑡 = 𝜀𝑖𝑡 + 𝑎𝑖 (5)

• If we substitute Equation 5 into Equation 4, we get:
𝑌𝑖𝑡 = 𝛽0 + 𝛽1𝑋𝑖𝑡 + 𝛽2𝐷2𝑖 + 𝜀𝑖𝑡 + 𝑎𝑖 (6)
• Next, average Equation 6 over time for each observation i, thus
producing:
$\bar{Y}_i = \beta_0 + \beta_1 \bar{X}_i + \beta_2 D2_i + \bar{\varepsilon}_i + a_i$ (7)
where the bar over a variable indicates the mean of that variable
across time
• Note that ai, β2D2i, and β0 don’t have bars over them because
they’re constant over time

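• If we now subtract Equation 7 from Equation 6, we get:
$Y_{it} - \bar{Y}_i = (a_i - a_i) + \beta_1 (X_{it} - \bar{X}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$ (8)
• Note that ai, β2D2i, and β0 are subtracted out because
they’re in both equations:
$Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$ (9)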

• We’ve therefore shown that estimating panel data with the
fixed effects model does indeed drop the ai out of the
equation
• Hence, the fixed effects model will not experience bias due to
time-invariant omitted variables!
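
The demeaning argument above can be checked numerically. The following is a minimal Python sketch under stated assumptions: the data are made up, with Y generated as 2X plus an entity effect ai and no classical error, and `slope` and `demean_by_entity` are hypothetical helper names. It compares pooled OLS, which leaves ai in the error term, with a regression on the time-demeaned variables.

```python
from collections import defaultdict

def slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_by_entity(ids, values):
    """Subtract each entity's time average from its observations."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    means = {i: sum(vs) / len(vs) for i, vs in groups.items()}
    return [v - means[i] for i, v in zip(ids, values)]

# Made-up data: Y = 2*X + a_i with a_1 = 0 and a_2 = 10, no classical error.
# a_i is correlated with X because entity 2 also has the larger X values.
ids = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a = {1: 0.0, 2: 10.0}
y = [2.0 * xi + a[i] for i, xi in zip(ids, x)]

pooled = slope(x, y)                      # a_i is left in the error term
within = slope(demean_by_entity(ids, x),  # a_i has been subtracted out
               demean_by_entity(ids, y))
print(pooled, within)
```

In this example the demeaned regression returns the true slope of 2, while the pooled slope is about 4.57, which is the omitted variable bias that the fixed effects model removes.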

• Use FE whenever you are only interested in analyzing the impact of
variables that vary over time. Each entity has its own individual
characteristics that may or may not influence the predictor
variables.

• When using FE we assume that something within the individual may
impact or bias the predictor or outcome variables, and we need to
control for this. This is the rationale behind the assumption that an
entity’s error term is correlated with the predictor variables.

• FE removes the effect of those time-invariant characteristics so we
can assess the net effect of the predictors on the outcome variable.
The Random Effects Model
• Recall that the fixed effects model is based on the
assumption that each cross-sectional unit has its own
intercept.
• The random effects model instead is based on the
assumption that the intercept for each cross-sectional unit is
drawn from a distribution (that is centered around a mean
intercept).
• Thus each intercept is a random draw from an “intercept
distribution” and therefore is independent of the error term
for any particular observation
– Hence the term random effects model

The Random Effects Model
• The rationale behind the random effects model is that, unlike in the
fixed effects model, the variation across entities is assumed to be
random and uncorrelated with the predictor or independent
variables included in the model.
• The crucial distinction between fixed and random effects is
whether the unobserved individual effect embodies elements
that are correlated with the regressors in the model, not whether
these effects are stochastic or not.
• An advantage of random effects is that you can include
time-invariant variables (e.g., gender).
• In the fixed effects model these variables are absorbed by the
intercept.
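
One way to see why fixed effects cannot estimate the coefficient of a time-invariant variable is to apply the time-demeaning step to such a variable. This is a minimal Python sketch with a hypothetical gender dummy; the names `within_transform`, `person_1` and `person_2` are made up for the example.

```python
def within_transform(values_by_entity):
    """Demean each entity's series by its own time average."""
    out = {}
    for entity, values in values_by_entity.items():
        mean = sum(values) / len(values)
        out[entity] = [v - mean for v in values]
    return out

# Hypothetical gender dummy observed in three periods for two people.
female = {"person_1": [1, 1, 1], "person_2": [0, 0, 0]}
demeaned = within_transform(female)

# After demeaning, the variable is zero everywhere, so nothing is left to
# identify its coefficient separately from the entity-specific intercept.
no_variation_left = all(v == 0 for series in demeaned.values() for v in series)
print(demeaned, no_variation_left)
```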
The fixed effects model assumes that each group (firm) has a non-
stochastic group-specific component to y. Including dummy
variables is a way of controlling for unobservable effects on y.

But these unobservable effects may be stochastic (i.e. random). The
Random Effects Model attempts to deal with this:

$y_{it} = a_0 + a_1 x_{it} + v_i + \varepsilon_{it}$ (6)

Here the unobservable component, vi, is treated as a component of
the random error term. vi is the element of the error which varies
between groups but not within groups. εit is the element of the
error which varies over group and time.
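
The two-part error has a practical consequence that a short Python sketch can illustrate. It relies on assumptions that the text above does not state: vi and εit are uncorrelated with each other and have variances written here as `sigma2_v` and `sigma2_eps`, the panel is balanced with T periods, and random effects is estimated by the usual feasible GLS approach, which is equivalent to OLS on variables from which a fraction theta of each group mean has been subtracted. The function names are hypothetical.

```python
import math

def re_theta(sigma2_v, sigma2_eps, T):
    """Quasi-demeaning weight: 1 - sqrt(s2_eps / (s2_eps + T * s2_v))."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_v))

def within_group_correlation(sigma2_v, sigma2_eps):
    """Correlation between two composite errors (v_i + eps_it) of the same group."""
    return sigma2_v / (sigma2_v + sigma2_eps)

theta_none = re_theta(0.0, 1.0, T=5)     # no group component in the error
theta_mid = re_theta(1.0, 1.0, T=5)
theta_large = re_theta(100.0, 1.0, T=5)  # group component dominates
rho = within_group_correlation(1.0, 1.0)
print(theta_none, theta_mid, theta_large, rho)
```

Under these assumptions, theta equal to 0 means no demeaning, which is pooled OLS. As theta approaches 1 the transformation approaches the full time-demeaning used by fixed effects, so random effects sits between the two.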
Choosing Between Fixed and Random Effects
▪ To decide between fixed and random effects, you can run a
Hausman test, where the null hypothesis is that the preferred
model is random effects and the alternative is that it is fixed
effects. It basically tests whether the unique errors (ui) are
correlated with the regressors; the null hypothesis is that they are not.
▪ Run a fixed effects model and save the estimates, then run a
random model and save the estimates, then perform the
Hausman test.
▪ If the p-value of the test is > 0.05, we fail to reject the null (H0):
the RE model is consistent and efficient, so it is preferred
▪ If the p-value of the test is < 0.05, we reject H0 in favour of the
alternative (HA): only the FE model is consistent, so it is preferred
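
For intuition about the decision rule, here is a minimal Python sketch of the Hausman statistic in the special case of a single coefficient, where it reduces to the squared difference between the FE and RE estimates divided by the difference of their variances and is compared with a chi-square distribution with one degree of freedom. The function name and all numbers are hypothetical. With several regressors the statistic is a matrix quadratic form, which this sketch does not cover.

```python
import math

def hausman_one_coef(b_fe, se_fe, b_re, se_re, alpha=0.05):
    """Hausman statistic for one coefficient, compared with chi-square(1)."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive for this form of the test")
    h = (b_fe - b_re) ** 2 / var_diff
    # Upper-tail probability of a chi-square(1) variable: P(Z^2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    choice = "FE" if p_value < alpha else "RE"
    return h, p_value, choice

# Hypothetical estimates, not taken from any data set.
h, p, choice = hausman_one_coef(b_fe=1.5, se_fe=0.25, b_re=1.0, se_re=0.15)
h2, p2, choice2 = hausman_one_coef(b_fe=1.05, se_fe=0.25, b_re=1.0, se_re=0.15)
print(h, p, choice)
print(h2, p2, choice2)
```

In the first call the two estimates are far apart relative to their precision, so the test rejects the null and points to FE. In the second they are close, so RE is retained.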

Choosing Between Fixed and Random Effects
* Pooled OLS
reg mrdrte exec unem d90 d93
* Declare the panel structure before using xtreg
* (the identifier names id and year are assumed; the original does not show them)
xtset id year
* Fixed effects estimation; store the results for the Hausman test
xtreg mrdrte exec unem d90 d93, fe
estimates store FE
* Random effects estimation; store the results
xtreg mrdrte exec unem d90 d93, re
estimates store RE
* Compare the two models using Hausman's specification test
hausman FE RE
