Policy and Management Evaluation Methods
Lecture 7
Prof. Dr. Fabian Kosse
Chair of Data Science in Business & Economics
From last week:
▪ Because randomly assigned treatment and control groups come from
the same underlying population, they are the same in every way,
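including their expected $Y_{0i}$
▪ I.e. $E[Y_{0i} \mid D_i = 0]$ and $E[Y_{0i} \mid D_i = 1]$ are the same if treatment $D_i$ is
randomly assigned. Therefore:
$E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]$
$= E[Y_{0i} + \kappa \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]$
$= \kappa + E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]$
$= \kappa$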
▪ The last step follows from random assignment
▪ Hence, random assignment eliminates selection bias
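The argument above can be checked with a small simulation. This is only a sketch on made-up data: the effect size `kappa`, the distribution of the untreated outcome and the self-selection rule are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
kappa = 2.0                      # constant treatment effect (assumed value)
y0 = rng.normal(10.0, 1.0, n)    # potential outcome without treatment
y1 = y0 + kappa                  # potential outcome with treatment

# Random assignment: D is independent of Y0
d_random = rng.integers(0, 2, n)
y_obs = np.where(d_random == 1, y1, y0)
diff_random = y_obs[d_random == 1].mean() - y_obs[d_random == 0].mean()

# Self-selection (hypothetical rule): units with high Y0 are more likely to be treated
d_select = (y0 + rng.normal(0.0, 1.0, n) > 10.0).astype(int)
y_obs_sel = np.where(d_select == 1, y1, y0)
diff_select = y_obs_sel[d_select == 1].mean() - y_obs_sel[d_select == 0].mean()

print(f"difference in means, random assignment: {diff_random:.3f}")
print(f"difference in means, self-selection:    {diff_select:.3f}")
```

Under random assignment the difference in means should be close to `kappa`. Under self-selection it also picks up the gap in $E[Y_{0i}]$ between the two groups, which is the selection bias.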
The first 5 to appear in Dirndl or Lederhosen will
receive a free drink
Lecture Outline
▪ Another example of randomized experiments in firms
▪ Fixed effects
▪ Panel data
Randomized Experiments
▪ Randomized experiments are a powerful tool to understand
causal effects and very useful (but probably underused?) in
firms
▪ Arguments against RCTs
▪ Ethical issue?
▪ Implementable?
▪ Experiments inform firms about:
▪ Customer behaviour (see tutorial: eBay)
▪ Management practices (this week)
▪ …
Working From Home
▪ Working from home is becoming more important:
Source: Bloom et al. (2015)
Selection Bias if we used Observational Data
▪ Why can’t we simply compare outcomes (e.g. productivity,
promotion prospects,…) of individuals who work from home
(WFH) to those who do not?
▪ Those who selected into working from home may be more or
less productive to start with: classical selection bias
▪ Ctrip (a leading travel agency in China) decided to run an
experiment to understand the causal effect of WFH
▪ It teamed up with economists at Stanford University (Nick Bloom,
James Liang, John Roberts, Zhichun Jenny Ying) to run the
experiment
Ctrip
▪ Ctrip is a leading travel agency in China
▪ Worth about $5 billion at the time of the experiment
Experimental Design
▪ Shanghai call centre workers were asked whether they wanted to
change their work arrangements
▪ from 5 days a week in the office
▪ to 4 days a week at home and 1 day in the office
▪ 994 workers were asked whether they wanted to work from home; 503
volunteered for the experiment
▪ Why not simply compare the 503 individuals to the rest?
▪ Restricting the sample: among the volunteers, only those who had worked
at least 6 months with the company, had broadband internet, and had an
independent workspace at home were allowed to participate (249
individuals)
▪ The 249 individuals were randomly allocated to a treatment (WFH)
and control group (continued to work in the office)
Who Volunteers To WFH?
Productivity Over Time
How to Evaluate the Experiment?
▪ The simplest way to evaluate the experiment would be to
compare productivity differences between the treatment and
control group during the experiment:
$\text{Productivity}_i = \beta_0 + \beta_1 \text{Treatment}_i + \varepsilon_i$
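With a single treatment dummy, the OLS estimate of $\beta_1$ is simply the difference in mean productivity between the treatment and control group. The sketch below illustrates this on simulated data. These are not the Ctrip data: only the sample size of 249 is taken from the slides, and the effect size and noise are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 249                                   # sample size borrowed from the slides
treatment = rng.integers(0, 2, n)         # hypothetical random assignment (0/1)
# Simulated standardized productivity with an assumed effect of 0.18
productivity = 0.18 * treatment + rng.normal(0.0, 1.0, n)

# OLS of productivity on a constant and the treatment dummy
X = np.column_stack([np.ones(n), treatment])
beta, *_ = np.linalg.lstsq(X, productivity, rcond=None)

treated_mean = productivity[treatment == 1].mean()
control_mean = productivity[treatment == 0].mean()
diff_in_means = treated_mean - control_mean

print(f"beta_0 = {beta[0]:.4f}, control mean        = {control_mean:.4f}")
print(f"beta_1 = {beta[1]:.4f}, difference in means = {diff_in_means:.4f}")
```

The intercept reproduces the control-group mean and the slope reproduces the treatment-control difference.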
Regression Results
What Did the Firm Do After Seeing the Results?
▪ The experimental results indicate that productivity increased by
about 18% of a standard deviation in the treatment group
▪ Moreover, the control group did not do worse than call centre
workers in another location (rules out reduced motivation in the
control group)
▪ WFH caused higher productivity and lower costs for the
company (because office space is expensive in Shanghai)
→ After seeing the results the company rolled out voluntary WFH to
the whole company
Fixed Effects
▪ A while ago we introduced indicator/dummy variables
▪ Including indicator/dummy variables as control variables can be
useful because they (like all other control variables):
▪ may reduce omitted variable bias
▪ reduce standard errors (because they reduce residual variation)
▪ A large set of indicator variables is sometimes called fixed
effects (FE)
▪ Let’s look at an example using the World Management Survey
data (Bloom & van Reenen) and estimate the regression with
and without country FE
▪ Relation: Management Quality and Sales
Regression without Fixed Effects
Without FE
      Source |       SS           df       MS      Number of obs   =       732
-------------+----------------------------------   F(1, 730)       =     39.98
       Model |  76.4763017         1  76.4763017   Prob > F        =    0.0000
    Residual |   1396.4909       730  1.91300124   R-squared       =    0.0519
-------------+----------------------------------   Adj R-squared   =    0.0506
       Total |   1472.9672       731  2.01500302   Root MSE        =    1.3831
------------------------------------------------------------------------------
          ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 amanagement |   .4287287   .0678073     6.32   0.000      .295608    .5618494
       _cons |   10.59861   .2237934    47.36   0.000     10.15925    11.03796
------------------------------------------------------------------------------
▪ Why could country FEs address omitted variable bias?
Regression with Country Fixed Effects
With FE
Left out category → US (base category)
------------------------------------------------------------------------------
          ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 amanagement |   .3520594   .0659144     5.34   0.000     .2226542    .4814647
    ccfrance |  -.8659832   .1391211    -6.22   0.000     -1.13911   -.5928561
   ccgermany |  -.0933134   .1320019    -0.71   0.480    -.3524639    .1658371
        ccuk |  -.8167841   .1346197    -6.07   0.000    -1.081074   -.5524943
       _cons |   11.19305   .2321523    48.21   0.000     10.73728    11.64882
------------------------------------------------------------------------------
Fixed Effects
▪ The coefficient on management falls from 0.43 to 0.35 after including
the country FEs, suggesting that the regression without FE suffered
from omitted variable bias
▪ The standard error is also slightly lower (0.066 instead of 0.068)
Panel Data
▪ Panel or longitudinal datasets contain observations that vary both at
the
▪ i: cross-sectional level: e.g. individuals, firms, countries (total: N)
▪ t: time level (total: T)
▪ The individuals (or firms, countries) are observed for multiple time
periods
▪ One distinguishes between different types of panels:
▪ Balanced: all individuals are observed in all time periods → number of observations:
N × T
▪ Unbalanced: (some) individuals are only observed in some time periods
▪ The concern with unbalanced panels is that observations may not be
missing at random (the same concern applies if the panel is artificially
balanced by dropping all individuals that are not observed in every period)
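Whether a panel is balanced can be checked by counting the periods observed per unit. The following is a minimal sketch on a hypothetical toy panel; the column names `unit` and `year` are assumptions, not names from any dataset used in the lecture.

```python
import pandas as pd

# Hypothetical toy panel: unit 3 is not observed in 2001
panel = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2, 2, 3, 3],
    "year": [2000, 2001, 2002, 2000, 2001, 2002, 2000, 2002],
})

n_units = panel["unit"].nunique()        # N
n_periods = panel["year"].nunique()      # T
periods_per_unit = panel.groupby("unit")["year"].nunique()

# Balanced means every unit is observed in all T periods
is_balanced = bool((periods_per_unit == n_periods).all())
print(f"N = {n_units}, T = {n_periods}, observations = {len(panel)}")
print("balanced:", is_balanced)

# Artificially balancing the panel: keep only units observed in every period
balanced = panel[panel["unit"].map(periods_per_unit) == n_periods]
print("observations after balancing:", len(balanced))
```

Dropping unit 3 yields a balanced panel, but if units with gaps differ systematically from the others, the remaining sample is selected.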
Panel Data – Model
▪ The inclusion of fixed effects allows us to control for all
unobserved factors that vary across cross-sectional units (e.g.
individuals) but do not vary over time
▪ Think of the following model:
$y_{it} = \beta_0 + \beta_1 x_{it} + c_i + \varepsilon_{it}$
▪ You have data that vary across individuals ($i$) and over time ($t$)
▪ You are interested in estimating $\beta_1$
▪ $\varepsilon_{it}$ is an error term that satisfies the “Gauss-Markov assumptions”
▪ $c_i$ is a component that affects the outcome and varies across
individuals but not over time (→ Estimation uses only individual
variation over time)
Panel Data – Example
▪ Suppose you wanted to estimate the effect of being married on
wages (why should that have an effect?)
$\ln(\text{wage})_{it} = \beta_0 + \beta_1 \text{Married}_{it} + \gamma_t + c_i + \varepsilon_{it}$
▪ In the example we also control for time FE ($\gamma_t$), which vary over
time but not over individuals, because over time individuals grow older,
earn more and are also more likely to be married
▪ Do we expect our estimate of $\beta_1$ to be upward or downward biased if we do not
control for $c_i$?
▪ OVB formula: $E(\hat{\beta}_1) = \beta_1 + \beta_2 \frac{Cov(x_1, x_2)}{Var(x_1)}$
▪ E.g. $x_2$ = Health status!
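The OVB formula also holds exactly in the sample: the coefficient from the short regression equals the long-regression coefficient plus $\hat{\beta}_2$ times the sample ratio $Cov(x_1, x_2) / Var(x_1)$. The sketch below checks this on simulated data. The coefficients and the correlation between $x_1$ and $x_2$ are assumed values, not estimates from the NLSY.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x2 = rng.normal(size=n)                   # omitted variable (e.g. health), simulated
x1 = 0.5 * x2 + rng.normal(size=n)        # regressor positively correlated with x2
y = 1.0 + 0.1 * x1 + 0.4 * x2 + rng.normal(size=n)

def ols(columns, y):
    """OLS coefficients for a constant followed by the given regressors."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_long = ols([x1, x2], y)     # [const, beta_1, beta_2]
b_short = ols([x1], y)        # [const, beta_1 with x2 omitted]

cov = np.cov(x1, x2)          # 2x2 sample covariance matrix
predicted_short = b_long[1] + b_long[2] * cov[0, 1] / cov[0, 0]

print(f"long regression beta_1:  {b_long[1]:.4f}")
print(f"short regression beta_1: {b_short[1]:.4f}")
print(f"OVB formula prediction:  {predicted_short:.4f}")
```

Because $\beta_2 > 0$ and $Cov(x_1, x_2) > 0$ in this setup, the short regression overstates $\beta_1$.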
1) Estimating FE with Dummies
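▪ Conceptually, the easiest way of estimating the FE model
would be to include one dummy variable for each individual in
the model:
$y_{it} = \beta_0 + \beta_1 x_{it} + c_1 \mathbb{1}[\text{Indiv. } 1] + c_2 \mathbb{1}[\text{Indiv. } 2] + \dots + c_N \mathbb{1}[\text{Indiv. } N] + \varepsilon_{it}$
▪ Using the summation sign notation:
$y_{it} = \beta_0 + \beta_1 x_{it} + \sum_{j=1}^{N} c_j \mathbb{1}[\text{Indiv. } j] + \varepsilon_{it}$
▪ In practice one dummy is left out as the base category (as with the US
in the country example), because the full set of dummies is collinear
with the constant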
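A minimal sketch of the dummy-variable approach on simulated panel data follows. All parameter values are assumptions; the individual effect $c_i$ is deliberately correlated with $x_{it}$ so that pooled OLS is biased. Here the constant is dropped instead of one dummy, which avoids the same collinearity.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 200, 5
beta1 = 0.1                                    # true effect (assumed value)
unit = np.repeat(np.arange(N), T)              # individual index of each row
c = rng.normal(size=N)                         # time-invariant individual effect
x = 0.8 * c[unit] + rng.normal(size=N * T)     # regressor correlated with c_i
y = 1.0 + beta1 * x + c[unit] + rng.normal(scale=0.5, size=N * T)

# Pooled OLS that ignores c_i
X_pooled = np.column_stack([np.ones(N * T), x])
b_pooled, *_ = np.linalg.lstsq(X_pooled, y, rcond=None)

# One dummy per individual; no constant, to avoid perfect collinearity
dummies = (unit[:, None] == np.arange(N)[None, :]).astype(float)
X_lsdv = np.column_stack([x, dummies])
b_lsdv, *_ = np.linalg.lstsq(X_lsdv, y, rcond=None)

print(f"true beta_1:           {beta1:.3f}")
print(f"pooled OLS estimate:   {b_pooled[1]:.3f}")
print(f"dummy-variable (FE):   {b_lsdv[0]:.3f}")
```

The design matrix has N + 1 columns, which is why this approach becomes expensive when N is large.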
1) Estimating FE with Dummies - Example
▪ We use data from the National Longitudinal Survey of Youth
1979 which has surveyed 12,686 individuals since 1979
▪ We want to understand the effect of being married on income
▪ First estimate the following model:
$\ln(\text{wage})_{it} = \beta_0 + \beta_1 \text{Married}_{it} + \gamma_t + u_{it}$
Regression Results – Only Time FE
      Source |       SS           df       MS      Number of obs   =    63,332
-------------+----------------------------------   F(19, 63312)    =   1538.85
       Model |   34758.029        19  1829.36995   Prob > F        =    0.0000
    Residual |  75264.4744    63,312  1.18878687   R-squared       =    0.3159
-------------+----------------------------------   Adj R-squared   =    0.3157
       Total |  110022.503    63,331  1.73726143   Root MSE        =    1.0903
------------------------------------------------------------------------------
   ln_income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |    .198981     .00924    21.53   0.000     .1808706    .2170915
     year_d1 |   -.203491   .0853035    -2.39   0.017     -.370686   -.0362959
     year_d2 |  -.0381279   .0846518    -0.45   0.652    -.2040455    .1277896
     year_d3 |   .0654794    .084215     0.78   0.437    -.0995821    .2305409
     year_d4 |    .230483   .0839114     2.75   0.006     .0660165    .3949496
     year_d5 |   .3385567   .0837646     4.04   0.000      .174378    .5027355
▪ What does the regression coefficient on married mean?
Regression Results – Now Include Individual FE
------------------------------------------------------------------------------
   ln_income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   .1108063   .0098053    11.30   0.000     .0915879    .1300247
     year_d1 |  -2.523614   .0246953  -102.19   0.000    -2.572017   -2.475211
     year_d2 |  -2.306119   .0230091  -100.23   0.000    -2.351217   -2.261021
     year_d3 |  -2.187204   .0218349  -100.17   0.000    -2.230001   -2.144408
     year_d4 |  -1.985332   .0209646   -94.70   0.000    -2.026423   -1.944241
     year_d5 |  -1.861583   .0205182   -90.73   0.000    -1.901799   -1.821367
▪ Note: Individual FEs not shown in the table
▪ Was our intuition of the omitted variable bias correct?
2) Within Estimator
▪ A second (and the most common) approach to estimate such
models is the so-called within-estimator
▪ Model:
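$y_{it} = \beta_0 + \beta_1 x_{it} + c_i + \varepsilon_{it}$ (1)
▪ Step 1: Average the estimating equation over $t = 1, \dots, T$:
$\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + \bar{c}_i + \bar{\varepsilon}_i$ (2)
where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, $\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}$, $\bar{c}_i = c_i$, $\bar{\varepsilon}_i = \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{it}$
▪ Step 2: Subtract equation (2) from (1) to get:
$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + (c_i - \bar{c}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$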
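▪ This can be rewritten as (~ = tilde):
$\tilde{y}_{it} = \beta_1 \tilde{x}_{it} + \tilde{\varepsilon}_{it}$
where $\tilde{y}_{it} = y_{it} - \bar{y}_i$, $\tilde{x}_{it} = x_{it} - \bar{x}_i$, $\tilde{\varepsilon}_{it} = \varepsilon_{it} - \bar{\varepsilon}_i$, and $(c_i - \bar{c}_i) = 0$
▪ Step 3: Run a regression of $\tilde{y}_{it}$ on $\tilde{x}_{it}$ using OLS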
▪ In practice, you do not have to manually carry out these steps
because the statistical software will do them for you
▪ This regression is computationally easier to estimate than the
one with lots of FE
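The three steps can be sketched by hand and compared with the dummy-variable regression. The data are simulated and all parameter values are assumptions; `demean_by_unit` is a helper written for this illustration, not a library function.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 100, 4
unit = np.repeat(np.arange(N), T)              # individual index of each row
c = rng.normal(size=N)                         # individual effect
x = 0.8 * c[unit] + rng.normal(size=N * T)
y = 1.0 + 0.1 * x + c[unit] + rng.normal(scale=0.5, size=N * T)

def demean_by_unit(v, unit, n_units):
    """Steps 1 and 2: subtract each unit's time average from its observations."""
    sums = np.bincount(unit, weights=v, minlength=n_units)
    counts = np.bincount(unit, minlength=n_units)
    return v - (sums / counts)[unit]

x_tilde = demean_by_unit(x, unit, N)
y_tilde = demean_by_unit(y, unit, N)

# Step 3: OLS of y_tilde on x_tilde (no constant needed after demeaning)
b_within = (x_tilde @ y_tilde) / (x_tilde @ x_tilde)

# Dummy-variable regression for comparison
dummies = (unit[:, None] == np.arange(N)[None, :]).astype(float)
b_lsdv, *_ = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)

print(f"within estimator: {b_within:.6f}")
print(f"dummy variables:  {b_lsdv[0]:.6f}")
```

The two coefficients agree up to numerical precision. Standard errors from a hand-rolled regression on demeaned data would still need a degrees-of-freedom correction for the N estimated means, which statistical software applies automatically.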
Regression Results – Within Estimator
------------------------------------------------------------------------------
   ln_income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   .1108063   .0098053    11.30   0.000     .0915879    .1300247
     year_d1 |  -2.523614   .0246953  -102.19   0.000    -2.572017   -2.475211
     year_d2 |  -2.306119   .0230091  -100.23   0.000    -2.351217   -2.261021
     year_d3 |  -2.187204   .0218349  -100.17   0.000    -2.230001   -2.144408
     year_d4 |  -1.985332   .0209646   -94.70   0.000    -2.026423   -1.944241
     year_d5 |  -1.861583   .0205182   -90.73   0.000    -1.901799   -1.821367
▪ The within estimator and the model that includes all FEs always
give the same results
▪ Estimating the model with a lot of fixed effects, however, is
computationally very demanding
3) First Differences
▪ Yet another alternative to estimate fixed effects models would
be to use first differences
▪ Model:
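$y_{it} = \beta_0 + \beta_1 x_{it} + c_i + \varepsilon_{it}$ (1)
▪ Step 1: Lagging the model by one period gives:
$y_{i,t-1} = \beta_0 + \beta_1 x_{i,t-1} + c_i + \varepsilon_{i,t-1}$ (2)
▪ Step 2: Subtract equation (2) from (1) to get:
$y_{it} - y_{i,t-1} = \beta_1 (x_{it} - x_{i,t-1}) + (c_i - c_i) + (\varepsilon_{it} - \varepsilon_{i,t-1})$
▪ This can be rewritten as:
$\Delta y_{it} = \beta_1 \Delta x_{it} + \Delta \varepsilon_{it}$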
▪ The first-differences estimator is the pooled OLS regression of
Δ𝑦𝑖𝑡 on Δ𝑥𝑖𝑡
▪ First differencing eliminates 𝑐𝑖
▪ When we take first differences we lose the first time period for
each cross section: we now have 𝑇 − 1 periods instead of 𝑇
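▪ Under the “Gauss-Markov assumptions”, first differences are less
efficient than the within estimator; under some violations (e.g.
strongly serially correlated errors) they can be more efficient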
▪ With just two time periods the first differences estimator is the
same as the within estimator
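The equivalence with two time periods can be verified numerically. This is a sketch on simulated data with assumed parameter values; both estimators are computed without a constant, matching the differenced and demeaned models above.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 500
c = rng.normal(size=N)                                 # individual effect
x = 0.8 * c[:, None] + rng.normal(size=(N, 2))         # columns: period 1, period 2
y = 1.0 + 0.1 * x + c[:, None] + rng.normal(scale=0.5, size=(N, 2))

# First differences: regress delta y on delta x
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
b_fd = (dx @ dy) / (dx @ dx)

# Within estimator: regress demeaned y on demeaned x
x_tilde = x - x.mean(axis=1, keepdims=True)
y_tilde = y - y.mean(axis=1, keepdims=True)
b_within = (x_tilde * y_tilde).sum() / (x_tilde ** 2).sum()

print(f"first differences: {b_fd:.6f}")
print(f"within estimator:  {b_within:.6f}")
```

With T = 2 each demeaned value is plus or minus half the first difference, so both ratios are identical. With more than two periods the two estimators generally differ.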
Variables That do Not Vary Over Time
▪ Once you include FE in the model you cannot estimate effects
of variables that do not vary over time (e.g. gender)
▪ Model without individual FE:
------------------------------------------------------------------------------
   ln_income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   .2881853   .0062954    45.78   0.000     .2758465    .3005241
      female |  -.4373824   .0059733   -73.22   0.000    -.4490899   -.4256749
     year_d1 |  -.0487806   .0597394    -0.82   0.414    -.1658687    .0683074
     year_d2 |   .1698219   .0594252     2.86   0.004     .0533496    .2862941
▪ Model with FE:
------------------------------------------------------------------------------
   ln_income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |    .128708   .0069228    18.59   0.000     .1151396    .1422765
      female |          0  (omitted)
     year_d1 |  -2.621585   .0170619  -153.65   0.000    -2.655026   -2.588144
     year_d2 |  -2.376223   .0160948  -147.64   0.000    -2.407768   -2.344677
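The omitted coefficient on female follows directly from the within transformation: a variable that is constant within each individual equals its own individual mean, so demeaning turns it into a column of zeros. A tiny sketch with hypothetical values:

```python
import numpy as np

N, T = 4, 3
unit = np.repeat(np.arange(N), T)                 # individual index of each row
# Hypothetical time-invariant variable: one value per individual, repeated T times
female = np.array([1.0, 0.0, 1.0, 0.0])[unit]

# Within transformation: subtract each individual's mean
unit_means = np.bincount(unit, weights=female) / np.bincount(unit)
female_tilde = female - unit_means[unit]

print(female_tilde)   # no variation is left to estimate a coefficient from
```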
References
▪ Bloom, N., Liang, J., Roberts, J. & Ying, Z. J. (2015). Does Working
from Home Work? Evidence from a Chinese Experiment. The
Quarterly Journal of Economics, 130(1), 165-218.