Solution to ECON 120C Midterm
Spring 2016†
Question 1
(a) Disagree. While correlation does not imply causation, “no correlation” does
not imply “no causation” either. It could be that there is a non-linear causal
relationship between X and Y, but our regression, which captures only a linear
relationship, does not pick this up. It could also be that causality runs in both
directions and the two linear effects offset each other, giving a zero correlation.
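A minimal sketch of the non-linear case, assuming a hypothetical relationship Y = X^2 on a grid of X values symmetric about zero (these numbers are illustrative and not taken from the exam). Y is completely determined by X, yet the sample covariance, and hence the OLS slope, is zero.

```python
# Illustrative only: Y is fully determined by X, yet the linear association is zero.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical X values, symmetric about zero
ys = [x ** 2 for x in xs]          # assumed non-linear causal relationship Y = X^2


def sample_cov(a, b):
    """Sample covariance of two equal-length lists (divides by n)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


cov_xy = sample_cov(xs, ys)
ols_slope = cov_xy / sample_cov(xs, xs)   # slope from regressing Y on X
print(cov_xy, ols_slope)
```

The symmetry of X around zero is what drives the result here; an asymmetric grid would generally give a non-zero slope even though the relationship is still non-linear.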
(b) Agree. A high shock for the demand curve (u positive and large) will shift
the demand curve up and raise price. So cov(P, u) > 0. A positive covariance
between the regressor and the error term will induce a positive bias in the OLS
estimate of b.
(c) Disagree. Randomization is needed to ensure that the fertilizer is not targeted
to the best (or worst) plots in the same field, for example.
(d) Disagree. We need additional assumptions or evidence to interpret regression
output causally. Regression by itself gives us only correlation or association,
which can be useful for predicting the future but not for predicting the effect of
an intervention.
(e) Agree. Each layer is like a single coin flip. If the ball travels left, right, left in
a three-layer quincunx, this is like flipping a coin 3 times if we associate “left”
with heads and “right” with tails. The number of layers is equivalent to the
number of X_i's, i.e., the sample size.
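The coin-flip reading of the quincunx can be sketched as follows, assuming each peg sends the ball left or right with equal probability (an assumption; the function names `bin_index` and `drop_ball` are hypothetical). The final bin is simply the count of "right" moves, so with three layers it behaves like a Binomial(3, 1/2) draw.

```python
import random


def bin_index(path):
    """Final bin of a ball: the number of 'right' moves, i.e. the number of tails."""
    return sum(1 for move in path if move == "right")


def drop_ball(layers, rng):
    """Simulate one ball; each layer is an independent fair coin flip (assumed)."""
    return [rng.choice(["left", "right"]) for _ in range(layers)]


# The path from the text: left, right, left in a three-layer quincunx.
example_bin = bin_index(["left", "right", "left"])   # one 'right' move -> bin 1

rng = random.Random(0)   # arbitrary fixed seed for reproducibility
layers = 3               # plays the role of the sample size
counts = [0] * (layers + 1)
for _ in range(10_000):
    counts[bin_index(drop_ball(layers, rng))] += 1
print(example_bin, counts)
```

Increasing `layers` corresponds to increasing the sample size, and the histogram in `counts` should then look increasingly bell-shaped.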
(f) Agree. We may view our dataset as arising from a simple random sample of
a population. Each dataset generates a p-value. We do not know the p-value
before we observe the random sample. So the p-value is a random variable and has
a sampling distribution. The p-value in a STATA output can be regarded as a
draw from its sampling distribution.
The solution was initially written by Jason Bigenho with assistance from Roy Allen (Jason and
Roy are two of our TAs). Professor Sun proofread the solution carefully.
† Note that this is a solution to one of the two versions of the midterm. The two versions are
similar with some change in numbers and ordering of the questions.
Question 2
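(a)
\[
\begin{aligned}
\hat{\beta}_{OLS} &\to \frac{cov(X, Y)}{var(X)} = \frac{cov(X, 1 + 2X + u)}{var(X)} = \frac{cov(X, 2X) + cov(X, u)}{var(X)} \\
&= \frac{2\,var(X) + cov(X, u)}{var(X)} = 2 + \frac{cov(Z_1 + Z_2,\; Z_1 - Z_2)}{var(Z_1 + Z_2)} \\
&= 2 + \frac{E Z_1^2 - E Z_2^2}{var(Z_1 + Z_2)} = 2
\end{aligned}
\]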
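(b)
\[
\begin{aligned}
\hat{\beta}_{OLS} &\to \frac{cov(Y, X)}{var(Y)} = \frac{2\,var(X)}{var(1 + 2X + u)} = \frac{4}{var(1 + 2(Z_1 + Z_2) + Z_1 - Z_2)} \\
&= \frac{4}{var(1 + 3Z_1 + Z_2)} = \frac{4}{9\,var(Z_1) + var(Z_2)} = \frac{4}{9 + 1} = \frac{4}{10} = \frac{2}{5}
\end{aligned}
\]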
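Both limits can be checked numerically. The sketch below assumes X = Z_1 + Z_2 and u = Z_1 - Z_2 with Z_1 and Z_2 independent, mean zero and variance one, as the derivation implies. Normality of Z_1 and Z_2 is an extra assumption made here only to have something to draw from. It regresses Y on X for part (a) and X on Y for part (b).

```python
import random

rng = random.Random(120)   # arbitrary fixed seed
n = 200_000

xs, ys = [], []
for _ in range(n):
    z1 = rng.gauss(0.0, 1.0)   # assumption: Z1 ~ N(0, 1)
    z2 = rng.gauss(0.0, 1.0)   # assumption: Z2 ~ N(0, 1), independent of Z1
    x = z1 + z2
    u = z1 - z2
    xs.append(x)
    ys.append(1.0 + 2.0 * x + u)


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n)."""
    m = len(a)
    mean_a = sum(a) / m
    mean_b = sum(b) / m
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / m


slope_y_on_x = cov(xs, ys) / cov(xs, xs)   # part (a): should be close to 2
slope_x_on_y = cov(xs, ys) / cov(ys, ys)   # part (b): should be close to 2/5
print(round(slope_y_on_x, 3), round(slope_x_on_y, 3))
```

The two slopes are not reciprocals of each other: the product of the limits is 2 x 2/5 = 4/5, which is the squared correlation between X and Y in this setup.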
Question 3
(a) We test
\[ H_0: \beta_1 = 0.1030 \qquad H_1: \beta_1 \neq 0.1030. \]
The t statistic is given by
\[ t = \frac{0.0894 - 0.103}{0.0326} = -0.418. \]
To implement the test, we check whether |t| > 1.64, where 1.64 is the 10%
critical value for a two-sided asymptotically normal test.
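The arithmetic of the test can be sketched in a few lines, using only the numbers reported in the solution (the variable names are hypothetical). With the rounded inputs shown, the ratio comes out at roughly -0.42, well inside the ±1.64 band, so the null is not rejected at the 10% level.

```python
beta_hat = 0.0894        # estimated coefficient, as reported in the text
beta_null = 0.103        # value of the coefficient under H0
std_err = 0.0326         # standard error, as reported in the text
critical_value = 1.64    # 10% two-sided critical value used in the text

t_stat = (beta_hat - beta_null) / std_err
reject_h0 = abs(t_stat) > critical_value
print(round(t_stat, 3), reject_h0)
```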
(b) “P > |t|” represents the probability that we observe a t statistic whose absolute
value is at least as large as 2.75 if the null hypothesis \beta_1 = 0 is true.
(c) Because cov(X, u) = 0, the probability limit of the OLS estimator equals the
true coefficient:
\[ \operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{cov(X, u)}{var(X)} = \beta_1 = 0.103. \]
Let
\[ t = \frac{\hat{\beta}_1 - 0.103}{s.e.(\hat{\beta}_1)}. \]
By the central limit theorem, t is approximately standard normal. (To make
this claim we use that \beta_1 = 0.103; this tells us that the mean of t should be
approximately 0.) The probability that a specific classmate rejects is
approximately 0.1. By the law of large numbers, the percentage of classmates who
reject the null hypothesis should be about 10% (so p ≈ 10%).
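The 10% rejection share can be illustrated by simulation. The sketch below is built on assumptions that are not in the exam: each hypothetical classmate draws an independent sample of 200 observations from y = 1 + 0.103 x + u with x and u independent standard normals, runs OLS, and tests the true null at the 10% level with a homoskedastic standard error. The intercept, sample size and number of classmates are arbitrary choices.

```python
import math
import random


def t_stat_for_true_null(n, beta1, rng):
    """Draw one sample, run OLS of y on x, and return the t statistic for H0: slope = beta1."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1.0 + beta1 * x + rng.gauss(0.0, 1.0) for x in xs]   # u independent of x
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(ssr / (n - 2) / sxx)   # homoskedastic standard error
    return (slope - beta1) / std_err


rng = random.Random(2016)   # arbitrary fixed seed
classmates = 2_000          # hypothetical class size, large so the LLN is visible
t_stats = [t_stat_for_true_null(200, 0.103, rng) for _ in range(classmates)]
share_rejecting = sum(abs(t) > 1.64 for t in t_stats) / classmates
print(share_rejecting)
```

With a small class the realized share would fluctuate more around 10%; the law of large numbers only pins it down as the number of classmates grows.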
Question 4
(a) advertising is correlated with bonus and bonus has a nonzero effect on sales.
The coefficient on advertising in the “long” regression with two regressors
captures the direct effect of advertising on sales. The coefficient in the
“short” regression with only one regressor captures this direct effect plus the
proxy effect that arises when advertising acts as a proxy for bonus. The proxy
effect is not zero if (i) advertising is correlated with bonus and (ii) bonus has a
nonzero direct effect on sales (i.e., its coefficient in the long regression is not
zero).
(b) No. In both regressions, advertising could be endogenous. We might expect
that advertising was targeted to regions thought to have the highest returns
to advertising. These may be regions that simply have a higher number of
sales (regardless of the degree of advertising and bonus awarded). On the other
hand, if the hospital randomly chose which regions to advertise in (which seems
unlikely), then the coefficient on advertising in either specification of sales
consistently estimates the causal effect of advertising.
(c) Yes. We need to use the omitted variable bias formula,
\[ \frac{\widehat{cov}(sales, advertising)}{\widehat{var}(advertising)} = 2.5 + b\,\hat{\beta}_{bonus}, \]
where \hat{\beta}_{bonus} = 2 is the regression coefficient on bonus in the long
regression. This relates the regression coefficient obtained from regressing sales on
only advertising to the “long” regression coefficient obtained from regressing
sales on advertising and bonus. (See the solutions to Problem Set 2, part
D for this formula.)
Plugging in the numbers we get:
\[ 2.8 = 2.5 + 2b, \]
so b = 0.3/2 = 0.15.
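The omitted variable bias formula holds exactly in any sample, which can be checked on simulated data. In the sketch below the data-generating process is hypothetical; its coefficients merely echo the numbers in the text. Here b is read as the slope from regressing bonus on advertising, which is the role it plays in the formula. The last lines back out b from the reported coefficients 2.8, 2.5 and 2.

```python
import random

rng = random.Random(4)   # arbitrary fixed seed
n = 5_000

# Hypothetical data-generating process (not from the exam).
advertising = [rng.gauss(0.0, 1.0) for _ in range(n)]
bonus = [0.15 * a + rng.gauss(0.0, 1.0) for a in advertising]
sales = [2.5 * a + 2.0 * g + rng.gauss(0.0, 1.0) for a, g in zip(advertising, bonus)]


def demean(v):
    """Subtract the sample mean from every element."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    """Sum of element-wise products."""
    return sum(ai * bi for ai, bi in zip(a, b))


a, g, s = demean(advertising), demean(bonus), demean(sales)
s_aa, s_gg, s_ag = dot(a, a), dot(g, g), dot(a, g)
s_as, s_gs = dot(a, s), dot(g, s)

# Long regression (sales on advertising and bonus): solve the 2x2 normal equations.
det = s_aa * s_gg - s_ag ** 2
long_adv = (s_as * s_gg - s_gs * s_ag) / det
long_bonus = (s_gs * s_aa - s_as * s_ag) / det

short_adv = s_as / s_aa   # short regression: sales on advertising only
b = s_ag / s_aa           # auxiliary regression: bonus on advertising

# In-sample identity: short = long + b * (long coefficient on bonus).
gap = short_adv - (long_adv + b * long_bonus)

# Backing out b from the coefficients reported in the text.
b_from_text = (2.8 - 2.5) / 2.0
print(gap, b_from_text)
```

The value of `gap` is zero up to floating-point error, whatever seed or coefficients are used, because the identity is algebraic rather than statistical.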
Question 5
(a) The causal effect here is the increased likelihood of being promoted attributable
to participating in the training program, holding all else equal.
(b) Since program participation is voluntary and not randomly assigned, we expect
OLS will produce biased estimates of the causal effect of program participation
on promotion. In this case, we might expect there to be an upward bias.
Consider employee motivation as a potential omitted variable; that is, employee
motivation becomes part of all other factors (the error term) that affect the
likelihood of promotion. More motivated employees are likely to work harder
and are more likely to be promoted. More motivated employees are also more
likely to seek out the training program. Since the omitted variable has a
positive effect on the outcome variable of interest and is positively correlated
with the causal variable of interest, we expect an upward bias.
(c) We should NOT consider this as an instrument for program participation. This
is because those employees who are recommended for the program have already
shown potential, which means they may be more likely to be promoted whether
or not they participate in the program. This means that our instrument is
potentially correlated with an omitted factor that affects the outcome of interest,
which violates the exogeneity assumption necessary for using IV.
(d) We should consider this as an instrument for program participation. We might
expect that employees who receive the letter will be more likely to participate
in the training program, potentially because they are made aware or reminded
of the program or they just simply follow the CEO’s suggestion. This means
that our instrument is correlated with our independent variable, meaning that
our instrument is relevant. Further, since the letters are sent randomly, we
don’t expect that receiving a letter is correlated with any other factors that
could affect the likelihood of being promoted. This means that the instrument
is exogenous. With these two assumptions satisfied, we can be confident that
using the instrument will help us retrieve the causal effect.
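The contrast between the upward-biased OLS estimate in part (b) and the IV estimate in part (d) can be sketched by simulation. Everything in the data-generating process below is a hypothetical assumption made for illustration: a true effect of 0.2, unobserved motivation that raises both enrollment and the outcome, a letter sent by a fair coin flip, and a continuous "promotion score" as the outcome so that the model stays linear.

```python
import random

rng = random.Random(5)   # arbitrary fixed seed
n = 200_000
true_effect = 0.2        # hypothetical causal effect of training on the outcome

letter, training, outcome = [], [], []
for _ in range(n):
    motivation = rng.gauss(0.0, 1.0)          # unobserved, ends up in the error term
    z = 1.0 if rng.random() < 0.5 else 0.0    # letter sent at random
    # Motivated employees and letter recipients are more likely to enroll.
    d = 1.0 if 0.8 * motivation + z + rng.gauss(0.0, 1.0) > 0.5 else 0.0
    # Continuous promotion score (assumed) that depends on training and motivation.
    y = true_effect * d + 0.3 * motivation + rng.gauss(0.0, 0.5)
    letter.append(z)
    training.append(d)
    outcome.append(y)


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n)."""
    m = len(a)
    mean_a = sum(a) / m
    mean_b = sum(b) / m
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / m


ols = cov(training, outcome) / cov(training, training)   # picks up the motivation effect
iv = cov(letter, outcome) / cov(letter, training)        # IV estimate using the letter
print(round(ols, 3), round(iv, 3))
```

Under these assumptions the OLS slope lands well above 0.2 while the IV ratio stays close to it. If the letter were instead targeted at employees who had already shown potential, as in part (c), `z` would be correlated with `motivation` and the IV ratio would no longer recover the true effect.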