Old Final Exams for Econometrics BE
Name: _________________________________________
Midterm Exam
Student Number: _____________
Final Exam
1. Consider the following multiple regression model: yi = β1 + β2xi2 + β3xi3 + ei. Which of the
following statements about the variance of the least squares estimators is NOT correct?
A. The variance of the least squares estimator of β2 can be reduced by increasing the
sample size.
B. The variance of the least squares estimator of β2 is smaller if the correlation between
x2 and x3 is smaller.
C. The variance of the least squares estimator of β2 is larger if the variance of the errors
is larger.
D. The variance of the least squares estimator is larger if there is more variation in x2
around its mean.
Answer: D
Suppose someone is interested in the relationship between UK real consumption growth
(ΔCt), real income growth (ΔYt) and the growth in real investment (ΔIt), and proceeds to
estimate the following regression:
(Note that C, Y, and I are in log levels). The estimates, produced via the least squares
estimation method, are presented as follows:
The sample size is N = 92; the sum of squared residuals is Σt êt² = 39.3601; and the standard deviation
of the dependent variable is sΔC = 0.8861. The t-statistics are reported in the brackets.
2. To test the overall significance of the regression Model (1), one uses the F-test. Which of
the following set of hypotheses is correct?
A. H0: β1 = 0, β2 = 0, β3 = 0 vs H1: β1 ≠ 0, β2 ≠ 0, β3 ≠ 0.
B. H0: β1 = 0, β2 = 0, β3 = 0 vs H1: at least one of the βk is non-zero for k = 1, 2, 3.
C. H0: β2 = 0, β3 = 0 vs H1: at least one of the βk is non-zero for k = 2, 3.
D. H0: β2 = 0, β3 = 0 vs H1: β2 ≠ 0, β3 ≠ 0.
Answer: C
3. Carry on with the F-test for the overall significance of the regression Model (1), which of
the following gives the correct F-test statistic?
A. 36.28
B. 25.00
C. 24.46
D. 75.00
Answer: A
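The F statistic in Question 3 can be recovered from the reported summary statistics. The sketch below assumes that Model (1) has K = 3 coefficients (an intercept and two slopes) and that the total sum of squares is SST = (N − 1)·s², with s the reported standard deviation of the dependent variable. The variable names are illustrative.

```python
# Overall-significance F test from summary statistics (Question 3).
N = 92            # sample size
K = 3             # coefficients in Model (1): intercept + 2 slopes (assumed)
sse = 39.3601     # sum of squared residuals
s_dep = 0.8861    # standard deviation of the dependent variable

sst = (N - 1) * s_dep ** 2          # total sum of squares
ssr = sst - sse                     # explained sum of squares
J = K - 1                           # restrictions under H0: beta2 = beta3 = 0
f_stat = (ssr / J) / (sse / (N - K))

print(round(f_stat, 2))
```

Under these assumptions the result is close to 36.28, which is far above the critical value Fc = 3.099 quoted in Question 4.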
4. Using a 5% level of significance, the critical value for the F-statistic with the
corresponding degrees of freedom is Fc = 3.099. Which of the following is the correct
conclusion of the F-test for the overall significance of the regression Model (1)?
Answer: B
A. An increase in the values of R2 and adjusted R2 does not always mean that the
additional variable included in a regression model is statistically significant.
B. The value of adjusted R2 can increase or decrease when an additional variable is
included in a regression model.
C. Both statement A and statement B are correct.
D. Statement B is correct but statement A is incorrect.
Answer: C
6. The precision of the estimates for the intercept and slope increases if
Answer: D
7. Which of the following conditions is NOT necessary for OLS estimators to be BLUE?
Answer: D
8. If the PDF (probability density function) has a peak at zero, then this distribution CANNOT
be
A. a t distribution.
B. a standard normal distribution.
C. an F distribution.
D. Any of the above.
Answer: C
9. Consider the model yi = β1 + β2xi + ei. If the 95% confidence interval of β2 is [-0.005, 1.995]
and you do hypothesis testing at two significance levels: (1) H0: β2 = 0 vs H1: β2 ≠ 0 at a
5% significance level, and (2) H0: β2 = 0 vs H1: β2 ≠ 0 at a 1% significance level, what is
your conclusion?
Answer: C
10. What distribution does the sum of the squares of m independently distributed
standardized normal random variables follow?
A. A t distribution with m degrees of freedom.
B. A normal distribution.
C. A chi-squared distribution with m degrees of freedom.
D. An F(1,m) distribution.
Answer: C
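The result in Question 10 can be checked by simulation. This is a minimal sketch using only the standard library; the choice m = 5, the number of draws, and the seed are arbitrary assumptions.

```python
import random

random.seed(0)    # fixed seed so the simulation is repeatable
m = 5             # number of independent standard normals (illustrative)
draws = 20000

# Each draw is the sum of m squared N(0, 1) variables.
sums = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m)) for _ in range(draws)]

mean = sum(sums) / draws
var = sum((s - mean) ** 2 for s in sums) / (draws - 1)

# For a chi-squared(m) variable the mean is m and the variance is 2m.
print(mean, var)
```

The simulated mean and variance should land near m and 2m, as expected for a chi-squared distribution with m degrees of freedom.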
11. Using White standard errors with least squares estimation when heteroskedasticity is
present implies that
A. least squares estimators are BLUE and test statistics are correct.
B. least squares estimators are not BLUE but test statistics are correct.
C. least squares estimators are BLUE but test statistics are incorrect.
D. least squares estimators are not BLUE and test statistics are incorrect.
Answer: B
12. Suppose you are implementing a Lagrange Multiplier test for heteroskedasticity. In order
to calculate the test statistic for this test, you estimate a regression model in which the
dependent variable is
Answer: D
13. Consider the population regression model Yi = β0 + β1Xi + β2Di + β3(Xi × Di) + ei, where
Xi is a continuous variable and Di is a (0, 1) dummy variable. In this model, β2
Answer: B
14. You estimate a model in which you examine the relationship between the dependent
variable which is the natural logarithm (ln) of earnings (Earn is weekly earnings in
euros), and independent variables which are Age and a dummy for whether the individual
is female or not (Female =1 for women, 0 otherwise). The estimation results (with
standard errors se between brackets) are:
Answer: B
15. In the context of a standard linear regression model, you are testing a restricted vs. an
unrestricted model and have imposed 1 restriction to obtain the restricted model. The
value of SSE (the sum of squared least squares errors) for the restricted model is
2683.411 while the value of SSE for the unrestricted model is 1532.084. There are 75
observations in the dataset and 3 independent variables in the unrestricted model. The F-
statistic in this case is
A. 21.258.
B. 53.355.
C. 54.106.
D. 21.278.
Answer: B
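The F statistic in Question 15 follows from F = [(SSE_R − SSE_U)/J] / [SSE_U/(N − K)]. The sketch below assumes the unrestricted model includes an intercept, so K = 4; the variable names are illustrative.

```python
# F test of a restricted model against an unrestricted one (Question 15).
sse_r = 2683.411   # SSE, restricted model
sse_u = 1532.084   # SSE, unrestricted model
N = 75             # observations
K = 3 + 1          # 3 independent variables plus an intercept (assumed)
J = 1              # number of restrictions

f_stat = ((sse_r - sse_u) / J) / (sse_u / (N - K))
print(round(f_stat, 3))
```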
Answer: B
17. We estimate a Phillips curve with inflation (INF) related to unemployment (U) and oil
price changes (POIL). The test equation for the serial correlation Lagrange multiplier
(LM) test (with standard errors se between brackets) is:
The number of observations T = 159, R2 = 0.759. Use this information and the statistical
tables. Is there serial correlation (test at a 5% significance level)?
A. Yes, the test statistic is smaller than the critical Chi-square (χ2) value.
B. Yes, the test statistic is bigger than the critical Chi-square (χ2) value.
C. No, the test statistic is smaller than the critical Chi-square (χ2) value.
D. No, the test statistic is bigger than the critical Chi-square (χ2) value.
Answer: B
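The LM statistic in Question 17 is T × R². The test equation is not reproduced here, so the number of lagged residuals (the degrees of freedom) is unknown; the sketch below therefore compares the statistic with the 5% chi-squared critical values for 1 to 4 degrees of freedom, which is an assumption about the plausible range.

```python
# Lagrange multiplier statistic for serial correlation (Question 17): LM = T * R^2.
T = 159
r2 = 0.759
lm = T * r2

# 5% critical values of the chi-squared distribution for 1 to 4 degrees of freedom.
chi2_crit_5pct = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

for df, crit in chi2_crit_5pct.items():
    print(df, round(lm, 3), lm > crit)
```

The statistic exceeds every critical value in this range, so the null of no serial correlation is rejected whichever of these lag orders applies.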
18. Applying GLS when errors follow an AR(1) model to deal with serial correlation
Answer: A
Answer: A
20. Which of the following could be used as a test for autocorrelation up to the third order?
Answer: D
Question 1
a) Construct a 99% confidence interval of the average salary of an unmarried person with
no working experience. (1 point)
b) Perform a t-test on whether the marginal effect of one year increase in the experience on
salary is lower for married people than for unmarried people. Indicate the degrees of
freedom. (2 points)
c) What is the marginal effect of experience on salary for unmarried persons? Interpret this.
For a married person who has 10 years of experience, what is the marginal effect of 1
year increase of experience on his salary? (2 points)
Answers:
b) H0: β4 = 0 vs. H1: β4 < 0. We can read the t-statistic from the output or use (b4 − 0)/se(b4)
to get t-statistic = −0.73. The degrees of freedom are 996. The critical t-statistic is tc =
t(0.05, ∞) = −1.645. Since t > tc, we do not reject H0.
(0.5 point for the correct hypothesis, 1 point for correct critical t-statistic, 0.5 point for
right conclusion.)
c) For an unmarried person it is β2: salary increases by 89.24
dollars for every additional year of experience. (1 point for complete answer, 0.5 for
just stating that it is β2).
For the married person (with 10 years of experience) it is b2 + b4 = 0.0427 thousand dollars,
or 42.7 dollars. (1 point). Note that the number of years of experience does not matter
here.
Question 2
Using a data set on hourly wages (Wage), education (Educ) and experience (Exper), you
estimate the following regression:
where Married is a dummy variable (1 for married workers; 0 for unmarried workers).
a) Using a 1% significance level, conduct a formal hypothesis test to test the null
hypothesis that the hourly wages of unmarried workers and married workers are the same
vs. the alternative hypothesis that the hourly wage of married workers is higher than that of unmarried
workers (state the null and alternative hypotheses, test statistic and critical value
and explain your conclusion). See Table 2.1 for the least squares estimation results. (2
points)
b) A colleague of yours says that it is not correct to estimate this model with both married
and unmarried workers together in the same sample. She says that the data should be
split into two subsamples, one for married workers and one for unmarried workers. She
says that the model should be estimated separately for these two subsamples. What could
be the reasoning behind this suggestion? (No calculations required!) (1 point)
c) Your colleague suggests that the error variance in wages may be different for married vs.
unmarried workers. Using computer output (Table 2.2), apply the Goldfeld-Quandt test
to find out whether the error variance in wage is different for married vs. unmarried
workers. Use a 10% significance level. Formulate the hypothesis. Also state in your
Table 2.2. Least squares regression results for unmarried (a) and married workers (b)
a)
-> married = 0
(b)
-> married = 1
Answers:
a) The null and alternative hypotheses are H0: β6 = 0 vs. H1: β6 > 0. The test statistic from
the computer output is t = 0.04029/0.03379 = 1.192. The degrees of freedom associated
with the test are 994. In the t table we are given, the value of t(0.99, 50) is 2.403, and the
value of t(0.99, ∞) is 2.326. Since the test statistic is lower than both of these critical
values we conclude that we cannot reject the null hypothesis that the hourly wages of
married and unmarried workers are the same. (0.5 point for the null and alternative
hypotheses, 1 point for test statistic and the critical value and 0.5 point for the correct
conclusion.)
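The arithmetic in answer a) can be reproduced as follows. The coefficient, standard error, and critical value are the ones quoted in the answer; the variable names are illustrative.

```python
# One-sided t test for the Married coefficient (Question 2a).
b6 = 0.04029       # estimated coefficient on Married
se_b6 = 0.03379    # its standard error
t_stat = b6 / se_b6

t_crit = 2.326     # t(0.99, infinity), as quoted in the answer
reject_h0 = t_stat > t_crit
print(round(t_stat, 3), reject_h0)
```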
b) Your colleague may be thinking that the error structure is very different for married vs.
unmarried workers. Even if all coefficients are different but the variances are the same, it
is possible to include a dummy for married/unmarried and include interactions of this
dummy with all the remaining RHS variables and estimate them together. So your
colleague suspects that there could be heteroskedasticity; that the variance in the error
term may be different for the two sub-samples. (1 point)
c) The computer output gives us SSE divided by the degrees of freedom for both models
(under the column MS; alternatively you can take the square of Root MSE), therefore F
= 0.2866/0.2129 = 1.346 (the alternative would be to calculate the reciprocal with married in
the numerator and unmarried in the denominator, which would result in an F statistic of 0.743).
Since F = 1.346 > FU = 1.163 (alternatively, F = 0.743 < FL = 0.860) we reject the
null hypothesis that the error variances are the same for married vs. unmarried
individuals. (0.5 point for the null and alternative hypotheses, 1 point for test statistic
and the critical value and 0.5 point for the correct conclusion.)
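The Goldfeld-Quandt calculation above can be sketched as a ratio of the two subsample error-variance estimates. The variances and critical values are those quoted in the answer; the variable names are illustrative.

```python
# Goldfeld-Quandt statistic from the two subsample error-variance estimates (Question 2c).
sigma2_unmarried = 0.2866   # SSE / df, unmarried subsample
sigma2_married = 0.2129     # SSE / df, married subsample

gq = sigma2_unmarried / sigma2_married
gq_reciprocal = sigma2_married / sigma2_unmarried

f_upper, f_lower = 1.163, 0.860   # critical values quoted in the answer
reject_h0 = gq > f_upper or gq < f_lower
print(round(gq, 3), round(gq_reciprocal, 3), reject_h0)
```

Either orientation of the ratio leads to the same decision, because the reciprocal is compared with the lower critical value.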
Question 3
In this question we use monthly data of individual expenditure in purchasing new cars in
America during 1975 – 1991 to estimate the following model:
where:
a) Tables 3.1 and 3.2 show the results of Augmented Dickey-Fuller tests. Is PCECARS
stationary? Is D(PCECARS) stationary? Is PCECARS an I(0) series or an I(1) series? (2
points)
b) The researcher shows that the variables are cointegrated and decides to run a regression
in levels. Is that decision warranted? (1 point)
c) Table 3.3 shows the least squares estimation results, and Table 3.4 shows the least
squares estimation results with robust variances and covariances. Do least squares
estimators overstate or understate precision if serial correlation is neglected? (2 points)
[Tables 3.1 and 3.2: Augmented Dickey-Fuller output. Columns: Test Statistic, 1% Critical Value, 5% Critical Value, 10% Critical Value (Interpolated Dickey-Fuller).]
Newey-West
PCECARS Coef. Std. Err. t P>|t| [95% Conf. Interval]
Answers:
a) The null of a unit root is not rejected for the test in levels (Table 3.1). The null of a unit
root is rejected for the test in first differences (Table 3.2): PCECARS is not stationary,
D(PCECARS) is stationary. So, PCECARS is I(1). (2 x 0.5 point for stationary and 1
point for the order of integration.)
b) Cointegration indicates that there exists a long-run relationship between the variables,
so a regression in levels is allowed. It can also serve as the first stage of an error correction model (ECM). (1 point)
c) Comparing Table 3.3 with Table 3.4, the HAC (Newey-West) standard errors are larger than the
conventional least squares standard errors: LS overstates precision when serial correlation is neglected. (2 points)
Name: _________________________________________
Midterm Exam
Student Number: _____________
1) Consider the following regression model yi = β1 + β2xi + ei. Let b1 and b2 denote the least
squares estimators of β1 and β2, respectively. Assume that b1 and b2 both are normally
distributed. Let the linear combination of the true parameters be γ = (aβ1 + cβ2). What
is the probability distribution of γ̂ = (ab1 + cb2), where a and c are some known
constants?
a) It follows a standard normal distribution with mean 0 and unit variance.
b) It follows a t-distribution with N−2 degrees of freedom, where N is the number of
observations.
c) It follows a normal distribution with mean 0 and variance σ2, where σ2 is the variance
of the error.
d) It follows a normal distribution with mean γ and variance var(γ̂).
Answer: d)
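The variance in option d) is var(γ̂) = a²·var(b1) + c²·var(b2) + 2ac·cov(b1, b2). A minimal sketch of that formula follows; the numerical inputs are hypothetical and chosen only to illustrate the calculation.

```python
import math

def var_linear_combination(a, c, var_b1, var_b2, cov_b1_b2):
    """Variance of a*b1 + c*b2 for estimators b1 and b2."""
    return a ** 2 * var_b1 + c ** 2 * var_b2 + 2 * a * c * cov_b1_b2

# Hypothetical inputs, chosen only to illustrate the formula.
a, c = 1.0, 2.0
var_gamma_hat = var_linear_combination(a, c, var_b1=0.04, var_b2=0.01, cov_b1_b2=-0.005)
se_gamma_hat = math.sqrt(var_gamma_hat)
print(var_gamma_hat, se_gamma_hat)
```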
2) Use the information from Question 1. Suppose we would like to test the null hypothesis
H0: γ = 2 against the alternative hypothesis H1: γ ≠ 2, where γ = (aβ1 + cβ2), and
the number of observations N = 25. Which of the following is the correct test statistic?
a) (γ̂ − 2)/√var(γ̂) ~ N(0,1).
b) (γ̂ − 2)/√σ̂2 ~ N(0,1); where σ̂2 is the estimator of the variance of the error.
c) (γ̂ − 2)/√var(γ̂) ~ t(25).
Answer: d)
Suppose someone is interested in the relationship between variable y and variables x1 and x2,
and proceeds to estimate the following multiple regression model:
yi = β1 + β2xi1 + β3xi2 + ei
The estimates, produced via the least squares estimation method, are presented as follows:
The sample size, N = 32. The standard errors of the estimated parameters are reported in the
brackets.
3) Suppose we are interested in testing the null hypothesis H0: β2 = 0.45 against the
alternative hypothesis H1: β2 ≠ 0.45 at the 5% significance level. Which one of the
following is the correct test result?
a) Reject H0 as the computed t-statistic is less than t(0.025,29) = −2.045.
b) Reject H0 as the computed t-statistic is less than t(0.025,32) = −2.037.
c) Do not reject H0 as the computed t-statistic is between t(0.025,29) = −2.045 and
t(0.975,29) = 2.045.
d) Do not reject H0 as the computed t-statistic is between t(0.025,32) = −2.037 and
t(0.975,32) = 2.037.
Answer: c)
4) Let t denote the computed test statistic for β2, and let t(k) denote a t-distributed random
variable with k degrees of freedom. Which of the following is the correct way to compute
the p-value for the hypothesis stated in Question 3)?
a) 1íP[t(k) t].
b) P[t t(k)] + P[t ít(k)].
c) P[t(k) t].
d) None of the above.
Answer: b)
5) Suppose we would like to test if the expected value of y given x1=50 and x2 = 20 is less
than 40. Which of the following statements is correct?
a) The null hypothesis is H0: β1 + 50β2 + 20β3 − 40 = 0.
b) The null hypothesis is H0: 50β2 + 20β3 − 40 = 0.
c) The null hypothesis is H0: 50β2 + 20β3 − 40 < 0.
d) None of the above.
Answer: a)
Answer: b)
Answer: c)
Answer: c)
10) The following graph plots the value of x against the absolute value of residuals.
a) The graph suggests the presence of heteroskedasticity because the residuals are
positive.
b) The graph suggests the presence of homoskedasticity since the variance of the errors
is constant across the sample.
c) The graph suggests the presence of heteroskedasticity because of the trend in x.
d) None of the above.
Answer: b)
11) What would be the consequences for the least squares estimator if serial correlation is
present in a regression model but ignored?
a) It will be biased.
b) It will be inconsistent.
c) It will have the wrong standard error.
d) All of the above.
Answer: c)
12) Suppose that you wish to test for autocorrelation using an approach based on an auxiliary
regression. Which one of the following auxiliary regressions would be most appropriate?
a) e_t^2 = α1 + α2 x_t + ρ e_{t-1} + v_t.
b) e_t^2 = α1 + α2 x_t1 + α3 x_t2 + α4 x_t1 x_t2 + α5 x_t1^2 + α6 x_t2^2 + v_t.
c) e_t = α1 + α2 x_t + ρ e_{t-1} + v_t.
d) e_t = α1 + α2 x_t1 + α3 x_t2 + α4 x_t1 x_t2 + α5 x_t1^2 + α6 x_t2^2 + v_t.
Answer: c)
13) An incorrect and possibly spurious regression can be identified by the following rule-of-
thumb:
a) An R2 around 0 and a Durbin-Watson statistic around 2.
b) An R2 around 1 and a Durbin-Watson statistic around 2.
c) An R2 around 1 and a Durbin-Watson statistic around 0.
d) An R2 around 0 and a Durbin-Watson statistic around 0.
Answer: c)
Answer: b)
Answer: a)
You estimate a model in which the dependent variable is LN(PRICE/1000), and where:
PRICE is the selling price of the home in dollars,
BEDS and BATHS are the number of bedrooms and bathrooms, respectively,
AGE is the age of the house in years at the time of the sale,
POOL is a dummy variable that is 1 if the house has a pool and 0 otherwise,
LGELOT (large lot) is a dummy variable that is 1 if the house is on a lot of land that is larger
than 0.5 acres (large lot) and 0 otherwise (regular lot).
LIVAREA is the living area of the home (in hundreds of square feet),
LGELOT x LIVAREA is the interaction term between LGELOT and LIVAREA.
Two models were estimated and the estimation results with standard errors in brackets are
given below.
Variable            Model 1     Model 2
LIVAREA              0.0539      0.0589
                    (0.0017)    (0.0019)
BEDS                -0.0382     -0.0480
                    (0.0114)    (0.0113)
BATHS               -0.0103     -0.0201
                    (0.0165)    (0.0164)
LGELOT               0.2531      0.6134
                    (0.0255)    (0.0632)
AGE                 -0.0013     -0.0016
                    (0.0005)    (0.0005)
POOL                 0.0787      0.0853
                    (0.0231)    (0.0228)
LGELOT x LIVAREA                -0.0161
                                (0.0026)
INTERCEPT            3.986       3.9649
                    (0.0373)    (0.0370)
Answer c)
Answer b)
Answer a)
Answer c)
20) You are asked to test the joint significance of all the qualitative factors in Model 1 using
an F test. The number of restrictions is
a) 2.
b) 1.
c) 3.
d) Depends on the size of the sample.
Answer a)
Question 1.
Consider the following model for selling price of houses:
PRICE = β1 + β2 TRADITIONAL + β3 FIREPLACE + β4 TRADITIONAL x FIREPLACE + e,
where PRICE is in US dollars, TRADITIONAL is an indicator variable indicating whether
the house is of traditional style (TRADITIONAL = 1) or not (TRADITIONAL = 0), and
FIREPLACE is an indicator variable indicating whether the house has a fireplace
(FIREPLACE = 1) or not (FIREPLACE = 0) .
Use the estimation output in Table 1.1 to answer this question.
(a) What does the constant term of 109415.2 mean in the regression output? (1 point)
It means that the average selling price of a non-traditional style house without fireplace is
109415.2 dollars.
(1 point for the correct answer.)
(b) A real estate agent who sells non-traditional houses claims that installing a fireplace
will increase the selling price while you think it will make no difference. Test the
claim of the real estate agent at a 5% significance level giving all steps. Indicate the
degree of freedom of this test. (2 points)
H0: β3 = 0 vs. H1: β3 > 0.
We can read the t-statistic from the output or use (b3 − 0)/se(b3) to get a t-statistic of 8.81. The
degrees of freedom are 1080 − 4 = 1076.
The critical t-statistic is tc = t(0.95, df) = 1.645. Since t > tc, we reject H0 and find support for
the claim of the real estate agent that installing a fireplace increases the selling price.
(0.5 point for the correct hypothesis, 0.5 point for correct critical t-statistic, 1 point for right
conclusion.)
(c) Calculate the expected average price of traditional style houses in the sample with a
fireplace and without a fireplace. Formulate a null hypothesis and an alternative
hypothesis to test if the price of traditional style houses with a fireplace is higher.
Write down the expression for the test statistic. You do not need to calculate the test
statistic. (2 points)
The expected average price of the traditional style houses with a fireplace is
b1 + b2 + b3 + b4 = 172,332.6 dollars and the expected average price of the traditional style
houses without a fireplace is b1 + b2 = 112,976.6 dollars.
To test if the price of a traditional style house with a fireplace is higher: H0: β3 + β4 = 0 vs. H1: β3 + β4 > 0,
we calculate the test statistic as: t = (b3 + b4)/[var(b3) + var(b4) + 2×cov(b3, b4)]^0.5
(0.5 point for expected average price with fireplace, 0.5 point for expected average price
without a fireplace, 0.5 for correct hypotheses, 0.5 for correct test statistic.)
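The quantities in part (c) can be traced back to the figures quoted in the answers. In the sketch below, the implied values of b2 and b3 + b4 follow from those figures, while the variances and covariance passed to the helper `t_sum` are hypothetical, because Table 1.1 is not reproduced here.

```python
import math

# Figures quoted in the answers to parts (a) and (c).
b1 = 109415.2                    # constant: non-traditional, no fireplace
price_trad_no_fire = 112976.6    # b1 + b2
price_trad_fire = 172332.6       # b1 + b2 + b3 + b4

b2 = price_trad_no_fire - b1                        # implied estimate of beta2
b3_plus_b4 = price_trad_fire - price_trad_no_fire   # implied fireplace premium

def t_sum(b_sum, var_b3, var_b4, cov_b3_b4):
    """t statistic for H0: beta3 + beta4 = 0."""
    return b_sum / math.sqrt(var_b3 + var_b4 + 2 * cov_b3_b4)

# Hypothetical variances and covariance, for illustration only.
t_stat = t_sum(b3_plus_b4, var_b3=4.0e7, var_b4=9.0e7, cov_b3_b4=-3.0e7)
print(round(b2, 1), round(b3_plus_b4, 1), round(t_stat, 2))
```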
Question 2.
Using state level data from the United States, you estimate a model in which the dependent
variable is the percentage of votes for a certain candidate A (VOTESA) as a function of
percentage of votes for president (PRTYSTRA), an indicator variable for whether A is a
democrat or not (DEMOCA = 1 for a democrat, and 0 otherwise), the logarithm of the
expenditures of the party of candidate A (LEXPENDA = LOG(EXPENDA)) and the (natural)
logarithm of the expenditures of the party of candidate B (the opposing candidate) denoted
LEXPENDB = LOG(EXPENDB). Use the estimation output in Table 2.1 to answer this
question.
(a) Interpret the estimated coefficient of DEMOCA. Is the effect of this variable
statistically different from zero at the 2% significance level? (1 point)
The interpretation of the coefficient of DEMOCA is that, holding the other variables constant, a Democratic
candidate gets 3.793 percentage points more votes than a non-Democratic candidate. The value of the test
statistic is t = 2.70. To be conservative, we take the critical value of the t statistic at a 2% significance
level to be t(0.99,50), which is 2.403. Since 2.70 > 2.403, we conclude that the effect of this variable is
statistically different from zero at the 2% significance level.
(0.5 point for interpretation, 0.5 point for statistical significance)
(b) You suspect heteroskedasticity and conduct a White test (see Table 2.2 in the
estimation output). State the null and alternative hypotheses of the White test. What is
your conclusion based on the output using a significance level of 1%? (2 points)
The White test for heteroskedasticity tests H0: variance of error term is constant
(homoskedasticity) vs. H1: variance of error term is not constant (heteroskedasticity).
Based on the p-value associated with the chi square statistic, we can reject the null hypothesis
of homoskedasticity.
(1 point for the hypotheses, and 1 point for the conclusion.)
(c) In the output file (Table 2.3) alternative estimation results are provided based on a
weighted least squares method. What problem is being addressed by this method and
what effect does this method have on the estimators? Compare the estimation results
in Table 2.3 with the ones from Table 2.1. (2 points)
Weighted least squares addresses heteroskedasticity. If the weights are correctly specified,
the estimators are BLUE. The estimation results in Table 2.3 show that coefficient estimates
and standard errors are different from the ones in Table 2.1.
(1 point for explanation of method, 1 point for comparison of estimates and standard errors.)
Question 3.
In this question we use a dataset on US monthly data on individual expenditures on
purchasing new cars, and other variables measured in 1975–1991. The variables are as
follows:
PCECARS: Individual expenditures in purchasing new cars (billion USD).
POP: The US population (million people).
PCDPY: Average personal income (thousand USD).
CPINEW: Consumer price index for new cars.
(a) After estimating the regression equation
PCECARSt = β1 + β2 PCDPYt + β3 CPINEWt + β4 POPt + et
(the regression results are in Table 3.1), you run the Breusch-Godfrey Serial Correlation
LM Test (the results are in Table 3.2). How many degrees of freedom does the χ2 test
have? What is the null hypothesis of the Breusch-Godfrey Serial Correlation LM Test? (1
point)
Table 3.2 tests for first-order serial correlation: the χ2 test has one degree of freedom. The
null hypothesis is no serial correlation.
(0.5 point for correct degrees of freedom, 0.5 point for correct statement of null hypothesis.)
(b) Use a 5% significance level. What is your conclusion from Table 3.2? Is there serial
correlation? If yes, what are the implications? (2 points)
The critical value of the chi squared distribution with one degree of freedom (95th percentile)
is 3.841. Since the test statistic 130.615 is larger than 3.841, the null of no serial correlation is
clearly rejected. This can also be rejected on the basis of the p-value (0.00 < 0.05).
This implies that standard errors and test statistics are not correct, and most likely least
squares overstates precision.
(1 point for conclusion, 1 point for implications.)
(c) Tables 3.3 and 3.4 show the results of Augmented Dickey-Fuller tests on PCDPY and
D(PCDPY). Is PCDPY stationary? Is D(PCDPY) stationary? Is PCDPY an I(0) series or an
I(1) series? (1 point)
The null of a unit root is not rejected for the test in levels (Table 3.3). The null of a unit root
is rejected for the test in first differences (Table 3.4): PCDPY is not stationary, D(PCDPY) is
stationary. So, PCDPY is I(1).
(0.5 point for stationarity of PCDPY and D(PCDPY), 0.5 point for the order of integration.)
(d) Suppose that the results of the Augmented Dickey-Fuller tests for all other variables in
the model in part (a) are the same as for PCDPY in part (c). What does this imply for the
specification of the model in part (a)? Is it allowed to specify the model in levels? (1 point)
The ADF tests suggest that the model should be specified in first differences, or in levels if
all I(1) variables are cointegrated.
(0.5 point for first differenced form, and 0.5 point for referring to cointegration.)
Name:
Final Exam
Econometrics for Business Economics [EBB061A05],
Economics [EBB814A05] and International Economics and
Business [EBB070A05] 2018-2019
Instructions:
1. Please answer all 20 Multiple Choice Questions in Part I by selecting the most appropriate
answer. Use the computer sheet provided and follow the instructions provided
on the computer sheet.
2. Answer all three Open Questions in this exam booklet. You can earn 20 points for the
open questions.
3. Please refer to the computer output to answer the Open Questions.
4. You are required to submit all materials after completing this examination.
5. You are not allowed to use a graphical calculator but only a single line calculator. The
types allowed are Casio fx-82ES (PLUS) or the Casio fx-82MS as in Mathematics and
Data Analysis from your first year.
6. Please do not write on the tables and formula sheets as we would like to re-use them.
The tables may not include critical statistics for all degrees of freedom. Choose the
most appropriate degrees of freedom.
7. You are not allowed to visit the toilet during this exam.
predicted voteA = 45.07 + 6.08 lexpendA − 6.61 lexpendB + 0.15 prtystrA
where voteA is the percentage of the vote received by Candidate A, lexpendA and lexpendB
are the logarithms of campaign expenditures by Candidates A and B, and prtystrA is a measure
of the party strength for Candidate A.
Question 1 (1 point)
The interpretation of the coefficient for β1 is as follows:
A. ΔvoteA ≈ (β1/100)(%ΔexpendA)
B. %ΔvoteA ≈ (β1/100)(%ΔexpendA)
C. ΔvoteA ≈ (β1 · 100)(%ΔexpendA)
D. ΔvoteA ≈ (β1/100)(ΔexpendA)
Question 2 (1 point)
The estimates imply the following:
A. A 10 unit ceteris paribus increase in spending by candidate A increases the
predicted share of the vote going to A by about 0.608 percentage points.
B. A 1% ceteris paribus increase in spending by candidate A increases the predicted
share of the vote going to A by about 6.08 percentage points.
C. A 10% ceteris paribus increase in spending by candidate A increases the predicted
share of the vote going to A by about 0.608 percent.
D. A 10% ceteris paribus increase in spending by candidate A increases the
predicted share of the vote going to A by about 0.608 percentage points.
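The level-log approximation behind Questions 1 and 2 can be worked through numerically. The sketch below uses the coefficient 6.08 from the estimated equation and also shows the exact change implied by the logarithm for comparison; the variable names are illustrative.

```python
import math

beta1 = 6.08          # coefficient on lexpendA
pct_change = 10.0     # a 10% increase in spending by Candidate A

# Approximation used in the exam: change in voteA ~ (beta1 / 100) * (% change in expendA).
approx_change = (beta1 / 100) * pct_change

# Exact change implied by the log specification.
exact_change = beta1 * math.log(1 + pct_change / 100)
print(round(approx_change, 3), round(exact_change, 3))
```

Because voteA is measured in percent, both numbers are changes in percentage points, not percent changes.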
Question 3 (1 point)
Suppose we want to test the null hypothesis H0 : β1 + β2 = 0. Which of the following
statements is true?
A. An equivalent null hypothesis to the one given above would be H0 : β1 = β2 .
B. If the null is true, then a z% increase in expenditure by A and a z% increase in
expenditure by B leaves voteA unchanged.
C. We would need the standard error of β̂1 + β̂2 to test the hypothesis.
D. All of the above.
Question 4 (1 point)
A researcher estimates the following regression on a sample of students:
where GPAi is the grade point average, femalei is a dummy for female students and
Groningeni is a dummy for students living in Groningen. The average difference between
the GPA of female students living in Groningen and male students living in
Groningen is
A. 0.07.
B. 0.01.
C. 0.03.
D. 0.02.
Question 5 (1 point)
A researcher is interested in estimating the impact of study time on passing rates and
estimates the following linear probability model
predicted passi = 0.6 + 0.02 studytimei,
where passi is a dummy variable, which takes a value of 1 if the student passes the
exam, and zero otherwise. Studytime is the number of hours spent studying per week.
Which of the following statements is correct?
A. An extra hour of studying per week is associated with a 2 percent increase in
the probability of passing the course.
B. An extra hour of studying per week is associated with a 0.02 percentage point
increase in the probability of passing the course.
C. An extra hour of studying per week is associated with a 0.02 percent increase
in the probability of passing the course.
D. An extra hour of studying per week is associated with a 2 percentage point
increase in the probability of passing the course.
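The distinction between percent and percentage points in Question 5 can be seen by evaluating the fitted model at two study times. The study times of 5 and 6 hours are hypothetical; the coefficients are those of the estimated equation.

```python
def predicted_pass_probability(studytime):
    """Fitted value from the linear probability model pass = 0.6 + 0.02 * studytime."""
    return 0.6 + 0.02 * studytime

# Hypothetical study times, in hours per week.
p_5 = predicted_pass_probability(5)
p_6 = predicted_pass_probability(6)

# The fitted values are probabilities, so the change is in percentage points.
change_pp = (p_6 - p_5) * 100
print(round(p_5, 2), round(p_6, 2), round(change_pp, 2))
```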
Question 6 (1 point)
What is not a potential source of endogeneity?
A. Measurement error in a dependent variable.
B. Reverse causality.
C. Unobserved heterogeneity.
D. Selection bias.
Question 7
What is the consequence of having measurement error in one of the independent variables?
A. Attenuation bias.
B. Reverse causality.
C. Model misspecification.
D. Overly precise standard errors.
Question 8 (1 point)
The problem of unobserved heterogeneity stems from
A. having a small sample with too little variation.
B. having independent variables that affect the outcome variable that are not
observable by the researcher.
C. having heterogeneous variances of the error terms.
D. having heterogeneous coefficients in the linear regression model.
Question 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A researcher is interested in the effect of having breakfast on the test day (breakfast)
on students' exam scores. Therefore, the researcher decides to run a randomized experiment
and randomly assigns students to two groups: a treatment group and a control
group. The researcher asks students in the treatment group to have breakfast on the
exam day, and asks students in the control group not to have breakfast
on the exam day. When the researcher collects data on the randomized experiment,
she realizes that some students in the control group actually had breakfast. When the
researcher compares the outcomes of students who had breakfast to those who did not
have breakfast,
A. the estimate captures the causal effect of having breakfast on the exam results.
B. the estimate is likely to suffer from selection bias.
C. the estimate suffers from attenuation bias stemming from measurement error.
D. the estimate will be close to zero, since students were randomly sorted into
treatment and control groups.
Question 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
In the distributed lag model, the coefficient on the contemporaneous value of the re-
gressor is called the
A. impact propensity.
B. dynamic propensity.
C. cumulative propensity.
D. autoregressive propensity.
Question 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
The long-run propensity
A. is the coefficient on X_{t-r} in the standard formulation of the distributed lag
model.
B. is the sum of all individual propensities.
C. is the difference between the coefficient on X_{t-1} and X_{t-r}.
D. is the product of all individual propensities.
Question 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
cov(u_{it}, u_{is} | X_{it}, X_{is}) = 0 for periods t ≠ s means that
A. there is no cross-correlation between units.
B. conditional on the errors, the regressors are uncorrelated over time.
C. there is no perfect multicollinearity in the errors.
D. conditional on the regressors, the errors are uncorrelated over time.
Question 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
In the Fixed Effects regression model excluding the intercept, using (n − 1) binary
firm-indicator variables for a sample of n firms, the coefficient of the binary variable
for firm i indicates
A. the difference in fixed effects between the i-th and the omitted firm.
B. the response in the dependent variable to a percentage change in the binary
variable.
C. will be either 0 or 1.
D. the level of the fixed effect of the i-th firm.
Question 14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
The most important advantage of using panel data over cross-sectional data on firms is
that it
A. allows you to analyze behaviour across time but not across firms.
B. allows you to control for some types of observable variables that are constant
over time.
C. allows you to study long-run trends.
D. allows you to control for some types of omitted variables without actually
observing them.
Question 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
Sample selection bias could occur
A. If the availability of the data is influenced by a selection process that is related
to the value of the independent variables.
B. If the choice between two samples is made by the researcher.
C. Because of the fact that we do not observe the entire population.
D. If the availability of the data is influenced by a selection process that is
related to the value of the dependent variable.
Question 16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
The difference between an unbalanced and a balanced panel is that
A. the impact of different regressors is roughly the same for balanced but not
for unbalanced panels.
B. the magnitude of the intercept is meaningful only in balanced panels but not
in unbalanced panels.
C. you cannot have both fixed time effects and fixed unit effects regressions.
D. an unbalanced panel contains missing observations for at least one time
period or one unit.
Question 17 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
Consider estimating the effect of beer tax on the traffic fatality rate using data from the
United States, using time and state fixed effects for the Northeast Region (Maine, Vermont,
New Hampshire, Massachusetts, Connecticut and Rhode Island) for the period
1991-2001. If Beer Tax was the only explanatory variable, how many coefficients would
you need to estimate, excluding the constant?
A. 7.
B. 16.
C. 17.
D. 18.
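The count asked for here can be checked with the usual dummy-variable bookkeeping. The sketch below assumes that one state dummy and one year dummy are dropped as base categories because a constant is included; the variable names are illustrative only.

```python
# Six Northeast states and the years 1991-2001 (inclusive), as listed in the question.
states = ["Maine", "Vermont", "New Hampshire",
          "Massachusetts", "Connecticut", "Rhode Island"]
years = range(1991, 2002)

n_regressors = 1                     # Beer Tax
n_state_dummies = len(states) - 1    # one state serves as the omitted base category
n_year_dummies = len(years) - 1      # one year serves as the omitted base category

n_coefficients = n_regressors + n_state_dummies + n_year_dummies
print(n_coefficients)
```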
Question 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A static model is postulated when:
A. a change in the independent variable at time t is believed to have an effect on
the dependent variable at period t + 1.
B. a change in the lagged independent variable is believed to have an effect on
the dependent variable for time t .
C. a change in the independent variable at time t does not have any effect on the
dependent variable.
D. a change in the independent variable at time t is believed to have a
contemporaneous effect on the dependent variable.
Question 19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
The sample size for a time series data set is the number of:
A. variables being measured.
B. time periods over which we observe the variables of interest less the number
of variables being measured.
C. time periods over which we observe the variables of interest plus the number
of variables being measured.
D. time periods over which we observe the variables of interest.
Question 20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A variable is not suitable to be an instrumental variable if:
A. it is correlated with the endogenous variable.
B. conditional on the endogenous variable, it is correlated with the outcome.
C. it is uncorrelated with the error term.
D. it does not have a direct effect on the outcome.
Reply to the sub-points of each question by using exclusively the space within boxes. The points
assigned to each sub-question are reported between the brackets.
Question 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Based on Census data from the United States the following model was estimated to
examine the influence of socio-economic variables on earnings of men, where wage is
monthly earnings, educ is years of schooling, exper is the overall years of experience
in the labor market, tenure is the years at the current employer, and married, black, south
and urban are all dummy variables defined in the usual way.
(a) (2 points) Conduct a t test to test the null hypothesis that the coefficient of educ is
equal to 0.05 against the alternative that it is greater than 0.05 at a 1% significance
level. Explain your answer giving all steps.
Solution: The test statistic for the one-sided t-test is
t = (β̂_k − 0.05) / se(β̂_k) = (0.0654 − 0.05) / 0.0063 = 2.44.
The associated critical value is t_(0.01, ∞) = 2.326.
Since 2.44 > 2.326, we reject the null hypothesis: the coefficient on educ is
statistically larger than 0.05 at a 1% significance level.
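The steps of this one-sided test can be reproduced with a short sketch. It assumes the large-sample (standard normal) critical value, which matches the 2.326 used in the solution; the variable names are illustrative.

```python
from statistics import NormalDist

beta_hat = 0.0654    # estimated coefficient on educ, taken from the solution
se = 0.0063          # its standard error, taken from the solution
beta_null = 0.05     # value of the coefficient under the null hypothesis
alpha = 0.01         # significance level

t_stat = (beta_hat - beta_null) / se
# Right-tailed critical value from the standard normal (infinite degrees of freedom).
t_crit = NormalDist().inv_cdf(1 - alpha)
reject = t_stat > t_crit

print(round(t_stat, 2), round(t_crit, 3), reject)
```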
(b) (2 points) Which of the classical assumptions is likely to be violated in this equation?
Argue why a 'proxy' variable IQ, which measures the IQ of the individual,
would alleviate the problem. Is the coefficient of educ likely to increase or decline
if IQ is added to the regression?
(c) (4 points) Dr. Strangelove suggests employing a 2SLS method. He proposes to use
the number of siblings as an instrument for education. Briefly explain the 2SLS
method, state the conditions a variable should satisfy to be an appropriate instrumental
variable, and argue whether they are met with the proposed instrument.
(d) (1 point) Holding other factors fixed, what is the approximate difference in monthly
salary between blacks and nonblacks? Is the difference statistically significant?
Solution: The coefficient on black implies that, at given levels of the other
explanatory variables, black men earn about 18.8% less than nonblack men. The
t statistic is about –4.95, so the difference is highly statistically significant.
(e) (1 point) Describe R², and explain what it means in the context of this question.
Solution: R-squared is a measure of how well the model can account for the
variation of the dependent variable. A value of 0.253 means that 25.3% of the
variation of log(wage) around its mean can be explained by the model.
Question 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Consider the following extension to the model outlined in Question 1, where exper²
and tenure² are squared variables of exper and tenure, respectively.
(a) (2 points) The R² is now higher than before. Would you therefore argue that the
two additional variables should be included in the regression? Explain why or
why not.
(b) (3 points) You now run two tests: one on tenure² and exper², and one on exper
and exper².
Explain what these tests are doing and analyze the results shown. Compare the
results with the P > |t| value (see table 2) of the single coefficients.
Question 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Consider the following extension to the model outlined in Question 1, where an interaction
term is included: blacksingle is an interaction term that is the product of black
and single, where single is a dummy variable which is 1 when the person is single.
Figure 3: OLS Estimation with Log Wage as dependent variable including interaction term
(a) (3 points) What is the estimated wage differential between single blacks and mar-
ried non-blacks?
Solution: Single blacks means we have single = 1 and black = 1 (and hence
blacksingle = 1), implying that we need to add the three coefficients:
−0.0614 − 0.1794 − 0.1889 = −0.43.
Salaries for single blacks are, on average, about 43% lower, holding all other factors
constant, compared to married non-blacks.
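The sum above can be checked numerically, together with the exact percentage difference implied by a log-wage model. This is a sketch only: the three coefficient values are taken from the solution with negative signs assumed (consistent with the "43% lower" conclusion), and the mapping of each number to a specific variable is not stated in the text.

```python
import math

# The three coefficients that switch on for single blacks (black, single, blacksingle).
# Their order is not given in the solution, so they are listed without labels.
coefficients = [-0.0614, -0.1794, -0.1889]

log_diff = sum(coefficients)                  # difference in log(wage)
approx_pct = 100 * log_diff                   # the usual "100 times coefficient" approximation
exact_pct = 100 * (math.exp(log_diff) - 1)    # exact percentage difference in wage

print(round(log_diff, 4), round(approx_pct, 1), round(exact_pct, 1))
```

For a log difference this large, the approximation and the exact figure diverge noticeably, which is worth keeping in mind when interpreting the 43%.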
(b) (2 points) Can you say whether the wage differential between single blacks and
married non-blacks is statistically significant? Argue.
Name:
Resit
Econometrics for Business Economics [EBB061A05],
Economics [EBB814A05] and International Economics and
Business [EBB070A05] 2018-2019
Instructions:
1. Please answer all 20 Multiple Choice Questions in Part I by selecting the most appro-
priate answer. Use the computer sheet provided and follow the instructions provided
on the computer sheet.
2. Answer all three Open Questions in this exam booklet. You can earn 20 points for the
open questions.
3. Please refer to the computer output to answer the Open Questions.
4. You are required to submit all materials after completing this examination.
5. You are not allowed to use a graphical calculator but only a single line calculator. The
types allowed are Casio fx-82ES (PLUS) or the Casio fx-82MS as in Mathematics and
Data Analysis from your first year.
6. Please do not write on the tables and formula sheets as we would like to re-use them.
The tables may not include critical statistics for all degrees of freedom. Choose the
most appropriate degrees of freedom.
7. You are not allowed to visit the toilet during this exam.
Question 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
When an estimator is consistent,
A. the coefficient estimates will be as close to their true values as possible for
small and large samples.
B. on average, the estimated coefficient values will equal the true values.
C. the least squares estimator is unbiased and no other unbiased estimator has a
smaller variance.
D. the estimates will converge to the true values as the sample size increases.
Question 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
Assume that you have the following estimated model:
\widehat{log q}_i = 2.25 − 0.7 log p_i + 0.02 inc_i,
where p_i is the price and q_i is the demanded quantity of a certain good and inc_i is the
income in thousand dollars.
The interpretation of the coefficient of log p in the above equation is:
A. If the price increases by 1%, the demanded quantity will be 0.7% lower on
average, ceteris paribus.
B. If the price increases by 1%, the demanded quantity will be 0.007% lower on
average, ceteris paribus.
C. If the price increases by 1%, the demanded quantity will be 70% lower on
average, ceteris paribus.
D. None of the above.
Question 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
Assume that you have the following estimated model:
\widehat{log q}_i = 2.25 − 0.7 log p_i + 0.02 inc_i,
where p_i is the price and q_i is the demanded quantity of a certain good and inc_i is the
income in thousand dollars. The interpretation of the coefficient of inc_i in the above
equation is:
A. If the disposable income increases by a thousand dollars, the demanded
quantity will be 2% higher on average, ceteris paribus.
B. If the disposable income increases by a thousand dollars, the demanded
quantity will be 0.02% higher on average, ceteris paribus.
C. If the disposable income increases by a thousand dollars, the demanded
quantity will be 0.0002% higher on average, ceteris paribus.
D. None of the above.
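A rough numerical check of the log-log and log-level interpretations used in Questions 3 and 4. It assumes the estimated equation is log q = 2.25 − 0.7 log p + 0.02 inc, where the minus sign on the price term is inferred from the answer options; variable names are illustrative.

```python
import math

b_logp = -0.7   # coefficient on log(price): an elasticity
b_inc = 0.02    # coefficient on income (in thousand dollars): a semi-elasticity

# Log-log: effect on quantity of a 1% price increase, holding income fixed.
pct_q_from_price = 100 * (math.exp(b_logp * math.log(1.01)) - 1)

# Log-level: effect on quantity of one extra thousand dollars of income.
pct_q_from_income = 100 * (math.exp(b_inc * 1) - 1)

print(round(pct_q_from_price, 2), round(pct_q_from_income, 2))
```

Both exact figures are close to the textbook readings of roughly −0.7% and roughly +2%.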
Question 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
Consider the least squares estimator for the standard error of the slope coefficient.
Which of the following statements are true?
Question 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
Consider the following linear regression model:
y_i = β0 + β1 x_i + u_i.
Question 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A violation of the homoskedasticity assumption occurs if
A. the variance of the error term is individual-specific.
B. there is correlation of the error term between observations.
C. the variance of the error term depends on one of the covariates.
D. any of the above conditions hold.
Question 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
Consider the following linear regression model:
y_i = β0 + β1 x_i + u_i.
If the covariance between y and x is positive, then the sign of the OLS estimate for β1
A. is negative.
B. is positive.
C. is zero.
D. depends on other factors as well.
Question 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
Consider the following model for average monthly rainfall measured in millimeters
(mm):
\widehat{rainfall}_i = 100 − 0.9 temperature_i,
where temperature_i denotes the monthly average temperature measured in degrees
Celsius. What is the effect of a 10 Fahrenheit increase in temperature on rainfall? The
relationship between Fahrenheit (F) and Celsius (C) is F = 1.8C + 32.
A. A 10 Fahrenheit increase in temperature is associated with 5 mm less rainfall.
B. A 10 Fahrenheit increase in temperature is associated with 16.2 mm less rainfall.
C. A 9 Fahrenheit increase in temperature is associated with 5 mm less rainfall.
D. A 10 Fahrenheit increase in temperature is associated with 32 mm less rainfall.
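The unit conversion behind this question can be sketched as follows, assuming the slope of −0.9 mm per degree Celsius from the equation above. The function name is hypothetical.

```python
SLOPE_PER_CELSIUS = -0.9   # mm of rainfall per degree Celsius

def rainfall_change(delta_fahrenheit):
    """Predicted change in rainfall (mm) for a temperature change given in Fahrenheit."""
    # For a change in temperature the intercept 32 drops out: dC = dF / 1.8.
    delta_celsius = delta_fahrenheit / 1.8
    return SLOPE_PER_CELSIUS * delta_celsius

print(rainfall_change(10))   # change for a 10 Fahrenheit increase
print(rainfall_change(9))    # change for a 9 Fahrenheit increase
```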
Question 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A researcher is interested in estimating the causal effect of class size (Class size_i) on
students' GPA (GPA_i). Therefore, the researcher specifies the following model:
The researcher knows that schools sort students of higher academic ability in smaller
classes. The OLS estimate of β1 is likely
A. to underestimate the true causal effect of class size.
B. to overestimate the true causal effect of class size.
C. to capture the true causal effect of class size.
D. to be too imprecise to capture the true causal effect of class size.
Question 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A researcher compares the standard error of the OLS estimates of β1 of the following
two models:
y_i = β0 + β1 x_{1i} + u_i, (Short)
y_i = β0 + β1 x_{1i} + β2 x_{2i} + u_i. (Long)
The corresponding estimates are β̂1^Short and β̂1^Long. Which of the following statements is
true?
A. Var(β̂1^Short) < Var(β̂1^Long).
B. Var(β̂1^Short) = Var(β̂1^Long).
C. The relation between Var(β̂1^Short) and Var(β̂1^Long) depends on the covariance
between x_{1i} and x_{2i}.
D. If β2 = 0, then Var(β̂1^Short) = Var(β̂1^Long).
Question 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A researcher estimates the following regression on a population of workers for the
cohorts born between 1950 and 1960:
Age is measured in years, and the variable year_i measures the year when the wage was
reported. What is the average difference between the wages of workers born in 1953
and 1954, holding age constant?
A. β2 .
B. β1 .
C. β1 + β2 .
D. 0.
Question 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A researcher is interested in the returns to schooling, i.e., the effect of an extra year
of schooling on log wages. The researcher obtains the estimate of 0.12 with a 95%
confidence interval of [0.02, 0.22] on a sample of 1,000 workers. Which of the following
statements is true?
A. At a 10 percent significance level, we cannot reject the null that the returns to
schooling is zero.
B. At a 1 percent significance level, we reject the null that the returns to schooling
is zero.
C. At a 5 percent significance level, we reject the null that the returns to school-
ing is zero.
D. At a 5 percent significance level, we reject the null that the returns to schooling
is 0.10.
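The statements in this question can be checked by backing out the standard error from the reported interval. The sketch assumes a normal approximation (reasonable with 1,000 observations) and a symmetric 95% interval; the helper name `rejects` is hypothetical.

```python
from statistics import NormalDist

estimate = 0.12
ci_low, ci_high = 0.02, 0.22     # reported 95% confidence interval

z95 = NormalDist().inv_cdf(0.975)        # about 1.96
se = (ci_high - ci_low) / (2 * z95)      # implied standard error

def rejects(null_value, alpha):
    """Two-sided test of H0: coefficient = null_value at significance level alpha."""
    t_stat = (estimate - null_value) / se
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(t_stat) > critical

print(round(se, 3))
print(rejects(0.0, 0.10), rejects(0.0, 0.05), rejects(0.0, 0.01), rejects(0.10, 0.05))
```

Equivalently, a null value is rejected at the 5% level exactly when it lies outside the 95% interval.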
Question 14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A researcher estimates the following model
and is interested in testing the null hypothesis β1 = β2 = 0. What is the restricted model
that the researcher has to estimate to perform an F-test?
A. wage_i = β0 + u_i.
B. wage_i = β0 + β1 age_i + β3 female_i + u_i.
C. wage_i = β0 + β3 female_i + u_i.
D. wage_i = β0 + β1 age_i + β2 age_i² + β3 female_i + u_i.
Question 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A researcher is interested in testing whether x_{1i} and x_{2i} are jointly significant in the
following linear regression using an F-test:
y_i = β0 + β1 x_{1i} + β2 x_{2i} + u_i.
Question 16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A researcher estimated the following regression:
\widehat{log wage}_i = 8 + 0.2 age_i − 0.01 female_i × age_i,
where female_i is a dummy variable for being a female worker. What is the difference
between the average wages of male and female workers at the age of 22?
A. 1 percent.
B. 20 percent.
C. 19 percent.
D. 22 percent.
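The gender gap implied by the interaction term can be evaluated directly. The sketch assumes the coefficient on female × age is −0.01 (the minus sign is inferred, since it was lost in the printed equation); the function name is illustrative.

```python
def predicted_log_wage(age, female):
    """Fitted log wage from the estimated equation; female is 1 for women, 0 for men."""
    return 8 + 0.2 * age - 0.01 * female * age

# Difference in predicted log wage between women and men, both aged 22.
gap = predicted_log_wage(22, 1) - predicted_log_wage(22, 0)
print(round(gap, 2), round(100 * gap))   # log points and approximate percent
```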
Question 17 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A researcher is interested in testing whether the effect of experience on wages is differ-
ent between men and women. Therefore, the researcher decides to estimate a model on
the full sample, as well as two separate models on the subsample of men and women,
separately. Which of the following tests can be used to test whether there is gender
difference in the effect of experience on wages?
A. Hausman test.
B. Breusch-Pagan test.
C. Chow test.
D. White test.
Question 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A weak instrument
A. severely biases the IV estimate.
B. always yields better estimates than OLS.
C. cannot be used to estimate the first-stage regression.
D. is weakly correlated with the error term.
Question 19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A researcher is interested in estimating the supply function of housing, therefore col-
lects data on the number of houses and average prices in each Dutch municipality. The
researcher specifies the following supply function:
Question 20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 point
A natural experiment
A. usually takes place in the field, where the researcher can randomly assign
treatment.
B. studies the effects of natural disasters, such as earthquakes or hurricanes.
C. tests the effect of treatments in a laboratory environment.
D. studies policy reform where some people were affected and others were
not.
Reply to the sub-points of the question by using exclusively the space within boxes. The points
assigned to each sub-question are reported between [square brackets].
Consider data on 32,000 married black or married Hispanic women. The data contain
information about the children (kidcount is the number of children, samesex is an indicator
for having two children of the same sex, multi2nd is an indicator for having twins at the
second birth), earnings (labinc is labor income, hours is weekly hours worked), and
socio-economic characteristics (educ is years of schooling, age is the age of the individual,
agefstm is the age at first birth, and black and hispan are dummy variables).
(a) You derive the following descriptive statistics for hours and educ. Calculate the
estimates β̂0 and β̂1 in the equation \widehat{hours} = β̂0 + β̂1 educ. [2 Points]
Answer:
β̂1 = cov(educ, hours) / var(educ) = 11.7669 / 10.92432 = 1.077
β̂0 = ȳ − β̂1 x̄ = 21.22011 − 1.077 · 11.00534 = 9.366
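The two formulas above can be verified with a few lines of arithmetic. The numbers are the descriptive statistics quoted in the answer; the variable names are illustrative.

```python
cov_educ_hours = 11.7669   # sample covariance of educ and hours
var_educ = 10.92432        # sample variance of educ
mean_hours = 21.22011      # sample mean of the dependent variable
mean_educ = 11.00534       # sample mean of the regressor

beta1_hat = cov_educ_hours / var_educ            # slope: cov(x, y) / var(x)
beta0_hat = mean_hours - beta1_hat * mean_educ   # intercept: y-bar minus slope times x-bar

print(round(beta1_hat, 3), round(beta0_hat, 3))
```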
Answer:
R² = 1021907.1 / 12111869.6 = 0.084. R-squared is a measure of how well the model can account
for the variation of the dependent variable. A value of 0.084 means that 8.4% of
the variation of hours around its mean can be explained by the model.
(c) Write down the t-statistic to test whether the absolute value of the coefficient for
kidscount is larger than 3.0 at an α = 0.1 significance level and evaluate the test. State
the most appropriate critical value that you use. [3 Points]
Answer:
The test statistic for the one-sided t-test is
t = (|β̂_k| − 3.0) / se(β̂_k) = (3.129 − 3.0) / 0.1198 = 1.077.
The associated critical value is t_(0.1, ∞) = 1.282.
Since 1.077 < 1.282, the null hypothesis that the absolute value of the coefficient on
kidscount equals 3.0 cannot be rejected in favor of the alternative that it is larger
than 3.0 at a 10% significance level.
(d) Compute a 90%-confidence interval for the coefficient of black. According to your
result, is the coefficient statistically different from zero?[2 Points]
Answer:
Confidence interval:
Note: since zero lies within the confidence interval, we cannot reject the null
that the coefficient is zero at a 10% significance level.
(e) Which of the classical assumptions is likely to be violated in this equation? Argue by
describing a potential example for endogeneity for the problem at hand.[3 Points]
Answer:
Zero conditional mean assumption is violated: kidscount is correlated with the
error term.
Here, endogeneity of kidscount might arise because women with fewer kids have
a higher preference for working in general, whereas women with more kids (controlling
for other factors) have a stronger preference for staying at home and working less.
Not accounting for this endogeneity in the estimation might bias the coefficient
for kidscount upwards in absolute terms (exaggerating the effect of children on
work), because this preference is picked up by the coefficient.
The following tables present results from a 2SLS regression. Note that multi2nd is a
dummy variable indicating whether the second birth is a twin birth.
(f) Briefly explain the 2SLS method (2 sentences are enough). Then, state the conditions
needed for an instrument and argue whether they are met with the proposed instru-
ment (use the results depicted in the tables above, if necessary). [4 Points]
Answer:
In 2SLS you use an (exogenous) instrument to predict the (endogenous) independent
variable x (first stage). The predicted value from this equation is then used in
the second stage to explain the dependent variable y.
Conditions:
• Relevance: the instrument must be correlated with the endogenous variable.
Looking at the coefficient of multi2nd in the first stage confirms relevance:
the coefficient is meaningful (having twins increases the number of kids
by 0.8) and it is significant: a t-statistic of 14.95 is larger than the
Stock-Yogo rule of thumb of √10 ≈ 3.2.
• Exogeneity: the instrument must be uncorrelated with the error term. There is
no general test for this, so we must argue: it should be very difficult to
influence the likelihood of having twins (other than through artificial
insemination, which results in a higher probability of twins), hence the
instrument should be uncorrelated with the error term.
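The two stages described above can be illustrated on simulated data. This is a minimal sketch, not the exam's regression: the data-generating process, the coefficient values (a true effect of −2 and a first-stage effect of 0.8) and all names are assumptions chosen for illustration, and only the Python standard library is used.

```python
import random

def simple_ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope

random.seed(0)
n = 20000
z = [float(random.random() < 0.5) for _ in range(n)]   # binary instrument
u = [random.gauss(0, 1) for _ in range(n)]             # unobserved preference
# Endogenous regressor: depends on the instrument and on the unobservable.
x = [2 + 0.8 * zi + 0.5 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
# Outcome: true effect of x is -2; the unobservable also lowers y directly.
y = [20 - 2 * xi - 3 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

_, ols_slope = simple_ols(x, y)            # biased: x is correlated with u

a0, a1 = simple_ols(z, x)                  # first stage: regress x on z
x_hat = [a0 + a1 * zi for zi in z]         # fitted values of x
_, tsls_slope = simple_ols(x_hat, y)       # second stage: regress y on fitted x

print(round(ols_slope, 2), round(a1, 2), round(tsls_slope, 2))
```

In this simulated setting OLS overstates the negative effect, while the second-stage slope lands near the true value. The second-stage standard errors from this manual procedure would not be valid, which is why dedicated 2SLS routines are used in practice.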
(g) Compare the coefficients of kidcount of the OLS and the 2SLS regression. Comment
• The coefficient on kidcount is smaller in absolute value in the 2SLS regression
than in the OLS regression, implying that the negative effect of the number of
children on hours worked is reduced when accounting for endogeneity via the IV approach.
• The IV approach accounts for the endogeneity, so its coefficient is more
likely to be the causal effect of the number of kids on hours worked.