MAASAI MARA UNIVERSITY
UNIVERSITY EXAMINATIONS 2022/2023
FOURTH YEAR SECOND SEMESTER EXAMINATIONS FOR THE DEGREE OF
BACHELOR OF SCIENCE IN ECONOMICS AND STATISTICS
ECO 4220: LINEAR MODELLING
DATE: APRIL 2023 TIME: 2 HOURS
INSTRUCTIONS TO CANDIDATES:
1. Answer question ONE (Section A) and any THREE questions in Section B.
SECTION A
QUESTION ONE {25 MARKS}
a) The government of a developing country wants to implement a program where poor
families receive food stamps that can be used to purchase prepackaged foods with high
nutritional value. The government decides to set up an experiment where 500 families
(each with 1 child) are randomly assigned to a treatment group (eligible for food stamps,
Ti = 1) and to a control group (ineligible for food stamps, Ti = 0). The government has
hired a researcher to investigate the effect of food stamps on the probability that a child
has poor health. After the experiment the researcher performs a regression of Hi (a binary
variable that equals 1 if a child has poor health) on Fi (a binary variable that equals 1 if a
family received food stamps). She obtains the following OLS estimation results.
Page 1 of 7
i. Interpret the two estimated coefficients. (2 marks)
ii. The researcher finds out that some of the families in the control group received food
stamps. Explain whether we can interpret the estimated OLS coefficient on F as the
causal effect of food stamps on child health? (3 marks)
iii. The researcher decides to estimate the effect of food stamps using an instrumental
variable approach. She uses assignment to the treatment group as instrument for the
actual receipt of food stamps. She obtains the following first stage estimation results.
Do you think that the instrument relevance condition holds? Is T a weak instrument?
(3 marks)
iv. Do you think that the instrument exogeneity condition holds? (2 marks)
v. The following table shows the averages of Hi and Fi for those assigned to treatment
group (Ti = 1) and for those assigned to the control group (Ti = 0). Use the results in
the table below to obtain the instrumental variable estimate of the effect of food
stamps on the probability that a child has poor health. (3 marks)
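
With a binary instrument, the instrumental variable estimate in part v reduces to the Wald ratio: the difference in average outcomes between the two assignment groups, divided by the difference in average receipt of food stamps. A minimal sketch follows; the function name and the numbers are hypothetical and are not the values from the exam table.

```python
def wald_iv_estimate(mean_h_treat, mean_h_control, mean_f_treat, mean_f_control):
    """Wald (IV) estimate for a binary instrument T.

    Numerator: reduced form, E[H | T = 1] - E[H | T = 0].
    Denominator: first stage, E[F | T = 1] - E[F | T = 0].
    """
    first_stage = mean_f_treat - mean_f_control
    if first_stage == 0:
        raise ValueError("assignment does not change receipt; the ratio is undefined")
    reduced_form = mean_h_treat - mean_h_control
    return reduced_form / first_stage


# Hypothetical group averages, NOT taken from the exam table.
estimate = wald_iv_estimate(
    mean_h_treat=0.30, mean_h_control=0.40,
    mean_f_treat=0.80, mean_f_control=0.30,
)
print(round(estimate, 3))
```

Under these made-up averages the ratio is (0.30 - 0.40) / (0.80 - 0.30), so receiving food stamps would lower the probability of poor health by about 20 percentage points.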
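b) With standard notation, the multiple linear regression model is given by $Y = X\beta + \varepsilon$ and
$\varepsilon \sim N(0, \sigma^2 I_n)$, where $X$ is a non-stochastic $n \times p$ design matrix.
i) Derive the least squares estimator $\hat{\beta}$ of $\beta$ and show that it is given by
$\hat{\beta} = (X'X)^{-1}X'Y$. (4 marks)
ii) Show that $\hat{\beta}$ is an unbiased estimator of $\beta$ and obtain its variance-covariance
matrix. (3 marks)
iii) Show that the estimator of $\sigma^2$ given by $\hat{\sigma}^2 = (Y - X\hat{\beta})'(Y - X\hat{\beta})/(n - p)$ is
unbiased. (3 marks)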
c) A simple linear regression model is given by the following equation
Such that
i) Compute the least squares estimate of $\beta$. (2 marks)
ii) Suppose that $\tilde{\beta} = \bar{y}/\bar{x}$ is another estimator of $\beta$. Show that $\tilde{\beta}$ is an unbiased
estimator of $\beta$. (3 marks)
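
For part b) of Question One, the least squares algebra can be outlined as below. This is a sketch under the usual assumptions, which are stated here rather than taken from the paper: $Y$ is the response vector, $X$ a non-stochastic design matrix of full column rank, $\beta$ the coefficient vector, and $\varepsilon$ an error vector with $E(\varepsilon) = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2 I$.

```latex
\begin{align*}
S(\beta) &= (Y - X\beta)'(Y - X\beta)
          = Y'Y - 2\beta'X'Y + \beta'X'X\beta, \\
\frac{\partial S}{\partial \beta} &= -2X'Y + 2X'X\beta = 0
  \quad\Rightarrow\quad X'X\hat{\beta} = X'Y
  \quad\Rightarrow\quad \hat{\beta} = (X'X)^{-1}X'Y, \\
\hat{\beta} &= (X'X)^{-1}X'(X\beta + \varepsilon)
             = \beta + (X'X)^{-1}X'\varepsilon, \\
E(\hat{\beta}) &= \beta + (X'X)^{-1}X'E(\varepsilon) = \beta, \\
\operatorname{Var}(\hat{\beta})
  &= (X'X)^{-1}X'\,(\sigma^2 I)\,X(X'X)^{-1}
   = \sigma^2 (X'X)^{-1}.
\end{align*}
```

The second-order condition holds because $X'X$ is positive definite when $X$ has full column rank, so the stationary point is a minimum.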
SECTION B
QUESTION TWO {15 MARKS}
The Norwegian government wants to know whether restricting the opening hours of liquor
stores reduces alcohol consumption. Holger, an employee of Statistics Norway, is asked to
investigate this research question. He uses panel data for n = 60 municipalities observed in
T = 10 time periods. The data set contains information on per capita alcohol consumption
(in liters per year) in municipality i in year t ($alcohol_{it}$) and on the number of hours that
liquor stores were open during year t in municipality i ($hours_{it}$). Holger estimates
by OLS and obtains the following estimation results.
a) Test the null hypothesis that the slope coefficient $\beta_1 = 0$ at the 1% significance level. (3 marks)
b) Use the above estimation results to predict the change in alcohol consumption if the
opening hours of liquor stores are reduced by 20 percent. (2 marks)
c) Marit, Holger’s colleague, suggests augmenting the model with municipality fixed effects
Explain how you could estimate model (1) (5 marks)
d) Holger decides to estimate a model that includes both municipality and year fixed effects.
Both Holger and Marit are confident that by including municipality and year fixed
effects the estimated coefficient on $hours_{it}$ cannot suffer from omitted variable bias
problems. Do you agree with Holger and Marit? (5 marks)
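
One common way to estimate a model with municipality fixed effects, as asked in part c), is the within transformation: subtract each municipality's own time average from both variables, then run OLS on the demeaned data. Including a dummy variable for every municipality gives the same slope. The sketch below assumes a single regressor; the function name and the tiny panel are hypothetical and are not Holger's data.

```python
from collections import defaultdict


def within_estimator(units, x, y):
    """Slope from OLS on unit-demeaned data (fixed effects 'within' estimator)."""
    # Accumulate per-unit sums of x and y and the number of observations.
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for unit, xi, yi in zip(units, x, y):
        entry = totals[unit]
        entry[0] += xi
        entry[1] += yi
        entry[2] += 1

    numerator = 0.0
    denominator = 0.0
    for unit, xi, yi in zip(units, x, y):
        sum_x, sum_y, count = totals[unit]
        x_dev = xi - sum_x / count  # deviation from the unit's own mean
        y_dev = yi - sum_y / count
        numerator += x_dev * y_dev
        denominator += x_dev * x_dev

    if denominator == 0:
        raise ValueError("x does not vary within any unit")
    return numerator / denominator


# Hypothetical panel: two municipalities with different intercepts (10 and 2)
# and a common slope of 0.5.
units = ["A", "A", "A", "B", "B", "B"]
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
alcohol = [10.5, 11.0, 11.5, 4.0, 4.5, 5.0]
slope = within_estimator(units, hours, alcohol)
print(slope)
```

In this made-up panel a pooled regression would be distorted by the difference in intercepts, while the within estimator recovers the common slope because demeaning removes anything that is constant within a municipality.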
QUESTION THREE {15 MARKS}
The data in Appendix R.1 record the impact of three advertising media (YouTube,
Facebook and Newspaper) on sales. The values in all four columns are given in millions of
Kenyan shillings (KSh). The advertising experiment was repeated 200 times, hence there are
200 data values in each column, but Appendix R.1 shows only the first 6 observations.
a) Based on the scatter plots in Appendix R.2, comment on the relationship (both
strength and direction) between each of the three media channels and sales. (3 marks)
b) Using the R Output in Appendix R.3 answer the following questions
(i) Write the equation for the fitted model (1 mark)
(ii) Interpret the p-value corresponding to the F-statistic (1 mark)
(iii) Give an interpretation of the estimates and p-values of the intercept and the coefficients
of the three advertising media (4 marks)
(iv) Based on the output, do you think one or more of the independent variables can be
removed? If yes, which one and why? If no, why? (2 marks)
c) What patterns or problems do you see in the diagnostic plots in Appendix R.4? Is the
multiple linear regression a good fit for the marketing dataset? (4 marks)
Appendix R.1 Marketing Data
Appendix R.2 Scatter Plots
Appendix R.3 Model Summary
Appendix R.4 Diagnostic plots
QUESTION FOUR {15 MARKS}
An experiment was designed to assess the impact of two different antibiotics on the chances a
child will be cured of an ear infection after adjusting for age and whether one or both ears were
infected. The variables are “Clear”–whether the infection has been cleared from both ears after
14 days treatment, “Antibiotic”–the treatment type (1 = Ceftriaxone, 0 = Amoxicillin), Age
(categories under two years old, 2-5 years old and 6 years or older), and “NumEars”–the number
of ears infected (either 1 or 2). STATA outputs for the pertinent logistic regression model are
below. There are two versions, logit which gives the raw coefficients and their standard errors
and logistic which gives the odds ratios and their standard errors.
a) Overall, do these variables help explain how likely a child is to have their ear infections
cleared in 14 days? Briefly justify your answer. (2 marks)
b) Do these variables explain a lot of the “variability” in how likely an ear infection is to clear?
Explain briefly. What are the practical implications of this statement for treating ear
infections in small children with antibiotics? (3 marks)
c) Describe what you think would happen if you used backwards stepwise selection to find the
best model for predicting whether a child’s ear-infection would clear. That is, say what
variables would be included in the initial model, what would happen at each step, and what
you think the final model would be, and what you would have to do to verify your answer.
(4 marks)
d) Explain briefly how you could figure out what variable to add first in a forwards stepwise
model selection procedure for this data. (2 marks)
e) Which of the age categories have I used as the reference in this model? (2 marks)
f) Give brief interpretations of the odds ratios for the “Antibiotic” and “TwoToFive” variables
and show how you would compute them from the information given in the first (logit)
printout. (2 marks)
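
For part f), an odds ratio is the exponential of the corresponding logit coefficient, and a confidence interval for the odds ratio comes from exponentiating the endpoints of the interval for the coefficient. A minimal sketch, with hypothetical function names and a made-up coefficient and standard error (not values from the STATA printouts):

```python
import math


def odds_ratio(logit_coef):
    """Odds ratio implied by a raw logit coefficient."""
    return math.exp(logit_coef)


def odds_ratio_ci(logit_coef, std_err, z=1.96):
    """Approximate 95% interval: exponentiate the interval for the coefficient."""
    return (math.exp(logit_coef - z * std_err),
            math.exp(logit_coef + z * std_err))


# Hypothetical values, NOT read from the exam's logit output.
coef, se = 0.5, 0.2
ratio = odds_ratio(coef)
low, high = odds_ratio_ci(coef, se)
print(round(ratio, 3), round(low, 3), round(high, 3))
```

An odds ratio above 1 means the odds of the outcome are higher for that category than for the reference, holding the other variables fixed; a coefficient of 0 corresponds to an odds ratio of exactly 1.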
QUESTION FIVE {15 MARKS}
a) Suppose the relationship between two variables $X_j$ and $Y_j$ can be given by the model
$Y_j = \alpha + \beta X_j + e_j$, $j = 1, 2, \ldots, n$, where $Y_j$ is a dependent variable, $X_j$ is
random and independent of $e_j$, and the $e_j$'s are random error variables. The researcher uses the
model and the least squares method for estimation. Suppose that he observes $X_{n+1}$, so that the forecast
of $Y_{n+1}$ is $\hat{Y}_{n+1} = \hat{\alpha} + \hat{\beta} X_{n+1}$, where the OLS estimators $\hat{\alpha}$ and $\hat{\beta}$ are calculated on the basis of the
observations for $j = 1, 2, \ldots, n$. Find the variance of the forecast error. (7 Marks)
b) Consider a general non-linear model $Y_j = \mu(x_j, \theta) + e_j$, $j = 1, 2, \ldots, n$, where
$\mu(x_j, \theta) = E(Y_j \mid X_j = x_j)$ is the conditional mean at $X_j$, $\theta \in \mathbb{R}^{k+1}$ are unknown
parameters, and $e_j$ is the error term. State and prove the necessary assumptions on the
regression error $e_j$ that enable estimation of $\mu(x_j, \theta)$. (8 Marks)
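
For part a) of Question Five, the textbook result for a simple linear regression with intercept and slope is that, conditional on the observed regressor values, the forecast error has variance sigma^2 * (1 + 1/n + (x_new - x_bar)^2 / S_xx), where S_xx is the sum of squared deviations of the sample x values. The sketch below evaluates that expression; it assumes homoskedastic, uncorrelated errors with known variance sigma^2 and a new error that is uncorrelated with the sample errors. The function name and the data are hypothetical.

```python
def forecast_error_variance(x, x_new, sigma2):
    """Variance of Y_new - Y_hat_new in simple linear regression.

    Assumes (not stated in the paper): errors are homoskedastic and
    uncorrelated with variance sigma2, and the x values are conditioned on.
    """
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variation, so the slope is not identified")
    return sigma2 * (1 + 1 / n + (x_new - x_bar) ** 2 / s_xx)


# Hypothetical sample of regressor values and error variance.
x_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
at_mean = forecast_error_variance(x_sample, x_new=3.0, sigma2=2.0)
away = forecast_error_variance(x_sample, x_new=6.0, sigma2=2.0)
print(at_mean, away)
```

The variance is smallest when the new observation sits at the sample mean of x and grows with the squared distance from it, which is why forecasts far outside the observed range are less reliable.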