Econometrics I Problem Set Overview

Q: Explain how probabilities of hypothesis test rejection change with varying true parameter values under a mixed scenario model.

In a mixed scenario model, the data come from the null distribution in some proportion of cases and from the alternative distribution in the rest, so the probability of rejection depends on the true parameter values and on those proportions. If 90% of cases follow the null (µY = 0) and 10% the alternative (µY = 2), the overall rejection probability is the weighted sum 0.90 × P(reject | µY = 0) + 0.10 × P(reject | µY = 2). The first probability is the size of the test (the Type I error probability) and the second is its power (one minus the Type II error probability), each weighted by how often that scenario occurs.
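As a rough numerical sketch of this weighted sum, the Python snippet below uses the setup from Problem 8 of the problem set (SE(Ȳ) = 1, rejection when the statistic exceeds 1.64). The function and variable names are illustrative assumptions, and the normal CDF is built from the standard library's `math.erfc`.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def rejection_probability(mu, critical=1.64, se=1.0):
    """P(statistic > critical) when the statistic is N(mu / se, 1)."""
    return 1.0 - normal_cdf(critical - mu / se)


# Mixing proportions from Problem 8(c): 90% null, 10% alternative.
share_null, share_alt = 0.90, 0.10

reject_given_null = rejection_probability(0.0)  # size of the test
reject_given_alt = rejection_probability(2.0)   # power at mu = 2

# Weighted sum of the two rejection probabilities.
overall_rejection = share_null * reject_given_null + share_alt * reject_given_alt

# Bayes' rule: probability the data came from the null population, given rejection.
null_given_rejection = share_null * reject_given_null / overall_rejection

print(round(reject_given_null, 4), round(reject_given_alt, 4))
print(round(overall_rejection, 4), round(null_given_rejection, 4))
```

Under these assumptions the overall rejection probability comes out near 0.11, and Bayes' rule puts the probability that a rejection came from the null population at roughly 0.415.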

Q: Under what conditions can non-parametric tests be used in lieu of normality assumptions in wage analysis?

Non-parametric tests such as the Mann-Whitney U test can replace t-tests for analyzing wages when normality cannot be assumed, especially with small samples or skewed distributions. They do not require a specific distributional form and also apply to ordinal measurements. With samples as large as those in this problem set, the central limit theorem also justifies a large-sample test of the difference in means without assuming normality. When normality and equal variances do hold, non-parametric tests typically have less power than their parametric counterparts.

Q: How can hypothesis testing determine if research samples were randomly drawn from populations with specific characteristics?

Hypothesis testing assesses whether observed sample data are consistent with a null hypothesis about population characteristics. The procedure computes a test statistic and compares it with the critical value implied by the chosen significance level; a statistic that falls in the critical region leads to rejecting the null hypothesis. A t-statistic beyond the critical threshold for a hypothesized population mean indicates that the sample is unlikely to have been drawn at random from a population with that mean.

Q: How does correlation between independent random variables affect covariance in hypothesis testing models?

For independently distributed standard normal random variables X and Z, with Y = X² + Z, the covariance splits as Cov(X, Y) = Cov(X, X² + Z) = Cov(X, X²) + Cov(X, Z). Independence gives Cov(X, Z) = 0. For the other term, Cov(X, X²) = E(X³) − E(X)E(X²) = 0, because the odd moments of a standard normal random variable are all 0. Hence Cov(X, Y) = 0 and corr(X, Y) = 0, even though Y clearly depends on X through X². Zero correlation rules out a linear relationship only; it does not imply independence.
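The moment argument can be checked with a short Python sketch. It assumes X and Z are standard normal, as in Problem 5, and uses the known fact that E(X^k) is 0 for odd k and (k − 1)!! for even k; the helper name `std_normal_moment` is a hypothetical one chosen for illustration.

```python
def std_normal_moment(k):
    """E[X^k] for X ~ N(0, 1): 0 for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return 0
    result = 1
    for odd in range(1, k, 2):  # 1 * 3 * ... * (k - 1)
        result *= odd
    return result


# Y = X^2 + Z with X and Z independent standard normals.
mean_x = std_normal_moment(1)
mean_z = std_normal_moment(1)

# E[Y] = E[X^2] + E[Z]
mean_y = std_normal_moment(2) + mean_z

# E[XY] = E[X^3] + E[X] * E[Z], using independence for the cross term.
mean_xy = std_normal_moment(3) + mean_x * mean_z

# Cov(X, Y) = E[XY] - E[X] * E[Y]
cov_xy = mean_xy - mean_x * mean_y

print(mean_y, mean_xy, cov_xy)
```

The covariance is exactly zero here because every term involves an odd moment of X, not because X and Y are independent.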

Q: What are some obstacles to implementing a randomized controlled experiment to study the effect of wearing seat belts on highway traffic deaths?

An ideal randomized controlled experiment to study the effect of wearing seat belts on highway traffic deaths would randomly assign participants to a treatment group that wears seat belts or a control group that does not. Ethical and legal obstacles make such an experiment impractical. Ethically, it would be unacceptable to ask participants not to use seat belts given their known protective benefits, and legally, seat belt use is mandatory in many jurisdictions. Ensuring that participants comply with their assignment and measuring the outcome accurately are further practical impediments.

Q: How does one calculate the unemployment rate using joint probability distribution data?

The unemployment rate is the complement of E(Y), the expected value of the employment indicator Y. Because Y takes only the values 0 and 1, E(Y) = P(Y = 1), the share of the labor force that is employed. Using the marginal distribution of Y from the joint probability table, E(Y) = 0 × 0.12 + 1 × 0.88 = 0.88, so the unemployment rate is 1 − E(Y) = 1 − 0.88 = 0.12, or 12%.
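The same calculation, extended to the conditional expectations asked for in Problem 3, can be sketched in Python. The joint probabilities are copied from the table in the problem set; the dictionary layout and function names are illustrative assumptions.

```python
# Joint probabilities P(X = x, Y = y) from the table in Problem 3.
# X = 1 for college graduates, Y = 1 for employed.
joint = {
    (0, 0): 0.078,
    (0, 1): 0.673,
    (1, 0): 0.042,
    (1, 1): 0.207,
}


def expected_y(table):
    """E(Y): sum of y * P(X = x, Y = y) over all cells."""
    return sum(y * p for (_, y), p in table.items())


def conditional_expected_y(table, x_value):
    """E(Y | X = x_value): restrict to one row and divide by its marginal."""
    p_x = sum(p for (x, _), p in table.items() if x == x_value)
    return sum(y * p for (x, y), p in table.items() if x == x_value) / p_x


e_y = expected_y(joint)
unemployment_rate = 1.0 - e_y
unemployment_college = 1.0 - conditional_expected_y(joint, 1)
unemployment_non_college = 1.0 - conditional_expected_y(joint, 0)

print(round(e_y, 3), round(unemployment_rate, 3))
print(round(unemployment_college, 4), round(unemployment_non_college, 4))
```

The conditional unemployment rates differ (about 16.9% for college graduates and 10.4% for non-college graduates in this table), which is the comparison Problem 3(f) relies on when asking about independence.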

Q: What statistical evidence might suggest gender discrimination in wages at a firm?

Statistical evidence of gender discrimination might include significantly higher average salaries for men than for women after controlling for job descriptions and qualifications, reflected in a small p-value or a confidence interval for the difference that excludes zero. If salaries are normally distributed with equal variances in both groups, a two-sample t-test with a pooled variance determines whether the difference is statistically significant. For this firm, the higher average salary for men may suggest discrimination, but other explanatory variables must be considered before concluding that there is systemic bias.
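A minimal sketch of the pooled-variance t-test, using the summary figures from Problem 7 (men: mean 8200, standard deviation 450, n = 120; women: mean 7900, standard deviation 520, n = 150). The function names are illustrative, and the one-sided p-value uses a standard normal approximation to the t distribution with 268 degrees of freedom, which is an assumption made to stay within the standard library.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t-statistic under the equal-variance assumption."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error


def upper_tail_normal(z):
    """P(Z > z) for a standard normal Z (large-sample approximation)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Summary statistics from the table in Problem 7.
t_stat = pooled_t_statistic(8200, 450, 120, 7900, 520, 150)

# One-sided test of H0: equal means vs H1: men's mean is higher.
p_value = upper_tail_normal(t_stat)

print(round(t_stat, 3), p_value)
```

With these inputs the t-statistic is close to 5, so the difference is statistically significant at any conventional level; whether it reflects discrimination is a separate question, as the paragraph above notes.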

Q: How can an observational panel data set be useful in studying the causal effect of worker training hours on productivity?

An observational panel data set tracks the same workers across multiple time periods, allowing researchers to control for unobserved heterogeneity in the form of individual fixed effects that might confound the relationship between training hours and productivity. By observing changes over time within the same workers, researchers remove the bias from unobserved variables that are constant over time, which strengthens the case for a causal interpretation.

Q: What is the reasoning behind using a confidence interval to determine if test scores differ between Pamplona and San Sebastián?

A confidence interval gives a range of values for the true difference in mean test scores between Pamplona and San Sebastián that, across repeated samples, contains the true difference with the stated probability. If the 95% confidence interval for the difference does not include zero, the observed difference is statistically significant at the 5% level, and one can conclude with a high degree of confidence that the population means differ. The width of the interval reflects the sampling variability inherent in estimating population parameters from samples.
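As an illustration, the interval for Problem 6(b) can be sketched in Python from the reported sample statistics (Pamplona: mean 42, standard deviation 6, n = 150; San Sebastián: mean 48, standard deviation 10, n = 300). The function name is an assumption, and 1.96 is the usual large-sample normal critical value for 95% coverage.

```python
import math


def difference_confidence_interval(mean1, sd1, n1, mean2, sd2, n2, critical=1.96):
    """Large-sample confidence interval for mean2 - mean1 (unequal variances)."""
    difference = mean2 - mean1
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    margin = critical * standard_error
    return difference - margin, difference + margin, standard_error


# Pamplona first, San Sebastián second (figures from Problem 6).
lower, upper, se_diff = difference_confidence_interval(42, 6, 150, 48, 10, 300)

print(round(se_diff, 4), round(lower, 3), round(upper, 3))
```

The resulting interval, roughly 4.5 to 7.5 points, lies well away from zero, which is the basis for concluding that the two population means differ.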

Q: What is the probability of drawing a conclusion error in hypothesis testing with multiple independent tests?

When performing multiple independent hypothesis tests in which the null hypothesis is true, the probability of committing at least one Type I error (false positive) increases with the number of tests. With each test at significance level α = 0.05, the probability of not making a Type I error in a single test is 0.95. For ten independent tests, the probability of not making any Type I error is 0.95^10 ≈ 0.5987. The probability of at least one Type I error is therefore 1 − 0.95^10 ≈ 40.13%, far above the 5% level of each individual test.
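The arithmetic generalizes to any number of independent tests. The short Python sketch below uses a hypothetical helper, `familywise_error_rate`, to evaluate 1 − (1 − α)^m.

```python
def familywise_error_rate(alpha, num_tests):
    """P(at least one Type I error) across independent tests with true nulls."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Ten independent tests at the 5% level, as in Problem 9(e).
rate_ten_tests = familywise_error_rate(0.05, 10)

print(round(rate_ten_tests, 4))
```

For a single test the function returns α itself, and the rate rises toward 1 as the number of tests grows.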

Uploaded by

laura.tello

School of Economics and Business Administration

University of Navarra
Academic year: 2022/23
Econometrics I
Problem Set I: Ch. 1, 2a & 2b

NOTE: Please remember that problem sets do not count toward the final grade, so you do
not need to hand in solutions to this problem set. However, you are strongly encouraged
to try to solve the questions before the practice class.

PROBLEMS

1. Describe a hypothetical ideal randomized controlled experiment to study the
effect of wearing seat belts on highway traffic deaths. Suggest some impediments to
implementing this experiment in practice.

2. You are asked to study the causal effect of hours spent on employee training
(measured in hours per worker per week) in a manufacturing plant on the productivity
of its workers (output per worker per hour). Describe:

(a) An ideal randomized controlled experiment to measure this causal effect.
(b) An observational cross-sectional data set with which you could study this effect.
(c) An observational time series data set for studying this effect.
(d) An observational panel data set for studying this effect.

3. The following table gives the joint probability distribution between employment
status and college graduation among those either employed or looking for work
(unemployed) in the working-age population of South Africa.

Joint probability distribution

                           Unemployed (Y=0)   Employed (Y=1)   Total
Non-college grads (X=0)    0.078              0.673            0.751
College grads (X=1)        0.042              0.207            0.249
Total                      0.12               0.88             1

(a) Compute E(Y).
(b) The unemployment rate is the fraction of the labor force that is unemployed.
Show that the unemployment rate is given by 1 − E(Y).
(c) Calculate E(Y | X = 1) and E(Y | X = 0).
(d) Calculate the unemployment rate for (i) college graduates and (ii) non-college
graduates.

(e) A randomly selected member of this population reports being unemployed.
What is the probability that this worker is a college graduate? And a non-
college graduate?
(f) Are educational achievement and employment status independent? Explain.

4. In any year, the weather can inflict storm damage to a home. From year to year,
the damage is random. Let Y denote the dollar value of damage in any given year.
Suppose that in 95% of the years Y = $0, but in 5% of the years Y = $30,000.

(a) What are the mean and standard deviation of the damage in any year?
(b) Consider an “insurance pool” of 120 people whose homes are sufficiently
dispersed so that, in any year, the damage to different homes can be viewed as
independently distributed random variables. Let Ȳ denote the average damage
to these 120 homes in a year.
i. What is the expected value of the average damage Ȳ ?
ii. What is the probability that Ȳ exceeds $3,000?

5. Let X and Z be two independently distributed standard normal random variables,
and let Y = X² + Z.

(a) Show that E(Y | X) = X².
(b) Show that µY = 1.
(c) Show that E(XY) = 0. (Hint: Use the fact that the odd moments of a standard
normal random variable are all 0.)
(d) Show that Cov(X, Y) = 0 and thus corr(X, Y) = 0.

6. Suppose a new standardized test is given to 150 randomly selected third-grade
students in Pamplona. The sample average score Ȳ₁ on the test is 42 points, and
the sample standard deviation, s_Y1, is 6 points.

(a) The authors plan to administer the test to all third-grade students in Pamplona.
Construct a 99% confidence interval for the mean score of all third graders in
Pamplona.
(b) Suppose the same test is given to 300 randomly selected third graders from
San Sebastián, producing a sample average Ȳ₂ of 48 points and sample standard
deviation s_Y2 of 10 points. Construct a 95% confidence interval for the
difference in mean scores between both cities.
(c) Can you conclude with a high degree of confidence that the population means
for San Sebastián and Pamplona students are different? (Hint: Think about
the standard error of the difference in the two sample means, and also about
the p-value of the test of no difference in means versus some difference).

7. To investigate possible gender discrimination in a Spanish firm, a sample of 120 men
and 150 women with similar job descriptions are selected at random. A summary
of the resulting monthly salaries (in Euro) follows:

         Average salary (Ȳ)   Standard deviation (sY)   n
Men      8200                 450                       120
Women    7900                 520                       150

(a) Let us assume that the monthly salary is normally distributed in both populations
with equal variances. What do these data suggest about wage differences
in the firm? Do they represent statistically significant evidence that average
wages of men are higher than those for women?
(b) Is it possible to perform the analysis without assuming normality? If so, explain
how.
(c) Do these data suggest that the firm is guilty of gender discrimination in its
compensation policies? Explain.

8. Suppose Yi ∼ i.i.d. N(µY, σY²) for i = 1, 2, ..., n. With σY² known, the t-statistic
for testing H0: µY = 0 vs H1: µY > 0 is

\[ t = \frac{\bar{Y} - 0}{SE(\bar{Y})}, \qquad \text{where } SE(\bar{Y}) = \frac{\sigma_Y}{\sqrt{n}}. \]

Suppose σY = 10 and n = 100, so that SE(Ȳ) = 1. Using a test with a size of 5%, the null
hypothesis H0 is rejected if z* > 1.64.

(a) Suppose µY = 0, so the null hypothesis is true. What is the probability that
the null hypothesis is rejected?
(b) Suppose µY = 2, so the alternative hypothesis is true. What is the probability
that the null hypothesis is rejected?
(c) Suppose that in 90% of cases the data are drawn from a population where the
null is true (µY = 0) and in 10% of cases the data come from a population
where the alternative is true and µY = 2. Your data came from either the first
or the second population, but you do not know which.
i. You compute the t-statistic. What is the probability that z* > 1.64, that
is, that you reject the null hypothesis?
ii. Suppose you reject the null hypothesis; that is, z* > 1.64. What is the
probability that the sample data were drawn from the µY = 0 population?

9. Analyse whether the following statements are true or false. If they are true, prove
them and if they are false, justify the reason why.

(a) If X1, X2, X3 are three independent random variables such that X1 ∼ N(1, 2),
X2 ∼ N(1, 1), X3 ∼ N(2, 1), and we denote

\[ Y = \frac{(X_1 - 1)/\sqrt{2}}{\sqrt{\dfrac{(X_2 - 1)^2 + (X_3 - 2)^2}{2}}}, \]

then the probability that Y is below 1.89 is 0.90.
(b) The formula defining the bounds of a confidence interval for parameter θ can
never contain the parameter θ.
(c) In a hypothesis test, if the probability of the type I error is 0.01 and the null
hypothesis is true, then 1% of the times it will not be rejected.
(d) To test the null hypothesis θ = 8 against the alternative hypothesis θ < 8, we
decide to use {θ̂ < 7} as a critical region, where θ̂ is an estimator of θ. Then,
if the true value of θ is 7.5 and for the sample given θ̂ is 7.5, we are making a
Type II error.
(e) Ten researchers separately test the null hypothesis that the mean difference
between two normal populations with known variances is 0, with a significance
level of α = 0.05. The samples used by these ten researchers are independent
from each other. If the null hypothesis is true, then the probability that at
least one of the ten researchers will reject the null hypothesis is greater than
40%.
(f) Given two independent samples from two normal populations with equal variances,
if we reject H0 in the test of equal means with a one-sided alternative
hypothesis and a 0.05 significance level, then we will reject H0 in the test of
equal means with a two-sided alternative hypothesis and a 0.10 significance
level.
