Econometrics I Problem Set Overview
In a mixed scenario model, a proportion of the data arises under the null distribution and the remainder under the alternative, so the probability of rejection depends on the true parameter values and on those proportions. If 90% of the data follow the null (µY = 0) and 10% follow the alternative (µY = 2), the overall rejection probability is the weighted sum 0.90 × Pr(reject | µY = 0) + 0.10 × Pr(reject | µY = 2). The first probability is the size of the test (the type I error rate) and the second is its power (one minus the type II error rate), so each is calculated separately and then weighted by its incidence.
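The weighting can be sketched numerically. The example below is only an illustration: it assumes a two-sided z-test of µY = 0 with known standard deviation, and the values of `sigma`, `n`, and `alpha` are hypothetical, since the summary above does not state them.

```python
from statistics import NormalDist

def rejection_prob(mu_true, sigma=1.0, n=1, alpha=0.05):
    """P(reject H0: mu = 0) for a two-sided z-test when the true mean is mu_true.

    sigma, n and alpha are illustrative assumptions, not values from the problem set.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = mu_true / (sigma / n ** 0.5)  # mean of the z-statistic under mu_true
    # The z-statistic is N(shift, 1); reject when it falls above crit or below -crit.
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)

def mixed_rejection_prob(p_null=0.9, mu_alt=2.0, **kwargs):
    """Weighted sum of the rejection probabilities under the null and the alternative."""
    size = rejection_prob(0.0, **kwargs)      # type I error rate
    power = rejection_prob(mu_alt, **kwargs)  # 1 - type II error rate
    return p_null * size + (1 - p_null) * power

print(round(mixed_rejection_prob(), 4))
```

Under these assumed settings the overall rejection probability lies between the size of the test and its power, pulled toward the size because 90% of the data come from the null.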
Non-parametric tests such as the Mann-Whitney U test can be used instead of t-tests to analyze wages when normality cannot be assumed, especially with small sample sizes or skewed data. These tests do not require the data to follow a particular distribution such as the normal, and they also apply to ordinal measurements. They are not assumption-free, however: read as a test of a shift in location, the Mann-Whitney test assumes the two distributions have the same shape, so unequal variances can distort it. Non-parametric tests may also have less power than their parametric counterparts when the assumptions of normality and equal variance are met.
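As a rough illustration of how the Mann-Whitney U statistic is built, the sketch below counts, over all pairs, how often an observation from one sample exceeds an observation from the other. The function name and the wage figures are hypothetical, and the p-value uses the large-sample normal approximation with no tie or continuity correction, which is crude for samples this small.

```python
from statistics import NormalDist

def mann_whitney_u(x, y):
    """Return (U, approximate two-sided p-value) for samples x and y.

    U counts pairs (xi, yj) with xi > yj; ties count one half.
    The p-value is a normal approximation without tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p_value

# Hypothetical hourly wages for two groups (not data from the problem set)
wages_a = [12.5, 14.0, 15.2, 18.9, 31.0]
wages_b = [10.1, 11.4, 12.0, 13.3, 16.5]
u_stat, p = mann_whitney_u(wages_a, wages_b)
print(u_stat, round(p, 3))
```

For real work an exact small-sample procedure from a statistics library would be preferable to this approximation.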
Hypothesis testing assesses whether observed sample data are consistent with a null hypothesis about a population characteristic. Given a significance level, the test checks whether a test statistic computed from the data falls in the critical (rejection) region implied by that level. If the statistic exceeds the critical value in absolute terms, the null hypothesis is rejected. When samples consistently produce t-statistics beyond the critical threshold for a hypothesized population mean, this indicates that the data are not consistent with the hypothesized mean.
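The comparison of a t-statistic with a critical value can be sketched as follows. The sample values and the hypothesized mean are made up for illustration, and the critical value comes from the large-sample normal approximation, which is an assumption here, not something the problem set specifies.

```python
from statistics import NormalDist, mean, stdev

def t_statistic(sample, mu0):
    """t-statistic for H0: population mean equals mu0."""
    n = len(sample)
    standard_error = stdev(sample) / n ** 0.5  # stdev uses the n - 1 divisor
    return (mean(sample) - mu0) / standard_error

def reject_null(sample, mu0, alpha=0.05):
    """Two-sided test using a normal critical value (large-sample approximation)."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(t_statistic(sample, mu0)) > critical

# Hypothetical sample, not taken from the problem set
sample = [1, 2, 3, 4, 5]
print(t_statistic(sample, mu0=0), reject_null(sample, mu0=0))
```

With a sample this small, a Student t critical value would be the more careful choice; the normal value is used only to keep the sketch within the standard library.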
For independently distributed random variables X and Z, let Y = X² + Z. By linearity, Cov(X, Y) = Cov(X, X² + Z) = Cov(X, X²) + Cov(X, Z), and independence gives Cov(X, Z) = 0. The remaining term is Cov(X, X²) = E(X³) − E(X)E(X²), which is zero whenever X is symmetric around zero, as it is for a standard normal variable. In that case X and Y are uncorrelated even though Y clearly depends on X: zero correlation rules out only a linear relationship, not dependence.
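A small simulation can make the distinction between zero correlation and independence concrete. It assumes X and Z are independent standard normal variables, which is an illustrative choice not stated in the summary above; the helper names are hypothetical.

```python
import random

def sample_cov(a, b):
    """Sample covariance with the n - 1 divisor."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

def simulate(n=100_000, seed=0):
    """Return (Cov(X, Y), Cov(X^2, Y)) estimates for Y = X^2 + Z."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]  # assumed standard normal
    z = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of x
    y = [xi ** 2 + zi for xi, zi in zip(x, z)]
    x_squared = [xi ** 2 for xi in x]
    return sample_cov(x, y), sample_cov(x_squared, y)

cov_xy, cov_x2y = simulate()
print(round(cov_xy, 3), round(cov_x2y, 3))
```

The first estimate should be close to zero, while the second should be close to Var(X²) = 2 under the normality assumption, showing that Y depends on X through X² even though X and Y are uncorrelated.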
A hypothetical ideal randomized controlled experiment to study the effect of wearing seat belts on highway traffic deaths would assign participants randomly to either a treatment group that wears seat belts or a control group that does not. However, ethical and legal obstacles make such an experiment impractical. Ethically, it would be unacceptable to ask participants not to wear seat belts given their known protective benefits; legally, seat belt use is mandated in many jurisdictions. Practical issues such as ensuring participant compliance and accurately measuring the outcome are additional impediments.
The unemployment rate is the complement of the expected value of employment status, E(Y), where Y = 1 for an employed worker and Y = 0 for an unemployed one. First, calculate E(Y) from the marginal distribution of Y implied by the joint probability distribution: E(Y) = 0 × 0.12 + 1 × 0.88 = 0.88. The unemployment rate is therefore 1 − E(Y) = 1 − 0.88 = 0.12, or 12%.
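The same arithmetic can be written as a short computation over the marginal distribution of Y. The dictionary layout and function name are illustrative choices; the probabilities 0.12 and 0.88 are the ones given above.

```python
def expected_value(pmf):
    """Expected value of a discrete random variable given {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())

# Marginal distribution of employment status Y (0 = unemployed, 1 = employed)
employment_pmf = {0: 0.12, 1: 0.88}

employment_rate = expected_value(employment_pmf)
unemployment_rate = 1 - employment_rate
print(round(employment_rate, 2), round(unemployment_rate, 2))
```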
Statistical evidence of gender discrimination might include significantly higher average salaries for men than for women after controlling for job descriptions and qualifications, reflected in a small p-value or in a confidence interval for the salary difference that excludes zero. If salary distributions are normal and variances are equal, a two-sample t-test can determine whether the difference is statistically significant. For this firm, the higher average salary for men may suggest discrimination, but other explanatory variables must be considered before concluding that there is systemic bias.
An observational panel data set can track the same workers across multiple time periods, allowing researchers to control for unobserved heterogeneity and individual fixed effects that might confound the relationship between training hours and productivity. By observing changes over time within the same workers, researchers can make a stronger case for a causal link, because biases from unobserved variables that are constant over time are removed.
A confidence interval gives a range of plausible values for the true difference in mean test scores between Pamplona and San Sebastián. A 95% confidence interval is constructed so that, across repeated samples, 95% of such intervals contain the true difference. If the interval does not include zero, the observed difference is statistically significant at the 5% level and the hypothesis of equal population means is rejected. This approach also accounts for the sampling variability inherent in estimating population parameters from samples.
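A minimal sketch of such an interval follows. It assumes two independent samples and a large-sample normal critical value; the summary statistics are hypothetical and are not the Pamplona and San Sebastián figures from the problem set.

```python
from statistics import NormalDist

def diff_in_means_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Large-sample confidence interval for mean_a - mean_b (independent samples)."""
    standard_error = (sd_a ** 2 / n_a + sd_b ** 2 / n_b) ** 0.5
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean_a - mean_b
    return diff - critical * standard_error, diff + critical * standard_error

# Hypothetical summary statistics for two cities
low, high = diff_in_means_ci(6.2, 1.5, 100, 5.8, 1.4, 120)
significant = not (low <= 0 <= high)  # True when the interval excludes zero
print(round(low, 3), round(high, 3), significant)
```

Checking whether zero lies inside the interval is equivalent to a two-sided test of equal means at the matching significance level.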
When performing multiple independent hypothesis tests, the probability of committing at least one Type I error (false positive) increases. With each test at significance level α = 0.05, the probability of not making a Type I error in a single test is 0.95. For ten independent tests, the probability of not making any Type I error is 0.95^10. Therefore, the probability of at least one Type I error is 1 - 0.95^10, approximately 40.13%, far higher than the 5% level of each individual test.
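The calculation generalizes to any number of independent tests. The function name below is an illustrative choice; the formula is the one stated above.

```python
def prob_at_least_one_type_i_error(alpha, num_tests):
    """P(at least one false rejection) across independent tests when every null is true."""
    return 1 - (1 - alpha) ** num_tests

for m in (1, 5, 10, 20):
    print(m, round(prob_at_least_one_type_i_error(0.05, m), 4))
```

For ten tests at α = 0.05 this gives about 0.4013, matching the figure in the text.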