Understanding Omitted Variable Bias

This document discusses omitted variable bias and the value of experiments. It presents a framework for analyzing bias that can occur when relevant variables are omitted from a linear regression model. Specifically, it examines how the direction of bias depends on whether the omitted variable increases or decreases the outcome and whether it is positively or negatively correlated with the included variable. The document also notes advantages of experiments in estimating causal effects but discusses challenges like high costs and limited generalizability.


Supplementary Notes

Omitted Variables Bias


Economics of Education (ECO383)
November 2012

Outline

This note
1. goes over omitted variables bias (a general framework is presented below);
2. discusses the value of experiments in general terms, and mentions some of their drawbacks (also below).
In turn:

Omitted variables bias

Here, we present a general framework for analyzing the bias due to omitting potentially
relevant variables from a linear regression.
Suppose the true underlying model is given by

$$T_i = \alpha + \beta C_i + \gamma X_i + \varepsilon_i, \tag{1}$$

where $T_i$ measures the test score for student $i$ and $C_i$ is the associated class size.
In practice, let us assume that the X variable is omitted, whatever it may be, and a
sparser model is instead estimated, namely:

$$T_i = a + b C_i + u_i. \tag{2}$$

© Note by Robert McMillan; email: mcmillan@[Link].

A basic requirement for unbiasedness (if you refer to our class discussion) is that the
covariance between C and the model error be zero. If this does not hold, then problems
can arise. If we estimate equation (2), then in general Cov(C, u) ≠ 0. Can you see why, given (1)?
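One way to see this is sketched below. The labels are assumptions made for this sketch: the coefficients of (1) are written as $\alpha$, $\beta$ and $\gamma$ (on C and X respectively), its error as $\varepsilon$, and $\varepsilon$ is taken to be uncorrelated with C. If b in (2) is to stand for the true $\beta$, then the error u has to absorb everything else in (1):

$$
\begin{aligned}
u_i &= (\alpha - a) + \gamma X_i + \varepsilon_i, \\
\mathrm{Cov}(C, u) &= \gamma\,\mathrm{Cov}(C, X) + \mathrm{Cov}(C, \varepsilon) = \gamma\,\mathrm{Cov}(C, X), \\
\operatorname{plim}\, b &= \frac{\mathrm{Cov}(C, T)}{\mathrm{Var}(C)} = \beta + \gamma\,\frac{\mathrm{Cov}(C, X)}{\mathrm{Var}(C)}.
\end{aligned}
$$

So the covariance between C and u is zero only if $\gamma = 0$ or Cov(C, X) = 0, and the sign of the bias is the sign of $\gamma$ times the sign of Cov(C, X), since Var(C) is positive.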
In the face of omitted variables, we will consider the following possible cases:

Uncorrelated (irrelevant) omitted variables


If Cov(C, X) = 0, then C cannot pick up any of the influence of the omitted variable,
and the estimated parameter b will be unbiased for $\beta$, the coefficient on C in (1). In this case, the omitted variable
is irrelevant to the values that C takes on. This is a special case.

Relevant omitted variables


Four cases can be distinguished here, according to whether
1. $\gamma$, the coefficient on X in (1), is less than or greater than zero: in short, does more X raise or lower test scores, T?
2. Cov(C, X) is less than or greater than zero: is more X associated with more or less C on average?
To give an example, suppose X measures parental education. Thus, typically we expect
higher X to be associated with higher T. In turn, this gives rise to two cases depending
on the covariance between C and X. If that covariance is negative (perhaps because
more highly educated parents work to get their children into smaller classes), then we will
have a downward bias in our least squares estimate of b from the restricted model. So
we would expect it to be more negative than the true value $\beta$.
The intuition is as follows: the omitted X variable is advantageous to producing higher
test scores in this case. Due to the negative correlation with C, lower C will typically
be associated with higher X. So if X is omitted (as we assume), then as we reduce C,
the full effect of lower C on T, which is what the estimated coefficient captures, will
reflect not only the direct effect of cutting C but also the indirect effect of higher X. And
recall that higher X is associated with higher T. Thus we will be overstating the impact
of a reduction in class size: the estimated slope would be more negative than the true
slope.
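The parental education case can be checked with a small simulation. What follows is only a sketch under made-up assumptions: the parameter values (a true class size effect of -0.5, an effect of X of 2.0, and a `link` of -1.5 tying class size to X), the noise levels, and the helper names `ols_slope` and `simulate_short_regression` are all hypothetical and chosen for illustration, not taken from any data.

```python
import random


def ols_slope(x, y):
    """Slope from a least-squares regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


def simulate_short_regression(beta=-0.5, gamma=2.0, link=-1.5, n=20000, seed=0):
    """Generate data from the full model, then estimate the model that omits X.

    Hypothetical setup (all numbers invented for illustration):
      X ~ N(0, 1) stands in for parental education,
      C = 25 + link * X + noise is class size (link < 0: more X, smaller classes),
      T = 60 + beta * C + gamma * X + noise is the test score.
    """
    rng = random.Random(seed)
    X = [rng.gauss(0, 1) for _ in range(n)]
    C = [25 + link * x + rng.gauss(0, 2) for x in X]
    T = [60 + beta * c + gamma * x + rng.gauss(0, 5) for c, x in zip(C, X)]

    b_short = ols_slope(C, T)  # slope when X is left out
    # Sample version of gamma * Cov(C, X) / Var(C), the implied bias
    implied_bias = gamma * ols_slope(C, X)
    return b_short, implied_bias


b_short, implied_bias = simulate_short_regression()
print(f"true effect of C: -0.5")
print(f"slope with X omitted: {b_short:.3f}")
print(f"implied bias: {implied_bias:.3f}")
```

Under these assumptions the population bias is 2.0 × (-1.5) / 6.25 = -0.48, so the slope with X omitted should land near -0.98 rather than -0.5: more negative than the true value, as argued above.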
Exercise: Please work out for yourself the bias associated with the remaining three
types of case. Specifically, fill out the full 2-by-2 matrix, listing the direction of bias in
each case. Also make sure you can talk about the intuition in each case. This will be
on the term test and final (in some form or other).

Actually, here is one of them: $\gamma < 0$ (more X lowers test scores) and Cov(C, X) > 0 (an example of X:
the proportion of students receiving free lunch, a measure of disadvantage). Here, we get a downward bias, as C has a direct negative effect,
but higher C also picks up more X, which lowers test scores. So the indirect effect via X
is negative, and will drag the estimated impact of increased C further down. Can you give the intuition
here?
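Once you have worked through the four cases yourself, a few lines of code can be used to check your matrix. This sketch relies on the standard result that the bias in b has the sign of (effect of X on T) × Cov(C, X); the function name `bias_direction` and the +1/-1 encoding of signs are choices made here for illustration.

```python
def bias_direction(effect_of_x_sign, cov_cx_sign):
    """Direction of the bias in b when X is omitted.

    effect_of_x_sign: +1 if more X raises T, -1 if more X lowers T.
    cov_cx_sign:      +1 if Cov(C, X) > 0,   -1 if Cov(C, X) < 0.

    The bias is proportional to the product of the two, because it is
    divided by Var(C), which is always positive.
    """
    if effect_of_x_sign not in (1, -1) or cov_cx_sign not in (1, -1):
        raise ValueError("signs must be +1 or -1")
    return "upward" if effect_of_x_sign * cov_cx_sign > 0 else "downward"


# Print the full 2-by-2 matrix of cases.
for effect_of_x_sign in (1, -1):
    for cov_cx_sign in (1, -1):
        effect = "raises" if effect_of_x_sign > 0 else "lowers"
        cov = "> 0" if cov_cx_sign > 0 else "< 0"
        direction = bias_direction(effect_of_x_sign, cov_cx_sign)
        print(f"X {effect} T, Cov(C, X) {cov}: {direction} bias")
```

The two cases discussed in the text (parental education with a negative covariance, and free lunch with a positive covariance) both come out as downward bias; the code does not supply the intuition, which you still need to be able to give.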

Responses
The most obvious solution is to include all relevant variables. When this is not possible,
perhaps because data on some relevant factor could not be collected, then we need to
think carefully about possible biases. For this purpose, the above framework is useful.

3 Experiments

3.1 Advantages of experiments

i. Randomized experiments are often thought of as the gold standard when we wish
to learn about causal effects empirically. Especially attractive in light of the earlier
discussion, they allow us to get around omitted variables problems.
Take the causal effects of class size reductions as an example. One can assign some
schools (or classes within those schools) to the treatment group, and others to the
control group, then compare performance differences.
In a regression of the form
$$T_i = \alpha + \beta C_i + \varepsilon_i, \tag{3}$$

one can learn about the causal effect of reductions in class size because Cov(C, $\varepsilon$) = 0,
by construction. Further, C should be uncorrelated with any observable variable, at
least if the randomization is done correctly. This is because the level of C is assigned
randomly, unrelated to anything else, observable and unobservable, by design.
ii. Experiments also allow the researcher to vary C in a calculated way, to learn about
different types of causal impact.
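The contrast between observational and randomly assigned class sizes can be illustrated with a short simulation. It is a sketch only: the parameter values, the two class sizes (15 and 25), the unobserved factor X, and the names `ols_slope` and `compare_designs` are hypothetical choices made for this example, not features of any actual experiment.

```python
import random


def ols_slope(x, y):
    """Slope from a least-squares regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


def compare_designs(beta=-0.5, gamma=2.0, n=20000, seed=1):
    """Estimate the class size slope with X omitted, under two designs."""
    rng = random.Random(seed)
    # X is an unobserved factor (say, parental education) that raises scores.
    X = [rng.gauss(0, 1) for _ in range(n)]

    def test_scores(C):
        # Scores depend on class size and on X, which the regression omits.
        return [60 + beta * c + gamma * x + rng.gauss(0, 5) for c, x in zip(C, X)]

    # Observational data: class size is related to X (assumed link of -1.5).
    C_obs = [25 - 1.5 * x + rng.gauss(0, 2) for x in X]
    # Experiment: a coin flip assigns a small (15) or regular (25) class,
    # so class size is unrelated to X by design.
    C_exp = [rng.choice([15, 25]) for _ in range(n)]

    b_obs = ols_slope(C_obs, test_scores(C_obs))
    b_exp = ols_slope(C_exp, test_scores(C_exp))
    return b_obs, b_exp


b_obs, b_exp = compare_designs()
print(f"true effect of C: -0.5")
print(f"observational slope (X omitted): {b_obs:.3f}")
print(f"experimental slope (X omitted): {b_exp:.3f}")
```

In both regressions X is left out, but only the observational slope should be pulled away from -0.5. Under random assignment the omitted factor is still there; it simply no longer covaries with class size.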

3.2 Some problems with experiments

i. They can be very expensive. To keep costs down, often the scale of the experiment is
restricted.

ii. It can be hard to generalize findings from an experimental study to a broader nonexperimental setting. This is usually referred to as a concern about external validity,
and arises for a couple of reasons:
1. Agents being studied can change their behaviour when they know they are being
treated. For example, teachers may think they have to make class size reductions
a success, as this will bring more resources to the teaching profession in future.
Thus special incentives operate that one would not expect in a regular setting.
2. General equilibrium effects: when a policy is enacted on a much larger scale than
most experiments, this may cause changes in other behaviour that lead the direct
effects of the policy to be swamped by unforeseen indirect effects. For example,
after looking at Tennessee's Project STAR (Student Teacher Achievement Ratio),
California implemented a state-wide class size reduction in 1996-97. Once the policy was in place state-wide, there was a sudden demand for the many new teachers
needed to staff smaller classes. Wealthier schools were able to attract better teachers from less fortunate schools, forcing the worst schools to recruit lower-quality
teachers to access the increased funding associated with the program. As Jepsen
and Rivkin (2009) discuss, the positive effects of smaller class sizes on achievement
were often counteracted by the decrease in teacher quality.
