Understanding Omitted Variable Bias

This document discusses omitted variable bias and the value of experiments. It presents a framework for analyzing bias that can occur when relevant variables are omitted from a linear regression model. Specifically, it examines how the direction of bias depends on whether the omitted variable increases or decreases the outcome and whether it is positively or negatively correlated with the included variable. The document also notes advantages of experiments in estimating causal effects but discusses challenges like high costs and limited generalizability.

Supplementary Notes

Omitted Variables Bias

Economics of Education (ECO383)
November 2012

Outline

This note
1. goes over omitted variables bias (a general framework is presented below);
2. discusses the value of experiments in general (and mentions some of their demerits; see below).
In turn:

Omitted variables bias

Here, we present a general framework for analyzing the bias due to omitting potentially
relevant variables from a linear regression.
Suppose the true underlying model is given by

$$T_i = \alpha + \beta C_i + \gamma X_i + \varepsilon_i, \tag{1}$$

where $T_i$ measures the test score for student $i$ and $C_i$ is the associated class size.
In practice, let us assume that the X variable is omitted, whatever it may be, and a
sparser model is instead estimated, namely:

$$T_i = a + b C_i + u_i. \tag{2}$$

Note by Robert McMillan; email: mcmillan@[Link].

A basic requirement for unbiasedness (recall our class discussion) is that the
covariance between C and the model error be zero. If this does not hold, then problems
can arise. If we estimate equation (2), then in general $\mathrm{Cov}(C, u) \neq 0$. Can you
see why, given (1)?
In the face of omitted variables, we will consider the following possible cases:

Uncorrelated (irrelevant) omitted variables

If $\mathrm{Cov}(C, X) = 0$, then C cannot pick up any of the influence of the omitted variable,
and the estimated parameter b will be unbiased for $\beta$, the true coefficient on C in (1).
In this case, the omitted variable is irrelevant to the values that C takes on. This is a
special case.
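
The algebra behind this special case can be sketched as follows. The Greek symbols ($\alpha$, $\beta$, $\gamma$ for the parameters of the true model and $\varepsilon$ for its error) are assumed notation, and the sketch further assumes that $\varepsilon$ is uncorrelated with C. It is the standard omitted-variables result rather than a derivation taken from these notes.

```latex
\begin{align*}
T_i &= \alpha + \beta C_i + \gamma X_i + \varepsilon_i
  && \text{(true model)} \\
u_i &= \gamma X_i + \varepsilon_i
  && \text{(what the short model leaves in its error)} \\
\operatorname{Cov}(C, u) &= \gamma \operatorname{Cov}(C, X)
  && \text{(given } \operatorname{Cov}(C, \varepsilon) = 0\text{)} \\
\operatorname{plim} \hat{b}
  &= \frac{\operatorname{Cov}(C, T)}{\operatorname{Var}(C)}
   = \beta + \gamma \, \frac{\operatorname{Cov}(C, X)}{\operatorname{Var}(C)}
\end{align*}
```

When Cov(C, X) = 0 the second term vanishes and b recovers $\beta$. Otherwise the sign of the bias is the sign of $\gamma \operatorname{Cov}(C, X)$.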

Relevant omitted variables

Four cases can be distinguished here, according to whether
- $\gamma$, the coefficient on X in (1), is less than or greater than zero: in short, does
  more X raise or lower test scores, T?
- $\mathrm{Cov}(C, X)$ is less than or greater than zero: is more X associated with more or
  less C on average?
To give an example, suppose X measures parental education. Typically we expect
higher X to be associated with higher T, so that the coefficient on X, $\gamma$, is positive.
In turn, this gives rise to two cases depending on the covariance between C and X. If that
covariance is negative (perhaps because more highly educated parents work to get their
children into smaller classes), then we will have a downward bias in our least squares
estimate b from the restricted model. So we would expect it to be more negative than the
true value $\beta$.
The intuition is as follows: the omitted X variable is advantageous to producing higher
test scores in this case. Due to the negative correlation with C, lower C will typically
be associated with higher X. So if X is omitted (as we assume), then as we reduce C,
the full effect of lower C on T, which is what the estimated coefficient captures, will
reflect both the direct effect of cutting C and the indirect effect of higher X. And
recall that higher X is associated with higher T. Thus we will be overstating the impact
of a reduction in class size: the estimated slope will be more negative than the true
slope.
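
A small simulation can make the parental-education case concrete. This is a minimal sketch, not part of the original notes: all parameter values are hypothetical, the Greek names stand for the parameters of the true model, and the data are constructed so that X raises T and is negatively correlated with C.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical parameter values, chosen only for illustration.
alpha, beta, gamma = 60.0, -0.5, 2.0   # beta < 0: larger classes lower scores

# X: parental education; C: class size, built so that Cov(C, X) < 0.
X = rng.normal(0.0, 1.0, n)
C = 25.0 - 2.0 * X + rng.normal(0.0, 3.0, n)
T = alpha + beta * C + gamma * X + rng.normal(0.0, 5.0, n)


def ols_slope(y, x):
    """Slope from a least squares regression of y on x and a constant."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


b_short = ols_slope(T, C)  # regression with X omitted
predicted = beta + gamma * np.cov(C, X)[0, 1] / np.var(C, ddof=1)

print(f"true beta                      : {beta:.3f}")
print(f"short-regression b             : {b_short:.3f}")
print(f"beta + gamma * Cov(C,X)/Var(C) : {predicted:.3f}")
```

Under these assumptions the short-regression slope comes out more negative than the true slope, which is the downward bias described above.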
Exercise: Please work out for yourself the bias associated with the remaining three
types of case. Specifically, fill out the full 2-by-2 matrix, listing the direction of bias in
each case. Also make sure you can talk about the intuition in each case. This will be
on the term test and final (in some form or other).

Actually, here is one of them: $\gamma < 0$ and $\mathrm{Cov}(C, X) > 0$ (example of X: the
proportion of students receiving free lunch, a measure of disadvantage). Here, we get a
downward bias: C has a direct negative effect, and higher C also picks up more X, which
lowers test scores. So the indirect effect via X is negative, and it makes the estimated
impact of increased C more negative than the true effect. Can you give the intuition
here?
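
One way to check your answers to the exercise is to use the standard result that the bias in b has the same sign as the product of $\gamma$ (the coefficient on X) and Cov(C, X). The helper below is a hypothetical illustration of that sign rule only. It is not taken from the notes, and it does not replace working through the intuition in each case.

```python
def bias_direction(gamma_sign: int, cov_sign: int) -> str:
    """Direction of the bias in b, given the signs (+1 or -1) of gamma and Cov(C, X)."""
    if gamma_sign not in (-1, 1) or cov_sign not in (-1, 1):
        raise ValueError("signs must be +1 or -1")
    # The bias has the sign of gamma * Cov(C, X).
    return "upward" if gamma_sign * cov_sign > 0 else "downward"


# Print the full 2-by-2 matrix of cases.
for g in (1, -1):
    for c in (1, -1):
        g_op = ">" if g > 0 else "<"
        c_op = ">" if c > 0 else "<"
        print(f"gamma {g_op} 0, Cov(C, X) {c_op} 0: {bias_direction(g, c)} bias")
```

The two cases discussed in the text (parental education and free lunch) both come out as downward bias, in line with the argument above.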

Responses
The most obvious solution is to include all relevant variables. When this is not possible,
perhaps because data on some relevant factor could not be collected, then we need to
think carefully about possible biases. For this purpose, the above framework is useful.

3 Experiments

3.1 Advantages of experiments

i. Randomized experiments are often thought of as the gold standard when we wish
to learn about causal effects empirically. Especially attractive in light of the earlier
discussion, they allow us to get around omitted variables problems.
Take the causal effects of class size reductions as an example. One can assign some
schools (or classes within those schools) to the treatment group, and others to the
control group, then compare performance differences.
In a regression of the form

$$T_i = \alpha + \beta C_i + \varepsilon_i, \tag{3}$$

one can learn about the causal effect of reductions in class size because
$\mathrm{Cov}(C, \varepsilon) = 0$ by construction. Further, C should be uncorrelated with any
observable variable, at least if the randomization is done correctly. This is because the
level of C is assigned randomly, unrelated to anything else, observable and unobservable,
by design.
ii. Experiments also allow the researcher to vary C in a calculated way, to learn about
different types of causal impact.
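
The point about randomization in item i can be illustrated with a simulation. This is a minimal sketch under assumed, hypothetical parameter values: class size is assigned at random to one of two levels, an unobserved X still affects test scores, and the regression of T on C alone is then run with X omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical parameter values, for illustration only.
alpha, beta, gamma = 60.0, -0.5, 2.0

# X is unobserved by the researcher (e.g. parental education).
X = rng.normal(0.0, 1.0, n)

# Random assignment: class size is drawn without reference to X.
C = rng.choice([15.0, 25.0], size=n)
T = alpha + beta * C + gamma * X + rng.normal(0.0, 5.0, n)

# Least squares slope from regressing T on C and a constant, with X omitted.
b_experiment = np.cov(C, T)[0, 1] / np.var(C, ddof=1)
corr_CX = np.corrcoef(C, X)[0, 1]

print(f"true beta            : {beta:.3f}")
print(f"experimental b       : {b_experiment:.3f}")
print(f"sample corr(C, X)    : {corr_CX:.4f}")
```

Because C is assigned independently of X, the sample correlation between them is close to zero and the slope is close to the true value even though X is left out.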

3.2 Some problems with experiments

i. They can be very expensive. To keep costs down, often the scale of the experiment is
restricted.

ii. It can be hard to generalize findings from an experimental study to a broader non-experimental setting. This is usually referred to as a concern about external validity,
and arises for a couple of reasons:
1. Agents being studied can change their behaviour when they know they are being
treated. For example, teachers may think they have to make class size reductions
a success, as this will bring more resources to the teaching profession in future.
Thus special incentives operate that one would not expect in a regular setting.
2. General equilibrium effects: when a policy is enacted on a much larger scale than
most experiments, this may cause changes in other behaviour that lead the direct
effects of the policy to be swamped by unforeseen indirect effects. For example,
after looking at Tennessee's Project STAR (Student Teacher Achievement Ratio),
California implemented a state-wide class size reduction in 1996-97. When implemented on a statewide basis, there was a sudden demand for lots of new teachers
necessary for smaller classes. Wealthier schools were able to attract better teachers from less fortunate schools, forcing the worst schools to recruit lower quality
teachers to access the increased funding associated with the program. As Jepsen
and Rivkin (2009) discuss, the positive effects of smaller class sizes on achievement
were often counteracted by the decrease in teacher quality.

Common questions

What are the advantages and disadvantages of using randomized experiments in educational research?
Randomized experiments are advantageous because they allow researchers to establish causal relationships by ensuring that the treatment is independent of both observed and unobserved variables, eliminating omitted variable bias. This is achieved through the random assignment of participants into treatment and control groups, which equalizes other confounding factors between groups. However, disadvantages include the high cost of conducting large-scale experiments and the challenge of generalizing findings, both because behavior can change when participants know they are being observed (the Hawthorne effect) and because the experiment's conditions may not replicate real-world settings.

How might policymakers mistakenly interpret results from biased education studies if omitted variables are not considered?
Policymakers may misinterpret results from education studies that fail to consider omitted variables by overestimating the effectiveness of certain interventions. For example, if a study finds a strong impact of class size reduction on student performance without accounting for omitted factors like parental education, policymakers might implement widespread class size reductions expecting similar benefits. This could lead to an inefficient allocation of resources and unintended negative consequences, such as lower teacher quality due to increased demand for teachers, without achieving the expected educational improvements.

How can random assignment in experiments help address omitted variable bias, particularly in studies of class size and educational outcomes?
Random assignment helps address omitted variable bias by ensuring that all other variables, observable and unobservable, are evenly distributed on average between treatment and control groups. When class sizes or other educational treatments are assigned randomly, they no longer depend on potentially omitted variables. Consequently, any difference in outcomes between groups can be causally attributed to the treatment itself (e.g., class size reduction) rather than to confounding variables, giving a clearer estimate of the treatment's true effect.

How does the correlation between class size and omitted variables like parental education affect estimation bias in education studies?
The correlation between class size and omitted variables such as parental education crucially affects estimation bias. When parental education is omitted and is negatively correlated with class size, the bias is downward: the estimated class size coefficient is more negative than the true one, so the model overstates the benefit of smaller classes. This occurs because students in smaller classes may disproportionately have more educated parents, who also contribute to higher test scores. The regression incorrectly attributes part of this benefit to small class sizes, skewing policy conclusions towards treating class size reduction as a stronger intervention than it really is.

Why might omitted variable bias occur in the context of estimating the effect of class size on student test scores?
Omitted variable bias may occur when estimating the effect of class size on student test scores if a relevant variable, such as parental education, is not included in the regression model. If this omitted variable is correlated with class size and also affects test scores, the estimated impact of class size will be biased. Specifically, if the omitted variable (e.g., parental education) raises test scores and is negatively correlated with class size, the estimated effect of class size will be biased downward relative to the true effect, because the omitted variable's effects are incorrectly attributed to class size.

In what ways can omitted variable bias impact the conclusions drawn from a regression model involving class size and student performance?
Omitted variable bias can significantly distort the conclusions from a regression model by attributing to the included variables effects that actually arise from the omitted variables. For instance, if a variable like parental education, which positively influences test scores, is omitted and is negatively correlated with class size, the model may overstate the benefits of small class sizes. This arises because the omitted variable's effect is captured by the coefficient of the included variable, leading to the potentially misleading policy implication that smaller class sizes have a greater impact than they truly do.

What role do omitted variables, such as socio-economic factors, play in determining the effectiveness of educational interventions like class size reduction?
Omitted variables, such as socio-economic factors, can strongly influence the perceived effectiveness of educational interventions. If such factors are not included in the analysis, the effectiveness of interventions like class size reduction may be misestimated. For example, parental education or income can contribute to better educational outcomes independently, and if these factors are correlated with class size (e.g., smaller classes in more affluent areas), the benefits of class size reduction will tend to be overstated when they are omitted from the model.

What is the primary reason randomized experiments are considered the 'gold standard' in determining causal relationships in education?
Randomized experiments are considered the 'gold standard' for determining causal relationships in education because random assignment removes confounding. Any observed effect on the outcome can then be attributed to the intervention being tested, such as class size reduction, rather than to extraneous or uncontrolled factors. Randomization balances both known and unknown factors across treatment and control groups, providing a clear, unbiased estimate of the causal effect.

Why is concern about 'external validity' important when interpreting the results of educational experiments?
External validity is crucial in interpreting the results of educational experiments because it concerns the generalizability of findings beyond the experimental context. If the conditions under which an experiment is conducted differ substantially from typical real-world settings, the results may not apply elsewhere. For example, behavior may change due to awareness of being studied, or the scale of the policy may lead to unforeseen adjustments elsewhere, as seen with the recruitment of lower-quality teachers during California's class size reduction. Attention to external validity therefore helps establish whether experimental findings are relevant to broader contexts and populations.

What are some potential general equilibrium effects of implementing class size reduction policies on a large scale, like California's response to Project STAR?
Large-scale implementation of class size reduction policies can lead to general equilibrium effects, such as the redistribution of teaching talent. In California's state-wide class size reduction following Project STAR, wealthier schools attracted better teachers from less affluent schools, resulting in a decline in teacher quality at the latter. This outcome offset the positive effects of smaller class sizes, since the reduced class sizes were accompanied by lower teaching quality, diminishing the intended benefits of the policy.

