Omitted Variable Bias in Regression Analysis

The document discusses omitted variable bias (OVB) in simple linear regression, explaining that bias occurs when an omitted variable is correlated with the regressor and affects the dependent variable. It uses the example of English language ability as an omitted variable that impacts test scores and class size, demonstrating how ignoring such factors can lead to biased estimates. The document concludes with empirical results from a regression analysis that confirms the presence of OVB in the context of test scores and class size.

Uploaded by

Baran Alp Özdemir


Omitted Variable Bias

(SW Section 6.1)

1
Omitted Variable Bias

In Simple (Linear) Regression, we posit the population relation between Y and X as:
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
where u denotes the error term. The error u arises because of other
factors (or variables) that influence Y but are not included in the
regression function.
Fact 1. There are always omitted variables; they hide in u.
Fact 2. Sometimes, the omission of those variables can lead to
bias in the OLS estimator.
Question: When does bias occur? Answer: When the assumption (LS A1) that justifies OLS estimation is violated.

2
Definition: The bias in the OLS estimator that occurs as a result
of an omitted factor, or variable, is called omitted variable bias.

Let Z denote an omitted variable.

Claim: For omitted variable bias (OVB) to occur, the omitted
variable “Z” must satisfy both of the following conditions:
(1) Z is correlated with the regressor X (i.e., Cov(Z, X) ≠ 0), and
(2) Z is a determinant of Y (i.e., Z is part of u).

We will prove this claim. First, let’s return to our primary example and think again.

3
Question: Does reducing class size (student to teacher ratio)
improve student achievement (test scores)?
Many factors have been left out. Consider Z1 = English language ability. We can measure it in various ways. For simplicity we take
$$Z_1 = \begin{cases} 1 & \text{if English is the student's second language,} \\ 0 & \text{otherwise.} \end{cases}$$
(1) Immigrant communities tend to be less affluent. As a result
they have smaller school budgets and higher student-teacher
ratios (STR): Z1 is correlated with X.
(2) English language ability (whether the student learned English
as a second language) is likely to affect test scores: Z1 is a
determinant of Y.
Since both conditions are met, we expect 𝛽̂1 in SR to be biased.
4
For omitted variable bias (OVB) to occur, the omitted variable “Z”
must satisfy both conditions.
Question: Do all omitted factors satisfy both conditions?
• Consider Z2 = Time of day of the test.
Is Z2 correlated with X?
Is Z2 a determinant of Y?

• Consider Z3 = Parking lot space per pupil.

Is Z3 correlated with X?
Is Z3 a determinant of Y?

5
What is the direction of the omitted variable bias?
Common sense suggests an answer. Think about the example of Z1 = English language (in)ability. Common sense says that omitting it leads to an overstatement of the expected negative class size effect (see the next two slides).

What is the magnitude of the omitted variable bias?

Common sense will not help here, but there is a formula.

We will obtain an expression for the asymptotic bias in the OLS estimator of the slope, relying on the algebra we used in SR.
(For an alternative approach, see S&W Appendix 6.1, which builds on their Appendix 4.3.)
6
Test Score and Class Size problem
Instead of the indicator Z (= 1 if the student is an English learner, = 0 otherwise), we have the percentage of students in the district who are English learners (PctEL).
Logic: Districts that have higher PctEL are expected to:
(1) Have larger class sizes, and
(2) Do worse on standardized tests

Claim about the direction of bias: Ignoring the effect of “having many English learners” in the class (omitting PctEL) would result in a larger negative slope (overstatement of the expected negative class size effect).
7
Is this actually going on in the California data?

Verification of 1: Districts with higher PctEL have bigger classes.

(compare the distribution of small vs. large classes for each PctEL)

Verification of 2: Districts with higher PctEL have lower test scores.

(compare the test scores by PctEL within each class size)
8
Verification of the claim about the direction of the bias:
Among districts with comparable PctEL, the effect of class size is
smaller than the overall “test score gap” = 7.4.
In the Simple Regression model, STR gets credit for the negative effect attributable to the presence of English learners in the class.
Thus, ignoring the effect of “having many English learners” in the
class (omitting PctEL) would result in a larger negative slope in
the Simple Regression.

How about the magnitude of the bias?

9
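The OLS estimator of the slope is a linear function of Y:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_i x_i^* y_i^*}{\sum_i (x_i^*)^2} = \frac{\sum_i x_i^* Y_i}{\sum_i (x_i^*)^2} = \sum_i c_i Y_i \qquad (*)$$
where $x_i^* = X_i - \bar{X}$, $y_i^* = Y_i - \bar{Y}$, and $c_i = x_i^* / \sum_i (x_i^*)^2$, a function of X alone. Recall that the $c_i$'s satisfy the following conditions:
$$\sum_i c_i = 0, \qquad \sum_i c_i x_i^* = 1, \qquad \sum_i c_i^2 = \frac{1}{\sum_i (x_i^*)^2}, \qquad \sum_i c_i X_i = 1.$$

Population relation: $Y_i = \beta_0 + \beta_1 X_i + u_i$. Plug this in (*):
$$\hat{\beta}_1 = \sum_i c_i Y_i = \sum_i c_i (\beta_0 + \beta_1 X_i + u_i) = \beta_0 \underbrace{\sum_i c_i}_{0} + \beta_1 \underbrace{\sum_i c_i X_i}_{1} + \underbrace{\sum_i c_i u_i}_{\text{linear function of } u}$$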
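The properties of the weights c_i can be checked numerically. The sketch below is written in Python (standard library only) rather than the Stata used elsewhere in these slides, and runs on made-up data: the sample size, the coefficients 700 and -2, and the noise levels are assumptions chosen only for illustration.

```python
import random

random.seed(0)

# Hypothetical data (not the California data): any X with variation works.
n = 50
X = [random.gauss(20.0, 2.0) for _ in range(n)]
Y = [700.0 - 2.0 * x + random.gauss(0.0, 10.0) for x in X]

x_bar = sum(X) / n
y_bar = sum(Y) / n
x_star = [x - x_bar for x in X]        # deviations x_i* = X_i - X_bar
ssx = sum(d * d for d in x_star)       # sum of squared deviations
c = [d / ssx for d in x_star]          # OLS weights c_i

# The four properties of the weights.
sum_c = sum(c)                                          # should be 0
sum_c_xstar = sum(ci * d for ci, d in zip(c, x_star))   # should be 1
sum_c_sq = sum(ci * ci for ci in c)                     # should be 1 / ssx
sum_c_X = sum(ci * x for ci, x in zip(c, X))            # should be 1

# Slope as a weighted sum of Y versus the usual ratio formula.
slope_weights = sum(ci * y for ci, y in zip(c, Y))
slope_ratio = sum(d * (y - y_bar) for d, y in zip(x_star, Y)) / ssx

print(sum_c, sum_c_xstar, sum_c_sq * ssx, sum_c_X)
print(slope_weights, slope_ratio)
```

Up to floating-point error, the first line prints 0, 1, 1, 1 and the two slopes on the second line agree. This is the sense in which the OLS slope is a linear function of Y with weights that depend on X alone.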

10
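Rewrite as:
$$\hat{\beta}_1 - \beta_1 = \sum_i c_i u_i.$$
Examine the discrepancy further:
$$\sum_i c_i u_i = \sum_i \frac{x_i^* u_i}{\sum_i (x_i^*)^2} = \frac{\sum_i x_i^* u_i}{\sum_i (x_i^*)^2} = \frac{\frac{1}{n} \sum_i x_i^* u_i}{\frac{1}{n} \sum_i (x_i^*)^2}.$$

Now:
$$\frac{1}{n} \sum_i (x_i^*)^2 = \frac{1}{n} \sum_i (X_i - \bar{X})^2 \cong s_X^2,$$
$$\frac{1}{n} \sum_i x_i^* u_i = \frac{1}{n} \sum_i (X_i - \bar{X}) u_i \cong s_{Xu},$$
where

$s_X^2$ = sample variance of X,

$s_{Xu}$ = sample covariance between X and u.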

11
Under LS A2-3: As n → ∞, $s_X^2 \to Var(X)$ and $s_{Xu} \to Cov(X, u)$.

Thus,
$$\hat{\beta}_1 - \beta_1 \xrightarrow{P} \frac{Cov(X, u)}{Var(X)}.$$

LS A1: E(u | X) = 0 implies that Cov(X, u) is zero (because mean-independence implies zero covariance). In this case the OLS estimator $\hat{\beta}_1$ is unbiased and consistent for $\beta_1$.
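The step from mean-independence to zero covariance can be written out with the law of iterated expectations. This short derivation uses only the symbols already defined in these slides:

```latex
\begin{align*}
E(u)  &= E\big[E(u \mid X)\big] = E[0] = 0, \\
E(Xu) &= E\big[X \, E(u \mid X)\big] = E[X \cdot 0] = 0, \\
\mathrm{Cov}(X, u) &= E(Xu) - E(X)\,E(u) = 0 - E(X) \cdot 0 = 0.
\end{align*}
```

The converse does not hold in general: Cov(X, u) = 0 does not by itself imply E(u | X) = 0.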

12
Since “Z is a determinant of Y,” we may augment the population regression equation as:
(Multiple: MR) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + v_i$
where $v_i$ denotes the new error term. Earlier we used simple regression (SR) and took the population regression to be
(Simple: SR) $Y_i = \beta_0 + \beta_1 X_i + u_i$.
The relation between (MR) and (SR) is given by $u_i = \beta_2 Z_i + v_i$.
Suppose E(v | X, Z) = 0. Then Cov(X, v) = 0. In this case the asymptotic bias of the SR slope becomes:
$$\frac{Cov(X, u)}{Var(X)} = \frac{Cov(X, \beta_2 Z + v)}{Var(X)} = \frac{\beta_2 Cov(X, Z) + Cov(X, v)}{Var(X)} = \beta_2 \times \frac{Cov(X, Z)}{Var(X)}.$$
13
Thus, the consequence of using (SR) and running the LS regression of Y on X when (MR) is the correct model is:
$$\hat{\beta}_{1,simple} - \beta_1 \xrightarrow{P} \beta_2 \frac{Cov(X, Z)}{Var(X)} \qquad \text{(bias formula)}$$
where $\hat{\beta}_{1,simple}$ is the LS estimator of $\beta_1$ in the regression of Y on X.

With the help of the bias formula, we can determine both the direction and the magnitude of the “large sample” bias (inconsistency).
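A small simulation can illustrate the large-sample bias. This is a sketch in Python (standard library only) on simulated data, not the California data. The values of β0, β1 and β2, the factor 0.8 linking X to Z, and the sample size are all assumptions chosen for illustration.

```python
import random

random.seed(42)

# Assumed population values (illustrative, not from the slides).
beta0, beta1, beta2 = 700.0, -1.0, -0.5
n = 100_000

X, Y = [], []
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    x = 0.8 * z + random.gauss(0.0, 1.0)   # Cov(X, Z) = 0.8, Var(X) = 1.64
    v = random.gauss(0.0, 1.0)             # E(v | X, Z) = 0 by construction
    X.append(x)
    Y.append(beta0 + beta1 * x + beta2 * z + v)

def ols_slope(x, y):
    """Slope from the simple regression of y on x."""
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    sxx = sum((a - xb) ** 2 for a in x)
    return sxy / sxx

slope_simple = ols_slope(X, Y)             # Z is omitted here
predicted_bias = beta2 * 0.8 / 1.64        # beta2 * Cov(X, Z) / Var(X)
observed_bias = slope_simple - beta1

print(f"predicted bias: {predicted_bias:.4f}")
print(f"observed bias:  {observed_bias:.4f}")
```

With a sample this large, the observed bias should sit close to the predicted value β2 × Cov(X, Z)/Var(X). Setting either β2 or the 0.8 factor to zero removes the bias, which matches the two OVB conditions.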

14
Recapitulation: We argued that two conditions have to be met
for the OLS estimator to suffer from omitted variable bias:
OVB condition 1 → 𝐶𝑜𝑣(𝑋, 𝑍) ≠ 0.
OVB condition 2 → 𝛽2 ≠ 0.

If both are met, then the SR slope $\hat{\beta}_{1,simple}$ estimated by OLS is not consistent for $\beta_1$.

The “true” population slope of X will be estimated from the multiple regression (i.e., the regression of Y on X and Z). The LS estimator of $\beta_1$ (i.e., the slope of X) in that multiple regression will be called $\hat{\beta}_{1,multiple}$.

15
Return to test score-class size problem:
Logic: Districts that have higher PctEL are expected to:
(1) Have larger class sizes (𝐶𝑜𝑣(𝑆𝑇𝑅, 𝑃𝑐𝑡𝐸𝐿) > 0), and
(2) Do worse on standardized tests (𝛽2 < 0).
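Plug these into the bias formula $\beta_2 \times Cov(X, Z)/Var(X)$, with X = STR and Z = PctEL: since Var(STR) > 0, the product is negative, so $\hat{\beta}_{1,simple} - \beta_1 < 0$ in large samples (negative bias).

→ Ignoring the effect of “having many English learners” in the class (omitting PctEL) would result in a larger negative slope (overstatement of the expected negative class size effect).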

16
Magnitude of bias:
We need estimates of: (1) $\gamma = Cov(X, Z)/Var(X)$, and (2) $\beta_2$.
Can we find them using regression? (1) Yes; (2) Yes.

(1) To estimate γ: We regress PctEL on STR.
(Auxiliary) $\widehat{PctEL} = 19.3 + 1.81\, STR$
OVB condition 1 is met: $\hat{\gamma} \neq 0 \leftrightarrow \widehat{Cov}(X, Z) \neq 0$.

(2) To estimate $\beta_2$: We regress TestScore on STR and PctEL!
(Multiple) $\widehat{TestScore} = 686.0 - 1.10\, STR - 0.65\, PctEL$
OVB condition 2 is met: $\hat{\beta}_2 \neq 0$.
17
Empirical implementation:
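We regressed TestScore on STR and got:
(Simple) $\widehat{TestScore} = 698.9 - 2.28\, STR$

Question: What is the bias in the SR slope estimate?

We can use the bias formula, with estimates in place of the population quantities: $\hat{\beta}_{1,simple} - \hat{\beta}_{1,multiple} \approx \hat{\beta}_2 \times \hat{\gamma}$.
Verify that: -2.28 - (-1.10) = -1.18 ≈ -0.65 × 1.81 (the two sides agree up to rounding).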
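The verification is plain arithmetic on the rounded estimates reported in these slides. A quick check in Python (the variable names are ours, chosen for illustration):

```python
# Rounded estimates as reported on the slides.
slope_simple = -2.28     # TestScore on STR
slope_multiple = -1.10   # coefficient on STR, controlling for PctEL
beta2_hat = -0.65        # coefficient on PctEL in the multiple regression
gamma_hat = 1.81         # slope from regressing PctEL on STR

lhs = slope_simple - slope_multiple   # estimated bias of the SR slope
rhs = beta2_hat * gamma_hat           # beta2_hat * gamma_hat

print(f"simple - multiple = {lhs:.4f}")
print(f"beta2 * gamma     = {rhs:.4f}")
print(f"gap from rounding = {abs(lhs - rhs):.4f}")
```

The two sides differ by about 0.0035. That gap is consistent with the inputs having been rounded to two decimals.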

18
Multiple Regression using Stata:

regress testscr str pctel, robust

Regression with robust standard errors                 Number of obs =     420
                                                       F(  2,   417) =  223.82
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4264
                                                       Root MSE      =  14.464
------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -1.101296   .4328472    -2.54   0.011     -1.95213   -.2504616
       pctel |  -.6497768   .0310318   -20.94   0.000     -.710775   -.5887786
       _cons |   686.0322   8.728224    78.60   0.000     668.8754     703.189
------------------------------------------------------------------------------

19
Recapitulation: Omitting variables from a regression might
cause a bias if the omitted variable satisfies the two conditions:
(i) being correlated with other independent variables, and
(ii) being a determinant of the dependent variable.

One solution to OVB is to apply Multiple Regression (MR). Before studying MR in detail, let’s pose and answer the following question.

Question: Are there any other solutions to the OVB issue?

20
Ideal Randomized Controlled Experiment
Ideal: subjects all follow the treatment protocol – perfect
compliance, no errors in reporting, etc.!
Randomized: subjects from the population of interest are
randomly assigned to a treatment or control group (so there are no
confounding factors)
Controlled: having a control group permits measuring the
differential effect of the treatment
Experiment: treatment status is assigned as part of the
experiment. The subjects have no choice, so there is no “reverse
causality” in which subjects choose the treatment they think will
work best.

21
Back to class size:
Imagine an ideal randomized controlled experiment for measuring
the effect on Test Score of reducing STR.
• In that experiment, students would be randomly assigned to
classes, which would have different sizes.
• Because they are randomly assigned, all student characteristics
(and thus ui) would be distributed independently of STRi.
• Thus, E(ui|STRi) = 0. That is, LS A1 holds in a randomized
controlled experiment.
Is this how class size data were collected? No.
Observational data often differ from this ideal! Why? Because
treatment status is not randomly assigned.
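The contrast between observational and randomly assigned data can be sketched with a simulation. The Python code below (standard library only) uses invented numbers: the effects β1 and β2, the 0.8 link between X and Z in the observational case, and the sample size are assumptions for illustration. Z stands in for an omitted student characteristic.

```python
import random

random.seed(7)

beta1, beta2 = -1.0, -0.5   # assumed population effects (illustrative)
n = 100_000

def ols_slope(x, y):
    """Slope from the simple regression of y on x."""
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    sxx = sum((a - xb) ** 2 for a in x)
    return sxy / sxx

Z = [random.gauss(0.0, 1.0) for _ in range(n)]   # omitted characteristic
V = [random.gauss(0.0, 1.0) for _ in range(n)]   # remaining error

# Observational data: X depends on Z, so Cov(X, Z) != 0.
X_obs = [0.8 * z + random.gauss(0.0, 1.0) for z in Z]
# Experiment: X is drawn independently of Z (random assignment).
X_rct = [random.gauss(0.0, 1.0) for _ in range(n)]

Y_obs = [beta1 * x + beta2 * z + v for x, z, v in zip(X_obs, Z, V)]
Y_rct = [beta1 * x + beta2 * z + v for x, z, v in zip(X_rct, Z, V)]

slope_obs = ols_slope(X_obs, Y_obs)   # Z omitted, X correlated with Z
slope_rct = ols_slope(X_rct, Y_rct)   # Z omitted, X independent of Z

print(f"true beta1:           {beta1:.3f}")
print(f"observational slope:  {slope_obs:.3f}")
print(f"randomized slope:     {slope_rct:.3f}")
```

Z is left out of both regressions. Only the observational slope should drift away from β1, because random assignment makes X independent of everything in the error term.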
22
What we know about California school data:
Consider PctEL (percent English learners), which plausibly satisfies the two criteria for omitted variable bias.
Thus, the “control” and “treatment” groups differ in a systematic way, so corr(STR, PctEL) ≠ 0.

Idea: (based on our examination of Table 6.1) We can eliminate the influence of systematic differences in PctEL between the large (control) and small (treatment) groups by examining the effect of class size among districts with the same PctEL.

23
How this idea helps us:
If the only systematic difference between the large and small class
size groups is in PctEL, then we would have something similar to
the randomized controlled experiment: within each PctEL group,
assignment to treatment would be random.
This is what Multiple Regression (MR) achieves! Consider
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + v_i.$$
Provided E(v | X, Z) = 0,
$$E(Y \mid X, Z) = \beta_0 + \beta_1 X + \beta_2 Z.$$
Math: $\beta_1 = \partial E(Y \mid X, Z) / \partial X$ is the change in E(Y | X, Z) per unit change in X when Z is held constant. We often say “$\beta_1$ is the expected change in Y per unit change in X” when Z is held constant.
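To see what “holding Z constant” buys, the sketch below fits the two-regressor model by its closed-form OLS solution. It is written in Python (standard library only) and uses simulated data. The population values, the 0.8 link between X and Z, and the helper name `cross` are assumptions made for illustration.

```python
import random

random.seed(1)

beta0, beta1, beta2 = 700.0, -1.0, -0.5   # assumed values (illustrative)
n = 50_000

Z = [random.gauss(0.0, 1.0) for _ in range(n)]
X = [0.8 * z + random.gauss(0.0, 1.0) for z in Z]
Y = [beta0 + beta1 * x + beta2 * z + random.gauss(0.0, 1.0)
     for x, z in zip(X, Z)]

def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

sxx, szz, sxz = cross(X, X), cross(Z, Z), cross(X, Z)
sxy, szy = cross(X, Y), cross(Z, Y)

# Closed-form OLS for two regressors (normal equations in deviations).
det = sxx * szz - sxz ** 2
b1_multiple = (sxy * szz - szy * sxz) / det   # slope on X, Z held constant
b2_multiple = (szy * sxx - sxy * sxz) / det   # slope on Z, X held constant

b1_simple = sxy / sxx    # slope on X when Z is omitted
gamma_hat = sxz / sxx    # slope from regressing Z on X

print(f"multiple: b1 = {b1_multiple:.3f}, b2 = {b2_multiple:.3f}")
print(f"simple:   b1 = {b1_simple:.3f}")
print(f"b1_multiple + b2_multiple * gamma_hat = "
      f"{b1_multiple + b2_multiple * gamma_hat:.3f}")
```

The multiple-regression slope on X should land near the assumed β1, while the simple slope does not. The last line reproduces the simple slope from the multiple-regression estimates. For OLS this decomposition holds exactly in sample, not only in the limit.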
24
Summary: Two ways to overcome omitted variable bias:
1. Run a randomized controlled experiment in which treatment
(STR) is randomly assigned. In this case PctEL is still a
determinant of TestScore, but PctEL is uncorrelated with STR.
(This solution to OV bias is rarely feasible.)
2. Use a regression in which the omitted variable (PctEL) is no
longer omitted: include PctEL as an additional regressor in a
multiple regression.
