ECO 345 - Applied Econometrics II
Chapter 9
Muhammad Salman Khalid
School of Economics & Social Sciences
March 24, 2026
Salman (IBA) Chapter 9 1 / 30
Functional Form Misspecification
So far we have assumed that our model is correctly specified - the
functional form is correct.
What if the true model is nonlinear in some variables, but we
estimated a linear model?
For example, the true model is
wage = β0 + β1 educ + β2 exper + β3 exper^2 + µ
But we estimated:
wage = β0 + β1 educ + β2 exper + µ
This is a case of functional form misspecification - a type of
omitted variable bias.
The omitted variable is exper^2, which is correlated with exper -
leading to biased and inconsistent estimates.
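A minimal simulation sketch of this bias. The data-generating process, coefficient values, and variable ranges below are all made up for illustration and are not taken from the course data. With exper drawn uniformly on [0, 40], the omitted-variable formula suggests the coefficient on exper in the linear model should settle near 0.30 − 0.006 × 40 = 0.06 rather than 0.30.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process (all numbers are assumptions).
educ = rng.uniform(8, 18, n)
exper = rng.uniform(0, 40, n)
u = rng.normal(0, 1, n)
wage = 1.0 + 0.6 * educ + 0.30 * exper - 0.006 * exper**2 + u


def ols(y, *regressors):
    """Return OLS coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Correctly specified model: includes exper^2.
b_true_form = ols(wage, educ, exper, exper**2)
# Misspecified model: exper^2 is omitted.
b_linear = ols(wage, educ, exper)

print("coef on exper, quadratic model:", round(b_true_form[2], 3))
print("coef on exper, linear model:   ", round(b_linear[2], 3))
```

In this sketch educ is generated independently of exper, so only the exper coefficient is distorted; with correlated regressors the bias would typically spread to educ as well.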
Consequences of Functional Form Misspecification
The OLS estimators will be biased and inconsistent.
The standard errors, t-statistics and F-statistics will all be invalid.
The R^2 and adjusted R^2 will be misleading.
Therefore, it is important to test whether the functional form of the
model is correctly specified.
We have already learned one way to check for this - adding squared
and interaction terms and testing their joint significance.
However, there is a more systematic test for functional form
misspecification - the RESET Test.
RESET Test (Regression Specification Error Test)
The RESET test was proposed by Ramsey (1969).
The idea is that if the model is misspecified, then nonlinear functions
of the fitted values ŷ should be significant when added to the model.
Why ŷ? Because ŷ is a function of all the independent variables, so
ŷ^2 and ŷ^3 capture various nonlinear combinations of the independent
variables.
The RESET test adds ŷ^2 and ŷ^3 to the original model and tests their
joint significance.
Steps to Conduct RESET Test
1 Run the original model and obtain the fitted values ŷ.
y = β0 + β1 x1 + β2 x2 + .... + βk xk + µ
2 Run the expanded model including ŷ^2 and ŷ^3:
y = β0 + β1 x1 + .... + βk xk + δ1 ŷ^2 + δ2 ŷ^3 + γ
3 Test the joint significance of ŷ^2 and ŷ^3:
H0: δ1 = δ2 = 0    HA: Not(H0)
4 Use F-test with q = 2 restrictions:
\[ F = \frac{(R^2_{new} - R^2_{old})/2}{(1 - R^2_{new})/(n - k - 3)} \sim F_{2,\, n-k-3} \]
5 If we reject H0 , the functional form is misspecified.
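The steps above can be sketched in a few lines of NumPy. This is an illustrative implementation under assumptions, not the course's reference code: the helper names `ols_fit` and `reset_f` are hypothetical, and the simulated data are made up. The F statistic uses the R^2 form with n − k − 3 denominator degrees of freedom.

```python
import numpy as np


def ols_fit(y, X):
    """OLS with an intercept; returns fitted values and R-squared."""
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    fitted = Xc @ coef
    ssr = np.sum((y - fitted) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return fitted, 1.0 - ssr / sst


def reset_f(y, X):
    """RESET F statistic using yhat^2 and yhat^3 (q = 2 restrictions)."""
    n, k = X.shape
    yhat, r2_old = ols_fit(y, X)                      # step 1
    X_new = np.column_stack([X, yhat**2, yhat**3])    # step 2
    _, r2_new = ols_fit(y, X_new)
    df = n - k - 3
    f_stat = ((r2_new - r2_old) / 2) / ((1 - r2_new) / df)  # step 4
    return f_stat, df


# Made-up data: one correctly specified model, one with an omitted square.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))
u = rng.normal(size=n)
y_ok = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + u
y_wrong = y_ok + 0.4 * X[:, 0] ** 2

f_ok, df = reset_f(y_ok, X)
f_wrong, _ = reset_f(y_wrong, X)
print("F, linear truth:   ", round(f_ok, 2), "with df", (2, df))
print("F, quadratic truth:", round(f_wrong, 2))
```

For large denominator degrees of freedom the 5% critical value of an F distribution with 2 numerator degrees of freedom is about 3.00, so the second statistic should fall far in the rejection region while the first usually will not.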
Limitations of RESET Test
The RESET test tells us whether the model is misspecified but NOT
how it is misspecified.
If we reject H0 , we know something is wrong with the functional
form, but we do not know which variable needs transformation.
We must use our economic intuition and knowledge of the data to
determine the correct specification.
Additionally, the RESET test can sometimes reject the null even when
the functional form is correct (e.g., due to heteroskedasticity).
It is best to use RESET test in combination with other diagnostic
tools.
Testing Against Non-Nested Alternatives
Sometimes we have two competing models that are non-nested -
neither is a special case of the other.
For example:
Model 1 : y = β0 + β1 x1 + β2 x2 + µ
Model 2 : y = β0 + β1 log (x1 ) + β2 log (x2 ) + µ
We cannot use the standard F-test because neither model is a
restricted version of the other.
One approach is to create a comprehensive model that includes
both sets of variables:
y = β0 + β1 x1 + β2 x2 + β3 log (x1 ) + β4 log (x2 ) + µ
Then test H0: β3 = β4 = 0; failing to reject supports Model 1.
Likewise, test H0: β1 = β2 = 0; failing to reject supports Model 2.
If both are rejected or neither is rejected, the test is inconclusive.
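A rough sketch of the comprehensive-model approach on simulated data. The helper names `ssr` and `exclusion_f` are hypothetical, and the data are generated from the log model purely for illustration. The F statistic is written in its sum-of-squared-residuals form, which is equivalent to the R^2 form when both regressions share the same dependent variable.

```python
import numpy as np


def ssr(y, X):
    """Sum of squared residuals and parameter count for OLS with intercept."""
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return np.sum((y - Xc @ coef) ** 2), Xc.shape[1]


def exclusion_f(y, X_restricted, X_full):
    """F statistic for dropping the extra columns of X_full."""
    ssr_r, p_r = ssr(y, X_restricted)
    ssr_f, p_f = ssr(y, X_full)
    q = p_f - p_r
    return ((ssr_r - ssr_f) / q) / (ssr_f / (len(y) - p_f))


# Made-up data in which Model 2 (logs) is the true specification.
rng = np.random.default_rng(7)
n = 5000
levels = rng.uniform(1, 10, size=(n, 2))      # x1, x2
logs = np.log(levels)                         # log(x1), log(x2)
y = 1.0 + 2.0 * logs[:, 0] + 1.5 * logs[:, 1] + rng.normal(size=n)

full = np.column_stack([levels, logs])        # comprehensive model
f_drop_logs = exclusion_f(y, levels, full)    # H0: beta3 = beta4 = 0
f_drop_levels = exclusion_f(y, logs, full)    # H0: beta1 = beta2 = 0
print("F for excluding logs:  ", round(f_drop_logs, 2))
print("F for excluding levels:", round(f_drop_levels, 2))
```

Here the first hypothesis should be rejected decisively, while the second statistic is typically small, which points toward Model 2.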
Using Proxy Variables for Unobserved Explanatory
Variables
The Problem of Omitted Variables
One of the most serious problems in econometrics is the omission of a
relevant variable that is correlated with the included variables.
For example, consider the wage equation:
wage = β0 + β1 educ + β2 exper + β3 ability + µ
We cannot observe ability directly.
If we omit ability, and it is correlated with educ, then β̂1 will be
biased.
One solution is to use a proxy variable for the unobserved variable.
What is a Proxy Variable?
A proxy variable is an observable variable that is related to the
unobservable variable we want to control for.
For ability, common proxies include IQ score, standardized test scores,
or GPA.
Let x3∗ be the unobserved variable and x3 be the proxy.
For x3 to be a good proxy for x3∗ , we need:
x3∗ = δ0 + δ1 x3 + v
where E(v | x1, x2, x3) = 0.
This means that after controlling for x3 , the unobserved variable x3∗
should be uncorrelated with x1 and x2 .
Using Proxy Variables in Practice
When we include the proxy variable in our model:
y = α0 + β1 x1 + β2 x2 + α3 x3 + γ
where α0 = β0 + β3 δ0, α3 = β3 δ1 and γ = µ + β3 v.
The coefficient α3 on the proxy does not have a direct interpretation
as the effect of x3∗: it equals β3 δ1, not β3.
However, the OLS estimators of β1 and β2 will be consistent for the
parameters of interest.
The key benefit of using a proxy is to reduce the omitted variable bias
on the coefficients of the other explanatory variables.
It is important to note that a bad proxy (weakly related to x3∗ ) can
make things worse rather than better.
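A small simulation sketch of the proxy idea. Everything here is an assumption made for illustration: the variable names, the coefficients, and in particular the way ability is generated so that the proxy condition holds by construction (ability depends on IQ plus noise that is independent of educ and IQ).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical DGP built to satisfy the proxy condition:
# ability = delta0 + delta1 * iq + v, with v independent of educ and iq.
iq = rng.normal(100, 15, n)
v = rng.normal(0, 0.5, n)
ability = (iq - 100) / 15 + v                 # delta1 = 1/15
educ = 12 + 0.1 * (iq - 100) + rng.normal(0, 1.5, n)
wage = 1.0 + 0.5 * educ + 1.0 * ability + rng.normal(0, 1, n)


def ols(y, *regressors):
    """Return OLS coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_omit = ols(wage, educ)          # ability omitted, no proxy
b_proxy = ols(wage, educ, iq)     # IQ used as a proxy for ability

print("educ coef, ability omitted:", round(b_omit[1], 3))
print("educ coef, IQ as proxy:    ", round(b_proxy[1], 3))
print("coef on IQ:                ", round(b_proxy[2], 4))
```

Under these assumptions the educ coefficient moves from roughly 0.83 back toward the true 0.5 once IQ is included, and the coefficient on IQ is close to 1.0 × 1/15, the product of the ability effect and δ1, rather than the ability effect itself.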
Lagged Dependent Variable as Proxy
A very useful proxy strategy is to include the lagged dependent
variable (y_{t−1}) as an explanatory variable.
For example:
crime_t = β0 + β1 unem_t + β2 crime_{t−1} + µ_t
Why is this useful?
crime_{t−1} captures all the unobserved factors from the past that affect
crime in the current period.
It serves as a proxy for historical and institutional factors that are
difficult to measure.
However, using a lagged dependent variable requires that crime_{t−1} is
uncorrelated with µ_t (no serial correlation in errors).
Go over Example 9.3 in the book.
Properties of OLS Under Measurement Error
Measurement Error
In practice, the data we use may not accurately measure the true
values of the variables.
The difference between the observed value and the true value is called
measurement error.
We will consider two cases:
1 Measurement error in the dependent variable (y).
2 Measurement error in an explanatory variable (x).
The consequences of measurement error depend critically on which
variable is measured with error.
Measurement Error in the Dependent Variable
Let y∗ be the true value and y be the observed value.
The measurement error is defined as:
e0 = y − y∗
The true model is:
y∗ = β0 + β1 x1 + ..... + βk xk + µ
Since y = y∗ + e0, the estimated model becomes:
y = β0 + β1 x1 + ..... + βk xk + (µ + e0)
The new error term is µ + e0 .
If e0 is uncorrelated with the explanatory variables (x1 , x2 , ...., xk ):
1 The OLS estimators remain unbiased and consistent.
2 However, the variance of the error term increases:
Var (µ + e0 ) > Var (µ).
3 This leads to larger standard errors and less precise estimates.
If e0 is correlated with some xj , then the OLS estimators will be
biased.
In most practical cases, we assume e0 is uncorrelated with the
explanatory variables.
Therefore, measurement error in y is generally less problematic than
measurement error in x.
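A brief simulation sketch of the first case, with made-up numbers. The measurement error e0 is drawn independently of x, so the slope estimate should stay near its true value while its standard error grows. The helper `ols_with_se` is a hypothetical name and uses the usual homoskedastic variance formula.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

x = rng.normal(size=n)
y_star = 1.0 + 2.0 * x + rng.normal(size=n)   # true dependent variable
e0 = rng.normal(0, 2, n)                      # measurement error, independent of x
y_obs = y_star + e0                           # what we actually observe


def ols_with_se(y, x):
    """OLS coefficients and conventional standard errors (intercept first)."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se


b_true, se_true = ols_with_se(y_star, x)
b_obs, se_obs = ols_with_se(y_obs, x)

print("slope using y*:", round(b_true[1], 3), "se:", round(se_true[1], 4))
print("slope using y: ", round(b_obs[1], 3), "se:", round(se_obs[1], 4))
```

With Var(µ) = 1 and Var(e0) = 4 in this sketch, the composite error variance is 5, so the standard error should be a little more than twice as large when y is mismeasured.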
Measurement Error in an Explanatory Variable
This case is much more serious.
Let x1∗ be the true value and x1 be the observed value.
The measurement error is:
e1 = x1 − x1∗
The true model is:
y = β0 + β1 x1∗ + µ
Substituting x1∗ = x1 − e1 :
y = β0 + β1 x1 + (µ − β1 e1 )
The new error term is µ − β1 e1 .
Classical Errors-in-Variables (CEV) Assumption
Under the classical errors-in-variables (CEV) assumption, the
measurement error is uncorrelated with the true (unobserved) value:
Cov(x1∗, e1) = 0
Note that CEV says Cov(x1∗, e1) = 0, not Cov(x1, e1) = 0.
Since x1 = x1∗ + e1:
Cov(x1, e1) = Cov(x1∗ + e1, e1) = Var(e1) > 0
So, with µ uncorrelated with x1∗ and e1:
Cov(x1, µ − β1 e1) = −β1 Cov(x1, e1) = −β1 Var(e1) ≠ 0 (when β1 ≠ 0)
Therefore, x1 is correlated with the composite error - OLS is biased
and inconsistent.
Attenuation Bias
Under the CEV assumption, it can be shown that:
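\[ \operatorname{plim}(\hat{\beta}_1) = \beta_1 \cdot \frac{\sigma^2_{x_1^*}}{\sigma^2_{x_1^*} + \sigma^2_{e_1}} \]
Since \( \sigma^2_{x_1^*} / (\sigma^2_{x_1^*} + \sigma^2_{e_1}) < 1 \), the OLS estimate is biased toward zero.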
This is known as attenuation bias or bias toward zero.
The larger the measurement error variance (σ^2_{e1}), the greater the
attenuation bias.
This means that measurement error in an explanatory variable makes
it harder to find a significant effect - the estimates are understated
in magnitude.
Go over Example 9.4 in the book.
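The attenuation factor can be checked with a quick simulation. This is a sketch with made-up parameter values: Var(x1∗) = 4 and Var(e1) = 1, so under CEV the formula predicts a probability limit of β1 × 4/(4 + 1) = 0.8 β1.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
beta1 = 1.5

x_star = rng.normal(0, 2, n)       # true regressor, variance 4
e1 = rng.normal(0, 1, n)           # classical measurement error, variance 1
x_obs = x_star + e1                # observed regressor
y = 0.5 + beta1 * x_star + rng.normal(size=n)


def slope(y, x):
    """Simple-regression OLS slope: sample Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


slope_star = slope(y, x_star)      # infeasible regression on the true x
slope_obs = slope(y, x_obs)        # feasible regression on the mismeasured x
predicted = beta1 * 4 / (4 + 1)    # attenuation formula

print("slope with true x:      ", round(slope_star, 3))
print("slope with mismeasured x:", round(slope_obs, 3))
print("predicted by the formula:", predicted)
```

Increasing the standard deviation of `e1` in this sketch shrinks the estimated slope further toward zero, in line with the formula.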
Measurement Error in Multiple Regression
In a multiple regression, measurement error in x1 can affect the
estimates of all coefficients, not just β1 .
However, if the mismeasured variable is uncorrelated with the other
explanatory variables, then only β̂1 is affected.
In practice, reducing measurement error through better data
collection is the best solution.
Another solution is using Instrumental Variables (IV) estimation,
which we will cover later in the course.
Missing Data, Nonrandom Samples, and Outliers
Missing Data
In practice, datasets often have missing observations for some
variables.
If data is missing completely at random (MCAR), then dropping
the observations with missing values does not bias OLS estimates.
We simply have a smaller sample size and therefore less precise
estimates.
However, if data is not missing at random, then dropping
observations can lead to bias.
For example, if high-income individuals are less likely to report their
income, then the sample is not random with respect to income.
In such cases, special techniques like imputation or sample selection
corrections (Heckman correction) are needed.
Nonrandom Samples and Sample Selection Bias
Sample selection bias occurs when the sample is not representative
of the population.
Types of nonrandom sampling:
1 Exogenous sample selection: Selection based on the independent
variable.
Example: Sampling only college graduates to study wage determinants.
OLS is still unbiased but less efficient.
2 Endogenous sample selection: Selection based on the dependent
variable.
Example: Studying wage determinants using only employed individuals.
OLS can be biased because the selection is related to the outcome.
Endogenous sample selection is much more problematic and requires
correction methods.
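The contrast between the two kinds of selection can be illustrated with a simulation. This is a sketch with an assumed linear model and arbitrary cutoffs: selecting on x keeps the slope near its true value, while selecting on y pulls it toward zero.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # true slope is 2


def slope(y, x):
    """Simple-regression OLS slope: sample Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


keep_exog = x > 0      # exogenous selection: rule depends only on x
keep_endog = y > 1     # endogenous selection: rule depends on y

b_full = slope(y, x)
b_exog = slope(y[keep_exog], x[keep_exog])
b_endog = slope(y[keep_endog], x[keep_endog])

print("full sample:        ", round(b_full, 3))
print("selected on x > 0:  ", round(b_exog, 3))
print("selected on y > 1:  ", round(b_endog, 3))
```

In the y-selected sample, observations with low x survive only if their error term is large, which induces a negative correlation between x and the error and flattens the estimated slope.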
Outliers and Influential Observations
An outlier is an observation that is far from the rest of the data.
OLS estimates can be very sensitive to outliers because OLS
minimizes the sum of squared residuals - squaring gives
disproportionate weight to large residuals.
How to detect outliers?
1 Scatter plots of y against each x.
2 Examining the residuals - observations with very large residuals (in
absolute value) may be outliers.
3 Studentized residuals: If |e_i^stud| > 3, observation i may be an outlier.
It is important to understand why an observation is an outlier before
removing it.
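One way to compute studentized residuals is sketched below. It assumes the leave-one-out (externally studentized) definition, which equals the t statistic on a dummy variable for observation i; the function name `studentized_residuals` and the planted outlier at index 17 are hypothetical.

```python
import numpy as np


def studentized_residuals(y, X):
    """Externally studentized residuals for OLS with an intercept."""
    Xc = np.column_stack([np.ones(len(y)), X])
    n, p = Xc.shape
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ coef
    # Leverage h_ii: diagonal of the hat matrix X (X'X)^{-1} X'.
    xtx_inv = np.linalg.inv(Xc.T @ Xc)
    h = np.einsum("ij,jk,ik->i", Xc, xtx_inv, Xc)
    # Error variance estimated with observation i left out.
    ssr = resid @ resid
    s2_loo = (ssr - resid**2 / (1 - h)) / (n - p - 1)
    return resid / np.sqrt(s2_loo * (1 - h))


# Made-up data with one planted outlier (e.g. a data entry error).
rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[17] += 15.0

t = studentized_residuals(y, x)
flagged = np.flatnonzero(np.abs(t) > 3)
print("observations with |studentized residual| > 3:", flagged)
```

The flagged indices are candidates for closer inspection, not automatic deletions.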
Dealing with Outliers
Should we remove outliers?
If the outlier is due to a data entry error, it should be corrected or
removed.
If the outlier is a legitimate observation, removing it is not advisable.
Best practices for handling outliers:
1 Report the OLS results with and without the suspected outlier(s).
2 If results change substantially, investigate the outlier further.
3 Consider using robust estimation methods that are less sensitive to
outliers.
One such method is the Least Absolute Deviations (LAD)
estimator.
Least Absolute Deviations (LAD) Estimation
Least Absolute Deviations (LAD)
OLS minimizes the sum of squared residuals:
\[ \min_{\beta_0, \ldots, \beta_k} \sum_{i=1}^{n} \hat{\mu}_i^2 \]
LAD minimizes the sum of absolute residuals:
\[ \min_{\beta_0, \ldots, \beta_k} \sum_{i=1}^{n} |\hat{\mu}_i| \]
Since LAD does not square the residuals, it gives less weight to
extreme observations.
LAD is also known as Median Regression because it estimates the
conditional median (instead of the conditional mean).
Therefore, LAD is more robust to outliers than OLS.
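A rough comparison of OLS and LAD on contaminated data. LAD has no closed-form solution, so this sketch approximates it with iteratively reweighted least squares (IRLS), a technique swapped in here for illustration; statistical software typically uses linear programming or dedicated quantile-regression routines instead. The function name `lad_irls`, the tolerance values, and the contaminated dataset are all assumptions.

```python
import numpy as np


def ols(y, x):
    """OLS coefficients (intercept, slope)."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def lad_irls(y, x, n_iter=200, eps=1e-6, tol=1e-10):
    """Approximate LAD fit via IRLS with weights 1 / |residual|."""
    X = np.column_stack([np.ones(len(y)), x])
    beta = ols(y, x)                                  # start from OLS
    for _ in range(n_iter):
        resid = y - X @ beta
        w = 1.0 / np.maximum(np.abs(resid), eps)      # guard against division by 0
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta


# Made-up data: true slope 2, with the 25 largest-x points shifted up by 50.
rng = np.random.default_rng(8)
n = 500
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[np.argsort(x)[-25:]] += 50.0

b_ols = ols(y, x)
b_lad = lad_irls(y, x)
print("OLS slope:", round(b_ols[1], 3))
print("LAD slope:", round(b_lad[1], 3))
```

In this setup the OLS slope is pulled well above 2 by the contaminated points, while the LAD slope should stay much closer to it.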
LAD vs OLS
When should we prefer LAD over OLS?
1 When the data has heavy-tailed distributions (many outliers).
2 When we are interested in the median rather than the mean.
3 When the conditional distribution of y is skewed.
When should we prefer OLS?
1 When the errors are normally distributed (OLS is efficient under
normality).
2 When we are interested in the conditional mean.
3 When the data does not have significant outliers.
In practice, comparing OLS and LAD results can be informative about
the influence of outliers.
THANK YOU
The measure of intelligence is the ability to change.