
Understanding Autocorrelation in Regression

Autocorrelation, or serial correlation, occurs when error terms in a regression model are correlated over time, violating the independence assumption of the Classical Linear Regression Model. It can lead to biased standard errors and unreliable inference, even though OLS estimators remain unbiased. The Durbin-Watson and Breusch-Godfrey tests are commonly used to detect autocorrelation, with the latter being more flexible and capable of handling higher-order autocorrelation.

Uploaded by

laibaadeelnasir

🚗 What Is Autocorrelation?

Autocorrelation, also called serial correlation, occurs when the error terms (disturbances) in a
regression model are not independent across time. This violates Assumption 6 of the Classical
Linear Regression Model (CLRM), which states:

 Cov(uₜ, uₛ) = 0 for all t ≠ s

If Cov(uₜ, uₛ) ≠ 0 for some t ≠ s, autocorrelation exists: the error in one time period is correlated with errors in other periods. The issue arises mainly in time series data, where observations are chronologically ordered and often depend on earlier values. It is less common in cross-sectional data, unless spatial autocorrelation (not covered here) is involved.

📚 Why It Matters in Statistics

One of the key rules of classical (OLS) regression is:

Errors should not be connected to each other over time.

This rule is important because it helps make sure your model is trustworthy.

But if autocorrelation is present, this rule is broken. It means:

 Your coefficient estimates are still right on average (unbiased ✅)
 But they’re not as precise as they could be (not efficient ❌)
 And the formulas that tell you how confident to be in your results (like standard errors) become wrong

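When autocorrelation is present, the OLS estimators remain unbiased but are no longer efficient: they are not BLUE (Best Linear Unbiased Estimators). In addition, the usual OLS standard errors are biased. They are typically underestimated when both the errors and the regressors are positively autocorrelated, so inference based on them is invalid.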

🎯 Why Should You Care?

If you use a model with autocorrelation and ignore it:

 Your results might look more certain than they really are.
 You could make bad decisions based on false confidence.
So, autocorrelation doesn’t destroy your model, but it does mess up how much you can trust
the details.

Causes of Autocorrelation

1. Omitted Variables:
If relevant variables (e.g., X₃ₜ) are excluded from the model, their effects are absorbed by the error term. If these omitted variables are autocorrelated, the errors (uₜ, uₜ₋₁, etc.) will be too.

Example:
If your grades also depend on how much sleep you get, and you forget to include that in
your model, then your prediction errors might follow a pattern (because your sleep
follows a pattern too).

2. Model Misspecification:
Using an incorrect functional form (e.g., modeling a quadratic relationship as linear) can produce autocorrelated residuals, especially if the independent variable follows a time trend.
3. Systematic Measurement Errors:
Repeated systematic errors (as in inventory updates) accumulate over time, producing autocorrelated measurement errors in the dataset.

Example:
If you always overestimate your study time, your predictions will consistently be a bit off, and those small mistakes can pile up and become autocorrelated.
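Cause 2 can be made concrete with a small simulation. The sketch below uses Python with NumPy, and the quadratic trend, its coefficients, and the seed are assumptions chosen for illustration. The true relationship is quadratic in a time trend t, but the first fitted model is a straight line. Its residuals therefore trace a U-shape and are strongly autocorrelated. Fitting the correct quadratic form removes the pattern.

```python
import numpy as np

def lag1_autocorr(e):
    """Sample first-order autocorrelation of a residual series."""
    e = e - e.mean()
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))

rng = np.random.default_rng(0)
t = np.arange(100, dtype=float)
# Hypothetical data: Y depends on t quadratically, plus iid noise
y = 5.0 + 0.05 * t ** 2 + rng.normal(size=t.size)

# Misspecified model: Y = b1 + b2 * t (a straight line)
X_lin = np.column_stack([np.ones_like(t), t])
b_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
r1_linear = lag1_autocorr(y - X_lin @ b_lin)

# Correct functional form: Y = b1 + b2 * t + b3 * t^2
X_quad = np.column_stack([np.ones_like(t), t, t ** 2])
b_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)
r1_quadratic = lag1_autocorr(y - X_quad @ b_quad)

print(f"lag-1 residual autocorrelation, linear fit:    {r1_linear:.2f}")
print(f"lag-1 residual autocorrelation, quadratic fit: {r1_quadratic:.2f}")
```

Here the autocorrelation comes from the wrong functional form, not from the true errors. This is why detected autocorrelation can signal a specification problem rather than a genuine error process.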

💡 In Simple Terms:

Autocorrelation happens when your mistakes today are linked to mistakes from the past. This
usually means:

 You forgot something important


 You used the wrong math
 You kept measuring something the wrong way over time

Types of Autocorrelation

First-Order Autocorrelation:

The most common form, where the current error depends linearly on the immediately preceding error term:
 uₜ = ρuₜ₋₁ + εₜ
 Here, ρ is the autocorrelation coefficient (−1 < ρ < 1), and εₜ is a white-noise (iid) term.

Three possible cases:

1. ρ = 0: No autocorrelation; uₜ = εₜ (errors are iid)
2. ρ > 0 (strongest as ρ → +1): Positive autocorrelation
 Errors tend to keep the same sign (e.g., positive follows positive)
 Residuals exhibit smooth, trending patterns
3. ρ < 0 (strongest as ρ → −1): Negative autocorrelation
 Errors tend to alternate signs (positive → negative → positive...)
 Residuals show a "saw-tooth" pattern

Positive autocorrelation is more common in economics than negative.
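The three cases can be illustrated with a short simulation sketch in Python with NumPy. The values ρ = 0.8, 0 and −0.8, the sample size, and the seed are assumptions. The code counts how often consecutive errors change sign. Sign changes are rare under positive autocorrelation (smooth runs), occur about half the time with iid errors, and are frequent under negative autocorrelation (saw-tooth).

```python
import numpy as np

def simulate_ar1(rho, n, rng):
    """Simulate u_t = rho * u_{t-1} + eps_t with iid N(0, 1) shocks eps_t."""
    eps = rng.normal(size=n)
    u = np.empty(n)
    u[0] = eps[0] / np.sqrt(1.0 - rho ** 2)  # start from the stationary distribution
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    return u

def sign_change_rate(u):
    """Fraction of consecutive pairs (u_{t-1}, u_t) with opposite signs."""
    return float(np.mean(np.sign(u[1:]) != np.sign(u[:-1])))

rng = np.random.default_rng(42)
rates = {rho: sign_change_rate(simulate_ar1(rho, 1000, rng)) for rho in (0.8, 0.0, -0.8)}
for rho, rate in rates.items():
    print(f"rho = {rho:+.1f}: sign changes in {rate:.0%} of periods")
```

For Gaussian errors the expected sign-change rate is arccos(ρ)/π. That is about 0.20 for ρ = 0.8 and about 0.80 for ρ = −0.8.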

Higher-Order Autocorrelation:

 Occurs when errors depend on several past error terms.
 Called p-th order autocorrelation:
 uₜ = ρ₁uₜ₋₁ + ρ₂uₜ₋₂ + ... + ρₚuₜ₋ₚ + εₜ
 p indicates the order of autocorrelation.

Examples:

 Quarterly data with omitted seasonality may show 4th-order autocorrelation.


 Monthly data could exhibit 12th-order autocorrelation.

➤ Practical Notes:

 Higher-order autocorrelation is less frequent than first-order and more complex to handle.
 It may arise from seasonality, model misspecification, or omitted periodic variables.
 Detecting it requires tests that look beyond the first lag (e.g., Ljung-Box, Breusch-Godfrey); the Durbin-Watson test only checks first-order autocorrelation.
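A sketch of the quarterly case in Python with NumPy follows. The coefficient 0.6, the sample size, and the seed are assumptions. Errors generated as uₜ = 0.6uₜ₋₄ + εₜ show almost no correlation at lags 1 to 3. They show a strong correlation at lag 4 and a weaker echo at lag 8, which a first-order check would miss.

```python
import numpy as np

def sample_acf(u, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    u = u - u.mean()
    denom = np.sum(u ** 2)
    return np.array([np.sum(u[k:] * u[:-k]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
n = 2000
eps = rng.normal(size=n)
u = np.zeros(n)
for t in range(4, n):
    # Fourth-order process with rho_1 = rho_2 = rho_3 = 0 and rho_4 = 0.6
    u[t] = 0.6 * u[t - 4] + eps[t]

acf = sample_acf(u, 8)
for lag, r in enumerate(acf, start=1):
    print(f"lag {lag}: {r:+.2f}")
```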

😬 What Happens If Autocorrelation Is in Your Model?

Consequences of Autocorrelation for OLS Estimators

Given the model:


 Yₜ = β₁ + β₂X₂ₜ + ... + βₖXₖₜ + uₜ

If the errors are autocorrelated, the following occurs:

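1. OLS estimators remain unbiased and consistent:

o These properties do not rely on Assumption 6, so the β estimates are still centered on the true values and converge to them as the sample size increases. This requires that the regressors do not include a lagged dependent variable. With Yₜ₋₁ as a regressor and autocorrelated errors, OLS is biased and inconsistent.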

 They are unbiased → not too high or too low overall.


 They are consistent → with more data, your guesses get closer to the truth.
 The OLS estimators β̂ are still:

o Unbiased: E[β̂] = β
o Consistent: β̂ → β as n → ∞

2. OLS estimators become inefficient (not BLUE):


o Because Assumption 6 is violated, OLS is no longer the Best Linear Unbiased Estimator.

 It’s like using a blurry camera: you still see the shape, but the details are fuzzy.
 Other estimators, such as GLS (or feasible GLS when ρ must be estimated), have lower variance than OLS under autocorrelation. Newey-West standard errors, by contrast, leave the OLS coefficients unchanged and only correct their standard errors.
 In short: your estimates are okay, but another method could do better.

3. Variance estimates are biased and inconsistent:


o Inference becomes unreliable (the directions below are typical under positive autocorrelation):
 t-statistics are overstated (suggesting false significance),
 R² is inflated (suggesting a misleadingly good fit),
 Hypothesis tests in general (e.g., t-tests, F-tests) become invalid.

This causes:

 Fake t-values → you think something matters when it really doesn’t.


 Fake R² → the model looks like it fits the data well, but it might just be chasing patterns
in the noise.
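A small Monte Carlo sketch in Python with NumPy illustrates points 1 and 3. It assumes a slowly moving regressor and errors that both follow AR(1) processes with coefficient 0.9, true β₁ = 1 and β₂ = 2, n = 100, and 500 replications. The average slope estimate stays close to 2 (unbiased). The standard error reported by the usual OLS formula, however, is far smaller than the actual spread of the estimates.

```python
import numpy as np

def simulate_ar1(rho, n, rng):
    """Stationary AR(1) series z_t = rho * z_{t-1} + e_t with N(0, 1) shocks."""
    e = rng.normal(size=n)
    z = np.empty(n)
    z[0] = e[0] / np.sqrt(1.0 - rho ** 2)
    for t in range(1, n):
        z[t] = rho * z[t - 1] + e[t]
    return z

rng = np.random.default_rng(123)
n, reps = 100, 500
beta_true = np.array([1.0, 2.0])      # hypothetical beta_1 and beta_2
slopes, reported_se = [], []
for _ in range(reps):
    x = simulate_ar1(0.9, n, rng)     # slowly moving regressor
    u = simulate_ar1(0.9, n, rng)     # positively autocorrelated errors
    X = np.column_stack([np.ones(n), x])
    y = X @ beta_true + u
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)
    cov = s2 * np.linalg.inv(X.T @ X)  # textbook OLS formula, valid only without autocorrelation
    slopes.append(b[1])
    reported_se.append(np.sqrt(cov[1, 1]))

slopes = np.array(slopes)
mean_reported_se = float(np.mean(reported_se))
actual_sd = float(slopes.std(ddof=1))
print(f"mean slope estimate:     {slopes.mean():.3f} (true value 2.0)")
print(f"actual SD of the slopes: {actual_sd:.3f}")
print(f"mean OLS-reported SE:    {mean_reported_se:.3f}")
```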

🔧 Summary:

| What Stays Okay | What Goes Wrong |
|---|---|
| ✔ Coefficients are still “correct” on average | ❌ Model is less efficient |
| ✔ Still gets better with more data | ❌ t-tests and p-values become unreliable |
| | ❌ You may believe things are important when they’re not |

Durbin–Watson (DW) Test for Autocorrelation

The Durbin–Watson test is the most commonly used method to detect first-order serial correlation in
the residuals of a regression model.

Key Assumptions for the DW Test:

1. The regression model includes a constant.


2. Serial correlation is assumed to be first-order only (i.e., correlation between ut and ut−1).
3. The regression equation does not contain a lagged dependent variable as a regressor.

Model Setup

 Regression model:
Yₜ = β₁ + β₂X₂ₜ + β₃X₃ₜ + ... + βₖXₖₜ + uₜ
 Error structure:
uₜ = ρuₜ₋₁ + εₜ, where |ρ| < 1

Steps of the DW Test

Step 1

Estimate the regression model using OLS and obtain the residuals (ût).

Step 2

Calculate the Durbin–Watson test statistic:

 d = ∑ₜ₌₂ⁿ (ûₜ − ûₜ₋₁)² / ∑ₜ₌₁ⁿ ûₜ²

 The DW statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals.
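The formula translates directly into code. Below is a minimal sketch in Python with NumPy. The function name `durbin_watson` and the toy residual vectors are illustrative.

```python
import numpy as np

def durbin_watson(resid):
    """DW statistic: sum of squared successive differences over sum of squares."""
    u = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(u) ** 2) / np.sum(u ** 2))

d_same = durbin_watson([1.0, 1.0, 1.0, 1.0])      # residuals never change
d_flip = durbin_watson([1.0, -1.0, 1.0, -1.0])    # residuals flip sign every period
print(d_same, d_flip)                             # 0.0 3.0
```

Identical residuals give d = 0 (extreme positive autocorrelation). Alternating residuals give a value well above 2 (negative autocorrelation). statsmodels provides the same ratio as `statsmodels.stats.stattools.durbin_watson`.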
Step 3

Use the critical values from the Durbin–Watson table (Savin & White, 1977):

 dL: lower bound


 dU: upper bound
 The bounds depend on the sample size (n) and the number of explanatory variables excluding the constant (denoted k′)

Step 4a: Testing for Positive Serial Correlation

 H₀: ρ = 0 (no serial correlation)


 Hₐ: ρ > 0 (positive serial correlation)

Decision rules:

 If d ≤ dL → Reject H₀ → evidence of positive autocorrelation


 If d ≥ dU → Do not reject H₀ → no positive autocorrelation
 If dL < d < dU → Test is inconclusive

Step 4b: Testing for Negative Serial Correlation

 H₀: ρ = 0 (no serial correlation)


 Hₐ: ρ < 0 (negative serial correlation)

Decision rules:

 If d ≥ 4 − dL → Reject H₀ → evidence of negative autocorrelation


 If d ≤ 4 − dU → Do not reject H₀ → no negative autocorrelation
 If 4 − dU < d < 4 − dL → Test is inconclusive
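The decision rules of Steps 4a and 4b can be combined into one function. Below is a sketch in Python. The function name `dw_decision` is hypothetical. The bounds dL = 1.50 and dU = 1.60 are placeholders, not values from the Savin & White table; in practice, look up dL and dU for your n and k′.

```python
def dw_decision(d, d_lower, d_upper):
    """Combine the decision rules of Steps 4a and 4b for a DW statistic d.

    d_lower (dL) and d_upper (dU) must be read from a Durbin-Watson table
    for the given n and k'; the values in the example below are placeholders.
    """
    if not 0.0 < d_lower < d_upper:
        raise ValueError("expected 0 < dL < dU")
    if d <= d_lower:
        return "reject H0: positive autocorrelation"
    if d < d_upper:
        return "inconclusive (positive side)"
    if d <= 4.0 - d_upper:
        return "do not reject H0: no first-order autocorrelation"
    if d < 4.0 - d_lower:
        return "inconclusive (negative side)"
    return "reject H0: negative autocorrelation"

# Placeholder bounds for illustration only, not taken from a real table
dL, dU = 1.50, 1.60
for d in (1.00, 1.55, 2.00, 2.45, 3.00):
    print(f"d = {d:.2f}: {dw_decision(d, dL, dU)}")
```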

Interpretation and Limitations

 The range of the DW statistic is from 0 to 4:


o d ≈ 2 → No autocorrelation
o d < 2 → Possible positive autocorrelation
o d > 2 → Possible negative autocorrelation
 Rule of Thumb:
o Estimate ρ̂ using:
ρ̂ = (∑ ûₜ ûₜ₋₁) / (∑ ûₜ²)
o The DW statistic is approximately:
d ≈ 2(1 − ρ̂ )
 Limitation:
o The DW test may be inconclusive in many situations due to the complex small sample
distribution of the statistic, which depends on the explanatory variables.
o A better alternative in some cases is the Breusch–Godfrey Lagrange Multiplier (LM) test, described next.
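The rule of thumb can be checked numerically. Expanding the numerator of d gives exactly d = 2(1 − ρ̂) − (û₁² + ûₙ²)/∑ûₜ², so the two quantities differ only by an end-point term that shrinks as n grows. In the sketch below (Python with NumPy), a simulated AR(1) series with ρ = 0.7 stands in for OLS residuals. That substitution, the sample size, and the seed are assumptions.

```python
import numpy as np

def durbin_watson(u):
    """DW statistic of a residual series."""
    return float(np.sum(np.diff(u) ** 2) / np.sum(u ** 2))

def rho_hat(u):
    """rho_hat = sum_{t=2}^n u_t * u_{t-1} / sum_{t=1}^n u_t^2."""
    return float(np.sum(u[1:] * u[:-1]) / np.sum(u ** 2))

rng = np.random.default_rng(7)
n, rho = 500, 0.7
u = np.empty(n)
u[0] = rng.normal() / np.sqrt(1.0 - rho ** 2)
for t in range(1, n):
    u[t] = rho * u[t - 1] + rng.normal()   # simulated stand-in for OLS residuals

d = durbin_watson(u)
approx = 2.0 * (1.0 - rho_hat(u))
end_term = (u[0] ** 2 + u[-1] ** 2) / np.sum(u ** 2)   # exact gap between the two
print(f"d = {d:.4f}, 2(1 - rho_hat) = {approx:.4f}, end-point term = {end_term:.4f}")
```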

Breusch–Godfrey LM Test for Serial Correlation

The Breusch–Godfrey (BG) test is an advanced test for detecting serial correlation that
overcomes the limitations of the Durbin–Watson (DW) test.

✅ Advantages over DW:

 Works with lagged dependent variables


 Tests for higher-order autocorrelation
 Gives a clear decision where the DW test falls in its inconclusive zone

Model Framework

The original regression model is:

Yₜ = β₁ + β₂X₂ₜ + β₃X₃ₜ +···+ βₖXₖₜ + uₜ    (7.24)

The error term follows an autoregressive process:

uₜ = ρ₁uₜ₋₁ + ρ₂uₜ₋₂ +···+ ρₚuₜ₋ₚ + εₜ    (7.25)

Combining both leads to the augmented regression:

Yₜ = β₁ + β₂X₂ₜ +···+ βₖXₖₜ + ρ₁uₜ₋₁ + ρ₂uₜ₋₂ +···+ ρₚuₜ₋ₚ + εₜ    (7.26)

Hypotheses:

 H₀: ρ₁ = ρ₂ =···= ρₚ = 0 → No serial correlation.


 H₁: At least one ρ ≠ 0 → Serial correlation is present.
Steps to Perform the BG LM Test:

Step 1:

Estimate the original model using OLS and obtain the residuals (ûₜ).

Step 2:

Run the auxiliary regression:

ûₜ = α₀ + α₁X₂ₜ + ··· + αₖ₋₁Xₖₜ + γ₁ûₜ₋₁ + ··· + γₚûₜ₋ₚ + vₜ

That is, the residuals are regressed on all the original regressors plus p of their own lags.

Step 3:

Compute the LM statistic:

LM = (n − p) × R²

 n = number of observations
 p = number of lagged residuals used
 R² = from the auxiliary regression in Step 2

Decision Rule:

 Compare the LM statistic to the χ² critical value with p degrees of freedom.


o If LM > χ², reject H₀ → Serial correlation is present.
o If LM ≤ χ², do not reject H₀ → No evidence of serial correlation.
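Steps 1 to 3 and the decision rule can be sketched in Python with NumPy and SciPy. Several choices are assumptions. The function name `breusch_godfrey` is hypothetical. The first p observations are dropped from the auxiliary regression, so that LM = (n − p)R² as above. The simulated data (AR(1) errors with ρ = 0.7, true β₁ = 1 and β₂ = 2) are for illustration only.

```python
import numpy as np
from scipy import stats

def ols_residuals(y, X):
    """Residuals from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def breusch_godfrey(y, X, p):
    """BG LM test for serial correlation up to order p.

    X holds the regressors X2..Xk without a constant (one is added here).
    Returns (LM, p_value) with LM = (n - p) * R^2 of the auxiliary regression.
    """
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    u = ols_residuals(y, Xc)                              # Step 1
    # Step 2: regress u_t on the original regressors and u_{t-1}, ..., u_{t-p};
    # the first p observations are dropped because their lags are unavailable
    lags = np.column_stack([u[p - j : n - j] for j in range(1, p + 1)])
    Z = np.column_stack([Xc[p:], lags])
    e = u[p:]
    v = ols_residuals(e, Z)
    r2 = 1.0 - (v @ v) / np.sum((e - e.mean()) ** 2)
    lm = (n - p) * r2                                     # Step 3
    return lm, float(stats.chi2.sf(lm, df=p))             # compare with chi^2(p)

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
eps = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + eps[t]        # hypothetical AR(1) errors
y = 1.0 + 2.0 * x + u

lm, pval = breusch_godfrey(y, x, p=4)
print(f"LM = {lm:.1f}, p-value = {pval:.2g}")
```

Because the simulated errors are strongly autocorrelated, the LM statistic should land far above the 5% χ² critical value with 4 degrees of freedom (about 9.49), so H₀ is rejected.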

Note on Lag Length (p):

 The choice of p depends on the data frequency:


o Quarterly data → p = 4
o Monthly data → p = 12
 Chosen to capture likely autocorrelation cycles in the dataset.

Conclusion:
The Breusch–Godfrey LM test is a flexible and reliable method for testing serial correlation.
It:

 Allows for lagged dependent variables


 Detects higher-order serial correlation
 Uses a simple auxiliary regression and a χ²-based decision rule

Resolving Autocorrelation (When ρ is Known)

The presence of autocorrelation makes OLS estimators inefficient, so it is crucial to correct for
it.

🧠 The Fix (When ρ is Known)

Let’s say we know how much today's error depends on yesterday’s error — we call that ρ (rho).

If we know ρ, we can use that to "clean up" our data so the errors stop following each other.

1. Change the data slightly using ρ:


o We take today’s value and subtract ρ times yesterday’s value.
o We do this for both the thing we’re trying to predict (Y) and all the things we use
to predict it (X).
2. This transformation removes the pattern in the errors — they’re now random again!

Model with Autocorrelated Errors

Consider the regression model:

Yₜ = β₁ + β₂X₂ₜ + β₃X₃ₜ +···+ βₖXₖₜ + uₜ    (7.29)

Assume the error term follows first-order autocorrelation:

uₜ = ρuₜ₋₁ + εₜ    (7.30)

Correcting Autocorrelation with Known ρ

Using the assumption of first-order autocorrelation, we transform the data through quasi-
differencing (also called generalized differencing) to eliminate serial correlation in the error
term.
The transformed model becomes:

Y*ₜ = β*₁ + β₂X*₂ₜ + β₃X*₃ₜ +···+ βₖX*ₖₜ + εₜ    (7.34)

Where the transformed variables are defined as:

 Y*ₜ = Yₜ − ρYₜ₋₁
 β*₁ = β₁(1 − ρ)
 X*ᵢₜ = Xᵢₜ − ρXᵢₜ₋₁

These transformations remove the autocorrelation from the error term (the new error is the white-noise εₜ). Applying OLS to the transformed equation therefore gives efficient (BLUE) estimators.

Treatment of First Observation

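To avoid losing the first observation (which has no lagged value to difference against), transform it as follows (the Prais–Winsten transformation):

 Y*₁ = √(1 − ρ²) · Y₁
 X*ᵢ₁ = √(1 − ρ²) · Xᵢ₁    (7.35)

Multiplying by √(1 − ρ²) gives the first error the same variance as εₜ, because under stationarity Var(u₁) = Var(εₜ)/(1 − ρ²). The constant column is scaled the same way in this row.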
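Under the assumption that ρ is known, the quasi-differencing transformation can be sketched in Python with NumPy. Rows t ≥ 2 become Zₜ − ρZₜ₋₁, and the first row is scaled by √(1 − ρ²), which gives u₁ the same variance as εₜ. The function name, the simulated data (ρ = 0.8, true β₁ = 1, β₂ = 2), and the seed are assumptions. The column of ones is transformed along with the other regressors: it becomes 1 − ρ for t ≥ 2 and √(1 − ρ²) for t = 1. As a result, the coefficient on it estimates β₁ directly rather than β*₁ = β₁(1 − ρ).

```python
import numpy as np

def quasi_difference(y, X, rho):
    """GLS transformation for AR(1) errors with known rho (X includes the constant)."""
    scale = np.sqrt(1.0 - rho ** 2)
    y_star = np.concatenate([[scale * y[0]], y[1:] - rho * y[:-1]])  # first row scaled
    X_star = np.vstack([scale * X[:1], X[1:] - rho * X[:-1]])
    return y_star, X_star

rng = np.random.default_rng(11)
n, rho = 2000, 0.8
x = rng.normal(size=n)
u = np.empty(n)
u[0] = rng.normal() / np.sqrt(1.0 - rho ** 2)
for t in range(1, n):
    u[t] = rho * u[t - 1] + rng.normal()
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + u          # hypothetical beta_1 = 1, beta_2 = 2

y_s, X_s = quasi_difference(y, X, rho)
beta_gls, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)   # OLS on transformed data
resid_s = y_s - X_s @ beta_gls
r1 = float(np.sum(resid_s[1:] * resid_s[:-1]) / np.sum(resid_s ** 2))
print("GLS estimates (beta_1, beta_2):", beta_gls.round(3))
print(f"lag-1 autocorrelation of transformed residuals: {r1:.3f}")
```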

Conclusion

The process of transforming the model via quasi-differencing corrects for serial correlation
when ρ is known, producing error terms that satisfy the CLRM assumptions. As a result,
applying OLS to the transformed equation gives us efficient and unbiased estimators.

Question: The Durbin-Watson (DW) test for detecting autocorrelation in a regression model: steps, disadvantages, and alternative tests.

🧪 Steps of the Durbin-Watson (DW) Test

The Durbin-Watson test checks for first-order autocorrelation in the residuals (errors) of a
regression model. Here’s how it works:

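🔢 Step-by-Step:

1. Estimate the regression by OLS and obtain the residuals ûₜ.
2. Compute the DW statistic d from the residuals.
3. Interpret the DW statistic:

| DW value | Interpretation |
|---|---|
| ≈ 2 | No autocorrelation |
| < 2 | Possible positive autocorrelation |
| > 2 | Possible negative autocorrelation |
| ≈ 0 | Strong positive autocorrelation |
| ≈ 4 | Strong negative autocorrelation |

4. Compare to critical values:
o Use Durbin-Watson tables with lower (d_L) and upper (d_U) bounds.
o The test gives no answer in the inconclusive zones: between d_L and d_U (positive side) and between 4 − d_U and 4 − d_L (negative side).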

⚠️Disadvantages of the DW Test

1. ❌ Only detects first-order autocorrelation


o It won’t catch higher-order patterns (e.g., seasonal effects).
2. ❌ Not valid with lagged dependent variables
o If the model includes Yₜ₋₁ as an explanatory variable, DW is biased toward 2, i.e., toward finding no autocorrelation.
3. ❌ Inconclusive zone
o Sometimes the test gives no clear result (d falls between d_L and d_U, or between 4 − d_U and 4 − d_L).
4. ❌ Depends on tabulated bounds
o Critical values are tabulated only for limited combinations of n and k′, and the inconclusive zone widens as the number of regressors grows.

✅ Alternative Tests for Autocorrelation

1. Breusch-Godfrey (BG) Test ✅ Recommended

 Tests for higher-order autocorrelation.


 Works even with lagged dependent variables.
 More flexible and powerful than DW.

2. Ljung-Box Q Test

 Tests overall randomness of residuals.


 Can check for multiple lags (2nd, 3rd, etc.).
 Often used in time series models (e.g., ARIMA).

3. Durbin’s h Test

 A variation of DW for models that include lagged dependent variables.


 More complex, but better suited than DW in those cases.
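For reference, the Ljung-Box statistic is Q = n(n + 2) ∑ₖ₌₁ʰ rₖ² / (n − k), where rₖ is the lag-k sample autocorrelation of the residuals. It is compared with a χ² distribution with h degrees of freedom (fewer when the residuals come from an estimated ARMA model). Below is a sketch in Python with NumPy and SciPy. The function name, h = 12, and the simulated AR(1) residuals (ρ = 0.5) are assumptions.

```python
import numpy as np
from scipy import stats

def ljung_box(resid, h):
    """Ljung-Box Q statistic over lags 1..h and its chi-square(h) p-value."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    n = e.size
    denom = np.sum(e ** 2)
    r = np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, h + 1)])
    q = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, h + 1)))
    return float(q), float(stats.chi2.sf(q, df=h))

rng = np.random.default_rng(5)
n = 300
eps = rng.normal(size=n)
resid = np.zeros(n)
for t in range(1, n):
    resid[t] = 0.5 * resid[t - 1] + eps[t]   # hypothetical autocorrelated residuals

q_stat, p_value = ljung_box(resid, h=12)
q_white, p_white = ljung_box(eps, h=12)      # white noise for comparison
print(f"AR(1) residuals: Q = {q_stat:.1f}, p = {p_value:.2g}")
print(f"white noise:     Q = {q_white:.1f}, p = {p_white:.2g}")
```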

🧠 Summary Table:

| Test | Detects | Works with lagged DV? | Checks higher order? | Common use |
|---|---|---|---|---|
| Durbin-Watson | First-order only | ❌ No | ❌ No | Basic OLS |
| Breusch-Godfrey | 1st and higher-order | ✅ Yes | ✅ Yes | Preferred in regression |
| Ljung-Box | Higher-order | ✅ Yes | ✅ Yes | Time series analysis |
| Durbin’s h | First-order | ✅ Yes | ❌ No | Regression with lags |
