Remedial Measures for Model Inadequacy

Chapter 5 discusses remedial measures for model inadequacy in linear regression, focusing on issues like multicollinearity, variance-stabilizing transformations, and the treatment of influential observations. It outlines methods for detecting and addressing multicollinearity, including the use of Variance Inflation Factor (VIF), and introduces various transformations to stabilize variance and linearize relationships. Additionally, the chapter covers Generalized Least Squares (GLS) and Weighted Least Squares (WLS) as techniques to handle violations of OLS assumptions, along with strategies for identifying and managing influential observations.

Uploaded by

bisratengda613

Chapter 5

REMEDIAL MEASURES OF
MODEL INADEQUACY
Introduction
• Chapter 4 presented several techniques for checking the adequacy of the
linear regression model.
• Recall that regression model fitting rests on several implicit assumptions,
including the following:
◦ The model errors have constant variance and are uncorrelated.
◦ The model errors have a normal distribution. This assumption is needed to
conduct hypothesis tests and construct confidence intervals (CIs); under
normality, uncorrelated errors are also independent.
◦ The model is linear in the parameters.
• Plots of residuals are very powerful methods for detecting violations of
these basic regression assumptions.
• This form of model adequacy checking should be conducted for every
regression model that is under serious consideration for use in practice.
• In this chapter, we focus on methods and procedures for building
regression models when some of the above assumptions are violated.
5.1 Multicollinearity
• Multicollinearity occurs in statistical modelling, specifically in multiple
regression analysis, when two or more independent variables are highly correlated
with one another.
• This high correlation inflates the standard errors of the coefficient
estimates, which undermines the statistical significance of the independent
variables and makes it difficult to isolate their individual effects on the
dependent variable.
Key Characteristics of Multicollinearity:
• High Correlation Between Variables: Independent variables have strong
linear relationships with each other.
• Unstable Coefficients: Regression coefficients become unstable, meaning small
changes in the data can lead to large changes in the coefficients.
• Difficulty in Interpretation: It becomes challenging to determine the true effect
of each independent variable on the dependent variable.
Indicators of Multicollinearity:
1. Variance Inflation Factor (VIF): A common metric; a VIF value > 10 suggests
high multicollinearity.
2. High R-squared with Few Significant Variables: The model has a high
overall R-squared, but individual variables are not statistically significant.
3. Correlation Matrix: Pairwise correlations between independent variables are
close to 1 or -1.
Why Multicollinearity is Problematic:
• It inflates the standard errors of the coefficients, reducing their statistical
significance.
• Makes the model less robust to changes in the dataset.
• Complicates interpretation, as the model cannot distinguish the individual effects
of correlated variables.
How to Detect Multicollinearity:
1. Calculate VIF for all predictors.
2. Examine the condition number (a large value indicates multicollinearity).
3. Analyze the correlation matrix for strong relationships.
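The condition-number and correlation checks can be sketched with NumPy as follows. This is a minimal illustration on simulated data, not a prescribed procedure: the helper name `condition_number`, the column scaling, and the simulated predictors are assumptions made for the example. A commonly quoted rule of thumb (not stated in this chapter) treats condition numbers above roughly 30 as a warning sign.

```python
import numpy as np

def condition_number(X):
    """Condition number of the predictor matrix after scaling columns to unit length."""
    X = np.asarray(X, dtype=float)
    # Unit-length columns make the result independent of measurement units.
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)  # singular values
    return s.max() / s.min()

# Simulated predictors (hypothetical data): x2 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
x3 = rng.normal(size=200)

kappa_collinear = condition_number(np.column_stack([x1, x2, x3]))
kappa_independent = condition_number(np.column_stack([x1, x3]))

# Pairwise correlation matrix of the predictors (columns are variables).
corr = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)

print("condition number with x2 included:", kappa_collinear)
print("condition number without x2:", kappa_independent)
print("correlation of x1 and x2:", corr[0, 1])
```

The condition number picks up near-dependencies involving several predictors at once, which a pairwise correlation matrix can miss.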
How to Handle Multicollinearity:
1. Drop Variables: Remove one or more highly correlated variables.
2. Combine Variables: Use dimensionality reduction techniques like Principal
Component Analysis (PCA).
3. Regularization: Apply techniques like Ridge Regression or Lasso to mitigate
multicollinearity.
4. Transform Variables: Re-express variables (e.g., mean-center predictors before forming polynomial or interaction terms) to reduce correlation.
Variance Inflation Factor (VIF)
• The Variance Inflation Factor (VIF) quantifies how much the variance of a
regression coefficient is inflated due to multicollinearity in the model.
• It helps identify multicollinearity by measuring how strongly an independent
variable is correlated with the other independent variables in the model.
For a predictor $X_i$, the VIF is given by: $\mathrm{VIF}_i = \dfrac{1}{1 - R_i^2}$
Where:
• $R_i^2$ is the coefficient of determination when $X_i$ is regressed on all other predictors.
• $R_i^2$ indicates how well $X_i$ is explained by the other predictors. A high $R_i^2$ implies
that $X_i$ is highly correlated with the other variables, leading to high
multicollinearity.
Interpreting VIF
• VIF = 1: No multicollinearity.
• 1 < VIF ≤ 5: Moderate correlation; acceptable.
• VIF > 5: High correlation; potentially problematic.
• VIF > 10: Severe multicollinearity; action needed
Steps to Calculate VIF
1. Fit a Regression Model: Fit a regression model with the dependent variable (Y)
and independent variables (X1,X2,...,Xp).
2. Calculate $R_i^2$ for Each Predictor: For each independent variable $X_i$, regress it
on all other predictors ($X_j$, $j \neq i$).
3. Compute VIF: Use the formula to calculate VIF for each variable.
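The steps above can be sketched directly with NumPy. This is an illustrative implementation under stated assumptions: the function name `vif` and the simulated predictors are hypothetical, and each auxiliary regression includes an intercept.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (predictors only, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for i in range(p):
        target = X[:, i]
        others = np.delete(X, i, axis=1)
        # Auxiliary regression of X_i on the remaining predictors plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[i] = 1.0 / (1.0 - r2)
    return out

# Simulated predictors (hypothetical data): x2 is strongly related to x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.3 * rng.normal(size=500)
x3 = rng.normal(size=500)

vifs = vif(np.column_stack([x1, x2, x3]))
print("VIFs for x1, x2, x3:", vifs)
```

In this simulation the two related predictors receive large VIFs while the independent predictor stays close to 1.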
How to Reduce VIF
Drop Variables: Remove one or more highly correlated predictors.
Combine Variables: Create a composite variable, e.g., sum or average.
Regularization: Use Ridge or Lasso regression.
Center Variables: Mean-center predictors to reduce multicollinearity caused by
polynomial terms or interactions.
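The effect of centering on polynomial terms can be seen in a small simulation. The data below are hypothetical and chosen only to make the point: a predictor that takes values far from zero is almost perfectly correlated with its own square, and centering removes most of that correlation.

```python
import numpy as np

# Hypothetical predictor with values far from zero.
rng = np.random.default_rng(1)
x = rng.uniform(10, 20, size=300)

# Correlation between x and its square before centering.
corr_raw = np.corrcoef(x, x ** 2)[0, 1]

# Mean-center first, then form the quadratic term.
xc = x - x.mean()
corr_centered = np.corrcoef(xc, xc ** 2)[0, 1]

print("corr(x, x^2) before centering:", corr_raw)
print("corr(x, x^2) after centering:", corr_centered)
```

Centering changes the interpretation of the lower-order coefficients (they now refer to the mean of x) but not the fitted values of the polynomial model.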
Variance-Stabilizing transformations
• Variance-stabilizing transformations (VSTs) are mathematical
transformations applied to data to stabilize the variance across the range
of a dataset.
• These transformations are commonly used in statistical modelling and
analysis when the variance of the dependent variable is not constant,
violating the assumption of homoscedasticity in regression and other
parametric methods.
Why Use Variance-Stabilizing Transformations?
• Stabilize Variance: Ensure that the variance remains consistent across
different levels of the data.
• Improve Model Fit: Help meet assumptions of linear regression and ANOVA,
such as constant variance.
• Normalize Data: In some cases, VSTs also help make data more symmetric and
closer to a normal distribution.
• Interpretability: Transformations can simplify relationships between variables,
making trends clearer.
Common Variance-Stabilizing Transformations
1. Logarithmic Transformation: $Y' = \log(Y)$
• Use When: The standard deviation is proportional to the mean, i.e., the variance grows with
the square of the mean (e.g., exponential or multiplicative data).
• Example: Count data with large ranges, like population sizes.
2. Square Root Transformation: $Y' = \sqrt{Y}$
• Use When: Variance is proportional to the mean.
• Example: Data based on counts, such as number of occurrences.
3. Reciprocal Transformation: $Y' = 1/Y$
• Use When: Variance increases very steeply with the mean (standard deviation roughly proportional to the square of the mean).
• Example: When high values dominate and need reduction.
4. Box-Cox Transformation:
$$Y' = \begin{cases} \dfrac{Y^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0 \\ \log Y, & \text{if } \lambda = 0 \end{cases}$$
• Use When: A family of transformations is needed to find the best stabilizing
power.
• Example: Data with non-constant variance, such as skewed data.
5. Arcsine Transformation (for proportions or percentages): $Y' = \arcsin(\sqrt{Y})$, with Y expressed as a proportion between 0 and 1
• Use When: Data represent proportions or percentages.
• Example: Proportion of successes in binary outcomes.
6. Logit Transformation (for proportions): $Y' = \log\left(\dfrac{Y}{1 - Y}\right)$
Use When: Proportion data strictly between 0 and 1.
Example: Data representing probabilities.
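As a quick illustration of what "stabilizing" means, the sketch below simulates count data whose variance equals its mean (Poisson counts, an assumption made for this example) and compares the variance before and after the square root transformation. The group means are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
means = [5, 20, 80]  # hypothetical group means

raw_var = []
sqrt_var = []
for m in means:
    y = rng.poisson(m, size=20000)       # variance proportional to the mean
    raw_var.append(y.var())              # grows with the mean
    sqrt_var.append(np.sqrt(y).var())    # roughly constant after transforming

for m, rv, sv in zip(means, raw_var, sqrt_var):
    print(f"mean={m}: var(Y)={rv:.2f}, var(sqrt(Y))={sv:.3f}")
```

On the raw scale the variance tracks the mean, while on the square root scale it stays near one quarter across all three groups.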
Choosing a Transformation
1. Plot the Data: Create scatterplots or residual plots to identify patterns in
variance.
2. Check for Normality: Use histograms or Q-Q plots to examine if a
transformation is needed.
3. Apply Candidate Transformations: Test different transformations to see
which stabilizes variance best.
4. Box-Cox or Yeo-Johnson:
Use these systematic approaches to select an optimal transformation
automatically.
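One way to make the Box-Cox selection concrete is a grid search over λ using the profile log-likelihood of the regression on the transformed response. The sketch below is a minimal version under stated assumptions: the helper names `box_cox` and `profile_loglik`, the grid, and the simulated data (generated so that the log scale is the appropriate one) are all hypothetical.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of a strictly positive array y."""
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-8:          # treat values numerically equal to 0 as the log case
        return np.log(y)
    return (y ** lam - 1.0) / lam

def profile_loglik(y, X, lam):
    """Profile log-likelihood of lambda (up to a constant) for the model z = X beta + error."""
    z = box_cox(y, lam)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    rss = np.sum((z - X @ coef) ** 2)
    n = len(y)
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))

# Simulated data: log(Y) is linear in x with constant variance.
rng = np.random.default_rng(3)
x = rng.uniform(0, 4, size=400)
y = np.exp(0.5 + 1.0 * x + 0.1 * rng.normal(size=400))
X = np.column_stack([np.ones_like(x), x])

grid = np.linspace(-1, 2, 61)
loglik = [profile_loglik(y, X, lam) for lam in grid]
best_lam = grid[int(np.argmax(loglik))]
print("selected lambda:", best_lam)
```

In practice the selected λ is usually rounded to a nearby interpretable value (for example 0 for the log or 0.5 for the square root) rather than used to many decimals.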
Transformations to Linearize the Model
• Transformations are often applied to non-linear relationships to linearize
them, enabling the use of linear regression or simplifying the analysis.
• By applying appropriate mathematical transformations to the dependent
(Y) and/or independent variables (X), a non-linear model can often be
transformed into a linear one.
Common Non-Linear Relationships and Their Transformations
Here are examples of common non-linear relationships and how they can be
transformed:
1. Exponential Relationship ($Y = A e^{bX}$)
Example: Population growth, radioactive decay.
Linear Form: $\log(Y) = \log(A) + bX$
Transformation: Apply the logarithm to Y.
Steps to Apply Transformations
1. Understand the Relationship: Plot the data to identify non-linear patterns.
Use scatterplots or pairplots to visualize relationships.
2. Choose the Appropriate Transformation: Based on the observed pattern,
select the transformation (e.g., log, square root).
3. Apply the Transformation: Transform Y, X, or both as needed.
4. Fit the Linear Model: Use the transformed variables in a linear regression
model.
5. Evaluate the Model:
Check residual plots and $R^2$ values to ensure the model fits well.
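The steps can be walked through for the exponential relationship. The sketch below uses simulated data with multiplicative noise; the true values of A and b, the noise level, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Simulated exponential data (hypothetical parameters).
rng = np.random.default_rng(4)
A_true, b_true = 2.0, 0.3
x = np.linspace(0, 10, 100)
y = A_true * np.exp(b_true * x) * np.exp(0.05 * rng.normal(size=x.size))

# Apply the transformation: regress log(Y) on X.
log_y = np.log(y)
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)

# Back out the original parameters: intercept = log(A), slope = b.
A_hat = np.exp(coef[0])
b_hat = coef[1]

# Evaluate the fit on the transformed scale.
resid = log_y - X @ coef
r2 = 1.0 - (resid @ resid) / np.sum((log_y - log_y.mean()) ** 2)

print("A_hat:", A_hat, "b_hat:", b_hat, "R^2 (log scale):", r2)
```

The reported R² refers to the log scale, so it is not directly comparable with an R² computed for a model fitted to Y itself.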
Generalized and weighted least-squares
• Both Generalized Least Squares (GLS) and Weighted Least Squares
(WLS) are extensions of Ordinary Least Squares (OLS) regression
designed to handle violations of the standard OLS assumptions,
particularly when:
• The variance of the residuals is not constant (heteroscedasticity).
• The residuals are correlated (autocorrelation or serial correlation).
Generalized Least Squares (GLS)
• GLS generalizes OLS to account for correlations and non-constant
variances in the residuals by transforming the data so that the
transformed residuals satisfy the assumptions of OLS (i.e.,
homoscedasticity and no correlation).
Model Assumptions
Errors ($\epsilon$) have a covariance matrix Σ that is not proportional to the identity
matrix.
Σ describes the structure of heteroscedasticity or correlation among
the errors.
GLS Transformation
1. Estimate the covariance structure Σ (if not known).
2. Transform the data:
$Y^* = \Sigma^{-1/2} Y$
$X^* = \Sigma^{-1/2} X$
3. Perform OLS on the transformed data: $\hat{\beta}_{GLS} = (X^{*\prime} X^*)^{-1} X^{*\prime} Y^* = (X' \Sigma^{-1} X)^{-1} X' \Sigma^{-1} Y$
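A minimal sketch of the transformation, assuming Σ is known: the example below simulates errors with an AR(1) correlation structure (an assumption made for illustration) and whitens the data with the inverse Cholesky factor of Σ. Any matrix P with P′P = Σ⁻¹ gives the same estimator as the symmetric square root Σ^(-1/2), and the Cholesky factor is simply convenient to compute.

```python
import numpy as np

rng = np.random.default_rng(5)
n, rho = 200, 0.7

# Assumed known covariance structure: AR(1) correlations rho^|i-j|.
idx = np.arange(n)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
L = np.linalg.cholesky(Sigma)            # Sigma = L L'

# Simulated regression data with correlated errors.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + L @ rng.normal(size=n)

# Transform the data: y* = L^{-1} y and X* = L^{-1} X.
y_star = np.linalg.solve(L, y)
X_star = np.linalg.solve(L, X)

# OLS on the transformed data.
beta_gls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Same estimator written as (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y.
Sigma_inv = np.linalg.inv(Sigma)
beta_direct = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

print("GLS via whitening:", beta_gls)
print("GLS via closed form:", beta_direct)
```

When Σ has to be estimated, the same code applies with the estimate in place of Σ, but the resulting estimator inherits the estimation error in Σ.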
Advantages
Provides unbiased and efficient estimates of β when Σ is correctly specified.
Handles non-spherical error structures.
Challenges
Requires knowledge or estimation of Σ.
Estimation errors in Σ can affect results.

Weighted Least Squares (WLS)

Overview
WLS is a special case of GLS that assumes heteroscedasticity (non-constant
variance) but no correlation among residuals. It weights observations
differently based on the inverse of their variance to stabilize variance.
Model Assumptions
• Residuals have a diagonal covariance matrix: $\mathrm{Var}(\epsilon_i) = \sigma_i^2$, $i = 1, 2, \ldots, n$
• Larger weights are assigned to observations with smaller variance.
WLS Transformation
• The WLS estimator minimizes the following objective: $\sum_{i=1}^{n} w_i (Y_i - x_i' \beta)^2$, where $x_i'$ is the $i$-th row of X.
• Where: $w_i = 1/\sigma_i^2$ is the weight assigned to each observation.
Steps to Apply WLS
1. Estimate the weights wi (often based on residuals from an initial OLS
model).
2. Multiply each observation (the response and every column of X, including the intercept) by $\sqrt{w_i}$ to transform the data.
3. Perform OLS on the weighted data.
$\hat{\beta}_{WLS} = (X' W X)^{-1} X' W Y$, where W is a diagonal matrix of weights $w_i$.
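The weighted fit can be sketched in two equivalent ways: scale each row of the data by the square root of its weight and run OLS, or evaluate the closed form (X′WX)⁻¹X′WY directly. The example below assumes the error standard deviations are known and proportional to x; that assumption and the simulated data are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
x = rng.uniform(1, 10, size=n)

# Assumed known heteroscedasticity: error standard deviation proportional to x.
sigma = 0.5 * x
y = 3.0 + 1.5 * x + sigma * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Weights are inverse variances.
w = 1.0 / sigma ** 2
sw = np.sqrt(w)

# Route 1: scale rows of X and y by sqrt(w_i), then ordinary least squares.
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Route 2: closed form with the diagonal weight matrix W.
W = np.diag(w)
beta_closed = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

print("WLS via scaled OLS:", beta_wls)
print("WLS via closed form:", beta_closed)
```

Scaling by the square root of the weight is what makes the squared, scaled residuals equal to the weighted squared residuals in the WLS objective.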
Evaluate the Model
Check if heteroscedasticity has been corrected:
Residual diagnostics (residual plots).
Statistical tests for homoscedasticity.
Interpret Results
• Analyse coefficients and their statistical significance.
Note that WLS improves efficiency but does not change the interpretation of
coefficients compared to OLS.
Treatment of Influential Observations
• Influential observations are data points that have a disproportionate
impact on the results of a statistical model, such as regression.
• They may distort parameter estimates, predictions, and model
assumptions, making it essential to identify and address them.
Steps to Treat Influential Observations
1. Identification of Influential Observations
Common Metrics
Leverage (hii):
• Measures how far an observation is from the centre of X values.
• Rule of thumb: High leverage if hii > 2p/n, where p is the number of
predictors (including the intercept) and n is the number of observations.
Cook’s Distance (Di):
• Assesses how much a data point influences all regression coefficients.
• Rule of thumb: High influence if Di>1.
Studentized Residuals:
• Residuals divided by their estimated standard deviation.
• Rule of thumb: Observations are problematic if the absolute value of the
Studentized residual exceeds 2 or 3.
DFBETAs:
• Measures the impact of a data point on each regression coefficient.
• Rule of thumb: Large influence if $|\mathrm{DFBETA}| > 2/\sqrt{n}$.
DFFITS:
Measures the influence of a data point on its fitted value.
Rule of thumb: Influential if $|\mathrm{DFFITS}| > 2\sqrt{p/n}$.
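Several of these metrics can be computed from a single OLS fit using the hat matrix. The sketch below is an illustration under stated assumptions: the function name `influence_measures` and the simulated data with one planted unusual point are hypothetical, X is assumed to include the intercept column, and DFBETAs are omitted for brevity.

```python
import numpy as np

def influence_measures(X, y):
    """Leverage, studentized residuals, Cook's distance and DFFITS for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # diagonal of the hat matrix
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    mse = (e @ e) / (n - p)
    r = e / np.sqrt(mse * (1.0 - h))                # internally studentized residuals
    cooks = r ** 2 * h / (p * (1.0 - h))            # Cook's distance
    # Error variance estimated with observation i deleted.
    s2_del = ((n - p) * mse - e ** 2 / (1.0 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_del * (1.0 - h))             # externally studentized residuals
    dffits = t * np.sqrt(h / (1.0 - h))
    return h, r, t, cooks, dffits

# Simulated data with one planted high-leverage point far from the line.
rng = np.random.default_rng(7)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=n)
x[-1], y[-1] = 8.0, 0.0
X = np.column_stack([np.ones(n), x])

h, r, t, cooks, dffits = influence_measures(X, y)
p = X.shape[1]

# Apply the rules of thumb listed above.
flag_leverage = h > 2 * p / n
flag_cooks = cooks > 1
flag_dffits = np.abs(dffits) > 2 * np.sqrt(p / n)

print("flagged by leverage:", np.flatnonzero(flag_leverage))
print("flagged by Cook's distance:", np.flatnonzero(flag_cooks))
print("flagged by DFFITS:", np.flatnonzero(flag_dffits))
```

The cutoffs are screening rules rather than formal tests, so flagged points are candidates for investigation, not automatic removals.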
Treatment Options
Option 1: Investigate the Cause
Data Entry Errors: Correct any errors in data collection or entry.
Measurement Issues: Assess whether the data point is reliable or an artifact.
Contextual Relevance: Determine if the observation is representative of the
population or an outlier due to special conditions.
Option 2: Transformation
Variable Transformation: Apply transformations like logarithmic or square root
to reduce the impact of extreme values.
Robust Regression: Use methods less sensitive to influential points, such as robust
regression (e.g., Huber regression, M-estimators).
Option 3: Model Adjustment
Weighted Regression: Assign smaller weights to influential observations.
Nonlinear Models: Fit a model better suited to capture non-standard patterns.
Option 4: Exclusion
Exclude Influential Observations:
Remove points only if justified (e.g., they are confirmed errors that cannot be corrected, or they fall
outside the population of interest).
Rerun the analysis and check the impact of exclusion.
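Checking the impact of an exclusion amounts to fitting the model twice and comparing the estimates. The sketch below uses simulated data with one hypothetical suspect observation; the helper `ols` and all values are assumptions for illustration.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X beta + error."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(8)
n = 40
x = rng.uniform(0, 5, size=n)
y = 1.0 + 2.0 * x + 0.3 * rng.normal(size=n)
x[0], y[0] = 15.0, 5.0                  # hypothetical suspect observation
X = np.column_stack([np.ones(n), x])

# Fit with all observations, then again without the suspect point.
beta_all = ols(X, y)
keep = np.arange(n) != 0
beta_excl = ols(X[keep], y[keep])

change = beta_excl - beta_all
print("coefficients with all points:", beta_all)
print("coefficients without the suspect point:", beta_excl)
print("change in coefficients:", change)
```

Reporting both fits side by side lets readers judge how much the conclusions depend on the excluded point.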
