Multicollinearity

Multicollinearity occurs when independent variables in a regression model are correlated, violating the assumption of independence. It can be perfect, leading to indeterminate regression coefficients, or imperfect, resulting in large standard errors and difficulties in estimation. Causes include data collection methods, model specification, and constraints in the population, with detection methods involving high R-squared values and significant pairwise correlations among regressors.

Uploaded by

ssashajib

Multicollinearity

Farjana Eyasmin
Lecturer, Department of Economics
Pabna University of Science and Technology
Introduction
Multicollinearity is a violation of the assumption that no independent variable is a perfect
linear function of one or more other independent variables. If two or more independent
variables are correlated with each other, the problem is called multicollinearity.
Consider the multiple regression model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$.
If $X_2$ and $X_3$ are linearly related, the problem that arises is multicollinearity.
Perfect Vs. Imperfect Multicollinearity
• Perfect collinearity occurs when there is an exact linear relationship between two or more independent variables.
It also occurs when an independent variable takes a constant value in all observations, because that variable is then an exact multiple of the intercept term.
For the k-variable regression involving explanatory variables $X_1, X_2, \dots, X_k$ (where $X_1 = 1$ for all observations to
allow for the intercept term), an exact linear relationship is said to exist if the following condition is satisfied:
$$\lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 + \dots + \lambda_k X_k = 0$$
where $\lambda_1, \lambda_2, \dots, \lambda_k$ are constants such that not all of them are zero simultaneously.
• Imperfect multicollinearity occurs when the relationship among two or more variables is not exactly linear, although
the scatter plot looks approximately linear:
$$\lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 + \dots + \lambda_k X_k + v_i = 0$$
where $v_i$ is a stochastic error term. Assuming $\lambda_2 \neq 0$, this can be solved for $X_2$:
$$X_2 = -\frac{\lambda_1}{\lambda_2} X_1 - \frac{\lambda_3}{\lambda_2} X_3 - \dots - \frac{\lambda_k}{\lambda_2} X_k - \frac{1}{\lambda_2} v_i$$
which shows that $X_2$ is not an exact linear combination of the other X's because it is also determined by the
stochastic error term $v_i$.
• If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their
standard errors are infinite. On the other hand, if multicollinearity is less than perfect, the regression
coefficients, although determinate, possess large standard errors (in relation to the coefficients themselves),
which means the coefficients cannot be estimated with great precision or accuracy.
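The distinction between perfect and imperfect collinearity can be checked numerically through the rank of the design matrix. The following is a minimal sketch using NumPy; the sample size, the value of `lam`, and the noise scale are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                       # hypothetical sample size
lam = 2.0                    # hypothetical constant linking X3 to X2
x2 = rng.normal(size=n)

# Perfect collinearity: X3 is an exact multiple of X2.
x3_perfect = lam * x2
# Imperfect collinearity: X3 also depends on a stochastic term v.
v = rng.normal(scale=0.1, size=n)
x3_imperfect = lam * x2 + v


def design(x3):
    """Design matrix with an intercept column, X2 and X3."""
    return np.column_stack([np.ones(n), x2, x3])


rank_perfect = np.linalg.matrix_rank(design(x3_perfect))
rank_imperfect = np.linalg.matrix_rank(design(x3_imperfect))
print("rank with perfect collinearity:  ", rank_perfect)
print("rank with imperfect collinearity:", rank_imperfect)
```

With perfect collinearity the three-column design matrix has rank 2, so X′X cannot be inverted and the coefficients are indeterminate. With imperfect collinearity the rank is 3 and the coefficients can be computed, although possibly with large standard errors.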
Causes of Multicollinearity
• The data collection method employed, for example, sampling over a limited range
of the values taken by the regressors in the population.
• Constraints on the model or in the population being sampled. For example, in the
regression of electricity consumption on income (X2) and house size (X3) there is
a physical constraint in the population in that families with higher incomes
generally have larger homes than families with lower incomes.
• Model specification, for example, adding polynomial terms to a regression model,
especially when the range of the X variable is small.
• An overdetermined model, when the model has more explanatory variables than
the number of observations. This could happen in medical research where there
may be a small number of patients about whom information is collected on a
large number of variables.
• In time series data, it may be that the regressors included in the model share a
common trend, that is, they all increase or decrease over time.
Estimation of Parameters in the presence of Multicollinearity
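• In the presence of perfect multicollinearity, the regression coefficients of the X variables
are indeterminate and their standard errors are infinite.
Consider the multiple regression, with all variables measured as deviations from their sample means:
$$y_i = \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{u}_i$$
Then the OLS estimator of $\beta_2$ is
$$\hat{\beta}_2 = \frac{(\sum y_i X_{2i})(\sum X_{3i}^2) - (\sum y_i X_{3i})(\sum X_{2i} X_{3i})}{(\sum X_{2i}^2)(\sum X_{3i}^2) - (\sum X_{2i} X_{3i})^2}$$
If we assume $X_{3i} = \lambda X_{2i}$, where $\lambda$ is a nonzero constant, then
$$\hat{\beta}_2 = \frac{(\sum y_i X_{2i})(\lambda^2 \sum X_{2i}^2) - (\lambda \sum y_i X_{2i})(\lambda \sum X_{2i}^2)}{(\sum X_{2i}^2)(\lambda^2 \sum X_{2i}^2) - \lambda^2 (\sum X_{2i}^2)^2} = \frac{0}{0}$$
which is indeterminate. And
$$var(\hat{\beta}_2) = \frac{\sigma^2 \sum X_{3i}^2}{(\sum X_{2i}^2)(\sum X_{3i}^2) - (\sum X_{2i} X_{3i})^2} = \frac{\sigma^2 \lambda^2 \sum X_{2i}^2}{0}$$
which is infinite, where $\sigma^2$ is the variance of $u_i$.
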
Estimation of Parameters in the presence of imperfect Multicollinearity
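In the case of imperfect multicollinearity, replace $X_{3i} = \lambda X_{2i} + v_i$,
where $v_i$ is an error term with $\sum X_{2i} v_i = 0$. Then
$$\hat{\beta}_2 = \frac{(\sum y_i X_{2i})\{\sum (\lambda X_{2i} + v_i)^2\} - \{\sum y_i (\lambda X_{2i} + v_i)\}\{\sum X_{2i} (\lambda X_{2i} + v_i)\}}{(\sum X_{2i}^2)\{\sum (\lambda X_{2i} + v_i)^2\} - \{\sum X_{2i} (\lambda X_{2i} + v_i)\}^2}$$
$$= \frac{(\sum y_i X_{2i})\{\sum (\lambda^2 X_{2i}^2 + v_i^2 + 2\lambda X_{2i} v_i)\} - (\lambda \sum y_i X_{2i} + \sum y_i v_i)(\lambda \sum X_{2i}^2 + \sum X_{2i} v_i)}{(\sum X_{2i}^2)\{\sum (\lambda^2 X_{2i}^2 + v_i^2 + 2\lambda X_{2i} v_i)\} - (\lambda \sum X_{2i}^2 + \sum X_{2i} v_i)^2}$$
$$= \frac{(\sum y_i X_{2i})(\lambda^2 \sum X_{2i}^2 + \sum v_i^2) - (\lambda \sum y_i X_{2i} + \sum y_i v_i)(\lambda \sum X_{2i}^2)}{(\sum X_{2i}^2)(\lambda^2 \sum X_{2i}^2 + \sum v_i^2) - (\lambda \sum X_{2i}^2)^2}$$
$$= \frac{\lambda^2 \sum X_{2i}^2 \sum y_i X_{2i} + \sum v_i^2 \sum y_i X_{2i} - \lambda^2 \sum y_i X_{2i} \sum X_{2i}^2 - \lambda \sum X_{2i}^2 \sum y_i v_i}{\lambda^2 (\sum X_{2i}^2)^2 + \sum X_{2i}^2 \sum v_i^2 - \lambda^2 (\sum X_{2i}^2)^2}$$
$$= \frac{\sum v_i^2 \sum y_i X_{2i} - \lambda \sum X_{2i}^2 \sum y_i v_i}{\sum X_{2i}^2 \sum v_i^2}$$
Similarly,
$$\hat{\beta}_3 = \frac{\sum X_{2i}^2 \sum y_i v_i}{\sum X_{2i}^2 \sum v_i^2}$$
For the variances,
$$var(\hat{\beta}_2) = \frac{\sigma^2 \sum X_{3i}^2}{\sum X_{2i}^2 \sum X_{3i}^2 - (\sum X_{2i} X_{3i})^2}$$
$$= \frac{\sigma^2 \sum (\lambda X_{2i} + v_i)^2}{\sum X_{2i}^2 \sum (\lambda X_{2i} + v_i)^2 - (\lambda \sum X_{2i}^2 + \sum X_{2i} v_i)^2}$$
$$= \frac{\sigma^2 (\lambda^2 \sum X_{2i}^2 + \sum v_i^2)}{\lambda^2 (\sum X_{2i}^2)^2 + \sum X_{2i}^2 \sum v_i^2 - \lambda^2 (\sum X_{2i}^2)^2}$$
$$= \frac{\sigma^2 (\lambda^2 \sum X_{2i}^2 + \sum v_i^2)}{\sum X_{2i}^2 \sum v_i^2}$$
$$var(\hat{\beta}_3) = \frac{\sigma^2 \sum X_{2i}^2}{\sum X_{2i}^2 \sum v_i^2}$$
So, in the case of imperfect multicollinearity, the parameters and their variances can be estimated.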
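The closed-form results above can be checked against ordinary least squares on simulated data. This is a minimal sketch, assuming NumPy; the sample size, the value of `lam`, the coefficients used to generate `y`, and the noise scales are all hypothetical. The term `v` is explicitly made orthogonal to `x2` so that the condition $\sum X_{2i} v_i = 0$ used in the derivation holds.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200          # hypothetical sample size
lam = 2.0        # hypothetical lambda in X3 = lam * X2 + v

# All variables are expressed as deviations from their sample means.
x2 = rng.normal(size=n)
x2 -= x2.mean()
v = rng.normal(scale=0.3, size=n)
v -= v.mean()
v -= (v @ x2) / (x2 @ x2) * x2      # enforce sum(x2 * v) = 0
x3 = lam * x2 + v

y = 1.5 * x2 - 0.8 * x3 + rng.normal(size=n)   # hypothetical true model
y -= y.mean()

S22 = x2 @ x2    # sum of X2^2
Svv = v @ v      # sum of v^2

# Closed-form estimators from the derivation above.
b2_closed = (Svv * (y @ x2) - lam * S22 * (y @ v)) / (S22 * Svv)
b3_closed = (S22 * (y @ v)) / (S22 * Svv)

# Closed-form variances, expressed in units of sigma^2.
var_b2_closed = (lam**2 * S22 + Svv) / (S22 * Svv)
var_b3_closed = S22 / (S22 * Svv)

# Direct OLS on the deviation-form data (no intercept needed).
X = np.column_stack([x2, x3])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
XtX_inv = np.linalg.inv(X.T @ X)

print("closed form:", b2_closed, b3_closed)
print("OLS:        ", b_ols)
```

The two sets of estimates agree, and the diagonal of `XtX_inv` reproduces the variance expressions. Making `v` smaller drives $\sum v_i^2$ toward zero and inflates both variances, which is the transition toward the perfect-collinearity case.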

Consequences of Multicollinearity
• Firstly, although BLUE, the OLS estimators have large variances and covariances, making precise
estimation difficult. For the three-variable model, with the X's measured as deviations from their sample means,
$$var(\hat{\beta}_2) = \frac{\sigma^2}{\sum X_{2i}^2 (1 - r_{23}^2)}, \qquad var(\hat{\beta}_3) = \frac{\sigma^2}{\sum X_{3i}^2 (1 - r_{23}^2)}$$
$$cov(\hat{\beta}_2, \hat{\beta}_3) = \frac{-r_{23}\, \sigma^2}{(1 - r_{23}^2)\sqrt{\sum X_{2i}^2 \sum X_{3i}^2}}$$
where $r_{23}$ is the coefficient of correlation between $X_2$ and $X_3$, and $cov(\hat{\beta}_2, \hat{\beta}_3) \equiv cov(\hat{\beta}_3, \hat{\beta}_2)$.
As $r_{23}$ tends toward 1, that is, as collinearity increases, the variances of the two estimators increase
and in the limit when $r_{23} = 1$, they are infinite. As $r_{23}$ increases toward 1, the covariance of the two
estimators also increases in absolute value.
The speed with which variances and covariances increase can be seen with the variance-inflating factor
(VIF), which is defined as
$$VIF = \frac{1}{1 - r_{23}^2}$$
VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As $r_{23}^2$ approaches 1,
the VIF approaches infinity. That is, as the extent of collinearity increases, the variance of an estimator
increases, and in the limit it can become infinite. If the VIF is 1, there is no multicollinearity.
Using this definition,
$$var(\hat{\beta}_2) = \frac{\sigma^2}{\sum X_{2i}^2}\, VIF, \qquad var(\hat{\beta}_3) = \frac{\sigma^2}{\sum X_{3i}^2}\, VIF$$
which shows that the variances of $\hat{\beta}_2$ and $\hat{\beta}_3$ are directly proportional to the VIF.
The inverse of the VIF is called tolerance (TOL):
$$TOL_j = \frac{1}{VIF_j} = 1 - R_j^2$$
When $R_j^2 = 1$ (perfect collinearity), $TOL_j = 0$; when $R_j^2 = 0$ (i.e., no collinearity whatsoever), $TOL_j = 1$.
Because of the intimate connection between VIF and TOL, one can use them interchangeably.
• Secondly, since the variances are larger, the confidence intervals tend to be much wider, leading to the
acceptance of the “zero null hypothesis” (i.e., the true population coefficient is zero) more readily.
• Thirdly, because of the larger variances, the t ratio of one or more coefficients tends to be statistically
insignificant.
• Fourthly, although the t ratio of one or more coefficients is statistically insignificant, $R^2$, the overall
measure of goodness of fit, can be very high.
• Fifthly, the OLS estimators and their standard errors can be sensitive to small changes in the data.
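Consider $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$, where
$$\hat{\beta}_1 = \frac{(\sum X_1 y)(\sum X_2^2) - (\sum X_2 y)(\sum X_1 X_2)}{(\sum X_2^2)(\sum X_1^2) - (\sum X_1 X_2)^2} \quad \text{and} \quad \hat{\beta}_2 = \frac{(\sum X_2 y)(\sum X_1^2) - (\sum X_1 y)(\sum X_1 X_2)}{(\sum X_2^2)(\sum X_1^2) - (\sum X_1 X_2)^2}$$
Now as $X_2$ approaches $kX_1$,
$$\hat{\beta}_1 = \frac{k^2 (\sum X_1 y)(\sum X_1^2) - k^2 (\sum X_1 y)(\sum X_1^2)}{k^2 (\sum X_1^2)^2 - k^2 (\sum X_1^2)^2} = \frac{0}{0}$$
Similarly, $\hat{\beta}_2 = \frac{0}{0}$. Both numerator and denominator approach zero, so the estimates become
indeterminate, and when collinearity is nearly exact small changes in the data can change them substantially.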
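The speed at which the variance-inflating factor grows can be tabulated directly from its two-regressor definition, VIF = 1 / (1 − r23²). The sketch below is illustrative only; the function name `vif` and the chosen correlation values are assumptions.

```python
def vif(r23: float) -> float:
    """Variance-inflating factor for two regressors with correlation r23."""
    r_squared = r23 ** 2
    if r_squared >= 1.0:
        # Perfect collinearity: the variance is infinite.
        return float("inf")
    return 1.0 / (1.0 - r_squared)


for r in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(f"r23 = {r:4.2f}   VIF = {vif(r):8.2f}   TOL = {1.0 - r ** 2:6.4f}")
```

A correlation of 0.5 inflates the variance by only one third, whereas a correlation of 0.99 inflates it roughly fifty-fold, which is why moderate correlations are usually harmless and near-exact ones are not.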
Detection Of Multicollinearity
• High $R^2$ but few significant t ratios.
• High pair-wise correlations among regressors. High zero-order correlations are a sufficient but
not a necessary condition for the existence of multicollinearity, because it can exist even though
the zero-order or simple correlations are comparatively low.
Consider $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$ and $X_{4i} = \lambda_2 X_{2i} + \lambda_3 X_{3i}$,
where $\lambda_2$ and $\lambda_3$ are constants, not both zero. Obviously, $X_4$ is an exact linear combination of $X_2$ and $X_3$,
giving $R^2_{4.23} = 1$, the coefficient of determination in the regression of $X_4$ on $X_2$ and $X_3$.
In terms of the simple correlations,
$$R^2_{4.23} = \frac{r_{42}^2 + r_{43}^2 - 2 r_{42} r_{43} r_{23}}{1 - r_{23}^2}$$
Since $R^2_{4.23} = 1$ because of perfect collinearity, we obtain
$$1 = \frac{r_{42}^2 + r_{43}^2 - 2 r_{42} r_{43} r_{23}}{1 - r_{23}^2}$$
This equation is satisfied by $r_{42} = 0.5$, $r_{43} = 0.5$, and $r_{23} = -0.5$, which are not very high values:
$(0.25 + 0.25 + 0.25)/(1 - 0.25) = 1$. Therefore, in models involving more
than two explanatory variables, the simple or zero-order correlation will not provide an infallible guide to the
presence of multicollinearity. Of course, if there are only two explanatory variables, the zero-order
correlations will suffice.
• Examination of partial correlations. Because of the problem just mentioned in relying on zero-order
correlations, Farrar and Glauber have suggested that one should look at the partial correlation coefficients.
Thus, in the regression of Y on $X_2$, $X_3$ and $X_4$, a finding that $R^2_{1.234}$ is very high but $r^2_{12.34}$, $r^2_{13.24}$ and $r^2_{14.23}$ are
comparatively low may suggest that the variables $X_2$, $X_3$ and $X_4$ are highly intercorrelated and that at least
one of these variables is superfluous.
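• Auxiliary regressions. Regress each $X_i$ on the remaining X variables and compute the corresponding $R^2$,
which we designate as $R_i^2$. Each one of these regressions is called an auxiliary regression, auxiliary to the
main regression of Y on the X's. The test statistic is
$$F_i = \frac{R_i^2 / (k - 2)}{(1 - R_i^2) / (n - k + 1)}$$
which follows the F distribution with $k - 2$ and $n - k + 1$ df, where n is the sample size and k is the number of
explanatory variables including the intercept term. If the computed F exceeds the critical $F_i$ at the chosen level of
significance, it is taken to mean that the particular $X_i$ is collinear with the other X's.
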

• Tolerance and variance inflation factor.
$$VIF_i = \frac{1}{1 - R_i^2}, \qquad TOL_i = \frac{1}{VIF_i} = 1 - R_i^2$$
The larger the value of $VIF_i$, the more “troublesome” or collinear the variable $X_i$.
As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if $R_i^2$
exceeds 0.90, that variable is said to be highly collinear. The closer $TOL_i$ is to zero,
the greater the degree of collinearity of that variable with the other regressors.
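• Eigenvalues and condition index. From the eigenvalues of the X′X matrix, the condition number k and
the condition index (CI) are defined as
$$k = \frac{\text{maximum eigenvalue}}{\text{minimum eigenvalue}}, \qquad CI = \sqrt{k}$$
If k is between 100 and 1000 there is moderate to strong
multicollinearity and if it exceeds 1000 there is severe
multicollinearity. Alternatively, if the CI is between
10 and 30, there is moderate to strong multicollinearity and if
it exceeds 30 there is severe multicollinearity.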
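The auxiliary-regression, VIF/TOL and condition-index diagnostics listed above can be sketched together with NumPy. The function names, the simulated data, and the choice to compute the eigenvalues from centered, unit-length-scaled regressors (so that X′X is the correlation matrix) are assumptions made for this illustration; other scaling conventions give different condition numbers.

```python
import numpy as np


def auxiliary_r2(X, j):
    """R^2 from regressing column j of X on the remaining columns plus an intercept."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    total = ((target - target.mean()) ** 2).sum()
    return 1.0 - (resid @ resid) / total


def collinearity_diagnostics(X):
    """R_i^2, TOL_i, VIF_i and F_i for each non-constant regressor in X."""
    n, m = X.shape
    k = m + 1                      # number of regressors including the intercept
    results = []
    for j in range(m):
        r2 = auxiliary_r2(X, j)
        tol = 1.0 - r2
        f_stat = (r2 / (k - 2)) / ((1.0 - r2) / (n - k + 1))
        results.append({"r2": r2, "tol": tol, "vif": 1.0 / tol, "F": f_stat})
    return results


def condition_index(X):
    """Condition number and condition index from the correlation-form X'X (assumed scaling)."""
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    eigenvalues = np.linalg.eigvalsh(Xs.T @ Xs)
    kappa = eigenvalues.max() / eigenvalues.min()
    return kappa, np.sqrt(kappa)


# Hypothetical data: x3 is nearly a multiple of x2, x4 is unrelated.
rng = np.random.default_rng(42)
n = 200
x2 = rng.normal(size=n)
x3 = 2.0 * x2 + rng.normal(scale=0.1, size=n)
x4 = rng.normal(size=n)
X = np.column_stack([x2, x3, x4])

diagnostics = collinearity_diagnostics(X)
kappa, ci = condition_index(X)
for name, d in zip(("X2", "X3", "X4"), diagnostics):
    print(f"{name}: R2 = {d['r2']:.4f}  TOL = {d['tol']:.4f}  VIF = {d['vif']:.1f}  F = {d['F']:.1f}")
print(f"condition number = {kappa:.1f}, condition index = {ci:.1f}")
```

In this setup the VIFs of X2 and X3 are far above the rule-of-thumb value of 10 while the VIF of X4 stays close to 1, so the diagnostics identify which regressors are involved in the collinearity rather than only signalling that it exists.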
Remedial Measures for Multicollinearity
1. Do nothing Procedure
2. Rule of Thumb Procedure
Rule of Thumb procedure:
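I. A priori information: Consider $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$,
where $Y$ = consumption, $X_2$ = income, $X_3$ = wealth, and
assume $\beta_3 = 0.10\beta_2$.
Substituting $\beta_3 = 0.10\beta_2$ into the consumption equation,
$$Y_i = \beta_1 + \beta_2 X_{2i} + 0.10\beta_2 X_{3i} + u_i$$
$$= \beta_1 + \beta_2 (X_{2i} + 0.10 X_{3i}) + u_i$$
$$= \beta_1 + \beta_2 X_i + u_i, \quad \text{where } X_i = X_{2i} + 0.10 X_{3i}$$
Once $\hat{\beta}_2$ is obtained, $\hat{\beta}_3$ follows from $\hat{\beta}_3 = 0.10\hat{\beta}_2$.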
• Combining cross-sectional and time series data. Consider
$$\ln Y_t = \beta_1 + \beta_2 \ln P_t + \beta_3 \ln I_t + u_t$$
where $Y_t$ = number of cars sold, $P_t$ = price of cars, $I_t$ = income, and t = time.
Let the cross-sectionally estimated income elasticity be $\hat{\beta}_3$; the time series regression would then be
$$Y_t^* = \beta_1 + \beta_2 \ln P_t + u_t, \quad \text{where } Y_t^* = \ln Y_t - \hat{\beta}_3 \ln I_t$$
• Dropping a variable(s) and specification bias.
• Transformation of variables.
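Consider $Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t$,
where $Y$ = consumption, $X_2$ = income, and $X_3$ = wealth.
At time period $t - 1$,
$$Y_{t-1} = \beta_1 + \beta_2 X_{2,t-1} + \beta_3 X_{3,t-1} + u_{t-1}$$
Subtracting the second equation from the first gives the first-difference form
$$Y_t - Y_{t-1} = \beta_2 (X_{2t} - X_{2,t-1}) + \beta_3 (X_{3t} - X_{3,t-1}) + v_t, \quad \text{where } v_t = u_t - u_{t-1}$$
Another option is the ratio transformation. Consider
$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t$, where $Y$ = real consumption, $X_2$ = GNP, and
$X_3$ = total population. Dividing through by $X_{3t}$ gives
$$\frac{Y_t}{X_{3t}} = \beta_1 \frac{1}{X_{3t}} + \beta_2 \frac{X_{2t}}{X_{3t}} + \beta_3 + \frac{u_t}{X_{3t}}$$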
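• Additional or new data. In the three-variable model, with $X_2$ measured as deviations from its sample mean,
$$var(\hat{\beta}_2) = \frac{\sigma^2}{\sum X_{2i}^2 (1 - r_{23}^2)}$$
As the sample size increases, $\sum X_{2i}^2$ will generally increase. Therefore, for any given $r_{23}$, the variance
of $\hat{\beta}_2$ will decrease, thus decreasing the standard error, which will enable us to estimate $\beta_2$ more
precisely.
For example,
$$C_t = \alpha + \beta Y_t + u_t \quad \text{and} \quad Y_t = C_t + I_t$$
where $C_t$ = consumption, $Y_t$ = income, and $I_t$ = investment.
Now,
$$C_t = \alpha + \beta (C_t + I_t) + u_t$$
$$(1 - \beta) C_t = \alpha + \beta I_t + u_t$$
$$C_t = \frac{\alpha}{1 - \beta} + \frac{\beta}{1 - \beta} I_t + \frac{u_t}{1 - \beta}$$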
• Reducing collinearity in polynomial regressions. A special feature of polynomial regression models is that
the explanatory variable(s) appear with various powers. Thus, in the total cubic cost function
involving the regression of total cost on output, $(output)^2$, and $(output)^3$, the output terms are correlated with one another.
For example, $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i$, where $Y$ = total cost and $X$ = output.
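One commonly used device for reducing collinearity between polynomial terms is to express the regressor as a deviation from its mean before forming the powers. The slides do not spell this out, so the following NumPy sketch is offered only as an illustration under that assumption; the range of `x` is hypothetical.

```python
import numpy as np

# Hypothetical regressor confined to a narrow range.
x = np.linspace(1.0, 2.0, 50)

# Correlation between the raw linear and squared terms.
corr_raw = np.corrcoef(x, x ** 2)[0, 1]

# Correlation after centering x on its sample mean.
x_centered = x - x.mean()
corr_centered = np.corrcoef(x_centered, x_centered ** 2)[0, 1]

print(f"corr(x, x^2) before centering: {corr_raw:.4f}")
print(f"corr(x, x^2) after centering:  {corr_centered:.4f}")
```

Because this grid is symmetric about its mean, centering removes the correlation between the linear and squared terms entirely. For asymmetric data the correlation is typically reduced rather than eliminated.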
POLYNOMIAL REGRESSION MODELS
The linear regression model $y = X\beta + \varepsilon$ is a general model for fitting
any relationship that is linear in the unknown parameters $\beta$. This
includes the important class of polynomial regression models. For
example, the second-order polynomial in one variable,
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$$
and the second-order polynomial in two variables,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon$$
are linear regression models. Polynomials are widely used in situations
where the response is curvilinear, as even complex nonlinear
relationships can be adequately modeled by polynomials over
reasonably small ranges of the x's.
POLYNOMIAL MODELS
Basic Principles:
As an example of a polynomial regression model in one variable, consider
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$$
This model is called a second-order model in one variable. It is also
sometimes called a quadratic model, since the expected value of y is
$$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$$
which describes a quadratic function.
In general, the kth-order polynomial model in one variable is
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \dots + \beta_k x^k + \varepsilon$$
There are several important considerations that arise when fitting a
polynomial in one variable.
1. Order of the Model: It is important to keep the order of the model
as low as possible. When the response function appears to be
curvilinear, transformations should be tried to keep the model first
order.
First-order model: Assume that the response variable y is related to a power of the
regressor, say $\xi = x^{\alpha}$, as
$$E(y) = f(\xi, \beta_0, \beta_1) = \beta_0 + \beta_1 \xi$$
where
$$\xi = \begin{cases} x^{\alpha}, & \alpha \neq 0 \\ \ln x, & \alpha = 0 \end{cases}$$
and $\beta_0$, $\beta_1$, and $\alpha$ are unknown parameters. Suppose that $\alpha_0$ is an initial guess of
the constant $\alpha$. Usually this first guess is $\alpha_0 = 1$, so that $\xi_0 = x^{\alpha_0} = x$, or that
no transformation at all is applied in the first iteration. Expanding about the initial
guess in a Taylor series and ignoring terms of higher than first order gives
$$E(y) = f(\xi_0, \beta_0, \beta_1) + (\alpha - \alpha_0) \left\{ \frac{d f(\xi, \beta_0, \beta_1)}{d\alpha} \right\}_{\xi = \xi_0,\ \alpha = \alpha_0} = \beta_0 + \beta_1 x + (\alpha - 1) \left\{ \frac{d f(\xi, \beta_0, \beta_1)}{d\alpha} \right\}_{\xi = \xi_0,\ \alpha = \alpha_0} \qquad (1)$$
Now if the term in braces in equation (1) were known, it could be treated as an
additional regressor variable, and it would be possible to estimate the parameters $\beta_0$, $\beta_1$,
and $\alpha$ in equation (1) by least squares.
The estimate of $\alpha$ could be taken as an improved estimate of the transformation parameter.
The term in braces can be written as
$$\left\{ \frac{d f(\xi, \beta_0, \beta_1)}{d\alpha} \right\}_{\xi = \xi_0,\ \alpha = \alpha_0} = \left\{ \frac{d f(\xi, \beta_0, \beta_1)}{d\xi} \right\}_{\xi = \xi_0} \left\{ \frac{d\xi}{d\alpha} \right\}_{\alpha = \alpha_0}$$
and since the form of the transformation is known, that is, $\xi = x^{\alpha}$, we have
$\frac{d\xi}{d\alpha} = x^{\alpha} \ln x$, which equals $x \ln x$ at $\alpha_0 = 1$. Furthermore,
$$\left\{ \frac{d f(\xi, \beta_0, \beta_1)}{d\xi} \right\}_{\xi = \xi_0} = \frac{d(\beta_0 + \beta_1 x)}{dx} = \beta_1$$
This parameter may be conveniently estimated by fitting the model by least squares,
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
Then an “adjustment” to the initial guess $\alpha_0 = 1$ may be computed by defining a second
regressor variable as $w = x \ln x$, estimating the parameters in
$$E(y) = \beta_0^* + \beta_1^* x + (\alpha - 1)\beta_1 w = \beta_0^* + \beta_1^* x + \gamma w$$
by least squares, giving
$$\hat{y} = \hat{\beta}_0^* + \hat{\beta}_1^* x + \hat{\gamma} w$$
and taking
$$\alpha_1 = \frac{\hat{\gamma}}{\hat{\beta}_1} + 1$$
as the revised estimate of $\alpha$.
This procedure usually converges quite rapidly, and often the first-stage result $\alpha_1$ is a
satisfactory estimate of $\alpha$.
If this fails, a second-order polynomial should be tried. As a general rule the use of high-order
polynomials (k > 2) should be avoided unless they can be justified for reasons
outside the data. A low-order model in a transformed variable is almost always preferable
to a high-order model in the original metric. Arbitrary fitting of high-order polynomials
is a serious abuse of regression analysis. One should always maintain a sense of
parsimony, that is, use the simplest possible model that is consistent with the data and
knowledge of the problem environment. Remember that in an extreme case it is always
possible to pass a polynomial of order n − 1 through n points so that a polynomial of
sufficiently high degree can always be found that provides a “good” fit to the data.
2. Model-Building Strategy: Various strategies for choosing the order of an
approximating polynomial have been suggested. One approach is to
successively fit models of increasing order until the t test for the highest order
term is nonsignificant. An alternate procedure is to appropriately fit the
highest order model and then delete terms one at a time, starting with the
highest order, until the highest order remaining term has a significant t
statistic. These two procedures are called forward selection and backward
elimination, respectively. They do not necessarily lead to the same model. In
light of the comment in 1 above, these procedures should be used carefully. In
most situations we should restrict our attention to first- and second-order
polynomials.
3. Extrapolation: Extrapolation with polynomial models can be extremely
hazardous. For example, consider the second-order model. If we extrapolate
beyond the range of the original data, the predicted response turns downward.
This may be at odds with the true behavior of the system. In general,
polynomial models may turn in unanticipated and inappropriate directions,
both in interpolation and in extrapolation.
4. Ill-Conditioning I: As the order of the polynomial increases, the X′X
matrix becomes ill-conditioned. This means that the matrix
inversion calculations will be inaccurate, and considerable error may be
introduced into the parameter estimates.
5. Ill-Conditioning II: If the values of x are limited to a narrow range,
there can be significant ill-conditioning or multicollinearity in the
columns of the X matrix. For example, if x varies between 1 and 2, $x^2$
varies between 1 and 4, which could create strong multicollinearity
between x and $x^2$.
6. Hierarchy: The regression model $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$
is said to be hierarchical because it contains all terms of order 3 and
lower. By contrast, the model $y = \beta_0 + \beta_1 x + \beta_3 x^3 + \varepsilon$ is not
hierarchical. Only hierarchical models are invariant under linear
transformation, which suggests that all polynomial models should have this
property.
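The first-stage adjustment of the transformation parameter described under item 1 (fit y on x, then fit y on x and w = x ln x, and set α1 = γ̂/β̂1 + 1) can be sketched as follows. This is a minimal illustration with NumPy; the helper names and the noise-free data generated with a true power of 0.5 are hypothetical.

```python
import numpy as np


def ols(columns, y):
    """Least-squares coefficients for an intercept plus the given regressor columns."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def first_stage_alpha(x, y):
    """One adjustment step starting from the initial guess alpha_0 = 1."""
    # Step 1: fit y = b0 + b1 * x to estimate beta_1.
    _, b1 = ols([x], y)
    # Step 2: add the regressor w = x ln x and estimate gamma.
    w = x * np.log(x)
    _, _, gamma = ols([x, w], y)
    # Step 3: revised estimate alpha_1 = gamma / b1 + 1.
    return gamma / b1 + 1.0


# Hypothetical data: y is linear in x**0.5, so the true alpha is 0.5.
x = np.linspace(1.0, 10.0, 50)
y = 1.0 + 2.0 * np.sqrt(x)

alpha_1 = first_stage_alpha(x, y)
print(f"first-stage estimate of alpha: {alpha_1:.3f}")
```

The step moves the estimate below the starting value of 1, in the direction of the true power. To iterate, one would replace x by x raised to the current estimate and repeat the same two fits, multiplying the new adjustment into the running value of α.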
Ridge Regression
The mean square error of the estimator $\hat{\beta}^*$ is defined as
$$MSE(\hat{\beta}^*) = E(\hat{\beta}^* - \beta)^2 = var(\hat{\beta}^*) + [E(\hat{\beta}^*) - \beta]^2$$
$$= var(\hat{\beta}^*) + (\text{bias in } \hat{\beta}^*)^2$$
A number of procedures have been developed for obtaining biased estimators of regression
coefficients. One of these procedures is ridge regression , originally proposed by Hoerl and Kennard.
The ridge estimator is found by solving a slightly modified version of the normal equations.
Specifically we define the ridge estimator 𝛽መ𝑅 as the solution to

$$(X'X + kI)\hat{\beta}_R = X'y$$
or
$$\hat{\beta}_R = (X'X + kI)^{-1} X'y$$

where k ≥ 0 is a constant selected by the analyst. The procedure is called ridge regression because the
underlying mathematics are similar to the method of ridge analysis.
The ridge estimator is a linear transformation of the least-squares estimator,
$$\hat{\beta}_R = (X'X + kI)^{-1} X'y = (X'X + kI)^{-1} (X'X) \hat{\beta} = Z_k \hat{\beta}$$
Therefore, since $E(\hat{\beta}_R) = E(Z_k \hat{\beta}) = Z_k \beta$, which differs from $\beta$ when k > 0, $\hat{\beta}_R$ is a biased estimator of $\beta$. We usually refer to the
constant k as the biasing parameter. The covariance matrix of $\hat{\beta}_R$ is
$$Var(\hat{\beta}_R) = \sigma^2 (X'X + kI)^{-1} X'X (X'X + kI)^{-1}$$
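The mean square error of the ridge estimator is
$$MSE(\hat{\beta}_R) = Var(\hat{\beta}_R) + (\text{bias in } \hat{\beta}_R)^2$$
$$= \sigma^2\, Tr[(X'X + kI)^{-1} X'X (X'X + kI)^{-1}] + k^2 \beta' (X'X + kI)^{-2} \beta$$
$$= \sigma^2 \sum_{j=1}^{p} \frac{\lambda_j}{(\lambda_j + k)^2} + k^2 \beta' (X'X + kI)^{-2} \beta$$
where $\lambda_1, \lambda_2, \dots, \lambda_p$ are the eigenvalues of $X'X$.
If A is a square matrix and x is a nonzero vector such that $Ax = \lambda x$,
then $\lambda$ is an eigenvalue of A and x is the corresponding eigenvector.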
The ridge estimates may result in an equation that does a better job of predicting future observations
than would least squares. Hoerl and Kennard have suggested that an appropriate value of k may be
determined by inspection of the ridge trace. The ridge trace is a plot of the elements of $\hat{\beta}_R$ versus k for
values of k usually in the interval 0–1. Marquardt and Snee [1975] suggest using up to about 25 values
of k, spaced approximately logarithmically over the interval [0, 1]. If multicollinearity is severe, the
instability in the regression coefficients will be obvious from the ridge trace. As k is increased, some of
the ridge estimates will vary dramatically.
At some value of k, the ridge estimates $\hat{\beta}_R$ will stabilize. The objective is to select a
reasonably small value of k at which the ridge estimates $\hat{\beta}_R$ are stable. Hopefully this
will produce a set of estimates with smaller MSE than the least-squares estimates.
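The ridge estimator and the values that would be plotted in a ridge trace can be computed directly from the formula $(X'X + kI)^{-1}X'y$. The sketch below uses NumPy on simulated data; the data, the helper names, the centering and unit-length scaling of the regressors, and the particular grid of k (zero plus 24 logarithmically spaced values up to 1) are assumptions for illustration.

```python
import numpy as np


def unit_length_scale(X):
    """Center each column and scale it to unit length, so X'X is in correlation form."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)


def ridge_estimator(X, y, k):
    """Solve (X'X + kI) beta = X'y for the ridge estimate."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


# Hypothetical data with x1 and x2 nearly collinear.
rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
x3 = rng.normal(size=n)
y = 1.0 * x1 + 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

X = unit_length_scale(np.column_stack([x1, x2, x3]))
y = y - y.mean()

# Grid of biasing parameters: k = 0 (least squares) plus 24 log-spaced values.
ks = np.concatenate(([0.0], np.logspace(-4, 0, 24)))
trace = np.array([ridge_estimator(X, y, k) for k in ks])

for k, beta in zip(ks, trace):
    print(f"k = {k:8.5f}   beta_R = {np.round(beta, 3)}")
```

Plotting each column of `trace` against `ks` gives the ridge trace. The first row (k = 0) is the least-squares estimate, and the coefficients of the two collinear regressors typically change quickly for small k before settling, which is the behaviour one looks for when choosing k.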
Principal - Component Regression
Biased estimators of regression coefficients can also be obtained by using a procedure
known as principal-component regression. Consider the canonical form of the
model,
$$y = Z\alpha + \varepsilon$$
where $Z = XT$, $\alpha = T'\beta$, and $T'X'XT = Z'Z = \Lambda$.
Here $\Lambda = diag(\lambda_1, \lambda_2, \dots, \lambda_p)$ is a p × p diagonal matrix of the eigenvalues of $X'X$ and T is a p × p
orthogonal matrix whose columns are the eigenvectors associated with
$\lambda_1, \lambda_2, \dots, \lambda_p$.
The columns of Z, which define a new set of orthogonal regressors, are referred to as principal components;
$\lambda_j$ is the eigenvalue associated with the jth principal component.
If all $\lambda_j = 1$ (with $X'X$ in correlation form), the original regressors are orthogonal.
If $\lambda_j = 0$, this implies a perfect linear relationship between the original
regressors.
One or more of the $\lambda_j$ near zero implies that multicollinearity is
present.
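The canonical form above suggests a simple computation: obtain the eigenvalues and eigenvectors of X′X, form Z = XT, estimate α on the principal components, and set to zero the components whose eigenvalues are near zero before transforming back with β = Tα. The following NumPy sketch illustrates this under stated assumptions: the data are simulated, the regressors are centered and scaled to unit length, and the number of dropped components (`n_drop`) is chosen by hand.

```python
import numpy as np


def principal_component_regression(X, y, n_drop):
    """Principal-component estimate of beta, dropping the n_drop smallest-eigenvalue components."""
    eigenvalues, T = np.linalg.eigh(X.T @ X)
    order = np.argsort(eigenvalues)[::-1]          # largest eigenvalue first
    eigenvalues, T = eigenvalues[order], T[:, order]
    Z = X @ T                                      # orthogonal regressors, Z'Z = diag(eigenvalues)
    alpha_hat = (Z.T @ y) / eigenvalues            # least squares on each component
    if n_drop > 0:
        alpha_hat[-n_drop:] = 0.0                  # discard near-zero-eigenvalue components
    return T @ alpha_hat, eigenvalues


# Hypothetical data with x1 and x2 nearly collinear.
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
x3 = rng.normal(size=n)
y = 1.0 * x1 + 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

Xc = np.column_stack([x1, x2, x3])
Xc = Xc - Xc.mean(axis=0)
X = Xc / np.linalg.norm(Xc, axis=0)               # X'X is now the correlation matrix
y = y - y.mean()

beta_all, eigenvalues = principal_component_regression(X, y, n_drop=0)
beta_pc, _ = principal_component_regression(X, y, n_drop=1)

print("eigenvalues:            ", np.round(eigenvalues, 4))
print("all components (OLS):   ", np.round(beta_all, 3))
print("smallest component dropped:", np.round(beta_pc, 3))
```

Keeping all components reproduces the least-squares estimate, while dropping the component with the near-zero eigenvalue removes the direction in which the data carry almost no information, at the cost of some bias.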
