
Comparing Regression Models with F-Test

The document discusses the comparison of two regression models, specifically the full model and reduced model, using the Partial F-test to assess the significance of a subset of variables. It explains how to evaluate the change in sum of squares before and after including additional variables and provides an example of applying this analysis using SAS. The document also covers sequential sum squares regression and its implications for model interpretation.

Uploaded by

zzhang2494-c

MULTIPLE REGRESSION
Part D: Comparing Two Regression Models

Outline
• Comparing Two Regression Models
• Full Model vs. Reduced Model
• Partial $F$-test
• Change in $SSE$
• Sequential Sum of Squares Regression

Comparing Two Regression Models
• Adjusted $R^2$
  • Only describes the fit within the sample; it does not test significance
• We want to test the significance of a subset of $X$ variables as a group in the presence of the others
  • Suppose we have $X_1, X_2, \ldots, X_L, X_{L+1}, \ldots, X_K$
  • Examine the contribution of $X_{L+1}, \ldots, X_K$ to the relationship with the $Y$ variable in the presence of $X_1, X_2, \ldots, X_L$

Comparing Two Regression Models – Full Model vs. Reduced Model
• Full model
  • Has $K$ $X$ variables
$$\hat{Y} = b_0 + b_1 X_1 + \cdots + b_L X_L + b_{L+1} X_{L+1} + \cdots + b_K X_K$$

• Reduced model
  • Has $L$ $X$ variables
  • The subset of $X$ variables being tested is not in it
$$\hat{Y} = b_0 + b_1 X_1 + \cdots + b_L X_L$$

Partial $F$-test
• To test the significance of a subset of $X$ variables as a group
• $H_0: \beta_{L+1} = \beta_{L+2} = \cdots = \beta_K = 0$
  • The $X$ variables in the subset do not significantly improve the model when all the other $X$ variables are included
• $H_1$: At least one of the $\beta_{L+1}, \beta_{L+2}, \ldots, \beta_K \neq 0$
  • At least one $X$ variable in the subset has a slope coefficient that differs from zero

Partial $F$-test – Change in $SSE$
• Look at how much the $SSE$ changes before and after the inclusion of the subset of $X$ variables
• $SST = SSR + SSE$

| | Reduced Model (Before Including the Subset) | Full Model (After Including the Subset) |
|---|---|---|
| Number of $X$ variables | Fewer, from $X_1$ to $X_L$ | More, from $X_1$ to $X_K$ |
| $SSR$ and $R^2$ | Smaller | Larger |
| $SSE$ | Larger | Smaller |

Partial $F$-test – Change in $SSE$
$$\text{Partial } F = \frac{(SSE_R - SSE_F)/(K - L)}{SSE_F/(n - K - 1)} \quad \text{with } (K - L),\ (n - K - 1) \text{ df}$$
where $SSE_R$ = $SSE$ of the reduced model
      $SSE_F$ = $SSE$ of the full model
      $K$ = no. of $X$ variables in the full model
      $L$ = no. of $X$ variables in the reduced model
$p$-value = $P(F_{(K-L),(n-K-1)} \geq \text{Partial } F)$
Reject $H_0$ if Partial $F$ > CV = $F_{\alpha,(K-L),(n-K-1)}$ or $p$-value < $\alpha$
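
The formula above translates directly into a few lines of code. The following is a minimal sketch and not part of the original slides: the function name `partial_f` and its argument names are illustrative assumptions, and the inputs are the $SSE$ values, $K$, $L$, and $n$ quoted in the apartment example later in these slides.

```python
def partial_f(sse_reduced, sse_full, k_full, l_reduced, n):
    """Return the partial F statistic with its numerator and denominator df.

    sse_reduced, sse_full: SSE of the reduced and full models
    k_full, l_reduced:     number of X variables in each model
    n:                     number of observations
    """
    df_num = k_full - l_reduced   # K - L
    df_den = n - k_full - 1       # n - K - 1
    if df_num <= 0 or df_den <= 0:
        raise ValueError("need K > L and n > K + 1")
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f_stat, df_num, df_den


# Values quoted in the apartment example of these slides.
f_stat, df_num, df_den = partial_f(2035.9835, 146.02231,
                                   k_full=4, l_reduced=2, n=218)
print(f"Partial F = {f_stat:.3f} with {df_num}, {df_den} df")
```

The statistic is then compared with the critical value $F_{\alpha,(K-L),(n-K-1)}$ taken from an $F$ table or from statistical software.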

Example
• To a certain extent, size and number of bedrooms both measure how big the apartment is
• Examine them as a group to indicate whether “area” is related to apartment price
  • Given that floor level and age of the building are included in the model

Example – SAS Program
• There is no short-cut in SPSS

proc reg data = [Link] ;
  model price = size bedrooms floor age ;  /* Full model */
  model price = floor age ;                /* Reduced model, without the X variables being tested */
run ;

Alternatively,
proc reg data = [Link] ;
  model price = size bedrooms floor age ;
  test size = 0, bedrooms = 0 ;  /* List of X variables being tested for slope coefficient equal to 0 */
run ;

Example – SAS Output
[Figure: SAS output for the Full Model and the Reduced Model]

Example – Discussion of Output
| Model | $K$ | $R^2$ | Adjusted $R^2$ | $F$-test statistic ($p$-value) |
|---|---|---|---|---|
| Full | 4 | 0.9346 | 0.9333 | 760.54 (< 0.0001) |
| Reduced | 2 | 0.0876 | 0.0792 | 10.33 (< 0.0001) |

• Including size and number of bedrooms increased the adjusted $R^2$ substantially, from 0.0792 to 0.9333
• The $p$-values of Size and Bedrooms are less than 0.001, showing that both significantly affect price, but we may not need to have both of them in the model

Example – Partial $F$-test
$H_0: \beta_{\text{Size}} = \beta_{\text{Bedrooms}} = 0$
$H_1$: At least one of the $\beta_{\text{Size}}, \beta_{\text{Bedrooms}} \neq 0$

At $\alpha$ = 5%, df = 2, 213, CV = 3.00

$$\text{Partial } F = \frac{(2035.9835 - 146.02231)/(4 - 2)}{146.02231/(218 - 4 - 1)} = 1378.425$$

Reject $H_0$; “area” significantly affects apartment price
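
As a cross-check that is not part of the original slides: both models are fitted to the same $n$ observations, so they share the same $SST$. Dividing the numerator and denominator of the partial $F$ by $SST$ therefore expresses the statistic through the two $R^2$ values. Plugging in the rounded $R^2$ values from the output table (0.9346 and 0.0876) gives roughly the same statistic; the small gap from 1378.425 comes from rounding $R^2$ to four decimals.

```latex
\text{Partial } F
  = \frac{(R_F^2 - R_R^2)/(K - L)}{(1 - R_F^2)/(n - K - 1)}
  = \frac{(0.9346 - 0.0876)/(4 - 2)}{(1 - 0.9346)/(218 - 4 - 1)}
  \approx 1379
```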

Example – SAS Output
• From the alternative SAS program using the “test” statement

[Figure: SAS output of the “test” statement, showing the partial $F$-test statistic and its $p$-value]

Partial $F$-test – Sequential Sum of Squares Regression
• Type I SS
  • The $SSR$ due to a particular $X$ variable after including all the preceding $X$ variables
$$\text{Type I SS} = SSR(X_{L+1} \mid X_1, \ldots, X_L)$$
  • Increment of $SSR$ by having an extra $X$ variable (i.e. $X_{L+1}$)
  • The sequence in which the variables are entered into the model affects the Type I SS
    • $SSR(X_1)$, $SSR(X_2 \mid X_1)$, $SSR(X_3 \mid X_1, X_2)$ vs.
    • $SSR(X_3)$, $SSR(X_2 \mid X_3)$, $SSR(X_1 \mid X_2, X_3)$
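
The order dependence can be seen on a tiny data set. The sketch below is illustrative only: the data are hypothetical toy numbers (not the apartment data), the helper `ssr` is an assumed name, and the least-squares fit is done with plain normal equations rather than with SAS. It uses two correlated predictors and computes the sequential sums of squares in both entry orders.

```python
def ssr(y, predictors):
    """Regression sum of squares for y on an intercept plus the predictors."""
    n = len(y)
    rows = [[1.0] + [x[i] for x in predictors] for i in range(n)]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
           + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    coef = [aug[a][p] / aug[a][a] for a in range(p)]
    y_bar = sum(y) / n
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    return sum((f - y_bar) ** 2 for f in fitted)


# Hypothetical toy data: x1 and x2 are correlated with each other.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 7, 12, 11]

ssr_all = ssr(y, [x1, x2])
# Order A: X1 entered first, then X2 -> SSR(X1), SSR(X2 | X1)
order_a = (ssr(y, [x1]), ssr_all - ssr(y, [x1]))
# Order B: X2 entered first, then X1 -> SSR(X2), SSR(X1 | X2)
order_b = (ssr(y, [x2]), ssr_all - ssr(y, [x2]))

print("X1 then X2:", [round(v, 3) for v in order_a])
print("X2 then X1:", [round(v, 3) for v in order_b])
print("SSR(All):  ", round(ssr_all, 3))
```

Both orders add up to the same $SSR(All)$, but the amount credited to each variable differs, because the variable entered first absorbs the part of the variation it shares with the other.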

Type I SS – Sequential Sum of Squares Regression
[Figure: diagram of Price, Size, and Floor, with regions labelled $SSR(Size)$ and $SSR(Floor \mid Size)$]
Overlapping area: Considered in $SSR(Size)$, as Size is the first $X$ variable being entered
I.e. $SSR(Size) + SSR(Floor \mid Size) = SSR(All)$

Type I SS – Sequential Sum of Squares Regression
• For a regression model with 2 $X$ variables ($X_1$ and $X_2$)
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$
• $SSR(All) = SSR(X_1) + SSR(X_2 \mid X_1)$
• Partial $F$-test using Type I SS
$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
$$\text{Partial } F = \frac{SSR(X_2 \mid X_1)/1}{SSE(All)/(n - K - 1)} \quad \text{with } 1,\ (n - K - 1) \text{ df}$$

Type I SS – Sequential Sum of Squares Regression
• For a regression model with $K$ $X$ variables
$$\hat{Y} = b_0 + b_1 X_1 + \cdots + b_L X_L + b_{L+1} X_{L+1} + \cdots + b_K X_K$$
• Partial $F$-test using Type I SS
$H_0: \beta_{L+1} = \beta_{L+2} = \cdots = \beta_K = 0$
$H_1$: At least one of the $\beta_{L+1}, \beta_{L+2}, \ldots, \beta_K \neq 0$
$$\text{Partial } F = \frac{SSR(X_{L+1}, \ldots, X_K \mid X_1, \ldots, X_L)/(K - L)}{SSE(All)/(n - K - 1)} \quad \text{with } (K - L),\ (n - K - 1) \text{ df}$$
where
$$SSR(X_{L+1}, \ldots, X_K \mid X_1, \ldots, X_L) = SSR(X_{L+1} \mid X_1, \ldots, X_L) + SSR(X_{L+2} \mid X_1, \ldots, X_{L+1}) + \cdots + SSR(X_K \mid X_1, \ldots, X_{K-1})$$

Example – SAS Program
proc reg data = [Link] ;
  model price = floor age size bedrooms / SS1 ;  /* X variables being tested are listed towards the end */
run ;

Example – SAS Output
[Figure: SAS output with the Type I SS column; its rows are annotated as follows]
• $SSR(Floor)$
• $SSR(Age \mid Floor)$
• $SSR(Size \mid Floor, Age)$
• $SSR(Bedrooms \mid Floor, Age, Size)$

Example – Partial $F$-test
$H_0: \beta_{\text{Size}} = \beta_{\text{Bedrooms}} = 0$
$H_1$: At least one of the $\beta_{\text{Size}}, \beta_{\text{Bedrooms}} \neq 0$

At $\alpha$ = 5%, df = 2, 213, CV = 3.00

$$\text{Partial } F = \frac{[SSR(Size \mid Floor, Age) + SSR(Bedrooms \mid Floor, Age, Size)]/(K - L)}{SSE(All)/(n - K - 1)} = \frac{[1807.53642 + 82.42478]/(4 - 2)}{146.02231/(218 - 4 - 1)} = 1378.425$$

Reject $H_0$; “area” significantly affects apartment price

The same partial $F$-test statistic and conclusion as in the analysis using the change in $SSE$
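
The agreement between the two routes can be checked numerically. This is a small illustrative sketch, not part of the original slides; the variable names are assumptions, and all numbers are the ones quoted in the apartment example.

```python
# Type I SS values quoted in the slides (tested variables listed last).
ssr_size_given_floor_age = 1807.53642
ssr_bedrooms_given_floor_age_size = 82.42478

# SSE of the reduced and full models quoted in the slides.
sse_reduced = 2035.9835
sse_full = 146.02231
n, k, l = 218, 4, 2

# Extra regression sum of squares from the two tested variables.
extra_ss = ssr_size_given_floor_age + ssr_bedrooms_given_floor_age_size
mse_full = sse_full / (n - k - 1)

f_from_type1 = (extra_ss / (k - l)) / mse_full
f_from_sse = ((sse_reduced - sse_full) / (k - l)) / mse_full

print(f"Sum of Type I SS      = {extra_ss:.5f}")
print(f"SSE_R - SSE_F         = {sse_reduced - sse_full:.5f}")
print(f"Partial F (Type I SS) = {f_from_type1:.3f}")
print(f"Partial F (SSE)       = {f_from_sse:.3f}")
```

The sum of the two Type I SS values equals the drop in $SSE$ up to rounding in the printed output, which is why both versions of the test give the same statistic.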
Partial $F$-test – Sequential Sum of Squares Regression
• What will happen if we put the $X$ variables being tested at the beginning of the SAS model statement?

proc reg data = [Link] ;
  model price = size bedrooms floor age / SS1 ;  /* X variables being tested are listed at the beginning */
run ;

Example – SAS Output
Type I SS changed!
Ordering matters!

[Figure: SAS output with the Type I SS column; its rows are annotated as follows]
• $SSR(Size)$
• $SSR(Bedrooms \mid Size)$
• $SSR(Floor \mid Size, Bedrooms)$
• $SSR(Age \mid Size, Bedrooms, Floor)$

Type I SS vs. Type II SS
• Type I SS is a decomposition of $SSR$, measuring the contributions of the $X$ variables in a specific order
  • Used for the partial $F$-test
• Type II SS is the partial contribution of an $X$ variable, after accounting for all the other $X$ variables in the model
  • Related to the $t$-test
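
The link to the $t$-test can be written out for a single variable $X_j$. This is a standard result added here for illustration and not taken from the slides; the symbol $s_{b_j}$ for the standard error of $b_j$ is an assumed notation. When only one coefficient is tested ($H_0: \beta_j = 0$), the partial $F$ built from the sum of squares of $X_j$ adjusted for all the other variables equals the square of the $t$ statistic of $b_j$ in the full model.

```latex
\text{Partial } F
  = \frac{SSR(X_j \mid \text{all other } X \text{ variables})/1}{SSE(All)/(n - K - 1)}
  = t_j^2,
\qquad
t_j = \frac{b_j}{s_{b_j}}
```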
