Regression Analysis Techniques Explained
[Figure: scatter plots of Y vs X, including panels titled "Curvilinear Relationships" and "No Relationship"]
ABHISHEK VASHISHTH | IIM SHILLONG
Simple Linear Regression Model
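$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where:
$Y_i$ = dependent variable
$\beta_0$ = population Y intercept
$\beta_1$ = population slope coefficient
$X_i$ = independent variable
$\varepsilon_i$ = random error term
The linear component is $\beta_0 + \beta_1 X_i$; the random error component is $\varepsilon_i$.
[Figure: population regression line with Intercept = β0 and Slope = β1. For a given Xi, the observed value of Y differs from the predicted value of Y by the random error εi for this Xi value.]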
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line.
$$\hat{Y}_i = b_0 + b_1 X_i$$
where:
$\hat{Y}_i$ = estimated (or predicted) Y value for observation i
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$X_i$ = value of X for observation i
[Figure: scatter plot of House Price ($1000s) vs Square Feet]
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Analysis of Variance
Source DF SS MS F P
Regression 1 18935 18935 11.08 0.010
Residual Error 8 13666 1708
Total 9 32600
[Figure: House Price ($1000s) vs Square Feet with the prediction line; Intercept = 98.248, Slope = 0.10977]
Predicted house price = 98.248 + 0.10977 (square feet), with price measured in $1000s.
[Figure: House Price ($1000s) vs Square Feet]
Do not try to extrapolate beyond the range of observed X's.
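One way to enforce this rule in code is a prediction function that refuses inputs outside the observed X range. This is a sketch under an assumption: the range 1,100 to 2,450 square feet comes from X values reconstructed from the residual output and the prediction line (Ŷ = 98.248 + 0.10977X), not from a data table on the slides. The function name `predict_within_range` is hypothetical.

```python
B0, B1 = 98.248, 0.10977   # rounded intercept and slope from the slides

# ASSUMED observed range of X (square feet), reconstructed from the example data.
X_MIN, X_MAX = 1100, 2450

def predict_within_range(square_feet: float) -> float:
    """Predict price ($1000s), raising ValueError instead of extrapolating."""
    if not X_MIN <= square_feet <= X_MAX:
        raise ValueError(
            f"{square_feet} sq ft is outside the observed range [{X_MIN}, {X_MAX}]; "
            "the fitted line says nothing reliable there"
        )
    return B0 + B1 * square_feet

print(predict_within_range(2000))      # inside the range: allowed
try:
    predict_within_range(5000)         # far outside the range: rejected
except ValueError as err:
    print("refused:", err)
```

Raising an error (rather than silently returning a number) forces the caller to notice that the request lies outside the data the line was fitted to.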
Computing b0 & b1
For small data sets, b0 & b1 can be calculated using a hand calculator
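$$b_1 = \frac{SSXY}{SSX} \qquad b_0 = \bar{Y} - b_1 \bar{X}$$
where:
$$SSXY = \sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}$$
$$SSX = \sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}$$
$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} \quad \text{and} \quad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
For the house-price data (n = 10):
$$SSXY = \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n} = 5{,}085{,}975 - \frac{(17{,}150)(2{,}865)}{10} = 172{,}500$$
$$SSX = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n} = 30{,}983{,}750 - \frac{17{,}150^2}{10} = 1{,}571{,}500$$
$$\bar{Y} = \frac{2{,}865}{10} = 286.5 \quad \text{and} \quad \bar{X} = \frac{17{,}150}{10} = 1{,}715$$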
Computing b0 & b1 -- Home Prices (Cont'd)
$$b_1 = \frac{SSXY}{SSX} = \frac{172{,}500}{1{,}571{,}500} \approx 0.109768$$
$$b_0 = \bar{Y} - b_1 \bar{X} = 286.5 - (0.109768)(1{,}715) \approx 98.248$$
Rounded, b1 = 0.10977 and b0 = 98.248, the slope and intercept of the prediction line.
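The hand calculation can be checked with a short Python sketch of the same computational formulas. The ten (X, Y) pairs are an assumption: they are reconstructed from the residual output (Y = predicted value + residual) and the prediction line, not listed on the slides, but they reproduce every sum used above (ΣX = 17,150, ΣY = 2,865, ΣXY = 5,085,975, ΣX² = 30,983,750). The function name `least_squares` is hypothetical.

```python
# ASSUMED data: square feet (X) and price in $1000s (Y), reconstructed from the slides.
X = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
Y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

def least_squares(x, y):
    """Return (b0, b1, ssxy, ssx) using the computational formulas for SSXY and SSX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    ssxy = sum_xy - sum_x * sum_y / n    # SSXY = sum XY - (sum X)(sum Y)/n
    ssx = sum_x2 - sum_x ** 2 / n        # SSX  = sum X^2 - (sum X)^2/n
    b1 = ssxy / ssx                      # slope
    b0 = sum_y / n - b1 * sum_x / n      # intercept: Y-bar - b1 * X-bar
    return b0, b1, ssxy, ssx

b0, b1, ssxy, ssx = least_squares(X, Y)
print(round(ssxy), round(ssx))           # 172500 1571500
print(round(b1, 5), round(b0, 3))        # 0.10977 98.248
```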
Excel Output of the Measures of Variation
$$r^2 = \frac{SSR}{SST} = \frac{18{,}934.9348}{32{,}600.5000} = 0.58082$$
Note: 0 ≤ r² ≤ 1.
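A minimal sketch of how the three sums of squares and r² relate, using the ten (X, Y) pairs reconstructed from the residual output (an assumption; the slides give only their sums). It should reproduce SST = 32,600.5, SSR ≈ 18,934.93, SSE ≈ 13,665.57 and r² ≈ 0.58082.

```python
# ASSUMED data reconstructed from the slides: square feet (X), price in $1000s (Y).
X = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
Y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(Y)
x_bar, y_bar = sum(X) / n, sum(Y) / n
ssx = sum((x - x_bar) ** 2 for x in X)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / ssx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in X]                       # predicted values

sst = sum((y - y_bar) ** 2 for y in Y)                 # total variation in Y
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # variation explained by the line
sse = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))    # unexplained (error) variation
r_square = ssr / sst                                   # coefficient of determination

print(f"SST={sst:.4f} SSR={ssr:.4f} SSE={sse:.4f} r^2={r_square:.5f}")
```

Because SST = SSR + SSE and both parts are non-negative, r² = SSR/SST must lie between 0 and 1.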
Examples of Approximate r² Values
[Figure: scatter plots of Y vs X for r² = 1, 0 < r² < 1, and r² = 0]
r² = 1: perfect linear relationship between X and Y; every point lies on the regression line.
0 < r² < 1: weaker linear relationship; some, but not all, of the variation in Y is explained by X.
r² = 0: no linear relationship between X and Y.
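The standard error of the estimate measures the variation of the observed Y values around the prediction line:
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}}$$
where:
SSE = error sum of squares
n = sample size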
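A minimal Python sketch of this formula, applied to the ten residuals listed in the residual output of the house-price example. Because those residuals are rounded to four decimals, the result matches the reported Standard Error of 41.33032 only to about three decimals. The function name is hypothetical.

```python
import math

def standard_error_of_estimate(residuals):
    """S_YX = sqrt(SSE / (n - 2)), where SSE is the sum of squared residuals."""
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least 3 observations for n - 2 > 0")
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (n - 2))

# Residuals from the residual output of the house-price example (rounded values).
residuals = [-6.9232, 38.1233, -5.8535, 3.9371, -19.9928,
             -49.3883, 48.7975, -43.1793, 64.3326, -29.8535]

s_yx = standard_error_of_estimate(residuals)
print(f"S_YX = {s_yx:.3f}")
```

The divisor is n − 2 rather than n because two parameters (b0 and b1) were estimated from the same data.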
Residual Analysis for Linearity
[Figure: Y vs x and residuals vs x for two cases, labelled "Not Linear" and "Linear"]
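Residual Analysis for Independence
[Figure: residuals vs X for two cases, one with a cyclical pattern and one with no cyclical pattern]
A cyclical pattern in the residuals indicates that they are not independent; no cyclical pattern is consistent with independence.
[Figure: normal probability plot of the residuals (Percent vs Residual)]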
Residual Analysis for Equal Variance
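[Figure: Y vs x and the corresponding residuals vs x for two cases]
The spread of the residuals should be roughly constant across x; a spread that widens or narrows as x changes indicates unequal variance.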
RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
1             251.9232                -6.9232
2             273.8767                38.1233
3             284.8535                -5.8535
4             304.0628                3.9371
5             218.9928                -19.9928
6             268.3883                -49.3883
7             356.2025                48.7975
8             367.1793                -43.1793
9             254.6674                64.3326
10            284.8535                -29.8535
[Figures: residual plots for the house-price example, including residuals vs Square Feet and Residuals vs Predicted Value]
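A minimal sketch of how a residual output like this is produced: fit the line, compute each predicted value Ŷi, and take the residual ei = Yi − Ŷi. The (X, Y) pairs are an assumption, reconstructed from the table itself and the prediction line; the printed values should agree with the table to within rounding in the fourth decimal.

```python
# ASSUMED data reconstructed from the slides: square feet (X), price in $1000s (Y).
X = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
Y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in X]                        # Y-hat_i
residuals = [y - y_hat for y, y_hat in zip(Y, predicted)]   # e_i = Y_i - Y-hat_i

print("Observation  Predicted  Residual")
for i, (p, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:>11}  {p:9.4f}  {e:8.4f}")

# Least-squares residuals always sum to (numerically) zero.
print(f"sum of residuals: {sum(residuals):.2e}")
```

Plotting `residuals` against `X` or `predicted` gives the residual plots used to check linearity and equal variance.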
[Figure: residuals vs time (t)]
Here, the residuals show a cyclical pattern rather than a random one. Cyclical patterns are a sign of positive autocorrelation.
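As a simple numeric companion to the plot (not something these slides compute), the lag-1 autocorrelation of time-ordered residuals is positive when neighbouring residuals tend to share the same sign, as in a cyclical pattern. The two residual series below are hypothetical, and the function name `lag1_autocorrelation` is illustrative.

```python
import math

def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of residuals ordered in time."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("residuals have zero variance")
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))  # products of neighbours
    return num / denom

# Hypothetical residual series (NOT from the slides).
cyclical = [10 * math.sin(t / 2) for t in range(20)]  # slow wave: neighbours alike
alternating = [5, -5] * 10                            # flips sign every period

r_cyc = lag1_autocorrelation(cyclical)
r_alt = lag1_autocorrelation(alternating)
print(f"cyclical: {r_cyc:.3f}   alternating: {r_alt:.3f}")
```

The cyclical series gives a clearly positive value, matching the visual sign of positive autocorrelation; the alternating series gives a strongly negative one.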
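$$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$$
where:
$S_{b_1}$ = estimate of the standard error of the slope
$S_{YX} = \sqrt{SSE/(n-2)}$ = standard error of the estimate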
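H0: β1 = 0
H1: β1 ≠ 0
Test statistic:
$$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad d.f. = n - 2$$
where:
$b_1$ = regression slope coefficient
$\beta_1$ = hypothesized slope
$S_{b_1}$ = standard error of the slope
$$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$
d.f. = 10 − 2 = 8; with α/2 = .025 in each tail, the critical values are ±2.3060.
Decision: Reject H0, since t_STAT = 3.32938 > 2.3060.
p-value approach: the p-value is 0.01039 (the same as Significance F in the ANOVA table, because t_STAT² = F_STAT in simple regression).
Decision: Reject H0, since p-value < α.
There is sufficient evidence that square footage affects house price.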
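A minimal sketch of the slope t test, starting from values on the slides (S_YX = 41.33032, SSX = 1,571,500, b1 = 0.10977, n = 10). The Python standard library has no t-distribution quantile function, so the two-tailed critical value 2.3060 for α = 0.05 and 8 d.f. is hard-coded from a t table; that is a design shortcut for the sketch, not part of the method.

```python
import math

# Values from the slides (house-price example).
S_YX = 41.33032     # standard error of the estimate
SSX = 1_571_500     # sum of squared deviations of X
b1 = 0.10977        # estimated slope
n = 10

beta1_h0 = 0.0                      # hypothesized slope under H0
s_b1 = S_YX / math.sqrt(SSX)        # standard error of the slope
t_stat = (b1 - beta1_h0) / s_b1     # test statistic
df = n - 2

# Two-tailed critical value for alpha = 0.05 with 8 d.f., taken from a t table.
T_CRIT = 2.3060
reject_h0 = abs(t_stat) > T_CRIT

print(f"S_b1 = {s_b1:.5f}, t_STAT = {t_stat:.3f}, d.f. = {df}, reject H0: {reject_h0}")
```

If SciPy is available, `scipy.stats.t.ppf(0.975, 8)` would replace the hard-coded table value.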
Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10
From the ANOVA table, MSR = 18,934.9348 and MSE = 1,708.1957, so F_STAT = MSR/MSE = 11.0848 with 1 and 8 degrees of freedom; the p-value for the F test (Significance F) is 0.01039.
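The summary statistics above can all be derived from the ANOVA sums of squares. A minimal sketch, assuming the usual definitions: Multiple R = √r², adjusted r² = 1 − (1 − r²)(n − 1)/(n − k − 1) with k = 1 independent variable, and Standard Error = √(SSE/(n − k − 1)).

```python
import math

# Sums of squares from the ANOVA table of the house-price example.
SSR, SSE = 18934.9348, 13665.5652
n, k = 10, 1                      # observations, independent variables

SST = SSR + SSE                   # total sum of squares (32,600.5)
r_square = SSR / SST              # R Square
multiple_r = math.sqrt(r_square)  # Multiple R (|r| in simple regression)
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
standard_error = math.sqrt(SSE / (n - k - 1))   # S_YX

print(f"Multiple R        {multiple_r:.5f}")
print(f"R Square          {r_square:.5f}")
print(f"Adjusted R Square {adj_r_square:.5f}")
print(f"Standard Error    {standard_error:.5f}")
```

Adjusted R Square is smaller than R Square because it penalizes the fit for the number of estimated parameters relative to the sample size.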
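F Test for the Slope
H0: β1 = 0
H1: β1 ≠ 0
α = .05, df1 = 1, df2 = 8
Test statistic:
$$F_{STAT} = \frac{MSR}{MSE} = 11.08$$
Critical value: $F_{.05} = 5.32$
[Figure: F distribution with the rejection region (area α = .05) to the right of F.05 = 5.32, labelled "Reject H0", and the region to its left labelled "Do not reject H0"]
Decision: Reject H0 at α = 0.05, since F_STAT = 11.08 > 5.32.
Conclusion: There is sufficient evidence that house size affects selling price.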
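A minimal sketch of the F test from the ANOVA sums of squares, using the critical value F.05 = 5.32 given on the slide rather than computing it (the standard library has no F quantile function). It also checks the simple-regression identity t_STAT² = F_STAT against the t statistic reported earlier.

```python
# Values from the ANOVA table of the house-price example.
SSR, SSE = 18934.9348, 13665.5652
n, k = 10, 1                       # observations, independent variables

df1, df2 = k, n - k - 1            # 1 and 8 degrees of freedom
MSR = SSR / df1                    # mean square due to regression
MSE = SSE / df2                    # mean square error
F_stat = MSR / MSE

F_CRIT = 5.32                      # upper 5% critical value of F(1, 8), from the slide
reject_h0 = F_stat > F_CRIT

# In simple regression the F test and the two-tailed slope t test are equivalent.
t_stat = 3.32938                   # t_STAT reported for the slope test
print(f"F_STAT = {F_stat:.4f}, t_STAT^2 = {t_stat ** 2:.4f}, reject H0: {reject_h0}")
```

Because F_STAT equals t_STAT², both tests give the same p-value (0.01039) and the same decision.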