Assignment#4
4/25/2020
---------------------------------------------------------------------------------------------------------------------
a) Draw the scatter plot of the data, if you can plot via R would also be
acceptable.
# Reading data from Excel CSV file
xyData<-[Link]([Link](),header = T)
attach(xyData)
plot(Area_Sq_Feet,Price_Dollars_1000,main = "Scatter Plot: Area and Selling
Price")
*SCATTER PLOT USING EXCEL*
A
r
e
a
S
q.
F
e Pric
e e($ x-
t 100 sq
b) Derive the normal (x 0)- ua equation by least square fit
method. ) (y) xy re
1
6
Normal equations by least 1 9 10 square fit method are given
below; 0 0 00
0 0 00
0 169 0 0
To form above two equations, we need the
following; 1
9
1 0 12
Sum of x, Sum of y, Sum of xy, Sum of x-square
1 3 10
0 0 00
Here: 0 173 0 0
y = Price (Dependent 2 variable) and x = Area
(Independent variable) 1
1 1 14
2 2 40
0 0 00
0 176 0 0
2
4
1 9 19
4 2 60
0 0 00
0 178 0 0
2
9
1 7 25
6 6 60
0 0 00
0 186 0 0
3
5
1 4 32
8 6 40
0 0 00
0 197 0 0
3
8
1 5 36
9 7 10
0 0 00
0 203 0 0
3
9
1 5 36
Formulate above (1) and (2) Normal equations;
11*a + b*18300 = 2216 ..................................(1)
18300 * a + b * 32290000 = 38033400 ................ (2)
c) Solve the normal equations derived from step 2 simultaneously and
compute the unknown parameters.
By solving (1) and (2), we get unknown parameters 'a' and 'b'.
To cancel parameter 'a' in the above equations, we do the following operations;
(1) x 18300
(2) x 11
we get:
''a'' ''b''
Coefficient Coefficient RHS
1st
Equation 201300 334890000 40552800
2nd
Equation 201300 355190000 41837400
(2) - (1) 0 20300000 1284600
Solving the table, we get:
b = 0.0633
Substitute this b in (1) we get,
a = 96.15
d) Find sum of square error (SSE): ∑ (Y − Ŷ )
Area
[Link]
(x) Price (y) xy x-square Y-hat (Y - Y-hat)
1000 169 169000 1000000 159.45 9.55
1100 173 190300 1210000 165.78 7.22
1200 176 211200 1440000 172.11 3.89
1400 178 249200 1960000 184.77 -6.77
1600 186 297600 2560000 197.43 -11.43
1800 197 354600 3240000 210.09 -13.09
1900 203 385700 3610000 216.42 -13.42
1900 208 395200 3610000 216.42 -8.42
2100 229 480900 4410000 229.08 -0.08
2100 237 497700 4410000 229.08 7.92
2200 260 572000 4840000 235.41 24.59
Total 18300 2216 3803400 32290000 -0.04
∑ (Y − Ŷ) = -0.04
e) Predict the price which you have an 1850 square feet area plot.
To predicted value of Y when X=1850, we need to find the regression equation:
Y=a1+b1X+b2X^2
Table for calculation:
Observati
ons X Y X^2 X^3 X^4 XY X^2*Y
10000000 10000000 16900000
1 1000 169 1000000 00 00000 169000 0
13310000 14641000 20933000
2 1100 173 1210000 00 00000 190300 0
17280000 20736000 25344000
3 1200 176 1440000 00 00000 211200 0
27440000 38416000 34888000
4 1400 178 1960000 00 00000 249200 0
40960000 65536000 47616000
5 1600 186 2560000 00 00000 297600 0
58320000 10497600 63828000
6 1800 197 3240000 00 000000 354600 0
68590000 13032100 73283000
7 1900 203 3610000 00 000000 385700 0
68590000 13032100 75088000
8 1900 208 3610000 00 000000 395200 0
92610000 19448100 10098900
9 2100 229 4410000 00 000000 480900 00
92610000 19448100 10451700
10 2100 237 4410000 00 000000 497700 00
10648000 23425600 12584000
11 2200 260 4840000 000 000000 572000 00
59619000 11381650 68922600
Total 18300 2216 32290000 000 0000000 3803400 00
The normal equations for the quadratic equation are as follows
b2 ∑xi^4 + b1 ∑xi^3 + a∑xi^2 = ∑xi^2yi
b2 ∑xi^3 + b1 ∑xi^2 + a∑xi = ∑xiyi
b2 ∑xi^2 + b1 ∑xi + an = ∑yi
So, putting the values we have:
113816500000000 b2 +59619000000 b1+ 32290000 a =6892260000 ...... (1)
59619000000 b2+32290000 b1 + 18300 a = 3803400......(2)
32290000 b2 + 18300 b1 + 11 a = 2216......(3)
Dividing both sides of (1) by 10000 we get,
11381650000 b2 + 5961900 b1 + 3229 a = 689226...... (4)
Dividing both sides of (2) by 100 we get,
596190000 b2 + 322900 b1 +183 a = 38034......(5)
So, solving the final equations (3), (4) and (5)
32290000 b2 + 18300 b1 + 11 a = 2216......(3)
11381650000 b2 + 5961900 b1 + 3229 a = 689226...... (4)
596190000 b2 + 322900 b1 +183 a = 38034......(5)
Solving we get,
Solving a, b and c we have,
b2 = 1523/18260000=0.0000834
b1 = -215397/1059080=-0.20338
a = 7809958/26477 = 294.9714
So, the regression equation is:
Y = 294.97 - 0.20338X + 0.0000834X^2
The predicted value of Y when X = 1850 is
Y = 294.97 - 0.20338 * 1850 + 0.0000834 * 1850^2 = 204.1535
So, Y=204 approximately (in $1000)
f) Verify that you get the following equation at the end.
Y = 294.97 − 203.38X + 83.40X^2
As Y's units was 1000, we have the modified equation as:
Y = 294.9714 - 0.20338 * 1000X + 0.0000834 * 1000^2 * X^2
Y = 294.9714 - 203.38X + 83.40X^2 is the final equation
g) In this case, what will be the impact, when you fit a linear regression
instead of polynomial? Justify your answer in points.
Now, we fit a linear regression equation: Y = a + b * X
b = slope of the regression = r * SD(y) / SD(X)
a = ybar – b * xbar = intercept
ybar = mean of Y = ∑yi / n
xbar = mean of X = ∑xi / n
r = correlation coefficient = ∑(x-xbar) (y-ybar) /√ {(∑(x-xbar) ^2*∑ (y-ybar) ^2}
SD(x) = Standard deviation of x = √(∑(x-xbar) ^2/(n-1)
SD(y) = Standard deviation of y = √(∑(y-ybar) ^2/(n-1)
Table for calculation:
Observa X Y (X-Xbar) (Y-Ybar) (X- (X- (Y-
tions Xbar)*(Y Xbar)^2 Ybar)^2
-Ybar)
21537.9 440412. 1053.29
1 1000 169 -663.636 -32.4545 7456 7405 457
16037.9 317685. 809.658
2 1100 173 -563.636 -28.4545 8056 5405 5703
11801.6 214958. 647.931
3 1200 176 -463.636 -25.4545 2256 3405 5703
6183.45 69503.9 550.113
4 1400 178 -263.636 -23.4545 0562 405 5703
983.462 4049.54 238.841
5 1600 186 -63.636 -15.4545 562 0496 5703
-
607.433 18595.1 19.8425
6 1800 197 136.364 -4.4545 438 405 7025
365.300 55867.9 2.38857
7 1900 203 236.364 1.5455 562 405 025
1547.12 55867.9 42.8435
8 1900 208 236.364 6.5455 0562 405 7025
12019.8 190413. 758.754
9 2100 229 436.364 27.5455 6456 5405 5703
15510.7 190413. 1263.48
10 2100 237 436.364 35.5455 7656 5405 257
31401.6 287686. 3427.57
11 2200 260 536.364 58.5455 9856 3405 557
Xbar=16 Ybar=20 116781. 1845454 8814.72
Total 18300 2216 63.636 1.4545 8182 .545 7273
So b=[Link]/sx
Now r =∑(x-xbar)(y-ybar)/√{( ∑(x-xbar)^2*∑ (y-
ybar)^2}=116781.8181/√(1845454.5455*8814.7273)
=0.9156
Sy=SD(y) =√(∑(y-ybar)^2/(n-1) =√(8814.7273/10)=29.6896
Sx= SD (x)=√(∑(x-xbar)^2/(n-1)=√(1845454.545/10)=429.5875
b=0.9156*429.5875/29.6896=0.06327
a=ybar-b*xbar=201.4545-0.06327*1663.636=96.1963
So, the Linear Regression is
Y=96.1963+0.06327*X
The predicted value of Y for X=1850 in the linear regression is:
Y=96.1963+0.06327*1850=213.2458=213 approximately (in $1000)