Simple linear regression (Chapters 2 to 4)
• Model: $Y_i = \alpha + \beta X_i + U_i$, with $U_i \sim N(0, \sigma^2)$ and the $U_i$ pairwise uncorrelated
• Estimation of the parameters:
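  $a = \bar{y} - b\,\bar{x}$
  $b = r\,\dfrac{s_y}{s_x}$
  with $r$ the sample correlation coefficient between x and y:
  $r = \dfrac{1}{n-1} \sum_{i=1}^{n} \left( \dfrac{x_i - \bar{x}}{s_x} \right) \left( \dfrac{y_i - \bar{y}}{s_y} \right)$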
• Estimated model: $\hat{y}_i = a + b\,x_i$
• Residual: $\hat{e}_i = y_i - \hat{y}_i = y_i - (a + b\,x_i)$
• Estimation of the model standard deviation (Standard Error of the Regression):
  $\hat{\sigma} = \sqrt{\dfrac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} = \sqrt{\dfrac{1}{n-2} \sum_{i=1}^{n} \hat{e}_i^2}$
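The estimation formulas above can be checked numerically. The following is a minimal sketch, assuming R's built-in `cars` data set as example data (that choice, and all variable names, are illustrative assumptions and not part of the course material).

```r
# Example data: built-in `cars` data set (an assumption for illustration)
x <- cars$speed
y <- cars$dist
n <- length(x)

r <- cor(x, y)                  # sample correlation coefficient
b <- r * sd(y) / sd(x)          # slope:     b = r * s_y / s_x
a <- mean(y) - b * mean(x)      # intercept: a = ybar - b * xbar

y_hat <- a + b * x              # fitted values
e_hat <- y - y_hat              # residuals
sigma_hat <- sqrt(sum(e_hat^2) / (n - 2))   # standard error of the regression

# Cross-check against lm()
model <- lm(y ~ x)
coef(model)             # should agree with c(a, b)
summary(model)$sigma    # should agree with sigma_hat
```

The hand-computed values and the `lm()` output should coincide up to floating-point rounding.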
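• Inference for the regression parameters:
  o Inference for the slope
      Estimator:            $B$
      Standard error:       $SE_B = \dfrac{\hat{\sigma}}{\sqrt{n-1}\; s_x}$
      Test statistic:       $T = \dfrac{B - \beta}{SE_B} \sim t_{n-2}$
      Confidence interval:  $\beta = B \pm t_{n-2,\,\alpha/2}\; SE_B$
  o Inference for the intercept
      Estimator:            $A$
      Standard error:       $SE_A = \hat{\sigma} \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{(n-1)\, s_x^2}}$
      Test statistic:       $T = \dfrac{A - \alpha}{SE_A} \sim t_{n-2}$
      Confidence interval:  $\alpha = A \pm t_{n-2,\,\alpha/2}\; SE_A$
  (In the subscript of the critical value $t_{n-2,\,\alpha/2}$, $\alpha$ denotes the significance level, not the intercept.)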
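As a rough numerical check of the standard errors, test statistics, and confidence intervals for the slope and intercept, here is a sketch on the built-in `cars` data (the data set, the 5% significance level, and the variable names are assumptions chosen for illustration).

```r
x <- cars$speed
y <- cars$dist
n <- length(x)

model <- lm(y ~ x)
a <- unname(coef(model)[1])          # estimated intercept
b <- unname(coef(model)[2])          # estimated slope
sigma_hat <- summary(model)$sigma    # standard error of the regression

# Standard errors of slope and intercept
se_b <- sigma_hat / (sqrt(n - 1) * sd(x))
se_a <- sigma_hat * sqrt(1 / n + mean(x)^2 / ((n - 1) * var(x)))

# t-test of H0: beta = 0 (two-sided)
t_b <- (b - 0) / se_b
p_b <- 2 * pt(abs(t_b), df = n - 2, lower.tail = FALSE)

# 95% confidence intervals
t_crit <- qt(1 - 0.05 / 2, df = n - 2)
ci_b <- b + c(-1, 1) * t_crit * se_b
ci_a <- a + c(-1, 1) * t_crit * se_a

# Cross-check against built-in output
summary(model)$coefficients
confint(model, level = 0.95)
```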
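• Confidence and prediction interval for a given x:
  o Confidence interval for the mean of Y at a given x, $\mu_{Y|X=x}$
      Standard error:  $SE_{\hat{\mu}} = \hat{\sigma} \sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{(n-1)\, s_x^2}}$
      Interval:        $\hat{y} \pm t_{n-2,\,\alpha/2}\; SE_{\hat{\mu}}$
  o Prediction interval for Y at a given x, $Y|X=x$
      Standard error:  $SE_{\hat{y}} = \hat{\sigma} \sqrt{1 + \dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{(n-1)\, s_x^2}}$
      Interval:        $\hat{y} \pm t_{n-2,\,\alpha/2}\; SE_{\hat{y}}$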
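The two intervals differ only in the extra "1 +" under the square root. A small sketch, assuming the built-in `cars` data and a hypothetical value `x0 = 15` for the given x:

```r
model <- lm(dist ~ speed, data = cars)
n <- nrow(cars)
x0 <- 15                                  # hypothetical x value for illustration

sigma_hat <- summary(model)$sigma
x_bar <- mean(cars$speed)
s2_x  <- var(cars$speed)

# Point prediction at x0
y0_hat <- unname(coef(model)[1] + coef(model)[2] * x0)

# Standard errors for the mean response and for a new observation
se_mean <- sigma_hat * sqrt(1 / n + (x0 - x_bar)^2 / ((n - 1) * s2_x))
se_pred <- sigma_hat * sqrt(1 + 1 / n + (x0 - x_bar)^2 / ((n - 1) * s2_x))

t_crit  <- qt(0.975, df = n - 2)          # 95% level
ci_mean <- y0_hat + c(-1, 1) * t_crit * se_mean
pi_new  <- y0_hat + c(-1, 1) * t_crit * se_pred

# Cross-check against predict()
new_x <- data.frame(speed = x0)
predict(model, newdata = new_x, interval = "confidence", level = 0.95)
predict(model, newdata = new_x, interval = "prediction", level = 0.95)
```

The prediction interval is always wider than the confidence interval, because it also accounts for the variation of an individual observation around the mean.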
• Assumptions:
o (SR1) $Y = \alpha + \beta X + U$
o (SR2) E(U|X=x) = 0 for every x
o (SR3) Var(U|X=x) = $\sigma^2$
o (SR4) Cov(Ui, Uj) = 0 for i≠j
o (SR5) X is not constant
o (SR6) U|X=x is normally distributed
• Test for correlation: $H_0: \rho = 0$ versus $H_a: \rho \neq 0$
  Test statistic: $T = R \sqrt{\dfrac{n-2}{1-R^2}} \sim t_{n-2}$
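The correlation test statistic can be computed by hand and compared with R's `cor.test()`. A minimal sketch, again assuming the built-in `cars` data as example input:

```r
x <- cars$speed
y <- cars$dist
n <- length(x)

R <- cor(x, y)                                # sample correlation
T_stat <- R * sqrt((n - 2) / (1 - R^2))       # test statistic, t with n-2 df under H0
p_value <- 2 * pt(abs(T_stat), df = n - 2, lower.tail = FALSE)

# Cross-check: Pearson correlation test
ct <- cor.test(x, y)
ct$statistic    # should agree with T_stat
ct$p.value      # should agree with p_value
```

In simple linear regression this statistic equals the t-statistic for the slope, so both tests lead to the same conclusion.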
Multiple linear regression (Chapters 5 to 8)
• Model: $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + U_i$, with $U_i \sim N(0, \sigma^2)$ and the $U_i$ pairwise uncorrelated
• Estimated model: $\hat{y}_i = a + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki}$
• Residual: $\hat{e}_i = y_i - \hat{y}_i = y_i - (a + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki})$
• Estimation of the model standard deviation (Regression Standard Error):
  $\hat{\sigma} = \sqrt{\dfrac{1}{n-k-1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} = \sqrt{\dfrac{1}{n-k-1} \sum_{i=1}^{n} \hat{e}_i^2} = \sqrt{MSE}$
• ANOVA table for multiple regression:
  Source               SS    DF       MS = SS/DF       F         p-value
  Regression (Model)   SSR   k        MSR = SSR/DFR    MSR/MSE   p-value
  Error (Residual)     SSE   n−k−1    MSE = SSE/DFE
  Total                SST   n−1
• Explained variation due to the model:
  SSR = Sum of Squares Regression = $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
• Unexplained variation (variation around the model):
  SSE = Sum of Squares Error = $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{e}_i^2$
• Total variation of the dependent variable:
  SST = Sum of Squares Total = $\sum_{i=1}^{n} (y_i - \bar{y})^2$
• SSR + SSE = SST
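The decomposition and the ANOVA table entries can be reproduced step by step. The sketch below assumes the built-in `mtcars` data and a hypothetical model with k = 2 regressors (`wt` and `hp`); both choices are purely illustrative.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)
y <- mtcars$mpg
n <- length(y)
k <- 2                                   # number of regressors

y_hat <- fitted(model)

SSR <- sum((y_hat - mean(y))^2)          # explained variation
SSE <- sum((y - y_hat)^2)                # unexplained variation
SST <- sum((y - mean(y))^2)              # total variation

MSR <- SSR / k
MSE <- SSE / (n - k - 1)
F_stat  <- MSR / MSE                     # overall ANOVA F statistic
p_value <- pf(F_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

sigma_hat <- sqrt(MSE)                   # regression standard error

# Cross-check against summary()
summary(model)$fstatistic
summary(model)$sigma
```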
• $R^2$ = SSR / SST = 1 − SSE / SST
  = fraction of the variation in the dependent variable that can be explained by the model
  = $\left( r(y, \hat{y}) \right)^2$
• $R^2_{Adj} = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)}$
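The three expressions for R² and the adjusted R² can be compared numerically. A brief sketch, assuming the built-in `mtcars` data and an illustrative model with k = 2 regressors:

```r
model <- lm(mpg ~ wt + hp, data = mtcars)
y <- mtcars$mpg
n <- length(y)
k <- 2

y_hat <- fitted(model)
SSE <- sum((y - y_hat)^2)
SST <- sum((y - mean(y))^2)
SSR <- SST - SSE

R2_a <- SSR / SST                # explained fraction
R2_b <- 1 - SSE / SST            # one minus unexplained fraction
R2_c <- cor(y, y_hat)^2          # squared correlation between y and fitted values

R2_adj <- 1 - (SSE / (n - k - 1)) / (SST / (n - 1))

# Cross-check against summary()
summary(model)$r.squared
summary(model)$adj.r.squared
```

Unlike R², the adjusted version penalizes additional regressors, so it can decrease when an uninformative variable is added.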
• Overall ANOVA F-test for multiple regression:
  $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ versus $H_a$: not all $\beta_i = 0$
• T-test for the individual parameters:
  $H_0: \beta_i = 0$ versus $H_a: \beta_i \neq 0$
  Test statistic:
  $T = \dfrac{B_i - \beta_i}{SE_{B_i}} \sim t_{n-k-1}$
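For the individual t-tests, the test statistics and two-sided p-values can be recomputed from the estimates and standard errors reported by `summary()`. This is a sketch under the assumption of the built-in `mtcars` data and an illustrative two-regressor model; the standard errors themselves are taken from R's output rather than derived here.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)
k <- 2

coef_table <- summary(model)$coefficients

# T = (B_i - 0) / SE_{B_i} under H0: beta_i = 0
t_stat <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Two-sided p-values from the t distribution with n - k - 1 degrees of freedom
p_val <- 2 * pt(abs(t_stat), df = n - k - 1, lower.tail = FALSE)

cbind(t_stat, p_val)   # compare with the "t value" and "Pr(>|t|)" columns
coef_table
```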
• Joint F-test to test j restrictions:
  Test statistic:
  $F = \dfrac{(SSE_R - SSE_U)/j}{SSE_U/(n-p)} = \dfrac{(R_U^2 - R_R^2)/j}{(1 - R_U^2)/(n-p)} \sim F(j,\, n-p)$
  with j the number of restrictions, n the number of cases, and p the number of coefficients to be estimated in the model without restrictions (p = k + 1). The subscripts R and U refer to the restricted and the unrestricted model.
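Both forms of the joint F statistic can be verified against `anova()`. The sketch below assumes the built-in `mtcars` data; the unrestricted model with four regressors and the restriction that the coefficients of `qsec` and `drat` are zero (j = 2) are hypothetical choices for illustration.

```r
unrestricted <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
restricted   <- lm(mpg ~ wt + hp, data = mtcars)   # H0: coefficients of qsec and drat are 0

n <- nrow(mtcars)
p <- length(coef(unrestricted))    # coefficients in the unrestricted model (k + 1)
j <- 2                             # number of restrictions

SSE_U <- sum(residuals(unrestricted)^2)
SSE_R <- sum(residuals(restricted)^2)
F_sse <- ((SSE_R - SSE_U) / j) / (SSE_U / (n - p))

R2_U <- summary(unrestricted)$r.squared
R2_R <- summary(restricted)$r.squared
F_r2 <- ((R2_U - R2_R) / j) / ((1 - R2_U) / (n - p))

p_value <- pf(F_sse, df1 = j, df2 = n - p, lower.tail = FALSE)

# Cross-check: F-test comparing the two nested models
anova(restricted, unrestricted)
```

The R² form is valid here because both models have the same dependent variable and therefore the same SST.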
Overview of some R commands (examples)
Task                                                           R command
Scatter plot                                                   plot(x, y)
Correlation                                                    cor(x, y)
Linear regression model                                        model <- lm(y ~ x1 + x2 + x3)
Output of the model                                            summary(model)
Confidence intervals for the parameters                        confint(model, level=0.95)
Packages for heteroskedasticity-corrected (robust)             sandwich, lmtest
  standard errors                                              (load with e.g. library(sandwich))
t-tests with heteroskedasticity-corrected standard errors      coeftest(model1, vcov=vcovHC(model1, type="HC1"))
Add regression line to scatter plot                            abline(model)
Residuals of the regression model                              residuals(model) or model$residuals
Predicted y-values (ŷ)                                         fitted.values(model) or model$fitted.values
R² of the model                                                summary(model)$r.squared
Histogram                                                      hist(x)
Normal quantile plot                                           qqnorm(x)
F-test for comparing 2 models                                  anova(model1, model2)
F-test for comparing 2 models (robust SE)                      waldtest(model1, model2, vcov=vcovHC(model2, type="HC1"))
Creating dummies                                               dummy1 <- (x == cat1) or dummy1 <- (x == cat1)*1
Descriptive statistics                                         stat.desc from the package pastecs
Descriptive statistics                                         describe from the package psych
Cumulative distribution function of a normal distribution      pnorm
Inverse cumulative distribution function of a normal           qnorm
  distribution
t-distribution (c.d.f., inverse c.d.f.)                        pt, qt
F-distribution (c.d.f., inverse c.d.f.)                        pf, qf
Binomial distribution (probability mass function, c.d.f.,      dbinom, pbinom, qbinom
  inverse c.d.f.)
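To illustrate how the distribution functions in the table are typically called, here is a short sketch; the specific arguments (probabilities, degrees of freedom, sample size) are arbitrary values chosen for illustration.

```r
# Normal distribution: c.d.f. and its inverse
p_norm <- pnorm(1.96)                 # P(Z <= 1.96) for standard normal Z
z_crit <- qnorm(0.975)                # quantile with 97.5% of the mass below it

# t-distribution: two-sided critical value and p-value, 48 degrees of freedom
t_crit <- qt(1 - 0.05 / 2, df = 48)
p_t    <- 2 * pt(2.5, df = 48, lower.tail = FALSE)

# F-distribution: upper-tail p-value and 5% critical value for F(2, 27)
p_f    <- pf(4.2, df1 = 2, df2 = 27, lower.tail = FALSE)
f_crit <- qf(0.95, df1 = 2, df2 = 27)

# Binomial distribution with n = 10 trials and success probability 0.5
d_bin <- dbinom(3, size = 10, prob = 0.5)    # P(X = 3)
p_bin <- pbinom(3, size = 10, prob = 0.5)    # P(X <= 3)
q_bin <- qbinom(0.5, size = 10, prob = 0.5)  # median
```

The `p...` functions return lower-tail probabilities by default; passing `lower.tail = FALSE` gives the upper tail, which is what p-values for t- and F-tests require.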