Chapter 4
Linear Regression with One Regressor
(SW Chapter 4)
Linear regression allows us to make inference
about population slope coefficients. How do we
best characterize the following relationship?
Income Data 2008 Census
Our objectives in linear regression are, in general, the same
as for Chapter 3:
Estimation:
How should we draw a line through the data to estimate
the (population) slope? (Answer: ordinary least squares.)
What are the advantages and disadvantages of OLS?
Hypothesis testing:
How to test for a relationship in the population?
Confidence intervals:
How to construct a set of numbers that best
characterize that relationship?
Some Notation and Terminology (SW Section 4.1)
The Population Linear Regression
Model: general notation
Linearity
We assume that X and Y have a linear relationship.
Two components of Y
Remember, Y is a random variable that is composed
of two parts:
1. Systematic component: \(E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i\)
This is the mean of Y, given X.
2. Random component: \(u_i = Y_i - E(Y_i \mid X_i)\)
This is the error component.
Where does randomness come from?
a) Omitted variables
b) Measurement error
c) Inherent randomness of behavior
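The split into a systematic and a random component can be sketched with simulated data. The parameter values, the range of X, and the normal error distribution below are hypothetical choices made only for illustration; they are not taken from the income data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (assumptions, not estimates from the slides)
beta0, beta1 = 2.0, 0.5
n = 1000

x = rng.uniform(8, 20, size=n)        # regressor X
u = rng.normal(0.0, 1.0, size=n)      # random component, drawn with mean zero

systematic = beta0 + beta1 * x        # E(Y | X): the systematic component
y = systematic + u                    # observed Y = systematic + random

# Subtracting the conditional mean from Y recovers the error term
recovered_u = y - systematic
print("average error:", recovered_u.mean())
```

In real data the population line is unknown, so the errors cannot be recovered this way; the sketch only illustrates the definition.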
The Ordinary Least Squares Estimator
(SW Section 4.2)
The OLS estimator solves:
\[ \min_{b_0, b_1} \sum_{i=1}^{n} \left[ Y_i - (b_0 + b_1 X_i) \right]^2 \]
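A minimal sketch of this minimization, using the standard closed-form solution for one regressor (slope = sample covariance of X and Y divided by sample variance of X, intercept = mean of Y minus slope times mean of X). The function name `ols_fit` and the small data set are hypothetical.

```python
import numpy as np

def ols_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_squared_residuals(b0, b1, x, y):
    """Objective function that OLS minimizes."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = ols_fit(x, y)
ssr_at_ols = sum_squared_residuals(b0, b1, x, y)

# Moving either coefficient away from the OLS values raises the objective
ssr_shifted = sum_squared_residuals(b0 + 0.1, b1 - 0.05, x, y)
print(b0, b1, ssr_at_ols, ssr_shifted)
```

Comparing the objective at the OLS values with the objective at nearby values is a quick sanity check that the formulas do pick out the minimum.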
Interpretation of the estimated slope
and intercept
Predicted values & residuals:
One individual in the data set is a 47-year-old white
male who is married with 2 children.
He received 18 years of education and earns
$100,000 in wage and salary income.
Label him as observation 1 and use a 1 subscript.
\(\hat{Y}_1 = -42.92417 + 6.08968 \times 18 = 66.69025\) thousand dollars
\(\hat{u}_1 = 100 - 66.69025 = 33.30975\) thousand dollars
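The same arithmetic can be sketched in a few lines. The coefficients are the rounded values printed above, so the results agree with the figures on the slide only to about three decimal places; the variable names are hypothetical.

```python
# Rounded OLS estimates as printed on the slide (income in thousands of dollars)
beta0_hat = -42.92417
beta1_hat = 6.08968

education_years = 18      # observation 1
income_actual = 100.0     # $100,000, expressed in thousands

# Predicted value: the point on the estimated line at X = 18
income_predicted = beta0_hat + beta1_hat * education_years

# Residual: actual minus predicted
residual = income_actual - income_predicted

print(f"predicted: {income_predicted:.3f} thousand dollars")
print(f"residual:  {residual:.3f} thousand dollars")
```

A positive residual means this individual earns more than the line predicts for someone with 18 years of education.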
OLS regression: SAS output
Baseball Application: Team Wins
v. Team Salary (2003)
\(\widehat{\text{Wins}} = 65.13 + 0.20 \times \text{Payroll}\)
Estimated slope: \(\hat{\beta}_1 = 0.20\)
Estimated intercept: \(\hat{\beta}_0 = 65.13\)
Interpretation of parameter
estimates
\(\widehat{\text{Wins}} = 65.13 + 0.20 \times \text{Payroll}\)
Teams that spend $5M more for total team payroll
have, on average, 1 more win
The intercept (taken literally) means that, according
to the estimation line, teams with zero team payroll
have a predicted win total of 65.13 wins
The literal interpretation of the intercept makes no sense: it
extrapolates the line outside the range of the data. Here, the
intercept is not economically meaningful.
Predicted values and residuals
Atlanta Braves
\(\widehat{\text{Wins}} = 65.13 + 0.20 \times \text{Payroll}\)
The Atlanta Braves spent $103.91M and won 101
games
Predicted value: \(\hat{Y}_i = 65.13 + 0.20 \times 103.91 = 85.61\)
Residual: \(\hat{u}_i = 101 - 85.61 = 15.39\)
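A short sketch of the slope interpretation and of the Braves calculation, using the rounded coefficients 65.13 and 0.20 and payroll measured in millions of dollars. With these rounded values the prediction comes out near 85.9 rather than 85.61; the slide's figure presumably comes from unrounded estimates, which is an assumption here. The function name `predict_wins` is hypothetical.

```python
# Rounded estimates from the fitted line (payroll in millions of dollars)
INTERCEPT = 65.13
SLOPE = 0.20

def predict_wins(payroll_millions):
    """Predicted wins on the estimated regression line."""
    return INTERCEPT + SLOPE * payroll_millions

# Slope interpretation: a $5M payroll difference shifts predicted wins by 0.20 * 5
extra_wins_per_5m = predict_wins(105.0) - predict_wins(100.0)

# Atlanta Braves: payroll of $103.91M, 101 actual wins
braves_predicted = predict_wins(103.91)
braves_residual = 101 - braves_predicted

print(f"extra wins per $5M: {extra_wins_per_5m:.2f}")
print(f"Braves predicted wins: {braves_predicted:.2f}")
print(f"Braves residual: {braves_residual:.2f}")
```

Either way the residual is large and positive: the Braves won many more games than their payroll alone would predict.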
OLS Regression: SAS Output
Measures of Fit
(Section 4.3)
Another way to calculate \(R^2\)
The sum of squared residuals (SSR) is the part of the
total sum of squares (TSS) that remains after we
remove the part of the variation that is
explained by the model (ESS).
\[ R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS} \]
\[ SSR = \sum_{i=1}^{n} \hat{u}_i^2 \]
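The two ways of computing \(R^2\) can be checked numerically. The sketch below fits OLS to a small hypothetical data set and confirms that ESS/TSS and 1 - SSR/TSS agree, which holds because TSS = ESS + SSR for an OLS fit with an intercept.

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# OLS fit with one regressor (closed-form solution)
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

y_hat = b0 + b1 * x          # predicted values
residuals = y - y_hat        # OLS residuals

tss = np.sum((y - y_bar) ** 2)       # total sum of squares
ess = np.sum((y_hat - y_bar) ** 2)   # explained sum of squares
ssr = np.sum(residuals ** 2)         # sum of squared residuals

r2_from_ess = ess / tss
r2_from_ssr = 1 - ssr / tss
print(r2_from_ess, r2_from_ssr)
```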
The Standard Error of the
Regression (SER)
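A minimal sketch of the SER calculation, assuming the usual textbook definition for one regressor, SER = sqrt(SSR / (n - 2)), where n - 2 is the degrees-of-freedom correction for the two estimated coefficients. The data are hypothetical.

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(y)

# OLS fit with one regressor (closed-form solution)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
ssr = np.sum(residuals ** 2)

# Assumed definition: divide by n - 2 because two coefficients were estimated
ser = np.sqrt(ssr / (n - 2))
print(f"SER = {ser:.4f} (same units as Y)")
```

Unlike \(R^2\), the SER is measured in the units of Y, so it describes the typical size of a residual directly.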
Example of the \(R^2\) and the SER
OLS Regression: SAS Output
Payroll explains only a small fraction of the variation in team wins.
Does this make it unimportant? Can small-market teams compete?
The Least Squares Assumptions
(SW Section 4.4)
LSA #1: E(u|X = x) = 0.
LSA #2: (Xi, Yi), i = 1, ..., n, are i.i.d.
LSA #3: Large outliers are rare
OLS can be sensitive to an outlier:
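A small sketch of that sensitivity: ten hypothetical points lie exactly on a line with slope 0.5, and then a single Y value is corrupted (for example by a data-entry error). All numbers are invented for illustration.

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS intercept and slope for one regressor."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Hypothetical clean data lying exactly on Y = 1 + 0.5 X
x = np.arange(1.0, 11.0)
y = 1.0 + 0.5 * x
b0_clean, b1_clean = ols_fit(x, y)

# Corrupt one observation at the edge of the X range
y_outlier = y.copy()
y_outlier[-1] += 50.0
b0_out, b1_out = ols_fit(x, y_outlier)

print(f"slope without outlier: {b1_clean:.3f}")
print(f"slope with one outlier: {b1_out:.3f}")
```

Because OLS squares the residuals, one extreme observation far from the mean of X can pull the fitted slope a long way, which is why LSA #3 requires large outliers to be rare.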
The Sampling Distribution of the
OLS Estimator (SW Section 4.5)
Probability Framework for Linear
Regression
The Sampling Distribution of \(\hat{\beta}_1\)
What is the sampling distribution of \(\hat{\beta}_1\)?
The sampling distribution of \(\hat{\beta}_1\):
The larger the variance of X, the
smaller the variance of \(\hat{\beta}_1\)
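This claim can be illustrated with a small Monte Carlo sketch. The population model (intercept 1, slope 2, standard normal errors independent of X), the sample size, and the number of replications are all hypothetical choices. In this setup the large-sample variance of the slope estimator is roughly var(u) / (n var(X)), so tripling the standard deviation of X should cut the variance by a factor of about nine.

```python
import numpy as np

def slope_sampling_variance(x_sd, n=100, reps=2000, seed=0):
    """Simulate many samples and return the variance of the OLS slope."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    for r in range(reps):
        x = rng.normal(0.0, x_sd, size=n)   # regressor with chosen spread
        u = rng.normal(0.0, 1.0, size=n)    # error term, independent of X
        y = 1.0 + 2.0 * x + u               # hypothetical population line
        slopes[r] = (np.sum((x - x.mean()) * (y - y.mean()))
                     / np.sum((x - x.mean()) ** 2))
    return slopes.var()

# Same seed in both calls so the comparison is not driven by different draws
var_narrow = slope_sampling_variance(x_sd=1.0)
var_wide = slope_sampling_variance(x_sd=3.0)

print(f"variance of slope, sd(X) = 1: {var_narrow:.5f}")
print(f"variance of slope, sd(X) = 3: {var_wide:.5f}")
```

Intuitively, when X is spread out, the data pin down the tilt of the line more tightly, so the slope estimate varies less from sample to sample.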