Least-Squares Regression
Regression Line (Model)
It has the form y = a + bx,
where b is the slope, the amount by which y
changes when x increases by 1 unit
where a is the intercept, the value of y when x =
0
Slope: $b = r \, \dfrac{s_y}{s_x}$
Intercept: $a = \bar{y} - b\bar{x}$
The least-squares regression line is the line that makes
the sum of the squares of the vertical distances of the
data points from the line as small as possible.
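The slope and intercept formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name least_squares_line and the five data points are made up for the example, and sample standard deviations (n - 1 in the denominator) are assumed.

```python
from math import sqrt

def least_squares_line(xs, ys):
    """Return (a, b) for y = a + b*x using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (assumption: n - 1 in the denominator).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Correlation coefficient r.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

# Made-up illustrative data, not taken from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

For this toy data the fitted slope is 0.6 and the intercept is 2.2.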
Linear Regression
Purpose: To predict the value of a difficult-to-measure
variable, Y, based on an easy-to-measure
variable, X.
Examples
Predict state revenues
Predict GPA based on SAT
Predict reaction time from blood alcohol level
Extrapolation
Extrapolation is the use of a regression line for
prediction far outside the range of values of the
independent variable x that you used to obtain the
line. Such predictions are often not accurate.
GRE consideration? Be Careful!
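One way to stay careful about extrapolation is to have the prediction step report whether x falls outside the range used to fit the line. The sketch below is only an illustration: the function name predict, its parameters, and the coefficients passed in are hypothetical, and it flags any x outside the fitted range rather than judging how far outside is "too far".

```python
def predict(a, b, x, x_min, x_max):
    """Return (prediction, is_extrapolation) for the line y = a + b*x.

    x_min and x_max are the smallest and largest x values used to fit the line.
    """
    is_extrapolation = x < x_min or x > x_max
    return a + b * x, is_extrapolation

# Hypothetical line fitted on x values between 1 and 5.
for x in (3.0, 50.0):
    y_hat, flagged = predict(2.2, 0.6, x, x_min=1.0, x_max=5.0)
    note = " (extrapolation, treat with caution)" if flagged else ""
    print(f"x = {x}: predicted y = {y_hat:.2f}{note}")
```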
Interpreting Results
The regression line always passes through the point
$(\bar{x}, \bar{y})$.
The slope ‘says’ that along the regression line, a
change of one standard deviation in x corresponds to
a change of r standard deviations in y
When r = 1 or –1
the change in standard units is the same
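Both facts on this slide can be checked numerically. The sketch below uses made-up data and fits the line directly from sums of squares, then confirms that the line passes through the point of means and that a one-standard-deviation change in x moves the prediction by r standard deviations of y. All variable names are illustrative.

```python
from math import sqrt, isclose

# Made-up illustrative data, not taken from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = s_xy / s_xx              # least-squares slope
a = y_bar - b * x_bar        # least-squares intercept
s_x = sqrt(s_xx / (n - 1))   # sample standard deviation of x
s_y = sqrt(s_yy / (n - 1))   # sample standard deviation of y
r = s_xy / sqrt(s_xx * s_yy)

# The line evaluated at x_bar gives y_bar.
passes_through_means = isclose(a + b * x_bar, y_bar)
# Moving x by one s_x changes the prediction by b * s_x, which is r * s_y.
change_in_y_sd_units = b * s_x / s_y
slope_matches_r = isclose(change_in_y_sd_units, r)
print(passes_through_means, slope_matches_r)
```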
Variation
The R-squared value, $r^2$, is the percentage of the variation
of Y explained by the model.
The higher the value, the better the model fits the data.
$r^2 = \dfrac{\text{variance of predicted values}}{\text{variance of observed values}}$
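The variance ratio can be computed directly once the line has been fitted. The following sketch uses made-up data and Python's standard statistics module; it is meant only to show the ratio in action, and the variable names are illustrative.

```python
from statistics import mean, variance

# Made-up illustrative data, not taken from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Values of y predicted by the fitted line at each observed x.
predicted = [a + b * x for x in xs]

# r^2 as the ratio of the variance of the predictions to the variance of the observations.
r_squared = variance(predicted) / variance(ys)
print(round(r_squared, 3))
```

For this toy data the ratio is 0.6, so the line accounts for about 60% of the variation in y.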
No Straight Line?
What if the scatterplot shows that a straight-line
model is not appropriate?
Check whether some function of y is approximately
linear in some function of x.
Examples
Plot y versus ln(x)
Plot 1/y versus 1/x
If so, fit a straight-line model in terms of the new
variables.
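As a sketch of the "y versus ln(x)" case, the code below builds synthetic data that follows y = 1 + 2 ln(x) exactly, transforms x, and fits an ordinary straight line to the new variable. The helper fit_line and the data are assumptions made for illustration; real data would not recover the coefficients exactly.

```python
from math import log

def fit_line(us, vs):
    """Least-squares intercept and slope for v = a + b*u."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    s_uu = sum((u - u_bar) ** 2 for u in us)
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    b = s_uv / s_uu
    a = v_bar - b * u_bar
    return a, b

# Synthetic data generated from y = 1 + 2*ln(x); not from the lecture.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [1.0 + 2.0 * log(x) for x in xs]

# New variable: ln(x). The relationship between y and ln(x) is linear.
log_xs = [log(x) for x in xs]
a, b = fit_line(log_xs, ys)
print(f"y = {a:.2f} + {b:.2f} ln(x)")
```

Predictions for a new x are then made by computing ln(x) first and plugging it into the fitted line.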
Example
Let’s use the alcoholic beverage and recall data
How can we tell if it is reasonable to fit a linear
regression model?
Let’s run the analysis and interpret the results