Simple Linear Regression

The document provides an overview of simple linear regression, focusing on the relationship between a dependent variable Y and an independent variable X, expressed as Y = β0 + β1X + u. It discusses the estimation of coefficients using Ordinary Least Squares (OLS), the interpretation of parameters, and the importance of goodness of fit through R-squared. Additionally, it covers hypothesis testing for the slope coefficient and the statistical significance of the regression results.

Uploaded by

mhw2xpxpn7
Copyright © All Rights Reserved

Simple linear regression

Basic idea

- What is regression? Regression is concerned with describing and evaluating the relationship between a given variable $Y$ and one or more other variables $X$:

$$Y = f(X)$$

- $Y$ is called the dependent (or explained) variable; $X$ is called the independent variable, explanatory variable, or regressor.
- Linear regression: the relationship is linear (in the coefficients):

$$Y = \beta_0 + \beta_1 X + u$$

- Simple linear regression: assume that $Y$ depends on only one $X$ variable.
Simple linear regression

$$Y = \beta_0 + \beta_1 X + u$$

- Coefficients (parameters): $\beta_0$ (intercept), $\beta_1$ (slope).
- Interpretation of $\beta_0$ and $\beta_1$.
- We focus on $\beta_1$: $\beta_1 = dY/dX$, or $dY = \beta_1\,dX$. When $X$ increases by 1 unit, $Y$ increases by $\beta_1$ units.
- Error term $u$.
- We have data on $Y$ and $X$, but not on $u$. The goal is to use the data to estimate $\beta_0$ and $\beta_1$.
- Example: $Y$ = height of child, $X$ = height of dad, $u$ = nutrition, lifestyle, ... (What should the sign of $\beta_1$ be?)
Linear functional form: too simple?

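$$Y = \beta_0 + \beta_1 X + u$$

- Quadratic form: $Y = \beta_0 + \beta_1 X^2 + u$
- $\ln Y = \beta_0 + \beta_1 X + u$ (interpretation):

$$
\begin{aligned}
\frac{d\ln Y}{dX} &= \beta_1 \\
\text{but } \frac{d\ln Y}{dX} &= \frac{dY/Y}{dX} \\
\text{so } \frac{dY/Y}{dX} &= \beta_1 \\
\text{so } \frac{dY}{Y} &= \beta_1\,dX
\end{aligned}
$$

- Interpretation of $\beta_1$: when $X$ increases by 1 unit, $Y$ increases by approximately $100 \times \beta_1$ percent (e.g., if $\beta_1 = 0.01$, $Y$ increases by about 1%).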
Linear functional form: too simple? (cont.)

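- $\ln Y = \beta_0 + \beta_1 \ln X + u$ (interpretation):

$$
\begin{aligned}
\frac{d\ln Y}{d\ln X} &= \beta_1 \\
\text{but } \frac{d\ln Y}{d\ln X} &= \frac{dY/Y}{dX/X} \\
\text{so } \frac{dY/Y}{dX/X} &= \beta_1 \\
\text{so } \frac{dY}{Y} &= \beta_1 \frac{dX}{X}
\end{aligned}
$$

- Interpretation of $\beta_1$: when $X$ increases by 1%, $Y$ increases by approximately $\beta_1$%.
- The model is equivalent to $Y = A X^{\beta_1} e^{u}$, where $\beta_0 = \ln A$.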
Example: NHL Hockey player salaries

- Begin with a simple example of salary determination for NHL hockey players.
- Restrict the analysis to forwards (i.e., exclude defensemen and goalies).
- What is the economic return to scoring a goal?
- Consider a simple economic model:

$$\text{Salary}_i = \beta_0 + \beta_1 \text{Goals}_i + \varepsilon_i$$
Salary Data
- Data on hockey salaries and past performance come from the NHLPA website.
- Salaries are for 2000-2001, with performance covering the 1999-2000 season.
- The sample has 418 players who played at least 20 games.
Ln(Salary) versus goals
- It turns out that it is better to specify the following model:

$$\ln(\text{Salary}_i) = \beta_0 + \beta_1 \text{Goals}_i + \varepsilon_i$$

- The slope coefficient, multiplied by 100, gives the approximate percentage increase in salary associated with one additional goal.
Plot of regression results
Simple Regression Results

Estimated Effect of Performance on Player Salary
(standard errors in parentheses)

| | (1) Salary (US dollars) | (2) Ln(Salary) |
|---|---|---|
| Mean of dependent variable | 1.4M | 13.78 |
| Intercept | 0.108 (0.105) | −0.712 (0.048) |
| Goals | 0.101 (0.007) | 0.053 (0.003) |
| R-squared | 0.36 | 0.43 |
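The percentage reading of column (2) relies on the approximation $100 \times \beta_1$. In the log model, a one-unit increase in $X$ (holding $u$ fixed) multiplies $Y$ by exactly $e^{\beta_1}$. The small sketch below compares the two readings for the Goals coefficient 0.053. The function name `pct_change_per_unit` is purely illustrative.

```python
import math


def pct_change_per_unit(beta1):
    """For ln(Y) = b0 + b1*X + u, return (approximate, exact) % change in Y
    when X rises by one unit, holding u fixed."""
    approx = 100 * beta1                 # rule of thumb: 100 x beta1 percent
    exact = 100 * (math.exp(beta1) - 1)  # Y is multiplied by exp(beta1)
    return approx, exact


approx, exact = pct_change_per_unit(0.053)  # Goals coefficient, column (2)
print(f"approx {approx:.2f}%, exact {exact:.2f}%")  # approx 5.30%, exact 5.44%
```

For small coefficients like this one the two readings are close, which is why the slides call the percentage interpretation approximate.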
Linear model

- The following regression function is linear in the parameters:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

- With assumptions about $u$, we can work out the conditional expectation:

$$E[Y_i \mid X_i] = \beta_0 + \beta_1 X_i + E[u_i \mid X_i]$$

- If we assume $E[u_i \mid X_i] = 0$ (the error is noise, unpredictable by $X$), then

$$E[Y_i \mid X_i] = \beta_0 + \beta_1 X_i$$
The population regression line
[Figure: population regression line, sample data points, and the associated error terms]
Estimation of coefficients by Ordinary Least Squares (OLS)

- Consider a random sample $\{(X_1, Y_1), \ldots, (X_n, Y_n)\}$.
- Define an estimator (predictor) of $Y_i$ by $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$.
- The residual (prediction error) is

$$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i$$

- The sum of squared residuals:

$$\sum_{i=1}^n \hat{u}_i^2 = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2$$

- The least squares estimator finds $\hat\beta_0, \hat\beta_1$ such that the sum of squared residuals is minimized.
The objective function for a regression

- Define the Residual Sum of Squares (SSR):

$$
\begin{aligned}
SSR &= \sum_{i=1}^n \hat{u}_i^2 \\
&= \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 \\
&= \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2
\end{aligned}
$$
Choose estimators to minimize SSR

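- Simple calculus yields the first-order conditions (each partial derivative is set to zero at the minimum):

$$
\begin{aligned}
\frac{\partial \sum_{i=1}^n \hat{u}_i^2}{\partial \hat\beta_0} &= -2 \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = -2 \sum_{i=1}^n \hat{u}_i = 0 \\
\frac{\partial \sum_{i=1}^n \hat{u}_i^2}{\partial \hat\beta_1} &= -2 \sum_{i=1}^n X_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = -2 \sum_{i=1}^n X_i \hat{u}_i = 0
\end{aligned}
$$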
Solving for the intercept

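- The first equation implies:

$$
\begin{aligned}
\sum_{i=1}^n \hat{u}_i = 0 &\Rightarrow \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0 \\
&\Rightarrow \sum_{i=1}^n Y_i = n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^n X_i \\
&\Rightarrow \hat\beta_0 = \frac{\sum_{i=1}^n Y_i}{n} - \hat\beta_1 \frac{\sum_{i=1}^n X_i}{n} \\
&\Rightarrow \hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}
\end{aligned}
$$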
Solving for the slope

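$$
\begin{aligned}
\sum_{i=1}^n X_i \hat{u}_i = 0 &\Rightarrow \sum_{i=1}^n X_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0 \\
&\Rightarrow \sum_{i=1}^n X_i Y_i = \hat\beta_0 \sum_{i=1}^n X_i + \hat\beta_1 \sum_{i=1}^n X_i^2
\end{aligned}
$$

- Substituting in the expression $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$:

$$
\begin{aligned}
\sum_{i=1}^n X_i Y_i &= (\bar{Y} - \hat\beta_1 \bar{X}) \sum_{i=1}^n X_i + \hat\beta_1 \sum_{i=1}^n X_i^2 \\
&= \bar{Y} \sum_{i=1}^n X_i - \hat\beta_1 \bar{X} \sum_{i=1}^n X_i + \hat\beta_1 \sum_{i=1}^n X_i^2 \\
&= n\bar{Y}\bar{X} - \hat\beta_1 n\bar{X}^2 + \hat\beta_1 \sum_{i=1}^n X_i^2
\end{aligned}
$$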
The OLS formula for the slope

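$$\hat\beta_1 = \frac{\sum_{i=1}^n X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^n X_i^2 - n\bar{X}^2} = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}$$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ (deviations from means). So

$$\sum_{i=1}^n x_i y_i = \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})$$

and similarly

$$\sum_{i=1}^n x_i^2 = \sum_{i=1}^n (X_i - \bar{X})^2$$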
In summary, the OLS estimators

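$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$$

$$\hat\beta_1 = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2} = \frac{\widehat{\mathrm{cov}}(X, Y)}{\widehat{\mathrm{var}}(X)}$$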
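The two OLS formulas translate directly into code. The sketch below computes $\hat\beta_1 = \sum x_i y_i / \sum x_i^2$ from deviations from the means and then $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$. The function name `ols_fit` and the five data points are made up for illustration.

```python
from statistics import fmean


def ols_fit(X, Y):
    """Return (b0_hat, b1_hat) using the deviations-from-means formulas."""
    if len(X) != len(Y) or len(X) < 2:
        raise ValueError("need two sequences of equal length with n >= 2")
    x_bar, y_bar = fmean(X), fmean(Y)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))  # sum of x_i * y_i
    sxx = sum((x - x_bar) ** 2 for x in X)                      # sum of x_i ** 2
    if sxx == 0:
        raise ValueError("X has no variation, so the slope is not identified")
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


# Made-up illustrative data (not from the slides)
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0_hat, b1_hat = ols_fit(X, Y)
print(round(b0_hat, 3), round(b1_hat, 3))  # 0.05 1.99
```

The guard on `sxx` reflects the formula itself: if every $X_i$ equals $\bar{X}$, the denominator $\sum x_i^2$ is zero and the slope cannot be estimated.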
Properties of OLS estimators

- Three important properties:
  - The residuals sum to 0: $\sum_{i=1}^n \hat{u}_i = 0$ (from FOC 1).
  - $X_i$ and $\hat{u}_i$ are orthogonal: $\sum_{i=1}^n X_i \hat{u}_i = 0$ (from FOC 2).
  - The fitted regression line passes through the means $(\bar{X}, \bar{Y})$:

$$\bar{Y} = \hat\beta_0 + \hat\beta_1 \bar{X}$$
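The three properties can be checked numerically on any data set. This sketch fits OLS on made-up data and prints the three quantities that should be zero. The printed values are zero only up to floating-point rounding, so no exact output is shown. The helper name `ols_residuals` is hypothetical.

```python
from statistics import fmean


def ols_residuals(X, Y):
    """Fit Y on X by OLS and return (b0_hat, b1_hat, residuals)."""
    x_bar, y_bar = fmean(X), fmean(Y)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
          / sum((x - x_bar) ** 2 for x in X))
    b0 = y_bar - b1 * x_bar
    resid = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
    return b0, b1, resid


X = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data only
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, u_hat = ols_residuals(X, Y)

print(sum(u_hat))                            # property 1: residuals sum to ~0
print(sum(x * u for x, u in zip(X, u_hat)))  # property 2: X orthogonal to residuals
print(b0 + b1 * fmean(X) - fmean(Y))         # property 3: line passes through means
```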
OLS Figure
Deviations from mean

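- $Y_i$ can be expressed in terms of $\hat{Y}_i$ and $\hat{u}_i$:

$$Y_i = \hat{Y}_i + \hat{u}_i = \hat\beta_0 + \hat\beta_1 X_i + \hat{u}_i$$

- Subtracting $\bar{Y} = \hat\beta_0 + \hat\beta_1 \bar{X}$:

$$Y_i - \bar{Y} = \hat\beta_0 + \hat\beta_1 X_i + \hat{u}_i - (\hat\beta_0 + \hat\beta_1 \bar{X}) = \hat\beta_1 (X_i - \bar{X}) + \hat{u}_i$$

- Let $y_i = Y_i - \bar{Y}$ and $x_i = X_i - \bar{X}$:

$$y_i = \hat\beta_1 x_i + \hat{u}_i$$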
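Goodness of fit and $R^2$

- How well does the OLS regression line fit the data?
- The following concepts help us quantify the fit:
  - Total Sum of Squares: $SST = \sum_{i=1}^n (Y_i - \bar{Y})^2$
  - Residual (unexplained) Sum of Squares: $SSR = \sum_{i=1}^n \hat{u}_i^2$
  - Explained Sum of Squares: $SSE = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2$
- It holds that $SST = SSE + SSR$. The cross term $2\sum_{i=1}^n (\hat{Y}_i - \bar{Y})\hat{u}_i$ vanishes because of the two first-order conditions.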
Goodness of fit and $R^2$ (cont.)

- $R^2$ quantifies the fit of the OLS regression line to the data.
- $R^2$ is the proportion of the total variation in $Y$ that is explained by the regression:

$$R^2 \equiv \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$$
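The sketch below computes the three sums of squares for an OLS fit and returns $R^2 = SSE/SST$. It uses made-up data and a hypothetical function name. The decomposition $SST = SSE + SSR$ holds up to rounding, so both forms of $R^2$ agree.

```python
from statistics import fmean


def r_squared(X, Y):
    """Return (SST, SSE, SSR, R^2) for the OLS regression of Y on X."""
    x_bar, y_bar = fmean(X), fmean(Y)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
          / sum((x - x_bar) ** 2 for x in X))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in X]
    sst = sum((y - y_bar) ** 2 for y in Y)                 # total variation
    sse = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    ssr = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))    # residual variation
    return sst, sse, ssr, sse / sst


X = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data only
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
sst, sse, ssr, r2 = r_squared(X, Y)
print(round(sst, 3), round(sse, 3), round(ssr, 3), round(r2, 4))
```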
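Sampling properties of the OLS estimator

- The OLS estimator is unbiased (given random sampling, variation in $X$, and $E[u_i \mid X_i] = 0$):

$$E[\hat\beta_j \mid X] = \beta_j, \quad j = 0, 1$$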
Sampling properties of the OLS estimator (cont.)

- Two more assumptions are needed to derive the variance:
  - Pairwise uncorrelated errors (conditional on $X$): $\mathrm{cov}(u_i, u_j \mid X) = 0$ for $i \neq j$. This is guaranteed if we have a random sample.
  - Conditional homoskedasticity: $\mathrm{var}(u_i \mid X) = \sigma^2$.
Conditional Homoskedasticity
Heteroskedasticity
Sampling properties of the OLS estimator (cont.)

- The variance of $\hat\beta_1$ is given by:

$$\mathrm{var}(\hat\beta_1 \mid X) = \frac{\sigma^2}{\sum_{i=1}^n x_i^2} = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}$$
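Unbiasedness and the variance formula are statements about repeated samples, so a Monte Carlo simulation can illustrate them. The sketch holds the design $X = 1, \ldots, 20$ fixed and draws normal homoskedastic errors. The true values $\beta_0 = 1$, $\beta_1 = 0.5$, $\sigma = 2$ are arbitrary assumptions. It then compares the average and variance of the simulated $\hat\beta_1$ with $\beta_1$ and $\sigma^2 / \sum x_i^2$.

```python
import random
from statistics import fmean, pvariance


def simulate_slopes(beta0, beta1, sigma, X, reps, seed=0):
    """Draw `reps` samples Y = beta0 + beta1*X + u, u ~ N(0, sigma^2), with X held
    fixed; return the OLS slopes and the theoretical variance sigma^2 / sum x_i^2."""
    rng = random.Random(seed)
    x_bar = fmean(X)
    x_dev = [x - x_bar for x in X]          # x_i = X_i - X_bar
    sxx = sum(d * d for d in x_dev)         # sum of x_i^2
    slopes = []
    for _ in range(reps):
        Y = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in X]
        y_bar = fmean(Y)
        slopes.append(sum(d * (y - y_bar) for d, y in zip(x_dev, Y)) / sxx)
    return slopes, sigma ** 2 / sxx


X = [float(i) for i in range(1, 21)]  # assumed fixed design, n = 20
slopes, var_theory = simulate_slopes(1.0, 0.5, 2.0, X, reps=20000)
print(fmean(slopes))                  # close to the true beta1 = 0.5
print(pvariance(slopes), var_theory)  # close to each other
```

Because the simulation holds $X$ fixed across replications, it illustrates the conditional statements $E[\hat\beta_1 \mid X] = \beta_1$ and $\mathrm{var}(\hat\beta_1 \mid X) = \sigma^2 / \sum x_i^2$.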
The sampling distribution of the OLS estimator
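- Assume that $u$ is normally distributed:

$$u_i \sim N(0, \sigma^2)$$

- This implies that

$$Y_i = \beta_0 + \beta_1 X_i + u_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$$

- We have the result that

$$\hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{\sum x_i^2}\right)$$

- And, as usual, we can standardize the variable:

$$z = \frac{\hat\beta_1 - \beta_1}{\sigma / \sqrt{\sum x_i^2}} \sim N(0, 1)$$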
The t-statistic

- Of course, we do not know the true variance $\sigma^2$, so we must employ an estimator:

$$s^2 = \frac{\sum_{i=1}^n \hat{u}_i^2}{n - 2}$$

- The new "z" is then actually a t-statistic:

$$t = \frac{\hat\beta_1 - \beta_1}{s / \sqrt{\sum x_i^2}} \sim t_{n-2}$$

- The term in the denominator is known as the standard error of $\hat\beta_1$: $se(\hat\beta_1) = s / \sqrt{\sum x_i^2}$.
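This sketch computes $s^2$ with the $n - 2$ divisor, then $se(\hat\beta_1) = s / \sqrt{\sum x_i^2}$, and then the t-statistic for a hypothesized slope value. The function name `slope_t_stat` and its parameter `beta1_null` are hypothetical, and `beta1_null` defaults to 0. The data are made up.

```python
import math
from statistics import fmean


def slope_t_stat(X, Y, beta1_null=0.0):
    """Return (b1_hat, se(b1_hat), t) for H0: beta1 = beta1_null."""
    n = len(X)
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 > 0")
    x_bar, y_bar = fmean(X), fmean(Y)
    sxx = sum((x - x_bar) ** 2 for x in X)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y))
    s2 = ssr / (n - 2)                      # s^2, estimator of sigma^2
    se_b1 = math.sqrt(s2) / math.sqrt(sxx)  # standard error of b1_hat
    return b1, se_b1, (b1 - beta1_null) / se_b1


X = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data only
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, se_b1, t = slope_t_stat(X, Y)
print(round(b1, 3), round(se_b1, 4), round(t, 2))
```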
Large sample properties of $\hat\beta_1$ (when $n$ is large)

- The OLS estimator is consistent:

$$\mathrm{plim}(\hat\beta_1) = \beta_1$$

- If we do not assume normal errors, then by the Central Limit Theorem the large-sample distribution of $\hat\beta_1$ is approximately normal:

$$\hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{\sum x_i^2}\right)$$

- Or, in terms of our usual t-statistic:

$$t = \frac{\hat\beta_1 - \beta_1}{s / \sqrt{\sum x_i^2}} \sim N(0, 1)$$
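What affects the precision of the slope estimate?

$$se(\hat\beta_1) = \frac{s}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2}}$$

- Sample size $n$: more observations increase $\sum (X_i - \bar{X})^2$ and lower the standard error.
- Variance of the independent variable: more spread in $X$ lowers the standard error.
- Standard error of the regression $s$ (the estimated standard deviation of $u_i$): a larger $s$ raises the standard error.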
Difference Between Two Graphs?

For which of the two graphs will $\hat\beta_1$ be a more precise estimate of $\beta_1$?


Testing the Slope

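$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

Set of Statistical Hypotheses

| | Two-sided | One-sided | One-sided |
|---|---|---|---|
| $H_0$ | $\beta_1 = \beta_{10}$ | $\beta_1 = \beta_{10}$ | $\beta_1 = \beta_{10}$ |
| $H_1$ | $\beta_1 \neq \beta_{10}$ | $\beta_1 > \beta_{10}$ | $\beta_1 < \beta_{10}$ |

- $\beta_{10}$ is any number; it does not have to be 0.
- Test statistic: $t = \dfrac{\hat\beta_1 - \beta_{10}}{se(\hat\beta_1)}$
- Under $H_0$, $t$ follows a t distribution with $n - 2$ degrees of freedom, or approximately $t \sim N(0, 1)$ if $n$ is large.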
Statistical Significance

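The most common hypothesis test is the test of statistical significance for the slope coefficient:

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0 \quad (\text{or } H_1: \beta_1 > 0 \text{ or } H_1: \beta_1 < 0)$$

t-statistic:

$$t = \frac{\hat\beta_1}{se(\hat\beta_1)}$$

We can use either the rejection region approach or the p-value approach.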


Rejection region approach

- Step 1: compute the t-statistic.
- Step 2: determine the sampling distribution of the statistic under $H_0$:
  - When $n$ is small (say $n < 25$), $t \sim t_{n-2}$.
  - When $n$ is large, we can use either the t distribution or the normal distribution, i.e., $t \sim t_{n-2}$ or $t \sim N(0, 1)$.
- Step 3: pick a significance level $\alpha$, say $\alpha = 5\%$ or $1\%$. (What is the implication of different values of $\alpha$?)
- Step 4: according to the sampling distribution chosen in Step 2, find the critical value associated with $\alpha$. This can be done by looking up statistical tables. The value also depends on the nature of $H_1$ (two-sided or one-sided).
Rejection region approach (cont.)

- Step 5: use the critical value to partition the sample space of the statistic into an acceptance region and a rejection region. This also depends on the nature of $H_1$ (two-sided or one-sided).
- Step 6: draw the conclusion. If the value of the statistic computed in Step 1 falls in the rejection region, reject the null, meaning $\beta_1$ is significant at level $\alpha$. If the statistic falls in the acceptance region, do not reject the null, meaning $\beta_1$ is not significant at level $\alpha$.
p-value approach

- Recall that the p-value is the tail probability of the observed test statistic value, computed assuming the null hypothesis is true.
- The idea of the p-value approach is to compare the p-value of the statistic with $\alpha$.
p-value approach (cont.)

- Step 1: pick an $\alpha$.
- Step 2: determine the sampling distribution of the statistic (t distribution or normal distribution).
- Step 3: compute the t-statistic, then determine its p-value based on the sampling distribution chosen in Step 2. The p-value is the probability of obtaining a result at least as extreme as the one obtained. This can be done with the help of statistical tables. Note that the meaning of "extreme" depends on the nature of $H_1$ (one-sided or two-sided).
- Step 4: compare the p-value obtained in Step 3 with $\alpha$. If p-value $< \alpha$, reject the null, meaning $\beta_1$ is significant at level $\alpha$.
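Both procedures can be sketched for the large-$n$ case, where $t \sim N(0, 1)$ under $H_0$. The sketch uses only `statistics.NormalDist` from the Python standard library. For small samples, the critical value and p-value should come from $t_{n-2}$ (e.g. a t table), which this sketch does not implement. The function name and the `alternative` labels are hypothetical. Because both rules are built from the same null distribution, they always reach the same decision.

```python
from statistics import NormalDist


def significance_test(t, alpha=0.05, alternative="two-sided"):
    """Large-n test with N(0, 1) as the null distribution of t.
    Returns (reject_by_critical_value, reject_by_p_value, critical_value, p_value)."""
    z = NormalDist()  # standard normal
    if alternative == "two-sided":      # H1: beta1 != beta10
        crit = z.inv_cdf(1 - alpha / 2)
        reject_cv = abs(t) > crit
        p = 2 * (1 - z.cdf(abs(t)))
    elif alternative == "greater":      # H1: beta1 > beta10
        crit = z.inv_cdf(1 - alpha)
        reject_cv = t > crit
        p = 1 - z.cdf(t)
    elif alternative == "less":         # H1: beta1 < beta10
        crit = z.inv_cdf(alpha)         # negative critical value
        reject_cv = t < crit
        p = z.cdf(t)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return reject_cv, p < alpha, crit, p


print(significance_test(2.5))              # reject: |t| > 1.96 and p < 0.05
print(significance_test(1.0))              # do not reject
print(significance_test(1.8, alternative="greater"))
```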
Statistical vs Economic Significance

- Statistically significant: a degree of correlation between the explanatory and dependent variables that is unlikely to be observed due to mere chance. The statistical significance of a variable is determined entirely by the size of the t-statistic. Statistical significance improves as the sample size increases. Why?
- Statistical significance does not automatically imply that you have found something important.
- Economically significant (or practically significant): an effect large enough in magnitude that decision makers would consider it important. Economic importance is related to the magnitude and sign of $\hat\beta_1$ and indicates whether the explanatory variable has a meaningful and plausible influence on the dependent variable.
Statistical vs Economic Significance: an example

Consider a hypothetical regression of $y$ = size of the workforce of a city (in millions of people) on $x$ = the city's annual government spending (in units of 100 million yuan):

$$Y = \beta_0 + \beta_1 X + u$$

We have found that $\hat\beta_1 = 0.00001$ and $se(\hat\beta_1) = 0.000001$.

- Is $\hat\beta_1$ statistically significant?
- Consider an average workforce of 2 million people. An increase in government spending of 100 million yuan is associated with an average increase in the workforce of only 10 people, or 0.0005%! Is this result economically significant?
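The arithmetic behind this example can be checked in a few lines. The unit conversions follow the variable definitions above: $y$ is in millions of people and one unit of $x$ is 100 million yuan. The t-statistic of 10 is far above any conventional critical value, yet the implied effect is tiny.

```python
b1_hat, se_b1 = 0.00001, 0.000001   # estimates from the example
t = b1_hat / se_b1                   # t-statistic for H0: beta1 = 0

# One extra unit of x (100 million yuan) changes y by b1_hat million people
delta_people = b1_hat * 1_000_000    # about 10 people
pct_of_average = 100 * delta_people / 2_000_000  # relative to 2 million workers

print(round(t, 6), round(delta_people, 6), pct_of_average)
```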
Confidence Interval Estimator of β1

- A point estimate is almost never equal to the true value, and since we don't know the true value, we never know by how much we are wrong.
- As a rule, we would like to have more confidence about our estimate of the parameter than a point estimate gives us.
- Instead of a point estimate, we turn to an interval estimator: a range of values around the point estimate.
- Our goal is to find a range of values around the point estimate such that there is a high probability that this range includes the true population parameter.
- This probability is called the "confidence level", and the range is called a "confidence interval".
Confidence Interval Estimator of β1 (cont.)

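- Denote the confidence level by $1 - \alpha$.
- The objective is to find an interval (lower, upper) based on $\hat\beta_1$ such that $P(\text{lower} < \beta_1 < \text{upper}) = 1 - \alpha$.
- Since the sampling distribution of the t-statistic is

$$t = \frac{\hat\beta_1 - \beta_1}{se(\hat\beta_1)} \sim t_{n-2}$$

we have

$$
\begin{aligned}
&P\left(-t_{(\alpha/2, n-2)} < \frac{\hat\beta_1 - \beta_1}{se(\hat\beta_1)} < t_{(\alpha/2, n-2)}\right) = 1 - \alpha \\
\Rightarrow\ &P\left(\hat\beta_1 - t_{(\alpha/2, n-2)}\, se(\hat\beta_1) < \beta_1 < \hat\beta_1 + t_{(\alpha/2, n-2)}\, se(\hat\beta_1)\right) = 1 - \alpha
\end{aligned}
$$

- The $100(1 - \alpha)\%$ CI is therefore

$$\left(\hat\beta_1 - t_{(\alpha/2, n-2)}\, se(\hat\beta_1),\ \hat\beta_1 + t_{(\alpha/2, n-2)}\, se(\hat\beta_1)\right)$$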
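As an illustration, the interval can be applied to the Goals coefficient in column (2) of the NHL table ($\hat\beta_1 = 0.053$, $se = 0.003$). With $n = 418$, the 95% critical value $t_{(0.025, 416)}$ is close to the normal value 1.96, which this sketch assumes. The function name is hypothetical, and the critical value is passed in rather than computed.

```python
def slope_confidence_interval(b1_hat, se_b1, t_crit):
    """Return (lower, upper) = b1_hat -/+ t_crit * se(b1_hat), where t_crit is
    t_(alpha/2, n-2) from a t table (or the N(0, 1) value when n is large)."""
    half_width = t_crit * se_b1
    return b1_hat - half_width, b1_hat + half_width


lo, hi = slope_confidence_interval(0.053, 0.003, 1.96)  # normal approximation
print(f"({lo:.4f}, {hi:.4f})")  # (0.0471, 0.0589)
```

The interval excludes 0, which is consistent with the large t-statistic for Goals ($0.053 / 0.003 \approx 17.7$).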
Point Prediction of Y
- Point prediction: we can use our estimated model to predict the value of $Y$ for any given value of $X$.
- For instance, take an estimated model of $Y$ on $X$:

$$\hat{Y} = 23.6 + 4.1X$$

- What is the predicted value of $Y$ for $X = 12$? Easy:

$$\hat{Y} = 23.6 + 4.1 \times 12 = 72.8$$

- But we know predictions are never exact because of the error term $u$.
- The true model is

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

and $X$ is not the only factor that affects $Y$.


Prediction interval of Y

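- Prediction interval: for a given $X = x_0$, an interval that contains $Y_0$ with probability $1 - \alpha$:

$$\hat{Y}_0 \pm t_{(\alpha/2, n-2)}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{X})^2}{\sum_{i=1}^n (X_i - \bar{X})^2}}$$

where $\hat{Y}_0 = \hat\beta_0 + \hat\beta_1 x_0$ and $s$ is the standard error of the regression (the estimate of the standard deviation of $u_i$).

- What happens to the width of the prediction interval as the sample size goes to infinity?
- What happens as $x_0$ moves further from $\bar{X}$?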
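The prediction interval formula can be sketched directly. This sketch refits OLS on made-up data and evaluates the interval at $x_0 = \bar{X}$ and at a point far from the mean. The critical value 3.182 is the table value $t_{(0.025, 3)}$ for $n - 2 = 3$ degrees of freedom. The function name is hypothetical. The interval is centred on $\hat{Y}_0$ and widens as $x_0$ moves away from $\bar{X}$.

```python
import math
from statistics import fmean


def prediction_interval(X, Y, x0, t_crit):
    """Return (lower, upper) for Y0 at X = x0; t_crit = t_(alpha/2, n-2)."""
    n = len(X)
    x_bar, y_bar = fmean(X), fmean(Y)
    sxx = sum((x - x_bar) ** 2 for x in X)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y)) / (n - 2))
    y0_hat = b0 + b1 * x0                                         # point prediction
    half = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y0_hat - half, y0_hat + half


X = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data only
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182                 # t_(0.025, 3): 95% level, n - 2 = 3 df
for x0 in (3.0, 8.0):          # at the mean of X, and far from it
    lo, hi = prediction_interval(X, Y, x0, T_CRIT)
    print(x0, round(lo, 2), round(hi, 2), "width", round(hi - lo, 2))
```

As $n$ grows, the $1/n$ term and the $(x_0 - \bar{X})^2 / \sum x_i^2$ term shrink, but the leading 1 under the square root does not. The width therefore stays near $2\,t\,s$ because of the error term $u$.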
Caution in Interpreting Regression Results

A common mistake when using regression analysis is to state that $x$ causes $y$ to happen.

- A model with a high $R^2$ does not automatically mean that a change in $x$ causes $y$ to change in a very predictable way. It could be just the opposite: $y$ causes $x$ to change. A high correlation goes both ways.
- It could also be that both $y$ and $x$ are changing in response to a third variable that we don't know about.
- When forming a simple regression based on a population economic model, one has to assume that $x$ causes $y$ and be able to justify this assumption on economic grounds.
Example: Inflation and Money supply
What is the relationship between the inflation rate (Infl) and the money supply (M2)? Consider the regression

$$y_t = \beta_0 + \beta_1 x_t + u_t$$

where

$$y_t = \ln(\text{Infl}_t) - \ln(\text{Infl}_{t-1}), \qquad x_t = \ln(M2_t) - \ln(M2_{t-1})$$

Results (standard errors in parentheses):

| | Estimate |
|---|---|
| Intercept | 0.0065 (0.0010) |
| $x$ | −0.5525 (0.0519) |
| $R^2$ | 0.5625 |
Application of simple linear regression: hedging

- A hedge is an investment position intended to offset the risk of potential losses or gains that may be incurred by another investment.
- To hedge a risk is to hold assets with opposite payout characteristics in response to changes in the terminal spot price.
- Example: hedging a stock price (index) using futures.
Hedging application (cont.)

- Suppose I own a stock and would like to sell it in the future. At the moment, how can I limit or minimize the risk from variation in the stock price?
- One way is to enter into a short futures contract. That is, I hedge a long position in the stock with a short position in futures contracts.
Hedging application (cont.)

- The hedge ratio is the ratio of the futures position to the underlying asset position.
- A perfect hedge requires that futures and underlying spot asset price changes be perfectly correlated (such a hedge does NOT always exist).
- For an imperfect hedge, set the hedge ratio to minimize the variance of the hedging error.
Hedging application (cont.)
- Let $S_t$ represent the spot stock price and $F_t$ the futures price.
- Let $\Delta S_t$ represent the change in $S_t$ and $\Delta F_t$ the change in $F_t$.
- Suppose the hedge ratio is $\beta$. Then the hedging error is $e_t = \Delta S_t - \beta \Delta F_t$.
- The optimal hedge ratio $\beta^*$ is the one that minimizes the variance of $e_t$. We have

$$\beta^* = \frac{\mathrm{cov}(\Delta S_t, \Delta F_t)}{\mathrm{var}(\Delta F_t)}. \quad \text{(why?)}$$

- So $\beta^*$ may be obtained as the estimated slope coefficient $\hat\beta_1$ in the regression

$$\Delta S_t = \beta_0 + \beta_1 \Delta F_t + u_t. \quad \text{(why?)}$$
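A numerical sketch can make the link between minimum variance and OLS concrete. $\mathrm{var}(e_t) = \mathrm{var}(\Delta S_t) - 2\beta\,\mathrm{cov}(\Delta S_t, \Delta F_t) + \beta^2 \mathrm{var}(\Delta F_t)$ is a quadratic in $\beta$. The sketch computes $\beta^*$ as the sample covariance over the sample variance, which is exactly the OLS slope formula. It then checks that nudging $\beta$ in either direction increases the sample variance of the hedging error. The price changes below are invented for illustration, and the function names are hypothetical.

```python
from statistics import fmean


def min_variance_hedge_ratio(dS, dF):
    """beta* = cov(dS, dF) / var(dF), i.e. the OLS slope of dS on dF."""
    s_bar, f_bar = fmean(dS), fmean(dF)
    cov = sum((s - s_bar) * (f - f_bar) for s, f in zip(dS, dF))
    var = sum((f - f_bar) ** 2 for f in dF)
    return cov / var  # the 1/(n-1) factors cancel


def hedging_error_variance(dS, dF, beta):
    """Sample variance of e_t = dS_t - beta * dF_t."""
    e = [s - beta * f for s, f in zip(dS, dF)]
    e_bar = fmean(e)
    return sum((x - e_bar) ** 2 for x in e) / (len(e) - 1)


# Hypothetical spot and futures price changes (made up for illustration)
dF = [1.2, -0.8, 0.5, -1.5, 0.9, 0.3, -0.4]
dS = [1.0, -0.9, 0.6, -1.2, 0.7, 0.4, -0.2]

b_star = min_variance_hedge_ratio(dS, dF)
for beta in (b_star - 0.1, b_star, b_star + 0.1):
    print(round(beta, 3), round(hedging_error_variance(dS, dF, beta), 5))
```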
