
Linear Regression and OLS Method Explained

The document provides an overview of linear regression, focusing on the Ordinary Least Squares (OLS) method as a statistical tool for analyzing the relationship between independent and dependent variables. It discusses the Gauss-Markov assumptions necessary for OLS to yield the best linear unbiased estimator (BLUE) and highlights the importance of the error term in regression analysis. Additionally, it covers hypothesis testing, goodness-of-fit measures, and the challenges associated with using R-squared in regression models.

Uploaded by

mwakanemaamad202
Copyright
© All Rights Reserved

RESEARCH METHODS 2

UNILIA

WISDOM MGOMEZULU

MWADZERA

LINEAR REGRESSION
What is a Linear Regression Model?
• A statistical tool for evaluating the relationship of one or more independent variables (x) to a single continuous dependent variable (y).

• Regression analysis characterizes the relationship between the dependent and independent variables by looking at the extent, direction and strength of the association.
Ordinary Least Squares Method (OLS):
Estimation Method
• The most common estimation method for linear regression is the Ordinary Least Squares (OLS) method.

• In the next few slides we will review mathematically how OLS is applied in linear regression.
OLS as an Algebraic Tool
• Suppose we have a sample with N observations on individual wages and some background characteristics.
• Our main interest is how, in this sample, wages are related to the other observables.
• Let us denote wages by y and the other K-1 characteristics by x2,…, xK.
• OLS helps to find the linear combination of x2,…, xK and a constant that best approximates wages.
• This is specified as:

$$\hat{y}_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} \qquad (1)$$
OLS as an Algebraic Tool
• The difference between an observed value yi and its linear approximation is:

$$y_i - \hat{y}_i = y_i - [\beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK}] \qquad (2)$$

• Using vector notation, first we collect the x-values for individual i in a vector xi, which includes the constant:

$$x_i = (1 \;\; x_{i2} \;\; x_{i3} \;\; \dots \;\; x_{iK})' \qquad (3)$$

• Similarly, for the coefficients’ vector:

$$\beta = (\beta_1, \dots, \beta_K)' \qquad (4)$$


OLS as an Algebraic Tool
• Therefore Equation 2 can be rewritten in vector form as:

$$y_i - x_i'\beta \qquad (5)$$

• We would like to choose values for the coefficients such that the differences in (5) are small.

• Different measures can be used to define ‘small’, but the most common approach is to choose β such that the sum of squared differences is as small as possible, hence the name “Least Squares”, or formally Ordinary Least Squares (OLS).
OLS as an Algebraic Tool
• Thus OLS minimizes the following objective function:

$$S(\beta) = \sum_{i=1}^{N} (y_i - x_i'\beta)^2 \qquad (6)$$

• Taking the squares makes sure that positive and negative deviations do not cancel each other out when taking the summation.
• The above approach (minimizing the sum of squared differences) is what is referred to as the Ordinary Least Squares (OLS) approach.
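As an illustration, the minimization of the sum of squared differences can be sketched for the simplest case of one regressor plus a constant. The data values and the function names `sum_sq` and `ols_simple` are made up for this sketch and do not come from the slides; the closed-form solution used is the standard one for a single regressor.

```python
def sum_sq(y, x, b1, b2):
    """Objective S(beta): sum of squared differences y_i - (b1 + b2 * x_i)."""
    return sum((yi - (b1 + b2 * xi)) ** 2 for yi, xi in zip(y, x))


def ols_simple(y, x):
    """Closed-form OLS estimates for a constant (b1) and one regressor (b2)."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Hypothetical sample, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, b2 = ols_simple(y, x)
s_min = sum_sq(y, x, b1, b2)
print(f"intercept = {b1:.3f}, slope = {b2:.3f}, S(beta) = {s_min:.4f}")

# Moving either coefficient away from the OLS values increases S(beta)
print(sum_sq(y, x, b1 + 0.1, b2) > s_min)
print(sum_sq(y, x, b1, b2 - 0.1) > s_min)
```

Both comparisons print True: any other pair of coefficients gives a larger sum of squares than the OLS pair.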
Simple Linear Regression
• In the case where K = 2 we only have one regressor and a constant.
• In this case the observations (yi, xi) can be drawn on a two-dimensional graph, as shown on the next slide.
Simple Linear Regression: Plotting

•The fitted regression is represented by the straight line; the dots represent the actual observations.
•The vertical distances between the observations and the fitted line are the residuals. OLS chooses the line that makes the sum of their squares, i.e. the error sum of squares, as small as possible.

•From the above two graphs, which is a better approximation of y?


Small Sample Properties of the OLS
Estimator: The Gauss-Markov Assumptions

• These are the full ideal conditions that must be met for OLS to be applicable.
• One has to be aware of the ideal conditions and of their violations in order to correct for deviations from them and keep the results unbiased, or at least consistent.
• The Gauss-Markov assumptions are about the error term and the explanatory variables.
• The next four assumptions constitute the Gauss-Markov assumption set.
Gauss-Markov Assumptions
• A1: The expected value of the error terms is zero for all observations: $E(\varepsilon_i) = 0$, i = 1,…,N

• A2: The error terms are not correlated with the regressors. Alternatively, the explanatory variables are all exogenous: $\{\varepsilon_1, \dots, \varepsilon_N\}$ and $\{x_1, \dots, x_N\}$ are independent

• A3: Homoskedasticity: The variance of the error term is constant across all observations (in x and over time): $V(\varepsilon_i) = \sigma^2$, i = 1,…,N

• A4: Zero correlation between different error terms: $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$, i, j = 1,…,N, i ≠ j
Properties of the OLS Estimator: Implications of
Gauss-Markov Assumptions

Under A1 to A4 the OLS estimator is the best linear unbiased estimator (BLUE).
Best: the variance of the OLS estimator is minimal, i.e. no larger than the variance of any other linear unbiased estimator.
Linear: the estimator is a linear function of the observed values of y. (For OLS to be applicable, the model itself must be linear in the parameters.)
Unbiased:
This means that, in repeated sampling, we can expect our estimator to be on average equal to its true value.
• That is, the expected values of the estimates are equal to the true values describing the relationship between x and y.
More Assumptions of OLS

• A5: The explanatory variables are not perfectly correlated with each other (no perfect multicollinearity).

• A6: Normal distribution: The errors are jointly normally distributed, with mean zero, constant variance and zero correlation between them.
• Stated this way, A6 implies assumptions A1, A3 and A4, so it can replace them.
• Thus assumptions A2 and A6 together are sufficient for BLUE OLS estimates.
IMPORTANCE OF THE ERROR
TERM
• 1. Vagueness of theory
• The theory, if any, determining the behavior of Y may be, and often is, incomplete.
• 2. Unavailability of data
• Even if we know what some of the excluded variables are and therefore consider a multiple regression rather than a simple regression, we may not have quantitative information about some of these variables.
• 3. Core variables vs peripheral variables
• Assume that we want to study the consumption-income relationship and economic theory tells us that the explanatory variables include income, number of children per household, religion, education and geographical location.
• It is quite possible that the joint influence of all or some of these variables is so small, and at best nonsystematic or random, that it does not pay to introduce them into the model explicitly.
• 4. Intrinsic randomness in human behavior
• Even if we succeed in introducing all the relevant variables into the model, there is bound to be some “intrinsic” randomness in individual Y’s that cannot be explained no matter how hard we try. The disturbances, the ε’s, may very well reflect this intrinsic randomness.
• 5. Poor proxy variables
• The classical regression model assumes that the variables Y and X are measured accurately. In practice, the data may be plagued by errors of measurement, and data on some variables are not available.
• Since these variables are not directly observable, in practice we use proxy variables. For example, expenditure may be used as a proxy for income. Obviously, expenditure may not always be equal to income, as some people may be saving or donating to others.
• 6. Principle of parsimony
A regression model needs to be as simple as possible.
7. Wrong functional form
Even if we have theoretically correct variables explaining a phenomenon, and even if we can obtain data on these variables, very often we do not know the form of the functional relationship between the regressand and the regressors.
METHOD OF LEAST SQUARES

• Example
• Given the following values of y:
• 70, 65, 90, 95, 110, 115, 120, 140, 155, 260
• Compute the following:
• a. Error sum of squares
• b. Mean sum of squares (variance)
• c. Standard deviation
ANSWERS
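The three quantities asked for in the example can be checked with a short Python sketch. It rests on two assumptions that the slides do not spell out: the "error sum of squares" is taken as the sum of squared deviations of y from its mean, and the variance uses the sample divisor n - 1 (dividing by n instead would give a slightly smaller figure).

```python
import math

y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 260]

n = len(y)
mean_y = sum(y) / n

# a. Sum of squared deviations from the mean (assumed meaning of "error sum of squares")
ess = sum((v - mean_y) ** 2 for v in y)

# b. Mean sum of squares (variance), assuming the n - 1 divisor
variance = ess / (n - 1)

# c. Standard deviation
std_dev = math.sqrt(variance)

print(f"mean = {mean_y:.1f}")
print(f"sum of squares = {ess:.1f}")
print(f"variance = {variance:.3f}")
print(f"standard deviation = {std_dev:.3f}")
```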
EXAMPLE
EXAMPLE
WHAT ARE WE EXACTLY
TALKING ABOUT
ASSUMPTIONS OF THE LINEAR
REGRESSION MODEL
1. Linearity
The model should be linear in the parameters
2. Independent values of X
Values taken by the regressor x are fixed in repeated sampling
3. The error terms have a mean of zero
i.e. E(e) = 0
4. Homoskedasticity
Variance of the error term is constant
CONTD
5. No autocorrelation
Cov(ei, ej) = 0
6. Number of observations n must be greater than the number of parameters to be estimated
7. Variability of x
8. No multicollinearity
9. No specification bias
USING STATA
INTERPRETATION OF OUTPUT
CORRELATION
ANALYSIS OF VARIANCE
• If the variation among the samples (due to treatment) is equal to the variation within samples (due to error), it means that the treatment did not have any effect and the F statistic will be equal to one.

• An F statistic close to one shows that there is insufficient evidence for us to reject the null hypothesis of equality of means. But if the F statistic is very large, it shows that the variation due to treatment is much larger than the variation due to error.

• In such a case, we reject the null hypothesis, provided the calculated F statistic is larger than the critical value.
Inference
• Inference is the generalization of the regression results from the sample under observation to the population from which the sample came.
• Significance tests are designed to check whether inferences are valid or not.
If a coefficient is significant (p-value below the chosen level of 0.10, 0.05 or 0.01), then you can draw inferences.
Inference
• But …
Only if the sample matches the characteristics of the population.
This is normally the case if all Gauss-Markov assumptions of OLS are met by the data under observation.
If this is not the case, the standard errors of the coefficients might be biased, and therefore the result of the significance test might be wrong as well, leading to false conclusions.
OLS Goodness-of-fit
• R² is used to measure goodness-of-fit in an OLS model.
• R² measures the proportion of the variation in the dependent variable (y) that is explained by the independent variable(s). In other words, R² measures the explanatory power of the model.
• It is found by the following formula:

$$R^2 = \frac{\text{regression sum of squares (SSR)}}{\text{total sum of squares (SST)}} = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2}$$

PRACTICE ASSIGNMENT: Calculate this in Stata using any given data
Properties of R²
• Lies between 0 and 1; often R² is multiplied by 100 to get the percentage of the variation in y that is explained by x.
• Generally, the bigger the R², the greater the explanatory power of the model.
• However, a smaller R² does not automatically imply that the estimated model is incorrect or useless: it just indicates the relative (un)importance of the explanatory variables in explaining the dependent variable.
• This may also imply that there are other variables (factors) omitted from the model that better explain the dependent variable.
• If the data points all lie on the same line, OLS provides a perfect fit to the data. In this case R² equals 1 or 100%.
• Adding further explanatory variables never lowers R² and usually increases it.
Challenge in Using R²
• The major challenge is that R² is sensitive to the number of independent variables included in a regression model. The greater the number of independent variables, the higher the R² is likely to be, i.e. the more independent variables we add (even if they are not relevant), the bigger the R² becomes.
• This problem arises because R² does not take into account the number of degrees of freedom, i.e. R² is given by the following formula:

$$R^2 = \frac{\text{regression sum of squares (SSR)}}{\text{total sum of squares (SST)}} = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2}$$
Challenge in Using R²:
Solution
• To solve this problem, when testing the validity of a regression model we use the Corrected or Adjusted R² (denoted as $\bar{R}^2$), which takes degrees of freedom into account as given in the following formula:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{(n - 1)}{(n - k - 1)}$$

• Where R² = the coefficient of determination; n = sample size; and k = number of independent variables.

PRACTICE ASSIGNMENT: Use the previously calculated R² to recalculate adjusted R²
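The adjusted R² formula can be tried out with a short Python sketch. The function name `adjusted_r2` is only illustrative; the sums of squares, n = 10 and k = 2 are taken from the Stata output for Example 2 that appears later in these slides, where the reported Adj R-squared is 0.8360.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Model and total sums of squares from the Example 2 Stata output
r2 = 237.307999 / 272
adj = adjusted_r2(r2, n=10, k=2)
print(f"R2 = {r2:.4f}, adjusted R2 = {adj:.4f}")

# With the same R2, one more regressor (k = 3) lowers the adjusted R2
print(adjusted_r2(r2, n=10, k=3) < adj)
```

The second print illustrates the point of the correction: an extra variable that does not raise R² is penalized.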
OLS Hypothesis Testing
 Several hypotheses can be tested using
the OLS regression.
 The common ones are those that are
used to check the validity of the overall
model and individual coefficients,
respectively:
F-test for overall significance of the
model (A joint test of significance of
regression coefficients)
t-test for individual coefficients.
Joint Test of Significance of
Regression Coefficients (F-test)
• This measures the overall significance of the model.
• The F-statistic tests:
• The null hypothesis (NH): $\beta_1 = \beta_2 = \dots = \beta_k = 0$, i.e. that the coefficients are all equal to zero, implying that there is no relationship between the dependent variable and the independent variables
• The alternate hypothesis (AH): $\beta_i \neq 0$ for at least one i, i.e. at least one of the coefficients is not equal to zero.
• Note that if the NH is not rejected, it implies that there is no evidence of a relationship between the dependent and independent variables, even if the estimated coefficients ‘appear’ not to be zero.
• If the NH is rejected, i.e. the F-test is significant, then the overall model is valid and we can go ahead and check the other two tests (R² and t-statistic).
How Do We Conduct F-test?
In this lecture we will concentrate on computer use, but you can also calculate it ‘manually’ using a formula (that’s beyond the scope of this course).
When you run a regression model on a computer, by default it gives you all the validity tests, including the F-test. What is important is to know how to use and interpret them.
The F-test is given in the ANOVA [analysis of variance] table.
The next slide shows how the F-test results are presented in SPSS.
F-Test Outputs and Interpretation
Example 1:

•Your attention is drawn to the last two columns of the above table. The second-to-last column (the F column) gives us the estimated F-value.
•When using a computer, for the F-test to be significant, the value in the last column of the ANOVA table (the Sig. column, i.e. the p-value) should be less than the level (1%, 5% or 10%) at which you are testing the F-value.
•The above figure (0.027 or 2.7%) is more than 0.01 or 1%. This means the F-value is not significant at 1% and we need to check at the less strict level of 5% (0.05).
• Certainly, 0.027 (2.7%) is less than 0.05 (5%). Therefore, overall, the model is significant at 5% (p<0.05). We therefore reject the null hypothesis that the coefficients are equal to zero and conclude that the model is valid.
F-Test Outputs and Interpretation
Example 2:
ANOVA (b)

Model            Sum of Squares   df   Mean Square   F        Sig.
1  Regression    237.308          2    118.654       23.941   .001 (a)
   Residual       34.692          7      4.956
   Total         272.000          9
a. Predictors: (Constant), X2, X1
b. Dependent Variable: Y

•Similarly, this model is significant at 1% (p<0.01) (i.e. 0.001<0.01). We therefore reject the null hypothesis and accept the alternate hypothesis that at least one of the coefficients is not equal to zero. Thus the model is valid.
•The F-test is a necessary but not a sufficient test for checking the validity of a model. To sufficiently check regression model validity, we need to check the other two tests, R² and the t-statistic.
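Although the course relies on computer output, the F-value in the Example 2 ANOVA table can be reproduced by hand from the sums of squares and degrees of freedom. The Python sketch below does this. The p-value line uses a closed-form expression that holds only when the numerator has exactly 2 degrees of freedom, as it does here; for other cases a statistical package or F table would be needed.

```python
# Figures from the Example 2 ANOVA table
ss_reg, df_reg = 237.308, 2   # regression sum of squares and its df
ss_res, df_res = 34.692, 7    # residual sum of squares and its df

ms_reg = ss_reg / df_reg      # mean square, regression
ms_res = ss_res / df_res      # mean square, residual
f_stat = ms_reg / ms_res      # F = MS regression / MS residual

# Upper-tail probability of F(2, df_res); valid ONLY for numerator df = 2
p_value = (1 + 2 * f_stat / df_res) ** (-df_res / 2)

# R-squared from the same table
r2 = ss_reg / (ss_reg + ss_res)

print(f"F = {f_stat:.3f}, p-value = {p_value:.4f}, R2 = {r2:.4f}")
for level in (0.01, 0.05, 0.10):
    print(f"significant at {level:.0%}: {p_value < level}")
```

The computed p-value is below 0.01, which agrees with the interpretation that the model is significant at the 1% level.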
T-test

This is a test of significance for individual explanatory variables and the constant term within a model.
The t-statistic is found by dividing the (unstandardized) coefficient by its standard error.
The t-statistic is calculated for the constant and all the other coefficients.
The following examples demonstrate how the t-test is presented in SPSS.
SPSS Outputs for Multiple Regression: Example
Regression

Variables Entered/Removed (b)

Model   Variables Entered   Variables Removed   Method
1       X2, X1 (a)          .                   Enter
a. All requested variables entered.
b. Dependent Variable: Y

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .934 (a)   .872       .836                2.22621
a. Predictors: (Constant), X2, X1

ANOVA (b)

Model            Sum of Squares   df   Mean Square   F        Sig.
1  Regression    237.308          2    118.654       23.941   .001 (a)
   Residual       34.692          7      4.956
   Total         272.000          9
a. Predictors: (Constant), X2, X1
b. Dependent Variable: Y

Coefficients (a)

                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t       Sig.
1  (Constant)     .385    3.042                                             .126   .903
   X1            2.176     .329                .895                        6.604   .000
   X2             .963     .642                .203                        1.499   .178
a. Dependent Variable: Y
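Stata OLS Output
      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(2, 7)       =   23.94
       Model |  237.307999     2  118.653999           Prob > F      =  0.0007
    Residual |  34.6920014     7  4.95600021           R-squared     =  0.8725
-------------+------------------------------           Adj R-squared =  0.8360
       Total |         272     9  30.2222222           Root MSE      =  2.2262

------------------------------------------------------------------------------
         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   2.175534   .3294222     6.60   0.000     1.396574    2.954494
       fsize |   .9627217   .6423018     1.50   0.178    -.5560807    2.481524
       _cons |   .3847267   3.041991     0.13   0.903     -6.80844    7.577893
------------------------------------------------------------------------------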
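The t column and the confidence intervals in this output can be reproduced from the coefficients and standard errors alone. The Python sketch below does so. The critical value 2.364624 (two-sided 5% level, 7 degrees of freedom) is not shown in the output; it is taken from standard t tables and is an assumption of this sketch.

```python
# Two-sided 5% critical value of the t distribution with 7 df (from standard tables)
T_CRIT_7DF = 2.364624

# (coefficient, standard error) pairs copied from the Stata output
coefs = {
    "income": (2.175534, 0.3294222),
    "fsize": (0.9627217, 0.6423018),
    "_cons": (0.3847267, 3.041991),
}

results = {}
for name, (b, se) in coefs.items():
    t = b / se                         # t-statistic = coefficient / standard error
    lower = b - T_CRIT_7DF * se        # lower bound of the 95% interval
    upper = b + T_CRIT_7DF * se        # upper bound of the 95% interval
    results[name] = (t, lower, upper)
    significant = abs(t) > T_CRIT_7DF  # equivalent to p < 0.05
    print(f"{name:>6}: t = {t:5.2f}, 95% CI = [{lower:9.6f}, {upper:9.6f}], "
          f"significant at 5%: {significant}")
```

Only income has a t-statistic larger than the critical value, which matches the P>|t| column: its interval excludes zero, while those of fsize and _cons include it.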
Presentation of Results in a Report
• Different styles are used to present results, and no single style is the most appropriate.
• Experience has shown that most people, despite coming up with a good regression estimate (this is also true for other analyses), do not know what to present or how to present the findings.
• The computer gives you a great deal of information. Not all of it is suitable for the report. We have to ‘sieve out’ which information to present and in what style.
• The next slide shows one of the suggested ways of presenting the results.
Results Presentation
From Example 1:
Table #: Regression outputs of Y on X
Variable    Coefficient   Std Error   t-statistic
Constant    11.900        1.066       11.162**
x           -0.650        0.161       -4.044*
F-value = 16.355*; R² = 0.845; R² adjusted = 0.793
* = significant at 5% (p<0.05); ** = significant at 1% (p<0.01)

•See how all that information has been compressed in the above table! The table above presents everything that was contained in the four tables in Example 1.
•Note the use of stars (*) to show the significance of the F-value and t-statistics. Normally, we use more stars for stricter significance levels and fewer stars for less strict ones. For instance, if we reported significance at all three levels, we would use *** for 1%, ** for 5% and * for 10%.
•There should be a note just under the results table explaining the meaning of the stars.
Results Presentation
From Example 2:

Table #: Regression Output of Expenditure on Annual Net Income and Family Size
Variable             Coefficient   Std Error   t-statistic
Constant             0.385         3.042       0.126
Annual Income (X1)   2.176         0.329       6.604*
Family Size (X2)     0.963         0.642       1.499
F-value = 23.941*; R² = 0.872; R² Adjusted = 0.836
* = significant at 1% (p<0.01)

• Note that there are no stars on the constant and X2 because they are not significant at any of the three levels.
• Also note that only one star has been used because only one level of significance (1%) appears in the table.
Type I and Type II errors in Hypothesis Testing:
Size and Power

• When a hypothesis is statistically tested, two types of error can be made:
1. Type I Error: We reject the null hypothesis while it is actually true.
2. Type II Error: The null hypothesis is not rejected while the alternative is true.
Controlling for Type I and II Errors:
Size and Power of Test
• The probability of a type I error is directly controlled by the researcher through the choice of significance level, α.
• For example, when a test is performed at the 5% level, the probability of rejecting the null hypothesis while it is true is 5%.
• This probability (significance level) is called the size of the test.
• The probability of rejecting the null hypothesis when it is false is called the power of the test; it equals one minus the probability of a type II error.
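Size and power can be made concrete with a small simulation. The Python sketch below is an illustration only and is not part of the slides: it uses a two-sided z-test of the hypothesis "mean = 0" on simulated normal data with a known standard deviation of 1, and the function name `rejection_rate`, the sample size and the alternative mean of 0.5 are all arbitrary choices.

```python
import random
from statistics import NormalDist


def rejection_rate(true_mean, n=30, alpha=0.05, reps=5000, seed=0):
    """Share of simulated samples in which H0: mean = 0 is rejected.

    Two-sided z-test with known standard deviation 1 (a simplifying assumption).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mean, 1.0) for _ in range(n)) / n
        z = sample_mean / (1.0 / n ** 0.5)  # test statistic under H0
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


# H0 is true: the rejection rate estimates the size (type I error probability)
size = rejection_rate(true_mean=0.0)

# H0 is false: the rejection rate estimates the power (1 - type II error probability)
power = rejection_rate(true_mean=0.5)

# A stricter significance level lowers the power, i.e. raises the type II error rate
power_strict = rejection_rate(true_mean=0.5, alpha=0.01)

print(f"size ~ {size:.3f}, power at 5% ~ {power:.3f}, power at 1% ~ {power_strict:.3f}")
```

The estimated size should come out close to the chosen 5% level, and the power falls when the significance level is tightened to 1%.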
Tradeoff Between Type I and
Type II Errors
• The choice of significance level increases or decreases the probability of type I and type II errors:
The lower the significance level, the lower the probability of a type I error and the higher the probability of a type II error.
Implications of Type I and
Type II Errors on Sample size
• The probability that we reject the null hypothesis depends upon the standard error of the OLS estimator and ultimately on the sample size.
• Ceteris paribus, the larger the sample, the smaller the standard error and the more likely we are to reject the null hypothesis.
• This implies that a Type II error becomes increasingly unlikely if we increase the sample size.
• To compensate for this, researchers typically reduce the probability of a Type I error by lowering the size of their tests (significance level).
• It is therefore recommended that in large samples we choose a smaller significance level (e.g. 1%).
• Similarly, in very small samples we may prefer to work with a higher significance level (e.g. 10%).
Asymptotic (Large-Sample) Properties of
the OLS Estimator
• Asymptotic theory refers to the question of what happens if, hypothetically, the sample size grows infinitely large.

• Asymptotically (when samples grow infinitely large), econometric estimators usually have nice properties.
One of the important asymptotic properties is consistency.
Consistency
• If we increase the sample size, it becomes increasingly unlikely that our estimator is more than some given positive distance away from the true value.
That is, the econometric estimates are closer to their true values if we have very large samples.
• In short, generally, the bigger the sample size, the more the small-sample (Gauss-Markov) assumptions can be relaxed and the more reliable the estimates.
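Consistency can be illustrated with simulated data. In the Python sketch below, the "true" intercept (1.0) and slope (2.0), the error distribution and the function name `slope_estimate` are all assumptions chosen for the illustration; the OLS slope is computed with the standard single-regressor formula for increasingly large samples.

```python
import random


def slope_estimate(n, true_b1=1.0, true_b2=2.0, seed=0):
    """OLS slope from n simulated observations of y = b1 + b2 * x + error."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [true_b1 + true_b2 * xi + rng.gauss(0, 1) for xi in x]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Distance between the estimated and the true slope for growing sample sizes
for n in (10, 100, 1000, 10000):
    b2_hat = slope_estimate(n)
    print(f"n = {n:>5}: slope estimate = {b2_hat:.4f}, "
          f"distance from true value = {abs(b2_hat - 2.0):.4f}")
```

In a single run the distance need not shrink at every step, but for large n the estimate settles very close to the true slope, which is what consistency describes.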
Bibliography
• Verbeek, M. (2004). A Guide to Modern
Econometrics. 2nd Edition. West Sussex,
England: John Wiley and Sons Ltd.
 [Link]
331-0102-econometrics-i-fall-2011-lecture-
[Link] (Accessed on 1
May 2017)
 Chilongo.T., (2019)
