Linear Regression and OLS Method Explained

The document provides an overview of linear regression, focusing on the Ordinary Least Squares (OLS) method as a statistical tool for analyzing the relationship between independent and dependent variables. It discusses the Gauss-Markov assumptions necessary for OLS to yield the best linear unbiased estimator (BLUE) and highlights the importance of the error term in regression analysis. Additionally, it covers hypothesis testing, goodness-of-fit measures, and the challenges associated with using R-squared in regression models.

Uploaded by

mwakanemaamad202

RESEARCH METHODS 2

UNILIA

WISDOM MGOMEZULU

MWADZERA

LINEAR REGRESSION
What is a Linear Regression Model?
• A statistical tool for evaluating the relationship of one or more independent variables (x) to a single continuous dependent variable (y).
• Regression analysis characterizes the relationship between the dependent and independent variables by looking at the extent, direction and strength of the association.
Ordinary Least Squares Method (OLS): Estimation Method
• The most common estimation method for linear regression is the Ordinary Least Squares (OLS) method.
• In the next few slides we will review mathematically how OLS is applied in linear regression.
OLS as an Algebraic Tool
• Suppose we have a sample with N observations on individual wages and some background characteristics.
• Our main interest is how, in this sample, wages are related to the other observables.
• Let us denote wages by y and the other K-1 characteristics by x2,…, xK.
• OLS helps to come up with the best linear combination of x2,…, xK and a constant, i.e. the one that gives the closest approximation of wages.
• This is specified as:

$\hat{y}_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK}$ (1)
OLS as an Algebraic Tool
• The difference between an observed value y_i and its linear approximation is:

$y_i - \hat{y}_i = y_i - [\beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK}]$ (2)

• Using vector notation, first we collect the x-values for individual i in a vector x_i, which includes the constant:

$x_i = (1, x_{i2}, x_{i3}, \dots, x_{iK})'$ (3)

• Similarly, for the vector of coefficients:

$\beta = (\beta_1, \dots, \beta_K)'$ (4)

OLS as an Algebraic Tool
• Therefore the difference in Equation 2 can be rewritten in vector form as:

$y_i - x_i'\beta$ (5)

• We would like to choose values for the coefficients such that the differences in (5) are small.
• Different measures can be used to define ‘small’, but the most common approach is to choose β such that the sum of squared differences is as small as possible, hence the name “Least Squares”, or formally Ordinary Least Squares (OLS).
OLS as an Algebraic Tool
• Thus OLS minimizes the following objective function:

$S(\beta) = \sum_{i=1}^{N} (y_i - x_i'\beta)^2$ (6)

• Taking the squares makes sure that positive and negative deviations do not cancel each other out in the summation.
• This approach (minimizing the sum of squared differences) is what is referred to as the Ordinary Least Squares (OLS) approach.
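The objective function in (6) can be evaluated directly for any candidate coefficient vector. The sketch below is illustrative only: the wage and schooling figures are made up, and the helper name `sum_squared_differences` is an assumption rather than anything taken from the slides.

```python
def sum_squared_differences(beta, xs, ys):
    """Evaluate S(beta) = sum_i (y_i - x_i' beta)^2 for a candidate beta."""
    total = 0.0
    for x_i, y_i in zip(xs, ys):
        approx = sum(b * x for b, x in zip(beta, x_i))  # x_i' beta
        total += (y_i - approx) ** 2
    return total


# Hypothetical data: each x_i is (1, years of schooling); y_i is a wage.
xs = [(1.0, 8.0), (1.0, 10.0), (1.0, 12.0), (1.0, 16.0)]
ys = [5.0, 6.5, 7.0, 10.0]

s_a = sum_squared_differences((0.0, 0.6), xs, ys)  # candidate A
s_b = sum_squared_differences((1.0, 0.4), xs, ys)  # candidate B
print(s_a, s_b)
```

Candidate A gives the smaller sum of squared differences, so by the least-squares criterion it approximates these (made-up) wages better than candidate B. OLS picks the coefficient vector for which no other candidate does better.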
Simple Linear Regression
• In the case where K = 2 we have only one regressor and a constant.
• In this case the observations (y_i, x_i) can be drawn on a two-dimensional graph, as shown on the next slide.
Simple Linear Regression: Plotting

[Figure: two scatter plots with fitted regression lines]

• The fitted regression is represented by the straight line; the dots represent the actual observations.
• The vertical distances between the observations and the fitted line are the residuals, i.e. the differences between each observed y and its linear approximation. Their squares add up to the error sum of squares.
• From the above two graphs, which is a better approximation of y?
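For the K = 2 case the least-squares line has a well-known closed form (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The following is a minimal sketch under that assumption, using made-up data; `fit_simple_ols` is a hypothetical helper name.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical regressor values
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # hypothetical observations

b1, b2 = fit_simple_ols(x, y)
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]  # vertical distances
ess = sum(e ** 2 for e in residuals)                       # error sum of squares
print(b1, b2, ess)
```

For these numbers the fitted line is approximately 2.2 + 0.6x with an error sum of squares of about 2.4; the residuals sum to zero, as they always do when a constant is included.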

Small Sample Properties of the OLS Estimator: The Gauss-Markov Assumptions

• OLS requires a set of full ideal conditions to be met for the model to be applicable.
• One has to be aware of the ideal conditions and their violations to be able to control for deviations from these conditions and render the results unbiased or at least consistent.
• The Gauss-Markov assumptions are about the error term and the explanatory variables.
• The next four assumptions constitute the Gauss-Markov assumption set.
Gauss-Markov Assumptions
• A1: The expected value of the error terms is zero for all observations: $E(\varepsilon_i) = 0$, i = 1,…,N
• A2: The error terms are not correlated with the regressors. Equivalently, the explanatory variables are all exogenous: $\{\varepsilon_1, \dots, \varepsilon_N\}$ and $\{x_1, \dots, x_N\}$ are independent
• A3: Homoskedasticity: the variance of the error term is constant across all observations: $V(\varepsilon_i) = \sigma^2$, i = 1,…,N
• A4: Zero correlation between different error terms: $\text{cov}(\varepsilon_i, \varepsilon_j) = 0$, i, j = 1,…,N, i ≠ j
Properties of the OLS Estimator: Implications of Gauss-Markov Assumptions

• Under A1 to A4 the OLS estimator is the best linear unbiased estimator (BLUE).
• Best: the variance of the OLS estimator is minimal, i.e. no other linear unbiased estimator has a smaller variance.
• Linear: the estimator is a linear function of the observed values of y. The model itself must be linear in the parameters for OLS to be applicable.
• Unbiased: in repeated sampling, we can expect our estimator to be on average equal to its true value. That is, the expected values of the estimates are equal to the true values describing the relationship between x and y.
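"Repeated sampling" can be imitated on a computer. The sketch below is a small Monte Carlo experiment under assumed values (true intercept 1, true slope 2, standard normal errors, fixed x); none of these numbers come from the slides. It draws many samples, estimates the slope each time, and averages the estimates.

```python
import random


def ols_slope(x, y):
    """Closed-form OLS slope for a regression of y on a constant and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(0)            # fixed seed so the run is repeatable
true_b1, true_b2 = 1.0, 2.0       # assumed "true" intercept and slope
x = [float(i) for i in range(30)]  # regressor values held fixed across samples

slopes = []
for _ in range(2000):             # repeated sampling
    y = [true_b1 + true_b2 * xi + rng.gauss(0.0, 1.0) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / len(slopes)
print(mean_slope)                 # should land very close to the true slope 2.0
```

Individual estimates scatter around 2.0, but their average is very close to it, which is what unbiasedness means in practice.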
More Assumptions of OLS

• A5: The explanatory variables are not perfectly correlated with each other (no perfect multicollinearity).
• A6: Normal distribution: the errors are jointly normally distributed with mean zero and constant variance, $\varepsilon \sim N(0, \sigma^2 I_N)$.
• Stated this way, A6 implies assumptions A1, A3 and A4.
• Thus assumptions A2 and A6 together are sufficient for BLUE OLS estimates.
IMPORTANCE OF THE ERROR TERM
• 1. Vagueness of theory
• The theory, if any, determining the behavior of Y may be, and often is, incomplete.
• 2. Unavailability of data
• Even if we know what some of the excluded variables are and therefore consider a multiple regression rather than a simple regression, we may not have quantitative information about some of these variables.
• 3. Core variables vs peripheral variables
• Assume that we want to study the consumption-income relationship and economic theory guides us that the explanatory variables include income, number of children per household, religion, education and geographical location.
• It is quite possible that the joint influence of all or some of these variables is so small, and at best nonsystematic or random, that it does not pay to introduce them into the model explicitly.
• 4. Intrinsic randomness in human behavior
• Even if we succeed in introducing all the relevant variables into the model, there is bound to be some “intrinsic” randomness in individual Y’s that cannot be explained no matter how hard we try. The disturbances, the ε’s, may very well reflect this intrinsic randomness.
• 5. Poor proxy variables
• Although the classical regression model assumes that the variables Y and X are measured accurately, in practice the data may be plagued by errors of measurement, and data on some variables are not available.
• Since these variables are not directly observable, in practice we use proxy variables. For example, expenditure may be used as a proxy for income. Obviously expenditure may not always be equal to income, as some people may be saving or donating to others.
• 6. Principle of parsimony
• A regression model needs to be as simple as possible.
• 7. Wrong functional form
• Even if we have theoretically correct variables explaining a phenomenon and even if we can obtain data on these variables, very often we do not know the form of the functional relationship between the regressand and the regressors.
METHOD OF LEAST SQUARES

• Example
• Given the following values of y:
• 70, 65, 90, 95, 110, 115, 120, 140, 155, 260
• Compute the following:
• a. Error sum of squares
• b. Mean sum of squares (variance)
• c. Standard deviation
ANSWERS
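The worked answers from the slides are not available, so the following is only a sketch of how the three quantities could be computed. It makes two assumptions: with no regressor given, the "error" is taken to be each value's deviation from the sample mean, and the mean sum of squares divides by n - 1 (the sample variance).

```python
import math

y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 260]

n = len(y)
mean_y = sum(y) / n                                 # 122.0
sum_squares = sum((yi - mean_y) ** 2 for yi in y)   # a. sum of squared deviations
variance = sum_squares / (n - 1)                    # b. assumes divisor n - 1
std_dev = math.sqrt(variance)                       # c. standard deviation

print(sum_squares, round(variance, 2), round(std_dev, 2))
# 28360.0 3151.11 56.13
```

If the intended divisor is n rather than n - 1, the variance becomes 2836.0 and the standard deviation shrinks accordingly.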
WHAT ARE WE EXACTLY
TALKING ABOUT
ASSUMPTIONS OF THE LINEAR REGRESSION MODEL
1. Linearity
Y should be linear in parameters
2. Independent values of X
Values taken by the regressor x are fixed in repeated sampling
3. The error terms have a mean of zero, i.e. E(e) = 0
4. Homoskedasticity
Variance of the error term is constant
CONTD
5. No autocorrelation
Cov(ei, ej) = 0
6. Number of observations n must be greater than the number of parameters to be estimated
7. Variability of x
8. No multicollinearity
9. No specification bias
USING STATA
INTERPRETATION OF OUTPUT
CORRELATION
ANALYSIS OF VARIANCE
• If the variation among the samples (due to treatment) is equal to the variation within samples (due to error), it means that the treatment did not have any effect and the F statistic will be equal to one.
• An F statistic close to one shows that there is insufficient evidence for us to reject the null hypothesis of equality of means. But if the F statistic is very large, it shows that the variation due to treatment is much larger than the variation due to error.
• In such a case, we reject the null hypothesis if the calculated F statistic is larger than the critical value.
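The ratio described above can be written out in a few lines. This is a minimal sketch for a one-way layout with made-up groups; the function name `one_way_f` is hypothetical, and the critical value still has to be looked up in an F table.

```python
def one_way_f(groups):
    """Return the F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)   # variation due to treatment
    ms_within = ss_within / (n - k)     # variation due to error
    return ms_between / ms_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical treatment groups
f_stat = one_way_f(groups)
print(f_stat)  # roughly 13: between-group variation dominates
```

Here the group means (2, 3 and 6) differ much more than the values within each group do, so F is far above one.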
Inference
• Inference is the generalization of the regression results for the sample under observation to the population from which the sample was drawn.
• Significance tests are designed to check whether inferences are valid or not.
• If a coefficient is significant (p-value below the chosen level of 0.10, 0.05 or 0.01), then you can draw the inferences.
Inference
• But …
• Only if the sample matches the characteristics of the population.
• This is normally the case if all Gauss-Markov assumptions of OLS are met by the data under observation.
• If this is not the case, the standard errors of the coefficients might be biased, and therefore the result of the significance test might be wrong as well, leading to false conclusions.
OLS Goodness-of-fit
• R² is used to measure goodness-of-fit in an OLS model.
• R² measures the proportion of the variation in the dependent variable (y) that is explained by the independent variable(s). In other words, R² measures the explanatory power of the model.
• It is found by the following formula:

$R^2 = \frac{\text{regression sum of squares (SSR)}}{\text{total sum of squares (SST)}} = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2}$

PRACTICE ASSIGNMENT: Calculate this in Stata using any given data
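Outside Stata, the same ratio can be checked by hand. The sketch below uses made-up observations together with the fitted values of the OLS line 2.2 + 0.6x at x = 1,…,5; the data and the function name `r_squared` are assumptions for illustration only.

```python
def r_squared(y, y_hat):
    """R^2 = SSR / SST, for fitted values from an OLS fit with a constant."""
    y_bar = sum(y) / len(y)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)   # regression sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
    return ssr / sst


y = [2.0, 4.0, 5.0, 4.0, 5.0]        # hypothetical observations
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]    # OLS fitted values for x = 1..5
r2 = r_squared(y, y_hat)
print(r2)  # about 0.6, i.e. 60% of the variation in y is explained
```

The SSR/SST form relies on the fitted values coming from OLS with a constant term; for arbitrary predictions the ratio need not lie between 0 and 1.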
Properties of R²
• R² lies between 0 and 1. It is often multiplied by 100 to get the percentage of the variation in y that is explained by x.
• Generally, the bigger the R², the greater the explanatory power of the model.
• However, a small R² does not automatically imply that the estimated model is incorrect or useless: it just indicates the relative (un)importance of the explanatory variables in explaining the dependent variable.
• It may also imply that there are other variables (factors) omitted from the model that better explain the dependent variable.
• If the data points all lie on the same line, OLS provides a perfect fit to the data. In this case R² equals 1 or 100%.
• Adding further explanatory variables never lowers R² and generally increases it.
Challenge in Using R²
• The major challenge is that R² is sensitive to the number of independent variables included in a regression model. The more independent variables we add (even if they are not valid), the bigger R² becomes.
• This problem arises because R² does not take the number of degrees of freedom into account, as its formula shows:

$R^2 = \frac{\text{regression sum of squares (SSR)}}{\text{total sum of squares (SST)}} = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2}$
Challenge in Using R²: Solution
• To solve this problem, when testing the validity of a regression model we use the Corrected or Adjusted R² (denoted as $\bar{R}^2$), which takes degrees of freedom into account as given in the following formula:

$\bar{R}^2 = 1 - (1 - R^2)\frac{(n - 1)}{(n - k - 1)}$

• Where R² = the coefficient of determination; n = sample size; and k = number of independent variables.

PRACTICE ASSIGNMENT: Use the previously calculated R² to recalculate the adjusted R²
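As a quick check of the formula, the sketch below plugs in the sums of squares reported in the Example 2 ANOVA table of this lecture (SSR = 237.308, SST = 272.000, n = 10, k = 2). The function name `adjusted_r_squared` is an illustrative choice, not part of any package.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Values from the Example 2 ANOVA table: SSR = 237.308, SST = 272.000,
# n = 10 observations, k = 2 independent variables.
r2 = 237.308 / 272.000
adj_r2 = adjusted_r_squared(r2, n=10, k=2)
print(round(r2, 3), round(adj_r2, 3))  # 0.872 0.836
```

Both figures agree with the R Square and Adjusted R Square reported in the SPSS Model Summary for that example.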
OLS Hypothesis Testing
• Several hypotheses can be tested using the OLS regression.
• The common ones are those used to check the validity of the overall model and of individual coefficients, respectively:
  - F-test for overall significance of the model (a joint test of significance of the regression coefficients)
  - t-test for individual coefficients.
Joint Test of Significance of Regression Coefficients (F-test)
• This measures the overall significance of the model.
• The F-statistic tests:
• The null hypothesis (NH): $\beta_1 = \beta_2 = \dots = \beta_k = 0$, i.e. that the coefficients are all equal to zero, implying that there is no relationship between the dependent variable and the independent variables.
• The alternate hypothesis (AH): $\beta_i \neq 0$ for at least one i, i.e. at least one of the coefficients is not equal to zero.
• Note that if the NH is not rejected, there is no evidence of a relationship between the dependent and independent variables, even if the estimated coefficients ‘appear’ not to be zero.
• If the NH is rejected, i.e. the F-test is significant, then the overall model is valid and we can go ahead and check the other two tests (R² and t-statistic).
How Do We Conduct F-test?
• In this lecture we will concentrate on computer use, but you can also calculate it ‘manually’ using a formula (that’s beyond the scope of this course).
• When you run a regression model on a computer, by default it gives you all the validity tests, including the F-test. What is important is to know how to use and interpret them.
• The F-test is given in the ANOVA (analysis of variance) table.
• The next slide shows how the F-test results are presented in SPSS.
F-Test Outputs and Interpretation
Example 1:

• Your attention is drawn to the last two columns of the above table. The second-to-last column (the F column) gives us the estimated F-value; the last column (Sig.) gives its p-value.
• When using a computer, for the F-test to be significant, the value in the last column of the ANOVA table should be less than the level (1%, 5% or 10%) at which you are testing the F-value.
• The above figure (0.027 or 2.7%) is more than 0.01 or 1%. This means the F-value is not significant at 1% and we need to check at the next level of 5% (0.05).
• Since 0.027 (2.7%) is less than 0.05 (5%), the model is, overall, significant at 5% (p<0.05). We therefore reject the null hypothesis that the coefficients are equal to zero and conclude that the model is valid.
F-Test Outputs and Interpretation
Example 2:

ANOVA (b)
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   237.308          2    118.654       23.941   .001 (a)
  Residual     34.692           7    4.956
  Total        272.000          9
a. Predictors: (Constant), X2, X1
b. Dependent Variable: Y

• Similarly, this model is significant at 1% (p<0.01), i.e. 0.001 < 0.01. We therefore reject the null hypothesis and accept the alternate hypothesis that at least one of the coefficients is not equal to zero. Thus the model is valid.
• The F-test is a necessary but not a sufficient test for checking the validity of a model. To check regression model validity sufficiently, we also need to check the other two tests, R² and the t-statistic.
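Although the manual formula is outside the scope of the course, the F-value in the Example 2 ANOVA table is simply the regression mean square divided by the residual mean square. The sketch below reproduces it from the table's sums of squares. The p-value line uses a closed form that holds only when the numerator has exactly 2 degrees of freedom, so it is a shortcut for this example rather than a general method.

```python
ss_regression, df_regression = 237.308, 2
ss_residual, df_residual = 34.692, 7

ms_regression = ss_regression / df_regression   # 118.654
ms_residual = ss_residual / df_residual         # about 4.956
f_stat = ms_regression / ms_residual

# Upper-tail probability of F(2, df_residual). This closed form is valid
# only for 2 numerator degrees of freedom; other cases need an F table
# or statistical software.
p_value = (1.0 + df_regression * f_stat / df_residual) ** (-df_residual / 2.0)

print(round(f_stat, 3), round(p_value, 4))  # 23.941 0.0007
```

The result is consistent with the Sig. value of .001 in the SPSS table and with Prob > F = 0.0007 in the Stata output.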
T-test

• This is a test of significance for individual explanatory variables and the constant term within a model.
• The t-statistic is found by dividing the (unstandardized) coefficient by its standard error.
• The t-statistic is calculated for the constant and all the other coefficients.
• The following examples demonstrate how the t-test is presented in SPSS.
SPSS Outputs for Multiple Regression: Example
Regression

Variables Entered/Removed (b)
Model   Variables Entered   Variables Removed   Method
1       X2, X1 (a)          .                   Enter
a. All requested variables entered.
b. Dependent Variable: Y

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .934 (a)   .872       .836                2.22621
a. Predictors: (Constant), X2, X1

ANOVA (b)
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   237.308          2    118.654       23.941   .001 (a)
  Residual     34.692           7    4.956
  Total        272.000          9
a. Predictors: (Constant), X2, X1
b. Dependent Variable: Y

Coefficients (a)
               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t       Sig.
1 (Constant)   .385     3.042                                            .126    .903
  X1           2.176    .329                 .895                        6.604   .000
  X2           .963     .642                 .203                        1.499   .178
a. Dependent Variable: Y
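Stata OLS Output

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(2, 7)       =   23.94
       Model | 237.307999     2  118.653999           Prob > F      =  0.0007
    Residual | 34.6920014     7  4.95600021           R-squared     =  0.8725
-------------+------------------------------           Adj R-squared =  0.8360
       Total |        272     9  30.2222222           Root MSE      =  2.2262

------------------------------------------------------------------------------
         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   2.175534   .3294222     6.60   0.000     1.396574    2.954494
       fsize |   .9627217   .6423018     1.50   0.178    -.5560807    2.481524
       _cons |   .3847267   3.041991     0.13   0.903     -6.80844    7.577893
------------------------------------------------------------------------------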
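The t column and the confidence intervals in the Stata output can be reproduced from the coefficients and standard errors alone. The sketch below assumes a two-sided 5% critical value of 2.3646 for 7 degrees of freedom, taken from a standard t table rather than computed.

```python
# (coefficient, standard error) pairs copied from the Stata output.
estimates = {
    "income": (2.175534, 0.3294222),
    "fsize": (0.9627217, 0.6423018),
    "_cons": (0.3847267, 3.041991),
}

# Two-sided 5% critical value of the t distribution with 7 degrees of
# freedom (n - k - 1 = 10 - 2 - 1); an assumed table value.
T_CRIT = 2.3646

t_stats = {}
intervals = {}
for name, (coef, se) in estimates.items():
    t_stats[name] = coef / se                                  # t = coefficient / std. error
    intervals[name] = (coef - T_CRIT * se, coef + T_CRIT * se)  # 95% interval
    low, high = intervals[name]
    print(f"{name:7s} t = {t_stats[name]:5.2f}  95% CI = [{low:.3f}, {high:.3f}]")
```

Only the interval for income excludes zero, which matches the P>|t| column: income is significant while fsize and the constant are not.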
Presentation of Results in a Report
• Different styles are used to present the results and there is no single style that is the most appropriate.
• Experience has shown that most people, despite coming up with a good regression estimate (this is also true for the other analyses), do not know what to present or how to present the findings.
• The computer gives you so much information. Not all of this information is suitable for the report. We have to ‘sieve out’ which information to present and in what style.
• The next slide shows one of the suggested ways of presenting the results.
Results Presentation
From Example 1:

Table #: Regression outputs of Y on X
Variable   Coefficient   Std Error   t-statistic
Constant   11.900        1.066       11.162**
x          -0.650        0.161       -4.044*
F-value = 16.355*; R² = 0.845; R² adjusted = 0.793
* = significant at 5% (p<0.05); ** = significant at 1% (p<0.01)

• See how all that information has been compressed in the above table! The table presents everything that was contained in the four tables in Example 1.
• Note the use of stars (*) to show the significance of the F-value and t-statistics. Normally, we use more stars for variables that are significant at more stringent levels and fewer stars for less stringent ones. For instance, if we had significance at all three levels, then we would have used *** for 1%, ** for 5% and * for 10%.
• There should be a note just under the results table explaining the meaning of the stars.
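The three-level star convention described above is easy to automate when building a results table. This is a sketch only: the function names and the example rows are hypothetical, and the thresholds follow the *** = 1%, ** = 5%, * = 10% convention.

```python
def significance_stars(p_value):
    """Map a p-value to stars: *** for 1%, ** for 5%, * for 10%."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


def format_row(name, coef, std_err, p_value):
    """Build one table row: variable, coefficient, std error, starred t."""
    t_stat = coef / std_err
    stars = significance_stars(p_value)
    return f"{name:<10}{coef:>10.3f}{std_err:>10.3f}{t_stat:>10.3f}{stars}"


# Hypothetical rows, used only to show the formatting.
print(format_row("Constant", 1.250, 0.400, 0.004))
print(format_row("x", 0.300, 0.200, 0.140))
```

Whatever convention is chosen, the note under the table must state it, since some reports use a single star for the most stringent level instead.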
Results Presentation
From Example 2:

Table #: Regression Output of Expenditure on Annual Net Income and Family Size
Variable             Coefficient   Std Error   t-statistic
Constant             0.385         3.042       0.126
Annual Income (X1)   2.176         0.329       6.604*
Family Size (X2)     0.963         0.642       1.499
F-value = 23.941*; R² = 0.872; R² adjusted = 0.836
* = significant at 1% (p<0.01)

• Note that there are no stars on the constant and X2 because they are not significant at any of the three levels.
• Also note that only one star has been used because only one level of significance (1%) appears in the table.
Type I and Type II Errors in Hypothesis Testing: Size and Power

• When a hypothesis is statistically tested, two types of error can be made:
1. Type I Error: the null hypothesis is rejected while it is actually true.
2. Type II Error: the null hypothesis is not rejected while the alternative is true.
Controlling for Type I and II Errors: Size and Power of Test
• The probability of a type I error is directly controlled by the researcher through the choice of significance level, α.
• For example, when a test is performed at the 5% level, the probability of rejecting the null hypothesis while it is true is 5%.
• This probability (the significance level) is called the size of the test.
• The probability of rejecting the null hypothesis when it is false is called the power of the test. It equals one minus the probability of a type II error.
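Size and power can be made concrete with a small simulation. The sketch below does not use a regression: as a simplification it tests H0: mean = 0 with a z-test on normal data of known variance 1, at the 5% level (critical value 1.96). The sample size, the alternative mean of 0.5 and the seed are arbitrary assumptions.

```python
import math
import random


def rejection_rate(true_mean, n, reps, rng, z_crit=1.96):
    """Share of simulated samples in which H0: mean = 0 is rejected."""
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)   # known variance of 1 assumed
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


rng = random.Random(1)  # fixed seed so the run is repeatable
size = rejection_rate(true_mean=0.0, n=25, reps=4000, rng=rng)   # H0 is true
power = rejection_rate(true_mean=0.5, n=25, reps=4000, rng=rng)  # H0 is false
print(size, power)
```

The first rate estimates the size and should be near 0.05; the second estimates the power against this particular alternative and should be well above it.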
Tradeoff Between Type I and Type II Errors
• The choice of significance level increases or decreases the probability of type I and type II errors:
• The lower the significance level, the lower the probability of a type I error and the higher the probability of a type II error.
Implications of Type I and Type II Errors on Sample Size
• The probability that we reject the null hypothesis depends upon the standard error of the OLS estimator and, through it, on the sample size.
• Ceteris paribus, the larger the sample, the smaller the standard error and the more likely we are to reject the null hypothesis.
• This implies that a Type II error becomes increasingly unlikely as we increase the sample size.
• To compensate for this, researchers typically reduce the probability of a Type I error by lowering the size of their tests (the significance level).
• It is therefore recommended that in large samples we choose a smaller significance level (e.g. 1%).
• Similarly, in very small samples we may prefer to work with a higher significance level of 10%.
Asymptotic (Large-Sample) Properties of the OLS Estimator
• Asymptotic theory refers to the question of what happens if, hypothetically, the sample size grows infinitely large.
• Asymptotically (when samples grow infinitely large), econometric estimators usually have nice properties.
• One of the important asymptotic properties is consistency.
Consistency
• If we increase the sample size, the probability that our estimator deviates from the true value by more than any given positive number becomes increasingly small.
• That is, the econometric estimates are closer to their true values if we have very large samples.
• In short, the bigger the sample size, the more the small-sample (Gauss-Markov) assumptions can generally be relaxed and the more reliable the estimates.
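A simulation gives a feel for consistency. The sketch below assumes a true slope of 2, an intercept of 1, uniformly drawn x and normal errors with standard deviation 3 (all made up), and reports the average distance between the OLS slope estimate and the true slope for growing sample sizes.

```python
import random


def ols_slope(x, y):
    """Closed-form OLS slope for a regression of y on a constant and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def mean_abs_error(n, reps, rng, true_slope=2.0):
    """Average |estimate - true slope| over `reps` simulated samples of size n."""
    errors = []
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        y = [1.0 + true_slope * xi + rng.gauss(0.0, 3.0) for xi in x]
        errors.append(abs(ols_slope(x, y) - true_slope))
    return sum(errors) / reps


rng = random.Random(2)  # fixed seed so the run is repeatable
errs = {n: mean_abs_error(n, 200, rng) for n in (10, 100, 1000)}
for n, err in errs.items():
    print(n, round(err, 4))
```

The average error shrinks as n grows, roughly in proportion to one over the square root of the sample size.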
