Understanding Multiple Regression Analysis

1. Regression analysis can be used to estimate the relationships between multiple independent variables (x1, x2, etc.) and a dependent variable (y). Multiple regression models allow us to partial out the effect of each independent variable while controlling for the others.
2. Dummy variables, which take on values of 0 or 1, can represent categorical variables in a regression. They allow the intercept to differ for different groups. Interacting dummy variables further subdivides the groups.
3. Goodness-of-fit measures like R-squared indicate how well the regression line fits the data but can increase simply by adding more variables. F-tests evaluate whether groups of variables jointly help explain the dependent variable.

Uploaded by

Taimoor Waraich

Regression Analysis

Estimation and Interpretation of Regression Equation
Dummy Independent Variable

Instructor
Taimoor Naseer Waraich
Parallels with Simple Regression
• β0 is still the intercept
• β1 to βk are all called slope parameters
• u is still the error term (or disturbance)
• We still need to make a zero conditional mean assumption, so we now assume that E(u|x1, x2, …, xk) = 0
• We are still minimizing the sum of squared residuals, so we have k+1 first-order conditions
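
The k+1 first-order conditions can be written compactly as X'(y − Xb) = 0, where X carries a leading column of ones. The following is a minimal sketch in Python with NumPy, not part of the original slides; the helper name `ols` and the small data set are made up for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y on X; X must already include a column of ones."""
    # lstsq minimizes the sum of squared residuals ||y - X b||^2
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical data: n = 6 observations, k = 2 regressors
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

X = np.column_stack([np.ones_like(x1), x1, x2])
b = ols(X, y)
resid = y - X @ b

# The k + 1 = 3 first-order conditions: each column of X is orthogonal to the residuals
foc = X.T @ resid
print(b)
print(foc)  # numerically zero, up to floating-point error
```

The first condition (the column of ones) says the residuals sum to zero; the remaining k conditions say each regressor is uncorrelated with the residuals in the sample.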
Interpreting Multiple Regression

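$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k,$$

so

$$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2 + \dots + \hat{\beta}_k \Delta x_k,$$

so holding x2, ..., xk fixed implies that

$$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1,$$

that is, each β has a ceteris paribus interpretation.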

A “Partialling Out” Interpretation

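Consider the case where k = 2, i.e.

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2,$$

then

$$\hat{\beta}_1 = \frac{\sum_i \hat{r}_{i1}\, y_i}{\sum_i \hat{r}_{i1}^2},$$

where the $\hat{r}_{i1}$ are the residuals from the estimated regression

$$\hat{x}_1 = \hat{\gamma}_0 + \hat{\gamma}_2 x_2$$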
“Partialling Out” continued
• The previous equation implies that regressing y on x1 and x2 gives the same effect of x1 as regressing y on the residuals from a regression of x1 on x2
• This means only the part of xi1 that is uncorrelated with xi2 is being related to yi, so we're estimating the effect of x1 on y after x2 has been “partialled out”
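
The partialling-out result can be checked numerically. The sketch below uses simulated data (the coefficients, sample size and seed are arbitrary choices, not from the slides) and compares the slope on x1 from the full regression with the slope obtained from the residual formula.

```python
import numpy as np

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)      # fixed seed; the data are simulated
n = 200
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)  # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)
ones = np.ones(n)

# Multiple regression of y on x1 and x2
b_full = ols(np.column_stack([ones, x1, x2]), y)

# Step 1: regress x1 on x2 (with intercept) and keep the residuals r1
Z = np.column_stack([ones, x2])
r1 = x1 - Z @ ols(Z, x1)

# Step 2: slope from the formula sum(r1 * y) / sum(r1^2)
b1_partial = (r1 @ y) / (r1 @ r1)

print(b_full[1], b1_partial)  # the two numbers agree up to rounding
```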
Simple vs Multiple Reg Estimate

Compare the simple regression

$$\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1$$

with the multiple regression

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2.$$

Generally, $\tilde{\beta}_1 \neq \hat{\beta}_1$ unless:
• $\hat{\beta}_2 = 0$ (i.e. no partial effect of x2), OR
• x1 and x2 are uncorrelated in the sample
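
A small numerical illustration of the two situations, using made-up data (the helper `compare` and all numbers are hypothetical): in case A the regressors are correlated and the two slopes differ; in case B the regressors are exactly uncorrelated in the sample and the slopes coincide.

```python
import numpy as np

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def compare(x1, x2, y):
    """Return (simple-regression slope on x1, multiple-regression slope on x1)."""
    ones = np.ones_like(x1)
    b_tilde = ols(np.column_stack([ones, x1]), y)
    b_hat = ols(np.column_stack([ones, x1, x2]), y)
    return b_tilde[1], b_hat[1]

y = np.array([1.0, 4.0, 2.0, 7.0, 3.0, 9.0, 5.0, 8.0])

# Case A: x1 and x2 are correlated in the sample
x1_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2_a = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 9.0])
simple_a, multiple_a = compare(x1_a, x2_a, y)

# Case B: x1 and x2 have zero sample correlation (mean zero and orthogonal)
x1_b = np.array([-1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0])
x2_b = np.array([-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0])
simple_b, multiple_b = compare(x1_b, x2_b, y)

print(simple_a, multiple_a)  # different
print(simple_b, multiple_b)  # equal up to rounding
```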

Goodness-of-Fit
We can think of each observation as being made up of an explained part and an unexplained part,

$$y_i = \hat{y}_i + \hat{u}_i.$$

We then define the following:

$$\sum_i (y_i - \bar{y})^2 \quad \text{is the total sum of squares (SST)}$$

$$\sum_i (\hat{y}_i - \bar{y})^2 \quad \text{is the explained sum of squares (SSE)}$$

$$\sum_i \hat{u}_i^2 \quad \text{is the residual sum of squares (SSR)}$$

Then SST = SSE + SSR

Goodness-of-Fit (continued)
• How do we think about how well our sample regression line
fits our sample data?

• We can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression

• R² = SSE/SST = 1 – SSR/SST
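
As a sketch of how these quantities are computed in practice, the following Python snippet fits a regression with an intercept on a small made-up data set and evaluates SST, SSE, SSR and both forms of R-squared. The data and variable names are illustrative only.

```python
import numpy as np

# Hypothetical data
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

X = np.column_stack([np.ones_like(x1), x1, x2])  # intercept included
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef      # explained part
u_hat = y - y_hat     # unexplained part (residuals)

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum(u_hat ** 2)               # residual sum of squares

r2_from_sse = sse / sst
r2_from_ssr = 1.0 - ssr / sst
print(sst, sse + ssr)
print(r2_from_sse, r2_from_ssr)
```

The decomposition SST = SSE + SSR relies on the regression containing an intercept, which is why the column of ones is included.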
More about R-squared
• R² can never decrease when another independent variable is added to a regression, and usually will increase

• Because R² will usually increase with the number of independent variables, it is not a good way to compare models
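
To see why R-squared is a poor yardstick for comparing models, the sketch below adds a regressor that is unrelated to y by construction and checks that R-squared still does not fall. The data are simulated; the seed, sample size and names are arbitrary assumptions.

```python
import numpy as np

def r_squared(X, y):
    """R-squared from an OLS regression of y on X (X includes a column of ones)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)  # fixed seed; simulated data
n = 50
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)       # generated independently of y

ones = np.ones(n)
r2_small = r_squared(np.column_stack([ones, x1]), y)
r2_big = r_squared(np.column_stack([ones, x1, junk]), y)

print(r2_small, r2_big)  # r2_big is at least as large as r2_small
```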
Assumptions for Unbiasedness
• The population model is linear in parameters: y = β0 + β1x1 + β2x2 + … + βkxk + u
• We can use a random sample of size n, {(xi1, xi2, …, xik, yi): i = 1, 2, …, n}, from the population model, so that the sample model is yi = β0 + β1xi1 + β2xi2 + … + βkxik + ui
• E(u|x1, x2, …, xk) = 0, implying that all of the explanatory variables are exogenous
• None of the x's is constant, and there are no exact linear relationships among them
Omitted Variable Bias

Suppose the true model is given as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,$$

but we estimate

$$y = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 + \tilde{u},$$

so that x2 is omitted from the estimated equation.
Summary of Direction of Bias

            Corr(x1, x2) > 0    Corr(x1, x2) < 0
β2 > 0      Positive bias       Negative bias
β2 < 0      Negative bias       Positive bias

Omitted Variable Bias Summary
• There are two cases where the bias is equal to zero:
– β2 = 0, that is, x2 doesn't really belong in the model
– x1 and x2 are uncorrelated in the sample

• If the correlation between x2 and x1 and the correlation between x2 and y have the same sign, the bias will be positive
• If they have opposite signs, the bias will be negative
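
The direction-of-bias rules follow from a standard result for the two-regressor case, given here as a sketch rather than a full derivation. The symbol $\tilde{\delta}_1$ is notation introduced for this note (it is not defined on the slides): it denotes the slope from a simple regression of x2 on x1.

```latex
\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2\,\tilde{\delta}_1
\quad\Longrightarrow\quad
E(\tilde{\beta}_1) = \beta_1 + \beta_2\,\tilde{\delta}_1
\quad\Longrightarrow\quad
\mathrm{Bias}(\tilde{\beta}_1) = \beta_2\,\tilde{\delta}_1
```

The expectation is taken conditional on the sample values of x1 and x2, under the unbiasedness assumptions listed earlier. Because $\tilde{\delta}_1$ has the same sign as the sample correlation between x1 and x2, the bias is positive when β2 and Corr(x1, x2) share a sign and negative otherwise, and it vanishes when either is zero.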
Dummy Variables
• A dummy variable is a variable that takes on the value 1 or 0
• Examples: male (= 1 if male, 0 otherwise), south (= 1 if in the south, 0 otherwise), etc.
• Dummy variables are also called binary variables, for obvious
reasons
A Dummy Independent Variable
• Consider a simple model with one continuous variable (x) and
one dummy (d)
• y = β0 + δ0d + β1x + u
• This can be interpreted as an intercept shift
• If d = 0, then y = β0 + β1x + u
• If d = 1, then y = (β0 + δ0) + β1x + u
• The case of d = 0 is the base group
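
The intercept shift can be seen in a small noiseless example. In the sketch below the parameter values (1.0, 2.0, 0.5) and the data are invented for illustration; because no error term is added, OLS recovers the values exactly and the two group intercepts can be read off directly.

```python
import numpy as np

beta0, delta0, beta1 = 1.0, 2.0, 0.5  # illustrative values, not from the slides

x = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
d = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
y = beta0 + delta0 * d + beta1 * x    # no error term in this toy example

X = np.column_stack([np.ones_like(x), d, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0_hat, d0_hat, b1_hat = coef

intercept_base = b0_hat            # d = 0 (base group)
intercept_other = b0_hat + d0_hat  # d = 1: same slope, shifted intercept
print(intercept_base, intercept_other, b1_hat)
```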
Example of δ0 > 0

[Figure: y plotted against x. Two parallel lines with slope β1: y = β0 + β1x for d = 0, with intercept β0, and y = (β0 + δ0) + β1x for d = 1, shifted up by δ0.]
Dummies for Multiple Categories
• We can use dummy variables to control for something with
multiple categories
• Suppose everyone in your data is either a HS dropout, HS grad
only, or college grad
• To compare HS and college grads to HS dropouts, include 2
dummy variables
• hsgrad = 1 if HS grad only, 0 otherwise; and colgrad = 1 if
college grad, 0 otherwise
Multiple Categories (cont)
• Any categorical variable can be turned into a set of dummy
variables
• Because the base group is represented by the intercept, if there
are n categories there should be n – 1 dummy variables
• If there are a lot of categories, it may make sense to group
some together
• Example: top 10 ranking, 11 – 25, etc.
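
A short sketch of why n categories call for n – 1 dummies: if all n dummies are included alongside the intercept, they sum to the column of ones and the regressors are exactly linearly related. The education codes below are hypothetical.

```python
import numpy as np

# Hypothetical education codes: 0 = HS dropout (base), 1 = HS grad only, 2 = college grad
educ = np.array([0, 1, 2, 1, 0, 2, 2, 1])

hsgrad = (educ == 1).astype(float)
colgrad = (educ == 2).astype(float)
dropout = (educ == 0).astype(float)
ones = np.ones(len(educ))

# n - 1 = 2 dummies plus an intercept: full column rank
X_ok = np.column_stack([ones, hsgrad, colgrad])

# All 3 dummies plus an intercept: dropout + hsgrad + colgrad equals the column of ones
X_trap = np.column_stack([ones, dropout, hsgrad, colgrad])

rank_ok = np.linalg.matrix_rank(X_ok)
rank_trap = np.linalg.matrix_rank(X_trap)
print(X_ok.shape[1], rank_ok)      # 3 columns, rank 3
print(X_trap.shape[1], rank_trap)  # 4 columns, rank 3
```

The rank-deficient design violates the "no exact linear relationships" assumption, so the coefficients would not be uniquely determined.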
Interactions Among Dummies
• Interacting dummy variables is like subdividing the group
• Example: have dummies for male, as well as hsgrad and
colgrad
• Add male*hsgrad and male*colgrad, for a total of 5 dummy
variables –> 6 categories
• Base group is female HS dropouts
• hsgrad is for female HS grads, colgrad is for female college
grads
• The interactions reflect male HS grads and male college grads
More on Dummy Interactions
• Formally, the model is y = β0 + δ1male + δ2hsgrad + δ3colgrad + δ4male*hsgrad + δ5male*colgrad + β1x + u. Then, for example:
• If male = 0 and hsgrad = 0 and colgrad = 0
• y = β0 + β1x + u
• If male = 0 and hsgrad = 1 and colgrad = 0
• y = (β0 + δ2) + β1x + u
• If male = 1 and hsgrad = 0 and colgrad = 1
• y = (β0 + δ1 + δ3 + δ5) + β1x + u
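
To make the six-group structure concrete, the sketch below fits the dummy-and-interaction model without the continuous regressor x (a simplification made here so that the fit is exact) on made-up data with two observations per group. In that case each group's implied intercept equals its sample mean.

```python
import numpy as np

# Hypothetical data: two observations per group, no continuous regressor
male    = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=float)
hsgrad  = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0], dtype=float)
colgrad = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1], dtype=float)
y       = np.array([10, 12, 14, 16, 20, 22, 11, 15, 18, 20, 27, 29], dtype=float)

X = np.column_stack([
    np.ones_like(y), male, hsgrad, colgrad, male * hsgrad, male * colgrad,
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, d1, d2, d3, d4, d5 = coef

# Intercepts implied for three of the six groups
female_dropout = b0              # base group; sample mean of (10, 12)
female_hsgrad = b0 + d2          # sample mean of (14, 16)
male_colgrad = b0 + d1 + d3 + d5 # sample mean of (27, 29)
print(female_dropout, female_hsgrad, male_colgrad)
```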
Other Interactions with Dummies
• Can also consider interacting a dummy variable, d, with a
continuous variable, x
• y = β0 + δ1d + β1x + δ2d*x + u
• If d = 0, then y = β0 + β1x + u
• If d = 1, then y = (β0 + δ1) + (β1 + δ2) x + u
• This is interpreted as a change in the slope
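
A noiseless toy example of the slope change, with invented parameter values: the regression includes d, x and the product d*x, and the slope for the d = 1 group is the sum of the coefficient on x and the coefficient on d*x.

```python
import numpy as np

beta0, delta1, beta1, delta2 = 1.0, 2.0, 0.5, -0.3  # illustrative values only

x = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
d = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
y = beta0 + delta1 * d + beta1 * x + delta2 * d * x  # no error term

X = np.column_stack([np.ones_like(x), d, x, d * x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0_hat, d1_hat, b1_hat, d2_hat = coef

slope_d0 = b1_hat           # slope when d = 0
slope_d1 = b1_hat + d2_hat  # slope when d = 1
print(slope_d0, slope_d1)
```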
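Example of δ0 > 0 and δ1 < 0

[Figure: y plotted against x. The d = 0 line is y = β0 + β1x; the d = 1 line is y = (β0 + δ0) + (β1 + δ1) x, with a higher intercept and a smaller slope.]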
Testing for Differences Across Groups
• Testing whether a regression function is different for one group
versus another can be thought of as simply testing for the joint
significance of the dummy and its interactions with all other x
variables
• So, you can estimate the model with all the interactions and
without and form an F statistic, but this could be unwieldy
The F-test
• The F-test is an analysis of the variance of a regression
• It can be used to test for the significance of a group of variables
or for a restriction
• It has a different distribution to the t-test, but can be used to test
at different levels of significance
• When determining the F-statistic we need to collect either the residual sum of squares (RSS, written SSR in the goodness-of-fit slides) or the R-squared statistic
• The formula for the F-test of a group of variables can be expressed in terms of either the residual sum of squares (RSS) or the explained sum of squares (ESS)
F-test of explanatory power
• This is the F-test for the goodness of fit of a regression and in
effect tests for the joint significance of the explanatory variables.

• It is based on the R-squared statistic.

• It is routinely produced by most computer software packages.

• It follows the F-distribution, which is quite different to the t-test

F-test formula
• The formula for the F-test of the goodness of fit is:

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \sim F_{k-1,\,n-k}$$
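
The formula is easy to evaluate directly. The sketch below reads k as the number of estimated parameters including the intercept, which is what the degrees of freedom k − 1 and n − k imply; the function name and the input numbers are hypothetical.

```python
def f_from_r2(r2, n, k):
    """F statistic for overall fit; k counts all parameters, including the intercept."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

# Hypothetical numbers: R^2 = 0.5, n = 23 observations, k = 3 parameters
# (0.5 / 2) / (0.5 / 20) = 0.25 / 0.025 = 10
f_stat = f_from_r2(0.5, 23, 3)
print(f_stat)
```

The statistic would then be compared with the critical value of the F-distribution with k − 1 = 2 and n − k = 20 degrees of freedom.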
F-distribution
• To find the critical value of the F-distribution, in general you need to know the number of parameters and the degrees of freedom.

• The number of parameters is then read across the top of the table, the degrees of freedom from the side. Where these two values intersect, we find the critical value.
F-test critical value

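(5% critical values; columns are read across the top, rows are the degrees of freedom read from the side)

         1       2       3       4       5
1    161.4   199.5   215.7   224.6   230.2
2     18.5    19.0    19.2    19.3    19.3
3     10.1     9.6     9.3     9.1     9.0
4      7.7     7.0     6.6     6.4     6.3
5      6.6     5.8     5.4     5.2     5.1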
F-statistic
• When testing for the significance of the goodness of fit, our null hypothesis is that the coefficients on the explanatory variables jointly equal 0.

• If our F-statistic is below the critical value, we fail to reject the null and therefore we say the goodness of fit is not significant.
Joint Significance
• The F-test is useful for testing a number of hypotheses and is
often used to test for the joint significance of a group of
variables.

• In this type of test, we often refer to ‘testing a restriction’.

• This restriction is that the coefficients on a group of explanatory variables are jointly equal to 0
F-test for joint significance
• The formula for this test can be viewed as:

$$F = \frac{\text{Improvement in fit} \,/\, \text{Extra degrees of freedom used up}}{\text{Residual sum of squares remaining} \,/\, \text{Degrees of freedom remaining}}$$
F-tests
• The test for joint significance has its own formula, which takes the following form:

$$F = \frac{(RSS_R - RSS_u)/m}{RSS_u/(n-k)}$$

where
m = number of restrictions
k = number of parameters in the unrestricted model
RSS_u = unrestricted RSS
RSS_R = restricted RSS
Joint Significance of a group of variables
• To carry out this test you need to run two separate OLS regressions: one with all the explanatory variables included (the unrestricted equation), the other with the variables whose joint significance is being tested removed (the restricted equation).
• Then collect the RSS from both equations.
• Put the values into the formula.
• Find the critical value and compare it with the test statistic. The null hypothesis is that the coefficients on the variables jointly equal 0.
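
The steps above can be sketched in Python as follows. The data are simulated (seed, coefficients and variable names are arbitrary assumptions), and the helpers `rss` and `f_joint` are illustrative names rather than functions from any particular package.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

def f_joint(rss_r, rss_u, m, n, k):
    """F statistic for m restrictions; k = parameters in the unrestricted model."""
    return ((rss_r - rss_u) / m) / (rss_u / (n - k))

rng = np.random.default_rng(2)  # simulated data, fixed seed
n = 60
w, x, z = rng.normal(size=(3, n))
y = 1.0 + 0.5 * w + 0.8 * x - 0.6 * z + rng.normal(size=n)

ones = np.ones(n)
X_u = np.column_stack([ones, w, x, z])  # unrestricted: all regressors
X_r = np.column_stack([ones, w])        # restricted: x and z removed

rss_u, rss_r = rss(X_u, y), rss(X_r, y)
F = f_joint(rss_r, rss_u, m=2, n=n, k=4)
print(rss_r, rss_u, F)
```

The restricted RSS can never be smaller than the unrestricted RSS, so the statistic is non-negative; it is then compared with the critical value for m and n − k degrees of freedom.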
Joint Significance
• If we have a model with 3 explanatory variables and wish to test for the joint significance of 2 of the variables (x and z), we need to run the following restricted and unrestricted models:

$$y_t = \beta_0 + \beta_1 w_t + u_t \quad \text{(restricted)}$$

$$y_t = \beta_0 + \beta_1 w_t + \beta_2 x_t + \beta_3 z_t + u_t \quad \text{(unrestricted)}$$
Example of the F-test for joint significance
• Given the following model, we wish to test the joint significance of w and z. Having estimated both equations, we collect their respective RSSs (n = 60).

$$y_t = \beta_0 + \beta_1 x_t + \beta_2 w_t + \beta_3 z_t + u_t \quad \text{(unrestricted)}, \qquad RSS_u = 0.75$$

$$y_t = \alpha_0 + \alpha_1 x_t + v_t \quad \text{(restricted)}, \qquad RSS_R = 1.5$$
Joint significance

where:
β0, α0 are constants
β1, …, β3, α1 are slope parameters
ut, vt are error terms
xt, wt, zt are explanatory variables
Joint significance
H0: β2 = β3 = 0
H1: at least one of β2, β3 is not equal to 0

• As the F statistic is greater than the critical value (28 > 3.15), we reject the null hypothesis and conclude that the variables w and z are jointly significant and should remain in the model.
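
The value 28 can be reproduced from the figures given in the example, taking m = 2 restrictions, n = 60 observations and k = 4 parameters in the unrestricted model (the intercept plus three slopes; this count is inferred from the model rather than stated on the slide):

```latex
F = \frac{(RSS_R - RSS_u)/m}{RSS_u/(n-k)}
  = \frac{(1.5 - 0.75)/2}{0.75/(60-4)}
  = \frac{0.375}{0.01339}
  \approx 28
```

The quoted critical value of 3.15 appears to be the 5% point of the F-distribution with 2 and 60 degrees of freedom as listed in standard tables; with 56 denominator degrees of freedom the value is marginally larger, which does not change the conclusion.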
Linear Probability Model
• P(y = 1|x) = E(y|x) when y is a binary variable, so we can write our model as
• P(y = 1|x) = β0 + β1x1 + … + βkxk
• So, the interpretation of βj is the change in the probability of success when xj increases by one unit, holding the other x's fixed
• The predicted y is the predicted probability of success
• A potential problem is that the predicted probability can lie outside [0,1]
Linear Probability Model (cont)
• Even without predictions outside of [0,1], we may estimate effects that imply a change in x changes the probability by more than +1 or –1, so it is best to use changes near the mean
• This model will violate the assumption of homoskedasticity, which will affect inference
• Despite its drawbacks, it's usually a good place to start when y is binary
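
The out-of-range problem is easy to reproduce. The sketch below fits a linear probability model by OLS to a tiny made-up data set in which y switches from 0 to 1 as x grows; the fitted probabilities at the extremes of x fall below 0 and above 1.

```python
import numpy as np

# Hypothetical data: binary outcome that switches from 0 to 1 as x increases
x = np.arange(8, dtype=float)                        # 0, 1, ..., 7
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
p_hat = X @ coef  # fitted values, read as predicted probabilities of y = 1

print(coef)                        # intercept -1/6, slope 4/21
print(p_hat.min(), p_hat.max())    # below 0 at x = 0, above 1 at x = 7
```

Near the mean of x (3.5) the fitted probability is 0.5, which is where the linear approximation is most trustworthy.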
