Econometrics Ch2 Multiple Regression Analysis

Chapter 2 discusses Multiple Regression Analysis, emphasizing the importance of including multiple variables to avoid Omitted Variable Bias in economic models. It covers the mathematical derivation of regression models, the significance of key assumptions, and the interpretation of coefficients while introducing hypothesis testing methods like the t-test and F-test. The chapter also addresses issues like multicollinearity and provides practical examples and practice problems related to economic analysis.

Uploaded by

thanhhhk24410e1

CHAPTER 2 Multiple Regression Analysis

Multiple Regression Analysis

1. Introduction: Why do we need more variables?

Simple Linear Regression (SLR) is rarely enough for economic analysis because of
Omitted Variable Bias.
• Example: If we regress Wage only on Education, the coefficient for Education is
likely "too high."
• Why? It captures the effect of Education plus the effect of omitted variables
correlated with education (like Intelligence or Family Background).
• Solution: We must explicitly include these variables in the model to control for them.

The Mathematical Derivation

To show students why the error term u becomes correlated with X, you can compare the
"True World" with the "Model We Estimated."

Step A: The True Model

Imagine the true population relationship for the dependent variable Y depends on two
variables, X and Z:

Y = β0 + β1X + β2Z + v
• Here, v is the "true" random noise (uncorrelated with X or Z).

• Z is a relevant variable (β2 ≠ 0).


Step B: The Omitted Model (What we actually run)

Suppose we do not have data for Z (or we simply forget to include it). We run this
regression:

Y = α0 + α1X + u
• We omitted Z.
• u is the error term for this specific misspecified model.

Step C: What is inside u?

By comparing the True Model and the Omitted Model (evaluating the short equation at the
true intercept and slope, α0 = β0 and α1 = β1), we can see exactly what the error
term u is composed of:

u = β2Z + v
The error term u is no longer just random noise; it "absorbs" the omitted
variable Z.

Step D: The Bias Mechanism

Now, we ask: Is X correlated with u?

Cov(X, u) = Cov(X, β2Z + v)

Cov(X, u) = β2Cov(X, Z) + Cov(X, v)
Since v is pure noise, Cov(X, v) = 0. We are left with:
Cov(X, u) = β2Cov(X, Z)

Conclusion: The error term u is correlated with X IF AND ONLY IF:

1. The omitted variable affects Y (β2 ≠ 0).

2. The omitted variable is correlated with X (Cov(X, Z) ≠ 0).
Because the correlation between X and u is driven entirely by the relationship between X
and the missing Z, we call it Omitted Variable Bias.
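
The covariance result implies the standard large-sample formula for the short-model slope: it converges to β1 + β2·Cov(X, Z)/Var(X). The following is a minimal simulation sketch of that mechanism, not part of the original slides. The parameter values, the sample size, and the helper name `ols` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
n = 100_000

# Hypothetical true parameters (assumed for illustration only)
beta0, beta1, beta2 = 1.0, 2.0, 3.0

z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)   # X is correlated with Z: Cov(X, Z) = 0.5
v = rng.normal(size=n)             # pure noise, unrelated to X and Z
y = beta0 + beta1 * x + beta2 * z + v

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short = ols(y, x)       # omits Z (the misspecified model)
long_ = ols(y, x, z)    # includes Z (the true model)

# Bias predicted by the formula beta2 * Cov(X, Z) / Var(X)
predicted_bias = beta2 * np.cov(x, z)[0, 1] / np.var(x, ddof=1)

print("short-model slope:     ", short[1])
print("long-model slope:      ", long_[1])
print("beta1 + predicted bias:", beta1 + predicted_bias)
```

In this setup Var(X) = 1.25, so the predicted bias is 3 × 0.5 / 1.25 = 1.2: the short-model slope should land near 3.2, while the long-model slope stays near the true value of 2.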

What happens instead?

If you omit a variable Z that is uncorrelated with X (Cov(X, Z) = 0), two things happen:
1. Bias: Zero. Your estimator β̂1 remains correct on average (it is still unbiased).
2. Variance: The "noise" term (u) gets larger because it now includes Z. This makes
your Standard Errors larger (less precision), but it does not make the coefficient
wrong.
2. The Population Regression Equation (PRE)

We extend the model to include k independent variables.

Yi = β0 + β1X1i + β2 X2i + … + βk Xki + ui

• Y: The Dependent Variable.
• X1, X2, …: The Independent (Explanatory) Variables.
• β0: The Intercept.
• β1 to βk: The Slope Parameters (Partial Regression Coefficients).
• u: The Error Term (captures unobserved factors).
3. Key Assumptions (The Gauss-Markov Extensions)

We keep the assumptions from Simple Regression (Linearity, Random Sampling, Zero
Conditional Mean, Homoskedasticity), but we add a critical new one:

Assumption: No Perfect Multicollinearity

• The Rule: No independent variable (X) can be a perfect linear combination of the
others.
• Example of Violation: You cannot include both "Expenditure in Dollars" and
"Expenditure in Euros" in the same model. At a fixed exchange rate, one is an exact
multiple of the other, so they are perfectly correlated.
• Note: Imperfect multicollinearity (variables are correlated, but not perfectly) is
allowed, though it makes estimation less precise.
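
A small sketch of why software refuses such a model: with a perfectly collinear column, the design matrix loses rank and X'X cannot be inverted. The data are simulated, and the exchange rate of 0.9 is a hypothetical value used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

usd = rng.uniform(100, 1000, size=n)   # simulated expenditure in dollars
eur = 0.9 * usd                        # hypothetical fixed exchange rate

# Design matrix: intercept, dollars, euros
X = np.column_stack([np.ones(n), usd, eur])

rank = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank)   # rank below column count means X'X is singular
```

Dropping either expenditure column restores full rank, which is why the usual remedy is to keep only one of the two variables.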
4. Estimating Coefficients (OLS)

The Ordinary Least Squares (OLS) principle remains the same: we want to minimize the
Sum of Squared Residuals (SSR).
$$\min_{\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k} \; \sum_{i=1}^{n} \left( Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \dots - \hat\beta_k X_{ki} \right)^2$$

• Geometric Interpretation: In Simple Regression, we fit a line. In Multiple Regression
with two X variables, we fit a plane in a 3D space. With more variables, we fit a
hyperplane.
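
To make the minimization concrete, here is a minimal sketch that solves the normal equations X'Xβ̂ = X'Y on simulated data and checks that moving a coefficient away from the solution raises the SSR. The data-generating values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)   # hypothetical data-generating process

X = np.column_stack([np.ones(n), x1, x2])

# OLS solution from the normal equations X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta_hat
ssr = residuals @ residuals

# Move one slope slightly away from the OLS value and recompute the SSR
perturbed = beta_hat + np.array([0.0, 0.05, 0.0])
ssr_perturbed = np.sum((y - X @ perturbed) ** 2)

print("beta_hat:", beta_hat)
print("SSR at OLS:", ssr, " SSR after perturbation:", ssr_perturbed)
```

The first-order conditions also show up directly in the output of this sketch: the residuals are orthogonal to every column of X, so X'e is zero up to rounding error.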
5. Interpreting Coefficients: "Ceteris Paribus"

This is the most important concept for economics students.

In Simple Regression (Y = β0 + β1X1):

• β1 is the total effect of X1 on Y.
In Multiple Regression (Y = β0 + β1X1 + β2 X2):
• β1 is the partial effect.
• Definition: β1 measures the change in Y for a one-unit change in X1, holding X2
constant (Ceteris Paribus).

Economic Example:

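Predicted log(Wage) = … + 0.08(Education) + 0.05(Experience)

"Holding experience constant, an additional year of education is associated with an
approximately 8% increase in wages."
(The percentage reading requires the dependent variable to be log(Wage). With Wage in
levels, 0.08 would instead be a change of 0.08 wage units.)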
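
When the dependent variable is log(Wage), multiplying the coefficient by 100 gives an approximation to the percentage change. A short sketch comparing that approximation with the exact change implied by a log model, using the 0.08 coefficient from the example:

```python
import math

coef_educ = 0.08   # coefficient on Education from the example

approx_pct = 100 * coef_educ                   # the usual "100 x coefficient" approximation
exact_pct = 100 * (math.exp(coef_educ) - 1)    # exact change implied by a log-level model

print(f"approximate: {approx_pct:.2f}%  exact: {exact_pct:.2f}%")
```

The two numbers are close for small coefficients (8% versus about 8.33% here), and the gap widens as the coefficient grows.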
6. Analyzing Significance (Hypothesis Testing)

We perform two types of tests in Multiple Regression.


A. The t-Test (Individual Significance)

Tests if one specific variable matters, assuming all other variables are already in the
model.

• H0 : βj = 0
• Test statistic:
$$t = \frac{\hat\beta_j}{SE(\hat\beta_j)}$$
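
A minimal sketch of how the t-statistics can be computed by hand on simulated data. It assumes homoskedastic errors, so that the standard errors are the square roots of the diagonal of σ̂²(X'X)⁻¹ with σ̂² = SSR/(n − k − 1). The data-generating values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 2

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.0 * x2 + rng.normal(size=n)   # x2 has no true effect (assumed)

X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k - 1)         # estimated error variance
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))      # standard errors under homoskedasticity
t_stats = beta_hat / se                          # t-statistics for H0: beta_j = 0

for name, b, s, t in zip(["const", "x1", "x2"], beta_hat, se, t_stats):
    print(f"{name:>5}: coef={b: .3f}  se={s:.3f}  t={t: .2f}")
```

Each t-statistic is compared with the critical value of a t distribution with n − k − 1 degrees of freedom (roughly 1.96 at the 5% level for a sample this large).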

B. The F-Test (Joint Significance)

Tests if the entire group of variables explains Y, or if the model is useless.

• H0 : β1 = β2 = … = βk = 0 (All slopes are zero).

• If the F-statistic is high (P-value < 0.05), at least one variable helps explain Y.
There are two common ways to write the F-statistic formula. Both yield the exact same number.
Formula A: Using Sums of Squares (The ANOVA Method)
This formula compares the "Explained Variance" (Signal) to the "Unexplained Variance"
(Noise).
$$F = \frac{\text{Explained Variance}}{\text{Unexplained Variance}} = \frac{ESS/k}{RSS/(n-k-1)}$$
• ESS: Explained Sum of Squares (Variation captured by the model).
• RSS: Residual Sum of Squares (Variation missed by the model).
• k: Number of independent variables (degrees of freedom for the model).
• n - k - 1: Degrees of freedom for the residuals.

Formula B: Using R² (The "Goodness of Fit" Method)

This version is often easier for students to calculate if they only have the R² value.

$$F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$$
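
The two formulas agree because R² = ESS/TSS and 1 − R² = RSS/TSS, so dividing the numerator and denominator of Formula A by TSS gives Formula B. A short numerical check, using made-up sums of squares (the values of `ess`, `rss`, `n`, and `k` are hypothetical):

```python
def f_from_sums(ess, rss, n, k):
    """Formula A: F from explained and residual sums of squares."""
    return (ess / k) / (rss / (n - k - 1))

def f_from_r2(r2, n, k):
    """Formula B: F from R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical values for illustration only
ess, rss, n, k = 400.0, 600.0, 100, 3
r2 = ess / (ess + rss)            # R^2 = ESS / TSS = 0.4

f_a = f_from_sums(ess, rss, n, k)
f_b = f_from_r2(r2, n, k)
print(f"Formula A: {f_a:.4f}  Formula B: {f_b:.4f}")
```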
7. Variance and Standard Deviation of Estimators

In multiple regression, the precision of our estimates depends on how correlated our X
variables are.

The Variance of a slope coefficient β̂j is:

$$Var(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)}$$

• σ²: The variance of the error term (noise in the data).
• SSTj: The total variation in variable Xj (Total Sum of Squares).
• Rj²: The R-squared obtained from regressing Xj on all other independent variables.

The Mathematical Logic: "Partialling Out"

• Step 1: Isolate the variation in Xj.

In a multiple regression, Xj might be correlated with other variables. To find the
specific effect of Xj, OLS essentially looks for the variation in Xj that is unique (not
explained by the other variables).
We find this by regressing Xj on all other independent variables.
◦ The "good" variation is the residual from this regression.
◦ The Sum of Squared Residuals from this auxiliary regression is exactly
SSTj(1 − Rj²).
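
The partialling-out logic can be checked numerically. This is a sketch on simulated data with assumed parameter values: regressing Y on the residual of X1 reproduces the multiple-regression coefficient on X1, and the auxiliary SSR equals SST1(1 − R1²).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300

x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)                   # x1 is correlated with x2
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)   # hypothetical true model

def fit(y, X):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

ones = np.ones(n)
full = fit(y, np.column_stack([ones, x1, x2]))       # multiple regression

# Auxiliary regression: x1 on the other regressors (here, a constant and x2)
Z = np.column_stack([ones, x2])
fitted_x1 = Z @ fit(x1, Z)
r1 = x1 - fitted_x1                                  # the "unique" variation in x1

ssr_aux = r1 @ r1
sst1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = np.corrcoef(x1, fitted_x1)[0, 1] ** 2         # R^2 of the auxiliary regression

beta1_partial = (r1 @ y) / ssr_aux                   # slope from regressing y on r1

print("multiple-regression beta1:", full[1])
print("partialled-out beta1:     ", beta1_partial)
print("SSR_aux:", ssr_aux, " SST1*(1 - R1^2):", sst1 * (1 - r2_1))
```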

• Step 2: The General Variance Formula.

For a simple regression (one X), the variance is just $\sigma^2 / SST_x$.
For multiple regression, we replace the total variation (SSTx) with the unique variation
we found in Step 1.

• Step 3: Combine them.

$$Var(\hat\beta_j) = \frac{\text{Noise Variance}}{\text{Unique Variation in } X_j} = \frac{\sigma^2}{SST_j\,(1 - R_j^2)}$$

Intuitive Explanation
A. The Numerator (The Bad Stuff)

• σ² (Error Variance): This represents the noise in the data.

◦ Logic: If the data points are very scattered around the true line (high σ²), it is very
hard to pinpoint the exact slope.

◦ Effect: Higher σ² → Higher Variance (Less precise).


B. The Denominator (The Good Stuff)

The denominator represents the quality of the signal. We want this to be as large as
possible to get a small variance.

1. SSTj (Total Variation in X):

◦ Logic: We need X to move around! If X never changes (e.g., everyone has the
same education level), we can't estimate how it affects wages. The more spread
out X is, the easier it is to see the trend.

◦ Effect: Higher SSTj → Lower Variance (More precise).


2. (1 − Rj²) (Independence of X):

◦ Logic: This measures how distinct Xj is from the other variables.

◦ If Xj is highly correlated with other variables (Multicollinearity), then Rj² is close to
1, and (1 − Rj²) becomes tiny (close to 0).
◦ This makes the denominator tiny, which makes the Variance HUGE.

◦ Effect: We need Xj to have its own unique variation. High correlation (high Rj²)
kills precision.

The "Variance Inflation Factor" (VIF):

The term $1/(1 - R_j^2)$ is called the VIF:

$$VIF_j = \frac{1}{1 - R_j^2}$$

• If Xj is highly correlated with the other X variables (Multicollinearity), Rj² is high.

• This makes the VIF high → Variance increases → Standard Errors increase → t-statistics get smaller.
• Lesson: Multicollinearity makes it harder to find statistically significant results.
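
A quick sketch of how fast the VIF grows with Rj². Since the VIF multiplies the variance, the standard error is inflated by its square root. The Rj² values below are arbitrary illustrative inputs, and `vif` is a helper name introduced here.

```python
import math

def vif(r2_j):
    """Variance inflation factor for a regressor with auxiliary R-squared r2_j."""
    return 1.0 / (1.0 - r2_j)

# Arbitrary illustrative values of R_j^2
for r2_j in (0.0, 0.5, 0.9, 0.99):
    v = vif(r2_j)
    print(f"R_j^2 = {r2_j:4.2f}  VIF = {v:7.2f}  SE inflation = {math.sqrt(v):5.2f}x")
```

With Rj² = 0.9 the variance is ten times what it would be with an uncorrelated regressor, and the standard error is roughly 3.2 times larger.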
8. Examples (Economics Focus)

Passage 1: The GPA Model

A university researcher wants to predict student GPA (Y).

• Model A: GPA = β0 + β1(HoursStudied) + u

• Model B: GPA = β0 + β1(HoursStudied) + β2(SAT_Score) + u
After running the regressions, the coefficient for HoursStudied (β1) drops from 0.15 in
Model A to 0.05 in Model B.

Question 1 Which of the following best explains the decrease in the β1 coefficient in Model
B?

A) Model B has a lower R² than Model A.

B) SAT_Score is negatively correlated with GPA.
C) HoursStudied and SAT_Score are positively correlated, and SAT_Score affects GPA,
causing Model A to suffer from Omitted Variable Bias.
D) The sample size was too small to estimate Model B accurately.

Passage 2: Housing Prices

An economist estimates the following model for house prices in a city:

$$\widehat{Price} = 50{,}000 + 100\,(Size) - 5{,}000\,(Distance)$$

Where Size is in square feet and Distance is miles from the city center.

Question 2 Based on the equation above, what is the interpretation of the coefficient
-5,000?
A) For every additional mile from the city center, the house price decreases by $5,000.
B) For every additional mile from the city center, the house price decreases by $5,000,
holding the size of the house constant.
C) Houses located in the city center cost $5,000 less than houses outside the city.
D) Distance is not a statistically significant predictor of house price.
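
The "holding constant" idea can be seen by evaluating the estimated equation for two houses that differ only in distance. This is a small sketch. The house size of 2,000 square feet and the distances of 3 and 4 miles are hypothetical inputs, and `predicted_price` is a helper name introduced here.

```python
def predicted_price(size_sqft, distance_miles):
    """Fitted price from the estimated equation in Passage 2."""
    return 50_000 + 100 * size_sqft - 5_000 * distance_miles

# Two hypothetical houses with the same size, one mile apart
near = predicted_price(2_000, 3)
far = predicted_price(2_000, 4)

print("near:", near, "far:", far, "difference:", far - near)
```

Because Size is identical in both calls, the difference in predicted price comes from Distance alone.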

Passage 3: Multicollinearity Logic

A researcher attempts to predict total household consumption (C) using two variables:

1. Income_Pre_Tax (X1)

2. Income_Post_Tax (X2)
The regression software returns an "Error" or very strange results with massive standard
errors.

Question 3 What is the most likely technical reason for this error?
A) Heteroskedasticity: The variance of consumption is higher for rich people.
B) Perfect Multicollinearity: Pre-tax and Post-tax income are perfectly (or near perfectly)
linearly related.
C) Endogeneity: Consumption causes Income.
D) The sample size is too large.
Practice problems

Problem 1: Determinants of Used Car Prices

A researcher estimates a model to predict the price of used cars based on their age and
mileage. The dataset consists of 500 used cars.

a. Write the estimated regression equation.

b. Interpret the coefficient on the age variable.
c. Predict the price of a car that is 5 years old and has 40,000 miles on it.
d. Is the coefficient on mileage statistically significant at the 1% level? Explain.

Problem 2: Wage Equation with Education and Experience

An economist estimates the effect of education and experience on hourly wages using a
sample of 1,000 workers.

a. Construct a 95% confidence interval for the coefficient on exper.

b. A worker has 16 years of education and 10 years of experience. What is their predicted
hourly wage?
c. Another worker has the same experience but 4 fewer years of education (12 years). How
much less is this worker predicted to earn per hour compared to the worker in part (b)?
d. What does the Root MSE of 3.8729 represent?

Problem 3: Advertising and Sales

A marketing analyst studies the impact of TV and Radio advertising spending on product
sales (in thousands of units).

a. Interpret the coefficient on tv_ads.

b. Is the effect of radio_ads statistically significant at the 5% level? Explain using the
p-value.
c. Calculate the t-statistic for radio_ads (verify the value in the table).
d. The company spends an additional $1,000 on TV ads (which corresponds to a 1 unit
increase in tv_ads if units are in thousands). How much are sales predicted to increase?

Problem 4: House Prices with Dummy Variables

A real estate model predicts house prices (in $1000s) based on size (sq ft) and whether the
house has a pool.

a. Write the regression equation.

b. What is the predicted price of a house with 2,000 sq ft and no pool?
c. What is the estimated "premium" for having a pool, holding size constant?
d. Is the pool premium statistically significant at the 1% level?

Problem 5: Test Scores and Student-Teacher Ratio

A policy analyst examines the relationship between district test scores and two variables:
student-teacher ratio (str) and percentage of English learners (el_pct).

a. Interpret the intercept coefficient (700.00). Does it have a realistic interpretation here?
b. If a district reduces its student-teacher ratio by 2 students (e.g., from 22 to 20), what is
the predicted change in test score, holding el_pct constant?
c. District A has str=20 and el_pct=10. District B has str=20 and el_pct=20. What is
the predicted difference in test scores between District A and District B?
d. Calculate the F-statistic using the MS values provided.
Model Selection and Specification Analysis
1. Model Selection Criteria

The most common question students ask is: "Should I keep this variable in my model?"

The Trap of R²

In Simple Regression, a higher R² is generally better. In Multiple Regression, R² is
dangerous.

• The Rule: Every time you add a variable (even a random nonsense variable), R²
never decreases. It either stays the same or goes up.

• The Problem: You can get an R² of 1.0 just by adding regressors until the model has as
many estimated parameters as observations (k + 1 = n), creating a meaningless model
("Overfitting").

The Solution: Adjusted R² (R̄²)

This metric imposes a penalty for adding useless variables.

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$$
• If you add a variable and the Adjusted R 2 drops, that variable likely didn't add
enough explanatory power to justify the loss of degrees of freedom.
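
A short sketch of the penalty at work. The sample size and R² values are made up for illustration: adding two weak regressors raises R² slightly but lowers the Adjusted R².

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k regressors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 50   # hypothetical sample size

# Hypothetical fits: Model A has 1 regressor, Model B adds 2 weak regressors
adj_a = adjusted_r2(0.600, n, k=1)
adj_b = adjusted_r2(0.605, n, k=3)

print(f"Model A: R2 = 0.600, adjusted = {adj_a:.4f}")
print(f"Model B: R2 = 0.605, adjusted = {adj_b:.4f}")
```

Here the gain in R² from 0.600 to 0.605 is too small to offset the two lost degrees of freedom, so the adjusted measure falls from about 0.5917 to about 0.5792.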
2. Specification Errors: The Two Sins

In econometrics, not all mistakes are created equal.


Sin #1: Omitting a Relevant Variable (Underfitting)

Scenario: The true model requires Education and Ability
(wage = β0 + β1Educ + β2 Ability), but you only run wage = β0 + β1Educ.
• Consequence: Bias.
• Because Ability is correlated with Education (students with high ability tend to stay in
school longer), the coefficient for Education (β̂1) "steals" the credit for Ability.
• Your estimate is wrong and biased. (e.g., You overestimate the return to schooling).

Sin #2: Including an Irrelevant Variable (Overfitting)

Scenario: The true model depends only on Income, but you add "Zodiac Sign" to the
regression.
• Consequence: Inefficiency (Higher Variance).
• Bias: None. Your estimates for the important variables are still unbiased (centered on
the truth).
• Variance: The standard errors of all your coefficients will likely increase. This makes
t-stats smaller, making it harder to find statistically significant results.
• Lesson: It is generally "safer" to include a variable if you aren't sure, than to omit it
and risk bias.
3. The Ramsey RESET Test

How do we know if we have the wrong functional form (e.g., we used a straight line when
the data is curved)?

RESET (Regression Equation Specification Error Test):

1. Run the original regression and get the predicted values (Ŷ).

2. Run a second regression adding powers of those predictions (e.g., Ŷ², Ŷ³) as new
independent variables.
3. Test: Use an F-test to see if these new terms are significant.

◦ Null Hypothesis (H0): Model is correctly specified.

◦ Reject H0: You missed something non-linear (you might need logs or quadratics).
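
The three steps can be sketched by hand on simulated data. In this illustration the true relationship is quadratic (an assumption built into the simulation) but a straight line is fitted first, so the added powers of Ŷ should be jointly significant. The helper name `ssr_and_fitted` is introduced here for convenience.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400

x = rng.uniform(0, 4, size=n)
y = 1.0 + 0.5 * x**2 + rng.normal(size=n)   # hypothetical curved relationship

def ssr_and_fitted(y, X):
    """Return the sum of squared residuals and fitted values from OLS."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return np.sum((y - fitted) ** 2), fitted

# Step 1: original (linear) regression and its predictions
X_lin = np.column_stack([np.ones(n), x])
ssr_r, y_hat = ssr_and_fitted(y, X_lin)

# Step 2: add powers of the predictions as extra regressors
X_aug = np.column_stack([X_lin, y_hat**2, y_hat**3])
ssr_u, _ = ssr_and_fitted(y, X_aug)

# Step 3: F-test on the q = 2 added terms
q = 2
df_resid = n - X_aug.shape[1]
f_reset = ((ssr_r - ssr_u) / q) / (ssr_u / df_resid)

print("RESET F-statistic:", f_reset)
```

A large F-statistic relative to the F(q, n − k − 1) critical value leads to rejecting H0, pointing toward a non-linear specification.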
4. Hypothesis Testing for Selection (F-Test)

When choosing between a short model (Restricted) and a long model (Unrestricted), we
use the F-Test for Joint Significance.
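$$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n-k-1)}$$
Here q is the number of restrictions (variables dropped) and k is the number of
independent variables in the unrestricted model.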
• Logic: Does the Sum of Squared Residuals (error) drop enough to justify adding the
group of q new variables?
• If F is high (P-value low), the extra variables are jointly significant. Keep them.
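
A minimal sketch of the calculation. All input numbers are hypothetical and chosen so the arithmetic is easy to follow, and `restricted_f` is a helper name introduced here.

```python
def restricted_f(ssr_restricted, ssr_unrestricted, q, n, k):
    """F-statistic for q exclusion restrictions; k counts regressors in the unrestricted model."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical values for illustration only
f_stat = restricted_f(ssr_restricted=1200.0, ssr_unrestricted=1000.0, q=2, n=104, k=3)
print("F =", f_stat)
```

With these inputs the numerator is 200/2 = 100 and the denominator is 1000/100 = 10, so F = 10, which would then be compared with the F(2, 100) critical value.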
5. Examples (Economics Focus)

Passage 1: The wage gap study

A researcher investigates the gender wage gap.

• Regression 1: Wage = β0 + β1(Female) + u

◦ Result: β1 = −5.00 (Females earn $5 less/hour).
• Regression 2: Wage = β0 + β1(Female) + β2(Occupation) + u
◦ Result: β1 = −2.00 (Females earn $2 less/hour).

Question 1 Which of the following statements best explains the change in the coefficient
for the Female variable from -5.00 to -2.00?
A) Regression 2 suffers from multicollinearity, making the estimate unreliable.
B) In Regression 1, the Female variable was biased downwards because it suffered from
Omitted Variable Bias regarding Occupation.
C) Occupation is an irrelevant variable and should be removed to restore the efficiency of
the model.

D) The Adjusted R² of Regression 2 is definitely lower than Regression 1.


Passage 2: The Marketing Director's Dilemma

A marketing director is building a model to predict sales. She starts with Price (X1) and
Advertising (X2). She considers adding a third variable: "CEO's Golf Handicap" (X3), which
she knows theoretically has absolutely zero impact on customer behavior.

Question 2 If she includes X3 in the regression model, what will be the statistical
consequence?
A) The coefficient for Price (β̂1) will become biased.
B) The R² of the model will decrease.
C) The Standard Errors of β̂1 and β̂2 will likely increase, reducing the t-statistics.
D) The model will fail the Ramsey RESET test.

Passage 3: Interpreting the RESET

An economics student estimates a production function:
Output = β0 + β1(Labor) + β2(Capital).
He suspects the relationship might actually be Cobb-Douglas (multiplicative/curved) rather
than linear. He runs a Ramsey RESET test and obtains a P-value of 0.01.

Question 3 Based on the P-value of 0.01, what should the student conclude?
A) Fail to reject the Null; the linear model is correctly specified.
B) Reject the Null; the linear specification is likely incorrect and functional forms like logs or
squares should be investigated.
C) Reject the Null; the model suffers from heteroskedasticity.
D) The model is suffering from perfect multicollinearity.
More Problems

Problem 1: Omitted Variable Bias and Coefficient Stability

A labor economist is investigating the return to education. She first estimates a "Short
Model" regressing log_wage on education. She then estimates a "Long Model" that
adds ability (a test score measure) to check for specification bias.

a. Compare the coefficient on education in Model (1) and Model (2). By how much did it
change?
b. Based on the change in coefficients, was the "Short Model" suffering from positive or
negative bias?
c. What two conditions must be true about the ability variable for this bias to exist in the
Short Model?
d. Which model is preferred for estimating the causal effect of education on wages? Explain
briefly using the statistical significance of the added variable.

Problem 2: Functional Form – Testing for Non-Linearity

A researcher models the relationship between corn yield (bushels/acre) and nitrogen
fertilizer (lbs/acre). They suspect diminishing marginal returns and fit a quadratic model.

a. Write the estimated regression equation.

b. Is the quadratic term nitrogen_sq statistically significant at the 5% level? What does
this imply about the functional form specification?
c. Calculate the level of nitrogen where yield is maximized (the turning point).
d. If the researcher had only estimated a linear model (yield = b0 + b1*nitrogen),
would that model be considered correctly specified? Explain.
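
For a quadratic fit yield = b0 + b1·nitrogen + b2·nitrogen², setting the derivative b1 + 2·b2·nitrogen to zero gives the turning point nitrogen* = −b1/(2·b2), which is a maximum when b2 < 0. A small sketch follows. The coefficient values are hypothetical and are not taken from the problem's regression table.

```python
def turning_point(b1, b2):
    """Level of x at which b0 + b1*x + b2*x**2 reaches its turning point."""
    return -b1 / (2 * b2)

# Hypothetical coefficients (NOT the values from the problem's regression output)
b1, b2 = 0.8, -0.002

x_star = turning_point(b1, b2)
print("turning point:", x_star, "| maximum" if b2 < 0 else "| minimum")
```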

Problem 3: Joint Hypothesis Testing (F-Test)

An analyst is building a model to predict house prices. They start with square footage
(sqft) and then consider adding neighborhood dummy variables (d_north, d_south,
d_east; West is the base group). They run the Unrestricted Model below.

a. Are any of the location dummy variables individually significant at the 5% level?
b. Suppose you run a Restricted Model excluding all location dummies (regressing price
only on sqft) and find the Restricted SSR (SSRR) is 79,200,000. The Unrestricted SSR
(SSRU) from the table above is 78,400,000.
Calculate the F-statistic for the joint significance of the neighborhood effects.

c. Based on your F-statistic (assume Critical F ≈ 2.65), should you include the
neighborhood dummies in your final model specification?

Problem 4: Adjusted R-Squared and Model Penalties

You are selecting between two models for predicting student test scores.
• Model A: score = b0 + b1*study_time

• Model B: score = b0 + b1*study_time + b2*height + b3*shoe_size

You run the regressions in Stata and obtain the following summary statistics:

a. Calculate the standard R² for Model A and Model B.

b. Calculate the Adjusted R² (R̄²) for Model A and Model B.

c. Which model is preferred based on Adjusted R²? Explain why this metric is better than
standard R² for this comparison.

Problem 5: Interaction Terms and Slope Specification

A researcher estimates the effect of experience on wages, hypothesizing that the return to
experience is different for men and women.
female = 1 if female, 0 if male.
fem_exper = female × exper.

a. What is the estimated return to an additional year of experience for Men (female=0)?
b. What is the estimated return to an additional year of experience for Women
(female=1)?
c. Is the difference in the return to experience between men and women statistically
significant at the 5% level? Which variable tells you this?
d. If the researcher removed the interaction term fem_exper, would they be accurately
modeling the wage dynamics? Explain.
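
In a model of the form wage = b0 + b1·exper + b2·female + b3·fem_exper, the return to experience is b1 for men and b1 + b3 for women, so the interaction coefficient b3 is the difference in slopes. A small sketch follows. The model form is inferred from the variable definitions in the problem, and the coefficient values are hypothetical, not taken from the problem's regression table.

```python
def return_to_experience(b_exper, b_fem_exper, female):
    """Slope of wage with respect to exper for a given value of the female dummy."""
    return b_exper + b_fem_exper * female

# Hypothetical coefficients (NOT the values from the problem's regression output)
b_exper, b_fem_exper = 0.50, -0.20

slope_men = return_to_experience(b_exper, b_fem_exper, female=0)
slope_women = return_to_experience(b_exper, b_fem_exper, female=1)

print("men:", slope_men, "women:", slope_women, "difference:", slope_women - slope_men)
```

Testing whether the two slopes differ therefore amounts to a t-test of H0: b3 = 0 on the interaction term.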
More Problems

Understanding Multiple Regression Analysis
No ratings yet
Understanding Multiple Regression Analysis
56 pages
CH3 Econometrics Tar
No ratings yet
CH3 Econometrics Tar
16 pages
Multiple Regression Analysis Basics
No ratings yet
Multiple Regression Analysis Basics
17 pages
Class 2 - Multiple Regression
No ratings yet
Class 2 - Multiple Regression
41 pages
Understanding OLS and Gauss-Markov Assumptions
No ratings yet
Understanding OLS and Gauss-Markov Assumptions
40 pages
Income Coefficient in Demand Regression
No ratings yet
Income Coefficient in Demand Regression
9 pages
EMET2007: Linear Regression Insights
No ratings yet
EMET2007: Linear Regression Insights
6 pages
Understanding Multiple Regression Analysis
No ratings yet
Understanding Multiple Regression Analysis
28 pages
Multiple Linear Regression (Economics Dept)
No ratings yet
Multiple Linear Regression (Economics Dept)
12 pages
Multiple Regression Analysis Overview
No ratings yet
Multiple Regression Analysis Overview
73 pages
Multiple Linear Regression Analysis Explained
No ratings yet
Multiple Linear Regression Analysis Explained
48 pages
STAT 353: Expectation, Variance & Regression Guide
No ratings yet
STAT 353: Expectation, Variance & Regression Guide
44 pages
Multiple Regression Analysis Overview
No ratings yet
Multiple Regression Analysis Overview
28 pages
OLS Variance in Matrix Form
No ratings yet
OLS Variance in Matrix Form
42 pages
Understanding Simple Regression Analysis
No ratings yet
Understanding Simple Regression Analysis
37 pages
Multiple Linear Regression Analysis Guide
No ratings yet
Multiple Linear Regression Analysis Guide
51 pages
Understanding Multiple Linear Regression
No ratings yet
Understanding Multiple Linear Regression
41 pages
Regression Main
No ratings yet
Regression Main
18 pages
Multiple Regression Analysis Explained
No ratings yet
Multiple Regression Analysis Explained
26 pages
Econometrics Formula Sheet for ECMT1020
No ratings yet
Econometrics Formula Sheet for ECMT1020
10 pages
Understanding Multiple Linear Regression
No ratings yet
Understanding Multiple Linear Regression
86 pages
Multiple Linear Regression Analysis
No ratings yet
Multiple Linear Regression Analysis
55 pages
Advanced Econometrics Overview
No ratings yet
Advanced Econometrics Overview
65 pages
Classical Linear Regression Overview
No ratings yet
Classical Linear Regression Overview
28 pages
Chapter Three
No ratings yet
Chapter Three
35 pages
Understanding Multiple Regression Analysis
No ratings yet
Understanding Multiple Regression Analysis
35 pages
Multiple Linear Regression Overview
No ratings yet
Multiple Linear Regression Overview
11 pages
Finite Sample Properties of LS Estimator
No ratings yet
Finite Sample Properties of LS Estimator
20 pages
Econometrics 242 Cheat Sheet
No ratings yet
Econometrics 242 Cheat Sheet
4 pages
Multiple Linear Regression Analysis Guide
No ratings yet
Multiple Linear Regression Analysis Guide
53 pages
Understanding Simple Linear Regression
No ratings yet
Understanding Simple Linear Regression
36 pages
Simple Linear Regression Explained
No ratings yet
Simple Linear Regression Explained
36 pages
Introduction to Basic Regression Analysis
No ratings yet
Introduction to Basic Regression Analysis
5 pages
Linear Regression Fundamentals in Econometrics
No ratings yet
Linear Regression Fundamentals in Econometrics
12 pages
Understanding Simple Linear Regression
No ratings yet
Understanding Simple Linear Regression
19 pages
Linear Regression II: Hypothesis Testing
No ratings yet
Linear Regression II: Hypothesis Testing
81 pages
Statistical Analysis Overview
No ratings yet
Statistical Analysis Overview
5 pages
Multiple Regression Analysis Explained
No ratings yet
Multiple Regression Analysis Explained
35 pages
Understanding Regression Analysis Basics
No ratings yet
Understanding Regression Analysis Basics
46 pages
Econometrics Cheat Sheet Overview
No ratings yet
Econometrics Cheat Sheet Overview
3 pages
Econometrics Assignment Overview
No ratings yet
Econometrics Assignment Overview
20 pages
Understanding Multivariate Regression
No ratings yet
Understanding Multivariate Regression
20 pages
Econometrics Formula Sheet: Chapters 1-7
No ratings yet
Econometrics Formula Sheet: Chapters 1-7
7 pages
Poisson Regression Model Overview
No ratings yet
Poisson Regression Model Overview
9 pages
Chapter 3
No ratings yet
Chapter 3
33 pages
Regression Diagnostics and Transformations
No ratings yet
Regression Diagnostics and Transformations
46 pages
Regression Models and Statistical Concepts
No ratings yet
Regression Models and Statistical Concepts
13 pages
Simple Regression in Time Series Analysis
No ratings yet
Simple Regression in Time Series Analysis
18 pages
Econometrics EndSem Notes PDF
No ratings yet
Econometrics EndSem Notes PDF
15 pages
Econometrics Cheat Sheet Overview
No ratings yet
Econometrics Cheat Sheet Overview
3 pages
Summation and Expectation in Econometrics
No ratings yet
Summation and Expectation in Econometrics
32 pages
Regression Diagnostics Overview
100% (1)
Regression Diagnostics Overview
53 pages
Linear Regression and Correlation Analysis
No ratings yet


CHAPTER 2 Multiple Regression Analysis

Multiple Regression Analysis

The Mathematical Derivation

Step A: The True Model

• Z is a relevant variable (β2 ≠ 0).

Step B: The Omitted Model (What we actually run)

Step C: What is inside u?

Step D: The Bias Mechanism

Cov(X, u) = Cov(X, β2Z + v)
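Expanding the covariance term by term makes the bias mechanism explicit. This is a sketch that assumes the remaining noise v is uncorrelated with X, which is the usual assumption for this setup but is not stated in the line above.

```latex
\begin{aligned}
\operatorname{Cov}(X, u) &= \operatorname{Cov}(X, \beta_2 Z + v) \\
                         &= \beta_2 \operatorname{Cov}(X, Z) + \operatorname{Cov}(X, v) \\
                         &= \beta_2 \operatorname{Cov}(X, Z)
                            \quad \text{when } \operatorname{Cov}(X, v) = 0 .
\end{aligned}
```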

Conclusion: The error term u is correlated with X IF AND ONLY IF:

1. The omitted variable affects Y (β2 ≠ 0).

What happens instead?

We extend the model to include k independent variables.

Yi = β0 + β1X1i + β2X2i + … + βkXki + ui
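As an illustration of how a model of this form can be estimated, here is a minimal sketch using NumPy least squares. The helper name fit_ols, the simulated data, and the "true" coefficients (1, 2, −3) are assumptions made for the example, not values from the chapter.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS estimates [b0, b1, ..., bk] for y = b0 + X @ b + u."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical data: two regressors with known coefficients plus noise.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
u = rng.normal(scale=0.5, size=n)
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + u

beta_hat = fit_ols(X, y)
print(beta_hat)  # estimates should land close to [1, 2, -3]
```

Each estimated slope is the change in Y associated with a one-unit change in that regressor while the other regressor is held fixed.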

Assumption: No Perfect Multicollinearity

• Geometric Interpretation: In Simple Regression, we fit a line. In Multiple Regression

This is the most important concept for economics students.

In Simple Regression (Y = β0 + β1X1):

Ŵage = … + 0.08(Education) + 0.05(Experience)

We perform two types of tests in Multiple Regression.

A. The t-Test (Individual Significance)

B. The F-Test (Joint Significance)

• H0 : β1 = β2 = … = βk = 0 (All slopes are zero).

Formula B: Using R² (The "Goodness of Fit" Method)
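For the null hypothesis that all k slopes are zero, the standard textbook R² form of the F-statistic is F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below assumes that this is the formula meant here; the function name overall_f_stat and the example numbers are hypothetical.

```python
def overall_f_stat(r2, n, k):
    """F-statistic for H0: all k slopes are zero, computed from R-squared.

    r2: R-squared of the regression (0 <= r2 < 1)
    n:  number of observations
    k:  number of slope coefficients (intercept not counted)
    """
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    if k <= 0 or n - k - 1 <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Hypothetical example: R^2 = 0.5 with n = 103 observations and k = 2 slopes.
f_value = overall_f_stat(0.5, 103, 2)
print(f_value)
```

The statistic is compared with the critical value of an F distribution with k and n − k − 1 degrees of freedom.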

The variance of a slope coefficient β̂j is:

• σ²: The variance of the error term (noise in the data).

The Mathematical Logic: "Partialling Out"

• Step 1: Isolate the variation in Xj.

• Step 2: The General Variance Formula.

• Step 3: Combine them.
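Putting the pieces together gives the standard expression for the sampling variance of a slope under homoskedasticity. It is written here in LaTeX with the same symbols used in this section (σ², SSTj, Rj²); the homoskedasticity condition is an assumption of the sketch.

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{SST_j \,\bigl(1 - R_j^2\bigr)},
\qquad
SST_j = \sum_{i=1}^{n} \bigl(X_{ji} - \bar{X}_j\bigr)^2 ,
```

where R_j^2 is the R² from regressing X_j on all the other independent variables.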

• σ² (Error Variance): This represents the noise in the data.

◦ Effect: Higher σ² → Higher Variance (Less precise).

B. The Denominator (The Good Stuff)

1. SSTj (Total Variation in X):

◦ Effect: Higher SSTj → Lower Variance (More precise).

2. (1 − Rj²) (Independence of X):

The "Variance Inflation Factor" (VIF):

• If X1 is highly correlated with X2 (Multicollinearity), Rj² is high.
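The Variance Inflation Factor is conventionally defined as VIFj = 1 / (1 − Rj²). A minimal sketch of that calculation follows, using an auxiliary regression of Xj on the other regressors. The function name vif and the simulated regressors are assumptions made for illustration.

```python
import numpy as np

def vif(X, j):
    """Variance Inflation Factor for column j of the regressor matrix X."""
    X = np.asarray(X, dtype=float)
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    # Auxiliary regression: X_j on an intercept and all other regressors.
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    sst_j = np.sum((target - target.mean()) ** 2)
    r2_j = 1.0 - np.sum(resid ** 2) / sst_j
    return 1.0 / (1.0 - r2_j)

# Hypothetical regressors: one pair nearly independent, one pair nearly collinear.
rng = np.random.default_rng(1)
x1 = rng.normal(size=1000)
x2_indep = rng.normal(size=1000)
x2_collinear = x1 + 0.1 * rng.normal(size=1000)

vif_low = vif(np.column_stack([x1, x2_indep]), 0)
vif_high = vif(np.column_stack([x1, x2_collinear]), 0)
print(vif_low, vif_high)
```

With nearly independent regressors the VIF stays close to 1, while the nearly collinear pair produces a much larger value, which inflates the variance of the slope estimate by the same factor.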

Passage 1: The GPA Model

• Model A: GPA = β0 + β1(HoursStudied) + u

A) Model B has a lower R² than Model A.

Passage 2: Housing Prices

P̂rice = 50,000 + 100(Size) − 5,000(Distance)

Passage 3: Multicollinearity Logic

Problem 1: Determinants of Used Car Prices

a. Write the estimated regression equation.

Problem 2: Wage Equation with Education and Experience

a. Construct a 95% confidence interval for the coefficient on exper.

Problem 3: Advertising and Sales

a. Interpret the coefficient on tv_ads.

Problem 4: House Prices with Dummy Variables

a. Write the regression equation.

Problem 5: Test Scores and Student-Teacher Ratio

In Simple Regression, a higher R² is generally better. In Multiple Regression, R² is

The Solution: Adjusted R² (R̄²)
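The usual definition is R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the sample size and k the number of slope coefficients. The short sketch below assumes that definition; the function name adjusted_r2 and the numbers are hypothetical and only illustrate how the penalty works.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k slopes fitted on n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical case: adding a third regressor raises R^2 only from 0.500 to 0.501.
before = adjusted_r2(0.500, 101, 2)
after = adjusted_r2(0.501, 101, 3)
print(before, after)
```

In this example the tiny gain in R² does not offset the lost degree of freedom, so the adjusted value falls even though R² rises.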

In econometrics, not all mistakes are created equal.

Sin #1: Omitting a Relevant Variable (Underfitting)

Sin #2: Including an Irrelevant Variable (Overfitting)

RESET (Regression Equation Specification Error Test):

2. Run a second regression adding powers of those predictions (e.g., Ŷ², Ŷ³) as new

◦ Null Hypothesis (H0): Model is correctly specified.
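The RESET procedure can be sketched in a few lines of NumPy. This is an illustrative version under stated assumptions: it adds Ŷ² and Ŷ³, tests them jointly with the usual restricted-versus-unrestricted F-statistic, and leaves the comparison with a critical value to the reader. The function names and the simulated data are hypothetical.

```python
import numpy as np

def _fit(design, y):
    """Return (sum of squared residuals, fitted values) from an OLS fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    resid = y - fitted
    return float(resid @ resid), fitted

def reset_f_stat(X, y, max_power=3):
    """RESET F-statistic plus its numerator and denominator degrees of freedom."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    base = np.column_stack([np.ones(n), X])
    # First regression: the original model, keeping its predictions.
    ssr_restricted, fitted = _fit(base, y)
    # Second regression: original regressors plus powers of the predictions.
    powers = [fitted ** p for p in range(2, max_power + 1)]
    q = len(powers)
    ssr_unrestricted, _ = _fit(np.column_stack([base] + powers), y)
    df_denom = n - k - 1 - q
    f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_denom)
    return f_stat, q, df_denom

# Hypothetical data: one truly linear relationship, one with an omitted squared term.
rng = np.random.default_rng(2)
x = rng.normal(size=300)
noise = rng.normal(size=300)
f_linear, _, _ = reset_f_stat(x, 1.0 + 2.0 * x + noise)
f_quadratic, q, df_denom = reset_f_stat(x, 1.0 + 2.0 * x + 1.5 * x ** 2 + noise)
print(f_linear, f_quadratic, q, df_denom)
```

A large F-statistic relative to the F(q, n − k − 1 − q) critical value is evidence against H0, suggesting the functional form is misspecified.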

Passage 1: The wage gap study

• Regression 1: Wage = β0 + β1(Female) + u

D) The Adjusted R² of Regression 2 is definitely lower than Regression 1.

Passage 2: The Marketing Director's Dilemma

Passage 3: Interpreting the RESET

Problem 1: Omitted Variable Bias and Coefficient Stability

Problem 2: Functional Form – Testing for Non-Linearity

a. Write the estimated regression equation.

Problem 3: Joint Hypothesis Testing (F-Test)

Problem 4: Adjusted R-Squared and Model Penalties

• Model B: score = b0 + b1*study_time + b2*height + b3*shoe_size

a. Calculate the standard R² for Model A and Model B.

b. Calculate the Adjusted R² (R̄²) for Model A and Model B.

Problem 5: Interaction Terms and Slope Specification
