Lect 12 - Regression Analysis II

The document discusses regression analysis in the context of college affordability and post-college earnings using data from the College Scorecard. It covers the significance of various college factors, prediction intervals, model selection, and testing the significance of regression models. The analysis aims to determine how well different models explain variations in earnings based on factors like cost, graduation rate, debt, and location.

Uploaded by

anasnady2006


Data Analytics

DS342

12/27/2025
Chapter 7
Regression Analysis

Introductory Case: College Scorecard
• College costs and student debt are on the rise
• Students and parents struggle to find clear, reliable data on critical questions of college affordability
and value.
• The Department of Education (DOE) published a redesigned College Scorecard that reports the most
reliable national data on college costs and students’ outcomes at specific colleges.
• Fiona Schmidt, a college counselor, believes that the information from the College Scorecard can help
her as she advises families.
• Fiona wonders what college factors influence post-college earnings and wants answers to the following
questions:
• If a college costs more or has a higher graduation rate, should a student expect to earn more after
graduation?
• If a greater percentage of the students are paying down debt after college, does this somehow
influence post-college earnings?
• And finally, does the location of a college affect post-college earnings?

Introductory Case: College Scorecard
• Information from 116 colleges on the following variables:
• Annual post-college earnings (Earnings in $)
• The average annual cost (Cost in $)
• The graduation rate (Grad in %)
• The percentage of students paying down debt (Debt in %)
• Whether or not a college is located in a city (City equals 1 if a city location, 0 otherwise)

• Use the information to:

• Make predictions for post-college earnings using regression analysis
• Interpret goodness-of-fit measures for the post-college earnings model
• Determine which factors are statistically significant in explaining post-college earnings

Prediction
• Predictions are subject to sampling variability.
• The prediction will change if we use a different sample to estimate the
regression model.
• There is a distinction between the interval estimate for the mean of the
response and the interval estimate for the individual value of the
response.
• Confidence interval: mean.
• Prediction interval: individual.

Prediction intervals are always wider than the corresponding confidence intervals.

Prediction
• The point prediction, or best guess, is found by substituting the given
values of the Xs into the estimated regression equation.
• To measure the accuracy of the point predictions, calculate standard errors of
prediction.
• Standard error of prediction for a single Y:

• This error is approximately equal to the standard error of estimate.

• Standard error of prediction for the mean Y:

• This error is approximately equal to the standard error of estimate divided by the square root of
the sample size.
Prediction
• These standard errors can be used to calculate a
95% prediction interval for an individual value and a 95%
confidence interval for a mean value.
• Go out a multiple (the tabulated t-value) of the relevant standard error on either side of the
point prediction.
• The term prediction interval (rather than confidence interval) is used for
an individual value because an individual value of Y is not a population
parameter.
• However, the interpretation is basically the same.
Prediction
• Example: Consider the below model from the College Scorecard
case:

\( \text{Earnings} = \beta_0 + \beta_1\,\text{Cost} + \beta_2\,\text{Grad} + \beta_3\,\text{Debt} + \beta_4\,\text{City} + \varepsilon \)

• Construct the 95% confidence interval for the expected Earnings if
Cost equals $25,000, Grad equals 60, Debt equals 80, and City
equals 1.
• Construct the 95% prediction interval for an individual value of Earnings if
Cost equals $25,000, Grad equals 60, Debt equals 80, and City
equals 1.
Prediction
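• \( df = n - k - 1 = 116 - 4 - 1 = 111 \), \( t_{0.025,111} = 1.982 \), \( s_e = 5{,}645.83 \)
For the confidence interval:
• \( \hat{y} = 45{,}408.8 \)
• \( \hat{y} \pm t_{\alpha/2,df} \cdot \dfrac{s_e}{\sqrt{n}} = 45{,}408.8 \pm 1.982 \cdot \dfrac{5{,}645.83}{\sqrt{116}} = [44{,}370.06,\ 46{,}447.54] \)
With 95% confidence, we can state that the mean Earnings of colleges with these characteristics falls between
$44,370.06 and $46,447.54.
For the prediction interval:
• \( \hat{y} \pm t_{\alpha/2,df} \cdot s_e = 45{,}408.8 \pm 1.982 \cdot 5{,}645.83 = [34{,}221.21,\ 56{,}596.39] \)
With 95% confidence, the Earnings for an individual college with these characteristics fall between $34,221.21 and $56,596.39.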
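The two intervals above can be reproduced with a few lines of Python. This is a minimal sketch that uses the approximate standard errors from the slides (the standard error of the estimate for an individual value, and the same quantity divided by the square root of n for the mean); the helper name `interval` is an assumption made for illustration.

```python
import math


def interval(point, t_crit, std_error):
    """Return (lower, upper) for point +/- t_crit * std_error."""
    margin = t_crit * std_error
    return point - margin, point + margin


# Values quoted in the worked example.
y_hat = 45_408.8   # point prediction
s_e = 5_645.83     # standard error of the estimate
n = 116            # sample size
t_crit = 1.982     # tabulated t for alpha/2 = 0.025 and df = 111

# Approximate standard errors of prediction.
se_mean = s_e / math.sqrt(n)   # for the mean response
se_individual = s_e            # for an individual response

ci = interval(y_hat, t_crit, se_mean)
pi = interval(y_hat, t_crit, se_individual)

print(f"95% CI for the mean:        [{ci[0]:,.2f}, {ci[1]:,.2f}]")
print(f"95% PI for an individual:   [{pi[0]:,.2f}, {pi[1]:,.2f}]")
```

Because the t-value is rounded to 1.982 here, the endpoints can differ from the slide values by a few dollars.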
Model Selection
• Example: Recall the College Scorecard case and
consider three models. Which should we choose?
Model 1: \( \text{Earnings} = \beta_0 + \beta_1\,\text{Cost} + \varepsilon \)
Model 2: \( \text{Earnings} = \beta_0 + \beta_1\,\text{Cost} + \beta_2\,\text{Grad} + \beta_3\,\text{Debt} + \varepsilon \)
Model 3: \( \text{Earnings} = \beta_0 + \beta_1\,\text{Cost} + \beta_2\,\text{Grad} + \beta_3\,\text{Debt} + \beta_4\,\text{City} + \varepsilon \)

• Several “goodness-of-fit” measures summarize how well the sample regression equation fits the data.
• The standard error of the estimate, \( s_e \).
• The coefficient of determination, \( R^2 \).
• The adjusted coefficient of determination, adjusted \( R^2 \).
Model Selection
• Recall that a residual is the difference between the observed and predicted value of
the response,
→ \( e_i = y_i - \hat{y}_i \).
• The sample regression equation provides a good fit when the dispersion of the
residuals is relatively small.
• The sample variance, \( s_e^2 \), is the average squared deviation between the observed
and predicted values.
• The standard deviation of the residuals, or standard error of the estimate, has the
same units of measurement as the response.
→ \( s_e = \sqrt{\dfrac{SSE}{n-k-1}} \)

• \( SSE \) is the error sum of squares, \( k \) denotes the number of predictors, and \( n \) is the
sample size.
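As a small illustration of the formula for the standard error of the estimate, the sketch below computes it from observed and predicted values. The function name and the six data points are hypothetical and are not taken from the College Scorecard data.

```python
import math


def standard_error_of_estimate(y, y_hat, k):
    """s_e = sqrt(SSE / (n - k - 1)) for a model with k predictors."""
    n = len(y)
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    # SSE: sum of squared residuals e_i = y_i - y_hat_i.
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (n - k - 1))


# Hypothetical observed and predicted values from a one-predictor model.
y_obs = [6.0, 8.0, 9.0, 5.0, 4.5, 9.5]
y_fit = [5.75, 7.0, 9.5, 7.0, 4.5, 8.25]

s_e = standard_error_of_estimate(y_obs, y_fit, k=1)
print(f"s_e = {s_e:.4f}")
```

The result is in the same units as the response, which is what makes it easy to interpret.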
Model Selection
•For a fixed sample size, adding predictors changes both the numerator and denominator of
model fit measures.
→ The overall effect helps determine whether new predictors truly improve the model.
•When comparing models with the same response variable, the model with the smaller
standard error of the estimate (\( s_e \)) is preferred.
•The coefficient of determination (\( R^2 \)) measures how much of the variation in the
response is explained by the regression model.
•\( R^2 \) is the ratio of explained variation to total variation in the response variable.
Model Selection
• We cannot use \( R^2 \) for model comparison when the competing models do not include the
same number of predictor variables (but have the same response).
• \( R^2 \) never decreases as we add more variables.
→ A model chosen on \( R^2 \) alone may include variables with no economic or intuitive foundation.
• Adjusted \( R^2 \) explicitly accounts for the sample size \( n \) and the number of predictor
variables \( k \).
\( \text{Adjusted } R^2 = 1 - (1 - R^2)\left(\dfrac{n-1}{n-k-1}\right) \)
• Imposes a penalty for any additional predictors.
• The higher the adjusted \( R^2 \), the better the model.
• When comparing models with the same response, the model with the higher adjusted \( R^2 \) is
preferred.
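The penalty can be seen numerically. The sketch below applies the adjusted R-squared formula to two hypothetical models: adding a fourth predictor raises R-squared only slightly (from 0.4000 to 0.4010, values invented for illustration), and adjusted R-squared falls as a result.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 116  # sample size of the case study

# Hypothetical R^2 values before and after adding a weak fourth predictor.
before = adjusted_r2(0.4000, n, k=3)
after = adjusted_r2(0.4010, n, k=4)

print(f"adjusted R^2 with 3 predictors: {before:.4f}")
print(f"adjusted R^2 with 4 predictors: {after:.4f}")
print("penalty outweighs the gain:", after < before)
```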
Model Selection
• Example: Recall the introductory case and consider three models.
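Model 1: \( \text{Earnings} = \beta_0 + \beta_1\,\text{Cost} + \varepsilon \)
Model 2: \( \text{Earnings} = \beta_0 + \beta_1\,\text{Cost} + \beta_2\,\text{Grad} + \beta_3\,\text{Debt} + \varepsilon \)
Model 3: \( \text{Earnings} = \beta_0 + \beta_1\,\text{Cost} + \beta_2\,\text{Grad} + \beta_3\,\text{Debt} + \beta_4\,\text{City} + \varepsilon \)

Measure                                   Model 1      Model 2      Model 3
Standard error of the estimate, s_e    6,271.4407   5,751.8065   5,645.8306
Coefficient of determination, R^2          0.2767       0.4023       0.4292
Adjusted R^2                               0.2703       0.3862       0.4087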

• a. Which of the three models is the preferred model?

• b. Interpret the coefficient of determination for the preferred model.
• c. What percentage of the sample variation in annual post-college earnings is unexplained
by the preferred model?
Model Selection
• Example:

Measure                                   Model 1      Model 2      Model 3
Standard error of the estimate, s_e    6,271.4407   5,751.8065   5,645.8306
Coefficient of determination, R^2          0.2767       0.4023       0.4292
Adjusted R^2                               0.2703       0.3862       0.4087

• a. Model 3 is preferred: it has the lowest standard error of the estimate and the highest adjusted \( R^2 \).
• b. Model 3 explains 42.92% of the sample variation in earnings.
• c. Model 3 leaves 57.08% of the sample variation in earnings unexplained.
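The comparison can also be scripted. The sketch below stores the reported measures for the three models, recomputes adjusted R-squared from R-squared as a consistency check, and picks the preferred model. The dictionary layout and helper name are assumptions made for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 116  # number of colleges in the sample

# Reported measures; k is the number of predictors in each model.
models = {
    "Model 1": {"k": 1, "se": 6271.4407, "r2": 0.2767, "adj_r2": 0.2703},
    "Model 2": {"k": 3, "se": 5751.8065, "r2": 0.4023, "adj_r2": 0.3862},
    "Model 3": {"k": 4, "se": 5645.8306, "r2": 0.4292, "adj_r2": 0.4087},
}

# Recomputed values agree with the table up to rounding of R^2.
for name, m in models.items():
    recomputed = adjusted_r2(m["r2"], n, m["k"])
    print(f"{name}: reported {m['adj_r2']:.4f}, recomputed {recomputed:.4f}")

best_by_se = min(models, key=lambda name: models[name]["se"])
best_by_adj = max(models, key=lambda name: models[name]["adj_r2"])
unexplained_pct = (1 - models[best_by_adj]["r2"]) * 100

print("lowest s_e:", best_by_se)
print("highest adjusted R^2:", best_by_adj)
print(f"unexplained variation: {unexplained_pct:.2f}%")
```

Both criteria point to the same model here; when they disagree, the choice needs judgment rather than a script.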
Model Selection

• Note that the goodness-of-fit measures discussed in this section use the
same sample both to build the model and to assess it.
• Unfortunately, this procedure does not gauge how well the estimated
model will predict in an unseen sample.
• We will discuss cross-validation techniques that evaluate predictive
models by dividing the original sample into a training set to build (train)
the model and a validation set to evaluate (validate) it.
• The validation set is used to provide an independent performance
assessment by exposing the model to unseen data.
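A minimal sketch of the training/validation idea follows. It uses synthetic data (generated here, not from the College Scorecard), a single predictor, a 70/30 split, and root mean squared error as the performance measure; all of these choices are assumptions made for illustration.

```python
import math
import random


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def rmse(x, y, b0, b1):
    """Root mean squared prediction error of the line b0 + b1 * x."""
    squared = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic sample: y = 3 + 2x plus random noise.
x = [rng.uniform(0, 10) for _ in range(100)]
y = [3.0 + 2.0 * xi + rng.gauss(0, 1.5) for xi in x]

# Shuffle the row indices, then split 70% training / 30% validation.
indices = list(range(len(x)))
rng.shuffle(indices)
cut = int(0.7 * len(indices))
train, valid = indices[:cut], indices[cut:]

# Build the model on the training set only.
b0, b1 = fit_simple_ols([x[i] for i in train], [y[i] for i in train])

# Assess it on both sets; the validation figure is the independent one.
train_rmse = rmse([x[i] for i in train], [y[i] for i in train], b0, b1)
valid_rmse = rmse([x[i] for i in valid], [y[i] for i in valid], b0, b1)

print(f"fitted line: y = {b0:.2f} + {b1:.2f}x")
print(f"training RMSE:   {train_rmse:.3f}")
print(f"validation RMSE: {valid_rmse:.3f}")
```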
Testing the Model for Significance

• Testing the significance of the overall model
• Testing the significance of the coefficients of the model
Testing the Model for Significance

• When the sample size is too small, you can get good values for
MSE and \( R^2 \) even if there is no relationship between the variables
• Testing the model for significance helps determine whether these values
are meaningful
• We do this by performing a statistical hypothesis test
Regression as Analysis of Variance
ANOVA conducts an F-test to determine whether variation in Y is due to varying
levels of X (to test for significance of regression).
We start with the general linear model
\( Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + \varepsilon \)
\( H_0 \): all population slope coefficients are zero (\( \beta_1 = \beta_2 = \cdots = \beta_k = 0 \))
\( H_1 \): at least one population slope coefficient is nonzero (\( \beta_i \neq 0 \) for some \( i \))
◼ With a single predictor, the null hypothesis \( \beta_1 = 0 \) states that there is no linear relationship between X and Y
◼ The alternative hypothesis is that there is a linear relationship (\( \beta_1 \neq 0 \))
◼ If the null hypothesis can be rejected, we have statistical evidence that there is a relationship
Measuring the Fit of the Regression Model

◼ Regression models can be developed for any variables X and Y

◼ How do we know the model is actually helpful in predicting Y based on X?

◼ Three measures of variability are

◼ SST – Total variability about the mean, \( \sum (Y - \bar{Y})^2 \)
◼ SSE – Variability about the regression line, \( \sum (Y - \hat{Y})^2 \)
◼ SSR – Total variability that is explained by the regression model, \( \sum (\hat{Y} - \bar{Y})^2 \)
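The three measures can be computed directly, and for a least-squares line with an intercept they satisfy SST = SSR + SSE. The sketch below checks this on six illustrative data points; the numbers are an assumption chosen for the example and are not taken from the slides.

```python
# Illustrative data for one predictor (x) and one response (y).
x = [3, 4, 6, 4, 2, 5]
y = [6, 8, 9, 5, 4.5, 9.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# The three measures of variability.
sst = sum((yi - y_bar) ** 2 for yi in y)               # total, about the mean
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # about the regression line
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained by the model

print(f"fitted line: y_hat = {b0:.2f} + {b1:.2f}x")
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"SSR + SSE = {ssr + sse:.3f}")
print(f"R^2 = SSR / SST = {ssr / sst:.4f}")
```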
Measuring the Fit of the Regression Model
[Figure: Sales ($100,000) plotted against Payroll ($100 million) with the fitted line \( \hat{Y} = 2 + 1.25X \). For one observation the plot marks the total deviation \( Y - \bar{Y} \) (SST), the deviation explained by the regression line \( \hat{Y} - \bar{Y} \) (SSR), and the deviation about the regression line \( Y - \hat{Y} \) (SSE).]
Analysis of Variance (ANOVA) Table

◼ When software is used to develop a regression model, an ANOVA table is typically
created that shows the observed significance level (p-value) for the calculated F value
◼ This can be compared to the level of significance (\( \alpha \)) to make a decision

This is a right-tailed F test

I. Testing the Significance of the Overall Model

▪ If there is very little error, the MSE would be small and the F-statistic
would be large, indicating the model is useful.

▪ If the F-statistic is large, the significance level (p-value) will be low,
indicating it is unlikely this would have occurred by chance.

▪ So when the F-value is large, we can reject the null hypothesis and
conclude that there is a linear relationship between X and Y and that the
values of the MSE and \( R^2 \) are meaningful.
I. Testing the Significance of the Overall Model

Make a decision using one of the following methods

a) Reject the null hypothesis if the test statistic is greater than the F-value
from the statistical tables. Otherwise, do not reject the null hypothesis:
Reject if \( F_{\text{calculated}} > F_{\alpha,\,df_1,\,df_2} \), where
\( df_1 = k \)
\( df_2 = n - k - 1 \)

b) Reject the null hypothesis if the observed significance level, or p-value, is less than
the level of significance (\( \alpha \)). Otherwise, do not reject the null hypothesis:

\( p\text{-value} = P(F > \text{calculated test statistic}) \)

Reject if \( p\text{-value} < \alpha \)
College Scorecard Example
\( \text{Earnings} = \beta_0 + \beta_1\,\text{Cost} + \beta_2\,\text{Grad} + \beta_3\,\text{Debt} + \beta_4\,\text{City} + \varepsilon \)
Given \( \alpha = 0.05 \)

Calculate the value of the test statistic:

\( F = \dfrac{MSR}{MSE} = \dfrac{665{,}172{,}989.8}{31{,}875{,}403.3} = 20.87 \)

The value of F associated with a 5% level of significance and with degrees of freedom 4 and
111 from statistical tables is \( F_{0.05,4,111} = 2.4 \).
\( F_{\text{calculated}} = 20.87 \)
Reject \( H_0 \) because 20.87 > 2.4
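The calculation can be checked in a few lines. The sketch below recomputes F from the MSR and MSE quoted above and applies the critical-value rule with the tabulated value 2.4. The cross-check from R-squared uses the standard identity F = (R²/k) / ((1 − R²)/(n − k − 1)), which is not stated on the slides and is included here as an assumption-labelled extra.

```python
# Figures quoted in the College Scorecard example.
msr = 665_172_989.8   # mean square due to regression
mse = 31_875_403.3    # mean square error
n, k = 116, 4         # sample size and number of predictors
f_crit = 2.4          # tabulated F for alpha = 0.05 with df 4 and 111

df1 = k
df2 = n - k - 1

f_calc = msr / mse
reject_h0 = f_calc > f_crit  # right-tailed test

# Cross-check: the same statistic derived from R^2 of the four-predictor model.
r2 = 0.4292
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))

print(f"df1 = {df1}, df2 = {df2}")
print(f"F calculated = {f_calc:.2f}")
print(f"F from R^2   = {f_from_r2:.2f}")
print("reject H0:", reject_h0)
```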
College Scorecard Example
◼ We can conclude there is a statistically
significant relationship between the X’s and Y
◼ The \( R^2 \) value of 0.4292 means about 43% of the
variability in earnings (Y) is explained by the
predictor variables (X’s)

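[Figure: F distribution with a right-tail rejection region of area 0.05 beyond the critical value F = 2.4; the calculated F = 20.87 lies well inside the rejection region.]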
II. Testing the Significance of the coefficients of the Model

◼ Evaluation is similar to simple linear regression models

◼ The p-value for the F-test and \( R^2 \) are interpreted the same

◼ The hypothesis is different because there is more than one independent variable
◼ The F-test is investigating whether all the coefficients are equal to 0
II. Testing the Significance of the coefficients of the Model

• There is another important piece of information in regression outputs:
the t-values for the individual regression coefficients.
• Each t-value is the ratio of the estimated coefficient to its standard error, \( t = b_j / se(b_j) \), where \( b_j \) is the estimate of \( \beta_j \).

• It indicates how many standard errors the regression coefficient is from zero.
• A t-value can be used in a hypothesis test for a regression
coefficient.
• If a variable’s coefficient is zero, there is no point in including this variable in the
equation.
• To run this test, simply compare the t-value in the regression output with a
tabulated t-value and reject the null hypothesis only if the t-value from the
computer output is greater in magnitude than the tabulated t-value.
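The test described above can be sketched as follows. The coefficient estimates and standard errors are hypothetical (the slides' regression output is not reproduced here), and the tabulated t-value of 1.982 corresponds to a two-tailed test at the 5% level with 111 degrees of freedom.

```python
def t_value(coefficient, std_error):
    """Number of standard errors the estimate lies from zero."""
    return coefficient / std_error


def is_significant(t, t_tabulated):
    """Reject H0: beta_j = 0 when |t| exceeds the tabulated value."""
    return abs(t) > t_tabulated


t_tabulated = 1.982  # t for alpha/2 = 0.025 and df = 111

# Hypothetical (estimate, standard error) pairs for two predictors.
estimates = {
    "X1": (0.45, 0.10),
    "X2": (-120.0, 95.0),
}

results = {}
for name, (b, se) in estimates.items():
    t = t_value(b, se)
    results[name] = is_significant(t, t_tabulated)
    verdict = "significant" if results[name] else "not significant"
    print(f"{name}: t = {t:.3f} -> {verdict}")
```

Taking the absolute value matters: a large negative t-value is just as strong evidence against a zero coefficient as a large positive one.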
Evaluating Scorecard Example
• All explanatory variables except Debt (%) are significant, since their p-values are low
(< 0.05)

• T-tabulated = \( t_{0.025,111} = 1.982 \)

• Again, all explanatory variables except Debt (%) are significant, since their t-values exceed 1.982
in magnitude
INCLUDE/EXCLUDE DECISIONS
• The t-values of regression coefficients can be used to make
include/exclude decisions for explanatory variables in a
regression equation.
• Finding the best Xs to include in a regression equation is the most
difficult part of any real regression analysis.
• You are always trying to get the best fit possible, but the principle of
parsimony suggests using the fewest number of variables.
• This presents a trade-off, where there are not always easy answers.
• To help with this decision, several guidelines are presented on the next
slide.
Guidelines for Including/Excluding Variables
in a Regression Equation
• Look at a variable’s t-value and its associated p-value. If the p-value is above
some accepted significance level, such as 0.05, this variable is a candidate for
exclusion.

• Check whether a variable’s t-value is less than 1 or greater than 1 in magnitude. If

it is less than 1, the variable should be excluded from the equation, because dropping it lowers the standard error of the estimate.

• When there is a group of variables that are in some sense logically related, it is
sometimes a good idea to include all of them or exclude all of them.
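The first two guidelines are mechanical enough to script; the third (keeping or dropping logically related variables together) needs judgment and is not automated here. The sketch below is one possible screening helper, with hypothetical variable names, t-values, and p-values.

```python
def screen(p_value, t, alpha=0.05):
    """Return the guideline notes that apply to one variable."""
    notes = []
    if p_value > alpha:
        notes.append("p-value above alpha: candidate for exclusion")
    if abs(t) < 1:
        notes.append("|t| < 1: exclude")
    return notes or ["keep"]


# Hypothetical regression output: variable -> (t-value, p-value).
output = {
    "A": (4.50, 0.00002),
    "B": (-1.26, 0.21),
    "C": (0.40, 0.69),
}

decisions = {name: screen(p, t) for name, (t, p) in output.items()}
for name, notes in decisions.items():
    print(f"{name}: " + "; ".join(notes))
```

A variable flagged only by the first rule, like B here, is a borderline case where the trade-off between fit and parsimony has to be weighed.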
Thank You ☺
