Multiple Regression Models Overview

The document discusses the effects of changing units of measurement in multiple regression models, emphasizing that while coefficients may change, the interpretation of the model remains consistent. It highlights the use of logarithmic transformations and beta coefficients to better understand the impact of independent variables on the dependent variable. The document also includes examples using statistical software to illustrate these concepts in practice.

Uploaded by

richardtsai1023
Econ 2023 Multiple Regression Models: Functional Forms

Sheng-Kai Chang

NTU

Spring 2025

Sheng-Kai Chang (NTU) Econ 2023 Multiple Regression Models: Functional Forms Spring 2025 1 / 64
1. Units of Measurement
We know from simple regression that changing the units of measurement
of the dependent variable or of some of the independent variables cannot
change the interpretation of the OLS regression line.
This means that if we rescale any of our variables, the coefficients have
to change to keep the interpretation unchanged.

1. If we multiply the dependent variable y by a nonzero constant c,
all of the coefficients (the intercept and the slopes) get multiplied by c,
and their standard errors get multiplied by |c|. So do the fitted values and
residuals. Nothing happens to the R-squared. Nothing happens to t statistics
(except that they change sign if c < 0) or to F statistics.
2. If we multiply an independent variable $x_j$ by the nonzero constant c,
then the coefficient on $x_j$ gets divided by c (and its standard error by |c|).
Nothing happens to fitted values, residuals, or R-squared. Nothing happens to
t or F statistics, except that the t statistic on the new $x_j$ changes sign if c < 0.
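A minimal sketch of rules 1 and 2, assuming a single regressor so that OLS has a closed form (the same scaling holds with several regressors). The data are made up for illustration; c = 16 mirrors the ounces-to-pounds conversion used in the birth-weight example below. The helper names `ols_simple` and `r_squared` are hypothetical.

```python
# Sketch of rescaling rules 1 and 2 with made-up data.
# A single regressor keeps OLS in closed form.

def ols_simple(x, y):
    """Return (intercept, slope) from the OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """R-squared of the OLS regression of y on x."""
    b0, b1 = ols_simple(x, y)
    y_bar = sum(y) / len(y)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ssr / sst


x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]   # made-up regressor
y = [3.1, 4.0, 6.2, 6.8, 9.9, 10.1]  # made-up outcome
c = 16.0

b0, b1 = ols_simple(x, y)

# Rule 1: y -> c*y multiplies both the intercept and the slope by c.
b0_y, b1_y = ols_simple(x, [c * yi for yi in y])

# Rule 2: x -> c*x divides the slope by c and leaves the intercept alone.
b0_x, b1_x = ols_simple([c * xi for xi in x], y)

print(abs(b0_y - c * b0) < 1e-9, abs(b1_y - c * b1) < 1e-9)   # True True
print(abs(b1_x - b1 / c) < 1e-9, abs(b0_x - b0) < 1e-9)       # True True
# R-squared is the same in the original and in both rescaled regressions.
print(abs(r_squared(x, y) - r_squared(x, [c * yi for yi in y])) < 1e-12)  # True
```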

3. Suppose y > 0 and we initially use log(y) as the dependent variable. If
we first multiply y by a constant c > 0 and then take the log, only the
intercept in the regression changes. This is because
log(c·y) = log(c) + log(y). As an example, suppose y is initially in
thousands of dollars. Then 1,000·y is in dollars. Now

$$\log(1{,}000\,y) = \log(1{,}000) + \log(y) \approx 6.91 + \log(y),$$

and so the intercept increases by about 6.91. None of the slopes
changes. Each fitted value increases by 6.91, but the residuals and
R-squared do not change. Nothing happens to t or F statistics (other than
those involving the intercept).

4. If we use log(c·$x_j$) instead of log($x_j$), the intercept changes but the
coefficient on log($x_j$) does not; neither does the R-squared or any test statistic
(except those relating to the intercept).
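Rules 3 and 4 can be checked the same way. This sketch uses made-up, strictly positive data, a single regressor, and c = 1,000 as in the dollars example. For rule 4, the new intercept $\hat\beta_0 - \hat\beta_1\log(c)$ follows from log(c·x) = log(c) + log(x); that expression is derived here rather than stated in the slides. `ols_simple` is a hypothetical helper.

```python
import math


def ols_simple(x, y):
    """Return (intercept, slope) from the OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]   # made-up, strictly positive
y = [3.1, 4.0, 6.2, 6.8, 9.9, 10.1]  # made-up, strictly positive
c = 1000.0

# Rule 3: log(c*y) = log(c) + log(y), so only the intercept moves (by log c).
a, b = ols_simple(x, [math.log(v) for v in y])
a_c, b_c = ols_simple(x, [math.log(c * v) for v in y])
print(round(a_c - a, 4))         # 6.9078, i.e. log(1000)
print(abs(b_c - b) < 1e-12)      # True: slope unchanged

# Rule 4: regress y on log(c*x) instead of log(x).
g0, g1 = ols_simple([math.log(v) for v in x], y)
h0, h1 = ols_simple([math.log(c * v) for v in x], y)
print(abs(h1 - g1) < 1e-9)                        # True: slope unchanged
print(abs(h0 - (g0 - g1 * math.log(c))) < 1e-9)   # True: intercept shifts by -g1*log(c)
```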

All of the previous claims can be shown algebraically; the notation is
the tedious part.
Instead, we use the data in [Link] to illustrate, with
bwghtlbs = bwght/16, packs = cigs/20, and famincdol = 1,000·faminc.

. reg bwght cigs faminc
Source | SS df MS Number of obs = 987
-------------+------------------------------ F( 2, 984) = 16.17
Model | 12839.4858 2 6419.74292 Prob > F = 0.0000
Residual | 390592.2 984 396.943293 R-squared = 0.0318
-------------+------------------------------ Adj R-squared = 0.0299
Total | 403431.686 986 409.159925 Root MSE = 19.923
------------------------------------------------------------------------------
bwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | -.4579605 .1100615 -4.16 0.000 -.6739428 -.2419782
faminc | .1036546 .0339187 3.06 0.002 .0370932 .170216
_cons | 116.0649 1.257367 92.31 0.000 113.5975 118.5324
------------------------------------------------------------------------------
. reg bwghtlbs cigs faminc
Source | SS df MS Number of obs = 987
-------------+------------------------------ F( 2, 984) = 16.17
Model | 50.1542415 2 25.0771208 Prob > F = 0.0000
Residual | 1525.75078 984 1.55055974 R-squared = 0.0318
-------------+------------------------------ Adj R-squared = 0.0299
Total | 1575.90502 986 1.59828096 Root MSE = 1.2452
------------------------------------------------------------------------------
bwghtlbs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | -.0286225 .0068788 -4.16 0.000 -.0421214 -.0151236
faminc | .0064784 .0021199 3.06 0.002 .0023183 .0106385
_cons | 7.254058 .0785854 92.31 0.000 7.099844 7.408273
------------------------------------------------------------------------------

. reg bwght packs faminc
Source | SS df MS Number of obs = 987
-------------+------------------------------ F( 2, 984) = 16.17
Model | 12839.4859 2 6419.74293 Prob > F = 0.0000
Residual | 390592.2 984 396.943293 R-squared = 0.0318
-------------+------------------------------ Adj R-squared = 0.0299
Total | 403431.686 986 409.159925 Root MSE = 19.923
------------------------------------------------------------------------------
bwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
packs | -9.15921 2.20123 -4.16 0.000 -13.47886 -4.839565
faminc | .1036546 .0339187 3.06 0.002 .0370932 .170216
_cons | 116.0649 1.257367 92.31 0.000 113.5975 118.5324
------------------------------------------------------------------------------
. gen famincdol = 1000*faminc
. gen lfamincdol = log(famincdol)

. reg lbwght cigs lfaminc
Source | SS df MS Number of obs = 987
-------------+------------------------------ F( 2, 984) = 15.03
Model | 1.06901908 2 .534509538 Prob > F = 0.0000
Residual | 34.9976702 984 .035566738 R-squared = 0.0296
-------------+------------------------------ Adj R-squared = 0.0277
Total | 36.0666892 986 .036578792 Root MSE = .18859
------------------------------------------------------------------------------
lbwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | -.0040958 .0010379 -3.95 0.000 -.0061326 -.002059
lfaminc | .0209848 .0067197 3.12 0.002 .0077981 .0341714
_cons | 4.699213 .0222172 211.51 0.000 4.655614 4.742811
------------------------------------------------------------------------------
. reg lbwght cigs lfamincdol
Source | SS df MS Number of obs = 987
-------------+------------------------------ F( 2, 984) = 15.03
Model | 1.06901913 2 .534509563 Prob > F = 0.0000
Residual | 34.9976701 984 .035566738 R-squared = 0.0296
-------------+------------------------------ Adj R-squared = 0.0277
Total | 36.0666892 986 .036578792 Root MSE = .18859
------------------------------------------------------------------------------
lbwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | -.0040958 .0010379 -3.95 0.000 -.0061326 -.002059
lfamincdol | .0209848 .0067197 3.12 0.002 .0077981 .0341714
_cons | 4.554255 .0680041 66.97 0.000 4.420805 4.687704
------------------------------------------------------------------------------
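The conversions in the output above can be checked by hand. This sketch only plugs coefficients copied from the Stata tables (rounded as printed) into the rescaling rules; it does not rerun any regression.

```python
import math

# Coefficients copied from 'reg bwght cigs faminc' (bwght in ounces).
b_cigs = -.4579605
b_faminc = .1036546

# bwghtlbs = bwght/16: every coefficient is divided by 16.
print(round(b_cigs / 16, 7), round(b_faminc / 16, 7))   # -0.0286225 0.0064784

# packs = cigs/20: the coefficient on packs is 20 times the one on cigs.
print(round(b_cigs * 20, 5))                             # -9.15921

# lfamincdol = log(1000) + lfaminc: the slope is unchanged and the
# intercept falls by b_lfaminc * log(1000).
b_lfaminc = .0209848
cons_lfaminc = 4.699213
print(round(cons_lfaminc - b_lfaminc * math.log(1000), 6))  # 4.554255
```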

The bottom line is that nothing unexpected happens. We cannot change
the importance of an effect, the goodness of fit, or statistical inference by
changing the units of measurement of our dependent or independent variables.
Because a change in logs approximates the proportionate (relative)
change, which is free of units of measurement, changing units and then
taking logs changes only the intercept.

Beta, or Standardized, Coefficients
A beta coefficient can be useful when some of the $x_j$ or y have units
that are not easily understood. For example, we might not have a good
sense of how important a one-point increase in a test score is. Using
logs gives us percentage effects, but we do not always want to do that.
It is often useful to ask the following question, which shows how
important an effect is relative to the variation in the population: “By how many
standard deviations does y change when $x_j$ increases by one standard deviation?”
We call such estimates beta coefficients.

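We can make such calculations directly, or have Stata compute the beta
coefficients. This is done by standardizing y and each $x_j$ (subtracting its
sample mean and dividing by its sample standard deviation) and running OLS on
the standardized variables. Equivalently, the beta coefficient on $x_j$ is
$\hat{\beta}_j \cdot (\hat{\sigma}_{x_j}/\hat{\sigma}_y)$. (There is no beta
coefficient for the intercept because the intercept is zero when all variables
have zero sample averages.)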

EXAMPLE: Final exam performance in [Link]
. reg final missed priGPA ACT, beta
Source | SS df MS Number of obs = 680
-------------+------------------------------ F( 3, 676) = 56.79
Model | 3032.09408 3 1010.69803 Prob > F = 0.0000
Residual | 12029.853 676 17.7956405 R-squared = 0.2013
-------------+------------------------------ Adj R-squared = 0.1978
Total | 15061.9471 679 22.1825435 Root MSE = 4.2185
------------------------------------------------------------------------------
final | Coef. Std. Err. t P>|t| Beta
-------------+----------------------------------------------------------------
missed | -.0793386 .0352349 -2.25 0.025 -.0918918
priGPA | 1.915294 .372614 5.14 0.000 .2215126
ACT | .4010639 .0532268 7.54 0.000 .2972548
_cons | 12.37304 1.171961 10.56 0.000 .
------------------------------------------------------------------------------

The qualifier “beta” just gives us a simple way to compute the beta
coefficients. Nothing changes in terms of fit or testing.
The coefficient on priGPA is larger than that on ACT by almost a factor
of five. But does this mean priGPA has the more important effect?
An increase of one sd in priGPA increases $\widehat{final}$ by about .222 sds, while a
one-sd increase in ACT increases $\widehat{final}$ by about .297 sds, which is a larger
movement in the distribution of final exam scores.

. sum final missed priGPA ACT
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
final | 680 25.89118 4.709835 10 39
missed | 680 5.852941 5.455037 0 30
priGPA | 680 2.586775 .5447141 .857 3.93
ACT | 680 22.51029 3.490768 13 32

For example, holding other factors fixed, if ∆priGPA = .545 (one sd), we
get the same answer as the beta coefficient:

$$\Delta\widehat{final} = 1.915(.545) \approx 1.0437$$

In sd terms, 1.0437/4.7098 ≈ .222.
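The Beta column can be reproduced from the coefficients and the standard deviations in the `sum` output via $\hat\beta_j \cdot \hat\sigma_{x_j}/\hat\sigma_y$. A minimal sketch using the numbers copied from the tables above; `beta_coefficient` is a hypothetical helper, not a Stata command.

```python
# Coefficients and sample standard deviations copied from the final-exam tables.
sd_final = 4.709835
estimates = {
    # name: (OLS coefficient, sample sd of the regressor)
    "missed": (-.0793386, 5.455037),
    "priGPA": (1.915294, .5447141),
    "ACT": (.4010639, 3.490768),
}


def beta_coefficient(coef, sd_x, sd_y):
    """Standardized coefficient: sds of change in y per one-sd change in x."""
    return coef * sd_x / sd_y


betas = {name: beta_coefficient(b, sd_x, sd_final)
         for name, (b, sd_x) in estimates.items()}
for name, value in betas.items():
    print(name, round(value, 4))
# missed -0.0919
# priGPA 0.2215
# ACT 0.2973
```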

2. More on Functional Form
We now study the logarithmic transformation in more depth, and also
look at quadratics and interaction terms. These are all ways of improving
the fit, and even the interpretation, of multiple regression models.
More on Logarithms
Consider an equation for log(price ) for home prices in [Link]:

log(price ) = β0 + β1 log(nox ) + β2 rooms + u


where nox is the amount of nitrous oxide in the community and rooms is the
median number of rooms in houses in the community.
We know that β1 is (approximately) the elasticity of price with respect
to nox (pollution). When multiplied by 100, β2 is the approximate
percentage change in price when rooms increases by one.

But the approximation need not be good, especially for larger
changes.
After we estimate the general equation

$$\widehat{\log(y)} = \hat{\beta}_0 + \hat{\beta}_1 \log(x_1) + \hat{\beta}_2 x_2,$$

the more precise calculation is

$$\%\Delta\hat{y} = 100\,[\exp(\hat{\beta}_2 \Delta x_2) - 1]$$

If $\Delta x_2 = 1$, we get

$$\%\Delta\hat{y} = 100\,[\exp(\hat{\beta}_2) - 1]$$

If $\hat{\beta}_2$ is not “too large,” this will be close to the simpler estimate
$100\hat{\beta}_2$. But sometimes it is quite different.
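A minimal sketch comparing the two formulas. The value .306 is the rounded rooms coefficient from the house-price regression below, and the exact figures match the `di` lines there; the function names are hypothetical.

```python
import math


def pct_change_approx(b, dx=1.0):
    """Approximate % change in y-hat in a log-level model: 100 * b * dx."""
    return 100 * b * dx


def pct_change_exact(b, dx=1.0):
    """Exact % change in y-hat: 100 * [exp(b * dx) - 1]."""
    return 100 * (math.exp(b * dx) - 1)


b_rooms = .306  # rounded rooms coefficient from the house-price regression
print(round(pct_change_approx(b_rooms), 1), round(pct_change_exact(b_rooms), 1))  # 30.6 35.8
print(round(pct_change_exact(-b_rooms), 1))                                        # -26.4

# The gap between the approximation and the exact value grows with |b * dx|.
for b in (0.01, 0.05, 0.10, 0.30, 0.50):
    print(b, round(pct_change_approx(b), 2), round(pct_change_exact(b), 2))
```

Note that the exact effect of one fewer room (−26.4%) is not the mirror image of one more room (+35.8%), while the approximation treats them symmetrically.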

. reg lprice lnox rooms
Source | SS df MS Number of obs = 506
-------------+------------------------------ F( 2, 503) = 265.69
Model | 43.4513652 2 21.7256826 Prob > F = 0.0000
Residual | 41.1308598 503 .081771093 R-squared = 0.5137
-------------+------------------------------ Adj R-squared = 0.5118
Total | 84.582225 505 .167489554 Root MSE = .28596
------------------------------------------------------------------------------
lprice | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnox | -.7176736 .0663397 -10.82 0.000 -.8480106 -.5873366
rooms | .3059183 .0190174 16.09 0.000 .268555 .3432816
_cons | 9.233738 .1877406 49.18 0.000 8.864885 9.60259
------------------------------------------------------------------------------
. di 100*(exp(.306) - 1)
35.798231
. di 100*(exp(-.306) - 1)
-26.361338

Thus, a more precise estimate of the percentage change from adding one
more room is 35.8% rather than 30.6%.

Reasons for Using the Natural Log
1. The coefficients have percentage-change interpretations. Consequently,
we can ignore the units of measurement of any variable that
appears in logarithmic form.
2. When y > 0, models with log(y) as the dependent variable often
satisfy the classical linear model assumptions more closely. For example, the
model has a better chance of being linear, homoskedasticity is more likely
to hold, and normality is often much more plausible. (These are
observations based on experience.)
3. In most cases, taking the log greatly reduces the variation of a variable,
making OLS estimates less prone to the influence of outliers.

From [Link]:

. sum salary
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
salary | 209 1281.12 1372.345 223 14822
. sum lsalary
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
lsalary | 209 6.950386 .5663741 5.407172 9.603868

Limitations of Using the Natural Log
1. If y ≥ 0 but y = 0 is possible, we cannot use log(y). Sometimes
log(1 + y) is used, but the coefficients are difficult to interpret, and
log(1 + y) ≥ 0 if y ≥ 0.
2. It is harder to predict y when we have estimated a model for log(y).
(More on this later.)
3. When y is a fraction that is close to zero for many observations,
log($y_i$) can have more variability than $y_i$.

Some Rules on Using Logarithms
1. Logs are often used for dollar amounts that are always positive, as well
as for variables such as population, especially when there is a lot of
variation.
2. Logs are used less often for variables measured in years, such as
schooling, age, and experience. We often want to know the effect of
increasing schooling by one year (not by a certain percentage).

3. Logs are used less frequently for variables that are already percents or
proportions.
Consider unem, the unemployment rate. Because unem is a percent,
its changes are percentage point changes. So unem going from 8 to 9 is a
one percentage point increase, but it is a 12.5 percent increase from the
starting value of 8. The logarithmic change is log(9) − log(8) ≈ .118,
which gives approximately 11.8% as the approximation to 12.5%.
Usually we are more interested in the effect of a one percentage point
change, but not always.
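The three ways of measuring the move from 8 to 9 percent can be computed side by side; a minimal sketch:

```python
import math

old, new = 8.0, 9.0  # unemployment rate, in percent

pp_change = new - old                                # percentage-point change
pct_change = 100 * (new - old) / old                 # percent change
log_change = 100 * (math.log(new) - math.log(old))   # log approximation, in percent

print(pp_change, pct_change, round(log_change, 1))   # 1.0 12.5 11.8
```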

Models with Quadratics
Models with quadratics can deliver increasing or decreasing effects.
Plus, they contain the constant-effect model as a special case, which can be
easily tested.
They also allow for a turning point, which is sometimes of interest.
(For example, what is the optimal number of students at a high school for
high school performance?)

Consider the model

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$$

There is only a single explanatory variable, x, but two regressors, $x_1 = x$
and $x_2 = x^2$.
The slope of y with respect to x depends on β1, β2, and the value of x:

$$\frac{dy}{dx} = \beta_1 + 2\beta_2 x \quad \text{holding } u \text{ fixed}$$

Estimation is straightforward. We just define a new variable, $x^2$, and
include it along with x as a regressor.

Having estimated the coefficients, we write

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2$$

$$\frac{\Delta\hat{y}}{\Delta x} \approx \hat{\beta}_1 + 2\hat{\beta}_2 x$$

Cases
1. If $\hat{\beta}_1 > 0$ and $\hat{\beta}_2 < 0$, the slope is initially positive, but it decreases as
x increases. The function has a hump shape and turns at the value

$$x^* = -\frac{\hat{\beta}_1}{2\hat{\beta}_2} = \left|\frac{\hat{\beta}_1}{2\hat{\beta}_2}\right|,$$

the maximum of the function. After the value $x^*$, the slope is negative.
2. If $\hat{\beta}_1 < 0$ and $\hat{\beta}_2 > 0$, the slope is initially negative, but it increases
(gets less negative) as x increases. The function has a U shape and turns
at the $x^*$ given above. But now $x^*$ is the minimum of the function. Once
$x > x^*$, the slope is positive.

In Cases 1 and 2, we should ask: Does the turning point make sense? If
it does not make sense for the slope to change sign, does the sign change
happen only at an extreme value of x?
3. If $\hat{\beta}_1 > 0$ and $\hat{\beta}_2 > 0$, the slope ($\hat{\beta}_1 + 2\hat{\beta}_2 x$) is always positive (for x ≥ 0) and
increases with x. The turning point occurs at a negative value of x. (In
practice, this is less common than Case 1.)
4. If $\hat{\beta}_1 < 0$ and $\hat{\beta}_2 < 0$, the slope is always negative and becomes more
negative (increases in absolute value) as x increases. The turning point is
negative.
Adding other explanatory variables changes nothing in these
calculations, except that they now have a ceteris paribus interpretation.
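The four cases can be illustrated with made-up coefficient pairs, one per case. This sketch evaluates the slope $\hat\beta_1 + 2\hat\beta_2 x$ at a few values of x and locates $x^* = -\hat\beta_1/(2\hat\beta_2)$; the function names and coefficients are hypothetical.

```python
def slope(b1, b2, x):
    """Slope of y = b0 + b1*x + b2*x**2 with respect to x."""
    return b1 + 2 * b2 * x


def turning_point(b1, b2):
    """Value x* where the slope is zero; requires b2 != 0."""
    return -b1 / (2 * b2)


# Illustrative (made-up) coefficient pairs (b1, b2), one per case.
cases = {
    "Case 1": (1.0, -0.1),   # hump: maximum at x* = 5
    "Case 2": (-1.0, 0.1),   # U shape: minimum at x* = 5
    "Case 3": (1.0, 0.1),    # slope positive for x >= 0; x* < 0
    "Case 4": (-1.0, -0.1),  # slope negative for x >= 0; x* < 0
}

for label, (b1, b2) in cases.items():
    kind = "maximum" if b2 < 0 else "minimum"
    slopes = [round(slope(b1, b2, x), 2) for x in (0.0, 2.0, 8.0)]
    print(f"{label}: {kind} at x* = {turning_point(b1, b2):.1f}; "
          f"slopes at x = 0, 2, 8: {slopes}")
```

The sign of $\hat\beta_2$ alone decides whether $x^*$ is a maximum or a minimum; the sign of $\hat\beta_1$ relative to $\hat\beta_2$ decides whether $x^*$ falls at a positive value of x.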

EXAMPLE: A log(wage) equation with a quadratic in experience.
([Link])

$$\widehat{lwage} = \underset{(.291)}{-.192} + \underset{(.011)}{.136}\,educ + \underset{(.037)}{.123}\,exper - \underset{(.0017)}{.0038}\,exper^2$$

$$n = 759, \quad R^2 = .196$$

The estimated return to education is 13.6%. The model assumes this is
the same for all years of experience and education.

By contrast, each year of experience is worth less than the preceding
year. Using calculus (just take the derivative with respect to exper), the
slope is

$$\frac{\Delta\widehat{lwage}}{\Delta exper} \approx .123 - 2(.0038)\,exper = .123 - .0076\,exper$$

We can think of 12.3% as approximately the return to the first year of
experience, essentially starting off in the workforce. The return in going
from 10 to 11 years is about

$$.123 - .0076(10) = .047,$$

or 4.7%.

We could be more precise and not use the calculus approximation:

$$[.123(11) - .0038(11)^2] - [.123(10) - .0038(10)^2] \approx .043,$$

or 4.3%, which is reasonably close.
For larger changes in exper, we would want to do the exact calculation.
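The calculus approximation, the exact change, and the turning point can be reproduced from the rounded estimates; this mirrors the `di` lines in the Stata session below. `exper_part` is a hypothetical helper for the experience terms of the fitted equation.

```python
b_exper, b_expersq = .123, -.0038   # rounded estimates from the lwage equation


def exper_part(exper):
    """Experience terms of predicted lwage: .123*exper - .0038*exper^2."""
    return b_exper * exper + b_expersq * exper ** 2


approx = b_exper + 2 * b_expersq * 10      # calculus slope at exper = 10
exact = exper_part(11) - exper_part(10)    # exact change from 10 to 11 years
turn = -b_exper / (2 * b_expersq)          # turning point x*

print(round(approx, 4), round(exact, 4))   # 0.047 0.0432
print(round(turn, 2))                      # 16.18
```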

. gen expersq = exper^2
. reg lwage educ exper expersq
Source | SS df MS Number of obs = 759
-------------+------------------------------ F( 3, 755) = 61.31
Model | 51.4728247 3 17.1576082 Prob > F = 0.0000
Residual | 211.27582 755 .279835523 R-squared = 0.1959
-------------+------------------------------ Adj R-squared = 0.1927
Total | 262.748644 758 .346634095 Root MSE = .52899
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1359599 .0106523 12.76 0.000 .1150483 .1568715
exper | .1228023 .0374959 3.28 0.001 .0491936 .1964109
expersq | -.0037683 .0016924 -2.23 0.026 -.0070906 -.000446
_cons | -.1920207 .2907003 -0.66 0.509 -.7626977 .3786563
------------------------------------------------------------------------------
. sum exper
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
exper | 759 10.70619 2.991078 2 19
. di .123 - 2*10*.0038
.047
. di (.123*11 - .0038*(11^2)) - (.123*10 - .0038*(10^2))
.0432

. di .123 /(2*.0038)
16.184211
. tab exper
potential |
experience | Freq. Percent Cum.
------------+-----------------------------------
2 | 2 0.26 0.26
3 | 2 0.26 0.53
4 | 6 0.79 1.32
5 | 25 3.29 4.61
6 | 31 4.08 8.70
7 | 44 5.80 14.49
8 | 74 9.75 24.24
9 | 86 11.33 35.57
10 | 90 11.86 47.43
11 | 74 9.75 57.18
12 | 102 13.44 70.62
13 | 86 11.33 81.95
14 | 55 7.25 89.20
15 | 52 6.85 96.05
16 | 20 2.64 98.68
17 | 3 0.40 99.08
18 | 3 0.40 99.47
19 | 4 0.53 100.00
------------+-----------------------------------
Total | 759 100.00

[Figure: lwageh plotted against exper, with the turning point marked at exper = 16.2]

The quadratic turns at about exper = .123/[2(.0038)] ≈ 16.2. But
fewer than 2% of the observations have exper > 16, so this is not much of a concern.
Because the quadratic model is more complicated to interpret and
explain, we should make sure there is good statistical evidence for keeping $x^2$
($exper^2$) in the equation.

So, we test the null that the equation is linear in exper against the
alternative that it is quadratic:

$$H_0: \beta_{exper^2} = 0$$
$$H_1: \beta_{exper^2} \neq 0$$

Important: $\beta_{exper}$ is not restricted. So we use a t test on $exper^2$.

The t statistic in this example is −2.23 with two-sided p-value = .026, so we
reject $H_0$ at the 5% level. It seems a good idea to keep $exper^2$ in the
equation.
We already know that exper affects lwage. But if we did want to test

$H_0$: exper has no effect on lwage
$H_1$: exper does have an effect on lwage

this would be

$$H_0: \beta_{exper} = 0,\ \beta_{exper^2} = 0.$$

We would use an F test. But usually, if the quadratic term is insignificant,
we go back to a linear model.
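The t statistic on $exper^2$ is just the coefficient divided by its standard error (both copied from the `expersq` row of the Stata output above). This sketch computes the two-sided p-value from the standard normal distribution, a large-sample approximation to the t distribution with 755 df that Stata uses; with this many degrees of freedom the two agree to about three decimals.

```python
import math


def t_stat(coef, se):
    """t statistic for H0: coefficient = 0."""
    return coef / se


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


t = t_stat(-.0037683, .0016924)   # expersq row of 'reg lwage educ exper expersq'
p = two_sided_p_normal(t)
print(round(t, 2), round(p, 3))   # -2.23 0.026
```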

EXAMPLE: Effects of working on final exam performance.
([Link])
The estimated equation is

$$\widehat{final} = \underset{(2.66)}{11.56} - \underset{(.091)}{.111}\,work + \underset{(.0031)}{.0000034}\,work^2 + \underset{(.71)}{6.79}\,msugpa + \underset{(.110)}{.617}\,act$$

$$n = 414, \quad R^2 = .326$$

The relationship between $\widehat{final}$ and work is negative, but the quadratic
term means it becomes less negative as work increases. However, the coefficient on
$work^2$ is puny and its t statistic is essentially zero.

The t statistic on work is also not significant: t ≈ −1.21. This is another good
example of how looking at two individual t statistics can be a bad idea.
The joint F test for work and $work^2$ gives p-value = .0093, so we reject the
null at the 1% level.
The best course of action is to drop $work^2$. Then work becomes very
statistically significant.
$work^2$ is not a linear function of work, but the two are highly correlated in
this case.
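Why are x and $x^2$ so highly correlated? Over a range of nonnegative values, $x^2$ rises almost in lockstep with x. This sketch uses a hypothetical grid of 0 to 40 hours of work (not the actual data, whose sample correlation is .9205 in the output below) and a hand-written Pearson correlation.

```python
import math


def corr(a, b):
    """Sample (Pearson) correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


hours = [float(h) for h in range(0, 41)]   # hypothetical values of work: 0 to 40
hours_sq = [h * h for h in hours]

r = corr(hours, hours_sq)
print(round(r, 3))   # close to 1 even though hours_sq is not linear in hours
```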

. reg final work worksq msugpa act
Source | SS df MS Number of obs = 414
-------------+------------------------------ F( 4, 409) = 49.47
Model | 10204.1696 4 2551.0424 Prob > F = 0.0000
Residual | 21092.6903 409 51.5713699 R-squared = 0.3260
-------------+------------------------------ Adj R-squared = 0.3195
Total | 31296.8599 413 75.7793218 Root MSE = 7.1813
------------------------------------------------------------------------------
final | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
work | -.1105152 .0912818 -1.21 0.227 -.2899552 .0689248
worksq | 3.44e-06 .0031071 0.00 0.999 -.0061045 .0061114
msugpa | 6.791875 .7100899 9.56 0.000 5.395993 8.187756
act | .616782 .1095984 5.63 0.000 .4013356 .8322284
_cons | 11.56135 2.657652 4.35 0.000 6.336992 16.78572
------------------------------------------------------------------------------
. test work worksq
( 1) work = 0
( 2) worksq = 0
F( 2, 409) = 4.73
Prob > F = 0.0093

. reg final work msugpa act
Source | SS df MS Number of obs = 414
-------------+------------------------------ F( 3, 410) = 66.12
Model | 10204.1696 3 3401.38985 Prob > F = 0.0000
Residual | 21092.6903 410 51.4455862 R-squared = 0.3260
-------------+------------------------------ Adj R-squared = 0.3211
Total | 31296.8599 413 75.7793218 Root MSE = 7.1726
------------------------------------------------------------------------------
final | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
work | -.1104224 .0358544 -3.08 0.002 -.1809038 -.0399409
msugpa | 6.791776 .7035717 9.65 0.000 5.408718 8.174833
act | .616783 .1094611 5.63 0.000 .4016079 .831958
_cons | 11.56137 2.654375 4.36 0.000 6.343488 16.77925
------------------------------------------------------------------------------
. corr work worksq
(obs=470)
| work worksq
-------------+------------------
work | 1.0000
worksq | 0.9205 1.0000

Models with Interaction Terms
Suppose we have two explanatory variables and start with the usual
model:

y = β0 + β1 x1 + β2 x2 + u
The partial effect (PE) of $x_1$ on y is β1 and the PE of $x_2$ on y is β2. Even if
we add quadratics, the model has a potentially important restriction: the
partial effect of $x_1$ never depends on $x_2$, and vice versa.
Sometimes it is natural to think that the partial effect of one variable, say
education, could depend on the level of another variable, say intelligence.

We add an interaction term, $x_1 x_2$, to the usual model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + u$$

Holding $x_2$ (and u) fixed, the partial effect of $x_1$ on y is now

$$\frac{\Delta y}{\Delta x_1} = \beta_1 + \beta_3 x_2,$$

so the effect of $x_1$ depends on $x_2$ unless β3 = 0.
Similarly,

$$\frac{\Delta y}{\Delta x_2} = \beta_2 + \beta_3 x_1$$

The null hypothesis $H_0: \beta_3 = 0$ is the same as saying the partial effects
are constant. It should be tested. If it cannot be rejected at a small
significance level (smaller than 5%), it is not worth complicating the
model.
It is easy to get confused in models with interactions. From

$$\frac{\Delta y}{\Delta x_1} = \beta_1 + \beta_3 x_2$$

we see that β1 is now the PE of $x_1$ on y when $x_2 = 0$. But $x_2 = 0$ may be
very far from a legitimate, or at least interesting, part of the population. A
similar comment holds for β2.

Two interesting parameters are the PEs evaluated at the mean of the
other variable:

δ1 = β1 + β3 µ2
δ2 = β2 + β3 µ1

We will estimate these as

δ̂1 = β̂1 + β̂3 x̄2


δ̂2 = β̂2 + β̂3 x̄1

where x̄2 and x̄1 are the sample averages and the β̂j are the OLS estimates.
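A minimal sketch of computing δ̂1 and δ̂2 by hand, using the estimates from `reg lwage educ IQ educIQ` and the sample means from `sum educ IQ` in the education/IQ example later in this section. The results agree, to about four decimals, with the educ and IQ coefficients in the centered regression there (.0778 and .0066); the small gaps reflect rounding in the printed numbers. `partial_effect` is a hypothetical helper.

```python
def partial_effect(b_own, b_inter, other):
    """PE of one regressor in y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + u,
    evaluated at a given value of the other regressor."""
    return b_own + b_inter * other


# Estimates from 'reg lwage educ IQ educIQ' and means from 'sum educ IQ'.
b_educ, b_IQ, b_inter = .1950672, .0221114, -.0011721
mean_educ, mean_IQ = 13.22925, 100.0646

delta1_hat = partial_effect(b_educ, b_inter, mean_IQ)   # PE of educ at mean IQ
delta2_hat = partial_effect(b_IQ, b_inter, mean_educ)   # PE of IQ at mean educ
print(round(delta1_hat, 4), round(delta2_hat, 4))       # 0.0778 0.0066
```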

It is often easier to let a regression compute δ̂1 and δ̂2, and to get
appropriate standard errors. We can write the population model as

$$y = \alpha_0 + \delta_1 x_1 + \delta_2 x_2 + \beta_3 (x_1 - \mu_1)(x_2 - \mu_2) + u$$

In other words, we subtract the means before creating the interaction.
The regression we run is

$$y_i \text{ on } x_{i1},\ x_{i2},\ (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2), \quad i = 1, \ldots, n$$

The intercept changes, too, but this is not important. The coefficient on
the interaction does not change.
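The reparameterization is pure algebra: expanding $\beta_3(x_1-\mu_1)(x_2-\mu_2)$ shows that $\delta_1 = \beta_1 + \beta_3\mu_2$, $\delta_2 = \beta_2 + \beta_3\mu_1$, and $\alpha_0 = \beta_0 - \beta_3\mu_1\mu_2$ (this intercept formula is derived here, not stated in the slides). The sketch checks the identity at a few points and plugs in the uncentered estimates from the education/IQ example later in this section, centering at 13.23 and 100.06 as that example does. The implied intercept (about .789) is close to the printed centered intercept (.7893); the gap reflects rounding.

```python
# Uncentered estimates from 'reg lwage educ IQ educIQ'; centering values as in the example.
b0, b1, b2, b3 = -.7622374, .1950672, .0221114, -.0011721
m1, m2 = 13.23, 100.06

d1 = b1 + b3 * m2        # coefficient on educ after centering the interaction
d2 = b2 + b3 * m1        # coefficient on IQ after centering the interaction
a0 = b0 - b3 * m1 * m2   # intercept after centering the interaction


def uncentered(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def centered(x1, x2):
    return a0 + d1 * x1 + d2 * x2 + b3 * (x1 - m1) * (x2 - m2)


# Same fitted values at arbitrary (educ, IQ) points: only the parameterization differs.
for x1, x2 in [(8, 54), (12, 100), (20, 131)]:
    assert abs(uncentered(x1, x2) - centered(x1, x2)) < 1e-9

print(round(d1, 4), round(d2, 4), round(a0, 3))   # 0.0778 0.0066 0.789
```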

EXAMPLE: Do education and IQ have an interactive effect in the
log(wage) equation? ([Link])

$$lwage = \beta_0 + \beta_1 educ + \beta_2 IQ + \beta_3\, educ \cdot IQ + u$$

First estimate the model where the interaction is educ·IQ, that is, where the
means have not been removed before creating the interaction.
Then use (educ − 13.23)·(IQ − 100.06).

. reg lwage educ IQ
Source | SS df MS Number of obs = 759
-------------+------------------------------ F( 2, 756) = 88.30
Model | 49.7567642 2 24.8783821 Prob > F = 0.0000
Residual | 212.99188 756 .281735291 R-squared = 0.1894
-------------+------------------------------ Adj R-squared = 0.1872
Total | 262.748644 758 .346634095 Root MSE = .53079
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0728327 .0097582 7.46 0.000 .0536763 .0919892
IQ | .0076329 .0016143 4.73 0.000 .0044639 .0108019
_cons | .7284623 .1386158 5.26 0.000 .4563446 1.00058
------------------------------------------------------------------------------

. reg lwage educ IQ educIQ
Source | SS df MS Number of obs = 759
-------------+------------------------------ F( 3, 755) = 60.52
Model | 50.9344189 3 16.9781396 Prob > F = 0.0000
Residual | 211.814225 755 .280548643 R-squared = 0.1939
-------------+------------------------------ Adj R-squared = 0.1906
Total | 262.748644 758 .346634095 Root MSE = .52967
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1950672 .0604502 3.23 0.001 .0763967 .3137378
IQ | .0221114 .007248 3.05 0.002 .0078828 .0363399
educIQ | -.0011721 .0005721 -2.05 0.041 -.0022951 -.000049
_cons | -.7622374 .7406194 -1.03 0.304 -2.216156 .6916807
------------------------------------------------------------------------------
. sum educ IQ
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
educ | 759 13.22925 2.411524 8 20
IQ | 759 100.0646 14.57751 54 131

. reg lwage educ IQ educ0IQ0
Source | SS df MS Number of obs = 759
-------------+------------------------------ F( 3, 755) = 60.52
Model | 50.9344189 3 16.9781396 Prob > F = 0.0000
Residual | 211.814225 755 .280548643 R-squared = 0.1939
-------------+------------------------------ Adj R-squared = 0.1906
Total | 262.748644 758 .346634095 Root MSE = .52967
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0777905 .0100338 7.75 0.000 .058093 .097488
IQ | .0066049 .0016872 3.91 0.000 .0032928 .0099171
educ0IQ0 | -.0011721 .0005721 -2.05 0.041 -.0022951 -.000049
_cons | .7893339 .1414784 5.58 0.000 .5115962 1.067072
------------------------------------------------------------------------------
. di .078 - .0012*10
.066
. di .078 + .0012*10
.09

The interaction term is marginally statistically significant (p-value =
.041). The first set of coefficients is hard to interpret: the educ
coefficient says "the return to education for someone with IQ = 0 is about
19.5%," and the IQ coefficient is the effect of IQ at educ = 0. Neither
value occurs in the sample (IQ ranges from 54 to 131, educ from 8 to 20).
Estimates when centering has been done are much closer to those from the
regression without the interaction. For example, for the average IQ (about
100), the return to education is about 7.8%.
The negative coefficient on the interaction means that the return to
education is higher for people with lower ability (as measured by IQ).
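The centered and uncentered estimates are linked by algebra, not by a separate estimation. Writing $educ \cdot IQ = (educ - a)(IQ - b) + b \cdot educ + a \cdot IQ - ab$, where a and b are the sample means of educ and IQ, shows that the centered model's educ coefficient equals $\hat{\beta}_1 + \hat{\beta}_3 b$, its IQ coefficient equals $\hat{\beta}_2 + \hat{\beta}_3 a$, its intercept equals $\hat{\beta}_0 - \hat{\beta}_3 ab$, and the interaction coefficient is unchanged. The Python sketch below checks this mapping using only numbers copied from the Stata output above; the helper name `centered_coefs` is illustrative, and small differences in the fourth decimal place reflect rounding in the reported figures.

```python
# Coefficients copied from the uncentered regression: reg lwage educ IQ educIQ
b0, b_educ, b_iq, b_int = -0.7622374, 0.1950672, 0.0221114, -0.0011721

# Sample means copied from: sum educ IQ
mean_educ, mean_iq = 13.22925, 100.0646


def centered_coefs(b0, b1, b2, b3, m1, m2):
    """Map the coefficients of y = b0 + b1*x1 + b2*x2 + b3*x1*x2 into those of
    the equivalent model that uses (x1 - m1)*(x2 - m2) as the interaction."""
    return (b0 - b3 * m1 * m2,  # new intercept
            b1 + b3 * m2,       # partial effect of x1 evaluated at x2 = m2
            b2 + b3 * m1,       # partial effect of x2 evaluated at x1 = m1
            b3)                 # interaction coefficient is unchanged


c0, c_educ, c_iq, c_int = centered_coefs(b0, b_educ, b_iq, b_int, mean_educ, mean_iq)
print(f"intercept {c0:.4f}, educ {c_educ:.4f}, IQ {c_iq:.4f}, educ0IQ0 {c_int:.7f}")
```

The printed values agree with the educ0IQ0 regression (intercept .7893, educ .0778, IQ .0066) to about four decimal places.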

The estimated partial effect of educ is

$$\frac{\Delta \widehat{lwage}}{\Delta educ} = .078 - .0012\,(IQ - 100.06)$$

So, if IQ is, say, 10 points above its mean value, the return to education is
$.078 - .012 = .066$, or 6.6%.
For someone with IQ 10 points below the mean, the return to education
is $.078 + .012 = .090$, or 9.0%.
If we can believe these estimates, schooling is worth more for those with
lower intelligence.
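The partial effect above can be written as a small function of IQ. This sketch uses the unrounded estimates from the centered regression (educ coefficient .0777905, interaction coefficient -.0011721); the function name `return_to_educ` is illustrative. With unrounded coefficients the returns at 10 points above and below the mean are about 6.6% and 8.95% (9.0% after rounding), matching the hand calculation.

```python
# Unrounded estimates from the centered regression: reg lwage educ IQ educ0IQ0
B_EDUC = 0.0777905       # return to education at the sample-mean IQ
B_INTERACT = -0.0011721  # coefficient on (educ - 13.23)*(IQ - 100.06)
MEAN_IQ = 100.06


def return_to_educ(iq):
    """Estimated effect of one more year of educ on lwage at a given IQ."""
    return B_EDUC + B_INTERACT * (iq - MEAN_IQ)


for iq in (MEAN_IQ - 10, MEAN_IQ, MEAN_IQ + 10):
    print(f"IQ = {iq:6.2f}: return to education = {100 * return_to_educ(iq):.1f}%")
```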

3. More on Goodness-of-Fit and Selection of Regressors
We now discuss topics related to choosing which regressors to include in
models. First we discuss the final statistic in the Stata output, the
so-called adjusted R-squared.
Adjusted R-Squared
The motivation for computing a different goodness-of-fit measure is that
the usual $R^2$ can never go down, and usually increases, when one or more
variables are added to a regression. (This assumes we do not lose any
observations when adding additional variables.)

We know how to decide whether to include a single new explanatory
variable: the t test.
For a group of new variables, we use the F test. This depends on the
R-squareds from the restricted and unrestricted models.
Sometimes we want to compare models that have different numbers of
explanatory variables where neither model is a special case of the other,
so no t or F test applies. It is useful to have a goodness-of-fit measure
that penalizes adding explanatory variables. (The usual $R^2$ has no penalty.)

As usual, start with

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u$$

Now we need to be more careful with variance labels:

$$\sigma_y^2 = Var(y), \qquad \sigma_u^2 = Var(u)$$

Define

$$\rho^2 = 1 - \frac{\sigma_u^2}{\sigma_y^2}$$

This is the population R-squared: the amount of population variation in
y explained by $x_1, \ldots, x_k$.

The formula for the $R^2$ can be written as

$$R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{SSR/n}{SST/n},$$

which shows we can think of $R^2$ as using $SSR/n$ to estimate $\sigma_u^2$ and
$SST/n$ to estimate $\sigma_y^2$. These are consistent but not unbiased estimators.
Instead, use

$$\frac{SSR}{n - k - 1} \qquad \text{and} \qquad \frac{SST}{n - 1}$$

as the unbiased estimators of $\sigma_u^2$ and $\sigma_y^2$, respectively.

Plugging in gives the adjusted R-squared, also called "R-bar-squared":

$$\bar{R}^2 = 1 - \frac{SSR/(n - k - 1)}{SST/(n - 1)} = 1 - \frac{\hat{\sigma}^2}{SST/(n - 1)},$$

where $\hat{\sigma}^2 = SSR/(n - k - 1)$ is the usual estimator of the error variance.
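As a check on the formula, the sketch below recomputes $R^2$ and $\bar{R}^2$ from the sums of squares Stata reports for `reg lwage educ IQ` (Residual SS = 212.99188, Total SS = 262.748644, n = 759, k = 2). The function names are illustrative, not part of any library. The results round to the reported R-squared = .1894 and Adj R-squared = .1872.

```python
def r_squared(ssr, sst):
    """Usual R-squared: 1 - SSR/SST."""
    return 1 - ssr / sst


def adj_r_squared(ssr, sst, n, k):
    """Adjusted R-squared: 1 - [SSR/(n-k-1)] / [SST/(n-1)], k = number of slopes."""
    sigma2_hat = ssr / (n - k - 1)  # unbiased estimator of Var(u)
    return 1 - sigma2_hat / (sst / (n - 1))


# Sums of squares copied from the Stata output for: reg lwage educ IQ
ssr, sst = 212.99188, 262.748644
print(round(r_squared(ssr, sst), 4), round(adj_r_squared(ssr, sst, 759, 2), 4))
```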

When more regressors are added, SSR falls (or at least does not rise), but so
does $df = n - k - 1$. So $\bar{R}^2$ can increase or decrease.
For $k \geq 1$, $\bar{R}^2 < R^2$ unless $SSR = 0$ (not an interesting case). In
addition, it is possible that $\bar{R}^2 < 0$, especially if df is small. Remember
that $R^2 \geq 0$ always.

More algebraic facts:
1. If a single variable is added to a regression, $\bar{R}^2$ increases if and only if
the absolute t statistic of the new variable is greater than one. (Note this
value does not correspond to any commonly used critical value.)
2. If two or more variables are added to a regression, $\bar{R}^2$ increases if and
only if the F statistic for joint significance of the new variables is greater
than one. (Again, the value one is a curiosity; it does not correspond to
commonly used critical values for F tests.)

It is sometimes useful to write

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$

Important: In the R-squared form of the F statistic that we covered, it
is the usual R-squared, not the adjusted R-squared, that appears.
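The sketch below applies the $R^2$ version of $\bar{R}^2$ and the R-squared form of the F statistic to the two math4 regressions later in these notes (lspend and lunch; then adding str and lavgsal; n = 923). The R-squareds are computed from the reported Model and Total sums of squares to avoid rounding error. The last line is a hypothetical small-sample case, not taken from any data, showing that $\bar{R}^2$ can be negative. Function names are illustrative.

```python
def adj_r2_from_r2(r2, n, k):
    """Adjusted R-squared written in terms of the usual R-squared."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_stat_r2_form(r2_ur, r2_r, q, n, k_ur):
    """R-squared form of the F statistic for q exclusion restrictions.
    It uses the usual R-squareds, never the adjusted ones."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k_ur - 1))


# R-squareds from Model SS / Total SS in the two math4 regressions (n = 923)
r2_r = 77177.02 / 358222.7      # reg math4 lspend lunch
r2_ur = 83536.8357 / 358222.7   # reg math4 lspend lunch str lavgsal

adj_r = adj_r2_from_r2(r2_r, 923, 2)
adj_ur = adj_r2_from_r2(r2_ur, 923, 4)
f_stat = f_stat_r2_form(r2_ur, r2_r, q=2, n=923, k_ur=4)
print(round(adj_r, 4), round(adj_ur, 4), round(f_stat, 2))

# Hypothetical case: R^2 = .10 with n = 10 and k = 3 gives a negative R-bar-squared
print(round(adj_r2_from_r2(0.10, 10, 3), 2))
```

The adjusted values reproduce the reported .2137 and .2299, and F is about 10.6, greater than one for the two added variables, consistent with fact 2 above.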

Controlling for Too Many Factors in Regression
We have not emphasized goodness-of-fit because focusing on making $R^2$
or $\bar{R}^2$ as large as possible can lead to silly mistakes.
We need to remember the ceteris paribus interpretation of regression.
Sometimes it does not make sense to hold other factors fixed when
studying the effect of a particular variable.

EXAMPLE: Effects of spending on math pass rates. ([Link])
If we want the effect of total spending per student (the idea being that we
let districts decide how to allocate resources), the estimated effect is
nontrivial. Spending is in logs, so

$$\Delta \widehat{math4} = (11.88/100)\,\%\Delta spend \approx .119\,(\%\Delta spend)$$

So, if spending increases by 10%, holding lunch fixed, $\widehat{math4}$ increases by
about 1.2 percentage points.
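The level-log approximation can be compared with the exact change the model implies, $\hat{\beta}\,\log(1 + \%\Delta spend/100)$. The sketch below uses the unrounded lspend coefficient (11.88045) from the regression of math4 on lspend and lunch; the function names are illustrative. For a 10% increase the approximation gives about 1.19 points and the exact log change about 1.13 points, so the approximation is adequate for changes of this size.

```python
import math

B_LSPEND = 11.88045  # coefficient on lspend in: reg math4 lspend lunch


def approx_change(pct_change_spend):
    """Level-log approximation: change in math4 ~ (b/100) * %change in spend."""
    return (B_LSPEND / 100) * pct_change_spend


def exact_change(pct_change_spend):
    """Exact change implied by the model: b * log(1 + pct/100)."""
    return B_LSPEND * math.log(1 + pct_change_spend / 100)


print(round(approx_change(10), 2), round(exact_change(10), 2))
```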

Now suppose we add the student-teacher ratio, str, and the log of the
average teacher salary, lavgsal.
The coefficient on lspend is actually negative (but not statistically
different from zero). Does spending no longer matter?
The problem is that part of spending goes toward lowering
student-teacher ratios and increasing salaries. Once we hold those fixed,
the role for spending is (evidently) limited. We seem to be finding that
spending other than to lower str or increase avgsal has no effect on
performance. But this does not mean total spending has no effect.

. reg math4 lspend lunch
Source | SS df MS Number of obs = 923
-------------+------------------------------ F( 2, 920) = 126.32
Model | 77177.02 2 38588.51 Prob > F = 0.0000
Residual | 281045.68 920 305.484435 R-squared = 0.2154
-------------+------------------------------ Adj R-squared = 0.2137
Total | 358222.7 922 388.527874 Root MSE = 17.478
------------------------------------------------------------------------------
math4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lspend | 11.88045 3.331708 3.57 0.000 5.341826 18.41908
lunch | -.3506483 .0221548 -15.83 0.000 -.3941281 -.3071686
_cons | -25.5355 27.72014 -0.92 0.357 -79.93755 28.86655
------------------------------------------------------------------------------

. reg math4 lspend lunch str lavgsal
Source | SS df MS Number of obs = 923
-------------+------------------------------ F( 4, 918) = 69.80
Model | 83536.8357 4 20884.2089 Prob > F = 0.0000
Residual | 274685.864 918 299.222074 R-squared = 0.2332
-------------+------------------------------ Adj R-squared = 0.2299
Total | 358222.7 922 388.527874 Root MSE = 17.298
------------------------------------------------------------------------------
math4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lspend | -2.91284 4.62719 -0.63 0.529 -11.99394 6.168258
lunch | -.2808465 .0266621 -10.53 0.000 -.3331722 -.2285207
str | -.9392168 .2323314 -4.04 0.000 -1.395179 -.4832545
lavgsal | 20.5769 4.658001 4.42 0.000 11.43533 29.71847
_cons | -103.825 38.54951 -2.69 0.007 -179.4804 -28.16964
------------------------------------------------------------------------------

It is tempting to overcontrol because the goodness-of-fit statistics often
increase substantially. Continuing with the previous example, suppose we
go back to the model with just lspend and lunch but add the 4th grade
reading score, read4.
The adjusted R-squared goes from .214 to .645, a huge increase. The
lspend coefficient goes from 11.88 to 3.44, and it is not significant at the
10% level (p-value = .127).
But why should we hold read4 fixed when looking at the effects of
spending on math4? We want to allow spending to improve reading pass
rates, too.
read4 can be a dependent variable in its own regression.

. reg math4 lspend lunch read4
Source | SS df MS Number of obs = 923
-------------+------------------------------ F( 3, 919) = 558.47
Model | 231332.167 3 77110.7225 Prob > F = 0.0000
Residual | 126890.532 919 138.074573 R-squared = 0.6458
-------------+------------------------------ Adj R-squared = 0.6446
Total | 358222.7 922 388.527874 Root MSE = 11.751
------------------------------------------------------------------------------
math4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lspend | 3.443386 2.25409 1.53 0.127 -.9803767 7.867148
lunch | -.122326 .0163873 -7.46 0.000 -.1544868 -.0901651
read4 | .7784602 .0232978 33.41 0.000 .7327372 .8241832
_cons | -1.562773 18.65002 -0.08 0.933 -38.16435 35.03881
------------------------------------------------------------------------------

4. Prediction Intervals and Residual Analysis
We now consider two topics that are more specialized, but are used a lot
in practice. For example, someone working in an admissions office might
want to compute an interval for an applicant's performance in college,
given observable characteristics in high school.
Residual analysis is useful for determining which units in a population
are performing better or worse than expected based on their covariates.

Prediction Intervals
There are two kinds of intervals we can be interested in. The first is just
a confidence interval for $E(y \mid x_1, x_2, \ldots, x_k)$ at specific values of the $x_j$.
Suppose those values are $x_1 = c_1, x_2 = c_2, \ldots, x_k = c_k$. Then we want
to estimate, and find a CI for,

$$\theta_0 = \beta_0 + \beta_1 c_1 + \cdots + \beta_k c_k$$

The estimate is just

$$\hat{\theta}_0 = \hat{\beta}_0 + \hat{\beta}_1 c_1 + \cdots + \hat{\beta}_k c_k$$

There are some computational tricks for getting the standard error of $\hat{\theta}_0$,
but Stata's "lincom" command is the simplest way.
Use the data in [Link].

$$colGPA = \beta_0 + \beta_1 hsGPA + \beta_2 ACT + \beta_3 skipped + u$$

Get the 95% CI for the prediction when hsGPA = 3, ACT = 25, and
skipped = 0.

. sum hsGPA ACT skipped
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
hsGPA | 141 3.402128 .3199259 2.4 4
ACT | 141 24.15603 2.844252 16 33
skipped | 141 1.076241 1.088882 0 5

. reg colGPA hsGPA ACT skipped
Source | SS df MS Number of obs = 141
-------------+------------------------------ F( 3, 137) = 13.92
Model | 4.53313314 3 1.51104438 Prob > F = 0.0000
Residual | 14.8729663 137 .108561798 R-squared = 0.2336
-------------+------------------------------ Adj R-squared = 0.2168
Total | 19.4060994 140 .138614996 Root MSE = .32949
------------------------------------------------------------------------------
colGPA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hsGPA | .4118162 .0936742 4.40 0.000 .2265819 .5970505
ACT | .0147202 .0105649 1.39 0.166 -.0061711 .0356115
skipped | -.0831131 .0259985 -3.20 0.002 -.1345234 -.0317028
_cons | 1.389554 .3315535 4.19 0.000 .7339295 2.045178
------------------------------------------------------------------------------
. lincom _cons + hsGPA*3 + ACT*25
( 1) 3 hsGPA + 25 ACT + _cons = 0
------------------------------------------------------------------------------
colGPA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | 2.993008 .060535 49.44 0.000 2.873304 3.112712
------------------------------------------------------------------------------

The prediction is 2.99, and the 95% CI goes from 2.87 to 3.11. This is
fairly tight, but it is a CI only for the average colGPA among students
with hsGPA = 3, ACT = 25, and skipped = 0.
Now suppose skipped = 2:

. lincom _cons + hsGPA*3 + ACT*25 + skipped*2
( 1) 3 hsGPA + 25 ACT + 2 skipped + _cons = 0
------------------------------------------------------------------------------
colGPA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | 2.826782 .0526146 53.73 0.000 2.72274 2.930824
------------------------------------------------------------------------------
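The point predictions from lincom are linear combinations of the coefficients, so they can be reproduced by hand. The sketch below does that and forms the 95% CI from the lincom standard error; the standard error itself requires the estimated variance-covariance matrix of the $\hat{\beta}_j$, which the output above does not show, so it is copied rather than computed. The critical value 1.9774 for 137 df is an assumed approximation, not a number printed by Stata, and the function names are illustrative.

```python
# Coefficients copied from: reg colGPA hsGPA ACT skipped (n = 141, residual df = 137)
COEF = {"_cons": 1.389554, "hsGPA": 0.4118162, "ACT": 0.0147202, "skipped": -0.0831131}

# Assumed approximation to the 97.5th percentile of the t distribution with 137 df
T_CRIT = 1.9774


def predict_mean(hsGPA, ACT, skipped):
    """theta0-hat = b0 + b1*hsGPA + b2*ACT + b3*skipped at the chosen values."""
    return (COEF["_cons"] + COEF["hsGPA"] * hsGPA
            + COEF["ACT"] * ACT + COEF["skipped"] * skipped)


def ci(estimate, se, t_crit=T_CRIT):
    """Two-sided 95% confidence interval: estimate +/- t_crit * se."""
    return estimate - t_crit * se, estimate + t_crit * se


theta0 = predict_mean(hsGPA=3, ACT=25, skipped=0)
lo, hi = ci(theta0, se=0.060535)  # se copied from the lincom output
print(f"theta0-hat = {theta0:.4f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Calling `predict_mean(3, 25, 2)` reproduces the second lincom estimate, about 2.8268, and `ci(..., 0.0526146)` gives its interval of roughly 2.72 to 2.93.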

A prediction interval (PI) accounts for the uncertainty in u in addition
to the estimation error in the $\hat{\beta}_j$. That is, we want an interval for the (yet
unseen) outcome

$$y^0 = \beta_0 + \beta_1 x_1^0 + \cdots + \beta_k x_k^0 + u^0$$

at the values $x_j^0$, also recognizing the important term $u^0$.
This is one case where normality of u is critical. If u is not normal, we
do not know how to construct an exact 95% PI.
The prediction is

$$\hat{y}^0 = \hat{\beta}_0 + \hat{\beta}_1 x_1^0 + \cdots + \hat{\beta}_k x_k^0$$

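The prediction error is

$$\hat{e}^0 = y^0 - \hat{y}^0$$

In addition to $Var(\hat{y}^0)$, the variance of the estimated mean at the given values, we
must account for $Var(u^0)$. Because $u^0$ is a new draw, it is uncorrelated with the
$\hat{\beta}_j$ (which depend only on the sample), so

$$Var(\hat{e}^0) = Var(\hat{y}^0) + Var(u^0), \qquad se(\hat{e}^0) = \left\{[se(\hat{y}^0)]^2 + \hat{\sigma}^2\right\}^{1/2},$$

where $se(\hat{y}^0)$ is obtained as before and $\hat{\sigma}^2$ is the usual variance estimator.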

In most cases, except for very small sample sizes, $\hat{\sigma}^2$ is by far the largest
component. Remember that $se(\hat{y}^0)$ shrinks to zero at the rate $1/\sqrt{n}$,
while $\hat{\sigma}^2$ is the estimated variance for a single draw.
Let $t_{.025}$ be the 97.5th percentile in the $t_{n-k-1}$ distribution (so it is the 5%
critical value for a two-tailed test). Then the 95% PI is

$$\hat{y}^0 \pm t_{.025} \cdot se(\hat{e}^0)$$

Except for small df, a good rule of thumb is $\hat{y}^0 \pm 2\,se(\hat{e}^0)$.

1. We obtain the predicted value, $\hat{y}^0$, and its standard error, $se(\hat{y}^0)$, as
before (using lincom in Stata).
2. We compute $se(\hat{e}^0) = \{[se(\hat{y}^0)]^2 + \hat{\sigma}^2\}^{1/2}$.
3. We do the same calculation as for a 95% CI, using $se(\hat{e}^0)$ as the standard
error.
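The three steps can be sketched in Python using the lincom estimates and standard errors and the Root MSE (.32949) from the colGPA regression. Following the Stata session for this example, the sketch uses 1.96 as the critical value; with 137 df the exact t value is slightly larger. It also reports how much of $Var(\hat{e}^0)$ comes from $\hat{\sigma}^2$. Function and variable names are illustrative.

```python
import math

ROOT_MSE = 0.32949  # sigma-hat (Root MSE) from: reg colGPA hsGPA ACT skipped


def prediction_interval(y0_hat, se_y0_hat, sigma_hat=ROOT_MSE, t_crit=1.96):
    """Return se(e0) and the 95% PI y0_hat +/- t_crit * se(e0),
    where se(e0) = sqrt(se(y0_hat)^2 + sigma_hat^2)."""
    se_e0 = math.sqrt(se_y0_hat ** 2 + sigma_hat ** 2)
    return se_e0, (y0_hat - t_crit * se_e0, y0_hat + t_crit * se_e0)


# Predictions and standard errors copied from the two lincom outputs
for label, y0_hat, se in [("skipped = 0", 2.993008, 0.060535),
                          ("skipped = 2", 2.826782, 0.0526146)]:
    se_e0, (lo, hi) = prediction_interval(y0_hat, se)
    share = ROOT_MSE ** 2 / se_e0 ** 2  # fraction of Var(e0) due to sigma-hat^2
    print(f"{label}: se(e0) = {se_e0:.3f}, PI = ({lo:.2f}, {hi:.2f}), "
          f"share from u0 = {share:.0%}")
```

Because it uses unrounded inputs, the lower limits differ by about .01 from the hand calculations in the session (about 2.34 rather than 2.33 when skipped = 0), and in both cases well over 90% of the prediction-error variance comes from $\hat{\sigma}^2$.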

. reg colGPA hsGPA ACT skipped
Source | SS df MS Number of obs = 141
-------------+------------------------------ F( 3, 137) = 13.92
Model | 4.53313314 3 1.51104438 Prob > F = 0.0000
Residual | 14.8729663 137 .108561798 R-squared = 0.2336
-------------+------------------------------ Adj R-squared = 0.2168
Total | 19.4060994 140 .138614996 Root MSE = .32949
. lincom _cons + hsGPA*3 + ACT*25
( 1) 3 hsGPA + 25 ACT + _cons = 0
------------------------------------------------------------------------------
colGPA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | 2.993008 .060535 49.44 0.000 2.873304 3.112712
------------------------------------------------------------------------------

. di sqrt(.061^2 + .330^2)
.33559052
. * The 95% PI is
. di 2.99 - 1.96*.336
2.33144
. di 2.99 + 1.96*.336
3.64856
. * or about 2.33 to 3.65. Because this is for an individual whose
. * unobservables we do not observe, the interval is much wider than for
. * the average (whose unobservables, we assume, average to zero)

. * For skipped = 2:
. lincom _cons + hsGPA*3 + ACT*25 + skipped*2
( 1) 3 hsGPA + 25 ACT + 2 skipped + _cons = 0
------------------------------------------------------------------------------
colGPA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | 2.826782 .0526146 53.73 0.000 2.72274 2.930824
------------------------------------------------------------------------------
. di sqrt(.053^2 + .330^2)
.33422896
. di 2.83 - 1.96*.334
2.17536
. di 2.83 + 1.96*.334
3.48464
. * The interval is shifted to the left, and still very wide: about 2.18
. * to 3.48.

Residual Analysis
Sometimes we want to know how particular units actually fare compared
with their predicted outcome based on the explanatory variables we
observe.
Consider predicting school performance. Given demographic information on
the students and the resources used, is a school doing better or worse than
predicted?
It is known that students from disadvantaged backgrounds do worse, and
there is some evidence that school inputs also affect outcomes.

So do not just look at raw scores; adjust them for observable features.
The residual can be viewed as an estimate of the "value added":

$$\hat{u}_i = y_i - \hat{y}_i, \qquad \hat{u}_i > 0 \Rightarrow y_i > \hat{y}_i, \qquad \hat{u}_i < 0 \Rightarrow y_i < \hat{y}_i$$

Caveats: (1) The residuals have to average to zero (when an intercept is
included), so there will be some positive and some negative residuals no
matter what. (2) The residual depends on the estimates, $\hat{\beta}_j$. (3) Even if we
knew the $\beta_j$, $u_i$ is just a single draw. (4) We may be missing important
predictors of y that are then in the error term.

. reg math4 lunch lenrol lspend
Source | SS df MS Number of obs = 923
-------------+------------------------------ F( 3, 919) = 86.28
Model | 78722.1856 3 26240.7285 Prob > F = 0.0000
Residual | 279500.514 919 304.135489 R-squared = 0.2198
-------------+------------------------------ Adj R-squared = 0.2172
Total | 358222.7 922 388.527874 Root MSE = 17.439
------------------------------------------------------------------------------
math4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lunch | -.3525065 .0221212 -15.94 0.000 -.3959203 -.3090926
lenrol | 2.731779 1.211969 2.25 0.024 .3532304 5.110327
lspend | 13.7296 3.424074 4.01 0.000 7.009683 20.44951
_cons | -57.02743 30.98739 -1.84 0.066 -117.8417 3.786824
------------------------------------------------------------------------------
. predict math4h
(option xb assumed; fitted values)
. predict uh, resid

. list math4 math4h uh in 1/10
+------------------------------+
| math4 math4h uh |
|------------------------------|
1. | 90 55.47598 34.52402 |
2. | 60 65.84306 -5.843058 |
3. | 62 66.37138 -4.371378 |
4. | 72 65.4595 6.540503 |
5. | 79 66.29292 12.70709 |
|------------------------------|
6. | 43 60.59083 -17.59083 |
7. | 82 65.59301 16.40699 |
8. | 43 71.4086 -28.4086 |
9. | 65 62.18219 2.817815 |
10. | 73 63.48935 9.510655 |
+------------------------------+

The predicted values vary much less than the actual outcomes (the fairly
low R-squared, .220, implies this).
Consider schools 3 and 4. Their predicted outcomes (based on lunch,
lenrol, and lspend) are close, about 66.4 and 65.5, respectively. But the actual
outcome for school 4, 72, is 10 points higher than that for school 3, 62.
With only a few explanatory variables, residual analysis can be
misleading. We may be missing important factors that, if controlled for,
could dramatically change the rankings.
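A minimal sketch of the residual calculation for the ten schools listed above, using the listed math4 and math4h values: it reproduces the uh column and ranks these ten schools from most to least above prediction. The ranking is only among the ten listed schools and is subject to all four caveats above; the variable names are illustrative.

```python
# First ten schools, copied from: list math4 math4h uh in 1/10
math4 = [90, 60, 62, 72, 79, 43, 82, 43, 65, 73]
math4h = [55.47598, 65.84306, 66.37138, 65.4595, 66.29292,
          60.59083, 65.59301, 71.4086, 62.18219, 63.48935]

# Residual u_hat_i = y_i - y_hat_i; positive means "better than predicted"
resid = [y - yh for y, yh in zip(math4, math4h)]

# Rank the ten schools (1-based ids) from most over- to most under-performing
ranking = sorted(range(1, 11), key=lambda i: resid[i - 1], reverse=True)
print("ranking by residual:", ranking)
print("school 4 minus school 3: actual", math4[3] - math4[2],
      "predicted", round(math4h[3] - math4h[2], 2))
```

School 1 comes first (residual about 34.5) and school 8 last (about -28.4); schools 3 and 4 differ by 10 points in actual scores but by less than one point in predictions.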

