
Regression Output Analysis

The linear regression analysis examines the relationship between various independent variables and the dependent variable pr1, revealing that pr23 (bitcoin's closing price), cpi (inflation index), and ffr (US federal funds rate) are significant predictors. Initial model diagnostics indicated issues with model specification, prompting further refinement by removing insignificant variables and adding new ones, which improved model fit. Ultimately, the final model demonstrated a high R-squared value, indicating it explains a substantial portion of the variation in pr1.

Uploaded by

hsaraabbas04
Copyright
© All Rights Reserved

Linear Regression Output Analysis

. tsset date
time variable: date, 1/1/2021 to 7/31/2025
delta: 1 day

. reg pr1 pr1vol pr17 pr23 pr35 cpi ffr

Source SS df MS Number of obs = 1,127


F(6, 1120) = 1557.94
Model 500630527 6 83438421.2 Prob > F = 0.0000
Residual 59983716.7 1,120 53556.8899 R-squared = 0.8930
Adj R-squared = 0.8924
Total 560614244 1,126 497881.211 Root MSE = 231.42

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1vol 2.05e-06 2.63e-06 0.78 0.436 -3.11e-06 7.21e-06


pr17 -.0751559 .0521459 -1.44 0.150 -.1774705 .0271588
pr23 .0208619 .0006301 33.11 0.000 .0196256 .0220981
pr35 -2.172655 1.093585 -1.99 0.047 -4.318361 -.0269491
cpi 19.53202 2.207843 8.85 0.000 15.20005 23.864
ffr -27.13639 13.03451 -2.08 0.038 -52.7112 -1.56158
_cons -1774.277 475.9955 -3.73 0.000 -2708.22 -840.3338

Initial results of the regression model, before checking assumptions:

The number of observations common to all selected variables is 1,127.
The F-statistic (1557.94) is far above 10, meaning the independent variables are jointly strong predictors of pr1.
Prob > F = 0.0000, meaning the model as a whole is statistically significant.
R-squared is 0.8930, meaning the model explains about 89% of the variation in the dependent variable pr1. Adjusted R-squared (0.8924), which penalizes for the number of independent variables, is almost identical.
The coefficients show how each independent variable is associated with the dependent variable. Because the reported p-values are two-sided, the matching 5% critical value for the absolute t-value is about 1.96.
Pr1vol: “A 1-unit increase in pr1vol increases pr1 by 0.00000205 units, assuming all other variables are held constant.” The t-value (0.78) is less than 1.96 and the p-value (0.436) is greater than 0.05, indicating that pr1vol (trading volume of S&P500) is an insignificant variable in the model.
Pr17: “A 1-unit increase in pr17 decreases pr1 by 0.075 units, assuming all other variables are held constant.” The absolute t-value (1.44) is less than 1.96 and the p-value (0.150) is greater than 0.05, indicating that pr17 (gold’s closing price) is not a significant variable in this model.
Pr23: “A 1-unit increase in pr23 increases pr1 by 0.02 units, assuming all other variables are held constant.” The t-value (33.11) is far higher than 1.96 and the p-value is 0.000, indicating that pr23 (bitcoin’s closing price) is a very significant variable in this model.
Pr35: “A 1-unit increase in pr35 decreases pr1 by 2.17 units, assuming all other variables are held constant.” The absolute t-value (1.99) is just above 1.96 and the p-value (0.047) is just below 0.05, indicating that pr35 (crude oil’s closing price) is significant at the 5% level, though only marginally.
Cpi: “A 1-unit increase in cpi increases pr1 by 19.5 units, assuming all other variables are held constant.” The t-value (8.85) is greater than 1.96 and the p-value is 0.000, indicating that the inflation index is a significant factor in this model.
Ffr: “A 1-unit increase in ffr decreases pr1 by 27.1 units, assuming all other variables are held constant.” The absolute t-value (2.08) is greater than 1.96 and the p-value (0.038) is below 0.05, indicating that the US federal funds rate plays a significant role in this regression model.
_cons: This is the intercept of the model (-1774.277), the predicted value of pr1 when all independent variables are zero.
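
As a quick arithmetic check on the output above, the F-statistic, R-squared, adjusted R-squared and t-values can be recomputed from the sums of squares, coefficients and standard errors that Stata reports. The sketch below is plain Python, not part of the original Stata session; the variable names are illustrative, and 1.96 is assumed as the approximate two-sided 5% critical value for a sample of this size.

```python
# Values copied from the first regression table
# (reg pr1 pr1vol pr17 pr23 pr35 cpi ffr).
ss_model, df_model = 500630527.0, 6
ss_resid, df_resid = 59983716.7, 1120
n_obs = 1127

# F = mean square of the model / mean square of the residuals.
f_stat = (ss_model / df_model) / (ss_resid / df_resid)
# R-squared = explained sum of squares / total sum of squares.
r_squared = ss_model / (ss_model + ss_resid)
# Adjusted R-squared penalizes for the number of regressors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# (coefficient, standard error) pairs from the same table.
estimates = {
    "pr1vol": (2.05e-06, 2.63e-06),
    "pr17": (-0.0751559, 0.0521459),
    "pr23": (0.0208619, 0.0006301),
    "pr35": (-2.172655, 1.093585),
    "cpi": (19.53202, 2.207843),
    "ffr": (-27.13639, 13.03451),
}
CRITICAL_T = 1.96  # assumption: approximate two-sided 5% critical value

# Each t-value is simply coefficient / standard error.
t_values = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) > CRITICAL_T for name, t in t_values.items()}

print(f"F = {f_stat:.2f}, R2 = {r_squared:.4f}, adj R2 = {adj_r_squared:.4f}")
for name, t in t_values.items():
    print(f"{name:>7}: t = {t:6.2f}  significant at 5%: {significant[name]}")
```

The recomputed values agree with the table, and the significance flags match the reported two-sided p-values: pr23, pr35, cpi and ffr fall below 0.05, while pr1vol and pr17 do not.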

Assumptions
1. Linearity/ Model Specification
Ho: There are no omitted variables in the regression model, the model is well-specified.
H1: There are some omitted variables in the regression model, the model is
misspecified.
Tests
1) Ramsey RESET test
. est store OLS

. estat ovtest

Ramsey RESET test using powers of the fitted values of pr1


Ho: model has no omitted variables
F(3, 1117) = 110.63
Prob > F = 0.0000

Interpretation: P-value
Prob > F = 0.0000, so we reject the null hypothesis in favor of the alternative: the model has omitted variables and needs to be improved, most likely because it is missing powers, interactions, or the right functional form. Any conclusions drawn from this model would be biased and distorted, and the resulting inferences would be inaccurate.

2) Link test
Ho: There are no omitted variables in the regression model, the model is well-specified.
H1: There are some omitted variables in the regression model, the model is
misspecified.
. linktest

Source SS df MS Number of obs = 1,127


F(2, 1124) = 5310.81
Model 506966150 2 253483075 Prob > F = 0.0000
Residual 53648093.6 1,124 47729.6206 R-squared = 0.9043
Adj R-squared = 0.9041
Total 560614244 1,126 497881.211 Root MSE = 218.47

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

_hat 2.903715 .1655229 17.54 0.000 2.578946 3.228483


_hatsq -.0001907 .0000166 -11.52 0.000 -.0002232 -.0001582
_cons -4645.852 405.8539 -11.45 0.000 -5442.169 -3849.536

Interpretation: _hatsq
The _hatsq term has an absolute t-value of 11.52, well above the two-sided 5% critical value of about 1.96, and a p-value of 0.000. These results also indicate that we must reject the null hypothesis in favor of the alternative: the model is misspecified, and inferences based on it would be distorted and inaccurate.
3) Residuals plot: residuals are the error terms left unexplained by the model
Ho: There are no omitted variables in the regression model, the model is well-specified.
H1: There are some omitted variables in the regression model, the model is
misspecified.
[Figure: residuals versus fitted values; y-axis: Residuals (-1000 to 500), x-axis: Fitted values (4000 to 7000)]

The residuals are roughly symmetric around zero in the first section of the plot, but in the second and third sections they are not. Because there is visible structure/curvature, we reject the null hypothesis in favor of the alternative: the model is misspecified.
According to all three checks, the model is not well specified: there is evidence of non-linearity, and any conclusions drawn from this model would be distorted and biased, with inaccurate inferences.

Remedies
1) Adding/removing variables
. reg pr1 pr17 pr23 pr35 cpi ffr sent

Source SS df MS Number of obs = 1,107


F(6, 1100) = 2015.81
Model 478829494 6 79804915.7 Prob > F = 0.0000
Residual 43548415.1 1,100 39589.4683 R-squared = 0.9166
Adj R-squared = 0.9162
Total 522377909 1,106 472312.757 Root MSE = 198.97

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr17 .215239 .0482553 4.46 0.000 .1205562 .3099218


pr23 .0113691 .0007179 15.84 0.000 .0099605 .0127778
pr35 -2.995421 .9494848 -3.15 0.002 -4.858427 -1.132415
cpi 41.43276 2.234994 18.54 0.000 37.04742 45.81809
ffr -169.9609 13.41389 -12.67 0.000 -196.2806 -143.6412
sent 23.20333 1.171719 19.80 0.000 20.90427 25.50238
_cons -9500.752 571.6217 -16.62 0.000 -10622.34 -8379.16

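By removing the insignificant variable “pr1vol” (traded volume) and adding “sent” (consumer sentiment index), the F-statistic increased from 1557.94 to 2015.81 and R-squared rose from 0.8930 to 0.9166, indicating an improved fit. Every variable in the model is now statistically significant, including pr17, whose coefficient changed sign from negative to positive. Note that the sample fell from 1,127 to 1,107 observations, so the two fits are not based on exactly the same data.
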
2) Including Polynomial or interaction terms
. gen pr17sq= pr17^2
(493 missing values generated)

. reg pr1 pr1vol pr17sq pr23 pr35 cpi ffr

Source SS df MS Number of obs = 1,127


F(6, 1120) = 1609.01
Model 502336423 6 83722737.2 Prob > F = 0.0000
Residual 58277820.3 1,120 52033.7681 R-squared = 0.8960
Adj R-squared = 0.8955
Total 560614244 1,126 497881.211 Root MSE = 228.11

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1vol 1.98e-06 2.59e-06 0.76 0.445 -3.11e-06 7.07e-06


pr17sq -.000058 9.82e-06 -5.91 0.000 -.0000773 -.0000388
pr23 .0218648 .0005854 37.35 0.000 .0207163 .0230134
pr35 -5.097492 1.109232 -4.60 0.000 -7.273899 -2.921086
cpi 26.57953 2.224097 11.95 0.000 22.21566 30.94339
ffr -62.55884 13.44199 -4.65 0.000 -88.93316 -36.18451
_cons -3477.363 516.3865 -6.73 0.000 -4490.556 -2464.169

Replacing pr17 with its square raised the F-statistic only slightly, from 1557.94 to 1609.01, and all variables except pr1vol are now significant in the model.
. reg pr1 pr17sq pr23 pr35 cpi ffr

Source SS df MS Number of obs = 1,127


F(5, 1121) = 1931.41
Model 502306089 5 100461218 Prob > F = 0.0000
Residual 58308154.7 1,121 52014.411 R-squared = 0.8960
Adj R-squared = 0.8955
Total 560614244 1,126 497881.211 Root MSE = 228.07

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr17sq -.0000581 9.82e-06 -5.92 0.000 -.0000773 -.0000388


pr23 .0218682 .0005852 37.37 0.000 .0207199 .0230165
pr35 -5.13373 1.10801 -4.63 0.000 -7.307736 -2.959723
cpi 26.61905 2.223081 11.97 0.000 22.25718 30.98092
ffr -62.91643 13.43133 -4.68 0.000 -89.26981 -36.56304
_cons -3474.144 516.2733 -6.73 0.000 -4487.115 -2461.173

Removing pr1vol raised the F-statistic further, to 1931.41, while R-squared stayed at 0.8960, and all remaining variables are significant in the model.

. reg pr1 pr1vol pr17 pr23 pr35 cpi ffr c.pr1#c.cpi

Source SS df MS Number of obs = 1,127


F(7, 1119) > 99999.00
Model 560253854 7 80036264.9 Prob > F = 0.0000
Residual 360389.542 1,119 322.063934 R-squared = 0.9994
Adj R-squared = 0.9994
Total 560614244 1,126 497881.211 Root MSE = 17.946

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1vol 7.45e-08 2.04e-07 0.36 0.715 -3.26e-07 4.75e-07


pr17 -.0624392 .0040439 -15.44 0.000 -.0703736 -.0545049
pr23 -.0004819 .0000696 -6.92 0.000 -.0006185 -.0003453
pr35 .1119708 .08497 1.32 0.188 -.0547477 .2786892
cpi -13.75049 .1878742 -73.19 0.000 -14.11911 -13.38186
ffr -4.024639 1.01221 -3.98 0.000 -6.010681 -2.038596

c.pr1#c.cpi .0033376 7.76e-06 430.27 0.000 .0033223 .0033528

_cons 4253.434 39.48096 107.73 0.000 4175.969 4330.899

The interaction term c.pr1#c.cpi multiplies continuous pr1 by continuous cpi. With this term the F-statistic exceeds 99,999, R-squared rises to 0.9994, and the t-value of the interaction term is very high (430.27). However, pr1 is also the dependent variable, so this term puts the outcome on the right-hand side of its own regression. The near-perfect fit is therefore largely mechanical and should not be read as a genuine improvement; an interaction between two independent variables would be a valid specification.

This is an interaction term between two continuous variables:

• c.pr1 → continuous version of pr1
• c.cpi → continuous version of cpi

The c. prefix tells Stata: “These are continuous variables, so do not create categorical (dummy) interactions.”

In general, for an outcome Y with regressors pr1 and cpi and their interaction, the regression estimates:

𝑌 = 𝛽0 + 𝛽1 pr1 + 𝛽2 cpi + 𝛽3 (pr1 ⋅ cpi) + 𝜖

Meaning:

The effect of pr1 on Y depends on the level of cpi, and likewise the effect of cpi on Y depends on the level of pr1.

So each variable’s marginal impact changes with the other.
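
To make the last point concrete, the marginal effects can be written out by differentiating the generic interaction equation above with respect to each regressor. This is a standard derivation added for illustration; it uses the same symbols as the equation above.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1\,\mathrm{pr1} + \beta_2\,\mathrm{cpi}
     + \beta_3\,(\mathrm{pr1}\cdot\mathrm{cpi}) + \epsilon \\
\frac{\partial Y}{\partial\,\mathrm{pr1}} &= \beta_1 + \beta_3\,\mathrm{cpi} \\
\frac{\partial Y}{\partial\,\mathrm{cpi}} &= \beta_2 + \beta_3\,\mathrm{pr1}
\end{align*}
```

When 𝛽3 is zero, both marginal effects collapse to constants and the model reduces to the ordinary additive regression.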

. gen log_pr35 = log( pr35)
(471 missing values generated)

. reg pr1 pr1vol pr17 pr23 log_pr35 cpi ffr

Source SS df MS Number of obs = 1,127


F(6, 1120) = 1551.86
Model 500420721 6 83403453.4 Prob > F = 0.0000
Residual 60193523 1,120 53744.217 R-squared = 0.8926
Adj R-squared = 0.8921
Total 560614244 1,126 497881.211 Root MSE = 231.83

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1vol 2.29e-06 2.63e-06 0.87 0.384 -2.88e-06 7.46e-06


pr17 -.0134535 .054863 -0.25 0.806 -.1210993 .0941924
pr23 .0212055 .0006239 33.99 0.000 .0199815 .0224296
log_pr35 15.6122 90.8464 0.17 0.864 -162.6361 193.8605
cpi 15.60404 2.291464 6.81 0.000 11.10799 20.10008
ffr -4.953489 13.04125 -0.38 0.704 -30.54153 20.63455
_cons -1055.569 323.8789 -3.26 0.001 -1691.047 -420.0916

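Replacing pr35 with its log slightly worsened the model: the F-statistic fell from 1557.94 to 1551.86, R-squared fell from 0.8930 to 0.8926, and log_pr35 itself is not significant (p = 0.864).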

. probit pr1 pr1vol pr17 pr23 log_pr35 cpi ffr

outcome does not vary; remember:


0 = negative outcome,
all other nonmissing values = positive outcome
r(2000);

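Probit is a non-linear model for a binary (0/1) outcome; it is related to, but not the same as, logistic regression. pr1 is a continuous price that is never zero, so Stata treats every observation as a positive outcome and stops with “outcome does not vary” (r(2000)). Probit therefore cannot be used with pr1 as the dependent variable.

2. Independence of Errors
Ho: There is no serial correlation/autocorrelation present in the data. Past values do not predict present values.
H1: There is serial correlation/autocorrelation present in the data. Present values tend to depend on past values.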

Tests
1) Durbin-Watson – AR for only 1 lag period
. estat dwatson

Number of gaps in sample: 245

Durbin-Watson d-statistic( 7, 1127) = .0404429

The d-statistic is 0.04. Values between 0 and 2 indicate positive autocorrelation and a value near 2 indicates none, so a value this close to zero indicates very strong positive autocorrelation in the residuals.
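
The Durbin-Watson statistic is approximately related to the first-order autocorrelation of the residuals by d ≈ 2(1 − ρ). The short Python sketch below (an illustration, not part of the Stata session) inverts that approximation for the d-statistic reported above; the function name is hypothetical.

```python
def implied_rho(d_stat: float) -> float:
    """Approximate first-order residual autocorrelation from d ~= 2 * (1 - rho)."""
    return 1.0 - d_stat / 2.0


d_original = 0.0404429  # d-statistic reported by estat dwatson above
rho_hat = implied_rho(d_original)

# A value close to +1 means strong positive autocorrelation.
print(f"implied rho for d = {d_original}: {rho_hat:.4f}")
# d = 2 corresponds to no first-order autocorrelation.
print(f"implied rho for d = 2: {implied_rho(2.0):.1f}")
```

An implied ρ of about 0.98 indicates very strong positive autocorrelation in the residuals, which is in line with the rho values near 0.98 that the Prais-Winsten iterations report later in this document.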

2) Breusch-Godfrey – can specify more than 1 lag period


. estat bgodfrey, lags(1)

Number of gaps in sample: 245

Breusch-Godfrey LM test for autocorrelation

lags(p) chi2 df Prob > chi2

1 845.356 1 0.0000

H0: no serial correlation

Prob > chi2 = 0.0000. We reject the null hypothesis in favor of the alternative: there is autocorrelation in the residuals, and previous values influence present values.
. estat bgodfrey, lags(1/4)

Number of gaps in sample: 245

Breusch-Godfrey LM test for autocorrelation

lags(p) chi2 df Prob > chi2

1 845.356 1 0.0000
2 845.576 2 0.0000
3 949.908 3 0.0000
4 974.490 4 0.0000

H0: no serial correlation

This shows lag orders 1 to 4, and the p-value is 0.0000 at every one, confirming that autocorrelation exists in the residuals. We reject the null hypothesis in favor of the alternative: previous periods’ values affect the current period’s values.

Remedies
1) Adding a lag of the dependent variable
. reg pr1 L.pr1 pr1vol pr17 pr23 pr35 cpi ffr, vce(robust)

Linear regression Number of obs = 881


F(7, 873) = 24467.51
Prob > F = 0.0000
R-squared = 0.9946
Root MSE = 51.793

Robust
pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1
L1. .971071 .0102461 94.78 0.000 .9509612 .9911808

pr1vol -1.51e-07 6.87e-07 -0.22 0.826 -1.50e-06 1.20e-06


pr17 -.0103441 .0138052 -0.75 0.454 -.0374394 .0167512
pr23 .000759 .0001968 3.86 0.000 .0003727 .0011453
pr35 -.1831498 .3531524 -0.52 0.604 -.8762767 .5099771
cpi .4913743 .7229101 0.68 0.497 -.9274707 1.910219
ffr .5553355 4.050776 0.14 0.891 -7.395061 8.505732
_cons -10.96678 144.2849 -0.08 0.939 -294.1526 272.219

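Adding a one-period lag of pr1 greatly improved the fit: the F-statistic rose to 24,467.51 and R-squared rose to 0.9946. The t-value (94.78) and p-value (0.000) of L.pr1 show that the previous value of pr1 strongly influences the current value of pr1. Apart from pr23, the other regressors are no longer significant, and the sample falls to 881 observations because a lag is missing wherever the previous date is absent from the data.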
. reg pr1 L(1/2).pr1 pr1vol pr17 pr23 pr35 cpi ffr, vce(robust)

Linear regression Number of obs = 644


F(8, 635) = 13469.24
Prob > F = 0.0000
R-squared = 0.9939
Root MSE = 54.808

Robust
pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1
L1. .9299759 .0870672 10.68 0.000 .7590015 1.10095
L2. .0388219 .0842142 0.46 0.645 -.1265502 .204194

pr1vol -4.39e-07 8.35e-07 -0.53 0.600 -2.08e-06 1.20e-06


pr17 -.0148414 .0166373 -0.89 0.373 -.0475122 .0178294
pr23 .0008265 .000238 3.47 0.001 .0003591 .001294
pr35 -.2566515 .4388852 -0.58 0.559 -1.118493 .6051904
cpi .6096767 .8995358 0.68 0.498 -1.156748 2.376101
ffr .3945114 5.033191 0.08 0.938 -9.489201 10.27822
_cons -21.07642 179.8085 -0.12 0.907 -374.1676 332.0147

Adding a second lag of pr1 did not improve the model: the F-statistic (13,469.24) and R-squared (0.9939) are both lower than with one lag, and the sample falls to 644 observations. The t-values and p-values show that only the one-period lag is significant (p = 0.000) and the second lag is not (p = 0.645), meaning that only the previous period’s value influences pr1, not the period before that.
. reg pr1 L.pr1 L(0/2).pr1vol L(0/1).pr35 pr17 pr23 cpi L(0/2).ffr, robust

Linear regression Number of obs = 881


F(12, 868) = 15156.21
Prob > F = 0.0000
R-squared = 0.9948
Root MSE = 51.097

Robust
pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1
L1. .9688955 .0099521 97.36 0.000 .9493625 .9884286

pr1vol
--. 2.06e-08 6.69e-07 0.03 0.975 -1.29e-06 1.33e-06
L1. 1.18e-07 6.81e-07 0.17 0.862 -1.22e-06 1.46e-06
L2. -8.38e-07 7.11e-07 -1.18 0.239 -2.23e-06 5.58e-07

pr35
--. 3.716762 1.500399 2.48 0.013 .7719268 6.661597
L1. -4.044794 1.569698 -2.58 0.010 -7.125642 -.9639455

pr17 -.0149864 .0141174 -1.06 0.289 -.0426946 .0127218


pr23 .000756 .0001933 3.91 0.000 .0003767 .0011354
cpi .8377892 .7257497 1.15 0.249 -.5866403 2.262219

ffr
--. -88.81447 36.88618 -2.41 0.016 -161.211 -16.41793
L1. 91.45081 42.86266 2.13 0.033 7.324231 175.5774
L2. -3.697402 23.88526 -0.15 0.877 -50.57702 43.18222

_cons -74.62391 142.8536 -0.52 0.602 -355.0027 205.7549

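This specification adds multiple lags of several variables: two lags of pr1vol and ffr and one lag of pr35, alongside L.pr1. The fit barely changes (R-squared 0.9948 versus 0.9946 with L.pr1 alone). The current and lagged values of pr35 and of ffr are significant but enter with opposite signs of similar size, while the pr1vol terms and the second lag of ffr are not significant.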

2) Newey-West or Cluster-robust standard errors


. newey pr1 L.pr1 pr1vol pr17 pr23 pr35 cpi ffr, lag(4)
date is not regularly spaced
r(198);

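This did not work: newey requires a regularly spaced time variable, and the daily date variable has gaps (the tests above report 245 gaps in the sample).
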
3) Prais-Winsten
. prais pr1 pr1vol pr17 pr23 pr35 cpi ffr

Number of gaps in sample: 245


(note: computations for rho restarted at each gap)

Iteration 0: rho = 0.0000


Iteration 1: rho = 0.9781
Iteration 2: rho = 0.9820
Iteration 3: rho = 0.9823
Iteration 4: rho = 0.9823
Iteration 5: rho = 0.9823
Iteration 6: rho = 0.9823

Prais-Winsten AR(1) regression -- iterated estimates

Source SS df MS Number of obs = 1,127


F(6, 1120) = 9082.13
Model 125994211 6 20999035.1 Prob > F = 0.0000
Residual 2589581.15 1,120 2312.12603 R-squared = 0.9799
Adj R-squared = 0.9798
Total 128583792 1,126 114195.197 Root MSE = 48.085

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1vol -7.78e-08 4.40e-07 -0.18 0.860 -9.40e-07 7.85e-07


pr17 .2694597 .0522901 5.15 0.000 .1668621 .3720572
pr23 .0155405 .0007355 21.13 0.000 .0140975 .0169836
pr35 .9720767 .7835428 1.24 0.215 -.5653004 2.509454
cpi 15.60294 2.255966 6.92 0.000 11.17654 20.02933
ffr -23.6609 14.83973 -1.59 0.111 -52.77771 5.455905
_cons -1307.148 572.6658 -2.28 0.023 -2430.766 -183.5291

rho .9822682

Durbin-Watson statistic (original) 0.040443


Durbin-Watson statistic (transformed) 1.614785

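The model improved according to the F-statistic and R-squared, and the Durbin-Watson statistic rose from 0.04 to 1.61. A d-statistic near 2 means no first-order autocorrelation, so this increase shows that the transformation removed most of the positive autocorrelation, which is what the remedy is meant to do. Note that the reported R-squared refers to the transformed data (the total sum of squares differs from the OLS table), so it is not directly comparable with the OLS R-squared.
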
4) Cochrane-Orcutt
. prais pr1 pr1vol pr17 pr23 pr35 cpi ffr,corc

Number of gaps in sample: 245


(note: computations for rho restarted at each gap)

Iteration 0: rho = 0.0000


Iteration 1: rho = 0.9781
Iteration 2: rho = 0.9889
Iteration 3: rho = 0.9936
Iteration 4: rho = 0.9955
Iteration 5: rho = 0.9962
Iteration 6: rho = 0.9965
Iteration 7: rho = 0.9966
Iteration 8: rho = 0.9967
Iteration 9: rho = 0.9967
Iteration 10: rho = 0.9968
Iteration 11: rho = 0.9968
Iteration 12: rho = 0.9968
Iteration 13: rho = 0.9968
Iteration 14: rho = 0.9968
Iteration 15: rho = 0.9968
Iteration 16: rho = 0.9968
Iteration 17: rho = 0.9968
Iteration 18: rho = 0.9968

Cochrane-Orcutt AR(1) regression -- iterated estimates

Source SS df MS Number of obs = 881


F(6, 874) = 26.90
Model 372487.939 6 62081.3232 Prob > F = 0.0000
Residual 2017013.3 874 2307.79554 R-squared = 0.1559
Adj R-squared = 0.1501
Total 2389501.24 880 2715.34232 Root MSE = 48.04

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1vol -2.01e-08 4.38e-07 -0.05 0.963 -8.79e-07 8.39e-07


pr17 .131606 .0741673 1.77 0.076 -.0139608 .2771728
pr23 .0112919 .000989 11.42 0.000 .0093507 .013233
pr35 2.998696 .9066227 3.31 0.001 1.219284 4.778108
cpi -3.034908 6.314241 -0.48 0.631 -15.42776 9.35794
ffr -76.67128 25.95728 -2.95 0.003 -127.6172 -25.72539
_cons 5417.637 2019.016 2.68 0.007 1454.951 9380.323

rho .9968243

Durbin-Watson statistic (original) 0.040443


Durbin-Watson statistic (transformed) 1.584615

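The fit statistics look much worse (R-squared = 0.1559), but they refer to the transformed data, where rho = 0.9968 removes almost all of the level of pr1, so they are not comparable with the OLS values. The transformed d-statistic (1.58) is slightly lower than under Prais-Winsten (1.61). The sample drops from 1,127 to 881 observations, probably because Cochrane-Orcutt drops the first observation of each gap-free run.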
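
The difference between the two AR(1) remedies can be sketched in a few lines. Both quasi-difference each variable using rho; Prais-Winsten keeps the first observation of a run after rescaling it, while Cochrane-Orcutt drops it. The Python sketch below is a simplified illustration of that transformation, not Stata's implementation; the function name and the four toy prices are hypothetical, and the observation count at the end assumes that each gap starts a new run which loses its first observation.

```python
import math


def quasi_difference(y, rho, keep_first=True):
    """AR(1) quasi-difference of one gap-free run of observations.

    keep_first=True  -> Prais-Winsten style (first value scaled by sqrt(1 - rho^2))
    keep_first=False -> Cochrane-Orcutt style (first value dropped)
    """
    out = []
    if keep_first and y:
        out.append(math.sqrt(1.0 - rho ** 2) * y[0])
    for prev, cur in zip(y, y[1:]):
        out.append(cur - rho * prev)
    return out


rho = 0.9823                           # rho from the Prais-Winsten output
y = [4000.0, 4010.0, 4025.0, 4018.0]   # hypothetical toy prices

pw = quasi_difference(y, rho, keep_first=True)
co = quasi_difference(y, rho, keep_first=False)
print("Prais-Winsten  :", len(pw), "observations kept")
print("Cochrane-Orcutt:", len(co), "observations kept")

# Observation count under the stated assumption: 245 gaps -> 246 runs.
n_obs, n_gaps = 1127, 245
n_runs = n_gaps + 1
n_after_dropping_first = n_obs - n_runs
print("Sample after dropping one observation per run:", n_after_dropping_first)
```

Under that assumption the count matches the 881 observations in the Cochrane-Orcutt table, and it is also the sample size of the regressions that include L.pr1.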

5) ARIMA
ARIMA regression

Sample: 1/4/2021 - 6/30/2025, but with gaps Number of obs = 1127


Wald chi2(7) = 207146.34
Log likelihood = -6004.74 Prob > chi2 = 0.0000

OPG
pr1 Coef. Std. Err. z P>|z| [95% Conf. Interval]

pr1
pr1vol -2.89e-08 3.86e-07 -0.07 0.940 -7.85e-07 7.27e-07
pr17 .0978586 .0431353 2.27 0.023 .013315 .1824021
pr23 .0103066 .000669 15.41 0.000 .0089954 .0116177
pr35 3.105158 .6061175 5.12 0.000 1.91719 4.293126
cpi 2.020064 4.811579 0.42 0.675 -7.410458 11.45059
ffr -69.31445 18.88191 -3.67 0.000 -106.3223 -32.30659
_cons 3344.671 1421.074 2.35 0.019 559.4169 6129.925

ARMA
ar
L1. .9974239 .0022842 436.67 0.000 .992947 1.001901

/sigma 44.06809 .5749338 76.65 0.000 42.94124 45.19494

Note: The test of the variance against zero is one sided, and the two-sided
confidence interval is truncated at zero.

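The AR(1) coefficient is 0.997 and highly significant, and its confidence interval (0.993 to 1.002) includes 1, which suggests that the disturbances are extremely persistent. With the AR(1) term included, pr17, pr23, pr35 and ffr are significant, while pr1vol and cpi are not.
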
ARIMA regression

Sample: 1/4/2021 - 6/30/2025, but with gaps Number of obs = 1127


Wald chi2(9) = 473176.53
Log likelihood = -5993.874 Prob > chi2 = 0.0000

OPG
pr1 Coef. Std. Err. z P>|z| [95% Conf. Interval]

pr1
pr1vol 2.88e-08 4.22e-07 0.07 0.946 -7.98e-07 8.56e-07
pr17 .1089123 .0452624 2.41 0.016 .0201995 .197625
pr23 .0101711 .0007284 13.96 0.000 .0087435 .0115986
pr35 3.174707 .6345796 5.00 0.000 1.930953 4.41846
cpi 1.43755 4.815993 0.30 0.765 -8.001622 10.87672
ffr -66.3247 19.36738 -3.42 0.001 -104.2841 -28.36534
_cons 3488.328 1434.181 2.43 0.015 677.385 6299.27

ARMA
ar
L1. 1.193346 .2039139 5.85 0.000 .7936818 1.59301
L2. -.1946532 .2034065 -0.96 0.339 -.5933227 .2040162

ma
L1. -.3353107 .2022208 -1.66 0.097 -.7316563 .0610349

/sigma 44.60379 .5983192 74.55 0.000 43.4311 45.77647

Note: The test of the variance against zero is one sided, and the two-sided
confidence interval is truncated at zero.

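With two AR terms and one MA term, only the first AR term is individually significant; the second AR term (p = 0.339) and the MA term (p = 0.097) are not significant at the 5% level. The log likelihood rises from -6004.74 to -5993.874, and the coefficients on the regressors are close to those of the AR(1) specification.
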
3. No perfect Multicollinearity
H0: There is no harmful multicollinearity. Independent variables are not interdependent.
H1: There is harmful multicollinearity. Independent variables are interdependent.

Tests
1) Variance Inflation Factor (vif)
. vif

Variable VIF 1/VIF

cpi 31.03 0.032223


ffr 17.49 0.057176
pr17 11.02 0.090718
pr23 4.99 0.200352
pr35 3.69 0.271358
pr1vol 1.01 0.991712

Mean VIF 11.54

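The mean VIF is 11.54, and cpi (31.03), ffr (17.49) and pr17 (11.02) are far above the threshold of 4 (pr23, at 4.99, is also above it). This means there is harmful multicollinearity leading to inflated standard errors. We reject the null hypothesis in favor of the alternative.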
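
Each VIF can be translated back into the R-squared of an auxiliary regression of that regressor on all the others, since VIF = 1 / (1 − R²). The Python sketch below (an illustration added here, not part of the Stata session) does this for the values in the table above and applies the threshold of 4 used in this document; the variable names are illustrative.

```python
# VIF values copied from the vif output above.
vif = {
    "cpi": 31.03,
    "ffr": 17.49,
    "pr17": 11.02,
    "pr23": 4.99,
    "pr35": 3.69,
    "pr1vol": 1.01,
}

# VIF_j = 1 / (1 - R2_j), so R2_j = 1 - 1 / VIF_j, where R2_j comes from
# regressing regressor j on all the other regressors.
aux_r2 = {name: 1.0 - 1.0 / value for name, value in vif.items()}
mean_vif = sum(vif.values()) / len(vif)

THRESHOLD = 4.0  # rule of thumb used in this document
flagged = [name for name, value in vif.items() if value > THRESHOLD]

for name, value in vif.items():
    print(f"{name:>7}: VIF = {value:5.2f}, auxiliary R2 = {aux_r2[name]:.3f}")
print(f"mean VIF = {mean_vif:.2f}; above threshold: {flagged}")
```

For cpi, a VIF of 31.03 corresponds to an auxiliary R-squared of roughly 0.97, meaning the other regressors explain almost all of its variation.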

2) Pairwise Correlation
What does pairwise correlation show?
Pairwise correlation measures the strength of the linear relationship between two variables at a time.
• Value range: -1 to +1
• +1 → perfectly positive linear relationship
• -1 → perfectly negative linear relationship
• 0 → no linear relationship

Why look at this for multicollinearity?

The no perfect multicollinearity assumption means that no regressor can be a perfect linear function of another regressor.
Pairwise correlation helps detect this: if two predictors have a near-perfect correlation (e.g., above 0.95 or below -0.95), they carry almost the same information and OLS cannot separate their effects.
As a working rule in this analysis, a correlation above 0.50 between two regressors is treated as a potential problem.

. pwcorr pr1 pr1vol pr17 pr23 pr35 cpi ffr, star(3)

pr1 pr1vol pr17 pr23 pr35 cpi ffr

pr1 1.0000
pr1vol 0.0545 1.0000
pr17 0.8880* 0.0455 1.0000
pr23 0.8947* 0.0690* 0.8668* 1.0000
pr35 -0.2993* -0.0720* -0.3480* -0.4180* 1.0000
cpi 0.6701* 0.0220 0.7087* 0.4191* 0.0602 1.0000
ffr 0.4543* -0.0002 0.4749* 0.1925* -0.1031* 0.8938* 1.0000

pr17 and pr23 are each about 89% correlated with the dependent variable pr1, which reflects predictive power rather than multicollinearity. Among the regressors, ffr is 89% correlated with cpi, pr17 is 87% correlated with pr23, and pr17 is 71% correlated with cpi.
Highly correlated regressors make it difficult for the regression to separate their effects, so their coefficients become unreliable and significance can be misjudged.
pr17 was not significant in the initial regression results, so removing it is one possible remedy.

Remedies
1) Drop highly correlated variables
. reg pr1 pr1vol pr23 pr35 cpi

Source SS df MS Number of obs = 1,127


F(4, 1122) = 2330.50
Model 500387499 4 125096875 Prob > F = 0.0000
Residual 60226744.6 1,122 53678.0255 R-squared = 0.8926
Adj R-squared = 0.8922
Total 560614244 1,126 497881.211 Root MSE = 231.69

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1vol 2.23e-06 2.63e-06 0.85 0.397 -2.94e-06 7.39e-06


pr23 .0210223 .0003527 59.61 0.000 .0203303 .0217143
pr35 -.2960559 .6486685 -0.46 0.648 -1.568796 .9766839
cpi 14.93146 .4533201 32.94 0.000 14.04201 15.82091
_cons -799.0853 128.1459 -6.24 0.000 -1050.518 -547.6528

. pwcorr pr1 pr1vol pr23 pr35 cpi

pr1 pr1vol pr23 pr35 cpi

pr1 1.0000
pr1vol 0.0545 1.0000
pr23 0.8947 0.0690 1.0000
pr35 -0.2993 -0.0720 -0.4180 1.0000
cpi 0.6701 0.0220 0.4191 0.0602 1.0000

. vif

Variable VIF 1/VIF

pr23 1.56 0.640880


cpi 1.31 0.766090
pr35 1.29 0.773006
pr1vol 1.01 0.993189

Mean VIF 1.29

Removing the variables pr17 and ffr left the overall fit almost unchanged (R-squared 0.8926 versus 0.8930) and dropped every VIF below 4 (mean VIF 1.29), meaning we eliminated the harmful multicollinearity.

2) Combining highly correlated variables into a single one
3) Centering variables
4) Transforming variables (log)
. gen log_pr23= log( pr23)

. reg pr1 pr1vol pr17 log_pr23 pr35 cpi ffr

Source SS df MS Number of obs = 1,127


F(6, 1120) = 2112.46
Model 515097742 6 85849623.7 Prob > F = 0.0000
Residual 45516501.2 1,120 40639.7332 R-squared = 0.9188
Adj R-squared = 0.9184
Total 560614244 1,126 497881.211 Root MSE = 201.59

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1vol 2.15e-06 2.29e-06 0.94 0.348 -2.34e-06 6.65e-06


pr17 -.1018828 .0444848 -2.29 0.022 -.1891658 -.0145998
log_pr23 904.8238 21.32236 42.44 0.000 862.9876 946.6601
pr35 -6.024964 .9296015 -6.48 0.000 -7.84892 -4.201007
cpi 31.36614 1.90312 16.48 0.000 27.63206 35.10022
ffr -89.00014 10.96715 -8.12 0.000 -110.5186 -67.48167
_cons -13393.72 455.6803 -29.39 0.000 -14287.8 -12499.64

. vif

Variable VIF 1/VIF

cpi 30.39 0.032909


ffr 16.32 0.061284
pr17 10.57 0.094590
pr35 3.51 0.284963
log_pr23 3.23 0.309548
pr1vol 1.01 0.991736

Mean VIF 10.84

Replacing pr23 with its log improved the fit (R-squared 0.9188), but the VIFs of cpi (30.39), ffr (16.32) and pr17 (10.57) are still above 4, so multicollinearity remains.

. reg pr1 pr1vol log_pr23 pr35 cpi

Source SS df MS Number of obs = 1,127


F(4, 1122) = 2947.19
Model 511894466 4 127973616 Prob > F = 0.0000
Residual 48719778 1,122 43422.2621 R-squared = 0.9131
Adj R-squared = 0.9128
Total 560614244 1,126 497881.211 Root MSE = 208.38

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1vol 3.08e-06 2.37e-06 1.30 0.194 -1.57e-06 7.72e-06


log_pr23 985.4953 14.44087 68.24 0.000 957.1611 1013.829
pr35 -.8210773 .5762337 -1.42 0.154 -1.951694 .3095396
cpi 18.50115 .3832257 48.28 0.000 17.74923 19.25307
_cons -11317.09 173.8847 -65.08 0.000 -11658.27 -10975.92

. vif

Variable VIF 1/VIF

log_pr23 1.39 0.721062


pr35 1.26 0.792405
cpi 1.15 0.867154
pr1vol 1.01 0.993851

Mean VIF 1.20

Removing pr17 (highly correlated with pr23 and cpi) and ffr (highly correlated with cpi), and replacing pr23 with its log, improved the model (R-squared 0.9131) and reduced every VIF to below 4.

4. Homoskedasticity
Ho: The variance of the errors is constant (homoskedastic errors).
H1: The variance of the errors is not constant (heteroskedastic errors).

Tests
1) Breusch-Pagan / Cook Weisberg test
. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity


Ho: Constant variance
Variables: fitted values of pr1

chi2(1) = 22.63
Prob > chi2 = 0.0000

Prob > chi2 = 0.0000. We reject the null hypothesis in favor of the alternative: the variance of the errors is not constant, so the assumption is not satisfied.

2) White test
. estat imtest, white

White's test for Ho: homoskedasticity


against Ha: unrestricted heteroskedasticity

chi2(27) = 531.40
Prob > chi2 = 0.0000

Cameron & Trivedi's decomposition of IM-test

Source chi2 df p

Heteroskedasticity 531.40 27 0.0000


Skewness 59.20 6 0.0000
Kurtosis 2.76 1 0.0968

Total 593.36 34 0.0000

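Prob > chi2 = 0.0000. We reject the null hypothesis of homoskedasticity; the errors are heteroskedastic.

3) Rvf plot (visual check)
[Figure: residuals versus fitted values; y-axis: Residuals (-1000 to 500), x-axis: Fitted values (4000 to 7000)]

The residual spread has a fan shape rather than a random scatter, which indicates heteroskedasticity.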

Remedies
1) Huber-White robust standard errors
. reg pr1 pr1vol pr17 pr23 pr35 cpi ffr, vce(robust)

Linear regression Number of obs = 1,127


F(6, 1120) = 2179.25
Prob > F = 0.0000
R-squared = 0.8930
Root MSE = 231.42

Robust
pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1vol 2.05e-06 2.67e-06 0.77 0.442 -3.18e-06 7.28e-06


pr17 -.0751559 .055468 -1.35 0.176 -.1839889 .0336771
pr23 .0208619 .000628 33.22 0.000 .0196297 .022094
pr35 -2.172655 1.037867 -2.09 0.037 -4.209038 -.1362725
cpi 19.53202 2.256203 8.66 0.000 15.10516 23.95889
ffr -27.13639 13.52386 -2.01 0.045 -53.67135 -.6014344
_cons -1774.277 493.8594 -3.59 0.000 -2743.271 -805.2833
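
Robust standard errors change the standard errors, and therefore the t-values and confidence intervals, but not the coefficients. The Python sketch below (an illustration added here, not part of the Stata session) puts the classical and robust standard errors from the two tables side by side and recomputes the robust t-values; the variable names are illustrative.

```python
# (coefficient, classical SE, robust SE) copied from the OLS table and the
# vce(robust) table for the same model.
rows = {
    "pr1vol": (2.05e-06, 2.63e-06, 2.67e-06),
    "pr17": (-0.0751559, 0.0521459, 0.055468),
    "pr23": (0.0208619, 0.0006301, 0.000628),
    "pr35": (-2.172655, 1.093585, 1.037867),
    "cpi": (19.53202, 2.207843, 2.256203),
    "ffr": (-27.13639, 13.03451, 13.52386),
}

t_robust = {}
se_ratio = {}
for name, (coef, se_ols, se_rob) in rows.items():
    t_robust[name] = coef / se_rob     # same coefficient, new standard error
    se_ratio[name] = se_rob / se_ols   # > 1 means the robust SE is larger
    print(f"{name:>7}: robust t = {t_robust[name]:6.2f}, "
          f"robust SE / classical SE = {se_ratio[name]:.3f}")
```

In this output the robust standard errors differ from the classical ones by only a few percent, so the conclusions about which variables are significant at the 5% level do not change.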

2) Transform (Use logs)
. reg pr1 pr1vol log_pr23 pr35 cpi

Source SS df MS Number of obs = 1,127


F(4, 1122) = 2947.19
Model 511894466 4 127973616 Prob > F = 0.0000
Residual 48719778 1,122 43422.2621 R-squared = 0.9131
Adj R-squared = 0.9128
Total 560614244 1,126 497881.211 Root MSE = 208.38

pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

pr1vol 3.08e-06 2.37e-06 1.30 0.194 -1.57e-06 7.72e-06


log_pr23 985.4953 14.44087 68.24 0.000 957.1611 1013.829
pr35 -.8210773 .5762337 -1.42 0.154 -1.951694 .3095396
cpi 18.50115 .3832257 48.28 0.000 17.74923 19.25307
_cons -11317.09 173.8847 -65.08 0.000 -11658.27 -10975.92

. rvfplot, yline(0)
[Figure: residuals versus fitted values; y-axis: Residuals (-1000 to 500), x-axis: Fitted values (3500 to 6000)]

The heteroskedasticity changed only slightly.

5. Normality of Errors
Ho: Residuals are normally distributed
H1: Residuals are not normally distributed. Skewness and Kurtosis exist

Tests
1) Skewness/kurtosis test (Stata's sktest, a Jarque-Bera-type joint test)
. sktest ehat

Skewness/Kurtosis tests for Normality


joint
Variable Obs Pr(Skewness) Pr(Kurtosis) adj chi2(2) Prob>chi2

ehat 1,127 0.0005 0.2730 12.43 0.0020

The skewness p-value is 0.0005, which is very close to zero, so we reject the null hypothesis for skewness: the residuals are skewed.
The kurtosis p-value is 0.2730, which is more than 0.05, so we fail to reject the null hypothesis for kurtosis: there is no evidence of non-normal kurtosis.
The joint test has a p-value of 0.0020, so we reject the null hypothesis in favor of the alternative: the residuals are not normally distributed.

2) Shapiro-Wilk (for small samples)


. swilk ehat

Shapiro-Wilk W test for normal data

Variable Obs W V z Prob>z

ehat 1,127 0.98956 7.342 4.961 0.00000

The z-value is large (4.961) and p = 0.00000. We reject the null hypothesis in favor of the alternative: the residuals are not normally distributed.

3) Q-Q plot
[Figure: normal quantile plot of the residuals; y-axis: Residuals, x-axis: Inverse Normal]
[Figure: normal probability plot of the residuals; x-axis: Empirical P[i] = i/(N+1)]

Remedies
1) Use a larger sample size if data are available
2) Apply vce(robust) to correct the standard errors
3) Apply transformations to the dependent variable
Log()
L.()

6. Outliers
Ho: No influential outlier observations exist
H1: A few influential outliers exist

Tests

Non-linear Regression Output Analysis
When the dependent Y-variable is categorical (discrete), not continuous

Testable Hypothesis
Null Hypothesis
Ho: All the coefficients of the independent variables are jointly equal to zero. The independent variables as a group have no effect on the probability of Y, so the dependent variable cannot be predicted from any of them.

Alternate Hypothesis
H1: At least one of the coefficients is different from zero. At least one of the independent variables affects the probability of Y and can be used to predict the dependent variable.

Running the Regression


1) Create categorical dependent variable
First, create the binary OR ordinal dependent variable Y that you want to predict
Binary
. gen pr5up= ( pr5>0) if !missing(pr5)
(525 missing values generated)

. tab pr5up

      pr5up       Freq.     Percent        Cum.

          0         537       46.78       46.78
          1         611       53.22      100.00

      Total       1,148      100.00

. move pr5up pr1vol

pr5 is the daily price change (return) of the S&P 500.
pr5up is generated to split returns into two categories:
0 = zero or negative returns
1 = positive returns
47% of returns are zero or negative and 53% are positive.

Ordinal
. xtile pr5ord= pr5, n(5)

. label define pr5ord 1 "lowest" 2 "low" 3 "medium" 4 "high" 5 "highest"

. label values pr5ord pr5ord

. move pr5ord pr1vol

. tab pr5ord

5 quantiles
     of pr5       Freq.     Percent        Cum.

     lowest         231       20.12       20.12
        low         233       20.30       40.42
     medium         227       19.77       60.19
       high         231       20.12       80.31
    highest         226       19.69      100.00

      Total       1,148      100.00

This table shows the share of returns in each category. By construction, each quintile holds roughly 20% of the observations.

2) Run LOGIT regression


Binary logit
. logit pr5up pr1vol pr17 pr23 pr35 cpi ffr

Iteration 0: log likelihood = -779.06331


Iteration 1: log likelihood = -773.27047
Iteration 2: log likelihood = -773.27013
Iteration 3: log likelihood = -773.27013

Logistic regression Number of obs = 1,127


LR chi2(6) = 11.59
Prob > chi2 = 0.0719
Log likelihood = -773.27013 Pseudo R2 = 0.0074

pr5up Coef. Std. Err. z P>|z| [95% Conf. Interval]

pr1vol 1.06e-08 2.29e-08 0.46 0.643 -3.42e-08 5.55e-08


pr17 -.0000267 .0004546 -0.06 0.953 -.0009178 .0008644
pr23 9.84e-06 5.51e-06 1.79 0.074 -9.56e-07 .0000206
pr35 -.0005704 .0095148 -0.06 0.952 -.019219 .0180783
cpi -.0204071 .0192304 -1.06 0.289 -.0580981 .0172838
ffr .1502956 .1134569 1.32 0.185 -.0720759 .372667
_cons 5.303272 4.148913 1.28 0.201 -2.828448 13.43499

Number of observations = 1127


You have 1,127 usable data points.

LR chi²(6) = 11.59

This is the "overall significance test" of the model.
It tests:
$H_0: \beta_1 = \beta_2 = \dots = \beta_6 = 0$

(meaning none of the independent variables affect pr5up)

Prob > chi² = 0.0719


This is the p-value for the LR test.
Interpretation:
- At 5% significance, p = 0.0719 > 0.05 → NOT significant
→ You cannot reject the null hypothesis.
- At 10% significance, p = 0.0719 < 0.10 → weakly significant
Meaning:
As a group, the independent variables show only weak evidence of an effect on the probability of a
positive return.

Pseudo R² = 0.0074
Extremely small.
Interpretation: The model improves the log likelihood by less than 1% over a model with only a
constant, so it has almost no explanatory power for the probability of a positive
return. This is very common in financial return prediction models.
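
The LR statistic, its p-value and the pseudo R² can all be reproduced from the two log likelihoods in the iteration log (iteration 0 is the constant-only model). The Python sketch below is a minimal illustration, not Stata's own routine: the helper names chi2_sf_even_df and lr_summary are hypothetical, and the closed-form tail probability used here is only valid for an even number of degrees of freedom, such as the 6 used in this model.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

def lr_summary(ll_null, ll_model, df):
    """Return (LR chi2, p-value, McFadden pseudo R2)."""
    lr = 2.0 * (ll_model - ll_null)
    pseudo_r2 = 1.0 - ll_model / ll_null
    return lr, chi2_sf_even_df(lr, df), pseudo_r2

# Log likelihoods from the binary logit output: iteration 0 and final iteration
lr, p, r2 = lr_summary(-779.06331, -773.27013, 6)
print(round(lr, 2), round(p, 4), round(r2, 4))  # 11.59 0.0719 0.0074
```

The same function applied to the other models' log likelihoods should reproduce their reported LR chi2 and Prob > chi2 values.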

The coefficient table shows only the significance of each variable and the direction in which it affects the
dependent variable.
Pr1vol: effect is essentially zero. P=0.643, so it is an insignificant variable that doesn’t predict the
return direction (whether returns are positive or negative).
Pr17: negative coefficient means higher pr17 lowers the chance of positive returns, but p=0.953
means pr17 is totally insignificant and has no detectable effect.
Pr23: positive coefficient, which increases the probability of positive returns. P=0.074 is not significant at the 5%
level but slightly significant at the 10% level. Slightly effective, but the coefficient is very small.
Pr35: negative coefficient, p=0.952. Totally insignificant. Pr35 does not help predict the outcome.
Cpi: negative coefficient, higher cpi slightly reduces the probability of positive returns. P=0.289, which is
not significant at the 5% or 10% level. Inflation does not meaningfully drive short-term return
direction.

Ffr: positive coefficient, a higher interest rate increases the probability of positive returns, but p=0.185 is not
significant.
. margins, dydx( pr1vol pr17 pr23 pr35 cpi ffr) atmeans

Conditional marginal effects Number of obs = 1,127


Model VCE : OIM

Expression : Pr(pr5up), predict()


dy/dx w.r.t. : pr1vol pr17 pr23 pr35 cpi ffr
at : pr1vol = 5582368 (mean)
pr17 = 2106.253 (mean)
pr23 = 48438.06 (mean)
pr35 = 77.14654 (mean)
cpi = 298.2124 (mean)
ffr = 3.129459 (mean)

Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]

pr1vol 2.64e-09 5.70e-09 0.46 0.643 -8.53e-09 1.38e-08


pr17 -6.65e-06 .0001132 -0.06 0.953 -.0002286 .0002153
pr23 2.45e-06 1.37e-06 1.79 0.074 -2.38e-07 5.14e-06
pr35 -.000142 .0023696 -0.06 0.952 -.0047865 .0045024
cpi -.0050824 .0047893 -1.06 0.289 -.0144692 .0043045
ffr .0374309 .0282563 1.32 0.185 -.0179504 .0928122

Pr1vol: “1 unit increase in pr1vol increases the probability of positive returns by 0.00000000264
(2.64e-09)”. The z-value (0.46) is less than 1.645 and p=0.643. Pr1vol is completely insignificant and has no
detectable effect on the probability of positive returns.
Pr17: “1 unit increase in pr17 decreases the probability of positive returns by 0.00000665 (6.65e-06)”. The z-value
(-0.06) is below 1.645 in absolute value and the p-value (0.953) is almost equal to 1. Totally insignificant variable with no
detectable effect on the probability of positive returns.
Pr23: “1 unit increase in pr23 increases the probability of positive returns by 0.00000245 (2.45e-06)”. The z-value
(1.79) is greater than 1.645 and the p-value (0.074) is not significant at the 5% level but significant at the 10% level, as
it is less than 0.10. This is the only variable that has a slight effect on the dependent variable, but the
marginal effect is still very small.
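
For a logit model, the marginal effect at the means is the coefficient multiplied by p(1 - p), where p is the predicted probability at the sample means. The Python sketch below recomputes the margins table from the logit coefficients and means reported above. The variable names in the code are illustrative, and small differences from the Stata output can arise because the printed coefficients are rounded.

```python
import math

# Logit coefficients and sample means copied from the output above
coef = {"pr1vol": 1.06e-08, "pr17": -0.0000267, "pr23": 9.84e-06,
        "pr35": -0.0005704, "cpi": -0.0204071, "ffr": 0.1502956}
cons = 5.303272
means = {"pr1vol": 5582368, "pr17": 2106.253, "pr23": 48438.06,
         "pr35": 77.14654, "cpi": 298.2124, "ffr": 3.129459}

# Linear index and predicted probability at the means
xb = cons + sum(coef[v] * means[v] for v in coef)
p = 1.0 / (1.0 + math.exp(-xb))

# Logistic density at the index: scales every coefficient into a marginal effect
scale = p * (1.0 - p)
marginal = {v: coef[v] * scale for v in coef}

print(f"Pr(pr5up=1) at means = {p:.4f}")
for v, me in marginal.items():
    print(f"{v:7s} dy/dx = {me: .3e}")
```

Because p is close to 0.5 here, the scale factor is close to its maximum of 0.25, which is why each marginal effect is roughly a quarter of its coefficient.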

. estat classification

Logistic model for pr5up

True
Classified D ~D Total

+ 477 379 856


- 121 150 271

Total 598 529 1127

Classified + if predicted Pr(D) >= .5


True D defined as pr5up != 0

Sensitivity Pr( +| D) 79.77%


Specificity Pr( -|~D) 28.36%
Positive predictive value Pr( D| +) 55.72%
Negative predictive value Pr(~D| -) 55.35%

False + rate for true ~D Pr( +|~D) 71.64%


False - rate for true D Pr( -| D) 20.23%
False + rate for classified + Pr(~D| +) 44.28%
False - rate for classified - Pr( D| -) 44.65%

Correctly classified 55.63%

It evaluates how well your logit model classifies the outcome pr5up (0 or 1) using a default
cutoff of 0.5.
If predicted probability ≥ 0.5 → Model predicts 1
If predicted probability < 0.5 → Model predicts 0
It shows how often the model has correctly predicted the binary outcome (0,1).
55.63% is far from 100% and only slightly above the 53.06% (598/1127) that would be achieved by always
predicting a positive return, so the model does not reliably distinguish positive from negative returns.
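
Every rate in the classification table follows from the four cell counts. The Python sketch below recomputes them and adds a naive benchmark (always predicting the more frequent outcome) for comparison. The variable names are illustrative and the benchmark is an addition for context, not something Stata reports.

```python
# Cell counts from the classification table: rows = classified, columns = true
tp, fp = 477, 379   # classified +, true D / true ~D
fn, tn = 121, 150   # classified -, true D / true ~D
n = tp + fp + fn + tn

sensitivity = tp / (tp + fn)   # Pr(+ | D)
specificity = tn / (tn + fp)   # Pr(- | ~D)
ppv = tp / (tp + fp)           # Pr(D | +)
npv = tn / (tn + fn)           # Pr(~D | -)
accuracy = (tp + tn) / n       # correctly classified

# Naive benchmark: always predict the more frequent true outcome
naive_accuracy = max(tp + fn, fp + tn) / n

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv),
                    ("Correctly classified", accuracy),
                    ("Naive benchmark", naive_accuracy)]:
    print(f"{name:22s} {100 * value:6.2f}%")
```

The low specificity shows that the model labels most days as positive, which is why its accuracy is so close to the naive benchmark.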
. lroc

Logistic model for pr5up

number of observations = 1127


area under ROC curve = 0.5579

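This shows how well the model can separate the two groups in a binary outcome.
AUC = 0.5 → no better than chance
AUC closer to 1 → model is very good at distinguishing the 2 groups
The greater the distance between the ROC curve and the diagonal reference line (the blue and black lines), the
better the model. Here the area under the ROC curve is 0.5579, so the curve lies close to the reference line and
the model separates the groups only slightly better than chance.

[Figure: ROC curve, Sensitivity against 1 - Specificity; area under ROC curve = 0.5579]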
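
The AUC can also be read as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case, with ties counted as one half. The Python sketch below computes it by comparing every positive/negative pair. The function name auc and the toy labels and scores are hypothetical, since the fitted probabilities from the Stata session are not available here.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): 1 = positive return, 0 = otherwise
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(auc(labels, scores))                      # 0.75

# A model that gives every observation the same score cannot separate the groups
print(auc(labels, [0.5, 0.5, 0.5, 0.5]))        # 0.5
```

On this reading, an AUC of 0.5579 means the model ranks a positive-return day above a non-positive one only about 56% of the time.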

Ordinal logit
. ologit pr5ord pr1vol pr17 pr23 pr35 cpi ffr

Iteration 0: log likelihood = -1813.705


Iteration 1: log likelihood = -1809.103
Iteration 2: log likelihood = -1809.1008
Iteration 3: log likelihood = -1809.1008

Ordered logistic regression Number of obs = 1,127


LR chi2(6) = 9.21
Prob > chi2 = 0.1622
Log likelihood = -1809.1008 Pseudo R2 = 0.0025

pr5ord Coef. Std. Err. z P>|z| [95% Conf. Interval]

pr1vol 8.64e-09 2.02e-08 0.43 0.668 -3.09e-08 4.81e-08


pr17 8.09e-06 .0004058 0.02 0.984 -.0007872 .0008034
pr23 8.58e-06 4.75e-06 1.81 0.071 -7.21e-07 .0000179
pr35 .001944 .0089701 0.22 0.828 -.015637 .019525
cpi -.0227048 .0177524 -1.28 0.201 -.0574989 .0120893
ffr .1615914 .104973 1.54 0.124 -.0441519 .3673347

/cut1 -6.991007 3.80584 -14.45032 .4683025


/cut2 -5.995004 3.802594 -13.44795 1.457944
/cut3 -5.198263 3.801926 -12.6499 2.253374
/cut4 -4.224411 3.802879 -11.67792 3.229094

The number of common observations among all selected variables is 1,127.

The Likelihood Ratio test LR chi^2 (overall significance test) is 9.21.
The p-value of the LR test is 0.1622, which is not significant at the 5% or 10% level, so we fail to
reject the null hypothesis: the independent variables do not jointly improve the prediction of pr5ord.
Pseudo R2 is 0.0025, meaning the model improves the log likelihood by less than 1% over a
constant-only model and has almost no power to explain which return category pr5ord falls in.
The coefficients show the direction in which each independent variable shifts the probability of
pr5ord falling in a higher category.
Pr23 is the only variable significant at the 10% level (p=0.071). Its positive coefficient means an increase in
pr23 raises the probability of pr5ord being in a higher returns category.

. margins, dydx (pr1vol pr17 pr23 pr35 cpi ffr) atmeans

Conditional marginal effects Number of obs = 1,127


Model VCE : OIM

dy/dx w.r.t. : pr1vol pr17 pr23 pr35 cpi ffr


1._predict : Pr(pr5ord==1), predict(pr outcome(1))
2._predict : Pr(pr5ord==2), predict(pr outcome(2))
3._predict : Pr(pr5ord==3), predict(pr outcome(3))
4._predict : Pr(pr5ord==4), predict(pr outcome(4))
5._predict : Pr(pr5ord==5), predict(pr outcome(5))
at : pr1vol = 5582368 (mean)
pr17 = 2106.253 (mean)
pr23 = 48438.06 (mean)
pr35 = 77.14654 (mean)
cpi = 298.2124 (mean)
ffr = 3.129459 (mean)

Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]

pr1vol
_predict
1 -1.41e-09 3.28e-09 -0.43 0.668 -7.84e-09 5.03e-09
2 -6.85e-10 1.60e-09 -0.43 0.668 -3.81e-09 2.45e-09
3 3.07e-11 8.56e-11 0.36 0.720 -1.37e-10 1.98e-10
4 6.98e-10 1.63e-09 0.43 0.668 -2.49e-09 3.89e-09
5 1.36e-09 3.18e-09 0.43 0.668 -4.87e-09 7.59e-09

pr17
_predict
1 -1.32e-06 .0000661 -0.02 0.984 -.0001308 .0001282
2 -6.41e-07 .0000321 -0.02 0.984 -.0000636 .0000623
3 2.88e-08 1.44e-06 0.02 0.984 -2.80e-06 2.86e-06
4 6.53e-07 .0000328 0.02 0.984 -.0000636 .0000649
5 1.28e-06 .000064 0.02 0.984 -.0001242 .0001267

pr23
_predict
1 -1.40e-06 7.75e-07 -1.80 0.072 -2.92e-06 1.23e-07
2 -6.80e-07 3.80e-07 -1.79 0.074 -1.42e-06 6.56e-08
3 3.05e-08 4.96e-08 0.62 0.538 -6.66e-08 1.28e-07
4 6.93e-07 3.88e-07 1.79 0.074 -6.77e-08 1.45e-06
5 1.35e-06 7.47e-07 1.81 0.070 -1.12e-07 2.82e-06

pr35
_predict
1 -.0003166 .0014609 -0.22 0.828 -.0031798 .0025467
2 -.000154 .0007105 -0.22 0.828 -.0015465 .0012386
3 6.91e-06 .0000337 0.21 0.837 -.0000591 .0000729
4 .000157 .0007244 0.22 0.828 -.0012628 .0015767
5 .0003066 .0014149 0.22 0.828 -.0024666 .0030799

cpi
_predict
1 .0036972 .0028971 1.28 0.202 -.001981 .0093755
2 .0017982 .001414 1.27 0.203 -.0009731 .0045696
3 -.0000807 .0001395 -0.58 0.563 -.0003542 .0001927
4 -.0018333 .001442 -1.27 0.204 -.0046596 .000993
5 -.0035814 .0027978 -1.28 0.201 -.0090651 .0019022

ffr
_predict
1 -.0263135 .0171487 -1.53 0.125 -.0599244 .0072974
2 -.0127981 .0083814 -1.53 0.127 -.0292252 .0036291
3 .0005744 .000961 0.60 0.550 -.0013091 .0024579
4 .0130478 .0085503 1.53 0.127 -.0037106 .0298061
5 .0254894 .0165366 1.54 0.123 -.0069218 .0579006
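
In an ordered logit, the probability of category j is the difference between the logistic CDF evaluated at two adjacent cutpoints, and the marginal effect of a variable on that probability is its coefficient times the difference in logistic densities at those cutpoints. The Python sketch below reproduces the ffr rows of the table from the reported coefficients, cutpoints and means. The names in the code are illustrative, and small discrepancies can arise because the printed coefficients are rounded.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Ordered logit coefficients, cutpoints and sample means from the output above
coef = {"pr1vol": 8.64e-09, "pr17": 8.09e-06, "pr23": 8.58e-06,
        "pr35": 0.001944, "cpi": -0.0227048, "ffr": 0.1615914}
cuts = [-6.991007, -5.995004, -5.198263, -4.224411]
means = {"pr1vol": 5582368, "pr17": 2106.253, "pr23": 48438.06,
         "pr35": 77.14654, "cpi": 298.2124, "ffr": 3.129459}

xb = sum(coef[v] * means[v] for v in coef)  # no constant in an ordered model

# Cumulative probabilities P(y <= j); the last category closes at 1
cum = [logistic(c - xb) for c in cuts] + [1.0]
probs = [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, 5)]

# Logistic density at each cutpoint, padded with 0 at the outer limits
dens = [0.0] + [logistic(c - xb) * (1.0 - logistic(c - xb)) for c in cuts] + [0.0]

# dP(y = j)/dx = beta * (density at lower cutpoint - density at upper cutpoint)
me_ffr = [coef["ffr"] * (dens[j] - dens[j + 1]) for j in range(5)]

for j in range(5):
    print(f"category {j + 1}: Pr = {probs[j]:.4f}, dy/dx(ffr) = {me_ffr[j]: .6f}")
print("sum of marginal effects:", round(sum(me_ffr), 10))
```

The marginal effects sum to zero across the five categories because the probabilities always sum to one: a higher ffr moves probability out of the lower categories and into the higher ones.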

3) PROBIT models
Binary
. probit pr5up pr1vol pr17 pr23 pr35 cpi ffr

Iteration 0: log likelihood = -779.06331


Iteration 1: log likelihood = -773.2774
Iteration 2: log likelihood = -773.27722
Iteration 3: log likelihood = -773.27722

Probit regression Number of obs = 1,127


LR chi2(6) = 11.57
Prob > chi2 = 0.0722
Log likelihood = -773.27722 Pseudo R2 = 0.0074

pr5up Coef. Std. Err. z P>|z| [95% Conf. Interval]

pr1vol 6.68e-09 1.43e-08 0.47 0.640 -2.13e-08 3.47e-08


pr17 -.0000159 .0002837 -0.06 0.955 -.0005719 .0005401
pr23 6.12e-06 3.43e-06 1.78 0.074 -6.04e-07 .0000128
pr35 -.0003825 .0059444 -0.06 0.949 -.0120333 .0112683
cpi -.0127067 .011989 -1.06 0.289 -.0362047 .0107912
ffr .0935977 .0707439 1.32 0.186 -.0450578 .2322532
_cons 3.302711 2.584956 1.28 0.201 -1.763709 8.369132

The overall significance test of the model is LR chi2(6) = 11.57.
The p-value of the LR test is 0.0722, which is more than 0.05, so we fail to reject the null hypothesis at the
5% level. It is only weakly significant at the 10% level. There is little evidence that the independent variables
jointly predict whether returns are positive or negative.
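
Logit and probit coefficients are on different scales, so they should not be compared directly. A common rule of thumb is that logit coefficients are roughly 1.6 times the probit ones, while the predicted probabilities are nearly identical. The Python sketch below checks both points using the coefficients and means reported in this document. The names in the code are illustrative, and the ratio is noisy for coefficients that are close to zero.

```python
import math

# Coefficients copied from the binary logit and probit outputs
logit = {"pr1vol": 1.06e-08, "pr17": -0.0000267, "pr23": 9.84e-06,
         "pr35": -0.0005704, "cpi": -0.0204071, "ffr": 0.1502956,
         "_cons": 5.303272}
probit = {"pr1vol": 6.68e-09, "pr17": -0.0000159, "pr23": 6.12e-06,
          "pr35": -0.0003825, "cpi": -0.0127067, "ffr": 0.0935977,
          "_cons": 3.302711}
means = {"pr1vol": 5582368, "pr17": 2106.253, "pr23": 48438.06,
         "pr35": 77.14654, "cpi": 298.2124, "ffr": 3.129459}

# Ratio of logit to probit coefficients, variable by variable
ratios = {v: logit[v] / probit[v] for v in logit}

def index_at_means(coefs):
    """Linear index x'b evaluated at the sample means."""
    return coefs["_cons"] + sum(coefs[v] * means[v] for v in means)

# Predicted probability at the means under each link function
p_logit = 1.0 / (1.0 + math.exp(-index_at_means(logit)))
p_probit = 0.5 * (1.0 + math.erf(index_at_means(probit) / math.sqrt(2.0)))

for v, r in ratios.items():
    print(f"{v:7s} logit/probit = {r:.3f}")
print(f"Pr at means: logit = {p_logit:.4f}, probit = {p_probit:.4f}")
```

The near-identical predicted probabilities are consistent with the two models producing the same classification table and the same area under the ROC curve.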

. estat classification

Probit model for pr5up

True
Classified D ~D Total

+ 477 379 856


- 121 150 271

Total 598 529 1127

Classified + if predicted Pr(D) >= .5


True D defined as pr5up != 0

Sensitivity Pr( +| D) 79.77%


Specificity Pr( -|~D) 28.36%
Positive predictive value Pr( D| +) 55.72%
Negative predictive value Pr(~D| -) 55.35%

False + rate for true ~D Pr( +|~D) 71.64%


False - rate for true D Pr( -| D) 20.23%
False + rate for classified + Pr(~D| +) 44.28%
False - rate for classified - Pr( D| -) 44.65%

Correctly classified 55.63%

This shows that the model correctly classified only 55.63% of the binary outcomes, the same as the logit model.
Not a good model. Needs improvement.
. lroc

Probit model for pr5up

number of observations = 1127


area under ROC curve = 0.5579
[Figure: ROC curve, Sensitivity against 1 - Specificity; area under ROC curve = 0.5579]

AUC = 0.5 means the model classifies no better than chance and is useless. Here AUC = 0.5579, only slightly above 0.5.

Ordinal
. oprobit pr5ord pr1vol pr17 pr23 pr35 cpi ffr

Iteration 0: log likelihood = -1813.705


Iteration 1: log likelihood = -1809.624
Iteration 2: log likelihood = -1809.624

Ordered probit regression Number of obs = 1,127


LR chi2(6) = 8.16
Prob > chi2 = 0.2265
Log likelihood = -1809.624 Pseudo R2 = 0.0023

pr5ord Coef. Std. Err. z P>|z| [95% Conf. Interval]

pr1vol 6.23e-09 1.20e-08 0.52 0.604 -1.73e-08 2.98e-08


pr17 -4.58e-06 .0002392 -0.02 0.985 -.0004733 .0004642
pr23 5.03e-06 2.87e-06 1.75 0.079 -5.88e-07 .0000106
pr35 .0019541 .0050873 0.38 0.701 -.0080168 .011925
cpi -.0134837 .0102166 -1.32 0.187 -.0335079 .0065405
ffr .0959916 .0603408 1.59 0.112 -.0222742 .2142574

/cut1 -4.131872 2.20007 -8.443929 .180186


/cut2 -3.533609 2.198354 -7.842304 .7750866
/cut3 -3.037025 2.198208 -7.345433 1.271383
/cut4 -2.454788 2.199103 -6.764951 1.855375

. margins, dydx (pr1vol pr17 pr23 pr35 cpi ffr) atmeans

Conditional marginal effects Number of obs = 1,127


Model VCE : OIM

dy/dx w.r.t. : pr1vol pr17 pr23 pr35 cpi ffr


1._predict : Pr(pr5ord==1), predict(pr outcome(1))
2._predict : Pr(pr5ord==2), predict(pr outcome(2))
3._predict : Pr(pr5ord==3), predict(pr outcome(3))
4._predict : Pr(pr5ord==4), predict(pr outcome(4))
5._predict : Pr(pr5ord==5), predict(pr outcome(5))
at : pr1vol = 5582368 (mean)
pr17 = 2106.253 (mean)
pr23 = 48438.06 (mean)
pr35 = 77.14654 (mean)
cpi = 298.2124 (mean)
ffr = 3.129459 (mean)

Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]

pr1vol
_predict
1 -1.76e-09 3.40e-09 -0.52 0.604 -8.41e-09 4.90e-09
2 -6.58e-10 1.27e-09 -0.52 0.605 -3.15e-09 1.84e-09
3 1.90e-11 5.55e-11 0.34 0.732 -8.98e-11 1.28e-10
4 6.63e-10 1.28e-09 0.52 0.605 -1.85e-09 3.17e-09
5 1.74e-09 3.35e-09 0.52 0.604 -4.83e-09 8.30e-09

pr17
_predict
1 1.29e-06 .0000676 0.02 0.985 -.0001311 .0001337
2 4.85e-07 .0000253 0.02 0.985 -.0000491 .0000501
3 -1.40e-08 7.30e-07 -0.02 0.985 -1.45e-06 1.42e-06
4 -4.88e-07 .0000255 -0.02 0.985 -.0000504 .0000494
5 -1.28e-06 .0000667 -0.02 0.985 -.000132 .0001294

pr23
_predict
1 -1.42e-06 8.10e-07 -1.75 0.080 -3.01e-06 1.67e-07
2 -5.32e-07 3.07e-07 -1.73 0.083 -1.13e-06 7.03e-08
3 1.53e-08 3.48e-08 0.44 0.660 -5.29e-08 8.36e-08
4 5.35e-07 3.08e-07 1.74 0.082 -6.85e-08 1.14e-06
5 1.40e-06 8.00e-07 1.75 0.080 -1.65e-07 2.97e-06

pr35
_predict
1 -.0005521 .0014375 -0.38 0.701 -.0033695 .0022653
2 -.0002067 .0005384 -0.38 0.701 -.001262 .0008486
3 5.96e-06 .0000203 0.29 0.769 -.0000339 .0000458
4 .000208 .0005416 0.38 0.701 -.0008536 .0012696
5 .0005449 .0014186 0.38 0.701 -.0022356 .0033253

cpi
_predict
1 .0038097 .0028883 1.32 0.187 -.0018513 .0094706
2 .0014262 .0010892 1.31 0.190 -.0007085 .003561
3 -.0000412 .0000958 -0.43 0.668 -.000229 .0001467
4 -.0014351 .0010927 -1.31 0.189 -.0035767 .0007065
5 -.0037597 .0028508 -1.32 0.187 -.009347 .0018277

ffr
_predict
1 -.0271213 .0170638 -1.59 0.112 -.0605657 .0063231
2 -.0101535 .0064551 -1.57 0.116 -.0228052 .0024983
3 .000293 .0006708 0.44 0.662 -.0010219 .0016078
4 .0102164 .0064681 1.58 0.114 -.0024608 .0228936
5 .0267654 .016842 1.59 0.112 -.0062444 .0597752
