Regression Output Analysis
. tsset date
time variable: date, 1/1/2021 to 7/31/2025
delta: 1 day
Pr23: “A 1-unit increase in pr23 increases pr1 by 0.02 units, assuming all other variables are
held constant.” The t-value = 33.11, far higher than 1.67, and the p-value is 0.000, indicating that
pr23 (bitcoin’s closing price) is a highly significant variable in this model.
Pr35: “A 1-unit increase in pr35 decreases pr1 by 2.17 units, assuming all other variables
are held constant.” The absolute t-value = 1.99, which is greater than 1.67, and the p-value = 0.047,
which is below 0.05, indicating that pr35 (crude oil’s closing price) is significant at the 5% level
in this model.
Cpi: “A 1-unit increase in cpi increases pr1 by 19.5 units, assuming all other variables are
held constant.” The t-value = 8.85, which is greater than 1.67, and the p-value = 0.000, indicating
that the inflation index (CPI) is a significant factor in this model.
Ffr: “A 1-unit increase in ffr decreases pr1 by 27 units, assuming all other variables are
held constant.” The absolute t-value = 2.08, which is greater than 1.67, and the p-value = 0.038,
which is below 0.05, indicating that the US federal funds rate plays a significant role in this
regression model.
_cons: This is the y-intercept of the model: the predicted value of pr1 when all independent
variables are zero.
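The “holding all other variables constant” reading can be made concrete: in a linear model, the predicted change in pr1 is just each coefficient times the change in its variable, summed. The sketch below is a minimal illustration that uses the rounded coefficients quoted above. The intercept cancels out of a difference, so it is not needed. The function name is hypothetical.

```python
# Rounded OLS coefficients quoted in the interpretation above (pr1 is the dependent variable).
COEFS = {"pr23": 0.02, "pr35": -2.17, "cpi": 19.5, "ffr": -27.0}


def predicted_change(changes):
    """Change in predicted pr1 when the listed regressors change by the given
    amounts and every other regressor is held constant (the intercept cancels)."""
    unknown = set(changes) - set(COEFS)
    if unknown:
        raise KeyError(f"no coefficient for: {sorted(unknown)}")
    return sum(COEFS[name] * delta for name, delta in changes.items())


# A 1-unit rise in ffr alone lowers predicted pr1 by 27 units.
print(predicted_change({"ffr": 1}))
# A 100-unit rise in pr23 combined with a 1-unit rise in cpi.
print(predicted_change({"pr23": 100, "cpi": 1}))
```

Because the model is linear, effects simply add. That additivity is exactly what an interaction or polynomial term would break.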
Assumptions
1. Linearity / Model Specification
Ho: There are no omitted variables in the regression model, the model is well-specified.
H1: There are some omitted variables in the regression model, the model is
misspecified.
Tests
1) Ramsey RESET test
. est store OLS
. estat ovtest
Interpretation: P-value
The p-value is 0, so we must reject the null hypothesis and accept the alternative. The model
has omitted variables and needs improvement, most likely because it is missing powers,
interactions, or the correct functional form of the variables. Conclusions drawn from this model
would be biased and distorted, and inferences would be inaccurate.
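The idea behind estat ovtest can be sketched in plain Python. Fit the model, add powers of the fitted values, and F-test whether the added terms matter. This is a minimal sketch under several assumptions:

- It adds powers 2 to 4 of the fitted values, which is my understanding of ovtest's default.
- It uses a hand-rolled normal-equations solver.
- It runs on toy data, not the pr1 dataset.

The fitted values are standardised before taking powers. This leaves the F-statistic unchanged, because the model already contains a constant and the fitted values lie in the span of the regressors.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_fit(X, y):
    """OLS via the normal equations. X is a list of rows that already include a constant."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return fitted, rss


def reset_f(X, y, powers=(2, 3, 4)):
    """Ramsey RESET F-statistic: do powers of the fitted values add explanatory power?"""
    fitted, rss_r = ols_fit(X, y)                  # restricted (original) model
    mean = sum(fitted) / len(fitted)
    sd = (sum((f - mean) ** 2 for f in fitted) / len(fitted)) ** 0.5
    z = [(f - mean) / sd for f in fitted]          # standardise to keep powers well scaled
    X_aug = [list(row) + [zi ** p for p in powers] for row, zi in zip(X, z)]
    _, rss_u = ols_fit(X_aug, y)                   # unrestricted model
    q = len(powers)                                # number of added terms tested
    df_resid = len(y) - len(X_aug[0])
    return ((rss_r - rss_u) / q) / (rss_u / df_resid)
```

On toy data, a straight-line fit to a curved relationship gives a very large F. A genuinely linear relationship with noise does not.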
2) Link test
Ho: There are no omitted variables in the regression model, the model is well-specified.
H1: There are some omitted variables in the regression model, the model is
misspecified.
. linktest
Interpretation: _hatsq
The _hatsq term has an absolute t-value of 11.52, well above 1.67, and a p-value of 0.000.
In a well-specified model, _hatsq should have no explanatory power, so these results also
indicate that we must reject the null hypothesis. The alternative hypothesis is accepted: the
model is misspecified, and inferences made from it would be distorted and inaccurate.
3) Residuals plot: residuals are the part of pr1 left unexplained by the model (estimates of the error terms)
Ho: There are no omitted variables in the regression model, the model is well-specified.
H1: There are some omitted variables in the regression model, the model is
misspecified.
[Figure: residuals plot; vertical axis "Residuals", ticks from -1000 to 500]
The residuals are symmetrically distributed around zero in the first section of the plot. However,
the second and third sections are not symmetrical above and below zero. Since the residuals show
structure/curvature, we reject the null hypothesis and accept the alternative: the model is misspecified.
According to all three tests, the model is not well specified. There is clear evidence of non-linearity,
and conclusions drawn from this model would be distorted and biased, with inaccurate inferences.
Remedies
1) Adding/removing variables
. reg pr1 pr17 pr23 pr35 cpi ffr sent
By removing the insignificant variable “pr1vol” (traded volume) and adding “sent” (Consumer
sentiment index), the F-statistic increased substantially from 1557 to 2015. R-squared also rose from
89% to 91%, indicating that the model improved. Every variable's t-value and p-value is now
significant.
2) Including Polynomial or interaction terms
. gen pr17sq= pr17^2
(493 missing values generated)
The F-statistic increased only slightly, from 1557 to 1609, and all variables except pr1vol
became significant.
. reg pr1 pr17sq pr23 pr35 cpi ffr
Removing pr1vol increased the F-statistic further, and all variables became significant.
. reg pr1 pr1vol pr17 pr23 pr35 cpi ffr c.pr1#c.cpi
The interaction term c.pr1#c.cpi multiplies the continuous variable pr1 by the continuous variable cpi.
This term appears to improve the model dramatically: F is 99999, R-squared shows 99% of the
variability in pr1 explained, and its t-value is very high. However, pr1 is also the dependent
variable. This term therefore puts pr1 on both sides of the equation, so the near-perfect fit is
mechanical rather than a genuine improvement.
The c. prefix tells Stata: “These are continuous variables; do not create categorical (dummy)
interactions.”
Meaning:
. gen log_pr35 = log( pr35)
(471 missing values generated)
Probit, like logit (logistic regression), is a non-linear model for categorical outcomes. It therefore
does not apply to this model, whose dependent variable pr1 is continuous.
2. Independence of Errors
Ho: There is no serial correlation (autocorrelation) in the errors. Past errors do not
predict present errors.
H1: There is serial correlation (autocorrelation) in the errors. Present errors tend to
depend on past errors.
Tests
1) Durbin-Watson: tests for first-order (AR(1)) autocorrelation only, i.e., a single lag
. estat dwatson
The d-statistic is 0.04. Values between 0 and 2 indicate positive autocorrelation, and a value this
close to 0 indicates very strong positive autocorrelation in the residuals, not an absence of
autocorrelation (using d ≈ 2(1 - ρ), d = 0.04 implies ρ ≈ 0.98).
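The reading of d can be checked with the formula itself. d divides the sum of squared changes in successive residuals by the residual sum of squares, so residuals that move slowly (strong positive autocorrelation) push d towards 0. The sketch below is a minimal pure-Python version. `implied_rho` is a hypothetical helper for the d ≈ 2(1 - ρ) approximation.

```python
def durbin_watson(resid):
    """Durbin-Watson d: sum of squared successive differences / residual sum of squares."""
    if len(resid) < 2:
        raise ValueError("need at least two residuals")
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def implied_rho(d):
    """Rough first-order autocorrelation implied by d, from d ≈ 2(1 - rho)."""
    return 1 - d / 2


print(durbin_watson([1, -1, 1, -1]))  # residuals flip sign every period: d above 2
print(durbin_watson([1, 1, 1, 1]))    # residuals never change: d = 0
print(implied_rho(0.04))              # the d-statistic reported above
```

The implied ρ of about 0.98 is in line with the rho of 0.98 that prais estimates further down.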
2) Breusch-Godfrey LM test
lags(p) chi2 df Prob > chi2
1 845.356 1 0.0000
P-value = 0. Reject the null hypothesis and accept the alternative: the residuals are autocorrelated,
and past errors influence present errors.
. estat bgodfrey, lags(1/4)
lags(p) chi2 df Prob > chi2
1 845.356 1 0.0000
2 845.576 2 0.0000
3 949.908 3 0.0000
4 974.490 4 0.0000
The test covers lags 1 to 4, and the p-value for each is 0.0000, confirming that the residuals are
autocorrelated. We reject the null and accept the alternative: errors from previous periods help
predict the current period's error, reflecting strong persistence in the series.
Remedies
1) Adding a lag of the dependent variable
. reg pr1 L.pr1 pr1vol pr17 pr23 pr35 cpi ffr, vce(robust)
Robust
pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]
pr1
L1. .971071 .0102461 94.78 0.000 .9509612 .9911808
Adding a 1-period lag of pr1 substantially improved the model: the F-statistic rose to 22,876 and
R-squared rose to 99%. The t-value (94.78) and p-value (0.000) of L.pr1 also show that the previous
period's value of pr1 strongly influences the current period's value of pr1.
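Stata's L. operator works on the tsset time variable (here date, with a delta of 1 day), not on row order. A lag is therefore missing whenever the previous calendar day has no observation. The sketch below is a hypothetical pure-Python helper that mimics that behaviour. It assumes the dataset has no rows for weekends or holidays. That layout is an assumption, since the data themselves are not shown.

```python
from datetime import date, timedelta


def lag1(series):
    """One-period lag keyed on the calendar date (delta = 1 day).

    series maps date -> value. If the previous day has no row, the lag is None
    (missing), as with a time-series lag on daily tsset data.
    """
    return {d: series.get(d - timedelta(days=1)) for d in series}


prices = {date(2021, 1, 1): 10.0,   # Friday
          date(2021, 1, 4): 12.0,   # Monday: no Sunday row, so its lag is missing
          date(2021, 1, 5): 13.0}   # Tuesday: lag is Monday's value
print(lag1(prices))
```

With trading-day data, every Monday (and every day after a holiday) loses its lag. This reduces the estimation sample of a regression that includes L.pr1.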
. reg pr1 L(1/2).pr1 pr1vol pr17 pr23 pr35 cpi ffr, vce(robust)
Robust
pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]
pr1
L1. .9299759 .0870672 10.68 0.000 .7590015 1.10095
L2. .0388219 .0842142 0.46 0.645 -.1265502 .204194
Adding a second lag of pr1 did not improve the model: the F-statistic is lower than with a single lag.
The t-values and p-values also show that only the 1-period lag is significant (p = 0.000), and the
2-period lag is not (p = 0.645). So only the previous period's value influences pr1, not the value
from two periods before.
. reg pr1 L.pr1 L(0/2).pr1vol L(0/1).pr35 pr17 pr23 cpi L(0/2).ffr, robust
Robust
pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]
pr1
L1. .9688955 .0099521 97.36 0.000 .9493625 .9884286
pr1vol
--. 2.06e-08 6.69e-07 0.03 0.975 -1.29e-06 1.33e-06
L1. 1.18e-07 6.81e-07 0.17 0.862 -1.22e-06 1.46e-06
L2. -8.38e-07 7.11e-07 -1.18 0.239 -2.23e-06 5.58e-07
pr35
--. 3.716762 1.500399 2.48 0.013 .7719268 6.661597
L1. -4.044794 1.569698 -2.58 0.010 -7.125642 -.9639455
ffr
--. -88.81447 36.88618 -2.41 0.016 -161.211 -16.41793
L1. 91.45081 42.86266 2.13 0.033 7.324231 175.5774
L2. -3.697402 23.88526 -0.15 0.877 -50.57702 43.18222
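This specification did not work. None of the pr1vol terms is significant, and the second lag of ffr is
insignificant (p = 0.877). In addition, the current and lagged coefficients of pr35 and ffr have
opposite signs and similar sizes, so they largely offset each other.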
3) Prais-Winsten
. prais pr1 pr1vol pr17 pr23 pr35 cpi ffr
rho .9822682
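The model improved according to the F-statistic and R-squared. The transformed Durbin-Watson
statistic is higher than the original 0.04. This is not a problem: d values closer to 2 indicate weaker
autocorrelation, so the increase shows that the Prais-Winsten transformation reduced the positive
autocorrelation, which is what this remedy is meant to do.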
4) Cochrane-Orcutt
. prais pr1 pr1vol pr17 pr23 pr35 cpi ffr,corc
rho .9968243
The overall model worsened, and the d-statistic is lower than with Prais-Winsten. This is probably
because Cochrane-Orcutt drops the first observation, whereas Prais-Winsten keeps it.
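The only mechanical difference between the two remedies is the treatment of the first observation. The sketch below is a minimal single-step version. It is a simplification: by default, prais iterates the estimate of rho and the transformation until convergence. The steps are:

1. Estimate rho by regressing the OLS residuals on their own lag, without a constant.
2. Quasi-difference every series as x_t - rho*x_{t-1}.
3. Either rescale the first value by sqrt(1 - rho^2) (Prais-Winsten) or drop it (Cochrane-Orcutt).

The function names are hypothetical.

```python
import math


def estimate_rho(resid):
    """AR(1) coefficient from OLS residuals: regress e_t on e_{t-1} without a constant."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den


def quasi_difference(x, rho, keep_first=True):
    """Transform x_t -> x_t - rho * x_{t-1}.

    keep_first=True : Prais-Winsten, first value kept and rescaled by sqrt(1 - rho^2)
    keep_first=False: Cochrane-Orcutt, first observation dropped
    """
    out = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    if keep_first:
        out.insert(0, math.sqrt(1 - rho ** 2) * x[0])
    return out


rho = estimate_rho([1, 0.5, 0.25, 0.125])
print(rho)
print(quasi_difference([1.0, 2.0, 3.0], rho))                    # Prais-Winsten
print(quasi_difference([1.0, 2.0, 3.0], rho, keep_first=False))  # Cochrane-Orcutt
```

The same transformation is applied to pr1, to each regressor, and to the constant (a column of ones), and OLS is then run on the transformed data. With rho this close to 1, the rescaled first observation gets very little weight.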
5) ARIMA
ARIMA regression
OPG
pr1 Coef. Std. Err. z P>|z| [95% Conf. Interval]
pr1
pr1vol -2.89e-08 3.86e-07 -0.07 0.940 -7.85e-07 7.27e-07
pr17 .0978586 .0431353 2.27 0.023 .013315 .1824021
pr23 .0103066 .000669 15.41 0.000 .0089954 .0116177
pr35 3.105158 .6061175 5.12 0.000 1.91719 4.293126
cpi 2.020064 4.811579 0.42 0.675 -7.410458 11.45059
ffr -69.31445 18.88191 -3.67 0.000 -106.3223 -32.30659
_cons 3344.671 1421.074 2.35 0.019 559.4169 6129.925
ARMA
ar
L1. .9974239 .0022842 436.67 0.000 .992947 1.001901
Note: The test of the variance against zero is one sided, and the two-sided
confidence interval is truncated at zero.
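With AR(1) errors, pr17, pr23, pr35 and ffr are significant at the 5% level, while pr1vol (p = 0.940)
and cpi (p = 0.675) are not. The AR(1) coefficient is 0.997, and its 95% confidence interval (0.993 to
1.002) includes 1. The errors are therefore extremely persistent, close to a unit root.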
ARIMA regression
OPG
pr1 Coef. Std. Err. z P>|z| [95% Conf. Interval]
pr1
pr1vol 2.88e-08 4.22e-07 0.07 0.946 -7.98e-07 8.56e-07
pr17 .1089123 .0452624 2.41 0.016 .0201995 .197625
pr23 .0101711 .0007284 13.96 0.000 .0087435 .0115986
pr35 3.174707 .6345796 5.00 0.000 1.930953 4.41846
cpi 1.43755 4.815993 0.30 0.765 -8.001622 10.87672
ffr -66.3247 19.36738 -3.42 0.001 -104.2841 -28.36534
_cons 3488.328 1434.181 2.43 0.015 677.385 6299.27
ARMA
ar
L1. 1.193346 .2039139 5.85 0.000 .7936818 1.59301
L2. -.1946532 .2034065 -0.96 0.339 -.5933227 .2040162
ma
L1. -.3353107 .2022208 -1.66 0.097 -.7316563 .0610349
Note: The test of the variance against zero is one sided, and the two-sided
confidence interval is truncated at zero.
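Adding a second AR lag and an MA(1) term does not help. The second AR lag is insignificant
(p = 0.339), and the MA(1) term is significant only at the 10% level (p = 0.097). The regressor
coefficients barely change from the AR(1) model, so the simpler AR(1) specification appears sufficient.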
3. No perfect Multicollinearity
H0: There is no harmful multicollinearity. Independent variables are not interdependent.
H1: There is harmful multicollinearity. Independent variables are interdependent.
Tests
1) Variance Inflation Factor (vif)
. vif
The VIF values are well above 4, the cutoff used here. This means there is harmful multicollinearity,
which inflates the standard errors. Reject the null and accept the alternative.
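VIF_j equals 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. Equivalently, the VIFs are the diagonal of the inverse of the predictors' correlation matrix. The sketch below is a minimal pure-Python version that uses the inverse-matrix route with hand-rolled Gauss-Jordan elimination. The function names are hypothetical. Stata's vif works from the regression's own estimation sample, so its numbers will differ from anything computed from a pwcorr table.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]           # scale pivot row to 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(corr):
    """VIF of each predictor: diagonal of the inverse of the predictors' correlation matrix."""
    inv = invert(corr)
    return [inv[i][i] for i in range(len(corr))]


# cpi and ffr alone, with the 0.8938 correlation from the pairwise table in this section.
print(vifs([[1, 0.8938], [0.8938, 1]]))
```

With only two predictors, each VIF reduces to 1 / (1 - r^2). For r = 0.8938 this is about 4.97, already above the cutoff of 4.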
2) Pairwise Correlation
🔹 What does pairwise correlation show?
Pairwise correlation checks the strength of the linear relationship between two variables (X
and Y) at a time.
Value range: –1 to +1
+1 → perfectly positive linear relationship
–1 → perfectly negative linear relationship
0 → no linear relationship
✔ If two predictors have a near-perfect correlation (e.g., > 0.95 or < –0.95),
they carry almost the same information → OLS cannot separate their effects.
In these notes, a pairwise correlation above 0.5 (50%) between predictors is treated as a potential problem.
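The coefficient described above is the Pearson correlation. pwcorr computes each pair from all observations where both variables are non-missing (pairwise deletion), so different cells of the table can rest on different numbers of observations. The sketch below is a minimal pure-Python version. It uses `None` to stand in for a missing value, and the function name is hypothetical.

```python
import math


def pairwise_corr(x, y):
    """Pearson correlation using only observations where both x and y are non-missing
    (None marks a missing value), i.e. pairwise deletion."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


print(pairwise_corr([1, 2, 3], [2, 4, 6]))               # perfectly positive
print(pairwise_corr([1, 2, 3], [3, 2, 1]))               # perfectly negative
print(pairwise_corr([1, None, 3, 4], [2, 5, None, 8]))   # uses only the 2 complete pairs
```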
. pwcorr pr1 pr1vol pr17 pr23 pr35 cpi ffr, star(3)
Columns (left to right): pr1 pr1vol pr17 pr23 pr35 cpi ffr
pr1 1.0000
pr1vol 0.0545 1.0000
pr17 0.8880* 0.0455 1.0000
pr23 0.8947* 0.0690* 0.8668* 1.0000
pr35 -0.2993* -0.0720* -0.3480* -0.4180* 1.0000
cpi 0.6701* 0.0220 0.7087* 0.4191* 0.0602 1.0000
ffr 0.4543* -0.0002 0.4749* 0.1925* -0.1031* 0.8938* 1.0000
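We can see that pr17 is 89% correlated with pr1 (0.8880), and pr23 is also 89% correlated with pr1
(0.8947). For multicollinearity, however, what matters is the correlation among the independent
variables:
- ffr and cpi are 89% correlated (0.8938).
- pr17 and pr23 are 87% correlated (0.8668).
- pr17 and cpi are 71% correlated (0.7087).
Highly correlated predictors make it difficult for the regression to separate their effects, so their
coefficients become unreliable. We may think something is significant when it is not.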
Pr17 was not very significant in the initial regression results, so we could remove it as a
remedy.
Remedies
1) Drop highly correlated variables
. reg pr1 pr1vol pr23 pr35 cpi
pr1 1.0000
pr1vol 0.0545 1.0000
pr23 0.8947 0.0690 1.0000
pr35 -0.2993 -0.0720 -0.4180 1.0000
cpi 0.6701 0.0220 0.4191 0.0602 1.0000
. vif
Removing the variables pr17 and ffr improved the model and reduced the VIF below 4, meaning
we eliminated the harmful multicollinearity.
. vif
Using the log of pr23 (log_pr23) improved the model, but the VIF is still above 4, so multicollinearity remains.
. reg pr1 pr1vol log_pr23 pr35 cpi
. vif
Dropping pr17 and ffr and replacing pr23 with its log (log_pr23) improved the model and reduced
the VIF below 4.
4. Homoskedasticity
Ho: The variance of the errors is constant. The errors are homoskedastic.
H1: The variance of the errors is not constant. The errors are heteroskedastic.
Tests
1) Breusch-Pagan / Cook-Weisberg test
. estat hettest
chi2(1) = 22.63
Prob > chi2 = 0.0000
p = 0.0000. Reject the null and accept the alternative hypothesis: the variance of the errors is not
constant, so the homoskedasticity assumption is not satisfied.
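The logic of the test is to ask whether the squared residuals move with the fitted values. The sketch below is a minimal pure-Python version of the studentized (Koenker) variant: LM = n * R^2 from regressing e^2 on the fitted values, compared with a chi-squared(1) distribution. Note that estat hettest's default statistic assumes normal errors. As I understand the Stata documentation, the n * R^2 version is what its iid option reports, so this sketch will not reproduce the 22.63 above. The function names are hypothetical.

```python
import math


def bp_koenker(resid, fitted):
    """Studentized Breusch-Pagan (Koenker) LM statistic: n * R^2 from regressing the
    squared residuals on the fitted values (chi-squared with 1 df under H0)."""
    n = len(resid)
    e2 = [e * e for e in resid]
    m_e, m_f = sum(e2) / n, sum(fitted) / n
    sxy = sum((a - m_e) * (b - m_f) for a, b in zip(e2, fitted))
    sxx = sum((a - m_e) ** 2 for a in e2)
    syy = sum((b - m_f) ** 2 for b in fitted)
    r2 = sxy * sxy / (sxx * syy)   # R^2 of a one-regressor OLS = squared correlation
    return n * r2


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-squared(1) statistic: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2))


lm = bp_koenker([1, 2 ** 0.5, 3 ** 0.5, 2], [1, 2, 3, 4])  # e^2 rises one-for-one with fitted
print(lm, chi2_1df_pvalue(lm))
```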
2) White test
. estat imtest, white
chi2(27) = 531.40
Prob > chi2 = 0.0000
3) Rvf plot (visual check)
[Figure: residual-versus-fitted plot; vertical axis "Residuals", ticks from -1000 to 500]
Remedies
1) Huber-White robust standard errors
. reg pr1 pr1vol pr17 pr23 pr35 cpi ffr, vce(robust)
Robust
pr1 Coef. Std. Err. t P>|t| [95% Conf. Interval]
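vce(robust) keeps the OLS coefficients and replaces only their standard errors with Huber-White "sandwich" standard errors. For regress, Stata scales these by the small-sample factor n/(n - k), often called HC1. The sketch below is a minimal pure-Python version for a one-regressor model on toy data. It is an illustration of the formula, not the pr1 regression. The function name is hypothetical.

```python
def robust_slope_se(x, y):
    """Slope of y = b0 + b1*x with a Huber-White (HC1) standard error.

    HC0 variance: sum((x_i - xbar)^2 * e_i^2) / Sxx^2
    HC1 multiplies HC0 by n / (n - k), with k = 2 estimated coefficients here.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    hc0 = sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid)) / sxx ** 2
    hc1 = hc0 * n / (n - 2)
    return b1, hc1 ** 0.5


print(robust_slope_se([1, 2, 3, 4], [1, 3, 2, 5]))
```

Each squared residual is weighted by how far its x lies from the mean. Large errors at extreme x values therefore widen the standard error, which is the point of the correction under heteroskedasticity.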
2) Transform (Use logs)
. reg pr1 pr1vol log_pr23 pr35 cpi
. rvfplot, yline(0)
[Figure: residual-versus-fitted plot after the log transformation; vertical axis "Residuals", ticks from -1000 to 500]
5. Normality of Errors
Ho: Residuals are normally distributed
H1: Residuals are not normally distributed; they show skewness and/or excess kurtosis.
Tests
1) Skewness and kurtosis test (sktest, a Jarque-Bera-type test)
. sktest ehat
Skewness: p-value = 0.0005, very close to zero, so we reject the null and accept the alternative.
The residuals are skewed.
Kurtosis: p-value = 0.2730, which is more than 0.05, so we fail to reject the null. There is no
evidence of excess kurtosis.
Joint test: p-value = 0.0020. Reject the null and accept the alternative: the residuals are not
normally distributed.
The z-value is greater than 1.67 and p = 0.0000. Reject the null and accept the alternative: the
residuals are not normally distributed.
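sktest uses an adjusted skewness/kurtosis test rather than the textbook Jarque-Bera formula, so its numbers will not match exactly. However, both combine the same two ingredients. The sketch below is a minimal pure-Python version of the classic Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4), where S and K are computed from divide-by-n central moments. Its chi-squared(2) p-value has the closed form exp(-JB/2). The function name is hypothetical.

```python
import math


def jarque_bera(resid):
    """Jarque-Bera statistic and its chi-squared(2) p-value."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5          # 0 for a symmetric distribution
    kurt = m4 / m2 ** 2            # 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)    # upper tail of chi-squared with 2 df
    return jb, p_value


print(jarque_bera([-1, 1, -1, 1]))  # symmetric (no skewness) but far from normal kurtosis
```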
3) Q-Q plot
[Figure: Q-Q plot of the residuals; axis "Residuals", ticks from -1000 to 1000]
Remedies
1) Use a larger sample size if more data are available
2) Use vce(robust) to correct the standard errors
3) Apply transformations to the dependent variable, e.g.:
Log()
L.()
6. Outliers
Ho: No influential outlier observations exist
H1: A few influential outliers exist
Tests
Non-linear Regression Output Analysis
When the dependent Y variable is categorical (discrete) rather than continuous
Testable Hypothesis
Null Hypothesis
Ho: All the coefficients of the independent variables are jointly equal to zero. The
independent variables as a group do not have an effect on the probability of Y. The dependent
variable cannot be predicted from any of the independent variables.
Alternate Hypothesis
H1: At least one of the coefficients is different from zero. At least one of the independent
variables affects the probability of Y. At least one variable matters and can be used to predict the
dependent variable.
. tab pr5up
Ordinal
. xtile pr5ord= pr5, n(5)
. tab pr5ord
5 quantiles
of pr5 Freq. Percent Cum.
LR chi²(6) = 11.59
This is the "overall significance test" of the model.
It tests:
$H_0: \beta_X = 0$ for all $X$
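The LR statistic compares the full model's log-likelihood with that of a constant-only model: LR = 2(lnL_full - lnL_null). Under H0 it follows a chi-squared distribution with one degree of freedom per slope coefficient, here 6. The sketch below is a minimal pure-Python version. For an even number of degrees of freedom, the chi-squared tail probability has a closed form, so no statistics library is needed. The same two log-likelihoods also give McFadden's pseudo R², which is the one logit reports. The function names are hypothetical.

```python
import math


def lr_statistic(ll_full, ll_null):
    """Likelihood-ratio chi-squared: 2 * (lnL of the full model - lnL of the constant-only model)."""
    return 2 * (ll_full - ll_null)


def mcfadden_pseudo_r2(ll_full, ll_null):
    """McFadden pseudo R-squared: 1 - lnL_full / lnL_null."""
    return 1 - ll_full / ll_null


def chi2_sf_even_df(stat, df):
    """Upper-tail chi-squared probability for an even number of degrees of freedom:
    P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


print(chi2_sf_even_df(11.59, 6))  # p-value for the LR chi2(6) = 11.59 reported above
```

The p-value comes out at roughly 0.07. This is above 0.05, so the regressors are not jointly significant at the 5% level, matching the 0.07 reported for the probit version later in these notes.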
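Pseudo R² = 0.0074
Extremely small.
Interpretation: McFadden's pseudo R² compares the model's log-likelihood with that of a model with
only a constant. A value of 0.0074 means the independent variables improve the fit by less than 1%,
so they explain almost nothing about whether returns are positive. This is very common in financial
return prediction models.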
In a logit model, the coefficient table only shows each variable's significance and the direction of
its effect on the dependent variable. The size of the effect on the probability comes from the
marginal effects.
Pr1vol: the effect is essentially zero. p = 0.643, so it is an insignificant variable that does not predict the
return direction (whether returns are positive or negative).
Pr17: the negative coefficient means higher pr17 lowers the chance of positive returns, but p = 0.935
means pr17 is entirely insignificant and has no detectable effect.
Pr23: the positive coefficient increases the probability of positive returns. p = 0.074, which is not
significant at the 5% level but is significant at the 10% level. The effect is slight, and the coefficient is very small.
Pr35: negative coefficient, p = 0.952. Entirely insignificant; pr35 does not help predict the outcome.
Cpi: negative coefficient, so higher cpi slightly reduces the probability of positive returns. p = 0.289,
which is not significant at the 5% or 10% level. Inflation does not meaningfully drive short-term return
direction.
Ffr: positive coefficient, so a higher interest rate increases the probability of positive returns, but
p = 0.185, which is not significant.
. margins, dydx( pr1vol pr17 pr23 pr35 cpi ffr) atmeans
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
Pr1vol: “1 unit increase in pr1vol increases the probability of positive returns by 0.000000264
units”. Z-value is less than 1.67 and p=0.643. Pr1vol is completely insignificant and has no
effect on the probability of positive returns.
Pr17: “1 unit increase in pr17 decreases the probability of positive returns by 0.000665 units”. Z
value less than 1.67 and p-value almost equal to 1. Totally insignificant variable and has no
effect on probability of positive returns
Pr23: “1 unit increase in pr23 increases the probability of positive returns by 0.000245”. The
z-value exceeds the 1.67 rule of thumb used here, which is close to the 10% two-sided critical value
of 1.645. The p-value is not significant at the 5% level but is significant at the 10% level, since it is
below 0.10. This is the only variable with even a slight effect on the dependent variable, and the
effect is still very small.
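The marginal effects are much smaller than the logit coefficients because margins, atmeans scales each coefficient by the slope of the logistic curve at the regressor means: dP/dx = beta * Lambda(xb) * (1 - Lambda(xb)). That scale factor is never larger than 0.25. The sketch below is a minimal pure-Python version. The coefficient and index values are hypothetical, not estimates from this model.

```python
import math


def logistic(z):
    """Logistic CDF, Lambda(z)."""
    return 1 / (1 + math.exp(-z))


def logit_dydx_at_means(beta, xb_at_means, var):
    """Marginal effect of `var` on Pr(y = 1) at the means:
    beta_var * Lambda(xb) * (1 - Lambda(xb)), with xb evaluated at the regressor means."""
    p = logistic(xb_at_means)
    return beta[var] * p * (1 - p)


beta = {"pr23": 0.001}                        # hypothetical coefficient
print(logit_dydx_at_means(beta, 0.0, "pr23"))  # predicted probability 0.5: largest scale factor
print(logit_dydx_at_means(beta, 2.0, "pr23"))  # away from 0.5 the effect shrinks
```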
. estat classification
True
Classified D ~D Total
It evaluates how well your logit model classifies the outcome pr5up (0 or 1) using a default
cutoff of 0.5.
If predicted probability ≥ 0.5 → Model predicts 1
If predicted probability < 0.5 → Model predicts 0
It shows the share of outcomes (0 or 1) that the model classifies correctly.
Only 55.63% of observations are correctly classified, far from 100%, so the model does not reliably
distinguish positive from negative returns.
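The "correctly classified" figure is computed by turning each predicted probability into a 0/1 prediction at the 0.5 cutoff (≥ 0.5 → 1) and counting matches with the observed outcome. The sketch below is a minimal pure-Python version on toy numbers. The function name is hypothetical.

```python
def percent_correctly_classified(probs, outcomes, cutoff=0.5):
    """Share (in %) of observations where the predicted class (1 if p >= cutoff, else 0)
    matches the observed 0/1 outcome."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    correct = sum((p >= cutoff) == bool(y) for p, y in zip(probs, outcomes))
    return 100 * correct / len(probs)


print(percent_correctly_classified([0.6, 0.4, 0.7, 0.2], [1, 0, 0, 0]))
```

A useful benchmark is the share of the more common outcome. A "model" that always predicts that outcome achieves this share with no regressors at all, so 55.63% should be judged against it.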
. lroc
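This shows how well the model can separate the two groups in a binary outcome. An area under the
ROC curve (AUC) of 0.5 means the model separates them no better than chance.
[Figure: ROC curve; vertical axis "Sensitivity", ticks from 0.00 to 1.00]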
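The area that lroc reports has a direct interpretation. It is the probability that a randomly chosen observation with y = 1 gets a higher predicted probability than a randomly chosen observation with y = 0, with ties counting one half. The sketch below is a minimal pure-Python version that compares all pairs, which is fine for illustration but slow for large samples. The function name is hypothetical.

```python
def auc(probs, outcomes):
    """Area under the ROC curve: share of (positive, negative) pairs in which the
    positive case has the higher predicted probability; ties count one half."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # perfect separation
print(auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # no separation: chance level
```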
Ordinal logit
. ologit pr5ord pr1vol pr17 pr23 pr35 cpi ffr
. margins, dydx (pr1vol pr17 pr23 pr35 cpi ffr) atmeans
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
pr1vol
_predict
1 -1.41e-09 3.28e-09 -0.43 0.668 -7.84e-09 5.03e-09
2 -6.85e-10 1.60e-09 -0.43 0.668 -3.81e-09 2.45e-09
3 3.07e-11 8.56e-11 0.36 0.720 -1.37e-10 1.98e-10
4 6.98e-10 1.63e-09 0.43 0.668 -2.49e-09 3.89e-09
5 1.36e-09 3.18e-09 0.43 0.668 -4.87e-09 7.59e-09
pr17
_predict
1 -1.32e-06 .0000661 -0.02 0.984 -.0001308 .0001282
2 -6.41e-07 .0000321 -0.02 0.984 -.0000636 .0000623
3 2.88e-08 1.44e-06 0.02 0.984 -2.80e-06 2.86e-06
4 6.53e-07 .0000328 0.02 0.984 -.0000636 .0000649
5 1.28e-06 .000064 0.02 0.984 -.0001242 .0001267
pr23
_predict
1 -1.40e-06 7.75e-07 -1.80 0.072 -2.92e-06 1.23e-07
2 -6.80e-07 3.80e-07 -1.79 0.074 -1.42e-06 6.56e-08
3 3.05e-08 4.96e-08 0.62 0.538 -6.66e-08 1.28e-07
4 6.93e-07 3.88e-07 1.79 0.074 -6.77e-08 1.45e-06
5 1.35e-06 7.47e-07 1.81 0.070 -1.12e-07 2.82e-06
pr35
_predict
1 -.0003166 .0014609 -0.22 0.828 -.0031798 .0025467
2 -.000154 .0007105 -0.22 0.828 -.0015465 .0012386
3 6.91e-06 .0000337 0.21 0.837 -.0000591 .0000729
4 .000157 .0007244 0.22 0.828 -.0012628 .0015767
5 .0003066 .0014149 0.22 0.828 -.0024666 .0030799
cpi
_predict
1 .0036972 .0028971 1.28 0.202 -.001981 .0093755
2 .0017982 .001414 1.27 0.203 -.0009731 .0045696
3 -.0000807 .0001395 -0.58 0.563 -.0003542 .0001927
4 -.0018333 .001442 -1.27 0.204 -.0046596 .000993
5 -.0035814 .0027978 -1.28 0.201 -.0090651 .0019022
ffr
_predict
1 -.0263135 .0171487 -1.53 0.125 -.0599244 .0072974
2 -.0127981 .0083814 -1.53 0.127 -.0292252 .0036291
3 .0005744 .000961 0.60 0.550 -.0013091 .0024579
4 .0130478 .0085503 1.53 0.127 -.0037106 .0298061
5 .0254894 .0165366 1.54 0.123 -.0069218 .0579006
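In each variable's margins block above, the five dy/dx values sum to zero (up to rounding). For ffr, for example, -.0263135 - .0127981 + .0005744 + .0130478 + .0254894 = 0. This happens because the five outcome probabilities always sum to 1, so any shift into higher categories must come out of lower ones. The sketch below is a minimal pure-Python version of the ordered-logit probabilities and marginal effects. The cutpoints (the /cut values in ologit output), index value, and coefficient are hypothetical, and the function names are too.

```python
import math


def logistic(z):
    """Logistic CDF."""
    return 1 / (1 + math.exp(-z))


def logistic_density(z):
    """Logistic density f(z) = F(z) * (1 - F(z))."""
    p = logistic(z)
    return p * (1 - p)


def ologit_probs(xb, cuts):
    """Pr(y = j) for j = 1..J in an ordered logit with increasing cutpoints:
    Pr(y = j) = F(cut_j - xb) - F(cut_{j-1} - xb), where F(-inf) = 0 and F(+inf) = 1."""
    cdf = [0.0] + [logistic(c - xb) for c in cuts] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cuts) + 1)]


def ologit_dydx(beta_k, xb, cuts):
    """dPr(y = j)/dx_k = beta_k * (f(cut_{j-1} - xb) - f(cut_j - xb)), with f(+-inf) = 0."""
    dens = [0.0] + [logistic_density(c - xb) for c in cuts] + [0.0]
    return [beta_k * (dens[j] - dens[j + 1]) for j in range(len(cuts) + 1)]


# Hypothetical cutpoints and index value, not estimates from this dataset.
cuts = [-1.5, -0.5, 0.5, 1.5]
print(ologit_probs(0.2, cuts))       # five probabilities summing to 1
print(ologit_dydx(0.3, 0.2, cuts))   # five marginal effects summing to 0
```

With a positive coefficient, the lowest category's probability falls and the highest category's probability rises. This is the same sign pattern as the ffr rows above.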
3) PROBIT models
Binary
. probit pr5up pr1vol pr17 pr23 pr35 cpi ffr
The LR chi² statistic for overall significance is 11.57, and its p-value is 0.07. This is more than
0.05, so we fail to reject the null hypothesis at the 5% level. There is no evidence that the
independent variables jointly predict whether returns are positive or negative, i.e., that they affect
the probability of Y.
Probit model for pr5up
True
Classified D ~D Total
This shows that the model correctly classified only 55.63% of the binary outcomes, so it is not a good
model and needs improvement.
. lroc
[Figure: ROC curve for the probit model]
Ordinal
. oprobit pr5ord pr1vol pr17 pr23 pr35 cpi ffr
. margins, dydx (pr1vol pr17 pr23 pr35 cpi ffr) atmeans
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
pr1vol
_predict
1 -1.76e-09 3.40e-09 -0.52 0.604 -8.41e-09 4.90e-09
2 -6.58e-10 1.27e-09 -0.52 0.605 -3.15e-09 1.84e-09
3 1.90e-11 5.55e-11 0.34 0.732 -8.98e-11 1.28e-10
4 6.63e-10 1.28e-09 0.52 0.605 -1.85e-09 3.17e-09
5 1.74e-09 3.35e-09 0.52 0.604 -4.83e-09 8.30e-09
pr17
_predict
1 1.29e-06 .0000676 0.02 0.985 -.0001311 .0001337
2 4.85e-07 .0000253 0.02 0.985 -.0000491 .0000501
3 -1.40e-08 7.30e-07 -0.02 0.985 -1.45e-06 1.42e-06
4 -4.88e-07 .0000255 -0.02 0.985 -.0000504 .0000494
5 -1.28e-06 .0000667 -0.02 0.985 -.000132 .0001294
pr23
_predict
1 -1.42e-06 8.10e-07 -1.75 0.080 -3.01e-06 1.67e-07
2 -5.32e-07 3.07e-07 -1.73 0.083 -1.13e-06 7.03e-08
3 1.53e-08 3.48e-08 0.44 0.660 -5.29e-08 8.36e-08
4 5.35e-07 3.08e-07 1.74 0.082 -6.85e-08 1.14e-06
5 1.40e-06 8.00e-07 1.75 0.080 -1.65e-07 2.97e-06
pr35
_predict
1 -.0005521 .0014375 -0.38 0.701 -.0033695 .0022653
2 -.0002067 .0005384 -0.38 0.701 -.001262 .0008486
3 5.96e-06 .0000203 0.29 0.769 -.0000339 .0000458
4 .000208 .0005416 0.38 0.701 -.0008536 .0012696
5 .0005449 .0014186 0.38 0.701 -.0022356 .0033253
cpi
_predict
1 .0038097 .0028883 1.32 0.187 -.0018513 .0094706
2 .0014262 .0010892 1.31 0.190 -.0007085 .003561
3 -.0000412 .0000958 -0.43 0.668 -.000229 .0001467
4 -.0014351 .0010927 -1.31 0.189 -.0035767 .0007065
5 -.0037597 .0028508 -1.32 0.187 -.009347 .0018277
ffr
_predict
1 -.0271213 .0170638 -1.59 0.112 -.0605657 .0063231
2 -.0101535 .0064551 -1.57 0.116 -.0228052 .0024983
3 .000293 .0006708 0.44 0.662 -.0010219 .0016078
4 .0102164 .0064681 1.58 0.114 -.0024608 .0228936
5 .0267654 .016842 1.59 0.112 -.0062444 .0597752