Journal of Forecasting, Vol. 9, 437-444 (1990)

Some Comparisons of the Relative Power of Simple Tests for Structural Change in Regression Models
MICHAEL BLEANEY
University of Nottingham, U.K.

ABSTRACT
The power of Chow, linear, predictive failure and cusum of squares tests
to detect structural change is compared in a two-variable random walk
model and a once-for-all parameter shift model. In each case the linear test
has greatest power, followed by the Chow test. It is suggested that the
linear test be used as the basic general test for structural change in time
series data, and tests of forecasting performance be confined to the last few
observations. Analysis of recursive residuals and recursive parameter
estimates should be regarded as forms of exploratory data analysis and tools
for understanding discrepancies with previous results rather than a basis
for formal tests of structural change.

KEY WORDS Cusum; OLS regression; Predictive failure; Recursive residuals; Structural stability

INTRODUCTION

Parameter constancy is widely recognized as an important attribute of an acceptable
econometric model. In practice it is nevertheless probable that instability of the parameters is
a major cause of poor out-of-sample model performance. In recent years the portfolio of
relatively simple test procedures for diagnosing structural change in a regression model has
expanded rapidly, though the most frequently used approach remains that of Chow (1960): i.e.
to split the data into two subsets at the suspected shift point and to test for significant reduction
in the residual sum of squares relative to the combined regression by using an F-test. The Chow
procedure may be expanded to allow for more than two subsets, and requires amendment in
the presence of heteroscedasticity (for a useful summary see Pesaran et al., 1985). More recent
research has added more general procedures for detecting structural change based on out-of-sample
forecasting performance (Hendry, 1979) and recursive residuals (Brown et al., 1975;
Dufour, 1982, 1986). Recursive residuals, in particular, are a rich source of information about
model performance, especially if examined in conjunction with recursive parameter estimates,
and tests based on them are becoming a standard feature of regression packages.
However, despite these advances, comparatively little research has been done into the relative
power of various tests. Some practitioners prefer to rely on a Chow test at a strategic point in
the data, others on examination of recursive residuals, and still others on retaining a few
observations to test out-of-sample forecasting performance. Yet virtually nothing is known
about the relative merits of these different procedures. This is disquieting, particularly when
it is realized that there is evidence to suggest that none of these tests has as much power as
we would like to think (Farley et al., 1975; Garbade, 1977). In this paper some Monte Carlo
experiments are carried out on a two-variable regression model in an attempt to gain insight
into some of these issues. It emerges that out-of-sample forecasting tests are relatively weak,
and that the most powerful general test of structural stability is one that is rarely used.

0277-6693/90/050437-08$05.00 Received August 1989
© 1990 by John Wiley & Sons, Ltd.

EXPERIMENTAL DESIGN

The principal tests were carried out on data generated artificially according to the random walk
model described in equations (1) to (4):

$$Y_t = a_t + b_t X_t + u_t, \qquad t = 1, 2, \ldots, n \tag{1}$$
$$a_t = a_{t-1} + v_t \tag{2}$$
$$b_t = b_{t-1} + w_t \tag{3}$$
$$a_1 = b_1 = 1 \tag{4}$$

The independent variable X and the error term u were generated randomly and independently
from a normal (0, 1) distribution. The innovations in the martingale process for the parameters
a and b (v and w, respectively) were likewise drawn from a normal distribution with mean zero,
but with variance that was inversely related to the sample size: for n = 40, they had a variance
of 0.01; for n = 100, a variance of 0.004. The model was then estimated by ordinary least
squares assuming constant parameters, and a number of diagnostic tests for structural change
applied.
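The random walk design in equations (1) to (4) can be sketched as a small NumPy simulator. This is a minimal sketch, not the author's code: the function name `simulate_random_walk` is hypothetical, and the default innovation variance of 0.4/n is an assumption chosen only because it reproduces the two values quoted in the text (0.01 for n = 40 and 0.004 for n = 100).

```python
import numpy as np


def simulate_random_walk(n, rng, innov_var=None):
    """Draw one sample from the random walk model, equations (1)-(4).

    Returns (y, x, a, b): the dependent variable, the regressor and the
    time paths of the intercept a_t and the slope b_t.
    """
    if innov_var is None:
        # Assumption: variance 0.4/n, which gives 0.01 at n = 40 and 0.004 at n = 100.
        innov_var = 0.4 / n
    sd = np.sqrt(innov_var)
    x = rng.standard_normal(n)        # X_t ~ N(0, 1), independent and stationary
    u = rng.standard_normal(n)        # u_t ~ N(0, 1)
    v = rng.normal(0.0, sd, size=n)   # innovations v_t to the intercept
    w = rng.normal(0.0, sd, size=n)   # innovations w_t to the slope
    v[0] = w[0] = 0.0                 # start exactly at a_1 = b_1 = 1
    a = 1.0 + np.cumsum(v)            # a_t = a_{t-1} + v_t
    b = 1.0 + np.cumsum(w)            # b_t = b_{t-1} + w_t
    y = a + b * x + u                 # Y_t = a_t + b_t X_t + u_t
    return y, x, a, b


rng = np.random.default_rng(0)
y, x, a, b = simulate_random_walk(40, rng)
```

The model is then fitted by OLS as if a and b were constant, which is what each of the tests below does under its null hypothesis.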
In order to check that the results were not strongly dependent on the type of parameter
instability selected, tests were also done on a ‘parameter shift’ model, in which a and b took
the value 1 for the first $n_1$ observations and the value 1.4 for the remainder.
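For the once-for-all shift alternative only the parameter paths change. A minimal sketch under the same assumptions (NumPy, hypothetical function name `simulate_shift`), with a and b equal to 1 up to observation $n_1$ and 1.4 afterwards:

```python
import numpy as np


def simulate_shift(n, n1, rng, new_value=1.4):
    """Draw one sample from the parameter shift model.

    a_t = b_t = 1 for t <= n1 and a_t = b_t = new_value for t > n1;
    X_t and u_t are N(0, 1) as in the random walk design.
    """
    t = np.arange(1, n + 1)
    a = np.where(t <= n1, 1.0, new_value)
    b = a.copy()
    x = rng.standard_normal(n)
    u = rng.standard_normal(n)
    y = a + b * x + u
    return y, x, a, b


# Shift after observation 80, as in the first pair of columns of Table II.
y, x, a, b = simulate_shift(100, 80, np.random.default_rng(1))
```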
The random walk formulation was intended to capture a feature which (in the author’s
belief) is characteristic of economic data: that because of gradual shifts in social attitudes and
preferences and other not easily quantified factors, parameters tend to drift over time. The
random walk is a convenient simple model that generates a wide variety of drift patterns, and
would therefore seem eminently suitable for this type of experiment. A further important
consideration is that a random walk does not appear a priori to be biased in favour of any
particular test. It was decided to make X stationary in order to prevent serial correlation of
the errors from acting as an indicator of parameter instability.
The following tests were used in the experiments: (1) a standard Chow test at the midpoint
of the data; (2) a test for linear trends in the coefficients; (3) a Chow predictive failure test;
(4) a Hendry predictive failure test; and (5) a cusum of squares test. A brief description of these
tests follows. For more detailed discussion readers should consult the references cited.

(1) The midpoint Chow test


This is a version of the well-known test introduced by Chow (1960) for a shift in coefficients
between two subsets of the data. The null hypothesis of equation (1) is tested against the
alternative:

$$Y = a + a'D + (b + b'D)X + u \tag{5}$$

where

$$D = \begin{cases} 0 & t = 1, 2, \ldots, n/2 \\ 1 & t = n/2 + 1, \ldots, n \end{cases}$$
Effectively, this is a joint test of the null hypothesis a' = b' = 0, which may be carried out with
the usual F-test for the addition of regressors. Thus if the residual sums of squares from
estimation of equations (1) and (5) are, respectively, $RSS_1$ and $RSS_5$, then the statistic

$$F_1 = \frac{(n-4)(RSS_1 - RSS_5)}{2\,RSS_5} \sim F(2, n-4)$$

or, more generally, for k regressors,

$$F_1 = \frac{(n-2k)(RSS_1 - RSS_5)}{k\,RSS_5} \sim F(k, n-2k)$$
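The midpoint Chow statistic $F_1$ can be computed directly from the two residual sums of squares. A minimal NumPy sketch for the two-variable case (k = 2); the helper names `rss` and `chow_midpoint_F` are hypothetical, and the result would still have to be compared with an F(k, n - 2k) critical value taken from a table or a statistics library.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def chow_midpoint_F(y, x):
    """F_1 for a shift in intercept and slope after observation n/2 (k = 2)."""
    n, k = len(y), 2
    ones = np.ones(n)
    d = (np.arange(1, n + 1) > n // 2).astype(float)   # D = 0 in first half, 1 in second
    X1 = np.column_stack([ones, x])                     # restricted: equation (1)
    X5 = np.column_stack([ones, d, x, d * x])           # unrestricted: equation (5)
    rss1, rss5 = rss(y, X1), rss(y, X5)
    return (n - 2 * k) * (rss1 - rss5) / (k * rss5)


rng = np.random.default_rng(2)
x = rng.standard_normal(40)
y = 1.0 + x + rng.standard_normal(40)
F1 = chow_midpoint_F(y, x)   # compare with the 5% point of F(2, 36)
```

Because equation (5) interacts D with every regressor, $RSS_5$ equals the sum of the residual sums of squares from separate regressions on the two halves, which is the more familiar way of writing the Chow test.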
(2) The linear test
This is a little-used test which is more powerful than the Chow test against smooth trends in
parameters and also against once-for-all parameter shifts if the hypothesized shift point is not
close to the true shift point (Farley et al., 1975). The alternative hypothesis specifies all
coefficients to be subject to a linear trend:
$$Y = a + a't + (b + b't)X + u \tag{6}$$

Once again the null hypothesis is a' = b' = 0, and the test statistic is identical to that for the
Chow test, with $RSS_6$ from equation (6) in place of $RSS_5$.
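The linear test differs from the midpoint Chow test only in the unrestricted regressors: the dummy D is replaced by the time index t. A minimal NumPy sketch (function names hypothetical; the `rss` helper is repeated so the block runs on its own):

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def linear_trend_F(y, x):
    """F statistic for linear trends in both coefficients, equation (6), k = 2."""
    n, k = len(y), 2
    t = np.arange(1, n + 1, dtype=float)
    X1 = np.column_stack([np.ones(n), x])             # constant parameters: equation (1)
    X6 = np.column_stack([np.ones(n), t, x, t * x])   # trending a and b: equation (6)
    rss1, rss6 = rss(y, X1), rss(y, X6)
    return (n - 2 * k) * (rss1 - rss6) / (k * rss6)   # compare with F(k, n - 2k)


rng = np.random.default_rng(3)
n = 100
t = np.arange(1, n + 1)
x = rng.standard_normal(n)
y = 1.0 + (1.0 + 0.05 * t) * x + rng.standard_normal(n)   # slope drifts from 1.05 to 6.0
F_lin = linear_trend_F(y, x)
```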

(3) The Chow test for predictive failure


This is the test proposed by Chow (1960) for the case where one of the data subsets contains
fewer than k observations, so that estimation of equation (5) would yield residuals that were
identically zero for the observations within that subset. If this subset contains m < k
observations and the residual sum of squares when these are omitted from the sample is $RSS_m$,
then the modified test statistic proposed by Chow was

$$F_2 = \frac{(n-m-k)(RSS_1 - RSS_m)}{m\,RSS_m} \sim F(m, n-m-k)$$

Like $F_1$, $F_2$ can be given a dummy variable interpretation. If equation (1) were estimated with
the addition of m dummy variables, each taking the value 1 in one of the m observations and
0 otherwise, the test for the joint hypothesis that all the dummy coefficients are zero is $F_2$. In
principle there is no reason why the use of $F_2$ should be restricted to cases where m < k, and
in these experiments the last 20% of the observations were used: i.e. m = 0.2n.
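The dummy variable interpretation of $F_2$ is easy to confirm numerically. The sketch below (NumPy; function names hypothetical) computes $F_2$ once from the two residual sums of squares and once by adding m observation-specific dummies to equation (1). The two numbers agree because the dummies fit the last m observations exactly, so the unrestricted residual sum of squares is just the residual sum of squares from the first n - m observations.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def chow_pf_F(y, X, m):
    """F_2 from RSS_1 (full sample) and RSS_m (last m observations omitted)."""
    n, k = X.shape
    rss_full = rss(y, X)
    rss_est = rss(y[: n - m], X[: n - m])
    return (n - m - k) * (rss_full - rss_est) / (m * rss_est)


def chow_pf_F_dummies(y, X, m):
    """Same statistic as an F-test on m dummies, one per forecast observation."""
    n, k = X.shape
    dummies = np.zeros((n, m))
    dummies[n - m:, :] = np.eye(m)     # dummy j is 1 only in row n - m + j (0-based)
    rss_r = rss(y, X)
    rss_u = rss(y, np.column_stack([X, dummies]))
    return (n - m - k) * (rss_r - rss_u) / (m * rss_u)


rng = np.random.default_rng(4)
n = 40
m = n // 5                             # last 20% of the sample, as in the experiments
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + x + rng.standard_normal(n)
F2_rss = chow_pf_F(y, X, m)
F2_dummy = chow_pf_F_dummies(y, X, m)
```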

(4) Hendry’s predictive failure test


In this test, introduced by Hendry (1979), the equation is estimated over the first n - m
observations and the remaining m observations are retained for testing forecasting
performance. Let the residual sum of squares from the regression on the first n - m
observations be S, and denote the sum of squares of the m forecasting errors as $S_m$. Then
under the null hypothesis of no structural change the statistic

$$H = \frac{(n-m-k)\,S_m}{S}$$

follows, approximately, the chi-square distribution with m degrees of freedom. As in the case of the Chow
predictive failure test, the last 20% of the data set was taken as the forecasting period in these
experiments.
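Hendry's statistic uses the raw forecast errors from the estimation-period coefficients. A minimal NumPy sketch (`hendry_H` is a hypothetical name); for m = 8 the 5% critical value of the chi-square distribution is 15.51.

```python
import numpy as np


def hendry_H(y, X, m):
    """Hendry's predictive failure statistic H = (n - m - k) S_m / S."""
    n, k = X.shape
    Xe, ye = X[: n - m], y[: n - m]              # estimation period
    Xf, yf = X[n - m:], y[n - m:]                # forecast period
    beta, *_ = np.linalg.lstsq(Xe, ye, rcond=None)
    s = float(np.sum((ye - Xe @ beta) ** 2))     # S: estimation-period RSS
    f = yf - Xf @ beta                           # m out-of-sample forecast errors
    s_m = float(f @ f)                           # S_m
    return (n - m - k) * s_m / s


rng = np.random.default_rng(5)
n, m = 40, 8
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + x + rng.standard_normal(n)
H = hendry_H(y, X, m)
reject = H > 15.51          # 5% point of chi-square with m = 8 degrees of freedom
```

On the same split, $RSS_1 - RSS_m$ equals the quadratic form $f'(I + X_f (X_e'X_e)^{-1} X_f')^{-1} f$ in the forecast errors f, where $X_e$ and $X_f$ are the estimation-period and forecast-period regressor matrices. Since that weighting matrix is no larger than the identity, H/m can never be smaller than $F_2$, consistent with the tendency of the Hendry test to reject more often noted in the Discussion.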

(5) The cusum of squares test (Brown et al., 1975)


This is the stronger of the two tests suggested by these authors. Let $RSS_m$ be the residual sum
of squares from equation (1) estimated over observations 1, 2, ..., m. Then the series
$RSS_m/RSS_n$ may be calculated for m = k + 1, ..., n, and follows a beta distribution. Critical
values for this statistic are given by Harvey (1981). This is effectively a test on the recursive
residuals, since $RSS_r$ is equal to the sum of squares of the first (r - k) recursive residuals.
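Recursive residuals and the cusum of squares path can be computed with a simple loop. The sketch below is a direct, unoptimized implementation (NumPy; function names hypothetical) that also lets one check the identity used in the text: the cumulative sum of the first r - k squared recursive residuals equals $RSS_r$. Under the null, $RSS_r/RSS_n$ has expected value (r - k)/(n - k), and the test of Brown et al. (1975) compares the path with bounds parallel to that line; the critical values themselves are not reproduced here.

```python
import numpy as np


def recursive_residuals(y, X):
    """Standardized one-step-ahead forecast errors w_r for r = k+1, ..., n."""
    n, k = X.shape
    w = np.empty(n - k)
    for i, r in enumerate(range(k, n)):          # r is the 0-based index of the new observation
        Xr, yr = X[:r], y[:r]                    # the first r observations
        xtx_inv = np.linalg.inv(Xr.T @ Xr)
        beta = xtx_inv @ Xr.T @ yr
        x_new = X[r]
        w[i] = (y[r] - x_new @ beta) / np.sqrt(1.0 + x_new @ xtx_inv @ x_new)
    return w


def cusum_of_squares(y, X):
    """RSS_r / RSS_n for r = k+1, ..., n, built from recursive residuals."""
    cum = np.cumsum(recursive_residuals(y, X) ** 2)   # cum[i] equals RSS over the first k+1+i obs
    return cum / cum[-1]


rng = np.random.default_rng(6)
n, k = 40, 2
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + x + rng.standard_normal(n)
s = cusum_of_squares(y, X)
expected_path = np.arange(1, n - k + 1) / (n - k)    # (r - k)/(n - k) under the null
```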

RESULTS

The results presented in Table I for tests (1) to (4) refer to 500 replications of the random walk
model for sample sizes of 40 and 100 observations. The cusum of squares test was carried out
only for a sample size of 40. As an indication of the likelihood that a random walk of the
assumed magnitude could lead to serious forecasting errors, the end-period parameter values
were assessed in comparison to the 95% confidence interval from the regression. For n = 40,
parameters ended outside such a 95% confidence interval on 43% of occasions, and for
n = 100, on 65% of occasions. The 95% confidence intervals included the true end-values of
both parameters in the equation on only 34% of occasions for n = 40 and 15% for n = 100.
Tests were evaluated according to the number of rejections of the null hypothesis at the 0.05
level. The ordering of the tests was the same for both sample sizes, with the linear test
performing best, followed by the midpoint Chow test, the Hendry predictive failure test and
the Chow predictive failure test, in that order. The power of the linear and midpoint Chow tests
improved strongly with increasing sample size, but this was not true of the predictive failure
tests; indeed the Chow predictive failure test did not increase in power at all. The cusum of
squares test was the weakest of all if based on forward recursions only. If backward as well
as forward recursions were performed, its power was similar to that of the Chow predictive
failure test (but the null would also be rejected more than 5% of the time). Other tests based
on recursive residuals such as the cusum test and the runs tests of Dufour (1986) were not
systematically examined but appeared to be even weaker.
Table I. Power of tests against structural change in random walk model

                                          Rejection of H0 at 0.05 level
Test                                      Number      Percentage
n = 40; 500 replications
  Chow midpoint                           120         24
  Linear                                  141         28
  Chow predictive failure (33-40)         56          11
  Hendry predictive failure (33-40)       103         21
  Cusum of squares forward                32          6
  Cusum of squares forward and back       53          11
n = 100; 500 replications
  Chow midpoint                           268         54
  Linear                                  314         63
  Chow predictive failure (81-100)        54          11
  Hendry predictive failure (81-100)      116         23

Note: Variance of random walk proportional to $n^{-1}$.

Table II presents the results of the same tests, plus two quartile Chow tests, in the context
of a parameter shift model (the cusum of squares test was omitted, but Garbade, 1977, has
shown its relative weakness in this context). A sample size of 100 was used, and the tests were
carried out twice: once with the parameter shift after observation 80, and once after
observation 65. Thus in the first case the predictive failure tests were carried out at the exact
point of the parameter shift, whereas in the second they were not. When the shift is after
observation 65 the results are qualitatively very similar to those obtained in the random walk
model, with the linear and midpoint Chow tests performing considerably better than the
predictive failure tests. When the shift is after observation 80, so that predictive failure tests are
carried out at the exact shift point, the Hendry test outperforms the linear and midpoint Chow
tests, but not a ‘nearby’ Chow test (Q3 Chow). The Chow predictive failure test outperforms
a midpoint Chow test in this case.

Table II. Power of tests against structural change in shift model (sample size of 100; 500 replications)

                           Rejection of H0 at the 0.05 level
                           Shift after obs. 80         Shift after obs. 65
Test                       Number    Percentage        Number    Percentage
Linear                     136       21                247       49
Q1 Chow                    33        7                 74        15
Midpoint Chow              70        14                190       38
Q3 Chow                    188       38                216       43
Chow p. f. (81-100)        82        16                46        9
Hendry p. f. (81-100)      160       32                96        19

Note: Q1 Chow and Q3 Chow tests: shift hypothesized to be after observations 25 and 75,
respectively. p. f. = predictive failure.

Two separate factors are at work here. The midpoint Chow and linear tests lose power as
the shift point moves further away from the midpoint of the data set (Farley et al., 1975), while
the predictive failure tests gain power as the shift point moves towards their point of
application. Nevertheless, the predictive failure tests are still outperformed by an appropriately
located Chow test.

DISCUSSION

Some fairly clear-cut conclusions emerge from these experiments. One is that tests based on
simple alternative specifications of parameter behaviour (the linear and midpoint Chow tests)
are more powerful than predictive failure tests or tests based on recursive residuals as general
tests of structural change, even when the alternative is not correctly specified. An intuitive
explanation for this is that the predictive failure tests lose information by not estimating the
model over the forecasting period. Tests based on recursive residuals essentially suffer from
the same problem. Thus it is only in circumstances that are particularly favourable to the
predictive failure tests (where the structural change occurs near the end of the data set) that
they outperform the linear or midpoint Chow tests. Even in this case, a Chow test at the
hypothesized shift point is probably a better response. Unlike Chow tests, the predictive failure
and cusum of squares tests have some power against heteroscedasticity, but it is likely that they
have less power in this respect than alternative tests, for similar reasons. In other words, in
gaining power against a wider range of misspecification these tests lose power against given
forms of it. This is not to suggest that predictive failure and cusum of squares tests do not have
some uses (this is discussed below); merely that they are less powerful than the Chow and linear
tests as general tests against structural change.
McAleer and Fisher (1982) have pointed out that the Chow test follows the Wald principle,
estimating towards the more restricted from the less restricted hypothesis, whereas the cusum
of squares and the predictive failure tests are based on the Lagrange Multiplier principle,
estimating towards the less restricted from the more restricted hypothesis. The results reported
here could therefore be regarded as a manifestation of the general result that Wald tests are
more powerful than Lagrange Multiplier tests in finite samples.
A further conclusion is that the rarely used linear test is consistently more powerful than a
Chow test at the midpoint of the data or any other randomly chosen Chow test. This has been
shown to be true for the parameter shift model by Farley et al. (1975) (except close to the
midpoint, where both tests have great power), but this paper has had little influence upon
econometric practice. The results given above show that it is also true for the random walk
model.
The Hendry predictive failure test consistently outperforms the Chow predictive failure test.
However, this largely reflects the fact that the Hendry test, by comparing the mean square
forecasting error with the mean square estimation residual, ignores the component of the
forecasting error that reflects errors in the estimation of the parameters, and therefore tends
to over-reject the null. For example, in the present case, in 250 replications of a sample size
of 40 with a = b = 1, the linear, quartile and midpoint Chow tests and the Chow predictive
failure test all rejected at the 0.05 level between 4% and 6% of the time, but the Hendry
predictive failure test rejected on more than 12% of occasions.
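The over-rejection of the Hendry test under the null can be explored with a small Monte Carlo experiment of the same kind. This is a sketch under stated assumptions (NumPy, hypothetical function names, n = 40 with the last 20% as the forecast period, 5% chi-square critical value 15.51 for m = 8, and a different number of replications from the paper); it is not expected to reproduce the reported figure exactly, only the qualitative finding that the rejection rate clearly exceeds the nominal 5%.

```python
import numpy as np

CHI2_95_DF8 = 15.51   # 5% critical value of chi-square with 8 degrees of freedom


def hendry_H(y, X, m):
    """H = (n - m - k) S_m / S, as in the description of Hendry's test."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X[: n - m], y[: n - m], rcond=None)
    e = y[: n - m] - X[: n - m] @ beta      # estimation-period residuals
    f = y[n - m:] - X[n - m:] @ beta        # forecast errors
    return (n - m - k) * float(f @ f) / float(e @ e)


def hendry_null_rejection_rate(reps=2000, n=40, m=8, seed=0):
    """Share of replications with H above the critical value when a = b = 1 throughout."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x])
        y = 1.0 + x + rng.standard_normal(n)   # constant parameters: H0 is true
        if hendry_H(y, X, m) > CHI2_95_DF8:
            rejections += 1
    return rejections / reps


rate = hendry_null_rejection_rate()
```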
The proliferation of tests against structural change raises dangers of overtesting, especially
given the relative ease of application in the latest microcomputer regression packages. By
overtesting we mean the adoption of a test procedure in which the probability of rejecting the
null hypothesis of no structural change is significantly greater than the nominal size of the test.
This might occur, for example, if a model is rejected if it fails either a Chow or a predictive
failure test; or if a cusum or cusum of squares test is used to select at which point in the data
a Chow test should be applied. To avoid overtesting, the correct procedure is to choose one
test which will stand as the criterion by which the model may be rejected on the grounds of
structural instability. The results given above strongly suggest that this should be the linear test,
with the midpoint Chow test as a reasonable alternative. Other tests may then be used as ways
of ‘getting to know the data’ and of understanding discrepancies with earlier results (recursive
estimation may be particularly useful in this regard). From a forecasting point of view evidence
of instability in the last few observations is of particular interest and yet is unlikely to be
detected by general tests for structural change. This is where predictive failure tests are
potentially useful, although it would appear that a Chow test is still preferable if the forecast
period contains sufficient observations to permit one.
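The size inflation from rejecting a model that fails either of two tests can be bounded with elementary probability. Writing $R_1$ and $R_2$ for the events that each test, of nominal size $\alpha$, rejects a true null:

$$\alpha \le P(R_1 \cup R_2) \le P(R_1) + P(R_2) = 2\alpha,$$

and if the two tests were independent,

$$P(R_1 \cup R_2) = 1 - (1-\alpha)^2 = 1 - 0.95^2 = 0.0975 \qquad (\alpha = 0.05).$$

A Chow test and a predictive failure test computed on the same data are not independent, so the actual size of the combined procedure lies somewhere between 5% and 10%, rather than at the nominal 5%.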

CONCLUSIONS

With the recent proliferation of tests against structural change in regression models there has
been an explosion of possible testing strategies, but comparatively little discussion of the
optimal strategy. This paper has reported the results of Monte Carlo experiments on a two-
variable regression model in which the parameters follow a random walk. It emerges that in
such a context tests for structural change based on the Chow principle of specifying an explicit
alternative hypothesis tend to be more powerful than predictive failure tests or those based on
recursive residuals. Experiments were carried out for sample sizes of 40 and 100, and indicated
that the power advantage of the specific tests is much greater in larger sample sizes, so that it
is possible that this conclusion does not hold for small data sets (less than 30 observations, say).
It was found that a test for linear trends in the parameters was more powerful than a standard
Chow test, and it is recommended that this is the best general test against structural change in
time series data. As a check that these results were not specific to the chosen form of parameter
instability, the experiment was repeated with a once-for-all parameter shift model.
The conclusion that tests against specific alternatives tend to be more powerful than
apparently more general tests is interesting, since one might have thought that misspecification
of the alternative hypothesis would greatly reduce the power of the former group (indeed, an
intuition of this kind has probably provided the motivation to develop the non-specific tests).
The explanation for this result would appear to be that the non-specific tests are all based on
out-of-sample forecasting performance (since even recursive residuals are standardized one-
step-ahead forecasting errors), and in retaining some of the data for this purpose they
effectively lose the opportunity to use it for estimation of an alternative. This loss apparently
outweighs the disadvantages of misspecification. Tests of out-of-sample forecasting
performance are most useful for diagnosing structural change close to the end of the data set,
where the linear test is weakest. Analysis of recursive residuals is best regarded as an
exploratory technique, a way of ‘getting to know the data’, rather than as a rigorous test of
structural stability.

ACKNOWLEDGEMENTS

The author acknowledges the financial support of the Nottingham University Research Fund.
He wishes to thank Christine Green for research assistance, and David Hendry and participants
in seminars at the Universities of Loughborough and East Anglia for helpful comments on
earlier versions of this paper. Any errors that remain are, of course, entirely his own
responsibility.

NOTE

1. In this context it is worth noting that the Chow predictive failure test in effect tests whether the mean square
recursive residual is the same in the forecasting period as in the estimation period, since the test statistic is the ratio
of the former to the latter.
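The identity in the note can be checked numerically. The sketch below (NumPy; function names hypothetical) computes $F_2$ from residual sums of squares and, separately, as the mean square recursive residual over the last m observations divided by the mean square recursive residual over observations k + 1, ..., n - m. The two agree because $RSS_r$ is the cumulative sum of squared recursive residuals.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def recursive_residuals(y, X):
    """Standardized one-step-ahead forecast errors for observations k+1, ..., n."""
    n, k = X.shape
    w = np.empty(n - k)
    for i, r in enumerate(range(k, n)):
        xtx_inv = np.linalg.inv(X[:r].T @ X[:r])
        beta = xtx_inv @ X[:r].T @ y[:r]
        w[i] = (y[r] - X[r] @ beta) / np.sqrt(1.0 + X[r] @ xtx_inv @ X[r])
    return w


def chow_pf_F(y, X, m):
    """F_2 = (n - m - k)(RSS_1 - RSS_m) / (m RSS_m)."""
    n, k = X.shape
    rss_est = rss(y[: n - m], X[: n - m])
    return (n - m - k) * (rss(y, X) - rss_est) / (m * rss_est)


def recursive_mean_square_ratio(y, X, m):
    """Mean square recursive residual, forecasting period over estimation period."""
    n, k = X.shape
    w = recursive_residuals(y, X)
    w_est, w_fc = w[: n - m - k], w[n - m - k:]   # n - m - k and m residuals
    return np.mean(w_fc ** 2) / np.mean(w_est ** 2)


rng = np.random.default_rng(7)
n = 40
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + x + rng.standard_normal(n)
F2 = chow_pf_F(y, X, 8)
ratio = recursive_mean_square_ratio(y, X, 8)
```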

REFERENCES

Brown, R. L., Durbin, J. and Evans, J. M., ‘Techniques for testing the constancy of regression relationships over time’, Journal of the Royal Statistical Society, Series B, 37 (1975), 149-92.
Chow, G. C., ‘Tests of equality between sets of coefficients in two linear regressions’, Econometrica, 28 (1960), 591-605.
Dufour, J.-M., ‘Recursive stability analysis of linear regression relationships’, Journal of Econometrics, 19 (1982), 31-76.
Dufour, J.-M., ‘Recursive stability analysis: the demand for money during the German hyperinflation’, in Belsley, D. A. and Kuh, E. (eds), Model Reliability, London: MIT Press, 1986.
Farley, J. U., Hinich, M. J. and McGuire, T., ‘Some comparisons of tests for a shift in the slopes of a multivariate linear time series model’, Journal of Econometrics, 3 (1975), 297-318.
Garbade, K., ‘Two methods for examining the stability of regression coefficients’, Journal of the American Statistical Association, 72 (1977), 54-63.
Harvey, A. C., The Econometric Analysis of Time Series, New York: John Wiley, 1981.
Hendry, D. F., ‘Predictive failure and econometric modelling in macroeconomics: the transactions demand for money’, in Ormerod, P. (ed.), Economic Modelling, London: Heinemann, 1979.
McAleer, M. and Fisher, G., ‘Testing separate regression models subject to specification error’, Journal of Econometrics, 19 (1982), 125-45.
Pesaran, M. H., Smith, R. P. and Yeo, J. S., ‘Testing for structural stability and predictive failure: a review’, The Manchester School, 53 (1985), 280-95.

Author’s biography:
Michael Bleaney holds a BA and PhD from the University of Cambridge. He is currently Lecturer in
Economics at the University of Nottingham and is the author of three books as well as articles in a variety
of journals.

Author’s address:
Michael Bleaney, Department of Economics, University of Nottingham, Nottingham NG7 2RD, U.K
