TIME-SERIES ECONOMETRICS
TOPIC 3D: OUTLIERS, STRUCTURAL BREAK & VARIOUS PRACTICAL ISSUES
Box-Cox Transformation
Vietnam Export and Import Values, not seasonally adjusted
Vietnam Export Values: Log Transformation
Vietnam export value: 24-month forecast
Box-Cox transformations
• Box-Cox transformations, which have a parameter 𝜆, are defined as
follows:
$$
w_t =
\begin{cases}
\log y_t & \text{if } \lambda = 0 \\
\dfrac{y_t^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0
\end{cases}
$$
• The transformation helps to stabilize the variance of the data, so that
the transformed data are better suited to ARIMA modelling.
• R packages such as fable can apply the transformation inside the ARIMA
model specification and back-transform the forecasts automatically.
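The definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the fable implementation: the function names `box_cox` and `inv_box_cox` and the sample values are made up for the example.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of one strictly positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inv_box_cox(w, lam):
    """Back-transform w to the original scale (inverse of box_cox)."""
    if lam == 0:
        return math.exp(w)
    return (lam * w + 1.0) ** (1.0 / lam)

# Hypothetical series, used only to show the round trip.
series = [120.0, 135.0, 160.0, 210.0, 300.0]
logged = [box_cox(v, 0) for v in series]        # lam = 0 is the natural log
restored = [inv_box_cox(w, 0) for w in logged]  # back on the original scale
```

Forecasts made on the transformed scale are turned back into the original units with the inverse function.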
Box-Cox transformations (2)
• For particular values of 𝜆, the Box-Cox transformation is
equivalent to:
▪ 𝜆 = 1 : no substantive transformation (the data are only shifted by 1)
▪ 𝜆 = 1/2 : square root plus a linear transformation
▪ 𝜆 = 0 : natural logarithm
▪ 𝜆 = −1 : inverse plus a linear transformation (1 minus the inverse)
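As a short worked check, substituting each value of 𝜆 into the definition (with the symbols used above) gives:

$$
\begin{aligned}
\lambda = 1 &: \quad w_t = y_t - 1 \\
\lambda = \tfrac{1}{2} &: \quad w_t = 2\left(\sqrt{y_t} - 1\right) \\
\lambda = -1 &: \quad w_t = \frac{y_t^{-1} - 1}{-1} = 1 - \frac{1}{y_t} \\
\lambda \to 0 &: \quad \lim_{\lambda \to 0} \frac{y_t^{\lambda} - 1}{\lambda}
  = \lim_{\lambda \to 0} \frac{e^{\lambda \log y_t} - 1}{\lambda} = \log y_t
\end{aligned}
$$

The last line shows why the logarithm is the natural definition at 𝜆 = 0: it is the limit of the general formula.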
Outliers
• An outlier is a data point that differs markedly from the other
observations in a dataset.
• Outliers can be detected with graphs or with statistical criteria.
• Statistical criteria, such as the ±1.5 𝐼𝑄𝑅 rule, are only meaningful when
the time series is stationary.
• An outlier can be an error or a genuine extreme value. Genuine
outliers are often associated with specific real-world events.
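A minimal sketch of the IQR criterion, assuming the series is already stationary. The function name `iqr_outliers`, the choice of the "inclusive" quantile method, and the data are assumptions made for the illustration.

```python
from statistics import quantiles

def iqr_outliers(y, k=1.5):
    """Return the positions of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(y, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(y) if v < lower or v > upper]

# Hypothetical stationary series with one obvious spike at position 7.
data = [10, 12, 11, 13, 12, 11, 10, 50, 12, 11]
flagged = iqr_outliers(data)           # 1.5 IQR rule
flagged_strict = iqr_outliers(data, 3) # stricter 3 IQR rule
```

A larger `k` flags fewer points, so the 3 IQR rule is the more conservative of the two.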
±1.5 / ±3.0 IQR (Interquartile Range)
When the data are stationary and normally distributed, we have:
• The probability that a data point falls outside the ±1.5 IQR fences
(below Q1 − 1.5 IQR or above Q3 + 1.5 IQR) is about 7/1,000.
• The probability that a data point falls outside the ±3 IQR fences
is roughly 2 in 1,000,000 (on the order of 1/500,000).
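These two probabilities can be checked numerically for a standard normal variable. The sketch below measures the fences from the quartiles, which is an assumption about how the rule is applied. The function name `prob_outside_fences` is made up for the example.

```python
import math
from statistics import NormalDist

def prob_outside_fences(k):
    """P(Z < Q1 - k*IQR or Z > Q3 + k*IQR) for a standard normal Z."""
    q3 = NormalDist().inv_cdf(0.75)   # upper quartile, about 0.674
    iqr = 2.0 * q3                    # by symmetry Q1 = -Q3
    fence = q3 + k * iqr
    # erfc(z / sqrt(2)) equals the two-sided tail probability P(|Z| > z)
    return math.erfc(fence / math.sqrt(2.0))

p_15 = prob_outside_fences(1.5)   # close to 0.007
p_30 = prob_outside_fences(3.0)   # a few in a million
```

The 1.5 IQR fences sit at about 2.7 standard deviations from the mean and the 3 IQR fences at about 4.7, which explains the very different tail probabilities.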
US Real GDP, not seasonally adjusted
Decomposition of US RGDP
How to deal with outliers
• Remove the outlier.
• Replace it with an estimated value.
• Model it with a dummy variable.
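As one possible reading of the "replace" option, a flagged point can be replaced by a linear interpolation between its nearest non-flagged neighbours. This is a sketch of one simple choice among many. The function name `replace_by_interpolation` and the data are hypothetical.

```python
def replace_by_interpolation(y, outlier_idx):
    """Replace the points at outlier_idx by linear interpolation between
    the nearest non-outlier neighbours (or the nearest one at the edges)."""
    bad = set(outlier_idx)
    good = [i for i in range(len(y)) if i not in bad]
    if not good:
        raise ValueError("at least one non-outlier point is required")
    out = list(y)
    for i in sorted(bad):
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:          # outlier at the start: copy next good value
            out[i] = y[right]
        elif right is None:       # outlier at the end: copy last good value
            out[i] = y[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = y[left] + weight * (y[right] - y[left])
    return out

cleaned = replace_by_interpolation([10, 12, 50, 14], [2])
```

Here the spike of 50 is replaced by the midpoint of its neighbours 12 and 14.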
STL Decomposition
• STL (Seasonal and Trend decomposition using Loess) is a
popular method for decomposing time series data into trend,
seasonal, and residual components using Loess (locally
weighted regression).
• It can handle multiple seasonal patterns, e.g. a weekly pattern
overlapping with a daily pattern in sub-daily data.
• In R, we use the STL() function (feasts package) to perform STL
decomposition.
US Real GDP – seasonally adjusted using STL
Rolling and Recursive Estimations
• Rolling and recursive estimation are two techniques for
re-estimating a model repeatedly as the estimation sample moves
through the available data.
Feature | Rolling | Recursive
Window size | Fixed | Expands with each iteration
Start point | Shifts with each iteration | Remains fixed
Primary use | Analyzing parameter changes within a fixed-size moving window | Forecasting, where more data becomes available over time
Available data: 𝑌1, 𝑌2, 𝑌3, …, 𝑌𝑇
Rolling windows: each window has the same length 𝑁, and its start point shifts forward at each iteration.
Recursive windows: the lengths 𝑁1 < 𝑁2 < 𝑁3 < 𝑁4 grow at each iteration, and every window starts at the first observation.
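The two schemes differ only in how the window indices are generated. The sketch below builds both sets of windows and re-estimates a coefficient on each. As a simplifying assumption it fits an AR(1) in levels by OLS rather than an ARIMA model, and all function names and data are hypothetical.

```python
def rolling_windows(T, N):
    """(start, end) index pairs: fixed length N, start shifts by one each time."""
    return [(start, start + N) for start in range(T - N + 1)]

def recursive_windows(T, N1):
    """(start, end) index pairs: start fixed at 0, end grows from N1 to T."""
    return [(0, end) for end in range(N1, T + 1)]

def ar1_slope(y):
    """OLS slope of y_t on y_{t-1} (with intercept)."""
    x, z = y[:-1], y[1:]
    n = len(z)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    return sxz / sxx

# Noise-free toy series y_t = 1 + 0.5 * y_{t-1}, so every window gives 0.5.
y = [0.0]
for _ in range(11):
    y.append(1.0 + 0.5 * y[-1])

rolling = [ar1_slope(y[s:e]) for s, e in rolling_windows(len(y), 6)]
recursive = [ar1_slope(y[s:e]) for s, e in recursive_windows(len(y), 6)]
```

With real data the two sequences of estimates would differ, and plotting them against the window end date is a common way to look for parameter instability.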
Estimated coefficients for 𝑪𝑷𝑰𝒕 from rolling and
recursive estimation of an ARIMA(1,1,0) model
Parameter Instability and Structural Change
Structural Break
• A key assumption of the Box–Jenkins methodology is that the
parameters are constant, i.e. there is no change in the
data-generating process.
• In practice, there are often reasons to suspect
a structural break in the data-generating process.
• Such breaks can be caused by changes in the underlying economy
or in the phenomena that generate the data.
Example: simulated data
Example: US interest rate spreads
Chow test for structural break
• The essence of the Chow test is to fit the same ARMA model to the
pre-break data and to the post-break data.
• If the two fitted models are not sufficiently different, we cannot
reject the hypothesis that there has been no structural change.
Chow test for structural break (2)
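• Suppose that we suspect there is a break at time 𝑡𝑚 in the sample
𝑡 = 1, 2, 3, …, 𝑇.
• We split the sample into two subsamples, pre-break {1, 2, …, 𝑡𝑚} and
post-break {𝑡𝑚 + 1, 𝑡𝑚 + 2, …, 𝑇}, and estimate the same model on each:
$$
\begin{aligned}
\text{pre-break:}\quad Y_t^* &= \alpha_{10} + \alpha_{11} Y_{t-1}^* + \alpha_{12} Y_{t-2}^* + \dots + \alpha_{1p} Y_{t-p}^* \\
&\quad + \beta_{11}\varepsilon_{t-1} + \beta_{12}\varepsilon_{t-2} + \dots + \beta_{1q}\varepsilon_{t-q} + \varepsilon_t \\
\text{post-break:}\quad Y_t^* &= \alpha_{20} + \alpha_{21} Y_{t-1}^* + \alpha_{22} Y_{t-2}^* + \dots + \alpha_{2p} Y_{t-p}^* \\
&\quad + \beta_{21}\varepsilon_{t-1} + \beta_{22}\varepsilon_{t-2} + \dots + \beta_{2q}\varepsilon_{t-q} + \varepsilon_t
\end{aligned}
$$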
Chow test for structural break (3)
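• Then we test the joint hypothesis that the coefficients on the same terms
are equal:
$$
H_0: \alpha_{10} = \alpha_{20};\ \alpha_{11} = \alpha_{21};\ \dots;\ \alpha_{1p} = \alpha_{2p};\ \beta_{11} = \beta_{21};\ \beta_{12} = \beta_{22};\ \dots;\ \beta_{1q} = \beta_{2q}
$$
• Under 𝐻0, the test statistic follows an 𝐹-distribution:
$$
F_{\text{stat}} = \frac{(SSR - SSR_1 - SSR_2)/k}{(SSR_1 + SSR_2)/(T - 2k)}
$$
where 𝑆𝑆𝑅 is the sum of squared residuals from the model fitted to the full
sample, 𝑆𝑆𝑅1 and 𝑆𝑆𝑅2 are those from the pre-break and post-break fits,
𝑘 is the number of estimated parameters (𝑘 = 𝑝 + 𝑞 + 1 if an
intercept is included), and the degrees of freedom are (𝑘, 𝑇 − 2𝑘).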
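The statistic can be sketched numerically. To stay short, the example below assumes an AR(1) model with an intercept fitted by OLS (so 𝑘 = 2) instead of a general ARMA(p, q). The function names, the simulated data and the break size are all made up for the illustration.

```python
import random

def ols_ssr(pairs):
    """SSR from regressing current on lag (with intercept); pairs = (lag, current)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    mz = sum(z for _, z in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxz = sum((x - mx) * (z - mz) for x, z in pairs)
    b = sxz / sxx
    a = mz - b * mx
    return sum((z - a - b * x) ** 2 for x, z in pairs)

def chow_f(y, t_m, k=2):
    """Chow F statistic for a break after the first t_m observations."""
    pairs = list(zip(y[:-1], y[1:]))       # pair i explains observation i + 1
    pre, post = pairs[:t_m - 1], pairs[t_m - 1:]
    ssr, ssr1, ssr2 = ols_ssr(pairs), ols_ssr(pre), ols_ssr(post)
    T = len(pairs)                          # usable observations after lagging
    return ((ssr - ssr1 - ssr2) / k) / ((ssr1 + ssr2) / (T - 2 * k))

def simulate(n_pre, n_post, shift, seed=0):
    """AR(1) with coefficient 0.5 whose intercept jumps by `shift` at n_pre."""
    rng = random.Random(seed)
    y = [0.0]
    for t in range(1, n_pre + n_post):
        intercept = 0.0 if t < n_pre else shift
        y.append(intercept + 0.5 * y[-1] + rng.gauss(0.0, 1.0))
    return y

f_break = chow_f(simulate(100, 100, shift=5.0), t_m=100)
f_stable = chow_f(simulate(100, 100, shift=0.0), t_m=100)
```

The value `f_break` would then be compared with the critical value of an 𝐹(𝑘, 𝑇 − 2𝑘) distribution. A large value points to a break, while `f_stable` should typically be small.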
Chow test for structural break (4)
• Another method is to use the dummy variable to detect a break in
one or more of the coefficients.
• For example, if you suspect there is a break after period 𝑡𝑚 , you can
create a dummy 𝐷𝑡 𝑚 such that
𝐷𝑡 𝑚 = 0 for all 𝑡 ≤ 𝑡𝑚
𝐷𝑡 𝑚 = 1 for all 𝑡 > 𝑡𝑚
• You can test for the existence of the break by testing whether the
coefficients on the dummy variable are zero.
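A minimal sketch of the dummy-variable approach, assuming the simplest case in which only the intercept may shift: regress the series on a constant and the dummy, then look at the t-statistic of the dummy coefficient. The function names and the toy data are assumptions for the example.

```python
import math

def break_dummy(T, t_m):
    """D_t = 0 for t <= t_m and 1 for t > t_m, with t = 1, ..., T."""
    return [0 if t <= t_m else 1 for t in range(1, T + 1)]

def dummy_shift_test(y, t_m):
    """OLS of y_t on a constant and D_t; returns (coefficient, t-statistic)."""
    d = break_dummy(len(y), t_m)
    n = len(y)
    md, my = sum(d) / n, sum(y) / n
    sdd = sum((di - md) ** 2 for di in d)
    sdy = sum((di - md) * (yi - my) for di, yi in zip(d, y))
    delta = sdy / sdd                      # estimated shift in the mean
    alpha = my - delta * md
    ssr = sum((yi - alpha - delta * di) ** 2 for di, yi in zip(d, y))
    se = math.sqrt(ssr / (n - 2) / sdd)    # standard error of delta
    return delta, delta / se

# Toy series whose level jumps by 4 after the sixth observation.
y = [1, 2, 1, 2, 1, 2, 5, 6, 5, 6, 5, 6]
delta, t_stat = dummy_shift_test(y, t_m=6)
```

A break in a slope coefficient could be handled in the same spirit by adding the product of the dummy and the regressor, which is not shown here.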
Endogenous Breaks
• The Chow test asks whether there is a break beginning at some
particular known break date 𝑡𝑚 .
• A break occurring at a date not prespecified by the researcher is
called an endogenous break.
• We can search for the date that leads to the maximum, or
supremum, value of the sample 𝐹-statistic.
Quandt likelihood ratio (QLR)
• The Quandt likelihood ratio (QLR) test, also called the sup-Wald test,
is designed to look for a break at an unknown date.
• The approach is to take the maximum of the Chow statistics across
all candidate break points and use it to test the null hypothesis of no
break against the alternative of a one-time break.
• Because the statistic is a maximum of many 𝐹-statistics, it has its own
distribution, and its critical values differ from the usual 𝐹 critical values.
Quandt likelihood ratio (QLR) (2)
• With a reasonably large sample, the search for break points is
restricted to the middle 70% of the sample.
• Two time points bound the search: 𝜏0 = 0.15𝑇 and 𝜏1 = 0.85𝑇. The
test takes the maximum test statistic over this range:
$$
QLR = \max\left[ F(\tau_0),\ F(\tau_0 + 1),\ \dots,\ F(\tau_1) \right]
$$
• It is important to note that the test can also detect multiple break
points.
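The search can be sketched as a loop over candidate break dates with 15% trimming at each end. To keep the example short and exactly checkable, it assumes the regression contains only an intercept (𝑘 = 1), so each Chow statistic compares subsample means. The function names and data are hypothetical.

```python
def ssr_mean(y):
    """Sum of squared deviations from the mean (SSR of an intercept-only fit)."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)

def qlr(y, trim=0.15, k=1):
    """Return (QLR statistic, break date) over the trimmed range of dates.

    The break date tau is the number of observations in the pre-break segment.
    Assumes each segment has some variation, so the split SSR is not zero.
    """
    T = len(y)
    tau0, tau1 = round(trim * T), round((1 - trim) * T)
    ssr_full = ssr_mean(y)
    best_f, best_tau = float("-inf"), None
    for tau in range(tau0, tau1 + 1):
        ssr_split = ssr_mean(y[:tau]) + ssr_mean(y[tau:])
        f = ((ssr_full - ssr_split) / k) / (ssr_split / (T - 2 * k))
        if f > best_f:
            best_f, best_tau = f, tau
    return best_f, best_tau

# Toy series: level near 0.5 for 20 observations, then near 10.5 for 20 more.
y = [0.0, 1.0] * 10 + [10.0, 11.0] * 10
stat, tau_hat = qlr(y)
```

The resulting statistic must be compared with QLR critical values, not with the ordinary 𝐹 table, because it is the largest of many 𝐹-statistics.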
Source: Introduction to Econometrics, 4th Edition, by James H. Stock and Mark W. Watson, citing Andrews (2003), “Tests for Parameter Instability and Structural Change with Unknown Change Point: A Corrigendum.” Econometrica 71: 395–397.
Other methods
• CUSUM: a recursive method to learn how individual
coefficients evolve over time.
• CUSUMSQ.
• Nyblom Test.
• Hansen Test.
Various Issues
Decomposition by official agencies
• Official statistics agencies (such as the US Census Bureau) are
responsible for producing official economic and social time series.
• These agencies have developed their own decomposition procedures.
• In R, these procedures are available through the function
X_13ARIMA_SEATS().
• See more at [Link]
Weekly, daily and sub-daily data
• Weekly data is difficult to work with because the seasonal
period (the number of weeks in a year) is both large and
non-integer.
• The average number of weeks in a year is 52.18.
• We can use a dynamic harmonic regression model, in which Fourier
terms capture the seasonal pattern.
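Fourier terms are pairs of sine and cosine regressors, and they work with a non-integer period such as 365.25 / 7 ≈ 52.18 weeks. The sketch below only builds the regressors. The function name `fourier_terms` and the choice of K = 3 pairs are assumptions for the illustration, and fitting the regression with ARIMA errors is not shown.

```python
import math

def fourier_terms(t, period, K):
    """Return [sin_1, cos_1, ..., sin_K, cos_K] evaluated at time index t."""
    row = []
    for k in range(1, K + 1):
        angle = 2.0 * math.pi * k * t / period
        row.append(math.sin(angle))
        row.append(math.cos(angle))
    return row

weeks_per_year = 365.25 / 7          # about 52.18, not an integer
# Regressor matrix for two years of weekly data with K = 3 harmonics.
X = [fourier_terms(t, weeks_per_year, K=3) for t in range(104)]
```

A larger K allows a more flexible seasonal shape at the cost of more parameters, so K is usually chosen with an information criterion.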
Weekly, daily and sub-daily data (2)
• Daily and sub-daily (such as hourly) data are challenging for a
different reason — they often involve multiple seasonal
patterns.
• Holiday effects can be treated by dummy variables.
• These problems require more advanced techniques.
Decomposition techniques such as STL, or harmonic
regression models, can be used.
Missing values
• Missing data can arise for many reasons, and it is worth
considering whether the missingness will induce bias in the
forecasting model.
• No automated method can handle such effects as they depend
on the specific forecasting context.
• One approach is to fit an ARIMA model to the data containing missing
values, and then use the model to interpolate the missing observations.
• See more at [Link]