0% found this document useful (0 votes)
6 views14 pages

TIME SERIES Modeling: 1 Model Selection Using Information Criteria

Chapter 3 discusses diagnostic checking and model selection for time series modeling, focusing on the Akaike Information Criterion (AIC) for model selection and the importance of checking residuals for white noise to validate ARMA models. It also covers methods for transforming non-stationary time series into stationary ones, including differencing and power transformations. The chapter concludes with an introduction to ARIMA models and their application in fitting time series data.

Uploaded by

yujiaaoro
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views14 pages

TIME SERIES Modeling: 1 Model Selection Using Information Criteria

Chapter 3 discusses diagnostic checking and model selection for time series modeling, focusing on the Akaike Information Criterion (AIC) for model selection and the importance of checking residuals for white noise to validate ARMA models. It also covers methods for transforming non-stationary time series into stationary ones, including differencing and power transformations. The chapter concludes with an introduction to ARIMA models and their application in fitting time series data.

Uploaded by

yujiaaoro
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

TIME SERIES Modeling

Chapter 3: diagnostic checking and model selection

1 Model selection using Information criteria


Evaluation of the graphs of sample ACF and PACF allows to preliminarily choose
order p and q of ARMA model to be fit. A final decision is done using AIC criterion
which allows us to compare the fit of different models.

DEFINITION 1 Assume that a statistical model of M parameters is fitted to data.


The Akaike’s Information Criterion (AIC) statistic is defined as

AIC = −2ln[maximum likelihood of data] + 2M.

Suppose that the white noise Zt is Gaussian distribution with variance σ 2 . Let
n
P
σ̂ 2 = n1 (Xj − X̂j )2 /rj−1 with rj−1 being some constants, independent of σ 2 . It
j=1
turns out that the log likelihood function of ARMA(p,q) models is
n
ln L = − lnσ̂ 2 + Const.
2

DEFINITION 2 The Akaike’s Information Criterion (AIC) statistic for ARMA(p,q)


models is defined as
AIC = nlnσ̂ 2 + 2(p + q),
where σ̂ 2 stands for the estimated error variance and n is the number of ob-
servations.

How does it works ? Choose the model (choose the values of p, q) with
minimum AIC.
Intuitively, one can think of 2(p + q) as a penalty term to discourage over-
parameterization.

1
2 Model diagnostic checking
OK means that the fitted model can describe the dependence structure of a time
series adequately.
If ARMA model

Xt = δ + φ1 Xt−1 + · · · + φp Xt−p + Zt + θ1 Zt−1 + · · · + θq Zt−q

is adequate, then {Zt } should be white noise (WN).


If the ARMA model is adequate, the residuals should be WN (approximately
at least).
Recall what is white noise? If {Zt } is WN, then ρk (or ρ(k)) is zero. In practice
we may use its sample ACF. For example, if an AR(1) model is considered, the
residuals are

Ẑt = Xt − δ̂ − φ̂1 Xt−1 .

To check the dependence structure, we calculate


Pn−k
(Ẑt − ā)(Ẑt+k − ā)
rk = t=1Pn , k≥1
2
t=1 (Ẑt − ā)
P
where ā = Ẑt /n. Here rk is called the residual autocorrelation at lag k. Thus,
if a model is adequate we expect

rh ≈ 0.

THEOREM 1 If H0 : ρ(k) = 0 is true, then

1
ρ̂(k) = rk ∼ N(0, ).
n
We can use the above as a rough guide on whether each ρ(k) is zero.

DEFINITION 3 Overall test, define


m
X
Q(m) = n(n + 2) rk2 /(n − k)
k=1

where 0 << m << n (usually, m ≈ n/5). Q(m) is called the Ljung-Box statistic
(or Portmanteau statistic).

2
If the fitted model is OK (adequate), then
2
Q(m) ∼ χm−np

where np is the number of parameters (exclusive of δ) in the ARMA model. For


example, if the model is Xt = φ1 Xt−1 + Zt , then np = 1.
e.g. if the model is Xt = φ1 Xt−1 + φ3 Xt−3 + Zt , then np = 2.
e.g. if the model is Xt = δ + φ1 Xt−1 + φ2 Xt−3 + θ1 Zt−1 + Zt , then np = 3 (δ will
not be counted).
EXAMPLE 1 The data Xt , t = 1, · · · , 20 are observed as: 0.50, -0.41, 0.37, -0.61, 0.23,
-0.13, 0.06, -0.11, 0.18, -0.14, 0.20, 0.09, -0.03, -0.02, -0.14, -0.07, 0.09, 0.09, -0.01, -0.10

Series y Series y
1.0

0.4
0.2
0.5

0.0
Partial ACF
ACF

−0.2
0.0

−0.4
−0.5

−0.6

0 2 4 6 8 10 12 2 4 6 8 10 12

Lag Lag

We may try model Xt = δ + φ1 Xt−1 + Zt by looking at SPACF and SACF.


fit = arima(y, order = c(1,0,0))
The fitted model
X̂t = 0.0075 − 0.832Xt−1
The residuals are et : 2, 3, ..., 20: 0.00, 0.02, -0.31, -0.29, 0.05, -0.06, -0.07, 0.08,
0.00, 0.08, 0.25, 0.04, -0.05, -0.16, -0.19, 0.02, 0.16, 0.06, -0.12
The SACF for et are
r1 = 0.34, r2 = −0.21r3 = −0.12,
r4 = −0.22, r5 = −0.09, r6 = 0.09,
r7 = −0.18, r8 = −0.24, r9 = 0.02, r10 = 0.10
(each H0 : ρe (k) = 0, k = 1, ..., 10 can be accepted separately, why?)
Consider the Ljung-Box test. If we let m = 5, then
Q(5) = n ∗ (n + 2) ∗ (r12 /(n − 1) + r22 /(n − 2)
+r32 /(n − 3) + r42 /(n − 4) + r52 /(n − 5))
= 5.6964

3
2 2
Since χ0.05 (5 − 1) = 9.49, Q(5) < χ0.05 (5 − 1). Thus we can not reject the adequacy
of the model by setting α equal to 0.05.
From the table of the residual we can also see that the p value is 0.3819 >
0.05 = α when taking m=6. Again we can not reject the adequacy of the model
by setting α equal to 0.05. This is consistent with that obtained by comparing the
critical value with the observed statistic.
Using tsdiag(fit) yields the following plot.

Standardized Residuals

2
1
0
−1
−2

5 10 15 20

Time

ACF of Residuals
0.8
0.4
ACF

−0.4 0.0

0 2 4 6 8 10 12

Lag

p values for Ljung−Box statistic


0.8
p value

0.4
0.0

2 4 6 8 10

lag

3 Using ACF and PACF of of residuals to improve the


model
EXAMPLE 2 : Suppose that we fit AR(1) model

Xt = φ1 Xt−1 + Zt .

IF the SACF of Ẑt has a cut-off after lag 1, then it suggests

Ẑt ∼ MA(1)

i.e
Ẑt = et + θet−1

4
Thus
Xt ∼ ARMA(1, 1).
Hopefully, êt is now closer to white noise.
If the SPACF of Ẑt has a cut-off after lag 1, it suggests
Ẑt ∼ AR(1)
i.e
Zt = ψ1 Zt−1 + et−1 .
Thus
(Xt − φ1 Xt−1 ) = ψ1 (Xt−1 − φ1 Xt−2 ) + et−1
i.e.
Xt ∼ AR(2).
Hopefully, êt is now closer to white noise.

4 How to change a non-stationary time series into sta-


tionary one
According to the definition of stationarity there are 3 types of non-stationarity:
⊲ Non-stationarity in mean: EXt depends on t;
⊲ Non-stationarity in variance: var(Xt ) depends on t;
⊲ Non-stationarity in covariance: cov(Xt , Xt+k ) depends on t for some k.
EXAMPLE 3 Suppose that {Yt } is a stationary time series. Let Xt = a + bt + Yt .
Apparently, Xt is not a stationary process. However applying the first difference
operator ▽ to {Xt } yields a stationary process. Therefore applying the difference
operator is one way of obtaining a stationary process. 
EXAMPLE 4 Suppose {Yt } is an i.i.d. sequence with γ(k) = Cov(Yt ) independent
of time t. Let Xt = et+Yt . Can we find a suitable d such that {(1 − B)d Xt } is
stationary? (No).
Let Ut = log(Xt ). What happen to {Ut }? 
Not all non-stationary series can be transformed to stationary ones by differ-
encing. Many time series are stationary in the mean but are not stationary in the
variance such as Example 2. To overcome this problem, we need to stabilize the
variance of the time series by using pre-differencing transformation.
We first consider the power transformation to remove some possible non-
stationarity in variance.
Xλ − 1
T(Xt ) = t
λ

When and how to perform the power transformation?

5
Values of lambda Transformation
1
-1.0 Xt
-0.5 √1
Xt
0.0 ln(X
√ t)
0.5 Xt
1.0 Xt (no transformation)

(a) If the variability of a time series increases as time advances it then implies
that the time series is non-stationary with respect to its variance; See figure
1 below.
1800

1600
no. of passengers

1400

1200

1000

800
0 2 4 6 8 10 12 14 16 18 20
time: month

Figure 1:

(b) Find the one with minimum sample variance.

7.6
log(no. of passengers)

7.4

7.2

6.8

6.6
0 2 4 6 8 10 12 14 16 18 20
time: month

Figure 2: log transformation of data in Fig 1

6
After transformation, we then consider the possible differecing to make the
time series stationary.
Criteria: The ACFs of non-stationary time series converges to zero slowly, how-
ever, those of stationary time series converges to zero fast.

5 ARIMA model
EXAMPLE 5 Figure 3 shows US Dow Jones Industrial Average Market Index {Yt }
from 17-Jul-02 to 20-Mar-03.
{Yt } is not stationary. See figures 3 and 4.

z = y −y
t t t−1
80

60

40

20

−20

−40

−60
0 20 40 60 80 100 120 140 160 180

Figure 3:

Series y Series y
1.0

0.8
0.8

0.6
0.6

Partial ACF

0.4
ACF

0.4

0.2
0.2

0.0
0.0

0 5 10 15 20 5 10 15 20

Lag Lag

Figure 4: series Y

Moreover, by figure 5 we can fit the following models to the data


xt = Zt + θ19 Zt−19

7
Series z Series z

0.15
1.0

0.10
0.8

0.05
0.6

Partial ACF

0.00
ACF

0.4

−0.05
0.2

−0.10
0.0

−0.15
−0.2

0 5 10 15 20 5 10 15 20

Lag Lag

Figure 5: series z or x

or
xt − φ1 xt−1 − · · · − φ19 xt−19 = Zt .
Generally, we can fit the difference xt = Xt − Xt−1 of a time series by a
ARMA(p,q) model,

φp (B)(Xt − Xt−1 ) = θq (B)Zt

or

φp (B)(1 − B)Xt = θq (B)Zt .

This is an ARIMA(p, 1, q) model.


We can also consider higher order difference,

wt = xt − xt−1 = (Xt − Xt−1 ) − (Xt−1 − Xt−2 )


= Xt − 2Xt−1 + Xt−2 = (1 − 2B + B2 )Xt
= (1 − B)2 Xt .

If we fit wt by

φp (B)wt = θq (B)Zt

or

φp (B)(1 − B)2 Xt = θq (B)Zt .

This is an ARIMA(p,2,q) model. More generally, we define ARIMA(p,d,q) as

φp (B)(1 − B)d Xt = θq (B)Zt .

For the example, we can fit ARIMA(0,1,19) to Yt in the example

8
For the example, we can fit ARIMA(0,1,19) to yt in the example

fitma = arima(y, order = c(0, 1, 19))


Call: arima(x = y, order = c(0, 1, 19))
Coefficients:
ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
-0.2219 0.0249 -0.1093 -0.0702 0.0666 0.0518 0.0191 0.1042
s.e. 0.0950 0.0832 0.0863 0.0898 0.0908 0.0867 0.0899 0.0948
ma9 ma10 ma11 ma12 ma13 ma14 ma15 ma16
-0.0537 -0.0714 -0.0685 0.1185 0.0134 0.0531 -0.0942 0.0521
s.e. 0.1016 0.1087 0.0896 0.0893 0.0987 0.1004 0.1145 0.1075
ma17 ma18 ma19
-0.1438 -0.1234 -0.4681
s.e. 0.0972 0.1059 0.0971
sigma^2 estimated as 665.5: log likelihood = -799.39, aic =
1638.78

tsdiag(fitma)
Standardized Residuals
2
1
0
−2

0 50 100 150

Time

ACF of Residuals
0.8
ACF

0.4
0.0

0 5 10 15 20

Lag

p values for Ljung−Box statistic


0.8
p value

0.4
0.0

2 4 6 8 10

lag

predict(fitma, [Link]= 20)


Call: arima(x = y, order = c(19, 1, 0))
Coefficients:

9
1500
1400
1300
y

1200
1100
0 50 100 150 200

1:171

Figure 6: the black dot is the observation; the red dots are the predictions

ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8


-0.0665 0.1378 -0.0443 -0.1369 -0.0163 0.0462 -0.0396 0.0475
s.e. 0.0749 0.0757 0.0764 0.0770 0.0796 0.0808 0.0808 0.0818
ar9 ar10 ar11 ar12 ar13 ar14 ar15 ar16
-0.0416 -0.1021 -0.0085 0.038 0.0039 -0.0407 -0.0506 0.0904
s.e. 0.0823 0.0817 0.0821 0.082 0.0826 0.0836 0.0837 0.0836
ar17 ar18 ar19
0.1094 -0.0049 -0.2208
s.e. 0.0840 0.0837 0.0832
sigma^2 estimated as 726.2: log likelihood = -801.95,
aic = 1643.9

tsdiag(fitma)
predict(fitma, [Link]= 20)
The fitted model is

Coefficients:
Factor 1: 1 + 0.05076 B**(1) - 0.03281 B**(2) + 0.0297 B**(3) + 0.00626 B**(4)
- 0.01162 B**(5) - 0.01904 B**(6) - 0.06261 B**(7) - 0.09867 B**(8) - 0.14025 B**(9) -
0.11528 B**(10) - 0.02889 B**(11) - 0.00768 B**(12) - 0.2065 B**(13) - 0.18588 B**(14)
+ 0.03721 B**(15) + 0.00595 B**(16) - 0.22064 B**(17)-0.1234B**(18) -0.4681B**(19)

10
Standardized Residuals

2
1
0
−2
0 50 100 150

Time

ACF of Residuals

0.8
ACF

0.4
0.0
0 5 10 15 20

Lag

p values for Ljung−Box statistic

0.8
p value

0.4
0.0

2 4 6 8 10

lag
1500
1400
1300
y

1200
1100

0 50 100 150 200

1:171

Figure 7: the black dot is the observation; the red dots are the predictions

11
EXAMPLE 6 Weekly sales of Super Tech Videocassette Tape [the data can be found
at the website]. 

> plot(1:161, y, xlim = c(0, 200), ylim=c(20, 100) )


> lines(1:161, y, type="l" ) # l for L

100
80
60
y

40
20

0 50 100 150 200

1:161

Figure 8: series y

> acf(y, [Link]=30)


> pacf(y, [Link]=30)

Series y Series y
1.0
1.0

0.8
0.8

0.6
0.6

Partial ACF

0.4
ACF

0.4

0.2
0.2

0.0
0.0

−0.2
−0.2

0 5 10 15 20 25 30 0 5 10 15 20 25 30

Lag Lag

Figure 9: series y

The the raw data is not stationary. We take difference

zt = yt − yt−1 = (1 − B)yt .

> acf(z, [Link]=30)


> pacf(z, [Link]=30)

12
Series z Series z

1.0

0.4
0.8

0.3
0.6

0.2
Partial ACF
0.4
ACF

0.1
0.2

0.0
0.0

−0.1
−0.2

−0.2
0 5 10 15 20 25 30 0 5 10 15 20 25 30

Lag Lag

Figure 10: series x

We can use ARIMA(0, 0, 6) for zt , or ARIMA(0,1,6) for yt .

> fit = arima(z, order = c(0,1,6))


> tsdiag(fit)

Standardized Residuals
2
1
−1 0
−3

0 50 100 150

Time

ACF of Residuals
1.0
0.6
ACF

0.2
−0.2

0 5 10 15 20

Lag

p values for Ljung−Box statistic


0.8
p value

0.4
0.0

2 4 6 8 10

lag

Figure 11: ACF of Residuals

Thus, the model is adequate (OK), i.e. there is no autocorrelation in the resid-
uals.
1. Write down the estiamted model
> fit
Call: arima(x = y, order = c(0, 1, 6))
Coefficients:

13
ma1 ma2 ma3 ma4 ma5 ma6
0.6331 -0.0160 0.0361 -0.0264 -0.1490 -0.4374
s.e. 0.0771 0.0892 0.0917 0.0879 0.1055 0.0783
sigmaˆ2 estimated as 4.896: log likelihood = -356.33, aic = 726.65
The fitted model is

(1 − B)yt = Zt + 0.6331Zt−1 − 0.0160Zt−2 + 0.0361Zt−3


−0.0264Zt−4 − 0.1490Zt−5 − 0.4374Zt−6 .

2. Fitted values are as follows.

> plot(1:161, y, xlim = c(0, 200), ylim=c(20, 100) )


> lines(1:161, y, type="l" )
> lines(1:161, y-fit$residuals, type="l", col="red")

3. Forecast for 6 steps ahead

> forecast = predict(fit, [Link]=6)


> lines(162:167, forecast$pred, type="o", col="red")
> lines(162:167, forecast$pred-1.96*forecast$se, col="blue")
> lines(162:167, forecast$pred+1.96*forecast$se, col="blue")
100
80
60
y

40
20

0 50 100 150 200

Index

Figure 12: the black dots are the observation and the red dots are the predictions

14

You might also like