Chapter 5
Time Series Models
5.1 Description of a time series
Here we model time series, yt with t = 1, . . . , T , which are sequences of observations
ordered in time. The goal is to predict future values using only the dependency among
the observations in the time series.
Examples: Euribor, Eurodol, Airbcn
A time series can be described through its level, µt = E(yt), its variability, σt² = Var(yt),
and through its dependency that can be measured through its
1. AutoCorrelation Function, ACF(j)=ρj,t = Corr(yt , yt−j ) for j = 1, 2, . . ., and its
2. Partial AutoCorrelation Function, PACF(j)=ρPj,t for j = 1, 2, . . ., which measures
the correlation between yt and yt−j after removing the effect of yt−1 , . . . , yt−j+1 .
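As a rough illustration of the two measures just listed, the sketch below computes sample autocorrelations and sample partial autocorrelations with NumPy. The function names `sample_acf` and `sample_pacf` are hypothetical, and the PACF is obtained with the Durbin-Levinson recursion, which is one common way to compute it and not necessarily the one used by any particular statistical package.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations for lags 1, ..., max_lag."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[j:] * d[:-j]) / denom for j in range(1, max_lag + 1)])

def sample_pacf(y, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(y, max_lag)          # rho[i] holds the lag i + 1 autocorrelation
    pacf = np.zeros(max_lag)
    phi_prev = np.zeros(0)                # coefficients of the order k - 1 fit
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[0]
            phi = np.array([phi_kk])
        else:
            num = rho[k - 1] - np.sum(phi_prev * rho[k - 2::-1])
            den = 1.0 - np.sum(phi_prev * rho[:k - 1])
            phi_kk = num / den
            phi = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k - 1] = phi_kk              # PACF at lag k is the last coefficient
        phi_prev = phi
    return pacf

# Illustrative input only: a smooth, strongly dependent series.
y = np.sin(np.arange(50) / 3.0)
acf = sample_acf(y, 5)
pacf = sample_pacf(y, 5)
```

At lag 1 there is nothing to remove, so the two measures coincide, which gives a quick sanity check on any implementation.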
5.1.1 Stationary time series
We will model time series with ARMA models, which require that time series be sta-
tionary. A time series is stationary if its structure does not change with time and, in
particular, if µt = µ, σt² = σ², ρj,t = ρj and ρPj,t = ρPj for all t and all j.
The ACF(j) = ρj and PACF(j) = ρPj are called the autocorrelation and partial
autocorrelation of lag j. Their sample versions are denoted by ρ̂j and ρ̂Pj, and they work as
the time series fingerprint, which will be crucial when characterizing time series.
Often, time series are not stationary because their level and/or their variability change
with time. That was the case with the Euribor, Eurodol and Airbcn examples.
Non-stationarities are easy to identify through the time plots. Another warning of the ex-
istence of non-stationarities is the fact that the sample ACF and PACF of non-stationary
series tend to decrease slowly (linearly) with lag, j.
When the problem is that the level, µt, changes with time, one can often fix the problem
by modeling the series of the differences, zt = yt − yt−1. If these differences do not have
a stationary mean either, one might need to difference them a second time,
and work with st = zt − zt−1 = yt − 2yt−1 + yt−2. It is rare that one needs to difference
the original time series more than twice to attain stationarity.
When the problem is that the variance, σt², changes with time, it is usually because σt²
increases with µt, and in that case one can try modeling log yt instead.
When both kinds of non-stationarities are in place, which is quite frequent, one needs to
take logs and difference, and model rt = log yt − log yt−1 = log(yt/yt−1).
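The three transformations described above can be sketched with NumPy's `np.diff`. The numbers in `y` are made up purely for illustration and do not come from any of the example data sets.

```python
import numpy as np

# Hypothetical positive series with a growing level (illustrative values only).
y = np.array([100.0, 104.0, 109.0, 115.0, 122.0, 130.0])

z = np.diff(y)           # first differences:  z_t = y_t - y_{t-1}
s = np.diff(y, n=2)      # second differences: s_t = y_t - 2 y_{t-1} + y_{t-2}
r = np.diff(np.log(y))   # log differences:    r_t = log(y_t / y_{t-1})

print(z)
print(s)
print(r)
```

Each differencing step shortens the series by one observation, so `z` has T − 1 values and `s` has T − 2.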
5.1.2 White noise
A stationary time series is called white noise when µ = 0, ρj = 0 and ρPj = 0 for all j,
and thus when there is no dependency that can be used to predict the future.
The noise in normal linear models was assumed to be normally distributed white noise.
Here we will also build models with a noise component that is expected to be white noise,
with no information left to spare. To validate these models one will need to recognize
whether residuals qualify as a sample of white noise or not.
For that, it helps to know that the ρ̂j and ρ̂Pj of a sample of white noise are close to
being Normal(0, Var(ρ̂j) = 1/T) distributed, where T is the time series length. Hence,
when some |ρ̂j| or |ρ̂Pj| are much larger than 2/√T, the sample cannot be white noise.
Exercise: Simulate a sample of white noise, and compute its ACF and PACF.
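One possible starting point for this exercise is sketched below, under the assumption that Gaussian white noise with unit variance is simulated with NumPy. The seed, the length T = 500 and the choice of 20 lags are arbitrary. Only the sample ACF is computed here; the sample PACF can be compared against the same 2/√T bound.

```python
import numpy as np

rng = np.random.default_rng(0)         # arbitrary seed, for reproducibility only
T = 500
eps = rng.normal(0.0, 1.0, size=T)     # simulated Gaussian white noise

# Sample autocorrelations for lags 1, ..., 20.
d = eps - eps.mean()
acf = np.array([np.sum(d[j:] * d[:-j]) for j in range(1, 21)]) / np.sum(d * d)

# Approximate bound: about 95% of the values should fall within +/- 2 / sqrt(T).
bound = 2.0 / np.sqrt(T)
n_outside = int(np.sum(np.abs(acf) > bound))
print(n_outside, "of 20 sample autocorrelations exceed", round(bound, 3))
```

Since the bound is only an approximate 95% limit, an occasional value slightly outside it is expected even for genuine white noise.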
Applied Statistics Josep Ginebra
5.1.3 Seasonal time series
A periodic time series is called seasonal. When a time series is monthly the period is
usually s = 12, when it is quarterly s = 4, and when it is daily usually s = 5, 7 or 30.
One will know that a time series is seasonal by noticing a periodic behavior in its time
series plot. One can also identify a periodicity through the sample ACF and PACF of
the original time series or of the series of differences.
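To illustrate how a periodicity shows up in the sample ACF, the following sketch builds an artificial monthly series from a repeated pattern plus noise and looks for the lag with the largest sample autocorrelation. The pattern values, the noise level and the seed are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(3)   # arbitrary seed
s = 12                           # assumed period of a monthly series
# Made-up monthly effects, repeated for 20 years, plus Gaussian noise.
pattern = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8], dtype=float)
y = np.tile(pattern, 20) + rng.normal(0.0, 0.5, size=20 * s)

# Sample autocorrelations for lags 1, ..., 18.
d = y - y.mean()
lags = np.arange(1, 19)
acf = np.array([np.sum(d[j:] * d[:-j]) for j in lags]) / np.sum(d * d)

best_lag = int(lags[np.argmax(acf)])
print("lag with the largest sample autocorrelation:", best_lag)
```

In a series like this one, the sample autocorrelation at lag s stands out clearly from those at the neighboring lags.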
5.2 Autoregressive models, AR(p)
A time series, yt , follows an autoregressive model of order p, AR(p), when:
yt = ϕ0 + ϕ1 yt−1 + . . . + ϕp yt−p + ϵt , (5.1)
where ϵt is (normally distributed) white noise.
When a time series follows an AR(p) model, the PACF is such that ρPj = 0 for all j > p
and the ACF is such that |ρj | decreases exponentially with increasing j. When the
sample ACF and PACF behave in this way, an AR(p) model might be adequate.
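For the simplest case, an AR(1) model with ϕ0 = 0, the theoretical autocorrelation is ρj = ϕ1^j, which makes the exponential decay explicit. The sketch below simulates such a series and compares the sample ACF with that expression; the value ϕ1 = 0.7, the length and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
T, phi1 = 5000, 0.7              # illustrative length and AR(1) coefficient
eps = rng.normal(size=T)

# Simulate y_t = phi1 * y_{t-1} + eps_t, starting from y_0 = 0.
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi1 * y[t - 1] + eps[t]

# Sample ACF for lags 1, ..., 5 next to the theoretical values phi1 ** j.
d = y - y.mean()
acf = np.array([np.sum(d[j:] * d[:-j]) for j in range(1, 6)]) / np.sum(d * d)
theory = phi1 ** np.arange(1, 6)

for j in range(5):
    print(j + 1, round(acf[j], 3), round(theory[j], 3))
```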
Example: AR(1)MA(1)Examples
These models can be fitted by least squares, leading to:
yt = ϕ̂0 + ϕ̂1 yt−1 + . . . + ϕ̂p yt−p + et = ŷt + et . (5.2)
To simplify the model, one needs to check whether ϕp = 0. More generally, one should not
remove ϕj unless one has removed all the ϕk with k > j.
To answer questions about ϕp based on ϕ̂p , one uses the fact that if the model is correct,
then ϕ̂p is approximately Normal(ϕp , V ar(ϕ̂p )) distributed.
Hence, one can check whether ϕp = 0 by computing approximate 95% confidence intervals
for ϕp , [ϕ̂p ± 2 ∗ sϕ̂p ], and checking whether 0 is in the interval. One can also use the
number of standard deviations separating ϕ̂p from 0, |tp | = |ϕ̂p /sϕ̂p |. If |tp | is less than
2, one cannot reject ϕp = 0; otherwise, the larger |tp | the less likely it is that ϕp = 0.
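The least squares fit and the approximate intervals described above can be sketched as follows. The function name `fit_ar_ols` is hypothetical, the standard errors are the usual ordinary least squares ones applied to the lagged regression, and the simulated AR(2) series with coefficients 0.5 and −0.3 is only there to have something to fit.

```python
import numpy as np

def fit_ar_ols(y, p):
    """Least squares fit of y_t = phi_0 + phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t."""
    y = np.asarray(y, dtype=float)
    n = len(y) - p
    # Design matrix: a column of ones followed by the p lagged series.
    X = np.column_stack([np.ones(n)] + [y[p - k:len(y) - k] for k in range(1, p + 1)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (n - (p + 1))            # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se, coef / se, resid

# Simulated AR(2) series (illustrative parameters and seed).
rng = np.random.default_rng(4)
T = 5000
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + eps[t]

coef, se, t_ratio, resid = fit_ar_ols(y, p=2)
lower, upper = coef - 2 * se, coef + 2 * se           # approximate 95% intervals
print(np.round(coef, 3), np.round(t_ratio, 1))
```

The returned residuals are the et of (5.2), so they can be passed on to a residual analysis.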
To check whether the model is correct, one needs to carry out a residual analysis analo-
gous to the one for linear models. The novelty is that here one should also check whether
there is any dependency left in the residuals through their ACF and PACF. If all the ρ̂j
and ρ̂Pj of the residuals are small enough, it is likely that ρj = 0 and ρPj = 0 for all j,
and nothing contradicts the hypothesis that ϵt is white noise.
More specifically, to decide whether ϵt is white noise one checks whether all the ρ̂j and
ρ̂Pj of the residuals could be Normal(0, Var(ρ̂j) = 1/T) distributed. If some values of
|ρ̂j| and/or |ρ̂Pj| are much larger than 2/√T, there is still dependency left in the residual
time series and one can further improve the model.
Cross validation also applies here, and it can be used to select p and to compare the
performance of AR(p) models with the models presented next, but its implementation
is not as straightforward as the one described for linear models.
5.3 Moving average models, MA(q)
A time series, yt , follows a moving average model of order q, MA(q), when:
yt = θ0 − θ1 ϵt−1 − . . . − θq ϵt−q + ϵt , (5.3)
where ϵt is (normally distributed) white noise.
When a time series follows an MA(q) model, the ACF and PACF are such that ρj = 0
for all j > q and that |ρPj | decreases exponentially with increasing j. When the sample
ACF and PACF behave in this way, an MA(q) model might be adequate.
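For an MA(1) model written with the sign convention of (5.3), the theoretical autocorrelations are ρ1 = −θ1/(1 + θ1²) and ρj = 0 for j > 1. The sketch below simulates such a series with θ0 = 0 and compares the sample ACF with those values; θ1 = 0.6, the length and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
T, theta1 = 5000, 0.6            # illustrative length and MA(1) coefficient
eps = rng.normal(size=T + 1)

# y_t = eps_t - theta1 * eps_{t-1}, following the sign convention of (5.3).
y = eps[1:] - theta1 * eps[:-1]

# Sample ACF for lags 1, ..., 5.
d = y - y.mean()
acf = np.array([np.sum(d[j:] * d[:-j]) for j in range(1, 6)]) / np.sum(d * d)

rho1_theory = -theta1 / (1.0 + theta1 ** 2)
print(round(acf[0], 3), "vs theoretical", round(rho1_theory, 3))
print("lags 2 to 5:", np.round(acf[1:], 3))
```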
Everything regarding model fitting, inference, and model checking works in a similar way
as for autoregressive models. In particular, one should not try to simplify the model by
removing θj unless one has removed all θk with k > j.
5.4 ARIMA(p,d,q) models
A time series, yt , follows an ARMA model of order (p, q), denoted ARMA(p, q), when:
yt = ϕ0 + ϕ1 yt−1 + . . . + ϕp yt−p − θ1 ϵt−1 − . . . − θq ϵt−q + ϵt , (5.4)
where ϵt is white noise.
When one needs to difference the original time series d times to make it stationary,
and the differenced series follows an ARMA(p, q) model, one states that yt follows an
ARIMA(p, d, q) model. In practice a d of 1 or at most 2 is usually enough to attain
stationarity. The I in ARIMA stands for Integrated.
Model fitting, model checking and inference in this general case work as described above.
In practice, instead of guessing the p, d and q values by examining the ACF and PACF
of the original time series, usually one chooses p, d and q by trial and error, to be the
smallest values leading to a residual time series that qualifies as white noise.
That requires checking whether all the ρ̂j and all the ρ̂Pj of the residual time series are
small enough to accept that all the ρj and ρPj might be equal to 0. To help do that, one
can use the Box-Pierce statistic (in the modified form also known as the Ljung-Box
statistic), which summarizes the ACF of the residuals through:
Box-Pierce(h) = T (T + 2) ∑_{j=1}^{h} ρ̂j² / (T − j), (5.5)
where T is the time series length. When the residual time series of a fitted model is white
noise, its Box-Pierce(h) statistic is approximately χ² distributed with h − r degrees of
freedom, where r is the number of parameters in the
model. Statistical packages often provide this Box-Pierce(h) statistic for the residual
time series with h = 12, 24, 36, 48, together with the right tail area. When the value of
this statistic is much larger than h − r and the right tail area is much smaller than 0.05,
certain values of ρ̂j² are far too large for the time series to be white noise.
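A minimal sketch of the statistic in (5.5), using only NumPy, is given below. The function name `box_pierce_modified` is hypothetical. The right tail area is not computed, since that requires a χ² distribution routine; the sketch only compares the statistic with its number of lags for a white noise series and for a strongly dependent AR(1) series, both simulated with arbitrary parameters.

```python
import numpy as np

def box_pierce_modified(resid, h):
    """T (T + 2) * sum over j = 1..h of rho_hat_j^2 / (T - j), as in (5.5)."""
    e = np.asarray(resid, dtype=float)
    T = len(e)
    d = e - e.mean()
    denom = np.sum(d * d)
    lags = np.arange(1, h + 1)
    rho = np.array([np.sum(d[j:] * d[:-j]) / denom for j in lags])
    return T * (T + 2) * np.sum(rho ** 2 / (T - lags))

rng = np.random.default_rng(6)   # arbitrary seed
T = 1000
noise = rng.normal(size=T)       # behaves like white noise

# A dependent series: AR(1) with coefficient 0.7 (illustrative value).
ar1 = np.zeros(T)
for t in range(1, T):
    ar1[t] = 0.7 * ar1[t - 1] + noise[t]

q_noise = box_pierce_modified(noise, h=12)
q_ar1 = box_pierce_modified(ar1, h=12)
print(round(q_noise, 1), round(q_ar1, 1))
```

For the white noise series the statistic should be of the same order as h, while for the dependent series it should be far larger.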
When fitting ARIMA models with many parameters, the model fitting process sometimes
fails to converge. Sometimes, one way around that is to remove the constant term, ϕ0 ,
from the model, which in this context is often not needed. That is typically warranted
when one is modeling a differenced time series.
Examples: GDP USA, PIB Spain
5.5 Seasonal ARIMA models
A time series, y1 , . . . , yt , . . ., follows a seasonal ARMA model with periodicity s and order
(P, Q), denoted by SARMAs (P, Q), when:
yt = ϕ0 + ϕ1 yt−s + . . . + ϕP yt−P∗s − θ1 ϵt−s − . . . − θQ ϵt−Q∗s + ϵt , (5.6)
where ϵt is white noise.
To deal with trends at a seasonal level, one often needs to model lagged s differences,
zt = yt − yt−s , or lagged s differences of order two, st = zt − zt−s = yt − 2yt−s + yt−2∗s ,
to deal with more complex seasonal non-stationarities.
If the differenced time series follows a SARMAs (P, Q) model, the original time series
follows a SARIMAs (P, D, Q) model, where D is the number of lagged s differences.
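The lagged s differences can be sketched with plain array slicing. The helper name `seasonal_diff` is hypothetical, and the quarterly series below (a linear trend plus a repeating pattern) is constructed only to show the effect.

```python
import numpy as np

def seasonal_diff(y, s, D=1):
    """Apply the lagged-s difference z_t = y_t - y_{t-s} a total of D times."""
    y = np.asarray(y, dtype=float)
    for _ in range(D):
        y = y[s:] - y[:-s]
    return y

# Illustrative quarterly series: linear trend 2 * t plus a fixed seasonal pattern.
s = 4
t = np.arange(24)
pattern = np.array([5.0, -1.0, 3.0, -7.0])
y = 2.0 * t + pattern[t % s]

z = seasonal_diff(y, s)          # removes the pattern, leaves the constant 2 * s
w = seasonal_diff(y, s, D=2)     # lagged s differences of order two
print(z[:4], w[:4])
```

Each lagged s difference shortens the series by s observations.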
In practice, usually one needs to use both a seasonal component capturing the seasonal
dependencies together with a plain ARIMA component, capturing the short term de-
pendencies, leading to the use of SARIMAs (P, D, Q)(p, d, q) models.
One chooses the smallest values for P, D, Q, p, d and q that lead to a residual time series
that qualifies as white noise, using a trial and error approach. To do that, one needs to
pay special attention to the sample autocorrelations and partial autocorrelations at lags
that are multiples of s, together with the ones for the smallest lags.
Example: Activity in Barcelona harbor
5.6 ARIMAX model
Sometimes, together with the time series yt for t = 1, . . . , T one also has a set of explana-
tory variables, x1t , x2t , . . . , xpt . When that is the case, one can combine a linear model
capturing the information about yt in the xjt ’s, and a time series model capturing the
information in the dependencies left in the residuals of the linear model.
This constitutes what is called an ARIMAX model, which is usually fitted in a single step.
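To make the two components concrete, the sketch below separates them into two steps: a least squares fit of the linear model, followed by an AR(1) fit to its residuals. This two-step version is a simplification used only for illustration, not the joint fit mentioned above. The data are simulated, and the true values (intercept 1, slope 2, AR coefficient 0.6) are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(5)   # arbitrary seed
T = 2000

# Simulated data: y_t = 1 + 2 x_t + u_t, where u_t follows an AR(1) with coefficient 0.6.
x = rng.normal(size=T)
eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u

# Step 1: linear model capturing the information about y_t in x_t.
X = np.column_stack([np.ones(T), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2: AR(1) fitted by least squares to the residuals of the linear model
# (no constant term, since least squares residuals have mean zero).
phi_hat = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)

print(np.round(beta, 3), round(phi_hat, 3))
```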