ARIMA Modelling and
Forecasting
BY SHIPRA MISHRA
INTERN
WHAT DOES FORECASTING MEAN?
Forecasting is a common statistical task in business, where it helps to inform
decisions about the scheduling of production, transportation and personnel,
and provides a guide to long-term strategic planning. At times, time series
forecasting is needed because no other predictor variables are available for the future.
The basic steps in forecasting are:
1. Problem definition
2. Gathering information
3. Preliminary analysis
4. Choosing and fitting models
5. Using and evaluating a forecasting model
WHAT ARE TIME SERIES PATTERNS?
TREND: A trend exists when there is a long-term
increase or decrease in the data. It does not have to be
linear.
SEASONAL: A seasonal pattern occurs when a time
series is affected by seasonal factors such as the time
of the year or the day of the week. Seasonality is
always of a fixed and known frequency.
CYCLIC: A cycle occurs when the data exhibit rises and
falls that are not of a fixed frequency. These fluctuations
are usually due to economic conditions, and are often
related to the “business cycle”. The duration of these
fluctuations is usually at least 2 years.
WHAT IS ARIMA?
It stands for Autoregressive Integrated Moving Average.
It is a class of statistical models for analyzing and forecasting
time series data.
AR: Autoregression. A model that uses the dependent
relationship between an observation and some number of
lagged observations.
I: Integrated. The use of differencing of raw observations in
order to make the time series stationary.
MA: Moving Average. A model that uses the dependency
between an observation and a residual error from a moving
average model applied to lagged observations.
ARIMA EQUATION
Let us look at the ARIMA equation and how it depends on the AR and MA terms.
$$y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$
where $y'_t$ is the differenced series (it may have been differenced more than once).
The "predictors" on the right hand side include both lagged values of $y_t$ and lagged errors.
Point forecasts can be calculated using the following three steps:
1. Expand the ARIMA equation so that $y_t$ is on the left hand side and all other terms are on
the right.
2. Rewrite the equation by replacing $t$ with $T+h$.
3. On the right hand side of the equation, replace future observations with their forecasts,
future errors with zero, and past errors with the corresponding residuals.
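The three steps can be sketched for a small case. The example below assumes an ARIMA(1,1,1) model, for which expanding the differenced series gives y_t = c + (1 + phi1) y_{t-1} - phi1 y_{t-2} + theta1 e_{t-1} + e_t. The function name, the coefficient values and the two data points are illustrative assumptions, not values from this presentation.

```python
def arima_111_forecast(y, last_residual, c, phi1, theta1, horizon):
    """Point forecasts for an ARIMA(1,1,1) model (illustrative sketch).

    Expanded equation (step 1):
        y_t = c + (1 + phi1)*y_{t-1} - phi1*y_{t-2} + theta1*e_{t-1} + e_t
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    values = list(y)            # observed values, later extended with forecasts
    prev_error = last_residual  # e_T: residual at the last observed time point
    forecasts = []
    for _ in range(horizon):
        # Step 2 and 3: evaluate at T+h; the future error e_{T+h} is set to zero.
        y_hat = c + (1 + phi1) * values[-1] - phi1 * values[-2] + theta1 * prev_error
        forecasts.append(y_hat)
        values.append(y_hat)    # future observations are replaced by forecasts
        prev_error = 0.0        # errors after time T are replaced by zero
    return forecasts


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
example = arima_111_forecast([10.0, 12.0], last_residual=0.5,
                             c=0.0, phi1=0.5, theta1=0.4, horizon=2)
print(example)
```

With these inputs the first forecast is 1.5*12 - 0.5*10 + 0.4*0.5 = 13.2, and the second uses that forecast in place of an observation: 1.5*13.2 - 0.5*12 = 13.8.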
STATIONARITY OF AR PROCESS
A stationary time series is one whose properties do not depend on the time at which
the series is observed.
If an AR model is not stationary, previous values of the
error term have a non-declining effect on the current
value of the dependent variable.
Equivalently, when the AR model is rewritten as an
infinite-order MA process, its coefficients do not
converge to zero as the lag length increases.
A time series is stationary when its mean and variance remain constant over time.
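A rough way to see this in practice is to compare the mean and variance of the first and second half of a series. The sketch below is an informal check rather than a formal stationarity test, and it uses synthetic data (white noise, and the same noise with an assumed linear trend added); none of it comes from the data behind this presentation.

```python
import random
from statistics import mean, pvariance


def half_stats(series):
    """Return (mean, variance) of the first half and of the second half."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]       # stationary
trended = [0.05 * t + e for t, e in enumerate(noise)]    # mean drifts upward

(noise_m1, noise_v1), (noise_m2, noise_v2) = half_stats(noise)
(trend_m1, trend_v1), (trend_m2, trend_v2) = half_stats(trended)

print("white noise, half means:", round(noise_m1, 2), round(noise_m2, 2))
print("trended,     half means:", round(trend_m1, 2), round(trend_m2, 2))
```

For the white noise series the two halves have similar means, while for the trended series the second half has a clearly higher mean, which is the kind of time dependence that rules out stationarity.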
WHAT IS AUTOREGRESSION?
In an autoregression model, we forecast the variable of interest using a linear combination of
past values of the variable. The term autoregression indicates that it is a regression of the variable
against itself.
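As a minimal sketch of this idea, the function below computes a one-step prediction as a constant plus a weighted sum of the most recent values. The function name, the coefficients and the sample history are hypothetical and chosen only for illustration.

```python
def ar_predict(history, c, phis):
    """One-step AR(p) prediction: c + phi_1*y_{t-1} + ... + phi_p*y_{t-p}."""
    p = len(phis)
    if len(history) < p:
        raise ValueError("history must contain at least p observations")
    # history[-1] is y_{t-1}, history[-2] is y_{t-2}, and so on.
    lagged = [history[-i] for i in range(1, p + 1)]
    return c + sum(phi * y for phi, y in zip(phis, lagged))


# AR(2) with made-up coefficients: 0.5 + 0.5*3.0 + 0.2*2.0
print(ar_predict([1.0, 2.0, 3.0], c=0.5, phis=[0.5, 0.2]))
```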
The test for stationarity in an
AR model (with p lags) is that
the roots of the characteristic
equation lie outside the unit
circle, where the
characteristic equation is:
$$1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0$$
From the curve, we see that
the value of p is 6.
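For a small order the root condition can be checked by hand. The sketch below assumes p = 2, where the characteristic equation 1 - phi1*z - phi2*z^2 = 0 is a quadratic and can be solved with the standard library; a higher order such as p = 6 would need a numerical root finder instead. The function name and the test coefficients are illustrative assumptions.

```python
import cmath


def ar2_is_stationary(phi1, phi2):
    """True if all roots of 1 - phi1*z - phi2*z**2 = 0 lie outside the unit circle."""
    if phi2 == 0:
        if phi1 == 0:
            return True          # no lag terms at all, so nothing to check
        roots = [1 / phi1]       # linear case: 1 - phi1*z = 0
    else:
        # Quadratic a*z**2 + b*z + c0 = 0 with a = -phi2, b = -phi1, c0 = 1
        a, b, c0 = -phi2, -phi1, 1.0
        disc = cmath.sqrt(b * b - 4 * a * c0)
        roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    return all(abs(z) > 1 for z in roots)


print(ar2_is_stationary(0.5, 0.3))  # roots near -2.84 and 1.17, both outside
print(ar2_is_stationary(0.5, 0.6))  # one root near 0.94, inside the unit circle
```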
WHAT IS DIFFERENCING?
One way to make a non-stationary time series stationary is to compute the
differences between consecutive observations. This is known as
differencing.
Transformations such as logarithms can help to stabilize the variance of a
time series. Differencing can help stabilize the mean of a time series by
removing changes in the level of a time series, and therefore eliminating
(or reducing) trend and seasonality.
Since the series in the curve is not stationary (its characteristics are not constant
over time), we apply differencing of order 1, that is, d = 1.
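Differencing itself is a short computation. The sketch below uses a made-up series with a growing level to show how first and second differences remove it; the helper name is an assumption for illustration.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: y'_t = y_t - y_{t-1}."""
    result = list(series)
    for _ in range(order):
        # Each pass shortens the series by one observation.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


trend = [1, 3, 6, 10, 15]          # hypothetical series with a rising level
print(difference(trend))           # [2, 3, 4, 5]
print(difference(trend, order=2))  # [1, 1, 1]
```

Each round of differencing costs one observation, which is one reason to use the smallest order that makes the series look stationary.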
WHAT IS MOVING AVERAGE?
A moving average model uses past forecast errors in a regression-like
model.
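$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q},$$
where $\varepsilon_t$ is white noise.
Notice that each value of $y_t$ can be thought of as a weighted
moving average of the past few forecast errors.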
From the curve, we see that the value of q is 1.
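To make the moving average equation concrete, the sketch below builds an MA(q) series from a given list of errors. The error values and coefficients are invented for illustration, and errors before the start of the list are treated as zero, which is an assumption of this sketch.

```python
def ma_series(errors, c, thetas):
    """Build y_t = c + e_t + theta_1*e_{t-1} + ... + theta_q*e_{t-q}.

    Errors before the start of the list are treated as zero (an assumption).
    """
    series = []
    for t, e_t in enumerate(errors):
        value = c + e_t
        for j, theta in enumerate(thetas, start=1):
            if t - j >= 0:
                value += theta * errors[t - j]
        series.append(value)
    return series


# MA(1) with q = 1 and a made-up coefficient theta_1 = 0.5
print(ma_series([1.0, -1.0, 2.0], c=10.0, thetas=[0.5]))  # [11.0, 9.5, 11.5]
```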
AIC AND BIC CRITERION
Akaike’s Information Criterion (AIC)
AIC, which is useful in selecting predictors for regression, is also
useful for determining the order of an ARIMA model. It
can be written as
AIC=−2log(L)+2(p+q+k+1),
where L is the likelihood of the data, and k = 1 if the model includes a constant (c ≠ 0) and k = 0 otherwise.
Bayesian Information Criterion
It can be written as
BIC=AIC+[log(T)−2](p+q+k+1),
where T is the number of observations used for estimation.
Good models are obtained by minimizing BIC and
AIC.
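Both criteria are simple functions of the log-likelihood and the model order. The sketch below implements the two formulas as written here; k is taken as 1 when the model includes a constant and 0 otherwise, T is passed as `n_obs`, and the log-likelihood value in the usage lines is a placeholder, not a result from the models in this presentation.

```python
import math


def aic(log_likelihood, p, q, k):
    """AIC = -2*log(L) + 2*(p + q + k + 1)."""
    return -2.0 * log_likelihood + 2.0 * (p + q + k + 1)


def bic(log_likelihood, p, q, k, n_obs):
    """BIC = AIC + [log(T) - 2]*(p + q + k + 1), with T = n_obs."""
    n_params = p + q + k + 1
    return aic(log_likelihood, p, q, k) + (math.log(n_obs) - 2.0) * n_params


# Placeholder log-likelihood of -100 for an order with p=6, q=1 and a constant.
print(aic(-100.0, p=6, q=1, k=1))             # 218.0
print(bic(-100.0, p=6, q=1, k=1, n_obs=100))
```

Because log(T) exceeds 2 once T is larger than about 7, BIC penalises extra parameters more heavily than AIC and tends to favour smaller orders.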
FIRST MODEL
For the first model, we take the order p=8, d=1 and q=1, and observe
the BIC value:
SECOND MODEL
For the second model, we take
the order p=6, d=1 and q=1,
and observe the BIC value:
THIRD MODEL
For the third model, we take the order p=6, d=1 and q=2, and
observe the BIC value:
ORDER FOR ARIMA
From the above three models, we found
that BIC is at its minimum for p = 6 and
q = 1.
So, the order for ARIMA forecasting will be (6,1,1).
The residual plot and KDE plot are consistent with
the order (6,1,1).
COMPARING PREDICTION OF TEST DATA
WITH ACTUAL DATA
Here, we use a train/test split for prediction.
We train on 95% of the data and predict the remaining 5%. Comparing the
plots of predicted and actual data, we see that the predicted values are
approximately equal to the actual values.
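A split of this kind, together with a numeric measure of how close the predictions are, can be sketched as follows. The split is chronological (earliest observations for training), which is the usual choice for time series and is an assumption here; RMSE is one common error measure and is not necessarily the one used for these plots. The function names are hypothetical.

```python
import math


def train_test_split_ordered(series, train_fraction=0.95):
    """Split chronologically: earliest part for training, latest part for testing."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


data = list(range(100))                      # stand-in for the real series
train, test = train_test_split_ordered(data)
print(len(train), len(test))                 # 95 5
print(rmse([2.0, 4.0], [1.0, 5.0]))          # 1.0
```

A small RMSE relative to the scale of the series supports the visual impression that predicted and actual values are close.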
PREDICTION FOR NEXT ONE MONTH
Here, the predicted data are shown together with the historical data.
SUMMARY
In this presentation, we have learned about forecasting and how to use
an ARIMA model for forecasting.
When using AR models, whether the series is stationary or not determines
how stable the model is.
Forecasting of time series is an important measure of how well a model
works.
NOTE: Because of security concerns, proxy data is
used.