ARIMA Time Series Forecasting Project

The document discusses time series analysis and forecasting using ARIMA models. It describes the components of time series analysis including trend, seasonality, and randomness. It also covers extracting and preparing the data, checking for stationarity, differencing the data, modeling with ARIMA including identifying AR and MA terms from ACF and PACF plots, and evaluating the model residuals.


Business Forecasting (MBA3017)

END-TERM PROJECT
TIME SERIES FORECASTING USING
ARIMA

Submitted to
Dr. Rosewine Joy
Professor
School of Management (SOM)
Presidency University

Submitted By:
ATMANAND P GARAG
20212MBA0055
SECTION “2”

Date of Submission:
07/12/2022
INTRODUCTION
TIME SERIES ANALYSIS
Time series forecasting focuses on analyzing how data change across equally spaced
time intervals. Time series analysis is used in a wide variety of domains, ranging
from econometrics to geology and earthquake prediction; it is also used in almost all
applied sciences and engineering. Examples of time series data include the S&P 500
Index, disease rates, mortality rates, blood pressure tracking, and global temperatures.
This report looks at how autoregressive integrated moving average (ARIMA) models
work and how they are fitted to time series data. The first point to consider before
moving forward is the difference between multivariate and univariate forecasting.
Univariate forecasting uses only the previous values of the series itself to
forecast future values, whereas multivariate forecasting also uses other variables.

DATA SET
The monthly average price of crude oil in India, with two columns:
• Date
• Price ($/bbl)
We will explore how crude oil prices in India varied from 2000 to 2022.

DATA SOURCE
Socio-Economic Statistics India, Statistical Data Figures 2022 ([Link])

Time Series in R
A time series in R is used to see how a quantity behaves over a period of time. In R, it
can be created easily with the ts() function and a few parameters. ts() takes a
data vector and associates each value with a timestamp as specified by the user.
It is mostly used to study and forecast the behavior of a business quantity
over a period of time: for example, sales analysis of a company, inventory analysis,
price analysis of a particular stock or market, population analysis, etc.
Syntax: objectName <- ts(data, start, end, frequency)
where,
• data represents the data vector
• start represents the time of the first observation in the time series
• end represents the time of the last observation in the time series
• frequency represents the number of observations per unit of time. For example,
frequency=12 for monthly data with a yearly period (frequency=1 for annual data).
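
As a minimal sketch of the syntax above, the following builds a monthly series from a short vector. The numbers are made up for illustration and are not the project's data; the names prices and price_ts are hypothetical.

```r
# Hypothetical monthly prices (made-up numbers, for illustration only)
prices <- c(24.1, 25.3, 27.8, 26.4, 28.9, 30.2, 29.5,
            31.0, 30.4, 28.7, 27.9, 29.3, 30.8, 32.1)

# Monthly data: 12 observations per year, first observation in April 2000
price_ts <- ts(prices, start = c(2000, 4), frequency = 12)

print(price_ts)       # printed as a year-by-month grid
start(price_ts)       # 2000 4
end(price_ts)         # 2001 5  (14 months after the start, inclusive)
frequency(price_ts)   # 12
```

Leaving out end lets ts() work it out from the length of the data, which avoids a mismatch between the stated end date and the number of observations.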

Components of Time Series Analysis

The reasons or forces that change the attributes of a time series are known as the
Components of Time Series.
The following are the components of a time series:
• Trend
• Seasonal Variations
• Cyclical Variations
• Random or Irregular Movements

TIME SERIES GRAPHICS


The first thing to do in any data analysis task is to plot the data. Graphs enable many
features of the data to be visualised, including patterns, unusual observations,
changes over time, and relationships between variables. The features that are seen in
plots of the data must then be incorporated, as much as possible, into the forecasting
methods to be used. Just as the type of data determines what forecasting method to
use, it also determines what graphs are appropriate. But before we produce graphs,
we need to set up our time series in R.
For time series data, the obvious graph to start with is a time plot. That is, the
observations are plotted against the time of observation, with consecutive
observations joined by straight lines. The figure below shows the monthly average
price of crude oil in India.
Here we can see the random variation in the data.

DECOMPOSE

By default, R decomposes the data with an additive time series model. To use the
multiplicative model, we specify type = "multiplicative" in the decompose()
function. The trend is not estimated for the first and last few values, because the
moving average needs observations on both sides. The seasonal component repeats
from year to year.

Here we see a very nice clean visualization showing our original time series, trend,
seasonal and random components.
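
The behaviour described above can be checked with a short sketch. It uses AirPassengers, a monthly series that ships with base R, purely as a stand-in for the project's data; the names dc_add and dc_mult are hypothetical.

```r
# AirPassengers ships with base R (datasets package); used only as a stand-in
x <- AirPassengers

dc_add  <- decompose(x)                           # additive model (default)
dc_mult <- decompose(x, type = "multiplicative")  # multiplicative model

# For monthly data the centred moving average leaves the first and last
# 6 values of the trend undefined (NA)
sum(is.na(dc_add$trend))   # 12

# The seasonal component repeats every 12 months
all.equal(as.numeric(dc_add$seasonal[1:12]),
          as.numeric(dc_add$seasonal[13:24]))

plot(dc_add)   # panels: observed, trend, seasonal, random
```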
ADF TEST

Augmented Dickey-Fuller test in R. A time series is considered “stationary” if it has
no trend, constant variance over time, and a consistent autocorrelation structure
across time.
One way to check for stationarity is the augmented Dickey-Fuller (ADF) test, which
uses the following null and alternative hypotheses:
H0: The time series is non-stationary (it has a unit root).
HA: The time series is stationary.
Code For ADF TEST

Results

Here the p-value is greater than 0.05, so we fail to reject the null hypothesis: the
series is not stationary.
Differencing to remove a trend or seasonal effects
An alternative to decomposition for removing trends is differencing. We saw in the
lecture how the difference operator works and how it can be used to remove linear
and nonlinear trends as well as various seasonal features that might be evident in the
data. As a reminder, we define the difference operator as
$\nabla x_t = x_t - x_{t-1}$.
In R we can use the diff() function for differencing a time series. It takes 3 main
arguments: x (the data), lag (the lag at which to difference), and differences (the
order of differencing).
For example, first-differencing a time series will remove a linear trend
(i.e., differences = 1); twice-differencing will remove a quadratic trend
(i.e., differences = 2). In addition, first differencing a time series at a lag equal to the
period will remove a seasonal trend (e.g., set lag = 12 for monthly data).
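
These three claims can be verified on synthetic series where the answer is known exactly. This is a small sketch with made-up data; the names tt, linear, quadratic and seasonal are hypothetical.

```r
tt <- 1:48
linear    <- ts(5 + 2 * tt, frequency = 12)               # linear trend, slope 2
quadratic <- ts(5 + 2 * tt + 0.5 * tt^2, frequency = 12)  # quadratic trend
seasonal  <- ts(rep(c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8), 4),
                frequency = 12)                           # pure 12-month pattern

d1 <- diff(linear, differences = 1)      # constant: the slope, 2
d2 <- diff(quadratic, differences = 2)   # constant: 2 * 0.5 = 1
ds <- diff(seasonal, lag = 12)           # constant: 0, the pattern cancels

range(d1)
range(d2)
range(ds)
```

Each differenced series is constant, which is what "removing" the trend or seasonal pattern means here. Note that differencing shortens the series: d1 has 47 values, d2 has 46, and ds has 36.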
Let’s use diff() to remove the trend and seasonal signal from the data

CODES

STATIONARY DATA PLOT


Here the p-value is 0.01, which is below 0.05, so the differenced data are stationary.

ARIMA MODEL

ARIMA stands for Autoregressive Integrated Moving Average and is specified by
three order parameters: (p, d, q).

• AR(p) Autoregression: A regression model that utilizes the dependent
relationship between a current observation and observations over a previous
period. An autoregressive (AR(p)) component refers to the use of past values in
the regression equation for the time series.
• I(d) Integration: Uses differencing of observations (subtracting an observation
from the observation at the previous time step) in order to make the time series
stationary. Differencing involves subtracting the previous values of a series
from its current values, d times.
• MA(q) Moving Average: A model that uses the dependency between an
observation and a residual error from a moving average model applied to lagged
observations. A moving average component depicts the error of the model as a
combination of previous error terms. The order q represents the number of terms
to be included in the model.
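
To make the three orders concrete, here is a minimal sketch that simulates a series with known orders and fits the matching model with base R's arima(). The AR coefficient 0.6, the series length and the seed are arbitrary choices, not values from the project.

```r
set.seed(1)

# Simulate from ARIMA(1,1,0): one AR term (p = 1), one difference (d = 1), no MA term
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.6), n = 500)

# Fit a model with the same (p, d, q)
fit <- arima(y, order = c(1, 1, 0))

coef(fit)   # the estimated ar1 should land near the true value 0.6
fit$arma    # internal order vector: p, q, seasonal P, Q, period, d, seasonal D
```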

1. Autoregressive Component
It describes the relationship between the value of a series at a point in time and its
own previous values. Such a relationship can exist with any order of lag. A lag is
the value at a previous point in time. It can have various orders as
shown in the table below. It hints toward a pointed relationship.

2. Moving Average Component

It implies that the current deviation from the mean depends on previous deviations.
Such a relationship can exist with any number of lags, which decides the order of
the moving average.
The moving average is the average of consecutive values at various time
periods. It can have various orders as shown in the table below. It hints toward
a distributed relationship, as the moving average is itself derived from several lags.
ARIMA Modelling procedure
When fitting an ARIMA model to a set of (non-seasonal) time series data, the
following procedure provides a useful general approach.
1. Plot the data and identify any unusual observations.
2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the
variance.
3. If the data are non-stationary, take the first differences of the data until the data are
stationary.
4. Examine the ACF/PACF: Is an ARIMA(p,d,0) or ARIMA(0,d,q) model
appropriate?
5. Try your chosen model(s), and use the AICc to search for a better model.
6. Check the residuals from your chosen model by plotting the ACF of the residuals,
and doing a portmanteau test of the residuals. If they do not look like white noise,
try a modified model.
7. Once the residuals look like white noise, calculate forecasts.
The Hyndman-Khandakar algorithm only takes care of steps 3–5. So even if you use it,
you will still need to take care of the other steps yourself.
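
Step 6 can be sketched with base R as follows. The series is simulated as a stand-in for real data, and the choice of 10 lags is an assumption made for illustration.

```r
set.seed(5)

# Stand-in series (simulated), not the project's data
y   <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 300)
fit <- arima(y, order = c(1, 1, 0))
res <- residuals(fit)

# Residual autocorrelations should all be small
acf(res, lag.max = 10, plot = FALSE)

# Ljung-Box portmanteau test; fitdf = number of estimated ARMA coefficients (p + q)
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)
lb$p.value   # a large p-value gives no evidence against white-noise residuals
```

A small p-value here would suggest that structure remains in the residuals and that a modified model should be tried.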

Autocorrelation Function (ACF)


The ACF is the correlation between a time series and a lagged version of itself: the
correlation between the observation at the current time point and the observations at
previous time points. The autocorrelation function starts at lag 0, which is the
correlation of the time series with itself and is therefore equal to 1.

The ACF plot can provide answers to the following questions:

• Is the observed time series white noise/random?
• Is an observation related to an adjacent observation, an observation twice removed,
and so on?
• Can the observed time series be modeled with an MA model? If yes, what is the
order?
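
As an illustration of how these questions are read off the ACF, the sketch below compares white noise with a simulated MA(1) series. The coefficient 0.8, the length 300 and the seed are arbitrary assumptions.

```r
set.seed(42)
e <- rnorm(300)

white_noise <- ts(e)
ma1 <- ts(e[-1] + 0.8 * e[-300])   # MA(1): x_t = e_t + 0.8 * e_(t-1)

acf_wn  <- acf(white_noise, lag.max = 5, plot = FALSE)
acf_ma1 <- acf(ma1, lag.max = 5, plot = FALSE)

acf_wn$acf[1]                # lag 0: always 1
round(acf_wn$acf[2:6], 2)    # white noise: lags 1 to 5 stay close to zero
round(acf_ma1$acf[2:6], 2)   # MA(1): lag 1 stands out, later lags are close to zero

2 / sqrt(length(ma1))        # approximate 95% band for judging "close to zero"
```

An ACF that cuts off after lag q is the usual sign that an MA(q) model is worth trying.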
Codes for Autocorrelation Function(ACF)

ACF with Lag of 01

ACF with Lag of 05


ndiffs Function
The ndiffs() function estimates the number of first differences required to make a
given time series stationary.

Codes for ndiffs

Output

Here the number of differences required to make the data stationary is 1, which means
the d value in the ARIMA model is 1.

PACF

PACF is the partial autocorrelation function, which explains the partial correlation
between the series and its own lags. In simple terms, PACF can be explained using a
linear regression where we predict y(t) from y(t-1), y(t-2), and y(t-3) [2]. In
PACF, we correlate the “parts” of y(t) and y(t-3) that are not predicted by y(t-1)
and y(t-2).
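
The "parts not predicted" idea can be sketched directly with two regressions. The series is a simulated AR(1) used as stand-in data, and the names lags, r_now and r_lag3 are hypothetical. The two numbers printed at the end should be close but not identical, because pacf() uses a different estimator.

```r
set.seed(9)
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 2000))  # simulated AR(1)

# Columns of embed(y, 4): y(t), y(t-1), y(t-2), y(t-3)
lags <- embed(y, 4)

# Parts of y(t) and y(t-3) that are NOT explained by y(t-1) and y(t-2)
r_now  <- resid(lm(lags[, 1] ~ lags[, 2] + lags[, 3]))
r_lag3 <- resid(lm(lags[, 4] ~ lags[, 2] + lags[, 3]))

partial_by_hand <- cor(r_now, r_lag3)
partial_by_pacf <- pacf(y, lag.max = 3, plot = FALSE)$acf[3]

c(by_hand = partial_by_hand, pacf = partial_by_pacf)
```

For an AR(1) series both values are near zero at lag 3, which is why a PACF that cuts off after lag p points to an AR(p) model.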

Codes for PACF


PACF Plot

PACF with lag of 5


Using the ACF and PACF plots, we identified the following candidate ARIMA models (p, d, q):

Model  p  d  q
A      2  1  2
B      0  1  0
C      1  1  0
D      0  1  1
E      1  1  1

Outputs of the ARIMA models

Model  AIC      RMSE
A      2075.14  11.00031
B      2099.85  11.75
C      2101.69  11.74
D      2101.69  11.74
E      2103.69  11.75

AIC

The Akaike information criterion (AIC) is a mathematical method for evaluating
how well a model fits the data it was generated from.
There is no value for AIC that can be considered “good” or “bad” because
we simply use AIC as a way to compare models fitted to the same data. The model
with the lowest AIC offers the best fit.
RMSE

Root mean squared error (RMSE) is the square root of the mean of the squared
errors. RMSE is considered an excellent general-purpose error
metric for numerical predictions. RMSE is a good measure of accuracy, but
only to compare prediction errors of different models or model configurations
for a particular variable and not between variables, as it is scale-dependent. It
measures how closely the fitted values follow the data points.

The model with the lowest RMSE offers the best fit.
Here, model “A” fits best because it has the lowest AIC and the lowest RMSE.
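
A comparison of this kind can be sketched in base R as below. The series is simulated as a stand-in, so the numbers will not match the table above. method = "ML" is an assumption made here to keep the fits stable on simulated data; the project code uses the default method.

```r
set.seed(7)

# Stand-in series (simulated); the report's price data is not reproduced here
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 270)

# The five candidate orders (p, d, q) from the report
orders <- list(A = c(2, 1, 2), B = c(0, 1, 0), C = c(1, 1, 0),
               D = c(0, 1, 1), E = c(1, 1, 1))

scores <- do.call(rbind, lapply(names(orders), function(nm) {
  fit <- arima(y, order = orders[[nm]], method = "ML")
  data.frame(model = nm,
             AIC   = AIC(fit),
             RMSE  = sqrt(mean(residuals(fit)^2)))  # in-sample RMSE
}))

scores
scores$model[which.min(scores$AIC)]    # candidate preferred by AIC
scores$model[which.min(scores$RMSE)]   # candidate preferred by RMSE
```

The two criteria need not agree: in-sample RMSE tends to favour models with more parameters, while AIC penalises them.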

Forecast of Crude Oil Price using ARIMA


Output
Auto Arima

So far we have been manually fitting different models
and deciding which one is best. Auto ARIMA automates this process.
It takes the data, fits many models of different orders, and
compares their characteristics. However, the processing time increases
substantially when we try to fit complex models.
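
The idea of automating the search can be sketched as a plain grid search in base R. This is not the algorithm used by the forecast package; it only illustrates "fit many orders and compare". The grid limits, d = 1, the simulated series and method = "ML" are all assumptions.

```r
set.seed(11)

# Stand-in series (simulated): ARIMA(0,1,1) with MA coefficient 0.6
y <- arima.sim(model = list(order = c(0, 1, 1), ma = 0.6), n = 300)

# Try every (p, 1, q) with p and q between 0 and 2
grid <- expand.grid(p = 0:2, q = 0:2)
grid$AIC <- mapply(function(p, q) {
  tryCatch(AIC(arima(y, order = c(p, 1, q), method = "ML")),
           error = function(e) NA_real_)   # skip orders that fail to fit
}, grid$p, grid$q)

grid[order(grid$AIC), ]                # all candidates, best first
best <- grid[which.min(grid$AIC), ]    # which.min ignores NA
best
```

The grid has 9 models here; the number grows quickly once seasonal orders are added, which is the processing-time cost mentioned above.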
Conclusion

To check the accuracy of the ARIMA model, the mean percentage error was
calculated by comparing the actual price with the forecasted price.
The result showed that the suggested model with the lower AIC was more
accurate in forecasting the time series. The final fitted model is ARIMA(0,1,0).
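
A mean percentage error of this kind can be computed on a hold-out period as sketched below. The series is a simulated random walk used as a stand-in, and the 12-month hold-out is an assumption; the report does not state how its comparison was set up.

```r
set.seed(3)

# Simulated random walk around 100 (stand-in data, 10 years of monthly values)
y <- ts(100 + cumsum(rnorm(120)), frequency = 12)

train <- window(y, end = c(9, 12))     # first 108 months
test  <- window(y, start = c(10, 1))   # last 12 months, held out

fit <- arima(train, order = c(0, 1, 0))
fc  <- as.numeric(predict(fit, n.ahead = length(test))$pred)
act <- as.numeric(test)

mpe  <- mean((act - fc) / act) * 100        # mean percentage error (signed)
mape <- mean(abs((act - fc) / act)) * 100   # mean absolute percentage error
c(MPE = mpe, MAPE = mape)
```

MPE lets positive and negative errors cancel, so it mainly shows bias; MAPE is the usual companion figure for overall accuracy. For ARIMA(0,1,0) every forecast equals the last observed training value.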

Reference
[Link]
[Link]
[Link]
[Link]

Codes

#ATMANAND P GARAG
#20212MBA0055
# SEC2
#END-TERM PROJECT

# [Link] series -objects are called ts objects


#command for timeseries is ts
#name<-ts(data,additional arguments)
#install the add-on packages once (datasets, graphics and stats ship with R)
install.packages("forecast")
install.packages("tseries")
install.packages("ggplot2")
install.packages("readxl")
library(forecast)
library(graphics)
library(stats)
library(tseries)
library(ggplot2)
library(datasets)
library(readxl)
# ts()- adding time series
# plot()- plot a time series
# start()- Returning the starting time of a timeseries
# end()- Returning the end time of a timeseries
#frequency()-Return the period of a timeseries
#window()-Subsets of a timeseries
#Atma is the data frame holding the imported price data (import step not shown)
View(Atma)
Data<-ts(Atma$Price,start = c(2000,4),end = c(2022,10),frequency=12)
Data
class(Data)
plot(Data)
autoplot(Data)

#[Link] Plot
seasonplot(Data,year.labels=TRUE,main="SEASONAL RANDOM-DATA")

#2. Decompose function is used for identifying the components in TS.
#ts components: Trend, Seasonal, Cyclical and Random

DC<-decompose(Data)
DC
plot(DC)

#[Link]
adf.test(Data)

#we know that we need to address two issues before we test a stationary series
#one, we need to remove unequal variances. we do this using the log of the series
#two, we need to address the trend component
#Converting non stationary into stationary
ndiffs(Data)
Data1<-diff(Data,differences = 1)
Data1
adf.test(Data1)
plot(Data1)
autoplot(Data1)
#ARIMA
#ARIMA- Autoregressive integrated moving average (ARIMA)
#To find Lag function is lag(ts,k)
#ACF plot autocorrelation function - Acf(ts)
#PACF plot partial auto correlation -pacf(ts)
#differencing for stationarity ndiffs(ts)
#An ARIMA model is characterized by 3 terms: p, d, q
#p is the order of the AR term
#q is the order of the MA term
#p A pure Auto Regressive (AR only) model is one where Yt depends only on its
#own lags. That is, Yt is a function of the 'lags of Yt'.
#d is the number of differencing required to make the time series stationary
#q Likewise a pure Moving Average (MA only) model is one where Yt depends only
#on the lagged forecast errors.

View(Data)
class(Data)

#Augmented Dickey-Fuller (ADF) test
m<-adf.test(Data)
m
#ndiffs(d)
ndiffs(Data)
#d=1

#ACF -Auto correlation
acf(Data)
acf(Data,lag.max = 1)
acf(Data,lag.max = 5)

#partial Auto correlation
pacf(Data)
pacf(Data,lag.max = 1)
pacf(Data,lag.max = 5)
[Link](Data)
#Arima Models
A<-arima(Data,order=c(2,1,2))
A
accuracy(A)
B<-arima(Data,order=c(0,1,0))
B
accuracy(B)

C<-arima(Data,order=c(1,1,0))
C
accuracy(C)

D<-arima(Data,order=c(0,1,1))
D
accuracy(D)
E<-arima(Data,order=c(1,1,1))
E
accuracy(E)
#aic values
A
B
C
D
E
#RMSE
accuracy(A)
accuracy(B)
accuracy(C)
accuracy(D)
accuracy(E)

#Forecast
FC<-forecast(A)
plot(FC)
autoplot(FC)
FC

#auto arima
auto.arima(Data)
FA<-auto.arima(Data,ic="aic",trace = TRUE)
FC<-forecast(FA)
plot(FC)
FC
