Module 3. Time Series

Uploaded by

Manas Parab
Time Series Data
Time series analysis

Time series analysis is the use of statistical tools and techniques to
analyze time series data.

Forecasting
● Forecasting is the process of using historical and present data to
predict future values.

Types of forecasting:
1) Quantitative forecasting
2) Qualitative forecasting
Regression vs Time Series

Regression
● It models the relationship between a dependent variable and one or
more independent variables.
● The target variable is continuous.
● It involves finding patterns in the data and predicting the target
from those patterns.
Time Series
● It is a series of data points indexed by time.
● The target variable is continuous.
● It involves finding trends in the data and forecasting future values
from those trends.
How to Analyze Time Series
● Collecting and cleaning the data
● Plotting the key feature against time
● Checking the stationarity of the series
● Developing charts to understand its nature
● Building a model: AR, MA, ARMA or ARIMA
● Extracting insights from the predictions
What Are the Limitations of Time Series Analysis?

Time series analysis (TSA) has the limitations listed below, and we have to
account for them during data analysis.
● As with many other models, missing values are not supported by TSA, so
gaps must be filled or removed first.
● The classical models assume a linear relationship between data points.
● Data transformations are usually required, which adds cost.
● Models mostly work on univariate data.
Components of Time Series
• Trend: A long-term upward or downward movement in the
data, indicating a general increase or decrease over time.
• Seasonality: A repeating pattern in the data that occurs at
regular intervals, such as daily, weekly, monthly, or yearly.
• Cycle: Rises and falls in the data that recur over longer spans
and without a fixed calendar period, so they are not necessarily
related to seasonality.
• Irregularity: Random fluctuations in the data that cannot be
easily explained by trend, seasonality, or cycle.
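To make the components concrete, here is a minimal sketch that builds a synthetic series from a trend, a seasonal pattern and random noise, then recovers the trend with a centred moving average. The monthly period of 12, the slope, the amplitude and all variable names are assumptions chosen for illustration; a cycle component is left out to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the run is repeatable
n, period = 120, 12                   # assumed: 10 years of monthly data
t = np.arange(n)

trend = 0.5 * t                                   # long-term upward movement
seasonal = 10.0 * np.sin(2 * np.pi * t / period)  # pattern repeating every 12 steps
irregular = rng.normal(0.0, 1.0, n)               # random fluctuations
series = trend + seasonal + irregular             # additive combination

# Centred 2x12 moving average: it averages over one full season,
# so the seasonal part cancels and mostly the trend remains.
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
half = period // 2
trend_est = np.convolve(series, weights, mode="valid")  # aligned with t = half .. n-half-1

# Average the detrended values by position in the season to estimate seasonality.
detrended = series[half : n - half] - trend_est
seasonal_est = np.array(
    [detrended[(m - half) % period :: period].mean() for m in range(period)]
)
```

Here `trend_est` should track `trend[half : n - half]` closely, and `seasonal_est` should resemble one period of the sine wave; what is left after removing both is the irregular part.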
Data Types of Time Series
1. Stationary
2. Non-stationary.

● Stationary Time Series: Data whose statistical properties (mean,
variance, and autocorrelation) remain constant over time (no
trend or seasonality).
● Non-Stationary Time Series: Data where the mean, variance, or
other patterns change over time, typically exhibiting trends,
cycles, or seasonal variations.
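The difference can be seen by simulation. The sketch below (an illustration with assumed sizes and names, not part of the original slides) generates many white noise paths and their cumulative sums, which are random walks, and compares the variance across paths at an early and a late time step.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps = 500, 1000

noise = rng.normal(0.0, 1.0, size=(n_paths, n_steps))  # white noise: stationary
walks = np.cumsum(noise, axis=1)                        # random walks: non-stationary

early, late = 99, 999   # two time steps to compare (0-based indices)

# Variance across paths at each time step.
noise_var = (noise[:, early].var(), noise[:, late].var())
walk_var = (walks[:, early].var(), walks[:, late].var())
```

For white noise the two variances should both be close to 1, while for the random walk the variance grows roughly in proportion to the number of steps taken, so its statistical properties are not constant over time.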
Why Does Stationarity Matter?

● In time series modeling, stationarity means the patterns a model
learns from past data continue to hold, so its performance is
consistent over time.
● Non-stationary data can produce misleading
forecasts and incorrect inferences.
● That is why one of the first steps in time series
analysis is testing for stationarity.
Deterministic Trends and Stochastic Trends
● Deterministic time series are characterized by patterns or trends that are
predictable and can be described using mathematical functions.
● In time series analysis, a deterministic trend implies that any
deviations from the underlying trend are temporary and the series will eventually
revert to the trend line.
● Stochastic time series incorporate elements of randomness and
unpredictability, which means future values cannot be exactly determined even
if past behaviors are known.
● Stochastic processes are better suited for modeling real-world phenomena
where unpredictability plays a significant role.
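A small simulation can illustrate the distinction. In this sketch (slope, sample size and helper names are assumptions), one series is a straight line plus noise and the other is a random walk with drift. Removing a fitted line leaves uncorrelated deviations only in the deterministic case; the stochastic case needs differencing instead.

```python
import numpy as np

rng = np.random.default_rng(2)
n, drift = 1000, 0.2
t = np.arange(n)
shocks = rng.normal(0.0, 1.0, n)

det = 5.0 + drift * t + shocks     # deterministic trend: line plus noise
sto = np.cumsum(drift + shocks)    # stochastic trend: random walk with drift

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

def detrend(y):
    """Subtract the least-squares straight line from y."""
    slope, intercept = np.polyfit(t, y, 1)
    return y - (intercept + slope * t)

r_det = lag1_autocorr(detrend(det))    # deviations revert to the line
r_sto = lag1_autocorr(detrend(sto))    # deviations wander away from the line
r_diff = lag1_autocorr(np.diff(sto))   # differencing removes the stochastic trend
```

One would expect `r_det` and `r_diff` to be near zero and `r_sto` to be close to 1, which is why detrending suits a deterministic trend and differencing suits a stochastic one.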
Methods to Check Stationarity
• Augmented Dickey-Fuller (ADF) Test, a unit root test
• KPSS Test
For the ADF test, with Φ the coefficient on the lagged value of the series:
H0 (Null Hypothesis): Φ = 1 (non-stationary, a unit root exists)
H1 (Alternative Hypothesis): Φ < 1 (stationary, no unit root)
Autoregressive (AR) Model for Time Series Forecasting

● It is a concept in time series analysis and forecasting that captures
the relationship between an observation and several lagged
observations, i.e. previous time steps.
● The idea is that the current value of a time series can be
expressed as a linear combination of its past values plus some
random noise.
Working of Autoregressive (AR) Model

❏ The Autocorrelation Function (ACF) measures the correlation between a time
series and its own lagged values. It works as follows:
a. Understanding Lag and Temporal Dependence
❏ A lag represents the number of time steps by which the series is shifted. For
example:
● Lag 1 compares each value with the one immediately preceding it.
● Lag 2 compares values with those two time steps earlier, and so on.
❏ The autocorrelation coefficient at a specific lag quantifies this relationship:
● A high autocorrelation indicates a strong connection between the present and
the past value at that lag.
● A low or near-zero autocorrelation suggests weak or no temporal dependence.
b. Visualizing Autocorrelation using ACF Plot
❏ To analyze autocorrelation patterns, we use an ACF plot which displays
autocorrelation values across various lags:
❏ The x-axis shows the lag values.
❏ The y-axis shows the autocorrelation coefficients
c. Role of ACF in AR Model Selection

● The ACF helps determine how many past time steps (lags) should be included
in the model.
● It also helps assess whether a time series is stationary.
● In a stationary series, autocorrelations typically drop toward zero quickly as the
lag increases.
● If autocorrelations persist or decay slowly, it may indicate non-stationarity,
suggesting that the series needs transformation (such as differencing) before modeling.
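The contrast between a quickly decaying and a persistent ACF can be checked numerically. This is a minimal sketch with an assumed AR coefficient of 0.6 and a hand-written `sample_acf` helper (a hypothetical name, not a library function).

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[: len(x) - k]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
n = 2000
eps = rng.normal(size=n)

ar1 = np.zeros(n)                    # stationary series
for i in range(1, n):
    ar1[i] = 0.6 * ar1[i - 1] + eps[i]
walk = np.cumsum(eps)                # non-stationary series (random walk)

acf_ar1 = sample_acf(ar1, 20)        # falls off roughly like 0.6 ** lag
acf_walk = sample_acf(walk, 20)      # stays high across many lags
```

Plotting the two arrays against the lag would give the ACF plots described above; the stationary series reaches near-zero values within a handful of lags, while the random walk does not.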
Types of Autoregressive Models

Autoregressive models vary based on the number of past values (lags) they use. The
two most common types are:
1. AR(1) Model:
● This is an autoregressive model of order 1, the simplest form of an
autoregressive model.
● The current value of the time series depends only on its immediate past value along
with a constant and some random noise.
● This model is particularly useful when the data shows strong autocorrelation at lag 1,
i.e. the current value depends only on the previous value.
$$X_t = c + \Phi_1 X_{t-1} + \varepsilon_t$$
where
$X_t$ = the value at time t
$c$ = constant
$\Phi_1$ = model parameter
$X_{t-1}$ = lagged value
$\varepsilon_t$ = white noise or random error at time t
WHITE NOISE: A sequence of random variables that are uncorrelated, have a
mean of zero, and possess a constant variance.
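As an illustration of the AR(1) equation, the sketch below simulates a series with assumed values c = 2.0 and Φ1 = 0.7, then recovers both by ordinary least squares, regressing each value on the previous one. Least squares is one simple estimation choice made for this example; other estimators exist.

```python
import numpy as np

rng = np.random.default_rng(4)
c, phi1, n = 2.0, 0.7, 5000          # assumed true parameters
eps = rng.normal(0.0, 1.0, n)        # white noise

x = np.empty(n)
x[0] = c / (1 - phi1)                # start at the long-run mean of the process
for t in range(1, n):
    x[t] = c + phi1 * x[t - 1] + eps[t]

# Regress X_t on [1, X_{t-1}] to estimate c and phi1.
design = np.column_stack([np.ones(n - 1), x[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(design, x[1:], rcond=None)

# One-step-ahead forecast from the last observed value.
next_value = c_hat + phi_hat * x[-1]
```

With this many observations `c_hat` and `phi_hat` should land near the assumed values, and the series should fluctuate around c / (1 - Φ1).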
2. AR(p) Model:
● It is the generalized form of the autoregressive model, where the current value
depends on the past p values:
$$X_t = c + \Phi_1 X_{t-1} + \Phi_2 X_{t-2} + \dots + \Phi_p X_{t-p} + \varepsilon_t$$
● Choosing the correct order p is a crucial step and typically involves analyzing the
Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)
plots.
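The general case can be fitted the same way as AR(1), with one regression column per lag. The helper `fit_ar` below is a hypothetical name written for this sketch, and the AR(2) coefficients 0.5 and 0.3 are assumed values.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares estimate of (c, [phi_1..phi_p]) for an AR(p) model."""
    x = np.asarray(x, dtype=float)
    # Column k holds the lag-k values X_{t-k} for t = p .. n-1.
    lags = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    design = np.column_stack([np.ones(len(x) - p), lags])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef[0], coef[1:]

rng = np.random.default_rng(5)
n = 5000
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + eps[t]   # assumed AR(2) process

c_hat, phi_hat = fit_ar(x, p=2)
```

Calling `fit_ar` with a larger p on the same data would be expected to give extra coefficients close to zero, which is one informal hint that the smaller order is enough.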
Augmented Dickey-Fuller (ADF)

• The Augmented Dickey-Fuller (ADF) test, a type of unit root test,
checks if a time series is stationary by testing for the presence of a
unit root, which indicates non-stationarity.
• The presence of a unit root implies that the time series is
non-stationary and requires differencing.
How to Interpret the Results:
•The ADF test provides a test statistic, a p-value, and critical values at different significance
levels (e.g., 1%, 5%, 10%).
•Low p-value (less than the significance level): You reject the null hypothesis, suggesting the
time series is stationary (or at least trend-stationary).
•High p-value (greater than the significance level): You fail to reject the null hypothesis,
suggesting the time series has a unit root and is non-stationary.
Kwiatkowski-Phillips-Schmidt-Shin (KPSS)

● The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test is a statistical test
used to assess the stationarity of a time series.
● In contrast to the ADF test, which tests the null hypothesis that a series has a
unit root (i.e., is non-stationary), the KPSS test assumes the series is
stationary under the null hypothesis and looks for evidence to reject that
assumption.
● The KPSS test checks whether a time series is stationary around a mean or
around a deterministic trend.
Hypotheses in KPSS Test:

Null Hypothesis (H₀): The time series is stationary (either around a constant or
around a trend).

Alternative Hypothesis (H₁): The time series is non-stationary (it has a unit root,
meaning it is not stationary around a trend or constant).

Note: A random walk is a sequence of discrete steps in which each step is
randomly taken subject to some set of restrictions in allowed directions and
step lengths.
● A high KPSS statistic indicates strong evidence against the null hypothesis of
stationarity, suggesting the presence of a unit root (i.e., non-stationarity).
● Conversely, a low KPSS value means the null hypothesis of stationarity cannot be
rejected.
Moving Average
● Moving Average models are a type of time series model
usually used to forecast trends and understand patterns in time
series data.
● In a moving average model, the present value of the time series
depends on a linear combination of past white noise
error terms.
● The order of a moving average model is denoted by the letter “q”.
● In simple words, the current value of the time series
depends on the past q error terms.
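Therefore, the moving average model of order q can be
represented as:
$$X_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$
For order 1, MA(1), this reduces to:
$$X_t = c + \theta_1 \varepsilon_{t-1} + \varepsilon_t$$
where $\varepsilon_{t-1}$ is the lagged error. Writing out consecutive time steps
shows that each error term affects only two values of the series:
$$X_{t-1} = c + \theta_1 \varepsilon_{t-2} + \varepsilon_{t-1}$$
$$X_t = c + \theta_1 \varepsilon_{t-1} + \varepsilon_t$$
$$X_{t+1} = c + \theta_1 \varepsilon_t + \varepsilon_{t+1} \quad \text{(no more } \varepsilon_{t-1}\text{)}$$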
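Because an error term drops out after one step, an MA(1) series is correlated with itself at lag 1 only. The sketch below simulates MA(1) with assumed values c = 10 and θ1 = 0.8 and compares the sample autocorrelation with the theoretical lag-1 value θ1 / (1 + θ1²); `sample_acf` is a hypothetical helper.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[: len(x) - k]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(8)
c, theta1, n = 10.0, 0.8, 20000
eps = rng.normal(size=n + 1)              # one extra draw for the first lagged error

# X_t = c + theta1 * eps_{t-1} + eps_t
x = c + theta1 * eps[:-1] + eps[1:]

acf = sample_acf(x, 5)
rho1_theory = theta1 / (1 + theta1**2)    # lag-1 autocorrelation of an MA(1) process
```

`acf[1]` should be close to `rho1_theory`, and the values at lags 2 and above should be close to zero.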


1. ARMA Components: Autoregressive (AR)

The Autoregressive (AR) part of the ARMA model uses the relationship between an
observation and a number of lagged (previous) observations to predict future values.

E.g. forecasting the temperature by using the data from the last
several days:
● Denoting today's temperature as $T_t$ and the temperatures of the last two days
as $T_{t-1}$ and $T_{t-2}$, an AR(2) model can be written as:
$$T_t = c + \Phi_1 T_{t-1} + \Phi_2 T_{t-2} + \varepsilon_t$$


2. ARMA Components: Moving Average (MA)

The Moving Average (MA) part of the ARMA model uses the dependency between an
observation and a residual error from a moving average model applied to lagged
observations.
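● Today's temperature is also influenced by the errors made in predicting previous days'
temperatures.
If we denote today's error as $\varepsilon_t$ and the errors of the last two days as
$\varepsilon_{t-1}$ and $\varepsilon_{t-2}$, an MA(2) model can be written as:
$$T_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}$$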


The ARMA model is a combination of both AR and MA components. An ARMA(p, q) model,
where p is the number of lagged observations (AR part) and q is the number of lagged
forecast errors (MA part), is represented as:
$$X_t = c + \sum_{i=1}^{p} \Phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$
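A direct way to see how the two parts combine is to simulate the recursion. `simulate_arma` below is a hypothetical helper written for this sketch; the parameter values and the burn-in length are assumptions. The check uses the known lag-1 autocorrelation of an ARMA(1,1) process.

```python
import numpy as np

def simulate_arma(c, phi, theta, n, rng, burn_in=200):
    """Simulate n values of an ARMA(p, q) process with standard normal errors."""
    phi, theta = np.asarray(phi, dtype=float), np.asarray(theta, dtype=float)
    p, q = len(phi), len(theta)
    start = max(p, q)
    total = n + burn_in + start
    eps = rng.normal(size=total)
    x = np.zeros(total)
    for t in range(start, total):
        # Reverse the slices so element 0 is the most recent lag.
        ar_part = np.dot(phi, x[t - p : t][::-1]) if p else 0.0
        ma_part = np.dot(theta, eps[t - q : t][::-1]) if q else 0.0
        x[t] = c + ar_part + eps[t] + ma_part
    return x[-n:]                         # drop the burn-in period

rng = np.random.default_rng(9)
c, phi1, theta1 = 1.0, 0.6, 0.4
x = simulate_arma(c, [phi1], [theta1], n=20000, rng=rng)

xc = x - x.mean()
rho1_sample = np.dot(xc[1:], xc[:-1]) / np.dot(xc, xc)
# Theoretical lag-1 autocorrelation of ARMA(1,1).
rho1_theory = (1 + phi1 * theta1) * (phi1 + theta1) / (1 + 2 * phi1 * theta1 + theta1**2)
```

The sample mean should sit near c / (1 - Φ1) = 2.5, and `rho1_sample` should be close to `rho1_theory`.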


How to Determine the Orders p and q in ARMA Model?
Determining the appropriate values for p and q is crucial for building an effective ARMA
model. This can be done using the following methods:
1. Partial Autocorrelation Function (PACF):
● PACF is used to determine the order p of the AR model. It measures the
correlation between observations at different lags, excluding the influence of
intermediate lags.
● The order p is determined by the lag at which the PACF plot cuts off (drops
sharply).
2. Autocorrelation Function (ACF):
● ACF is used to determine the order q of the MA model. It measures the
correlation between observations at different lags.
● The order q is determined by the lag at which the ACF plot cuts off.
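The cut-off behaviour can be reproduced on simulated data. In this sketch the PACF at lag k is computed as the last coefficient of a least-squares AR(k) fit, which is one common way to estimate it; the helper names, the simulated processes and the approximate 95% band of ±1.96/√n are assumptions for illustration.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[: len(x) - k]) / denom for k in range(max_lag + 1)])

def sample_pacf(x, max_lag):
    """PACF at lag k, taken as the last coefficient of a least-squares AR(k) fit."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    values = [1.0]
    for k in range(1, max_lag + 1):
        lags = np.column_stack([x[k - j : len(x) - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(lags, x[k:], rcond=None)
        values.append(coef[-1])
    return np.array(values)

rng = np.random.default_rng(10)
n = 10000
eps = rng.normal(size=n + 1)

ar2 = np.zeros(n)                                   # AR(2): PACF should cut off after lag 2
for t in range(2, n):
    ar2[t] = 0.5 * ar2[t - 1] + 0.3 * ar2[t - 2] + eps[t]
ma1 = eps[1:] + 0.8 * eps[:-1]                      # MA(1): ACF should cut off after lag 1

band = 1.96 / np.sqrt(n)                            # approximate 95% significance band
pacf_ar2 = sample_pacf(ar2, 6)
acf_ma1 = sample_acf(ma1, 6)
p_candidates = np.nonzero(np.abs(pacf_ar2[1:]) > band)[0] + 1   # lags with significant PACF
q_candidates = np.nonzero(np.abs(acf_ma1[1:]) > band)[0] + 1    # lags with significant ACF
```

Lags 1 and 2 should appear in `p_candidates` and lag 1 in `q_candidates`. An occasional higher lag can cross the band by chance, so the candidates are a starting point and not a final answer.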
ARIMA

Autoregression (AR): A model that uses the correlation
between the current observation and lagged observations.
The number of lagged observations is referred to as the lag
order or p.
Integrated (I): The use of differencing of raw observations to
make the time series stationary. The number of differencing
operations is referred to as d.
Moving Average (MA): A model that takes into account the relationship between the
current observation and the residual errors from a moving average model applied
to past observations. The size of the moving average window is the order or q.
ARIMA
The parameters of the ARIMA model are defined as follows:
● p: The number of lag observations included in the
model, also called the lag order.
● d: The number of times that the raw observations are
differenced, also called the degree of differencing.
● q: The size of the moving average window, also called
the order of moving average.
These parameters are written together as ARIMA(p,d,q).
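To show what the three parameters do, here is a hand-rolled ARIMA(1,1,0) sketch: difference once (d = 1), fit an AR(1) to the differences (p = 1), use no MA terms (q = 0), then undo the differencing to forecast the original series. The simulated data and least-squares fitting are assumptions made for the example; a library routine such as the ARIMA class in statsmodels would normally be used and can also estimate MA terms.

```python
import numpy as np

rng = np.random.default_rng(11)
n, c, phi1 = 3000, 0.5, 0.6          # assumed true parameters of the differenced series
eps = rng.normal(size=n)

# Build the data: the first differences follow AR(1), the level is their running sum.
d = np.empty(n)
d[0] = c / (1 - phi1)
for t in range(1, n):
    d[t] = c + phi1 * d[t - 1] + eps[t]
y = 100.0 + np.cumsum(d)             # non-stationary level series

# I (d = 1): difference the raw observations once.
dy = np.diff(y, n=1)

# AR (p = 1): least-squares fit of dy_t on [1, dy_{t-1}].
design = np.column_stack([np.ones(len(dy) - 1), dy[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(design, dy[1:], rcond=None)

# Forecast h steps of the differences, then integrate back to the level.
h = 5
dy_fc, last = [], dy[-1]
for _ in range(h):
    last = c_hat + phi_hat * last
    dy_fc.append(last)
y_fc = y[-1] + np.cumsum(dy_fc)
```

`y_fc` holds the forecasts of the original series; each is the previous level plus a forecast change, which is the role of the "integrated" part.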
Autocorrelation (ACF) in AR Model

“ACF” (Autocorrelation Function) is a fundamental concept in time series
analysis and autoregressive models.
It refers to the correlation between a time series and a lagged version of
itself.

Autocorrelation measures how closely the current value of a time series is
related to its past values, specifically those at different time lags.
Concept of autocorrelation
1. Autocorrelation involves calculating the correlation between a time series
and a lagged version of itself.
● The “lag” represents the number of time units by which the
series is shifted.
● For example, a lag of 1 corresponds to comparing the series
with its previous time step, while a lag of 2 compares it with the time step
before that, and so on.
● Lag values help you calculate autocorrelation, which measures how each
observation in a time series is related to previous observations.


2. The autocorrelation at a particular lag provides insights into the
temporal dependence of the data.
● If the autocorrelation is high at a certain lag, it indicates a strong
relationship between the current value and the value at that lag.
● If the autocorrelation is low or close to zero, it suggests a weak or no
relationship.
3. To visualize autocorrelation, a common approach is to create an ACF plot.
This plot displays the autocorrelation coefficients at different lags. The
horizontal axis represents the lag, and the vertical axis represents the
autocorrelation values.
Significant peaks or patterns in the ACF plot can reveal the underlying
temporal structure of the data.
4. In an autoregressive model of order p, the current value of the time series
is expressed as a linear combination of its past p values.

5. Autocorrelation can also be used to assess whether a time series is
stationary. In a stationary time series, autocorrelation should
drop toward zero quickly as the lag increases.
Autocorrelation

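The autocorrelation function (ACF) at lag k for a time series is defined as:
$$\rho_k = \frac{\operatorname{Cov}(X_t, X_{t-k})}{\sqrt{\operatorname{Var}(X_t)\,\operatorname{Var}(X_{t-k})}}$$
where
Cov() is the covariance function.
Var() is the variance function.
k is the lag.
$X_t$ is the value of the time series at time t.
$X_{t-k}$ is the value of the time series at time t-k.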
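In practice the ACF is estimated from a sample. A commonly used estimator, given here as a sketch with $\bar{x}$ denoting the sample mean and $n$ the number of observations (both symbols are introduced for this example), together with a small worked case:

```latex
r_k = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})}
           {\sum_{t=1}^{n} (x_t - \bar{x})^2}

% Worked example: x = (1, 2, 3, 4, 5), so \bar{x} = 3 and the deviations are (-2, -1, 0, 1, 2).
% Denominator: 4 + 1 + 0 + 1 + 4 = 10
% Lag-1 numerator: (-1)(-2) + (0)(-1) + (1)(0) + (2)(1) = 4
r_1 = \frac{4}{10} = 0.4
```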
Partial autocorrelation

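● Partial autocorrelation removes the influence of intermediate lags,
providing a clearer picture of the direct relationship between a variable
and its past values.
● Partial autocorrelation focuses on the direct correlation at each lag. The
partial autocorrelation function (PACF) at lag k for a time series is defined as:
$$\text{PACF}(k) = \frac{\operatorname{Cov}(X_t, X_{t-k} \mid X_{t-1}, \dots, X_{t-k+1})}{\sqrt{\operatorname{Var}(X_t \mid X_{t-1}, \dots, X_{t-k+1})\,\operatorname{Var}(X_{t-k} \mid X_{t-1}, \dots, X_{t-k+1})}}$$
where
$X_t$ is the value of the time series at time t.
$X_{t-k}$ is the value of the time series at time (t-k).
$\operatorname{Cov}(X_t, X_{t-k} \mid X_{t-1}, \dots, X_{t-k+1})$ is the conditional covariance between
$X_t$ and $X_{t-k}$ given the values of the intermediate lags.
$\operatorname{Var}(X_t \mid X_{t-1}, \dots, X_{t-k+1})$ is the conditional variance of $X_t$ given the
values of the intermediate lags.
$\operatorname{Var}(X_{t-k} \mid X_{t-1}, \dots, X_{t-k+1})$ is the conditional variance of $X_{t-k}$
given the values of the intermediate lags.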
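For the first two lags the PACF can be written directly in terms of the ordinary autocorrelations $\rho_1$ and $\rho_2$. The notation $\phi_{kk}$ for the PACF at lag k is an assumption introduced for this worked example:

```latex
\phi_{11} = \rho_1, \qquad
\phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}

% Worked example: an AR(1) process with coefficient 0.6 has \rho_1 = 0.6 and \rho_2 = 0.36, so
\phi_{22} = \frac{0.36 - 0.36}{1 - 0.36} = 0
```

The lag-2 value is zero because, once the lag-1 value is accounted for, the value two steps back adds no direct information in an AR(1) process.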
Partial autocorrelation Interpretation
Direct Relationship: PACF isolates the direct correlation between the
current observation and the observation at lag k, controlling for the
influence of lags in between.
AR Process Identification: Peaks or significant values in PACF at specific
lags can indicate potential orders for autoregressive (AR) terms in time
series models.
Modeling Considerations: Analysts often examine PACF to guide the
selection of lag orders in autoregressive integrated moving average
(ARIMA) models.
Autocorrelation plot using Matplotlib

Autocorrelation plots are a commonly used tool for checking randomness in a
data set.

Characteristics of an Autocorrelation Plot:

It varies from +1 to -1.
An autocorrelation of +1 indicates that if the series (time series 1) increases in value,
its lagged copy (time series 2) also increases in proportion to the change in time series
1.
An autocorrelation of -1 indicates that if time series 1 increases in value,
time series 2 decreases in proportion to the change in time series 1.
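A minimal Matplotlib sketch of such a plot is shown below, using the `acorr` method of a Matplotlib axes on a simulated AR(1) series. The data, the coefficient 0.7, the number of lags and the dashed ±1.96/√n band are assumptions for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                 # non-interactive backend; omit to display a window
import matplotlib.pyplot as plt

rng = np.random.default_rng(12)
n = 500
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + eps[t]    # assumed AR(1) series

max_lag = 20
fig, ax = plt.subplots(figsize=(7, 3))

# acorr does not subtract the mean itself, so centre the series first.
lags, corr, _, _ = ax.acorr(x - x.mean(), maxlags=max_lag, usevlines=True, normed=True)

band = 1.96 / np.sqrt(n)              # approximate 95% band for a random series
ax.axhline(band, linestyle="--")
ax.axhline(-band, linestyle="--")
ax.set_xlim(-0.5, max_lag + 0.5)      # acorr also returns negative lags; show 0..max_lag
ax.set_xlabel("Lag")
ax.set_ylabel("Autocorrelation")
```

Calling `fig.savefig(...)` or `plt.show()` afterwards renders the plot. For a random data set nearly all bars beyond lag 0 would stay inside the dashed band; here the first several lags should stand well outside it.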
Applications of Autocorrelation:

● Pattern recognition
● Signal detection
● Signal processing
● Estimating pitch
● Technical analysis of stocks
What Is Box Jenkins Methodology?
● The Box-Jenkins methodology is a systematic, iterative process for
building, fitting, and checking ARIMA (AutoRegressive Integrated Moving
Average) models.
● Introduced by George Box and Gwilym Jenkins in 1970, it is widely regarded
as a reliable approach for short-term time series forecasting.
● The process focuses on finding a parsimonious model, that is, the simplest model
that adequately explains the data.
● The methodology follows a circular, iterative path. If a model fails at a
later stage, you return to the beginning and try a different specification.
It comprises the following steps:

● Identification
● Estimation
● Diagnostic checking
Model Fitting: With the parameter estimates in hand, the ARIMA model is
fitted to the historical data. The model is used to generate predicted values,
and the fit is assessed by comparing these predictions to the actual observed
values.
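The iterative loop can be sketched in code. The version below is a simplified illustration restricted to pure AR models: identification is reduced to trying orders in increasing size, estimation uses least squares, and diagnostic checking uses a Ljung-Box statistic on the residuals with hard-coded 5% chi-square critical values. All function names and settings are assumptions; the full methodology also uses ACF/PACF plots, differencing and MA terms.

```python
import numpy as np

# Upper 5% points of the chi-square distribution, keyed by degrees of freedom.
CHI2_95 = {6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919}

def fit_ar(x, p):
    """Estimation: least-squares AR(p) fit; returns (coefficients, residuals)."""
    lags = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    design = np.column_stack([np.ones(len(x) - p), lags])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef, x[p:] - design @ coef

def ljung_box(resid, m):
    """Diagnostic checking: Ljung-Box Q over the first m residual autocorrelations."""
    n = len(resid)
    e = resid - resid.mean()
    r = np.array([np.dot(e[k:], e[:-k]) / np.dot(e, e) for k in range(1, m + 1)])
    return n * (n + 2) * np.sum(r**2 / (n - np.arange(1, m + 1)))

def box_jenkins_ar(x, max_p=4, m=10):
    """Return the simplest AR order whose residuals pass the check, plus the history."""
    history = []
    for p in range(1, max_p + 1):             # identification: next candidate order
        coef, resid = fit_ar(x, p)             # estimation
        q_stat = ljung_box(resid, m)           # diagnostic checking
        passed = bool(q_stat < CHI2_95[m - p]) # m - p degrees of freedom
        history.append((p, q_stat, passed))
        if passed:
            return p, coef, history
    return None, None, history                 # no candidate was adequate

rng = np.random.default_rng(13)
n = 3000
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + eps[t]   # assumed AR(2) data

order, coef, history = box_jenkins_ar(x)
```

On this data the AR(1) candidate should fail the residual check, sending the loop back to try a richer model, and the search would usually stop at order 2. The critical-value table only covers the default `max_p=4` and `m=10`.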
