MANIPULATING TIME SERIES DATA IN PYTHON
Rolling Window Functions with Pandas
Window Functions in pandas
● Windows identify sub periods of your time series
● Calculate metrics for sub periods inside the window
● Create a new time series of metrics
● Two types of windows:
● Rolling: same size, sliding (this video)
● Expanding: contain all prior values (next video)
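The difference between the two window types can be sketched on a toy series. The values and dates below are hypothetical and are not the course data; the sketch only assumes pandas is installed.

```python
import pandas as pd

# Hypothetical toy series (not the course data)
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range("2020-01-01", periods=5, freq="D"))

rolling_mean = s.rolling(window=3).mean()   # only the last 3 observations
expanding_mean = s.expanding().mean()       # all observations up to each date

print(pd.DataFrame({"value": s,
                    "rolling_3": rolling_mean,
                    "expanding": expanding_mean}))
```

The rolling column is empty until three observations are available, while the expanding column has a value from the first row onward.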
Calculating a Rolling Average
In [1]: data = pd.read_csv('[Link]',
parse_dates=['date'],
index_col='date')
DatetimeIndex: 1761 entries, 2010-01-04 to 2016-12-30
Data columns (total 1 columns):
price 1761 non-null float64
dtypes: float64(1)
Calculating a Rolling Average
# Integer-based window size
In [5]: data.rolling(window=30).mean() # fixed number of observations (30 business days)
DatetimeIndex: 1761 entries, 2010-01-04 to 2017-05-24
Data columns (total 1 columns):
price 1732 non-null float64
dtypes: float64(1)
● min_periods: choose a value < 30 to get results for the first days
# Offset-based window size
In [6]: data.rolling(window='30D').mean() # fixed period length (30 calendar days)
DatetimeIndex: 1761 entries, 2010-01-04 to 2017-05-24
Data columns (total 1 columns):
price 1761 non-null float64
dtypes: float64(1)
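The non-null counts above (1732 vs. 1761) follow from how the two window types treat the first rows. A minimal sketch, assuming a hypothetical business-day series named `prices` in place of the course data:

```python
import numpy as np
import pandas as pd

# Hypothetical business-day series standing in for the course data
idx = pd.date_range("2020-01-01", periods=60, freq="B")
prices = pd.Series(np.arange(60, dtype=float), index=idx, name="price")

by_count = prices.rolling(window=30).mean()                       # needs 30 observations
by_count_early = prices.rolling(window=30, min_periods=5).mean()  # needs only 5
by_offset = prices.rolling(window="30D").mean()                   # 30 calendar days

# Number of non-null results for each variant
print(by_count.count(), by_count_early.count(), by_offset.count())  # 31 56 60
```

With an integer window the first 29 rows are empty unless `min_periods` is lowered; an offset window returns a value as soon as one observation falls inside the period.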
90 Day Rolling Mean
In [7]: r90 = data.rolling(window='90D').mean()
In [8]: data.join(r90.add_suffix('_mean_90')).plot()
● .join: concatenate Series or DataFrame along axis=1
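Why the suffix is needed before joining can be sketched as follows. The frame below is a hypothetical stand-in for the course data, and the 2-day window is chosen only to keep the numbers small.

```python
import pandas as pd

# Hypothetical one-column frame standing in for the course data
data = pd.DataFrame({"price": [10.0, 11.0, 12.0, 13.0]},
                    index=pd.date_range("2020-01-01", periods=4, freq="D"))

r2 = data.rolling(window="2D").mean()            # result keeps the name 'price'
combined = data.join(r2.add_suffix("_mean_2"))   # rename to avoid a column clash

print(combined.columns.tolist())   # ['price', 'price_mean_2']
```

The rolling result has the same column name as the input, so `.add_suffix()` gives the new column a distinct name before `.join()` aligns the two frames on the index.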
90 & 360 Day Rolling Means
In [8]: data['mean90'] = r90
In [9]: r360 = data['price'].rolling(window='360D').mean()
In [10]: data['mean360'] = r360; data.plot()
Multiple Rolling Metrics (1)
In [8]: r = data.price.rolling('90D').agg(['mean', 'std'])
In [9]: r.plot(subplots=True)
Multiple Rolling Metrics (2)
In [10]: rolling = data.price.rolling('360D')
In [11]: q10 = rolling.quantile(.1).to_frame('q10')
In [12]: median = rolling.median().to_frame('median')
In [13]: q90 = rolling.quantile(.9).to_frame('q90')
In [14]: pd.concat([q10, median, q90], axis=1).plot()
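The quantile band can be checked without plotting. This is a minimal sketch on a hypothetical daily series with a 5-day window instead of the 360-day window used above.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 1.0, 2.0, ..., 10.0
idx = pd.date_range("2020-01-01", periods=10, freq="D")
price = pd.Series(np.arange(1.0, 11.0), index=idx, name="price")

rolling = price.rolling("5D")   # reuse one rolling object for several metrics
bands = pd.concat([rolling.quantile(0.1).to_frame("q10"),
                   rolling.median().to_frame("median"),
                   rolling.quantile(0.9).to_frame("q90")], axis=1)

print(bands.tail(3))
```

Each metric is a Series, so `.to_frame()` both converts it to a DataFrame and names the column before the three are combined along `axis=1`.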
Expanding Window Functions with Pandas
Expanding Windows in pandas
● From rolling to expanding windows
● Calculate metrics for periods up to current date
● New time series reflects all historical values
● Useful for running rate of return, running min/max
● Two options with pandas:
● .expanding() - just like .rolling()
● .cumsum(), .cumprod(), .cummin(), .cummax()
The Basic Idea
In [1]: df = pd.DataFrame({'data': range(5)})
In [2]: df['expanding sum'] = df.data.expanding().sum()
In [3]: df['cumulative sum'] = df.data.cumsum()
In [4]: df
   data  expanding sum  cumulative sum
0     0            0.0               0
1     1            1.0               1
2     2            3.0               3
3     3            6.0               6
4     4           10.0              10
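The equivalence between the two options can be verified directly. A small sketch on a hypothetical integer series; the cast to float is only needed because `.expanding()` returns floats while the cumulative methods keep the integer dtype.

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5])   # hypothetical values

# Compare each expanding aggregate with its cumulative counterpart
same_sum = s.expanding().sum().equals(s.cumsum().astype(float))
same_min = s.expanding().min().equals(s.cummin().astype(float))
same_max = s.expanding().max().equals(s.cummax().astype(float))

print(same_sum, same_min, same_max)   # True True True
```

One reason to reach for `.expanding()` is that it accepts other aggregations as well, such as `.mean()` or `.std()`.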
Get data for the S&P 500
In [5]: data = pd.read_csv('[Link]', parse_dates=['date'],
index_col='date')
DatetimeIndex: 2519 entries, 2007-05-24 to 2017-05-24
Data columns (total 1 columns):
SP500 2519 non-null float64
How to calculate a Running Return
● Single period return r: current price over last price minus 1
$$r_t = \frac{P_t}{P_{t-1}} - 1$$
● Multi-period return: product of (1 + r) for all periods, minus 1:
$$R_T = (1 + r_1)(1 + r_2) \cdots (1 + r_T) - 1$$
● For the period return: .pct_change()
● For basic math .add(), .sub(), .mul(), .div()
● For cumulative product: .cumprod()
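The steps above can be sketched on four hypothetical prices. The check at the end relies on the fact that the product of (1 + r) telescopes to the last price divided by the first price.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 108.9])   # hypothetical prices

pr = prices.pct_change()                    # period returns: NaN, 0.10, -0.10, 0.10
running = pr.add(1).cumprod().sub(1)        # running multi-period return
direct = prices.div(prices.iloc[0]).sub(1)  # same quantity straight from prices

# Skip the first row, where the period return is undefined
print(np.allclose(running.iloc[1:], direct.iloc[1:]))   # True
```

The first period return is missing because there is no prior price, so the comparison starts at the second row.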
Running Rate of Return in Practice
In [6]: pr = data.SP500.pct_change() # period return
In [7]: pr_plus_one = pr.add(1)
In [8]: cumulative_return = pr_plus_one.cumprod().sub(1)
In [9]: cumulative_return.mul(100).plot()
Getting the running min & max
In [2]: data['running_min'] = data.SP500.expanding().min()
In [3]: data['running_max'] = data.SP500.expanding().max()
In [4]: data.plot()
Rolling Annual Rate of Return
In [10]: def multi_period_return(period_returns):
             return np.prod(period_returns + 1) - 1
In [11]: pr = data.SP500.pct_change() # period return
In [12]: r = pr.rolling('360D').apply(multi_period_return)
In [13]: data['Rolling 1yr Return'] = r.mul(100)
In [14]: data.plot(subplots=True)
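How `.apply()` combines with the custom function can be checked on a short hypothetical series. The 3-day window and the prices below are assumptions chosen so the result is easy to verify by hand.

```python
import numpy as np
import pandas as pd

def multi_period_return(period_returns):
    # Compound the period returns inside one window
    return np.prod(period_returns + 1) - 1

# Hypothetical daily prices (not the S&P 500 data)
idx = pd.date_range("2020-01-01", periods=6, freq="D")
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 104.0], index=idx)
pr = prices.pct_change()

rolling_ret = pr.rolling("3D").apply(multi_period_return)

# The last window compounds three daily returns, i.e. price[-1] / price[-4] - 1
check = prices.iloc[-1] / prices.iloc[-4] - 1
print(np.isclose(rolling_ret.iloc[-1], check))   # True
```

`.apply()` calls the function once per window, so the result at each date is the compounded return over the days that fall inside that window.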
Case Study: S&P500 Price Simulation
Random Walks & Simulations
● Daily stock returns are hard to predict
● Models often assume they are random in nature
● NumPy allows you to generate random numbers
● From random returns to prices: use .cumprod()
● Two examples:
● Generate random returns
● Randomly selected actual SP500 returns
Generate Random Numbers
In [1]: from numpy.random import normal, seed
In [2]: from scipy.stats import norm
In [3]: seed(42)
In [3]: random_returns = normal(loc=0, scale=0.01, size=1000)
In [4]: sns.distplot(random_returns, fit=norm, kde=False) # distplot is deprecated since seaborn 0.11
[Figure: histogram of the 1,000 random returns with a fitted normal distribution]
Create A Random Price Path
In [5]: return_series = pd.Series(random_returns)
In [6]: random_prices = return_series.add(1).cumprod().sub(1)
In [7]: random_prices.mul(100).plot()
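The series plotted above is a cumulative return in percent rather than a price level. A minimal sketch of turning it into prices, assuming a hypothetical starting price of 100 that is not part of the slides:

```python
import numpy as np
import pandas as pd
from numpy.random import normal, seed

seed(42)
random_returns = pd.Series(normal(loc=0, scale=0.01, size=1000))

# Cumulative return relative to the (unobserved) starting point
cumulative_return = random_returns.add(1).cumprod().sub(1)

start_price = 100.0   # hypothetical starting price, an assumption for illustration
price_path = cumulative_return.add(1).mul(start_price)

print(price_path.head())
```

Multiplying the compounded growth factor by any starting level rescales the path without changing its shape.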
S&P 500 Prices & Returns
In [5]: data = pd.read_csv('[Link]', parse_dates=['date'],
index_col='date')
In [6]: data['returns'] = data.SP500.pct_change()
In [7]: data.plot(subplots=True)
S&P Return Distribution
In [8]: sns.distplot(data.returns.dropna().mul(100), fit=norm)
[Figure: distribution of S&P 500 returns with a fitted normal distribution]
Generate Random S&P 500 Returns
In [9]: from numpy.random import choice
In [10]: sample = data.returns.dropna()
In [11]: n_obs = data.returns.count()
In [12]: random_walk = choice(sample, size=n_obs)
In [14]: random_walk = pd.Series(random_walk, index=sample.index)
In [15]: random_walk.head()
DATE
2007-05-29 -0.008357
2007-05-30 0.003702
2007-05-31 -0.013990
2007-06-01 0.008096
2007-06-04 0.013120
Random S&P 500 Prices (1)
In [9]: start = data.SP500.first('D')
DATE
2007-05-25 1515.73
Name: SP500, dtype: float64
In [10]: sp500_random = start.append(random_walk.add(1)) # pandas >= 2.0: pd.concat([start, random_walk.add(1)])
In [11]: sp500_random.head()
DATE
2007-05-25 1515.730000
2007-05-29 0.998290
2007-05-30 0.995190
2007-05-31 0.997787
2007-06-01 0.983853
dtype: float64
Random S&P 500 Prices (2)
In [9]: data['SP500_random'] = sp500_random.cumprod()
In [10]: data[['SP500', 'SP500_random']].plot()
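The whole simulation can be sketched end to end without the course file. The six prices below are hypothetical, and `pd.concat` is used here as one way to prepend the starting price that works across pandas versions.

```python
import numpy as np
import pandas as pd
from numpy.random import choice, seed

seed(42)

# Hypothetical price series standing in for the S&P 500 data
idx = pd.date_range("2020-01-01", periods=6, freq="B")
sp500 = pd.Series([100.0, 101.0, 99.5, 100.5, 102.0, 101.0], index=idx, name="SP500")

# Draw actual returns at random (with replacement) and keep the original dates
sample = sp500.pct_change().dropna()
random_walk = pd.Series(choice(sample.to_numpy(), size=len(sample)),
                        index=sample.index)

# First element is the starting price, the rest are growth factors (1 + r)
start = sp500.iloc[:1]
factors = pd.concat([start, random_walk.add(1)])
sp500_random = factors.cumprod()

print(sp500_random)
```

Because the first element is a price and every later element is a growth factor, the cumulative product yields a simulated price path that starts at the observed first price.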
Relationships between Time Series: Correlation
Correlation & Relations between Series
● So far, focus on characteristics of individual variables
● Now: characteristic of relations between variables
● Correlation: measures linear relationships
● Financial markets: important for prediction and risk management
● pandas & seaborn have tools to compute & visualize correlations
Correlation & Linear Relationships
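● Correlation coefficient: how similar is the pairwise movement of two variables around their averages?
$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$
where $s_x$ and $s_y$ are the sample standard deviations of x and y
● Varies between -1 and +1
● Measures the strength of the linear relationship and whether it is positive or negative
● Does not capture non-linear relationships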
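The coefficient can be computed by hand and compared with pandas. This sketch uses hypothetical values and divides the sum of cross-deviations by (N - 1) times the two sample standard deviations, which is what `.std()` returns by default.

```python
import numpy as np
import pandas as pd

# Hypothetical, roughly linear data
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.1, 5.9, 8.2, 9.8])

n = len(x)
cross = ((x - x.mean()) * (y - y.mean())).sum()    # sum of cross-deviations
r_manual = cross / ((n - 1) * x.std() * y.std())   # .std() uses ddof=1
r_pandas = x.corr(y)                               # Pearson correlation

# A perfect but non-linear dependence can still have zero correlation
z = (x - 3) ** 2
r_nonlinear = x.corr(z)

print(np.isclose(r_manual, r_pandas), abs(r_nonlinear) < 1e-12)   # True True
```

The last two lines illustrate the limitation noted above: `z` is fully determined by `x`, yet the linear correlation between them is zero.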
Importing Five Price Time Series
In [1]: data = pd.read_csv('[Link]', parse_dates=['date'],
index_col='date')
In [2]: data = data.dropna()
In [3]: data.info()
DatetimeIndex: 2469 entries, 2007-05-25 to 2017-05-22
Data columns (total 5 columns):
sp500 2469 non-null float64
nasdaq 2469 non-null float64
bonds 2469 non-null float64
gold 2469 non-null float64
oil 2469 non-null float64
Visualize pairwise linear relationships
In [4]: daily_returns = data.pct_change()
In [5]: sns.jointplot(x='sp500', y='nasdaq', data=daily_returns);
Calculate all Correlations
In [6]: correlations = daily_returns.corr()
In [7]: correlations
Out[7]:
bonds oil gold sp500 nasdaq
bonds 1.000000 -0.183755 0.003167 -0.300877 -0.306437
oil -0.183755 1.000000 0.105930 0.335578 0.289590
gold 0.003167 0.105930 1.000000 -0.007786 -0.002544
sp500 -0.300877 0.335578 -0.007786 1.000000 0.959990
nasdaq -0.306437 0.289590 -0.002544 0.959990 1.000000
Visualize all Correlations
In [8]: sns.heatmap(correlations, annot=True)