Peeking at data with head, tail, and describe
INTERMEDIATE PYTHON FOR FINANCE
Kennedy Behrman
Data Engineer, Author, Founder
Understanding your data
Data is loaded correctly
Understand the data's shape
First look at data
aapl
             Price    Volume Trend
Date
03/27/2020  247.74  51054150  Down
03/26/2020  258.44  63140170    Up
03/25/2020  245.52  75900510  Down
03/24/2020  246.88  71882770    Up
Head
aapl.head() displays the first 5 rows, a quick way to take a peek
             Price    Volume Trend
Date
03/27/2020  247.74  51054150  Down
03/26/2020  258.44  63140170    Up
03/25/2020  245.52  75900510  Down
03/24/2020  246.88  71882770    Up
03/23/2020  224.37  84188210  Down
Head
aapl.head(3) displays only the first 3 rows
```out
             Price    Volume Trend
Date
03/27/2020  247.74  51054150  Down
03/26/2020  258.44  63140170    Up
03/25/2020  245.52  75900510  Down
```
Tail
aapl.tail() displays the last 5 rows
             Price     Volume Trend
Date
03/05/2020  292.92   46893220  Down
03/04/2020  302.74   54794570    Up
03/03/2020  289.32   79868850  Down
03/02/2020  298.81   85349340    Up
02/28/2020  273.36  106721200  Down
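A minimal sketch of head() and tail(), assuming pandas is installed. The ten rows are copied from the head and tail outputs on the slides; the real aapl DataFrame has 21 rows, so this is only a stand-in.

```python
import pandas as pd

# Stand-in for the aapl DataFrame: first five and last five rows from the slides.
aapl = pd.DataFrame(
    {
        "Price": [247.74, 258.44, 245.52, 246.88, 224.37,
                  292.92, 302.74, 289.32, 298.81, 273.36],
        "Volume": [51054150, 63140170, 75900510, 71882770, 84188210,
                   46893220, 54794570, 79868850, 85349340, 106721200],
        "Trend": ["Down", "Up", "Down", "Up", "Down",
                  "Down", "Up", "Down", "Up", "Down"],
    },
    index=pd.Index(
        ["03/27/2020", "03/26/2020", "03/25/2020", "03/24/2020", "03/23/2020",
         "03/05/2020", "03/04/2020", "03/03/2020", "03/02/2020", "02/28/2020"],
        name="Date",
    ),
)

first_five = aapl.head()    # default is 5 rows
first_three = aapl.head(3)  # pass a number to change the row count
last_five = aapl.tail()     # same default, counted from the bottom

print(first_three)
print(last_five)
```

Both methods return a new DataFrame, so the result can be stored and inspected without touching the original.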
Describe
aapl.describe() summarizes the numeric columns
            Price        Volume
count   21.000000  2.100000e+01
mean   263.715714  7.551468e+07
std     23.360598  1.669757e+07
min    224.370000  4.689322e+07
25%    246.670000  6.409497e+07
50%    258.440000  7.505841e+07
75%    285.340000  8.418821e+07
max    302.740000  1.067212e+08
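As a rough illustration of the default behavior, the sketch below calls describe() on a five-row stand-in built from the rows shown earlier (not the full 21-row aapl data, so the statistics differ from the slide). Only the numeric columns appear in the result.

```python
import pandas as pd

# Five rows from the slides; the real DataFrame has 21.
aapl = pd.DataFrame(
    {
        "Price": [247.74, 258.44, 245.52, 246.88, 224.37],
        "Volume": [51054150, 63140170, 75900510, 71882770, 84188210],
        "Trend": ["Down", "Up", "Down", "Up", "Down"],
    }
)

summary = aapl.describe()  # numeric columns only by default
print(summary)

# Individual statistics can be looked up by row label and column name.
median_price = summary.loc["50%", "Price"]
```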
Include
aapl.describe(include='object') summarizes only columns of the given type
       Trend
count     21
unique     2
top     Down
freq      14
Include
aapl.describe(include='all') summarizes every column
             Price        Volume Trend
count    21.000000  2.100000e+01    21
unique         NaN           NaN     2
top            NaN           NaN  Down
freq           NaN           NaN    14
mean    263.715714  7.551468e+07   NaN
std      23.360598  1.669757e+07   NaN
min     224.370000  4.689322e+07   NaN
25%     246.670000  6.409497e+07   NaN
Include
aapl.describe(include=['float', 'object']) accepts a list of types
             Price Trend
count    21.000000    21
unique         NaN     2
top            NaN  Down
freq           NaN    14
mean    263.715714   NaN
std      23.360598   NaN
min     224.370000   NaN
25%     246.670000   NaN
50%     258.440000   NaN
75%     285.340000   NaN
max     302.740000   NaN
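The three forms of include shown above can be compared side by side. This is a sketch on a five-row stand-in; the Trend column is given an explicit object dtype as an assumption, because newer pandas versions may infer a dedicated string dtype for text.

```python
import pandas as pd

aapl = pd.DataFrame(
    {
        "Price": [247.74, 258.44, 245.52, 246.88, 224.37],
        "Volume": [51054150, 63140170, 75900510, 71882770, 84188210],
        # Explicit object dtype so include='object' matches this column
        # regardless of how the installed pandas version infers strings.
        "Trend": pd.Series(["Down", "Up", "Down", "Up", "Down"], dtype=object),
    }
)

text_only = aapl.describe(include="object")                  # Trend only
everything = aapl.describe(include="all")                    # all three columns
float_and_text = aapl.describe(include=["float", "object"])  # Price and Trend

print(text_only)
print(everything)
print(float_and_text)
```

Statistics that do not apply to a column's type, such as the mean of a text column, are filled with NaN.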
Percentiles
aapl.describe(percentiles=[.1, .5, .9])
            Price        Volume
count   21.000000  2.100000e+01
mean   263.715714  7.551468e+07
std     23.360598  1.669757e+07
min    224.370000  4.689322e+07
10%    242.210000  5.479457e+07
50%    258.440000  7.505841e+07
90%    292.920000  1.004233e+08
max    302.740000  1.067212e+08
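A short sketch of the percentiles argument, again on a five-row stand-in rather than the full data. Each value is a fraction between 0 and 1, and the row labels in the result follow the values passed in.

```python
import pandas as pd

aapl = pd.DataFrame(
    {
        "Price": [247.74, 258.44, 245.52, 246.88, 224.37],
        "Volume": [51054150, 63140170, 75900510, 71882770, 84188210],
    }
)

# Replace the default 25% / 50% / 75% rows with 10% / 50% / 90%.
summary = aapl.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)
```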
Exclude
aapl.describe(exclude='float') leaves out columns of the given type
              Volume Trend
count   2.100000e+01    21
unique           NaN     2
top              NaN  Down
freq             NaN    14
mean    7.551468e+07   NaN
std     1.669757e+07   NaN
min     4.689322e+07   NaN
25%     6.409497e+07   NaN
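For comparison with include, here is a sketch of exclude on the same kind of five-row stand-in. Price is the only float column, so it is the only one dropped; Volume holds integers and stays.

```python
import pandas as pd

aapl = pd.DataFrame(
    {
        "Price": [247.74, 258.44, 245.52, 246.88, 224.37],          # float
        "Volume": [51054150, 63140170, 75900510, 71882770, 84188210],  # integer
        "Trend": pd.Series(["Down", "Up", "Down", "Up", "Down"], dtype=object),
    }
)

# Summarize everything except float columns.
non_float = aapl.describe(exclude="float")
print(non_float)
```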
Let's practice!
Filtering data
Introducing the data
prices.head()
        Date Symbol    High
0 2020-04-03   AAPL  245.70
1 2020-04-02   AAPL  245.15
2 2020-04-01   AAPL  248.72
3 2020-03-31   AAPL  262.49
4 2020-03-30   AAPL  255.52
Introducing the data
prices.describe()
              High
count   378.000000
mean    881.593138
std     720.771922
min     227.490000
max    2185.950000
Introducing the data
prices.describe(include='object')
       Symbol
count     378
unique      3
top      AMZN
freq      126
Comparison operators
< <= > >= == !=
Column comparison
prices.High > 2160
0      False
1      False
2      False
3      False
4      False
       ...
374    False
375    False
376    False
377    False
Column comparison
prices.Symbol == 'AAPL'
0       True
1       True
2       True
3       True
4       True
       ...
374    False
375    False
376    False
377    False
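Comparing a column to a value yields one True or False per row. The sketch below shows this on a small stand-in for prices; the AAPL and TSLA values come from the slides, while the two AMZN rows are hypothetical values chosen to exceed 2160.

```python
import pandas as pd

prices = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-04-03", "2020-04-02", "2020-04-03",
             "2020-04-02", "2020-04-03", "2020-04-02"]
        ),
        "Symbol": ["AAPL", "AAPL", "AMZN", "AMZN", "TSLA", "TSLA"],
        # AMZN highs are hypothetical, for illustration only.
        "High": [245.70, 245.15, 2170.00, 2165.00, 515.49, 494.2599],
    }
)

is_high = prices.High > 2160       # numeric comparison
is_aapl = prices.Symbol == "AAPL"  # equality against a string

print(is_high)
print(is_aapl)
```

The result is a Series with the same index as the DataFrame, which is what makes it usable as a mask.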
Masking by symbol
mask_symbol = prices.Symbol == 'AAPL'
aapl = prices.loc[mask_symbol]
aapl.describe(include='object')
       Symbol
count     126
unique      1
top      AAPL
freq      126
Masking by price
mask_high = prices.High > 2160
big_price = prices.loc[mask_high]
big_price.describe()
              High
count     6.000000
mean   2177.406567
std       7.999334
min    2166.070000
max    2185.950000
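The two masking slides can be sketched together, assuming a small stand-in for prices in which the AMZN rows are hypothetical. Passing a Boolean mask to .loc keeps only the rows where the mask is True.

```python
import pandas as pd

prices = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "AMZN", "AMZN", "TSLA", "TSLA"],
        # AMZN highs are hypothetical; the rest come from the slides.
        "High": [245.70, 245.15, 2170.00, 2165.00, 515.49, 494.2599],
    }
)

# Mask by symbol: keep only the AAPL rows.
mask_symbol = prices.Symbol == "AAPL"
aapl = prices.loc[mask_symbol]

# Mask by price: keep only rows with a high above 2160.
mask_high = prices.High > 2160
big_price = prices.loc[mask_high]

print(aapl)
print(big_price)
```

The filtered frames keep the original row labels, which is why the index of big_price starts at 2 here rather than 0.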
Pandas Boolean operators
And &
Or |
Not ~
Combining conditions
from datetime import datetime
mask_prices = prices['Symbol'] != 'AMZN'
mask_date = prices['Date'] > datetime(2020, 4, 1)
mask_amzn = mask_prices & mask_date
prices.loc[mask_amzn]
          Date Symbol      High
0   2020-04-03   AAPL  245.7000
1   2020-04-02   AAPL  245.1500
252 2020-04-03   TSLA  515.4900
253 2020-04-02   TSLA  494.2599
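A sketch of all three pandas Boolean operators on a small stand-in for prices. The mask names are illustrative, and the AMZN value is hypothetical; the other rows come from the slides.

```python
from datetime import datetime

import pandas as pd

prices = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-04-03", "2020-04-02", "2020-04-01",
             "2020-04-03", "2020-04-03", "2020-04-02"]
        ),
        "Symbol": ["AAPL", "AAPL", "AAPL", "AMZN", "TSLA", "TSLA"],
        # The AMZN high is hypothetical, for illustration only.
        "High": [245.70, 245.15, 248.72, 1900.00, 515.49, 494.2599],
    }
)

not_amzn = prices["Symbol"] != "AMZN"
after_april_1 = prices["Date"] > datetime(2020, 4, 1)

both = prices.loc[not_amzn & after_april_1]    # and: both masks True
either = prices.loc[not_amzn | after_april_1]  # or: at least one mask True
only_amzn = prices.loc[~not_amzn]              # not: invert a mask

# Written inline, each condition needs parentheses because & binds
# more tightly than the comparison operators.
inline = prices.loc[
    (prices["Symbol"] != "AMZN") & (prices["Date"] > datetime(2020, 4, 1))
]

print(both)
```

Python's own and, or, and not do not work element by element on a Series, which is why pandas uses &, |, and ~ instead.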
Let's practice!
Plotting data
Look at your data
Introducing the data
exxon.head()
        Date       High     Volume Month
0 2015-05-01  90.089996  198924100   May
1 2015-06-01  85.970001  238808600   Jun
2 2015-07-01  83.529999  274029000   Jul
3 2015-08-01  79.290001  387523600   Aug
4 2015-09-01  75.470001  316644500   Sep
Matplotlib
my_dataframe.plot()
Line plot
exxon.plot(x='Date',
           y='High')
Rotate
exxon.plot(x='Date',
           y='High',
           rot=90)  # rotation of the tick labels
Title
exxon.plot(x='Date',
           y='High',
           rot=90,
           title='Exxon Stock Price')
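The line plot options above can be tried on the five exxon rows shown earlier. This is a sketch that assumes matplotlib is installed; the Agg backend is selected only so the example also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, an assumption for scripted runs

import pandas as pd

# Five rows copied from the exxon.head() output on the slides.
exxon = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2015-05-01", "2015-06-01", "2015-07-01", "2015-08-01", "2015-09-01"]
        ),
        "High": [90.089996, 85.970001, 83.529999, 79.290001, 75.470001],
    }
)

# The default kind is a line plot; rot turns the tick labels,
# title adds a heading above the axes.
ax = exxon.plot(x="Date", y="High", rot=90, title="Exxon Stock Price")
```

The call returns a matplotlib Axes object, so further tweaks such as ax.set_ylabel("High") can be applied afterwards.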
Index
exxon.set_index('Date', inplace=True)
exxon.plot(y='High',
           rot=90,
           title='Exxon Stock Price')
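When no x is given, plot() uses the index for the horizontal axis. The sketch below shows that on five exxon rows from the slides; it reassigns the result of set_index() instead of using inplace=True, which has the same effect here. It assumes matplotlib is installed.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, an assumption for scripted runs

import pandas as pd

exxon = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2015-05-01", "2015-06-01", "2015-07-01", "2015-08-01", "2015-09-01"]
        ),
        "High": [90.089996, 85.970001, 83.529999, 79.290001, 75.470001],
    }
)

# Move Date from a regular column into the index.
exxon = exxon.set_index("Date")

# With Date as the index, only y needs to be named.
ax = exxon.plot(y="High", rot=90, title="Exxon Stock Price")
```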
Plot types
line
bar
barh
hist
box
kde
density
area
pie
scatter
hexbin
Bar
exxon.plot(x='Month',
           y='Volume',
           kind='bar',
           title='Exxon 2018')
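A sketch of kind='bar' using the five rows from the exxon.head() output. Those rows are from 2015, so the title here is a hypothetical one and differs from the slide's. It assumes matplotlib is installed.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, an assumption for scripted runs

import pandas as pd

exxon = pd.DataFrame(
    {
        "Month": ["May", "Jun", "Jul", "Aug", "Sep"],
        "Volume": [198924100, 238808600, 274029000, 387523600, 316644500],
    }
)

# One bar per row: Month labels on the x-axis, Volume as bar height.
ax = exxon.plot(x="Month", y="Volume", kind="bar",
                title="Exxon 2015 (sample rows)")
```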
Hist
exxon.plot(y='High', kind='hist')
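A histogram needs only the y column, since it counts how many values fall into each bin. The sketch below uses five High values from the slides and assumes matplotlib is installed.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, an assumption for scripted runs

import pandas as pd

exxon = pd.DataFrame(
    {"High": [90.089996, 85.970001, 83.529999, 79.290001, 75.470001]}
)

# The values are grouped into 10 equal-width bins by default.
ax = exxon.plot(y="High", kind="hist")

# Each bar's height is the number of values in that bin.
counts = [patch.get_height() for patch in ax.patches]
```

With so few rows most bins are empty; on the full data set the shape of the distribution becomes visible.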
Let's practice!
Wrapping up
Chapter 1
Representing time
datetime
Mapping data
dict()
Chapter 2
Comparison operators
< <= > >=
Equality operators
== !=
Boolean operators
and or not
If statements
if a < b:
    print(a)
Loops
while a < b:
    a = a + 1
for a in c:
    print(a)
Chapter 3
Creating a DataFrame
DataFrame(data=data)
pd.read_csv('/[Link]')
Aggregating, summarizing
[Link]()
[Link]()
Accessing data
[Link]['a', 'Values']
[Link][2:22, 12]
Extending, manipulating
pce['PCESV'] = pcesv
[Link]([Link], axis=1)
Chapter 4
Peeking
aapl.head()
aapl.tail()
aapl.describe()
Filtering
mask = prices.High > 216
prices.loc[mask]
Plotting
exxon.plot(x='Date',
           y='High')
Congratulations!