Chapter 4

The document provides an overview of data exploration techniques in Python for finance, focusing on methods such as 'head', 'tail', and 'describe' to analyze stock data. It includes examples using Apple Inc. (AAPL) stock prices, demonstrating how to peek at data, filter it, and visualize it using Matplotlib. Additionally, it covers comparison operators and boolean logic for data manipulation.

Uploaded by

Xie Niyun
Peeking at data with head, tail, and describe
INTERMEDIATE PYTHON FOR FINANCE

Kennedy Behrman
Data Engineer, Author, Founder
Understanding your data
Data is loaded correctly

Understand the data's shape

First look at data
aapl

             Price    Volume  Trend
Date
03/27/2020  247.74  51054150   Down
03/26/2020  258.44  63140170     Up
03/25/2020  245.52  75900510   Down
03/24/2020  246.88  71882770     Up

Head
aapl.head() displays the first 5 rows, a quick way to take a peek at the data

             Price    Volume  Trend
Date
03/27/2020  247.74  51054150   Down
03/26/2020  258.44  63140170     Up
03/25/2020  245.52  75900510   Down
03/24/2020  246.88  71882770     Up
03/23/2020  224.37  84188210   Down

Head
aapl.head(3)

             Price    Volume  Trend
Date
03/27/2020  247.74  51054150   Down
03/26/2020  258.44  63140170     Up
03/25/2020  245.52  75900510   Down

Tail
Use aapl.tail() to see the bottom rows

             Price     Volume  Trend
Date
03/05/2020  292.92   46893220   Down
03/04/2020  302.74   54794570     Up
03/03/2020  289.32   79868850   Down
03/02/2020  298.81   85349340     Up
02/28/2020  273.36  106721200   Down
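
As a rough illustration of the peeking methods above, the sketch below rebuilds the five most recent rows from the slides as a small DataFrame and calls `head` and `tail` with and without a row count. The hand-typed construction is an assumption made for the example; the slides do not show how `aapl` is loaded.

```python
import pandas as pd

# Hand-typed sample: the five most recent rows shown on the slides.
aapl = pd.DataFrame(
    {
        "Price": [247.74, 258.44, 245.52, 246.88, 224.37],
        "Volume": [51054150, 63140170, 75900510, 71882770, 84188210],
        "Trend": ["Down", "Up", "Down", "Up", "Down"],
    },
    index=pd.Index(
        ["03/27/2020", "03/26/2020", "03/25/2020", "03/24/2020", "03/23/2020"],
        name="Date",
    ),
)

default_head = aapl.head()  # no argument: up to the first 5 rows
first_three = aapl.head(3)  # only the first 3 rows
last_two = aapl.tail(2)     # only the last 2 rows

print(first_three)
print(last_two)
```

Both methods return a new DataFrame, so the result can be stored or chained into further calls.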

Describe
aapl.describe()

            Price        Volume
count   21.000000  2.100000e+01
mean   263.715714  7.551468e+07
std     23.360598  1.669757e+07
min    224.370000  4.689322e+07
25%    246.670000  6.409497e+07
50%    258.440000  7.505841e+07
75%    285.340000  8.418821e+07
max    302.740000  1.067212e+08
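
A minimal sketch of what `describe` returns, assuming a five-row sample typed in from the slides rather than the full 21-row dataset (so the statistics differ from the table above). The result is itself a DataFrame indexed by statistic name, so individual values can be looked up with `.loc`.

```python
import pandas as pd

# Hand-typed five-row sample; the numeric columns only.
aapl = pd.DataFrame(
    {
        "Price": [247.74, 258.44, 245.52, 246.88, 224.37],
        "Volume": [51054150, 63140170, 75900510, 71882770, 84188210],
    }
)

summary = aapl.describe()  # count, mean, std, min, quartiles, max per column
print(summary)

# Individual statistics are addressed by row label and column name.
median_price = summary.loc["50%", "Price"]
mean_price = summary.loc["mean", "Price"]
print(median_price, mean_price)
```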

Include type of column

aapl.describe(include='object')

        Trend
count      21
unique      2
top      Down
freq       14

Include
aapl.describe(include='all')

             Price        Volume  Trend
count    21.000000  2.100000e+01     21
unique         NaN           NaN      2
top            NaN           NaN   Down
freq           NaN           NaN     14
mean    263.715714  7.551468e+07    NaN
std      23.360598  1.669757e+07    NaN
min     224.370000  4.689322e+07    NaN
25%     246.670000  6.409497e+07    NaN

aapl.describe(include=['float', 'object'])

             Price  Trend
count    21.000000     21
unique         NaN      2
top            NaN   Down
freq           NaN     14
mean    263.715714    NaN
std      23.360598    NaN
min     224.370000    NaN
25%     246.670000    NaN
50%     258.440000    NaN
75%     285.340000    NaN
max     302.740000    NaN
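
The `include` variants above can be tried on a small sample. In this sketch the data is a hand-typed five-row subset of the slide data, and the `Trend` column is given `dtype="object"` explicitly; that is an assumption made so that `include='object'` selects it regardless of how the installed pandas version infers string columns.

```python
import pandas as pd

# Hand-typed five-row sample with one float, one integer and one object column.
aapl = pd.DataFrame(
    {
        "Price": [247.74, 258.44, 245.52, 246.88, 224.37],
        "Volume": [51054150, 63140170, 75900510, 71882770, 84188210],
        "Trend": pd.Series(["Down", "Up", "Down", "Up", "Down"], dtype="object"),
    }
)

text_only = aapl.describe(include="object")         # count, unique, top, freq
everything = aapl.describe(include="all")           # every column, NaN where a statistic does not apply
floats_and_text = aapl.describe(include=["float", "object"])  # Volume (integer) is left out

print(text_only)
print(floats_and_text)
```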

Percentiles
aapl.describe(percentiles=[.1, .5, .9])

            Price        Volume
count   21.000000  2.100000e+01
mean   263.715714  7.551468e+07
std     23.360598  1.669757e+07
min    224.370000  4.689322e+07
10%    242.210000  5.479457e+07
50%    258.440000  7.505841e+07
90%    292.920000  1.004233e+08
max    302.740000  1.067212e+08
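
A small sketch of the `percentiles` argument, again on a hand-typed five-row sample (an assumption for the example, so the values will not match the 21-row table). The requested fractions replace the default 25%, 50% and 75% rows.

```python
import pandas as pd

# Hand-typed five-row sample of the numeric columns.
aapl = pd.DataFrame(
    {
        "Price": [247.74, 258.44, 245.52, 246.88, 224.37],
        "Volume": [51054150, 63140170, 75900510, 71882770, 84188210],
    }
)

# Fractions between 0 and 1; each one becomes a row labelled as a percentage.
tails = aapl.describe(percentiles=[0.1, 0.5, 0.9])
print(tails)
```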

Exclude
aapl.describe(exclude='float')

              Volume  Trend
count   2.100000e+01     21
unique           NaN      2
top              NaN   Down
freq             NaN     14
mean    7.551468e+07    NaN
std     1.669757e+07    NaN
min     4.689322e+07    NaN
25%     6.409497e+07    NaN
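
`exclude` works as the mirror image of `include`. The sketch below, on a hand-typed five-row sample (an assumption for the example), drops the float column and describes whatever is left.

```python
import pandas as pd

# Hand-typed five-row sample: Price is float, Volume is integer, Trend is text.
aapl = pd.DataFrame(
    {
        "Price": [247.74, 258.44, 245.52, 246.88, 224.37],
        "Volume": [51054150, 63140170, 75900510, 71882770, 84188210],
        "Trend": ["Down", "Up", "Down", "Up", "Down"],
    }
)

no_floats = aapl.describe(exclude="float")  # every column except the float ones
print(no_floats)
```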

Let's practice!
Filtering data

Introducing the data
prices.head()

         Date  Symbol    High
0  2020-04-03    AAPL  245.70
1  2020-04-02    AAPL  245.15
2  2020-04-01    AAPL  248.72
3  2020-03-31    AAPL  262.49
4  2020-03-30    AAPL  255.52

Introducing the data
prices.describe()

              High
count   378.000000
mean    881.593138
std     720.771922
min     227.490000
max    2185.950000

Introducing the data
prices.describe(include='object')

       Symbol
count     378
unique      3
top      AMZN
freq      126

Comparison operators
< <= > >= == !=

Column comparison
prices.High > 2160

0 False
1 False
2 False
3 False
4 False
...
374 False
375 False
376 False
377 False
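
To make the column comparison concrete, here is a minimal sketch on a five-row stand-in for `prices`. The rows are illustrative values chosen for the example, not the 378-row dataset from the slides. Comparing a column with a scalar yields a boolean Series with one entry per row.

```python
import pandas as pd

# Illustrative stand-in for `prices`; values chosen for the example.
prices = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "AMZN", "AMZN", "TSLA"],
        "High": [245.70, 245.15, 2185.95, 2166.07, 515.49],
    }
)

is_expensive = prices.High > 2160    # element-wise comparison
is_apple = prices.Symbol == "AAPL"   # works for strings as well

print(is_expensive)
print(is_expensive.sum())  # True counts as 1, so this is the number of matches
```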

Column comparison
prices.Symbol == 'AAPL'

0 True
1 True
2 True
3 True
4 True
...
374 False
375 False
376 False
377 False

Masking by symbol
mask_symbol = prices.Symbol == 'AAPL'
aapl = prices.loc[mask_symbol]
aapl.describe(include='object')

       Symbol
count     126
unique      1
top      AAPL
freq      126
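
The masking step can be sketched end to end as follows, assuming a small illustrative stand-in for `prices` and assuming `.loc` is the indexer used with the mask. Rows where the mask is `True` are kept, and they retain their original index labels.

```python
import pandas as pd

# Illustrative stand-in for `prices`; values chosen for the example.
prices = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "AMZN", "AMZN", "TSLA"],
        "High": [245.70, 245.15, 2185.95, 2166.07, 515.49],
    }
)

mask_symbol = prices.Symbol == "AAPL"  # boolean Series, one entry per row
aapl = prices.loc[mask_symbol]         # keep only the rows where the mask is True

print(aapl)
```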

Masking by price
mask_high = prices.High > 2160
big_price = prices.loc[mask_high]

big_price.describe()

              High
count     6.000000
mean   2177.406567
std       7.999334
min    2166.070000
max    2185.950000

Pandas Boolean operators
And &

Or |

Not ~
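
A short sketch of the three operators on boolean Series, using an illustrative five-row stand-in for `prices`. When conditions are written inline, each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` or `==` in Python.

```python
import pandas as pd

# Illustrative stand-in for `prices`; values chosen for the example.
prices = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "AMZN", "AMZN", "TSLA"],
        "High": [245.70, 245.15, 2185.95, 2166.07, 515.49],
    }
)

is_aapl = prices.Symbol == "AAPL"
is_high = prices.High > 2000

both = is_aapl & is_high    # True only where both masks are True
either = is_aapl | is_high  # True where at least one mask is True
not_aapl = ~is_aapl         # flips every entry

# Inline form: parenthesize each comparison before combining.
cheap_non_amzn = prices.loc[(prices.Symbol != "AMZN") & (prices.High > 500)]
print(cheap_non_amzn)
```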

Combining conditions
from datetime import datetime

mask_prices = prices['Symbol'] != 'AMZN'
mask_date = prices['Date'] > datetime(2020, 4, 1)
mask_amzn = mask_prices & mask_date
prices.loc[mask_amzn]

          Date Symbol      High
0   2020-04-03   AAPL  245.7000
1   2020-04-02   AAPL  245.1500
252 2020-04-03   TSLA  515.4900
253 2020-04-02   TSLA  494.2599
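
The combined filter can be reproduced on a small sample. In this sketch the AAPL and TSLA rows are taken from the slides, while the AMZN row is a made-up value added only so that the symbol condition has something to exclude. The `Date` column is assumed to hold real datetimes (converted here with `pd.to_datetime`), which is what makes the comparison with a `datetime` object meaningful.

```python
from datetime import datetime

import pandas as pd

prices = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-04-03", "2020-04-02", "2020-04-01",
             "2020-04-03", "2020-04-03", "2020-04-02"]
        ),
        "Symbol": ["AAPL", "AAPL", "AAPL", "AMZN", "TSLA", "TSLA"],
        # The AMZN value (1926.33) is hypothetical.
        "High": [245.70, 245.15, 248.72, 1926.33, 515.49, 494.2599],
    }
)

mask_not_amzn = prices["Symbol"] != "AMZN"
mask_date = prices["Date"] > datetime(2020, 4, 1)  # strictly after April 1

selected = prices.loc[mask_not_amzn & mask_date]
print(selected)
```

Because the comparison is strict, the AAPL row dated 2020-04-01 is dropped along with the AMZN row.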

Let's practice!
Plotting data

Look at your data

Introducing the data
exxon.head()

        Date       High     Volume Month
0 2015-05-01  90.089996  198924100   May
1 2015-06-01  85.970001  238808600   Jun
2 2015-07-01  83.529999  274029000   Jul
3 2015-08-01  79.290001  387523600   Aug
4 2015-09-01  75.470001  316644500   Sep

Matplotlib
my_dataframe.plot()

Line plot
exxon.plot(x='Date',
           y='High')

Rotate
exxon.plot(x='Date',
           y='High',
           rot=90)  # rot sets the rotation of the tick labels

Title
exxon.plot(x='Date',
           y='High',
           rot=90,
           title='Exxon Stock Price')
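
A runnable sketch of the line plot built up over the last few slides, assuming the five rows shown earlier typed in by hand. The non-interactive `Agg` backend is an assumption made so the example runs without a display; in a notebook that line can be dropped.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen so no display is needed

import pandas as pd

# Hand-typed sample: the five rows shown on the slides.
exxon = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2015-05-01", "2015-06-01", "2015-07-01", "2015-08-01", "2015-09-01"]
        ),
        "High": [90.089996, 85.970001, 83.529999, 79.290001, 75.470001],
    }
)

# DataFrame.plot draws a line plot by default and returns a Matplotlib Axes.
ax = exxon.plot(x="Date", y="High", rot=90, title="Exxon Stock Price")
```

Keeping the returned Axes makes it possible to adjust labels or save the figure afterwards.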

Index
exxon.set_index('Date', inplace=True)
exxon.plot(y='High',
           rot=90,
           title='Exxon Stock Price')

Plot types
line     density
bar      area
barh     pie
hist     scatter
box      hexbin
kde

Bar
exxon.plot(x='Month',
           y='Volume',
           kind='bar',
           title='Exxon 2018')

Hist
exxon.plot(y='High', kind='hist')
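
The `kind` argument can be tried on the same five rows. This is a sketch under the same assumptions as before: hand-typed sample data and an off-screen backend. The bar chart title is an illustrative label chosen for the example.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen so no display is needed

import pandas as pd

# Hand-typed sample: the five rows shown on the slides.
exxon = pd.DataFrame(
    {
        "High": [90.089996, 85.970001, 83.529999, 79.290001, 75.470001],
        "Volume": [198924100, 238808600, 274029000, 387523600, 316644500],
        "Month": ["May", "Jun", "Jul", "Aug", "Sep"],
    }
)

# One bar per row, labelled by month.
ax_bar = exxon.plot(x="Month", y="Volume", kind="bar", title="Exxon monthly volume")

# Histogram of the High column; pandas uses 10 bins unless told otherwise.
ax_hist = exxon.plot(y="High", kind="hist")
```

Each call creates its own figure, so the two charts do not draw over each other.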

Let's practice!
Wrapping up

Chapter 1
Representing time
datetime

Mapping data
dict()

Chapter 2
Comparison operators
< <= > >=

Equality operators
== !=

Boolean operators
and or not

If statements
if a < b:
    print(a)

Loops
while a < b:
    a = a + 1

for a in c:
    print(a)

Chapter 3
Creating a DataFrame
DataFrame(data=data)
pd.read_csv('/[Link]')

Accessing data
[Link]['a', 'Values']
[Link][2:22, 12]

Aggregating, summarizing
[Link]()
[Link]()

Extending, manipulating
pce['PCESV'] = pcesv
[Link]([Link], axis=1)

Chapter 4
Peeking
[Link]()
[Link]()
[Link]()

Filtering
mask = [Link] > 216
[Link][mask]

Plotting
[Link](x='Date',
y='High' )

Congratulations!