Python Library
NumPy, Pandas and Matplotlib
Python libraries for Data science
• Data science is the process of deriving useful
insights from data in order to solve real-world
problems.
• Machine learning, on the other hand, focuses on
getting machines to make decisions by
feeding them data.
• Machine learning is part of data science.
Python Library
• A Python library is a collection of functions and
methods that allows the user to perform
many actions without writing complex code.
Python libraries for statistical analysis
• NumPy (numerical computing / complex mathematical computation)
– Scientific Computations
– Multi-dimensional array objects
– Data manipulation
• SciPy
– Mathematical computations
– Provides a collection of sub-packages
– Advanced linear algebra functions
– Support for signal processing
• Pandas (data manipulation with pandas)
– Data frame objects
– Process large data sets
– Complex data analysis
– Time series data
Python libraries for data visualization
• Matplotlib (Bar plot, graph chart, scatter plot)
– Plot a variety of graphs
– Extract quantitative info
– Pyplot module (Similar to MATLAB)
– Integrate with tools
• Seaborn
– Compatible with various data formats
– Support for automated statistical estimation
– High level abstractions
– Support for built-in themes (graph themes)
• Plotly
– Built-in API
• Bokeh
– Create complex statistical graphs
– Integration with Flask and Django
Python libraries for machine learning
• Scikit-learn
– Provides a set of standard datasets
– Machine learning algorithms
– Built-in functions for feature extraction and selection
– Model evaluation
• XGBoost
– eXtreme Gradient Boosting
– Fast data processing
– Supports parallel computation
– Provides internal parameters for evaluation
– Higher Accuracy
• Eli5
Python libraries for deep learning
• TensorFlow
– Build and train multiple neural networks
– Statistical analysis
– Built-in functions to improve accuracy
– Comes with a built-in visualizer
• Keras
– Supports various types of neural networks
– Advanced neural network computations
– Provides pre-processed datasets
– Easily extensible
• PyTorch
– APIs to integrate with ML/Data science frameworks
– Support for multi-dimensional arrays
– 200+ mathematical operations
– Build Dynamic computation graphs
NumPy
• Numerical Python (Core library for numeric
and scientific computing)
• It consists of multi-dimensional array objects
and a collection of routines (methods) for
processing those arrays.
It works with multi-dimensional arrays and provides
methods to process them.
Creating NumPy Array
Load the library
(array() method of NumPy: pass a list of lists of
elements to create a 2-D array, e.g. 2 rows and 4 columns)
(array() method of NumPy: pass a list
of elements to create a 1-D array)
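A minimal sketch of the two calls described above. The variable names and values are illustrative, not taken from the original slides.

```python
import numpy as np

# 1-D array: pass a list of elements
n1 = np.array([10, 20, 30, 40])

# 2-D array (2 rows, 4 columns): pass a list of lists
n2 = np.array([[10, 20, 30, 40],
               [40, 30, 20, 10]])

print(n1)
print(n2)
```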
Initializing NumPy Array with zeros
create list
--Call zeros((row, col)) method of NumPy; the shape is passed as a tuple
Check type of array
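As a rough illustration (the 2 x 3 shape is an arbitrary choice), zeros() takes the shape as a tuple, and type() confirms the result is a NumPy array:

```python
import numpy as np

# 2 rows, 3 columns, every element set to 0.0
n1 = np.zeros((2, 3))

print(n1)
print(type(n1))   # the array's Python type
print(n1.dtype)   # element type, float64 by default
```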
Initializing NumPy Array with some number
--Call full((dimensions),value) method of NumPy
full((row,col),val)
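A short sketch of full(), with an illustrative shape and fill value:

```python
import numpy as np

# 2 rows, 3 columns, every element set to 10
n1 = np.full((2, 3), 10)

print(n1)
```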
Initializing NumPy Array within a range
--Call arange(initial value, final value) method of NumPy
--Final value is not included in range
--To generate values in a series (steps/gaps), pass the step as a third argument: arange(initial value, final value, step)
The final value is still excluded.
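The two forms of arange() can be sketched as follows (the bounds and step are illustrative values):

```python
import numpy as np

# 10 up to, but not including, 20
n1 = np.arange(10, 20)

# 10 up to, but not including, 50, in steps of 5
n2 = np.arange(10, 50, 5)

print(n1)
print(n2)
```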
Initializing NumPy Array with random numbers
--Call random.randint(initial value, final value, no. of random ints) method of NumPy
random is a submodule inside NumPy, and randint() is a method of random.
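A minimal sketch, assuming five random integers between 1 and 99 are wanted. As with arange(), the final value passed to randint() is excluded.

```python
import numpy as np

# 5 random integers, each >= 1 and < 100
n1 = np.random.randint(1, 100, 5)

print(n1)  # values differ on every run
```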
Checking the shape of NumPy arrays
Change the shape of NumPy arrays
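Checking and changing the shape could look like the sketch below. The array values are illustrative, and reshape() is one of several ways to change the shape (assigning a new tuple to the shape attribute is another).

```python
import numpy as np

n1 = np.array([[1, 2, 3],
               [4, 5, 6]])

# shape is an attribute: (rows, columns)
print(n1.shape)

# reshape returns the same 6 elements arranged as 3 rows, 2 columns
n2 = n1.reshape(3, 2)
print(n2)
```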
Join NumPy arrays
-vstack()--vertical
-hstack()--horizontal
-column_stack()
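The three joining methods can be compared on two small arrays (values are illustrative):

```python
import numpy as np

n1 = np.array([10, 20, 30])
n2 = np.array([40, 50, 60])

v = np.vstack((n1, n2))        # stacked as rows: shape (2, 3)
h = np.hstack((n1, n2))        # joined end to end: shape (6,)
c = np.column_stack((n1, n2))  # each array becomes a column: shape (3, 2)

print(v)
print(h)
print(c)
```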
NumPy Intersection and Difference
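--Intersection: find the elements common to two arrays
--Difference: find the elements of one array that are not in the other
n1-n2: elements in n1 but not in n2
n2-n1: elements in n2 but not in n1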
One id
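A sketch of both operations, assuming the slides used NumPy's intersect1d() and setdiff1d() functions (the array values are illustrative):

```python
import numpy as np

n1 = np.array([10, 20, 30, 40, 50, 60])
n2 = np.array([50, 60, 70, 80, 90])

common = np.intersect1d(n1, n2)   # elements present in both arrays
only_n1 = np.setdiff1d(n1, n2)    # n1 - n2: in n1 but not in n2
only_n2 = np.setdiff1d(n2, n1)    # n2 - n1: in n2 but not in n1

print(common)
print(only_n1)
print(only_n2)
```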
NumPy Array Mathematics
--Addition of NumPy Arrays
--Sum the values in each column (axis=0)
--Sum the values in each row (axis=1)
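The effect of the axis argument can be sketched with two small arrays (values are illustrative):

```python
import numpy as np

n1 = np.array([10, 20])
n2 = np.array([30, 40])

total = np.sum([n1, n2])             # all elements: 10 + 20 + 30 + 40
col_sums = np.sum([n1, n2], axis=0)  # down each column: [10+30, 20+40]
row_sums = np.sum([n1, n2], axis=1)  # across each row: [10+20, 30+40]

print(total)
print(col_sums)
print(row_sums)
```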
NumPy Math Functions
Mean = (Sum of observations) ÷ (Total number of observations)
5, 11, 44, 67, 85, 96
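A minimal sketch applying NumPy's math functions to the six values listed above. Which functions the original slide demonstrated is an assumption; mean, median and standard deviation are shown here.

```python
import numpy as np

n1 = np.array([5, 11, 44, 67, 85, 96])

mean = np.mean(n1)      # 308 / 6
median = np.median(n1)  # average of the two middle values, 44 and 67
std = np.std(n1)        # standard deviation

print(mean, median, std)
```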
NumPy Save and Load
--Saving NumPy Array
-- Loading NumPy Array
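Saving and loading can be sketched with np.save() and np.load(). The file name my_numpy.npy and the temporary directory are hypothetical choices made so the example cleans up after itself.

```python
import os
import tempfile

import numpy as np

n1 = np.array([10, 20, 30, 40, 50, 60])

# Hypothetical file location inside a throwaway directory
path = os.path.join(tempfile.mkdtemp(), "my_numpy.npy")

np.save(path, n1)   # write the array to disk in .npy format
n2 = np.load(path)  # read it back

print(n2)
```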
Single object, Data frame
List of values
Labeled
Passing dictionary
+, -, *, /
Multi-dimensional data structure
Like an Excel sheet
Dictionary
Key (Name, Marks) and
their values in list
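Building a data frame from a dictionary, as described above, might look like this. The keys Name and Marks come from the notes; the names and marks themselves are made-up sample values.

```python
import pandas as pd

# Each key becomes a column; each list holds that column's values
df = pd.DataFrame({
    "Name": ["Bob", "Sam", "Anne"],
    "Marks": [76, 25, 92],
})

print(df)
```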
First five rows of data frame: head()
Last five rows of data frame: tail()
No. of rows and columns in data frame: shape (an attribute, so no parentheses)
Summary statistics of data frame: describe()
25% - The 25th percentile*.
50% - The 50th percentile*.
75% - The 75th percentile*.
*Percentile meaning: the value below which the given
percentage of the values fall.
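The four inspection tools can be tried on a small made-up data frame. The column names a and b and their values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "a": range(1, 11),         # 1 .. 10
    "b": range(10, 110, 10),   # 10, 20, .. 100
})

first = df.head()        # first five rows
last = df.tail()         # last five rows
rows, cols = df.shape    # attribute, not a method call
summary = df.describe()  # count, mean, std, min, 25%, 50%, 75%, max

print(first)
print(last)
print(rows, cols)
print(summary)
```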
Extract value from dataset
method
Extract first three
rows and two col.
Row, col
11 is exclusive
Put column name directly, not index
0 and 3 both are included (4 records)
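The slicing rules noted above match pandas' iloc (position-based, end excluded) and loc (label-based, end included). Assuming those are the methods the notes refer to, a sketch on a hypothetical 12-row data frame:

```python
import pandas as pd

df = pd.DataFrame({
    "a": range(12),
    "b": range(12, 24),
    "c": range(24, 36),
})

# iloc[rows, cols] uses integer positions; the end position is excluded
first = df.iloc[0:3, 0:2]   # first three rows, first two columns
middle = df.iloc[5:11, :]   # rows 5..10; row 11 is not included

# loc uses labels; column names are given directly and both ends are included
by_label = df.loc[0:3, ["a", "c"]]  # rows 0, 1, 2 and 3

print(first)
print(middle)
print(by_label)
```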
Axis=1 [Link]()
Axis=0
Apply category col.
sub module of matplotlib
Create NumPy array
Linear relationship
Attributes
subplot(row, col, index)
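The notes above can be combined into one sketch: a NumPy array, two linear relationships, a few line attributes, and subplot(row, col, index). The data, colours and titles are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 11)
y1 = 2 * x  # linear relationship
y2 = 3 * x

# 1 row, 2 columns, first plot
plt.subplot(1, 2, 1)
plt.plot(x, y1, color="g", linestyle=":", linewidth=2)
plt.title("y = 2x")

# 1 row, 2 columns, second plot
plt.subplot(1, 2, 2)
plt.plot(x, y2, color="r", linestyle="-", linewidth=3)
plt.title("y = 3x")

fig = plt.gcf()  # call plt.show() instead when working interactively
```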
Understand the distribution of categorical values
[Link]
Adding two
markers in the
same plot
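One way to put two sets of markers in the same plot is to call scatter() twice on the same axes. This is a sketch under that assumption; the data and marker styles are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

x = [10, 20, 30, 40, 50]
a = [8, 1, 7, 2, 0]
b = [7, 2, 5, 6, 9]

fig, ax = plt.subplots()
ax.scatter(x, a, marker="*", color="r", label="a")  # first set of markers
ax.scatter(x, b, marker=".", color="g", label="b")  # second set, same axes
ax.legend()
```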
It is used for numerical values
Max value
75 % value
50 % value median
25 % value
Min value
Applications
It is used to know:
-The outliers and their values
-Symmetry of Data
-Tight grouping of data
-Data skewness: whether the data is skewed and, if so, in which direction
Box Plot Example
Example:
Find the maximum, minimum, median, first quartile,
third quartile for the given data set: 23, 42, 12, 10, 15,
14, 9.
Solution:
Given: 23, 42, 12, 10, 15, 14, 9.
Arrange the given dataset in ascending order.
9, 10, 12, 14, 15, 23, 42
Hence,
Minimum = 9
Maximum = 42
Median = 14
First Quartile = 10 (Middle value of 9, 10, 12 is 10)
Third Quartile = 23 (Middle value of 15, 23, 42 is 23).
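The worked example above can be checked in Python. The "middle value of each half" rule matches the exclusive method of the standard library's statistics.quantiles(). NumPy's percentile() uses linear interpolation by default and returns different quartiles for the same data, which is worth knowing when comparing results.

```python
import statistics

import numpy as np

data = [23, 42, 12, 10, 15, 14, 9]

# Exclusive method: reproduces the hand calculation above
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(min(data), q1, median, q3, max(data))  # 9 10.0 14.0 23.0 42

# NumPy's default (linear interpolation) gives different quartiles
np_q1, np_q3 = np.percentile(data, [25, 75])
print(np_q1, np_q3)  # 11.0 19.0
```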
In descriptive statistics, a box plot or boxplot (also known as a
box-and-whisker plot) is often used in exploratory data analysis.
Box plots visually show the distribution of numerical data and its
skewness by displaying the data quartiles (or percentiles) and averages.
Check the distribution of categorical values in %
Outer circle
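Percentage shares of categorical values are commonly shown with a pie chart. The sketch below assumes that is the plot meant here; the categories and counts are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

labels = ["Apple", "Mango", "Orange"]  # hypothetical categories
counts = [50, 30, 20]

fig, ax = plt.subplots()
# autopct prints each slice's share of the total as a percentage
wedges, texts, autotexts = ax.pie(counts, labels=labels, autopct="%0.1f%%")
```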
Thank You