
Essential Python Libraries for Data Science

The document provides an overview of various Python libraries essential for data science, including NumPy, Pandas, Matplotlib, and machine learning libraries like Scikit-learn and TensorFlow. It details the functionalities of these libraries, such as data manipulation, statistical analysis, and data visualization techniques. Additionally, it discusses the creation and manipulation of NumPy arrays, as well as the use of box plots for data distribution analysis.
Copyright © All Rights Reserved

Python Library

NumPy, Pandas and Matplotlib


Python libraries for Data science
• Data science is the process of deriving useful
insights from data in order to solve real-world
problems.

• Machine learning, on the other hand, is the process of
getting machines to make decisions by
feeding them data.

• Machine learning is part of data science.


Python Library
• A Python library is a collection of functions and
methods that lets you perform
many actions without writing complex code yourself.
Python libraries for statistical analysis
• NumPy (numerical computing / complex mathematical computation)
– Scientific Computations
– Multi-dimensional array objects
– Data manipulation
• SciPy
– Mathematical computations
– Provides a collection of sub-packages
– Advanced linear algebra functions
– Support for signal processing
• Pandas (data manipulation with pandas)
– Data frame objects
– Process large data sets
– Complex data analysis
– Time series data
Python libraries for data visualization
• Matplotlib (Bar plot, graph chart, scatter plot)
– Plot a variety of graphs
– Extract quantitative info
– Pyplot module (Similar to MATLAB)
– Integrate with tools
• Seaborn
– Compatible with various data formats
– Support for automated statistical estimation
– High level abstractions
– Support for built-in themes (graph themes)
• Plotly
– Built-in API
• Bokeh
– Create complex statistical graphs
– Integration with Flask and Django
Python libraries for machine learning
• Scikit-learn
– Provides a set of standard datasets
– Machine learning algorithms
– Built-in functions for feature extraction and selection
– Model evaluation
• XGBoost
– eXtreme Gradient Boosting
– Fast data processing
– Support parallel computation
– Provides internal parameters for evaluation
– Higher Accuracy
• Eli5
Python libraries for deep learning
• TensorFlow
– Build and train multiple neural networks
– Statistical analysis
– Built-in functions to improve accuracy
– Comes with a built-in visualizer (TensorBoard)
• Keras
– Supports various types of neural networks
– Advanced neural network computations
– Provides pre-processed datasets
– Easily extensible
• PyTorch
– APIs to integrate with ML/data science frameworks
– Support for multi-dimensional arrays (tensors)
– 200+ mathematical operations
– Builds dynamic computation graphs
NumPy
• Numerical Python (Core library for numeric
and scientific computing)
• It consists of multi-dimensional array objects
and a collection of routines (methods) for
processing those arrays.

In short, NumPy works with multi-dimensional arrays and provides
methods to process them.
Creating NumPy Array

--Load the library

--2-D array: call the array() method of NumPy and pass a list of lists
of elements (e.g. 2 rows and 4 columns)

--1-D array: call the array() method of NumPy and pass a list
of elements
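A minimal sketch of the two array() calls described above; the element values are illustrative and not taken from the original slides.

```python
import numpy as np  # load the library under its conventional alias

# 1-D array: pass a list of elements to np.array()
n1 = np.array([10, 20, 30, 40])

# 2-D array: pass a list of lists (here 2 rows and 4 columns)
n2 = np.array([[10, 20, 30, 40],
               [40, 30, 20, 10]])

print(n1)
print(n2)
print(n2.ndim)   # number of dimensions: 2
```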
Initializing NumPy Array with zeros
--Call the zeros((row, col)) method of NumPy; the shape is passed as a single tuple

--Check the type of the array with type()
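A short sketch of zeros(), with an illustrative 2 × 3 shape. Note that the shape must be one tuple: writing np.zeros(2, 3) would make NumPy read 3 as the dtype argument and raise an error.

```python
import numpy as np

# zeros() takes the shape as a single tuple: (rows, cols)
n1 = np.zeros((2, 3))
print(n1)
print(type(n1))   # <class 'numpy.ndarray'>
print(n1.dtype)   # float64 by default
```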


Initializing NumPy Array with some number
--Call the full((row, col), value) method of NumPy
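As a small illustration, assuming a 2 × 2 array filled with the value 10:

```python
import numpy as np

# full(shape, fill_value): every element is set to fill_value
n1 = np.full((2, 2), 10)
print(n1)
# [[10 10]
#  [10 10]]
```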
Initializing NumPy Array within a range
--Call the arange(initial value, final value) method of NumPy
--The final value is not included in the range

--To get values in a series with a fixed step (gap), pass the step as a third argument: arange(initial value, final value, step)

--The final value is still excluded
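Both forms of arange() can be sketched as follows; the start, stop and step values are arbitrary examples.

```python
import numpy as np

# arange(start, stop): stop is excluded
n1 = np.arange(10, 20)
print(n1)   # [10 11 12 13 14 15 16 17 18 19]

# arange(start, stop, step): values at a fixed gap, stop still excluded
n2 = np.arange(10, 50, 5)
print(n2)   # [10 15 20 25 30 35 40 45]
```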

Initializing NumPy Array with random numbers

--Call the random.randint(initial value, final value, no. of random integers) method of NumPy

--random is a sub-module of NumPy, and randint() is a method of random. The final value is excluded.
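A minimal sketch of randint(), assuming we want five integers from 1 up to (but not including) 100. The output changes on every run.

```python
import numpy as np

# random.randint(low, high, size): `size` random integers in [low, high)
n1 = np.random.randint(1, 100, 5)
print(n1)   # five values between 1 and 99; output varies on each run
```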


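Checking the shape of NumPy arrays
--Use the shape attribute: (rows, columns)

Changing the shape of NumPy arrays
--Use reshape(); the total number of elements must stay the same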
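A short sketch of checking and changing the shape; the 2 × 3 array is an illustrative example.

```python
import numpy as np

n1 = np.array([[1, 2, 3], [4, 5, 6]])
print(n1.shape)        # (2, 3): 2 rows, 3 columns

# reshape() returns an array with the requested shape (6 elements either way)
n2 = n1.reshape(3, 2)
print(n2)
# [[1 2]
#  [3 4]
#  [5 6]]
```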


Join NumPy arrays
--vstack(): vertical (stack the arrays as rows)
--hstack(): horizontal (join the arrays side by side)
--column_stack(): stack 1-D arrays as columns
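The three joining functions applied to the same two illustrative 1-D arrays, to show how the result shapes differ:

```python
import numpy as np

n1 = np.array([10, 20, 30])
n2 = np.array([40, 50, 60])

v = np.vstack((n1, n2))         # stacked as rows -> shape (2, 3)
h = np.hstack((n1, n2))         # joined end to end -> shape (6,)
c = np.column_stack((n1, n2))   # each input becomes a column -> shape (3, 2)
print(v)
print(h)
print(c)
```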
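NumPy Intersection and Difference
--Intersection: find the elements common to both arrays (intersect1d())
--Difference: find the elements of one array that are not in the other (setdiff1d())

setdiff1d(n1, n2): n1 - n2

setdiff1d(n2, n1): n2 - n1

The "1d" suffix: both functions return a one-dimensional array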
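A small sketch with two illustrative arrays that overlap in 50 and 60. Both functions return sorted, unique values.

```python
import numpy as np

n1 = np.array([10, 20, 30, 40, 50, 60])
n2 = np.array([50, 60, 70, 80, 90])

common = np.intersect1d(n1, n2)   # elements in both arrays
only_n1 = np.setdiff1d(n1, n2)    # n1 - n2
only_n2 = np.setdiff1d(n2, n1)    # n2 - n1
print(common)    # [50 60]
print(only_n1)   # [10 20 30 40]
print(only_n2)   # [70 80 90]
```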
NumPy Array Mathematics
--Addition of NumPy Arrays

--Add column values (axis=0)

--Add row values (axis=1)
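One way to sketch these additions, assuming np.sum() over a list of two small arrays (the slides' exact call is not recoverable). NumPy treats the list as a 2 × 2 array [[10, 20], [30, 40]].

```python
import numpy as np

n1 = np.array([10, 20])
n2 = np.array([30, 40])

total = np.sum([n1, n2])            # all elements: 100
col_sums = np.sum([n1, n2], axis=0) # down each column: [40 60]
row_sums = np.sum([n1, n2], axis=1) # across each row: [30 70]
elementwise = n1 + n2               # element-wise addition: [40 60]
print(total, col_sums, row_sums, elementwise)
```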


NumPy Math Functions
Mean = (Sum of observations) ÷ (Total number of observations)

Example data: 5, 11, 44, 67, 85, 96
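Applying the mean formula to the example data with NumPy. The sum is 308 over 6 observations. The median and standard deviation are shown as further common math functions; np.std() uses the population formula (ddof=0) by default.

```python
import numpy as np

data = np.array([5, 11, 44, 67, 85, 96])

# mean = sum of observations / number of observations = 308 / 6
print(np.mean(data))     # about 51.33
print(np.median(data))   # (44 + 67) / 2 = 55.5
print(np.std(data))      # population standard deviation
```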


NumPy Save and Load
--Saving NumPy Array
--Loading NumPy Array
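A minimal sketch using np.save() and np.load(). The file name "my_numpy" is hypothetical, and a temporary directory is used so the example does not litter the working folder.

```python
import os
import tempfile
import numpy as np

n1 = np.array([10, 20, 30, 40, 50, 60])

# np.save writes a binary .npy file (".npy" is appended if missing)
path = os.path.join(tempfile.mkdtemp(), "my_numpy")   # hypothetical file name
np.save(path, n1)

# np.load needs the full file name, including the .npy extension
loaded = np.load(path + ".npy")
print(loaded)
```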
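Pandas Series object (one-dimensional, labeled)
--Create from a list of values
--Labeled: supply custom index labels
--Create by passing a dictionary
--Arithmetic operations: +, -, *, /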
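A sketch of the Series operations listed above, with illustrative values. Arithmetic between two Series lines up the values by label.

```python
import pandas as pd

# Series from a list of values: default integer index 0, 1, 2
s1 = pd.Series([1, 2, 3])

# Labeled Series: supply a custom index
s2 = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Series from a dictionary: keys become the labels
s3 = pd.Series({"a": 10, "b": 20, "c": 30})

# Arithmetic is element-wise and aligned by label
print(s2 + s3)   # a 11, b 22, c 33
print(s3 - s2)
print(s2 * s3)
print(s3 / s2)
```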
Pandas DataFrame: a multi-dimensional data structure

Like an Excel sheet (rows and columns)

Create by passing a dictionary:
keys (Name, Marks) and
their values in lists
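A minimal sketch of building a DataFrame from a dictionary with the Name and Marks keys mentioned above; the names and marks themselves are hypothetical.

```python
import pandas as pd

# Keys become column names; each list becomes a column (values are hypothetical)
df = pd.DataFrame({"Name": ["Asha", "Ravi", "Meena"],
                   "Marks": [85, 72, 90]})
print(df)
print(df.shape)  # (3, 2): 3 rows, 2 columns
```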
First five rows of data frame: head()
Last five rows of data frame: tail()
No. of rows and columns in data frame: shape (an attribute, not a method)
Summary statistics of data frame: describe()

25% - The 25th percentile*.
50% - The 50th percentile* (median).
75% - The 75th percentile*.
*Percentile meaning: the value below which the given percentage of
the values fall.
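The four inspection tools above, applied to a hypothetical 10-row data frame with one numeric column:

```python
import pandas as pd

# Hypothetical data: 10 rows with one numeric column
df = pd.DataFrame({"Marks": [35, 42, 50, 58, 61, 67, 73, 80, 88, 95]})

print(df.head())      # first five rows
print(df.tail())      # last five rows
print(df.shape)       # (10, 1); shape is an attribute, so no parentheses
summary = df.describe()
print(summary)        # count, mean, std, min, 25%, 50%, 75%, max
```

Here the 50% row is the median, (61 + 67) / 2 = 64.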
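Extract values from a dataset

--iloc[] method: select by integer position as [row, col]
- e.g. extract the first three rows and two columns
- the end position is exclusive (in 0:11, position 11 is excluded)

--loc[] method: select by label; put the column name directly, not its index
- both ends are included (0:3 returns 4 records: 0, 1, 2 and 3)

--axis=1 [Link](): operates on columns
--axis=0: operates on rows
--Apply to a category column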
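A sketch of the selection rules above on a hypothetical 15-row data frame. The method hidden behind "[Link]" in the slides is not recoverable; drop() is used here only as an assumed illustration of what axis=1 and axis=0 mean.

```python
import pandas as pd

# Hypothetical data set with default integer row labels 0..14
df = pd.DataFrame({"Name": [f"S{i}" for i in range(15)],
                   "Marks": range(50, 65),
                   "Grade": ["A", "B", "C"] * 5})

first = df.iloc[0:3, 0:2]            # first three rows, first two columns
eleven = df.iloc[0:11]               # 11 rows: position 11 is excluded
by_label = df.loc[0:3, "Marks"]      # labels 0..3, both included -> 4 records
no_grade = df.drop("Grade", axis=1)  # axis=1: drop a column
no_row0 = df.drop(0, axis=0)         # axis=0: drop a row
print(first)
print(by_label)
```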
pyplot: sub-module of Matplotlib

--Create a NumPy array to plot

--Linear relationship
--Plot attributes
--subplot(row, col, index)
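A sketch of a linear relationship drawn with pyplot, using a few line attributes and subplot(row, col, index) to place two plots side by side. The data and attribute values are illustrative assumptions, not the slides' own; the Agg backend lets it run without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = np.arange(1, 11)
y1 = 2 * x   # linear relationship
y2 = 3 * x

plt.figure()
# subplot(rows, cols, index): a 1 x 2 grid, first panel
plt.subplot(1, 2, 1)
plt.plot(x, y1, color="green", linestyle=":", linewidth=2)  # line attributes
plt.title("y = 2x")

# second panel of the same grid
plt.subplot(1, 2, 2)
plt.plot(x, y2, color="red", linestyle="--")
plt.title("y = 3x")

fig = plt.gcf()
print(len(fig.axes))   # 2 subplots
```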
--Understand the distribution of categorical values
[Link]
--Adding two markers in the same plot
--Used for numerical values
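Assuming the slides used a bar plot for the categorical distribution and a scatter plot for the two markers (the plot types are not named in the surviving text), a minimal sketch with hypothetical data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Bar plot: counts per category (hypothetical values)
fruits = ["Apple", "Mango", "Orange"]
counts = [25, 40, 15]
plt.figure()
plt.bar(fruits, counts, color="orange")

# Scatter plot with two sets of markers in the same axes
plt.figure()
x = [1, 2, 3, 4, 5]
plt.scatter(x, [2, 4, 6, 8, 10], marker="*", c="red", label="y = 2x")
plt.scatter(x, [1, 3, 5, 7, 9], marker="o", c="blue", label="y = 2x - 1")
plt.legend()

fig = plt.gcf()
print(len(fig.axes[0].collections))   # 2 scatter collections
```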
Parts of a box plot:
Max value

75% value (third quartile)

50% value (median)

25% value (first quartile)

Min value
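A minimal box plot sketch using the data set from the worked example. By default Matplotlib computes quartiles by linear interpolation (Q1 = 11, Q3 = 19 for this data) and draws whiskers up to 1.5 × IQR. As a result, 42 is shown as an outlier point rather than as the top whisker.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

data = [23, 42, 12, 10, 15, 14, 9]

plt.figure()
result = plt.boxplot(data)   # returns a dict of the drawn artists

median_y = list(result["medians"][0].get_ydata())  # the median line
fliers_y = list(result["fliers"][0].get_ydata())   # points beyond the whiskers
print(median_y)   # [14.0, 14.0]
print(fliers_y)   # [42.0]
```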

Applications
It is used to know:
-The outliers and their values
-Symmetry of Data
-Tight grouping of data
-Data skewness: whether the data is skewed, in which direction, and by how much
Box Plot Example
Example:
Find the maximum, minimum, median, first quartile,
third quartile for the given data set: 23, 42, 12, 10, 15,
14, 9.
Solution:
Given: 23, 42, 12, 10, 15, 14, 9.
Arrange the given dataset in ascending order.
9, 10, 12, 14, 15, 23, 42
Hence,
Minimum = 9
Maximum = 42
Median = 14
First Quartile = 10 (Middle value of 9, 10, 12 is 10)
Third Quartile = 23 (Middle value of 15, 23, 42 is 23).
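The hand calculation above can be sketched in Python. It takes the median of the lower and upper halves and leaves the middle value out when the count is odd, matching the worked example. This is one of several quartile conventions. NumPy's default np.percentile (linear interpolation) gives Q1 = 11 and Q3 = 19 for the same data.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the 'median of each half' rule
    from the worked example (the middle value is excluded from both
    halves when the count is odd)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                 # values below the median
    upper = data[half + 1:] if n % 2 else data[half:]   # values above the median
    return data[0], median(lower), median(data), median(upper), data[-1]

summary = five_number_summary([23, 42, 12, 10, 15, 14, 9])
print(summary)   # (9, 10, 14, 23, 42)
```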

In descriptive statistics, a box plot or boxplot (also known as a box and
whisker plot) is a chart often used in exploratory data analysis. Box plots visually show the
distribution of numerical data and its skewness by displaying the
data quartiles (or percentiles) and averages.
Pie chart: check the distribution of categorical values in %
Outer circle
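Assuming the slide refers to a pie chart, here is a minimal sketch with hypothetical category counts. The autopct format string labels each slice with its percentage share.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical category counts (they sum to 100, so shares equal counts)
labels = ["Apple", "Mango", "Orange", "Banana"]
counts = [30, 25, 25, 20]

plt.figure()
# autopct writes each slice's share as a percentage on the chart
wedges, texts, autotexts = plt.pie(counts, labels=labels, autopct="%1.1f%%")
shares = [t.get_text() for t in autotexts]
print(shares)   # ['30.0%', '25.0%', '25.0%', '20.0%']
```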
Thank You
