Unit- V NUMPY and PANDAS
Numpy – Introduction – Computations using NumPy functions –
Computation on Arrays – Aggregation – Indexing and Sorting –
Pandas – Introduction and Basic Pandas Concepts – Data frames –
Data Handling
What is NumPy?
❖ NumPy is a Python library used for working with arrays.
❖ It also has functions for working in domain of linear algebra,
Fourier transform, and matrices.
❖ NumPy was created in 2005 by Travis Oliphant. It is an open
source project and it is free.
❖ NumPy stands for Numerical Python.
❖ The NumPy API is used extensively in Pandas, SciPy,
Matplotlib, scikit-learn, scikit-image and most other data
science and scientific Python packages.
Numpy array Vs python list
import numpy as np
import sys
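b = list(range(1000))      # build a real list; range() alone is a lazy object
print(sys.getsizeof(b))    # size of the list object itself, in bytes
c = np.arange(1000)
print(c.nbytes)            # bytes used by the array's data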
Why Use NumPy?
❖ NumPy arrays are faster and more compact than Python lists.
❖ An array consumes less memory and is convenient to use.
❖ NumPy uses much less memory to store data and it provides
a mechanism of specifying the data types.
❖ This allows the code to be optimized even further.
❖ NumPy aims to provide an array object that is up to 50 times
faster than traditional Python lists.
❖ The array object in NumPy is called ndarray, it provides a lot
of supporting functions that make working with ndarray very
easy.
❖ Arrays are very frequently used in data science, where speed
and resources are very important.
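The point about specifying data types can be illustrated with a small sketch. The variable names are illustrative and not part of the original notes; the byte counts follow directly from the element width chosen through dtype.

```python
import numpy as np

# The same 1000 integers stored with two different element types.
wide = np.arange(1000, dtype=np.int64)    # 8 bytes per element
narrow = np.arange(1000, dtype=np.int16)  # 2 bytes per element

# itemsize is the size of one element; nbytes is the size of all the data.
print(wide.itemsize, wide.nbytes)
print(narrow.itemsize, narrow.nbytes)
```

Choosing a narrower dtype reduces memory use, provided every value fits in the chosen type.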
Which Language is NumPy written in?
NumPy is a Python library and is written partially in Python, but
most of the parts that require fast computation are written in C or
C++.
Installation of NumPy
If Python is already installed on a system, installing
NumPy is straightforward.
Install it using this command:
C:\Users\primya>pip install numpy
Import NumPy
Once NumPy is installed, import it in applications by adding the
import keyword:
import numpy
Ex:
import numpy
arr = numpy.array([1, 2, 3, 4, 5])
print(arr)
NumPy as np
NumPy is usually imported under the np alias.
In Python, an alias is an alternate name for referring to the same
thing.
Create an alias with the as keyword while importing:
import numpy as np
Now the NumPy package can be referred to as np instead of
numpy.
Ex:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Checking NumPy Version
The version string is stored in the __version__ attribute.
Ex:
import numpy as np
print(np.__version__)
Create a NumPy ndarray Object
NumPy is used to work with arrays. The array object in NumPy
is called ndarray.
Create a NumPy ndarray object by using the array() function.
Ex:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
Create an ndarray
To create an ndarray, pass a list, tuple, or any array-like object to
the array() function, and it will be converted into an ndarray.
Use a tuple to create a NumPy array:
import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)
Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays).
Nested array: Arrays that have arrays as their elements.
0-D Arrays
0-D arrays, or Scalars, are the elements in an array. Each value in
an array is a 0-D array.
Ex:
Create a 0-D array with value 42
import numpy as np
arr = np.array(42)
print(arr)
1-D Arrays
An array that has 0-D arrays as its elements is called a
uni-dimensional or 1-D array.
These are the most common and basic arrays.
Ex:
Create a 1-D array containing the values 1,2,3,4,5:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.
These are often used to represent matrices or 2nd-order tensors.
Ex:
Create a 2-D array containing two arrays with the values 1,2,3
and 4,5,6:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D arrays
An array that has 2-D arrays (matrices) as its elements is called a
3-D array.
These are often used to represent a 3rd-order tensor.
Ex:
Create a 3-D array with two 2-D arrays, both containing two
arrays with the values 1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Check Number of Dimensions
NumPy arrays provide the ndim attribute, which returns an integer
that tells how many dimensions the array has.
Ex:
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
Access Array Elements
❖ Array indexing is the same as accessing an array element.
❖ Access an array element by referring to its index number.
❖ The indexes in NumPy arrays start with 0, meaning that the first
element has index 0, and the second has index 1 etc.
Ex:
Get the first element from the following array:
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
Ex:
Get third and fourth elements from the following array and add
them.
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])
Access 2-D Arrays
❖ To access elements from 2-D arrays, use comma-separated
integers representing the dimension and the index of the
element.
❖ Think of 2-D arrays like a table with rows and columns, where
the dimension represents the row and the index represents the
column.
Ex:
Access the element on the first row, second column:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])
Ex:
Access the element on the 2nd row, 5th column:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('5th element on 2nd row: ', arr[1, 4])
ndarray
The NumPy ndarray class is used to represent both matrices and
vectors.
A vector is an array with a single dimension (there’s no difference
between row and column vectors), while a matrix refers to an array with
two dimensions.
What are the attributes of an array?
❖ An array is usually a fixed-size container of items of the same
type and size.
❖ The number of dimensions and items in an array is defined by its
shape.
❖ The shape of an array is a tuple of non-negative integers that
specify the sizes of each dimension.
❖ In NumPy, dimensions are called axes.
❖ For example, consider a 2-D array that looks like this:
[[0., 0., 0.],
[1., 1., 1.]]
The first axis has a length of 2 and the second axis has a length of 3.
Numpy array
To create a NumPy array, use the function np.array().
import numpy as np
a = np.array([1, 2, 3])
Create an array filled with 0’s:
import numpy as np
print(np.zeros(2))
Array filled with 1’s:
import numpy as np
print(np.ones(2))
Empty Array
The function np.empty() creates an array whose initial content is
arbitrary and depends on the state of the memory.
The reason to use empty over zeros (or something similar) is speed;
just make sure to fill every element afterwards!
import numpy as np
print(np.empty(2))
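A minimal sketch of the fill-afterwards pattern described above. The name buf and the values assigned to it are hypothetical, chosen only for illustration.

```python
import numpy as np

buf = np.empty(3)          # contents are arbitrary at this point
buf[:] = [1.5, 2.5, 3.5]   # assign every element before reading the array
print(buf)
```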
Full Array
Return a new array of the given shape, filled with fill_value.
np.full(shape, fill_value)
import numpy as np
x = np.full((2, 3), 50)
print(x)
import numpy as np
x = np.full((2, 2), [1, 2])   # the list [1, 2] is broadcast to every row
print(x)
Array with a range of elements:
import numpy as np
print(np.arange(5))
import numpy as np
print(np.arange(2, 9, 2))
Use np.linspace() to create an array with values that are spaced linearly
in a specified interval:
np.linspace(start, stop, num)
import numpy as np
print(np.linspace(0, 10, num=3))
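The difference between the two range-style functions can be sketched side by side. The variable names are illustrative; the key point is that np.arange() takes a step and excludes the stop value, while np.linspace() takes a count and includes the stop value by default.

```python
import numpy as np

stepped = np.arange(2, 9, 2)        # start, stop (excluded), step
spaced = np.linspace(0, 10, num=3)  # start, stop (included), number of points

print(stepped.tolist())  # [2, 4, 6, 8]
print(spaced.tolist())   # [0.0, 5.0, 10.0]
```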
Specifying data type
import numpy as np
x = np.ones(2, dtype=np.float64)
print(x)
Sorting elements
Sorting the elements of an array is simple with np.sort().
import numpy as np
arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])
print(np.sort(arr))
np.concatenate()
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
x = np.concatenate((a, b))
print(x)
import numpy as np
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])
x = np.concatenate((x, y), axis=0)
print(x)
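Joining along axis=0 appends rows. As a complementary sketch, joining along axis=1 appends columns instead; the arrays must then agree in their number of rows. The names z and wide are hypothetical.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
z = np.array([[5], [6]])                # one extra column, shape (2, 1)
wide = np.concatenate((x, z), axis=1)   # result has shape (2, 3)

print(wide.tolist())  # [[1, 2, 5], [3, 4, 6]]
```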
ndarray.ndim gives the number of axes, or dimensions, of the
array.
ndarray.size gives the total number of elements of the array. This
is the product of the elements of the array’s shape.
ndarray.shape is a tuple of integers that indicate the
number of elements stored along each dimension of the array. For
example, a 2-D array with 2 rows and 3 columns has the shape
(2, 3).
import numpy as np
x = np.array([[[0, 1, 2, 3], [4, 5, 6, 7]],
              [[0, 1, 2, 3], [4, 5, 6, 7]],
              [[0, 1, 2, 3], [4, 5, 6, 7]]])
print(x.ndim)
print(x.size)
print(x.shape)
reshape() will give a new shape to an array without changing the
data.
Ex:
import numpy as np
a = np.arange(6)
print(a)
b = a.reshape(3, 2)
print(b)
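Because reshape() keeps the data unchanged, the new shape must hold exactly the same number of elements. The sketch below, with illustrative names, shows NumPy inferring one dimension from -1 and rejecting a shape that does not fit.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, -1)   # -1 asks NumPy to infer the missing dimension
print(b.shape)         # (2, 3)

try:
    a.reshape(4, 2)    # 4 * 2 = 8 elements, but a has only 6
except ValueError as err:
    print("cannot reshape:", err)
```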
Indexing and slicing
import numpy as np
d = np.array([1, 3, 5, 7, 9, 34, 67, 44, 99])
print(d[1])
print(d[0:2])
print(d[1:])
print(d[:1])
print(d[-2:])
import numpy as np
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(a[a<=5])
print(a[a>=5])
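The condition inside the brackets is itself an array of True/False values (a boolean mask). A short sketch of how such a mask selects elements follows; the names mask and evens_over_five are hypothetical.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

mask = a <= 5          # boolean array with the same shape as a
print(mask)
print(a[mask])         # 1-D array of the elements where mask is True

# Conditions are combined element-wise with & (and) and | (or).
evens_over_five = a[(a > 5) & (a % 2 == 0)]
print(evens_over_five)
```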
import numpy as np
a = np.array([10, 20, 30])
print(a[::])
import numpy as np
a = np.array([[10, 20, 30], [40, 50, 60]])
print(a[0, 0:2])
print(a[1, 0:2])
import numpy as np
a = np.array([[[10, 20, 30], [40, 50, 60]],
              [[11, 12, 13], [14, 15, 16]]])
print(a[0, 1, 0:2])
print(a[1, 1, 0:2])
Aggregation
sum()
import numpy as np
a = np.sum([1, 3, 4])
print(a)
average()
import numpy as np
a = np.average([1, 3, 4])
print(a)
min()
import numpy as np
a = np.min([1, 3, 4])
print(a)
max()
import numpy as np
a = np.max([1, 3, 4])
print(a)
prod()
import numpy as np
a = np.prod([1, 3, 4])
print(a)
mean()
The NumPy mean function returns the mean (average) of a given
array, optionally along a given axis.
The mean is the sum of all the items in the array divided by the
total number of elements.
Ex:
import numpy as np
x = np.array([3, 4, 5])
a = np.mean(x)
print(a)
mean()
import numpy as np
x = np.array([[3, 4, 5], [5, 6, 7]])
a = x.mean(axis=1)   # mean of each row
b = x.mean(axis=0)   # mean of each column
print(a)
print(b)
The NumPy sum function accepts an optional argument
called axis.
It selects the axis along which the sum is
calculated.
For example,
axis = 0 returns the sum of each column in a NumPy array.
axis = 1 returns the sum of each row in an array.
Ex:
import numpy as np
a1 = np.array([[10, 20, 30], [2, 4, 6]])
x = a1.sum(axis=1)
y = a1.sum(0)
print(x)
print(y)
maximum()
import numpy as np
a = np.array([10, 12, 4, 3, 7])
b = np.array([5, 4, 4, 2, 8])
print(np.maximum(a, b))   # element-wise maximum of the two arrays
randint()
import numpy as np
x = np.random.randint(1, 10, size=(2, 2))
print(x)
y = np.random.randint(1, 10, size=(2, 2))
print(y)
print('\n-----Maximum Array----')
print(np.maximum(x, y))
Pandas
❖ Pandas (Python Data Analysis Library) is an open-source Python
library used for working with data sets.
❖ It has functions for analyzing, cleaning, exploring, and
manipulating data.
❖ This library is built on top of the NumPy library.
❖ Pandas is fast and it has high performance & productivity for
users.
Checking Pandas Version
import pandas as pd
print(pd.__version__)
Advantages
❖ Fast and efficient for manipulating and analyzing data.
❖ Data from different file objects can be loaded.
❖ Easy handling of missing data (represented as NaN) in floating
point as well as non-floating point data
❖ Data set merging and joining.
❖ Flexible reshaping and pivoting of data sets
❖ Provides time-series functionality.
Installation
If Python is already installed on a system, installing pandas
is straightforward.
Install it using this command:
C:\Users\primya>pip install pandas
After pandas has been installed, import
the library. This module is generally imported as:
import pandas as pd
Here pd is an alias for pandas. It is not
necessary to import the library using the alias; it just reduces
the amount of code written every time a method or property is called.
Two types
Pandas provides two main data structures for manipulating data.
They are:
1. Series
2. DataFrame
Series
❖ A Pandas Series is like a column in a table.
❖ Pandas Series is a one-dimensional labeled array capable of
holding data of any type (integer, string, float, python objects,
etc.).
❖ The axis labels are collectively called the index. A Pandas Series
is comparable to a single column in an Excel sheet.
❖ Labels need not be unique but must be a hashable type.
❖ The object supports both integer and label-based indexing and
provides a host of methods for performing operations involving
the index.
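The statements about labels can be sketched with a small Series. The values and labels are hypothetical; the label "x" is repeated on purpose to show that labels need not be unique.

```python
import pandas as pd

# Hypothetical data; the label "x" appears twice on purpose.
s = pd.Series([10, 20, 30], index=["x", "y", "x"])

print(s.index.tolist())   # the axis labels
print(s.loc["y"])         # a label that occurs once gives a single value
print(s.loc["x"])         # a repeated label gives a Series of all matches
```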
Pandas-Series
Creating an empty Series:
❖ Series() function of Pandas is used to create a series.
❖ A basic series, which can be created is an Empty Series.
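❖ The data type of an empty Series defaults to float64 in pandas 1.x and to object from pandas 2.0 onward, so it is safer to pass dtype explicitly.
Ex:
import pandas as pd
ser = pd.Series(dtype="float64")
print(ser)
O/P: Series([], dtype: float64)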
Creating a series from array:
To create a series from a NumPy array, import the
numpy module and use the array() function.
Ex:
import numpy as np
import pandas as pd
data = np.array(['C', 'S', 'E', '-', 'A'])
ser = pd.Series(data)
print(ser)
Creating a series from array with an index
To create a series with an explicit index instead of the
default one, pass a list to the index parameter
with the same number of elements as the array.
Ex:
import numpy as np
import pandas as pd
data = np.array(['C', 'S', 'E', '-', 'A'])
ser = pd.Series(data, index=[10, 11, 12, 13, 14])
print(ser)
Creating a series from Lists:
To create a series from a list, first create the list and
then pass it to Series().
Ex:
import pandas as pd
data = ['C', 'S', 'E', '-', 'A']   # avoid naming a variable "list"; it shadows the built-in
ser = pd.Series(data)
print(ser)
Creating a series from Dictionary:
To create a series from a dictionary, first create the
dictionary and then pass it to Series().
The dictionary keys are used to construct the index of the Series.
Ex:
import pandas as pd
data = {'CSE': 10, 'A': 20, 'KPR': 30}   # avoid naming a variable "dict"; it shadows the built-in
ser = pd.Series(data)
print(ser)
Creating a series from Scalar value:
To create a series from a scalar value, an index must be
provided. The scalar value will be repeated to match the length of
the index.
Ex:
import pandas as pd
ser = pd.Series(10, index=[0, 1, 2, 3, 4, 5])
print(ser)
Creating a series using NumPy functions:
To create a series using NumPy functions, use
functions such as np.linspace() or np.random.randn().
Ex:
import numpy as np
import pandas as pd
ser1 = pd.Series(np.linspace(3, 33, 3))
print(ser1)
ser2 = pd.Series(np.linspace(1, 100, 10))
print(ser2)
Accessing element of Series
There are two ways to access an element of a series:
❖ Accessing an element by position
❖ Accessing an element using its label (index)
Accessing Element from Series with Position
❖ To access a series element by position, refer to its index number.
❖ Use the index operator [ ] to access an element in a series.
❖ The index must be an integer.
❖ To access multiple elements from a series, use the slice
operation.
❖ The slice operation is performed on a Series using the
colon (:).
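A minimal sketch of positional access follows. It uses the .iloc indexer, which always selects by position regardless of what the labels are; the Series contents reuse the letters from the earlier examples.

```python
import pandas as pd

ser = pd.Series(["C", "S", "E", "-", "A"])

print(ser.iloc[0])     # first element
print(ser.iloc[:3])    # slice: positions 0, 1 and 2
print(ser.iloc[-2:])   # slice: the last two elements
```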
Accessing Element Using Label (index)
A series element can be read or set through its
index label.
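As a sketch of label-based access, the example below reads and then sets a value through the .loc indexer. The index labels 10 to 14 follow the earlier explicit-index example; the replacement value "_" is hypothetical.

```python
import pandas as pd

ser = pd.Series(["C", "S", "E", "-", "A"], index=[10, 11, 12, 13, 14])

print(ser.loc[12])    # read the value stored under label 12
ser.loc[13] = "_"     # set the value stored under label 13
print(ser)
```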
Accessing multiple elements
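Several elements can be selected in one step by passing a list of labels or a label slice. This is a sketch with the same illustrative Series as before; note that a label slice with .loc includes both end labels.

```python
import pandas as pd

ser = pd.Series(["C", "S", "E", "-", "A"], index=[10, 11, 12, 13, 14])

picked = ser.loc[[10, 12, 14]]   # a list of labels
ranged = ser.loc[11:13]          # a label slice, both ends included

print(picked)
print(ranged)
```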
Indexing and Selecting Data in Series
❖ Indexing in pandas means simply selecting particular data from
a Series.
❖ Indexing can also be known as Subset Selection.
Indexing a Series using indexing operator [] :
❖ Indexing operator is used to refer to the square brackets
following an object.
❖ The .loc and .iloc indexers also use the indexing operator to
make selections.
Sample data:
S.No  Name  Roll no  Class
1     A     11       cse
2     B     111      eee
3     C     23       ece
4     D     34       mech
5     E     55       AAA
6     F     77       FFF
7     G     88       ERD
8     H     9        GGG
9     I     92       YYY
10    J     44       UUU
Binary Operation on Series
Binary operations such as addition and subtraction can be performed
on series. To perform a binary operation on two series, use
methods such as .add() and .sub().
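A short sketch of .add() and .sub() on two Series whose labels only partly overlap. The data is hypothetical. Values are matched by label; fill_value is an optional parameter that substitutes a value where a label is missing from one side.

```python
import pandas as pd

a = pd.Series([5, 2, 3, 7], index=["a", "b", "d", "e"])
b = pd.Series([1, 6, 4, 9], index=["a", "b", "c", "e"])

total = a.add(b, fill_value=0)   # a missing label counts as 0
diff = a.sub(b)                  # a label present on one side only gives NaN

print(total)
print(diff)
```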
DataFrames
❖ Data sets in Pandas are usually multi-dimensional tables, called DataFrames.
❖ A Series is like a column; a DataFrame is the whole table.
❖ Pandas DataFrame consists of three principal components, the
data, rows, and columns.
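The three components can be seen by building a small DataFrame from a dictionary. This is a sketch; the rows are taken from the first three entries of the sample table in these notes, and the row labels 1 to 3 are an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "Roll no": [11, 111, 23],
        "Class": ["cse", "eee", "ece"],
    },
    index=[1, 2, 3],   # row labels
)

print(df.to_numpy())        # the data
print(df.index.tolist())    # the rows (row labels)
print(df.columns.tolist())  # the columns (column labels)
```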
Dealing with Rows and Columns
To select a column in a Pandas DataFrame, access
it by its column name.
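A minimal sketch of selecting columns and a row, using a hypothetical three-row DataFrame with the default integer row labels.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "Roll no": [11, 111, 23],
        "Class": ["cse", "eee", "ece"],
    }
)

names = df["Name"]               # one column name gives a Series
subset = df[["Name", "Class"]]   # a list of column names gives a DataFrame
first_row = df.loc[0]            # a row is selected by its label with .loc

print(names)
print(subset)
print(first_row)
```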
By default, data_frame.head() displays the first five rows
and data_frame.tail() displays the last five rows.
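A short sketch of head() and tail() on a ten-row DataFrame. The column values come from the sample table in these notes; passing a number to either method overrides the default of five rows.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "S.No": list(range(1, 11)),
        "Roll no": [11, 111, 23, 34, 55, 77, 88, 9, 92, 44],
    }
)

print(df.head())    # first five rows
print(df.tail(3))   # last three rows
```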
Sample data with a missing value (row 6 has no Class):
S.No  Name  Roll no  Class
1     A     11       cse
2     B     111      eee
3     C     23       ece
4     D     34       mech
5     E     55       AAA
6     F     77
7     G     88       ERD
8     H     9        GGG
9     I     92       YYY
10    J     44       UUU
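Row 6 of this table has no Class value. The sketch below shows one possible way to detect and handle such a gap; it rebuilds rows 5 to 7 by hand rather than reading a file, and the replacement text "unknown" is a hypothetical choice.

```python
import numpy as np
import pandas as pd

# Rows 5 to 7 of the table; the missing Class is stored as NaN.
df = pd.DataFrame(
    {
        "Name": ["E", "F", "G"],
        "Roll no": [55, 77, 88],
        "Class": ["AAA", np.nan, "ERD"],
    }
)

missing = df["Class"].isna()              # True where a value is missing
filled = df.fillna({"Class": "unknown"})  # replace the gap with a placeholder
dropped = df.dropna()                     # or drop rows that contain a gap

print(missing)
print(filled)
print(dropped)
```

Whether to fill or drop depends on the analysis; both calls return new DataFrames and leave df unchanged.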
Numpy Vs Pandas