
NumPy and Pandas: Data Handling Guide


Unit V: NumPy and Pandas

NumPy – Introduction – Computations using NumPy functions –
Computation on Arrays – Aggregation – Indexing and Sorting –
Pandas – Introduction and Basic Pandas Concepts – Data frames –
Data Handling
What is NumPy?

❖ NumPy is a Python library used for working with arrays.
❖ It also has functions for working in the domains of linear algebra,
Fourier transforms, and matrices.
❖ NumPy was created in 2005 by Travis Oliphant. It is an open-source
project and is free to use.
❖ NumPy stands for Numerical Python.
❖ The NumPy API is used extensively in Pandas, SciPy,
Matplotlib, scikit-learn, scikit-image and most other data
science and scientific Python packages.
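NumPy array vs Python list

import numpy as np
import sys
# A Python list holding 1000 integers
b = list(range(1000))
# Size of the list object itself (its pointers, not the int objects)
print(sys.getsizeof(b))
# A NumPy array holding the same 1000 integers
c = np.arange(1000)
# Bytes used by the array's data buffer
print(c.nbytes)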
Why Use NumPy?
❖ NumPy arrays are faster and more compact than Python lists.
❖ An array consumes less memory and is convenient to use.
❖ NumPy uses much less memory to store data and it provides
a mechanism of specifying the data types.
❖ This allows the code to be optimized even further.
❖ NumPy aims to provide an array object that is up to 50 times
faster than traditional Python lists.
❖ The array object in NumPy is called ndarray. It provides many
supporting functions that make working with ndarray easy.
❖ Arrays are very frequently used in data science, where speed
and resources are very important.
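
The effect of specifying a data type can be sketched as follows. This is a minimal illustration, not part of the original slides; the variable names small and large are chosen here for the example.

```python
import numpy as np

values = list(range(100))

# The same 100 values stored with two different element types
small = np.array(values, dtype=np.int8)    # 1 byte per element
large = np.array(values, dtype=np.int64)   # 8 bytes per element

print(small.itemsize, small.nbytes)   # 1 100
print(large.itemsize, large.nbytes)   # 8 800

# Arithmetic is applied to the whole array without a Python loop
doubled = large * 2
print(doubled[:5])   # [0 2 4 6 8]
```

Choosing the smallest type that can hold the data reduces memory use, but values outside the type's range will not fit (int8 holds -128 to 127).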
Which Language is NumPy written in?

NumPy is a Python library and is written partially in Python, but
most of the parts that require fast computation are written in C or
C++.

Installation of NumPy

If Python is already installed on a system, installing NumPy is
straightforward.
Install it using this command:

C:\Users\primya>pip install numpy


Import NumPy
Once NumPy is installed, import it in applications by using the
import keyword:

import numpy

Ex:
import numpy
arr = numpy.array([1, 2, 3, 4, 5])
print(arr)
NumPy as np
NumPy is usually imported under the np alias.
In Python, an alias is an alternate name for referring to the same
thing.
Create an alias with the as keyword while importing:
import numpy as np
Now the NumPy package can be referred to as np instead of
numpy.

Ex:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Checking NumPy Version
The version string is stored in the __version__ attribute.

Ex:

import numpy as np
print(np.__version__)
Create a NumPy ndarray Object

NumPy is used to work with arrays. The array object in NumPy
is called ndarray.

Create a NumPy ndarray object by using the array() function.

Ex:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
Create an ndarray

To create an ndarray, pass a list, tuple or any array-like object into
the array() function, and it will be converted into an ndarray.

Use a tuple to create a NumPy array:

import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)
Dimensions in Arrays

A dimension in arrays is one level of array depth (nested arrays).
Nested array: an array that has arrays as its elements.
0-D Arrays
0-D arrays, or scalars, are the elements in an array. Each value in
an array is a 0-D array.
Ex:
Create a 0-D array with value 42:
import numpy as np
arr = np.array(42)
print(arr)
1-D Arrays

An array that has 0-D arrays as its elements is called a
uni-dimensional or 1-D array.

These are the most common and basic arrays.

Ex:
Create a 1-D array containing the values 1,2,3,4,5:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
2-D Arrays

An array that has 1-D arrays as its elements is called a 2-D array.

These are often used to represent matrices or 2nd-order tensors.

Ex:
Create a 2-D array containing two arrays with the values 1,2,3
and 4,5,6:

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D arrays

An array that has 2-D arrays (matrices) as its elements is called a
3-D array.
These are often used to represent a 3rd-order tensor.
Ex:
Create a 3-D array with two 2-D arrays, both containing two
arrays with the values 1,2,3 and 4,5,6:

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Check Number of Dimensions

NumPy arrays provide the ndim attribute, which returns an integer
that tells how many dimensions the array has.
Ex:
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
Access Array Elements

❖ Array indexing is the same as accessing an array element.
❖ Access an array element by referring to its index number.
❖ The indexes in NumPy arrays start with 0, meaning that the first
element has index 0, the second has index 1, and so on.

Ex:
Get the first element from the following array:
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
Ex:

Get the third and fourth elements from the following array and add
them.

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])
Access 2-D Arrays

❖ To access elements from 2-D arrays, use comma-separated
integers representing the dimension and the index of the
element.
❖ Think of 2-D arrays like a table with rows and columns, where
the dimension represents the row and the index represents the
column.
Ex:
Access the element on the first row, second column:

import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])
Ex:

Access the element on the 2nd row, 5th column:

import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('5th element on 2nd row: ', arr[1, 4])
ndarray

The NumPy ndarray class is used to represent both matrices and
vectors.

A vector is an array with a single dimension (there’s no difference
between row and column vectors), while a matrix refers to an array with
two dimensions.
What are the attributes of an array?

❖ An array is usually a fixed-size container of items of the same
type and size.
❖ The number of dimensions and items in an array is defined by its
shape.
❖ The shape of an array is a tuple of non-negative integers that
specify the size of each dimension.
❖ In NumPy, dimensions are called axes.
❖ For example, take a 2-D array that looks like this:
[[0., 0., 0.],
[1., 1., 1.]]
The first axis has a length of 2 and the second axis has a length of 3.
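
The attributes described above can be checked directly on that 2-D array. A minimal sketch, using only standard ndarray attributes:

```python
import numpy as np

# The 2-D array from the example above
a = np.array([[0., 0., 0.],
              [1., 1., 1.]])

print(a.ndim)    # 2 -> the array has two axes
print(a.shape)   # (2, 3) -> axis 0 has length 2, axis 1 has length 3
print(a.size)    # 6 -> total number of elements (2 * 3)
print(a.dtype)   # float64 -> every element has the same type
```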
NumPy array

To create a NumPy array, use the function np.array().

import numpy as np
a = np.array([1, 2, 3])

Create an array filled with 0’s:

import numpy as np
print(np.zeros(2))
Array filled with 1’s:

import numpy as np
print(np.ones(2))

Empty Array
The function empty() creates an array whose initial content is
arbitrary and depends on the state of the memory.
The reason to use empty() over zeros() (or something similar) is speed;
just make sure to fill every element afterwards!
import numpy as np
print(np.empty(2))
Full Array
Return a new array of the given shape, filled with fill_value.
np.full(shape, fill_value)

import numpy as np
x = np.full((2, 3), 50)
print(x)

import numpy as np
x = np.full((2, 2), [1, 2])
print(x)
Array with a range of elements:

import numpy as np
print(np.arange(5))

import numpy as np
print(np.arange(2, 9, 2))

Use np.linspace() to create an array with values that are spaced linearly
in a specified interval:
np.linspace(start, stop, num)
import numpy as np
print(np.linspace(0, 10, num=3))
Specifying data type

import numpy as np
x = np.ones(2, dtype=np.float64)
print(x)

Sorting elements
Sorting the elements of an array is simple with np.sort().
import numpy as np
arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])
print(np.sort(arr))
np.concatenate()
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
x = np.concatenate((a, b))
print(x)

import numpy as np
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])
x = np.concatenate((x, y), axis=0)
print(x)
ndarray.ndim gives the number of axes, or dimensions, of the array.
ndarray.size gives the total number of elements of the array. This
is the product of the elements of the array’s shape.
ndarray.shape gives a tuple of integers that indicate the number of
elements stored along each dimension of the array. For example, a
2-D array with 2 rows and 3 columns has the shape (2, 3).

import numpy as np
x = np.array([[[0, 1, 2, 3], [4, 5, 6, 7]],
              [[0, 1, 2, 3], [4, 5, 6, 7]],
              [[0, 1, 2, 3], [4, 5, 6, 7]]])
print(x.ndim)
print(x.size)
print(x.shape)
reshape() gives a new shape to an array without changing its data.
Ex:
import numpy as np
a = np.arange(6)
print(a)
b = a.reshape(3, 2)
print(b)
Indexing and slicing

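import numpy as np
d = np.array([1, 3, 5, 7, 9, 34, 67, 44, 99])
print(d[1])     # element at index 1
print(d[0:2])   # elements at indexes 0 and 1
print(d[1:])    # everything from index 1 onwards
print(d[:1])    # everything before index 1
print(d[-2:])   # the last two elements

import numpy as np
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(a[a <= 5])   # boolean mask: values less than or equal to 5
print(a[a >= 5])   # boolean mask: values greater than or equal to 5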
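import numpy as np
a = np.array([10, 20, 30])
print(a[::])   # a full slice returns every element

import numpy as np
a = np.array([[10, 20, 30], [40, 50, 60]])
print(a[0, 0:2])   # first two elements of row 0
print(a[1, 0:2])   # first two elements of row 1

import numpy as np
a = np.array([[[10, 20, 30], [40, 50, 60]],
              [[11, 12, 13], [14, 15, 16]]])
print(a[0, 1, 0:2])   # block 0, row 1, first two elements
print(a[1, 1, 0:2])   # block 1, row 1, first two elements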
Aggregation

sum()
import numpy as np
a = np.sum([1, 3, 4])
print(a)
average()
import numpy as np
a = np.average([1, 3, 4])
print(a)
min()
import numpy as np
a = np.min([1, 3, 4])
print(a)
max()
import numpy as np
a = np.max([1, 3, 4])
print(a)

prod()
import numpy as np
a = np.prod([1, 3, 4])
print(a)
mean()
The NumPy mean() function returns the mean (average) of a given
array, optionally along a given axis.
The mean is the sum of all the items in the array divided by the
number of elements.

Ex:
import numpy as np
x = np.array([3, 4, 5])
a = np.mean(x)
print(a)
mean() along an axis

import numpy as np
x = np.array([[3, 4, 5], [5, 6, 7]])
a = x.mean(axis=1)   # mean of each row
b = x.mean(axis=0)   # mean of each column
print(a)
print(b)
sum() along an axis
The NumPy sum() function accepts an optional argument called axis,
which selects the axis along which the sum is calculated.
For example,
axis = 0 returns the sum of each column in a NumPy array.
axis = 1 returns the sum of each row in an array.
Ex:
import numpy as np
a1 = np.array([[10, 20, 30], [2, 4, 6]])
x = a1.sum(axis=1)
y = a1.sum(0)
print(x)
print(y)
maximum()
import numpy as np
a = np.array([10, 12, 4, 3, 7])
b = np.array([5, 4, 4, 2, 8])
print(np.maximum(a, b))   # element-wise maximum of the two arrays
randint()
import numpy as np
x = np.random.randint(1, 10, size=(2, 2))
print(x)
y = np.random.randint(1, 10, size=(2, 2))
print(y)
print('\n-----Maximum Array----')
print(np.maximum(x, y))
Pandas

❖ Pandas is an open-source Python library used for working with
data sets (Python Data Analysis Library).
❖ It has functions for analyzing, cleaning, exploring, and
manipulating data.
❖ This library is built on top of the NumPy library.
❖ Pandas is fast and offers high performance and productivity for
users.

Checking Pandas Version


import pandas as pd
print(pd.__version__)
Advantages

❖ Fast and efficient for manipulating and analyzing data.
❖ Data from different file objects can be loaded.
❖ Easy handling of missing data (represented as NaN) in floating-point
as well as non-floating-point data.
❖ Data set merging and joining.
❖ Flexible reshaping and pivoting of data sets.
❖ Provides time-series functionality.


Installation

If Python is already installed on a system, installing pandas is
straightforward.
Install it using this command:
C:\Users\primya>pip install pandas

After pandas has been installed, import the library. This module is
generally imported as:
import pandas as pd

Here pd is an alias for pandas. It is not necessary to import the
library using the alias; it just reduces the amount of code written
every time a method or property is called.
Two types
Pandas provides two main data structures for manipulating data:
1. Series
2. DataFrame
Series
❖ A Pandas Series is like a column in a table.
❖ Pandas Series is a one-dimensional labeled array capable of
holding data of any type (integer, string, float, python objects,
etc.).
❖ The axis labels are collectively called the index. A Pandas Series
is comparable to a single column in an Excel sheet.
❖ Labels need not be unique but must be a hashable type.
❖ The object supports both integer and label-based indexing and
provides a host of methods for performing operations involving
the index.
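
The points above about labels and the two kinds of indexing can be sketched as follows. The data and labels are made up for illustration; .iloc and .loc are the standard pandas accessors for position-based and label-based access.

```python
import pandas as pd

# A Series with string labels; the label 'a' appears twice on purpose
s = pd.Series([10, 20, 30], index=['a', 'b', 'a'])

print(s.index.tolist())      # ['a', 'b', 'a']
print(s.iloc[1])             # 20 -> integer (position-based) access
print(s.loc['b'])            # 20 -> label-based access
print(s.loc['a'].tolist())   # [10, 30] -> a repeated label returns every match
```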
Pandas-Series

Creating an empty Series:

❖ The Series() constructor of Pandas is used to create a series.
❖ The most basic series that can be created is an empty Series.
❖ The default data type of an empty Series depends on the pandas
version (float64 in older releases, object in pandas 2.x), so pass
dtype explicitly.

Ex:
import pandas as p
ser = p.Series(dtype='float64')
print(ser)
O/P: Series([], dtype: float64)
Creating a series from an array:

To create a series from a NumPy array, import the numpy module and
use its array() function.

Ex:
import numpy as n
import pandas as p
data = n.array(['C', 'S', 'E', '-', 'A'])
ser = p.Series(data)
print(ser)
Creating a series from an array with an index:

To create a series with an explicit index instead of the default
one, pass a list to the index parameter with the same number of
elements as the array.
Ex:
import numpy as n
import pandas as p
data = n.array(['C', 'S', 'E', '-', 'A'])
ser = p.Series(data, index=[10, 11, 12, 13, 14])
print(ser)
Creating a series from a list:

To create a series from a list, first create the list and then pass
it to Series().
Ex:
import pandas as p
# Named letters rather than list, so the built-in list is not shadowed
letters = ['C', 'S', 'E', '-', 'A']
ser = p.Series(letters)
print(ser)
Creating a series from a dictionary:

To create a series from a dictionary, first create the dictionary
and then pass it to Series().
The dictionary keys are used to construct the index of the Series.
Ex:
import pandas as p
# Named data rather than dict, so the built-in dict is not shadowed
data = {'CSE': 10, 'A': 20, 'KPR': 30}
ser = p.Series(data)
print(ser)
Creating a series from a scalar value:

To create a series from a scalar value, an index must be
provided. The scalar value will be repeated to match the length of
the index.

Ex:
import pandas as p
ser = p.Series(10, index=[0, 1, 2, 3, 4, 5])
print(ser)
Creating a series using NumPy functions:
To create a series using NumPy functions, use functions such as
numpy.linspace() and numpy.random.randn().

Ex:
import numpy as n
import pandas as p
ser1 = p.Series(n.linspace(3, 33, 3))
print(ser1)
ser2 = p.Series(n.linspace(1, 100, 10))
print(ser2)
Accessing element of Series

There are two ways to access an element of a series:

❖ Accessing an element by position
❖ Accessing an element using its label (index)

Accessing Element from Series with Position

❖ To access a series element by position, refer to its index number.
❖ Use the index operator [ ] to access an element in a series.
❖ The index must be an integer.
❖ To access multiple elements from a series, use the slice operation.
❖ The slice operation is performed on a Series with the colon (:).
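
Access by position can be sketched as follows. The sample values are illustrative; .iloc is used for single positions because recent pandas versions discourage relying on plain [ ] with integers for positional access.

```python
import pandas as pd

ser = pd.Series(['C', 'S', 'E', '-', 'A'])

first = ser.iloc[0]        # single element by position
first_three = ser[:3]      # slice with the [] operator: positions 0, 1, 2
last_two = ser.iloc[-2:]   # negative positions count from the end

print(first)                  # C
print(first_three.tolist())   # ['C', 'S', 'E']
print(last_two.tolist())      # ['-', 'A']
```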
Accessing Element Using Label (index)
To access an element of a series by label, refer to its index label.
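
Access by label can be sketched as follows, assuming a series built with an explicit integer index as in the earlier example. With such an index, [ ] looks up labels, and .loc does the same explicitly.

```python
import pandas as pd

ser = pd.Series(['C', 'S', 'E', '-', 'A'], index=[10, 11, 12, 13, 14])

one = ser[12]                # label 12 (not position 12)
same = ser.loc[12]           # the explicit label-based form
several = ser.loc[[10, 14]]  # a list of labels selects multiple elements

print(one, same)             # E E
print(several.tolist())      # ['C', 'A']
```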
Accessing multiple elements
Indexing and Selecting Data in Series

❖ Indexing in pandas simply means selecting particular data from
a Series.
❖ Indexing is also known as subset selection.

Indexing a Series using the indexing operator []:

❖ The indexing operator refers to the square brackets
following an object.
❖ The .loc and .iloc indexers also use the indexing operator to
make selections.
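
A minimal sketch of the two indexers, using a hypothetical series of names labelled by roll number:

```python
import pandas as pd

# Hypothetical data: names labelled by roll number
names = pd.Series(['A', 'C', 'D', 'E'], index=[11, 23, 34, 55])

by_label = names.loc[23]           # label 23
by_position = names.iloc[1]        # second element
label_slice = names.loc[11:34]     # label slices include both endpoints
position_slice = names.iloc[0:2]   # position slices exclude the stop

print(by_label, by_position)       # C C
print(label_slice.tolist())        # ['A', 'C', 'D']
print(position_slice.tolist())     # ['A', 'C']
```

The difference in slice endpoints is the usual source of confusion: .loc includes the stop label, while .iloc follows normal Python slicing.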
[Link]

S.No  Name  Roll no  Class
1     A     11       cse
2     B     111      eee
3     C     23       ece
4     D     34       mech
5     E     55       AAA
6     F     77       FFF
7     G     88       ERD
8     H     9        GGG
9     I     92       YYY
10    J     44       UUU
Binary Operation on Series
Binary operations such as addition and subtraction can be performed
on series. To perform a binary operation on series, use methods such
as .add() and .sub().
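
A minimal sketch of .add() and .sub() with made-up data. The two series deliberately have different labels to show how values are aligned on the index.

```python
import pandas as pd

a = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

# Values are aligned on index labels; a label missing on one side gives NaN
total = a.add(b)
# fill_value stands in for a label that is missing on one side only
total_filled = a.add(b, fill_value=0)
diff = a.sub(b, fill_value=0)

print(total)
print(total_filled)
print(diff)
```

Here 'c' exists only in a and 'e' only in b, so both are NaN in total but keep their single value in total_filled.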
DataFrames

❖ Data sets in Pandas are usually multi-dimensional tables, called
DataFrames.
❖ A Series is like a column; a DataFrame is the whole table.
❖ A Pandas DataFrame consists of three principal components: the
data, rows, and columns.
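
The three components can be inspected on a small DataFrame. The records below are hypothetical, loosely modelled on the student table in this unit.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['A', 'B', 'C'],
        'Roll no': [11, 111, 23],
        'Class': ['cse', 'eee', 'ece'],
    },
    index=[1, 2, 3],
)

print(df.columns.tolist())   # ['Name', 'Roll no', 'Class'] -> column labels
print(df.index.tolist())     # [1, 2, 3] -> row labels
print(df.shape)              # (3, 3) -> 3 rows, 3 columns
print(df.to_numpy())         # the data as a 2-D NumPy array
```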
Dealing with Rows and Columns

To select a column in a Pandas DataFrame, access it by its column
name.
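
Column and row selection can be sketched as follows, again with hypothetical student records:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Roll no': [11, 111, 23],
    'Class': ['cse', 'eee', 'ece'],
})

names = df['Name']                # one column name -> Series
subset = df[['Name', 'Class']]    # list of column names -> DataFrame
row = df.loc[1]                   # one row by index label -> Series

print(names.tolist())             # ['A', 'B', 'C']
print(subset.columns.tolist())    # ['Name', 'Class']
print(row['Name'], row['Class'])  # B eee
```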
By default, data_frame.head() displays the first five rows
and data_frame.tail() displays last five rows.
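
A short sketch of head() and tail() on a ten-row DataFrame. The roll numbers are sample values, and data_frame matches the name used above.

```python
import pandas as pd

data_frame = pd.DataFrame({'Roll no': [11, 111, 23, 34, 55, 77, 88, 9, 92, 44]})

first_five = data_frame.head()     # default: first 5 rows
last_five = data_frame.tail()      # default: last 5 rows
first_three = data_frame.head(3)   # pass n to change the number of rows

print(first_five['Roll no'].tolist())   # [11, 111, 23, 34, 55]
print(last_five['Roll no'].tolist())    # [77, 88, 9, 92, 44]
print(len(first_three))                 # 3
```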
[Link]

S.No  Name  Roll no  Class
1     A     11       cse
2     B     111      eee
3     C     23       ece
4     D     34       mech
5     E     55       AAA
6     F     77
7     G     88       ERD
8     H     9        GGG
9     I     92       YYY
10    J     44       UUU
Numpy Vs Pandas
