
Python Pandas for Data Analysis Guide


Uploaded by

dhruvarora050209

Python Pandas

Pandas (or Python Pandas) is Python's library for data analysis. The name is
derived from "panel data", an econometrics term for multidimensional,
structured data sets. Pandas has become a popular choice for data analysis.
Data Analysis: It refers to the process of evaluating large data sets using
analytical and statistical tools in order to discover useful information and
draw conclusions that support business decision making.
Pandas provides various tools for data analysis and makes the process simpler
and easier than many other available tools. The main author of Pandas is
Wes McKinney.
Using Pandas:
Pandas is an open-source library, released under the BSD (Berkeley Software
Distribution) license, built for the Python programming language. Pandas offers
high-performance, easy-to-use data structures and data analysis tools.
In order to work with pandas in Python, you need to import the pandas library
into your Python environment:
import pandas as pd
Why Pandas?
Pandas is very popular and is capable of performing the following tasks:
• It can read and write many different file formats (CSV, Excel, JSON etc.)
and handle many data types (integer, float, string etc.).
• It can calculate across the data in whichever way it is organized (i.e.
across rows and down columns).
• It can easily select subsets of data from bulky data sets and combine
multiple datasets together.
• It can find and fill missing data.
• It allows you to apply operations to independent groups within the data.
• It supports reshaping of data into different forms.
• It supports advanced time-series functionality. (Time-series forecasting
is the use of a model to predict future values based on previously
observed values.)
• It supports visualization by integrating with libraries such as matplotlib
and seaborn.
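Two of the tasks listed above (finding and filling missing data, and applying an operation to independent groups) can be sketched as follows. This is only a small illustration: the column names `Country` and `Age` and the values are made up for the example.

```python
import pandas as pd

# Illustrative data (assumed for this sketch): one Age value is missing.
df = pd.DataFrame({
    "Country": ["India", "India", "USA", "NZ"],
    "Age": [34, None, 33, 60],
})

# Find missing data: count the missing entries in the Age column.
missing_count = int(df["Age"].isna().sum())

# Fill missing data: replace the gap with the mean of the known ages.
filled = df["Age"].fillna(df["Age"].mean())

# Apply an operation to independent groups: mean age per country.
mean_by_country = filled.groupby(df["Country"]).mean()

print(missing_count)
print(mean_by_country)
```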
Pandas Data Structure
Data Structure: A data structure is a collection of data values and the
operations that can be applied to that data. It also refers to a specialized
way of storing data so that a specific type of functionality can be applied to
it.
Basic Data Structures: Python Pandas has two basic data structures, Series
and DataFrame.
Series:
Index  Data
0      A
1      B
2      C
3      D

DataFrame object:
   A      B   C
0  Amit   34  India
1  Sumit  49  India
2  Mark   33  USA
3  Sam    60  NZ
4  Paul   23  China
A Series is a one-dimensional data structure that can hold values of any data
type (int, float, list, string). A Series is called homogeneous because all of
its values share a single dtype; values of mixed types are stored together as
the object type. The values in a Series can be changed, so it is called
value-mutable, but the size of a Series object is treated as fixed
(size-immutable).
A DataFrame is a two-dimensional data structure that can have heterogeneous
(different types of) data elements. Both the values and the size of a
DataFrame are mutable: we can add or drop elements from a DataFrame object.

Installing Pandas
To install Pandas from the command line, type:

pip install pandas

Important: Pandas can be installed only when Python is already installed on
that system.

Series Data Structure


Series is an important data structure of pandas. It represents a
one-dimensional array of indexed data. A Series type object has two main
components:
1) An array of actual data
2) An associated array of indexes or data labels. The index is used to access
individual data values.
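The two components can be inspected directly on a Series object. The sketch below assumes an illustrative Series named `marks`; the name, labels and values are made up for the example.

```python
import pandas as pd

# Illustrative Series (assumed data): four values labelled A to D.
marks = pd.Series([33, 32, 44, 55], index=["A", "B", "C", "D"])

data_part = marks.values.tolist()    # component 1: the array of actual data
index_part = marks.index.tolist()    # component 2: the associated index labels

print(data_part)
print(index_part)
print(marks["C"])    # an index label is used to access an individual value
```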
Index  Data      Index  Data      Index      Data
0      A         Amit   33        Monday     Meeting
1      B         Arpit  32        Tuesday    Shopping
2      C         Mark   44        Wednesday  Sports Activity
3      D         Woo    55        Thursday   Study
Examples of Series type objects
Creating Series Objects:
For creating a Series type object, we have to import the pandas and numpy
(Numerical Python) modules with the help of import statements.
import pandas as pd    # here, pd is an alias name for pandas
import numpy as np     # here, np is an alias name for numpy
i) Creating an empty Series:
To create an empty Series, we can use the Series() function of the pandas
library.
Syntax:
<series object> = pd.Series()
Eg.
import pandas as pd
myseries = pd.Series()
print(myseries)
Output
Series([], dtype: float64)
(In older pandas versions the default dtype of an empty Series is float64;
from pandas 2.0 onward it is object.)
ii) Creating a non-empty Series object:
To create a non-empty Series object, you need to specify arguments for data
and indexes as per the following syntax:
<series object> = pd.Series(data, index=idx)
Where,
• idx is the index; its labels can be of any NumPy datatype
• data is the data part of the Series object, i.e. one of:
o A Python sequence (list, tuple, string) or range()
o An ndarray (NumPy array)
o A Python dictionary
o A scalar value
o Creating Non-Empty Series using range()
Syntax: <Series Name> = pandas.Series(range(n))
It will return an object of Series type.
Eg.1
import pandas
myseries = pandas.Series(range(5))
print(myseries)
Output:
0    0
1    1
2    2
3    3
4    4
dtype: int64

o Creating Non-Empty Series using Python Sequence (List)


Syntax: <Series Name> = pd.Series(<list>)
Eg.
a)
import pandas as pd
s = pd.Series([10, 20, 30, 40, 50])
print("Series Object:")
print(s)
Output:
Series Object:
0    10
1    20
2    30
3    40
4    50
dtype: int64
b)
import pandas as pd
s = pd.Series(['a', 'b', 'c', 'd', 'e'])
print("Series Object:")
print(s)
Output:
Series Object:
0    a
1    b
2    c
3    d
4    e
dtype: object
c)
import pandas as pd
s = pd.Series(['rat', 'bat', 'cat'])
print("Series Object:")
print(s)
Output:
Series Object:
0    rat
1    bat
2    cat
dtype: object
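The remaining kinds of data listed earlier (an ndarray, a dictionary and a scalar value) follow the same pattern. The sketch below is a minimal illustration; the variable names and values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# From a NumPy ndarray, with explicit index labels.
s_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=["x", "y", "z"])

# From a dictionary: the keys become the index labels.
s_dict = pd.Series({"Amit": 33, "Arpit": 32, "Mark": 44})

# From a scalar: the value is repeated once for every index label.
s_scalar = pd.Series(10, index=["a", "b", "c"])

print(s_array)
print(s_dict)
print(s_scalar)
```

With a scalar value the index argument decides the length of the Series, so it should be supplied.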

DataFrame Data Structure

A DataFrame is a two-dimensional labelled data structure, like a table in Excel
or any other spreadsheet. It contains rows and columns just like a
two-dimensional array. Both rows and columns have index numbers. A DataFrame
has the following characteristics:
1) It has two indexes: a row index (axis=0) and a column index (axis=1).
2) Each value of a DataFrame can be accessed with the combination of row
index and column index. The row index is known as the index and the column
index is known as the column name.
3) The indexes can be numbers, letters or strings.
4) There is no condition of having all data of the same type across columns;
its columns can have data of different types.
5) We can easily change its values. It is value-mutable.
6) You can add or delete rows/columns in a DataFrame. In other words, it is
size-mutable.

Creating and Displaying a DataFrame

A DataFrame object can be created by passing data in a two-dimensional format.

As before, make sure to import the pandas and numpy modules before you do
anything with the pandas module.

import pandas as pd
import numpy as np

To create a DataFrame we can use the following syntax:

<dataframe> = pd.DataFrame(<a 2D data structure>, [columns=<column sequence>], [index=<index sequence>])

1) Creating an empty DataFrame

Syntax: <dataframe> = pd.DataFrame()
Example:
import pandas as pd
import numpy as np
df = pd.DataFrame()
print(df)
Output:
Empty DataFrame
Columns: []
Index: []

2) Creating a dataframe object using a 2D dictionary

A two-dimensional dictionary is a dictionary having items as (key: value) pairs
where the value part is a data structure of any type: another dictionary, an
ndarray, a Series object, a list etc. The value parts of all the keys should
have a similar structure and equal lengths.

a) Creating a DataFrame from a 2D dictionary having values as lists/ndarrays:

We can create a DataFrame from a dictionary where each value of the dictionary
is either a list or an ndarray.

Example:
import pandas as pd
import numpy as np
dict1 = {'Name': ['Amit', 'Sumit', 'Arpit'],
         'Marks': [79, 65, 89],
         'sport': ['Cricket', 'Badminton', 'Tennis']}
df = pd.DataFrame(dict1)
print(df)
Output:
    Name  Marks      sport
0   Amit     79    Cricket
1  Sumit     65  Badminton
2  Arpit     89     Tennis
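The other value types mentioned for a 2D dictionary (Series objects and nested dictionaries) can be used in the same way. The following is a minimal sketch with assumed variable names, reusing the sample names and marks from the example above.

```python
import pandas as pd

# Values are Series objects: their shared index labels become the row labels.
marks = pd.Series([79, 65, 89], index=["Amit", "Sumit", "Arpit"])
sport = pd.Series(["Cricket", "Badminton", "Tennis"],
                  index=["Amit", "Sumit", "Arpit"])
df_series = pd.DataFrame({"Marks": marks, "sport": sport})

# Values are dictionaries: outer keys become columns, inner keys become rows.
df_nested = pd.DataFrame({
    "Marks": {"Amit": 79, "Sumit": 65},
    "sport": {"Amit": "Cricket", "Sumit": "Badminton"},
})

print(df_series)
print(df_nested)
```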

Attributes of DataFrame

Data Frame: df

RollNo Name Marks


A 110 Sandeep 97.5
B 111 Mukul 98.5
C 112 Rajkumar 99.5
D 113 Vipul 96.5

index : it gives the index (row labels) of the DataFrame.

print(df.index)
Index(['A', 'B', 'C', 'D'], dtype='object')

columns : it gives the column labels of the DataFrame.

print(df.columns)
Index(['RollNo', 'Name', 'Marks'], dtype='object')

dtypes : it returns the data type of each column in the DataFrame.

print(df.dtypes)
RollNo      int64
Name       object
Marks     float64
dtype: object

shape : it returns a tuple representing the dimensions of the DataFrame
(number of rows, number of columns).

print(df.shape)
(4, 3)

size : it returns the number of elements present in the DataFrame object.

print(df.size)
12

values : it returns the NumPy representation of the DataFrame.

print(df.values)
[[110 'Sandeep' 97.5]
 [111 'Mukul' 98.5]
 [112 'Rajkumar' 99.5]
 [113 'Vipul' 96.5]]

Selecting or Accessing Data

We can extract desired columns or rows from a dataframe.

Data Frame: df

   RollNo  Name      Marks
A  110     Sandeep   97.5
B  111     Mukul     98.5
C  112     Rajkumar  99.5
D  113     Vipul     96.5

1. Selecting / Accessing a Column
We can select / access a column as follows:

Syntax:
<DataFrame>[<Column Name>]
Or
<DataFrame>.<Column Name>

Example:
print(df['RollNo'])
or
print(df.RollNo)

2. Selecting Multiple columns

To select multiple columns we can specify the list of columns in square
brackets.
print(df[['RollNo','Name']])

RollNo Name
A 110 Sandeep
B 111 Mukul
C 112 Rajkumar
D 113 Vipul

3. Selecting / Accessing a subset from a DataFrame using Row / Column Names:
Syntax:

<DataFrame>.loc[<StartRow>:<EndRow>, <StartColumn>:<EndColumn>]

Example:
import pandas as pd
st1={'RollNo':110,'Name':'Sandeep','Marks':97.5}
st2={'RollNo':111,'Name':'Mukul','Marks':98.5}
st3={'RollNo':112,'Name':'Rajkumar','Marks':99.5}
st4={'RollNo':113,'Name':'Vipul','Marks':96.5}
students=[st1,st2,st3,st4]
df=pd.DataFrame(students, index=["A","B","C","D"])
print(df.loc['A':'C','RollNo':'Name'])

print(df.loc['A':'C':2,'RollNo':'Name'])

To extract specific rows we can use the iloc (integer location) indexer. Here
we use the numeric index / position of rows and columns as follows:

print(df.iloc[0:4:2, 0:2])
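One difference between the two is easy to miss: a label slice with loc includes its end label, while a position slice with iloc excludes its end position. The sketch below rebuilds the same student DataFrame to show this; the names `by_label` and `by_position` are assumptions made for the example.

```python
import pandas as pd

students = [
    {"RollNo": 110, "Name": "Sandeep", "Marks": 97.5},
    {"RollNo": 111, "Name": "Mukul", "Marks": 98.5},
    {"RollNo": 112, "Name": "Rajkumar", "Marks": 99.5},
    {"RollNo": 113, "Name": "Vipul", "Marks": 96.5},
]
df = pd.DataFrame(students, index=["A", "B", "C", "D"])

# loc slices by label: row 'C' and column 'Name' are included.
by_label = df.loc["A":"C", "RollNo":"Name"]

# iloc slices by position: row 2 and column 2 are excluded.
by_position = df.iloc[0:2, 0:2]

print(by_label)
print(by_position)
```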

To Access Individual Values

We can extract an individual value of a DataFrame as follows:

1) <DataFrame>.<ColumnName>[<Row Index / Row numeric Index>]

print(df.Name['A'])         Output: Sandeep

2) Using the 'at' attribute of a DataFrame

<DataFrame>.at[<row label>, <col label>]

Example:

print(df.at['B','Name'])    Output: Mukul

3) Using the 'iat' attribute of a DataFrame

<DataFrame>.iat[<row index no.>, <col index no.>]

Example: print(df.iat[0,2])    Output: 97.5

Loading Data from CSV to DataFrames

The pandas library offers two functions, read_csv() and to_csv(), to read
data from CSV files and to write data to CSV files.

Reading from a CSV File

To read data from a CSV file we can use the read_csv() function as per the
following syntax:

<df>=pandas.read_csv(<FilePath>)

CSV File= [Link]

Example:
import pandas as pd
df=pd.read_csv("d:\[Link]")
print(df.loc[0:1,'RollNo.':'Name'])

Here, the first row of the CSV file is treated as the column names of the
DataFrame df.

To Skip Rows while reading CSV File

<df>=pandas.read_csv(<Path>,names=[<Column Names>],skiprows=[<n>])

Examples

import pandas as pd
df=pd.read_csv("d:\[Link]",names=["RNo","Name","Marks"], skiprows=[0])
print(df)

Specifying Own Columns Names:

Sometimes a CSV file doesn't have a column header; in that case the first row
of data would be treated as the column names of the DataFrame.

To avoid this situation, we can specify our own column names while reading the
data from the CSV file.

Example:

CSV File= [Link]


(Without Column Header)

import pandas as pd
df=pd.read_csv("d:\[Link]",names=["RNo","Name","Marks"])
print(df)

If you do not want to use column headings, you can use the following
statement; pandas then numbers the columns 0, 1, 2 and so on.

import pandas as pd
df=pd.read_csv("d:\[Link]",header=None)
print(df)
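The read_csv() options above can be tried without a file on disk by reading from an in-memory text buffer. The sketch below is only an illustration: the CSV text is assumed sample content, and `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

# Assumed sample content, standing in for CSV files on disk.
with_header = "RollNo,Name,Marks\n110,Sandeep,97.5\n111,Mukul,98.5\n"
without_header = "110,Sandeep,97.5\n111,Mukul,98.5\n"

# Default: the first row supplies the column names.
df1 = pd.read_csv(io.StringIO(with_header))

# Replace an existing header: skip row 0 and supply our own names.
df2 = pd.read_csv(io.StringIO(with_header),
                  names=["RNo", "Name", "Marks"], skiprows=[0])

# No header in the file: supply names, or let pandas number the columns.
df3 = pd.read_csv(io.StringIO(without_header), names=["RNo", "Name", "Marks"])
df4 = pd.read_csv(io.StringIO(without_header), header=None)

print(df1, df2, df3, df4, sep="\n\n")
```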

Storing DataFrame Data to CSV Files

We can use the to_csv() function to create a CSV file from a DataFrame. The
syntax is as follows:

Syntax:
<dataframe>.to_csv(<file path>)
Or
<dataframe>.to_csv(<file path>,sep=<Separator_Character>)

Example:

import pandas as pd
st1={'RollNo':110,'Name':'Sandeep','Marks':97.5}
st2={'RollNo':111,'Name':'Mukul','Marks':98.5}
st3={'RollNo':112,'Name':'Rajkumar','Marks':99.5}
st4={'RollNo':113,'Name':'Vipul','Marks':96.5}
students=[st1,st2,st3,st4]
df=pd.DataFrame(students, index=["A","B","C","D"])
print(df)
df.to_csv("d:\[Link]",sep=',')
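To see what to_csv() produces without writing a file, it can be called with no file path, in which case it returns the CSV text as a string. The sketch below uses a two-row version of the student data; `index=False`, which leaves out the row labels, is an additional to_csv() parameter not covered above.

```python
import pandas as pd

df = pd.DataFrame(
    {"RollNo": [110, 111], "Name": ["Sandeep", "Mukul"], "Marks": [97.5, 98.5]},
    index=["A", "B"],
)

# With no file path, to_csv() returns the CSV text instead of writing a file.
default_text = df.to_csv()

# sep changes the separator character; index=False omits the row labels.
piped_text = df.to_csv(sep="|", index=False)

print(default_text)
print(piped_text)
```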

Adding / Modifying Rows’/ Columns values in DataFrame

We can assign or modify the data in a DataFrame by specifying the row name or
column name along with the DataFrame's name.

Adding/Modifying a column

• We can modify a column if it already exists.
• We can add a column if it does not exist.

Syntax:

<df>.<column>=<new value>     (only for a column that already exists)
Or
<df>[<column>]=<new value>    (modifies an existing column or adds a new one)

Modifying Value in a Column


import pandas as pd
df=pd.read_csv("d:\[Link]",names=["RNo","Name","Marks"])
print("Before Adding:\n ",df)
[Link]="100"
print("After Adding:\n ",df)
Modifying specific value in a Column
import pandas as pd
df=pd.read_csv("d:\[Link]",names=["RNo","Name","Marks"])
print("Before Adding:\n ",df)
df.loc[0,'Name']="Ashu"
print("After Adding:\n ",df)

import pandas as pd
df=pd.read_csv("d:\[Link]",names=["RNo","Name","Marks"])
print("Before Adding:\n ",df)
df.loc[0:1,'Name']="Ashu"
print("After Adding:\n ",df)

import pandas as pd
df=pd.read_csv("d:\[Link]",names=["RNo","Name","Marks"])
print("Before Adding:\n ",df)
df.at[0,'Name']="Ashu"
print("After Adding:\n ",df)

Adding a Column
import pandas as pd
df=pd.read_csv("d:\[Link]",names=["RNo","Name","Marks"])
print("Before Adding:\n ",df)
df['Phone']="100"
print("After Adding:\n ",df)

Adding a Column with different values


import pandas as pd
df=pd.read_csv("d:\[Link]",names=["RNo","Name","Marks"])
print("Before Adding:\n ",df)
df['Phone']=[100,200]
print("After Adding:\n ",df)
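The column operations above can also be tried on a small DataFrame built in memory, with no CSV file needed. This is a minimal sketch with assumed sample rows.

```python
import pandas as pd

# Assumed sample rows standing in for the CSV data.
df = pd.DataFrame({"RNo": [110, 111],
                   "Name": ["Sandeep", "Mukul"],
                   "Marks": [97.5, 98.5]})

df["Marks"] = 100            # existing column: every row gets the new value
df["Phone"] = [100, 200]     # new column: one value per row
df.loc[0, "Name"] = "Ashu"   # a single cell in an existing column

print(df)
```

When a list is assigned to a column, its length has to match the number of rows in the DataFrame.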
Adding / Modifying a Row

We can change or add rows in a DataFrame using the at or loc attributes as
follows:

<df>.at[<row name>, : ]=<new value>
Or
<df>.loc[<row name>, : ]=<new value>

In recent pandas versions, at is intended for a single row label / column
label pair, so loc is the safer choice when assigning a whole row.

Important: if there is no row with such a row label, pandas adds a new row
with this row label and assigns the given values to all its columns:

Example:
import pandas as pd
dict1={'Name':['Amit', 'Sumit', 'Arpit'],
       'Marks':[79, 65,89],
       'sport':['Cricket', 'Badminton', 'Tennis']}
df=pd.DataFrame(dict1, index=['I','II', 'III'])
print(df)
Adding Rows: if the row index is not available, a new row is added.

(i)
import pandas as pd
dict1={'Name':['Amit', 'Sumit', 'Arpit'],
       'Marks':[79, 65,89],
       'sport':['Cricket', 'Badminton', 'Tennis']}
df=pd.DataFrame(dict1, index=['I','II', 'III'])
print("Before:\n",df)
df.loc['IV']= "ABC"
print("After:\n",df)
(ii)
import pandas as pd
dict1={'Name':['Amit', 'Sumit', 'Arpit'],
       'Marks':[79, 65,89],
       'sport':['Cricket', 'Badminton', 'Tennis']}
df=pd.DataFrame(dict1, index=['I','II', 'III'])
print("Before:\n",df)
df.loc['IV']= ["Kumar",99,"Ludo"]
print("After:\n",df)

If the number of values in the sequence (here ["Kumar",99,"Ludo"]) differs
from the number of columns, a ValueError is raised.
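The length rule can be checked with a short experiment. The sketch below assumes a two-row version of the sample data and a hypothetical variable `mismatch_error` that records the exception.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Amit", "Sumit"], "Marks": [79, 65],
     "sport": ["Cricket", "Badminton"]},
    index=["I", "II"],
)

# Three values for three columns: the row is added.
df.loc["III"] = ["Kumar", 99, "Ludo"]

# Two values for three columns: pandas raises a ValueError.
try:
    df.loc["IV"] = ["Kumar", 99]
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(df)
print("Error:", mismatch_error)
```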

Modifying Existing Row:


(i)
import pandas as pd
dict1={'Name':['Amit', 'Sumit', 'Arpit'], 'Marks':[79, 65,89],
       'sport':['Cricket', 'Badminton', 'Tennis']}
df=pd.DataFrame(dict1, index=['I','II', 'III'])

print("Before:\n",df)
df.loc['III':]="Ludo"
print("After:\n",df)

Modifying Single Cell:


We can use the following syntax to modify a particular cell in a DataFrame.
<DataFrame>.<ColumnName>[<row name/label>]= <new value>
import pandas as pd
dict1={'Name':['Amit', 'Sumit', 'Arpit'],
       'Marks':[79, 65,89],
       'sport':['Cricket', 'Badminton', 'Tennis']}
df=pd.DataFrame(dict1, index=['I','II', 'III'])
print("Before:\n",df)
df.Name['III']="Sandeep"
print("After:\n",df)
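The column-then-row form above is a chained assignment, which may trigger a warning or leave the DataFrame unchanged in newer pandas versions that use Copy-on-Write. A single-step alternative with at (labels) or iat (positions) can be sketched as follows, using an assumed two-column version of the sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Amit", "Sumit", "Arpit"], "Marks": [79, 65, 89]},
    index=["I", "II", "III"],
)

df.at["III", "Name"] = "Sandeep"   # label-based access to one cell
df.iat[0, 1] = 80                  # position-based: row 0, column 1 (Marks)

print(df)
```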
Deleting / Renaming Columns/ Rows

Python Pandas provides two ways to delete rows and columns: the del statement
and the drop() function. Pandas also provides the rename() function to rename
rows and columns.

Deleting Column

To delete a column you can use the del statement as follows:

Syntax:
del <df object>[<ColumnName>]
Example:

import pandas as pd
dict1={'Name':['Amit', 'Sumit', 'Arpit'],
       'Marks':[79, 65,89],
       'sport':['Cricket', 'Badminton', 'Tennis']}
df=pd.DataFrame(dict1, index=['I','II', 'III'])
print("Before:\n",df)
del df['Marks']
print("After:\n",df)

Deleting Multiple columns

import pandas as pd
dict1={'Name':['Amit', 'Sumit', 'Arpit'],
       'Marks':[79, 65,89],
       'sport':['Cricket', 'Badminton', 'Tennis']}
df=pd.DataFrame(dict1, index=['I','II', 'III'])
print("Before:\n",df)
df=df.drop(['Name','Marks'],axis=1)
print("After:\n",df)
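The same drop() function removes rows when axis=0 is used, and rename() changes row or column labels. The sketch below is a minimal illustration with assumed labels ("First", "Score") chosen for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Amit", "Sumit", "Arpit"], "Marks": [79, 65, 89]},
    index=["I", "II", "III"],
)

# drop() returns a new DataFrame; axis=0 (the default) removes rows by label.
without_row = df.drop(["II"], axis=0)

# rename() maps old labels to new ones for rows (index) and columns.
renamed = df.rename(index={"I": "First"}, columns={"Marks": "Score"})

print(without_row)
print(renamed)
```

Both calls leave df itself unchanged; assign the result back (as in df = df.drop(...)) to keep the change.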
