Python Programming: Pandas Practical 7

SVKM’s NMIMS University

Mukesh Patel School of Technology Management & Engineering

Course: Python Programming
PROGRAMME: [Link]/MBATech.
First Year AY 2022-2023 Semester: II

PRACTICAL 7
Part A (To be referred by students)

Built-in Package: Pandas (dataframes)

SAVE THE FILE AND UPLOAD AS (RollNo_Name_Exp7)

Problem Statement: Write a Python program to:

1. Create a data dictionary for the data below, then convert it into a DataFrame.
i) Print all the columns for students whose name begins with the letter A and whose
percentage is higher than 85, using the "index" attribute.
ii) Print the Age column using the "loc" function. Print the average age.
iii) Print the columns at index 0 and 2 using the "iloc" function.
iv) Update the Percentage column so that its values lie between 0 and 1.

Name       Age   Stream     Percentage
Rima       21    Math       58
Alok       19    Commerce   92
Anandita   20    Arts       85
Priyanka   18    Biology    30

2. Based on the dataset “[Link]”, answer the following:

What is the first country in df?
Which country has won the most gold medals in summer games?
Which country had the biggest difference between their summer and winter gold
medal counts?
Which country has the biggest difference between their summer and winter gold
medal counts relative to their total gold medal count? Only include countries that have
won at least 1 gold in both summer and winter.
Write a function that updates the DataFrame to include a new column called "Points",
a weighted value where each gold medal counts for 3 points, each silver medal for 2 points,
and each bronze medal for 1 point. The function should return only the column (a Series
object) that you created.

3. Create charts for the dataset below.

i) Scatter plot of two columns – name and num_children, num_children and num_pets
ii) Bar plot of column values – name and age, and any other two columns
of your choice.

iii) Line plot, multiple columns – name and num_children, num_pets

iv) Save plot to file
v) Bar plot with group by – Number of unique names per state
vi) Stacked bar plot with group by (hint: learn the concept of dummy variables)

vii) Stacked bar plot with group by, normalized to 100%

ix) Stacked bar plot, two-level group by - Stacked bar chart showing the
number of people per state, split into males and females

x) Stacked bar plot with two-level group by, normalized to 100% - Count
grouped by state and gender, with normalized columns so that each sums up to
100%
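For the Points question in problem 2, a minimal sketch of one possible function follows. It assumes the medal counts sit in columns named Gold, Silver and Bronze; the real column names in the medal dataset are not shown in this handout, so the small medal table and its names below are hypothetical.

```python
import pandas as pd

# Hypothetical medal table; the real dataset's column names may differ.
df = pd.DataFrame(
    {"Gold": [10, 0, 3], "Silver": [5, 2, 0], "Bronze": [1, 4, 2]},
    index=["CountryA", "CountryB", "CountryC"],
)

def add_points(frame: pd.DataFrame) -> pd.Series:
    """Add a weighted 'Points' column (gold=3, silver=2, bronze=1) and return only it."""
    frame["Points"] = frame["Gold"] * 3 + frame["Silver"] * 2 + frame["Bronze"] * 1
    return frame["Points"]

points = add_points(df)
print(points)
```

Multiplying whole columns by a scalar and adding them is vectorized, so no loop over rows is needed.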

Topic covered: Built-in Package: Pandas (data frames).

Learning Objective: Learners will be able to:

1. Create data frames using the pandas package.
2. Perform data analysis using the pandas package.
3. Handle large amounts of data to create data visualizations.

Theory:
Pandas is a Python library used for working with data sets. It has functions for analyzing,
cleaning, exploring, and manipulating data. Pandas uses the loc attribute to return one or
more specified rows.

# Create a DataFrame from a dictionary
import pandas as pd
mydataset = {'cars': ["BMW", "Volvo", "Ford"], 'passings': [3, 7, 2]}
myvar = pd.DataFrame(mydataset)
print(myvar)
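The Theory mentions loc, but the example above does not use it. A short sketch on the same cars data shows loc returning a single row, a list of rows, and rows filtered by a condition. When conditions are combined, each comparison needs its own parentheses and the operator is & rather than Python's and.

```python
import pandas as pd

mydataset = {"cars": ["BMW", "Volvo", "Ford"], "passings": [3, 7, 2]}
myvar = pd.DataFrame(mydataset)

# One label -> that row, returned as a Series
print(myvar.loc[0])

# A list of labels -> those rows, returned as a DataFrame
print(myvar.loc[[0, 2]])

# A boolean mask plus a column list: cars that passed more than 2 times
busy = myvar.loc[myvar["passings"] > 2, ["cars"]]
print(busy)

# Two conditions combined with &: name starts with "B" AND passings > 2
b_busy = myvar.loc[myvar["cars"].str.startswith("B") & (myvar["passings"] > 2)]
print(b_busy)
```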

A Pandas Series is a one-dimensional array holding data of any type. It is like a column in
a table.
# Create a Series with index values
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])
print(myvar)
print(myvar["y"])  # return the value at label "y"

# Create a Series from a dictionary

import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)

O/P -
day1    420
day2    380
day3    390
dtype: int64
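In the output above, the dictionary keys became the index labels. The sketch below, using the same calories data, shows access by label and by position, and confirms that a single DataFrame column is itself a Series (the small DataFrame here is made up for illustration).

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)

# Dictionary keys become the index labels
print(myvar.index.tolist())   # ['day1', 'day2', 'day3']
print(myvar["day2"])          # access by label: 380
print(myvar.iloc[0])          # access by position: 420

# A single DataFrame column is itself a Series
df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})
col = df["calories"]
print(type(col).__name__)     # Series
print(col.mean())             # average of the column
```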
# Create a DataFrame from a dictionary of two lists (each list becomes a column)
import pandas as pd
data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
myvar = pd.DataFrame(data)
print(myvar)
print(myvar.loc[0])  # refer to the row with index label 0
myvar1 = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(myvar1)
print(myvar1.loc["day1"])  # with named indexes, loc needs the label, not 0
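Once named indexes are set, loc and iloc diverge: loc needs the label, while iloc still takes integer positions. A small sketch on the same data, which also uses iloc to pick columns by position:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
myvar1 = pd.DataFrame(data, index=["day1", "day2", "day3"])

# loc selects by label, iloc by integer position; both pick the first row here
first_by_label = myvar1.loc["day1"]
first_by_position = myvar1.iloc[0]

# iloc also selects columns by position: all rows, columns 0 and 1
both_cols = myvar1.iloc[:, [0, 1]]

# Rows at positions 0 and 2, only the column at position 1 ('duration')
subset = myvar1.iloc[[0, 2], 1]
print(subset)
```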

If your data sets are stored in a file, Pandas can load them into a DataFrame.
import pandas as pd
df = pd.read_csv('[Link]')  # give the path if the file is not in the same directory
print(df)
print(pd.options.display.max_rows)  # check the maximum number of rows pandas displays
print(df.head(10))  # print the first 10 rows of the DataFrame
print(df.tail())  # print the last 5 rows of the DataFrame
df.info()  # print information about the data (info() prints directly and returns None)
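Because the CSV file itself is not included here, the sketch below stands in for it with a few made-up rows read from an in-memory string through io.StringIO. The column names are assumptions modeled on the ones used later in this practical (Duration, Maxpulse, Calories); one Calories cell is left empty on purpose so info() reports a missing value.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (made-up values, assumed column names)
csv_text = """Duration,Maxpulse,Calories
60,130,410
45,148,290
30,120,
60,135,405
50,140,350
"""
df = pd.read_csv(io.StringIO(csv_text))

print(pd.options.display.max_rows)  # rows shown before pandas truncates a printout
print(df.head(3))   # first 3 rows
print(df.tail(2))   # last 2 rows
df.info()           # prints dtypes and non-null counts per column
```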

Data Cleaning

Data cleaning means fixing bad data in your data set. Bad data could be: empty cells, data in
the wrong format, wrong data, or duplicates.
In this tutorial you will learn how to deal with all of them.

import pandas as pd
df = pd.read_csv('[Link]')
new_df = df.dropna()  # returns a new DataFrame without changing the original DataFrame
print(new_df)
# df.dropna(inplace=True)  # alternative: changes the original DataFrame itself

df = pd.read_csv('[Link]')  # reload so each option below starts from the raw data
df.fillna(130, inplace=True)  # replace NULL values with the number 130

df = pd.read_csv('[Link]')
df["Calories"] = df["Calories"].fillna(130)  # replace in the Calories column only

df = pd.read_csv('[Link]')
x = df["Calories"].mean()
df["Calories"] = df["Calories"].fillna(x)  # replace by the mean value
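The list of bad data above also names wrong formats and duplicates, which the example does not cover. A minimal sketch on a small made-up frame: pd.to_numeric with errors='coerce' turns unparseable text into NaN, drop_duplicates removes repeated rows, and the empty cells are then either dropped or filled with the column mean. The filled column is assigned back rather than using inplace=True on a single column, a pattern that recent pandas releases warn may stop working under Copy-on-Write.

```python
import pandas as pd

# Made-up data: numbers stored as text, one unparseable value, a duplicate row, an empty cell
df = pd.DataFrame({
    "Duration": [60, 45, 45, 30, 50],
    "Calories": ["409.1", "282.4", "282.4", "abc", None],
})

# Wrong format: convert text to numbers; unparseable entries become NaN
df["Calories"] = pd.to_numeric(df["Calories"], errors="coerce")

# Duplicates: keep only the first copy of each repeated row
df = df.drop_duplicates()

# Empty cells, option 1: drop rows containing NaN (returns a new DataFrame)
dropped = df.dropna()

# Empty cells, option 2: fill NaN with the column mean, assigning the result back
filled = df.copy()
filled["Calories"] = filled["Calories"].fillna(filled["Calories"].mean())
print(filled)
```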

Data Plotting

Pandas uses the plot() method to create diagrams. We can use Pyplot, a submodule of the
Matplotlib library, to display the diagram on the screen.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('[Link]')
df.plot()  # line plot of every numeric column
df.plot(kind='scatter', x='Duration', y='Calories')  # draws a scatter plot

# To Do -> Plot between Duration and Maxpulse

plt.show()
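Problems 3 iv) and x) ask for a plot saved to a file and a stacked bar chart normalized to 100%. One possible approach is sketched below on a few made-up people; the column names name, state and gender follow the task wording, but the real dataset is not shown in this handout. The Agg backend is chosen only so the script runs without a display (switch back to plt.show() for interactive use), and the output filename is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up people data; columns mirror the task wording (name, state, gender)
people = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John", "Sara", "Amit"],
    "state": ["MH", "MH", "KA", "KA", "KA", "GJ"],
    "gender": ["F", "M", "F", "M", "F", "M"],
})

# Two-level group by: count people per (state, gender), then spread gender into columns
counts = people.groupby(["state", "gender"]).size().unstack(fill_value=0)

# Normalize each row so every state's bar sums to 100%
percent = counts.div(counts.sum(axis=1), axis=0) * 100

ax = percent.plot(kind="bar", stacked=True)
ax.set_ylabel("Percent of people")
ax.figure.savefig("state_gender_percent.png")  # hypothetical output filename
plt.close(ax.figure)
```

Plotting counts instead of percent gives the unnormalized stacked chart asked for in ix).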
Histogram: a histogram shows the frequency of each interval, e.g. how many workouts
lasted between 50 and 60 minutes.

Use the kind argument to specify that you want a histogram: kind='hist'
df["Duration"].plot(kind='hist')
plt.show()
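A histogram's bars are counts per interval. To get the number behind a question such as how many workouts lasted between 50 and 60 minutes, pd.cut can bin the values explicitly. The durations below are made up; passing the same edges as bins= to plot(kind='hist') should draw these counts as bars for this data.

```python
import pandas as pd

# Made-up workout durations in minutes
df = pd.DataFrame({"Duration": [45, 60, 60, 55, 30, 50, 90, 58]})

# Edges 30, 40, ..., 100; right=False gives half-open bins [30, 40), [40, 50), ...
edges = list(range(30, 101, 10))
per_interval = pd.cut(df["Duration"], bins=edges, right=False).value_counts(sort=False)
print(per_interval)

# Direct answer for one interval: 50 <= Duration < 60
between_50_and_60 = ((df["Duration"] >= 50) & (df["Duration"] < 60)).sum()
print(between_50_and_60)
```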

PRACTICAL 7
Part B (to be completed by students)

Built-in Package: Pandas (dataframes)

(Students must submit the soft copy as per the following segments. A soft copy containing
the answered Part A and Part B must be uploaded on the platform specified by the Practical
Teacher. The filename should be RollNo_Name_Exp7.)

Roll No.: A113 Name: Aayush Singh

Prog/Yr/Sem: 2 Batch: 2
Date of Experiment: Date of Submission:

1. Program Code along with Sample Output: (Paste your programs [1,2,3,4,5], input and
output screen shot for programs [1,2,3,4,5])

2. Conclusion (Learning Outcomes): Reflect on the questions you answered and jot down
your learnings about the topic: Built-in Package: Pandas (data frames).

