Week 4

The document outlines the Week 4 lab assignments for an Introduction to Data Science course, focusing on data cleaning and pivot table creation using Python. Students are required to modify a CSV file, create pivot tables based on movie genres and ticket sales, and visualize the results in a bar graph. Each lab task includes specific instructions for data manipulation and submission requirements.


Data Science Lab: Week 4

BS202 / 2025 / Introduction to Data Science

Lecturer: Minsang Kim
(Chief TA; kimmsang96@[Link])

Notice
• For today's lab, the assignment is submitted as an ipynb file.
• After you have completed the assignment, submit the ipynb file via LMS under "Week 4 Lab Session/Lab4 submission".
• You may also modify the CSV file using external tools instead of Python code if you prefer.
• For each problem, run the "For submission confirmation" part to verify the results.

Lab4-1: Cleaning the data table (1 point)
• Problem
  • In the original CSV file ('cinema_hall_ticket_sales.csv'), the 'Number_of_Person' column contains the string value "Alone" in addition to numeric data.
  • As a result, this column is interpreted as an object type, which can lead to issues with numerical calculations and data analysis.
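The problem described above can be reproduced on a few toy rows. This is only a sketch: the rows and the Ticket_ID column are invented for illustration and are not taken from the real file, and the data is read from an in-memory string rather than from 'cinema_hall_ticket_sales.csv'.

```python
import io

import pandas as pd

# Toy rows for illustration only (assumption: not values from the real file).
raw = io.StringIO("Ticket_ID,Number_of_Person\nT1,2\nT2,Alone\nT3,4\n")
df = pd.read_csv(raw)

# A single non-numeric entry keeps pandas from reading the column as a number,
# so the dtype printed here is a text/object dtype rather than int64.
print(df["Number_of_Person"].dtype)

# pd.to_numeric with errors="coerce" turns unparseable entries into NaN,
# which makes it easy to list the rows that block numeric analysis.
as_numbers = pd.to_numeric(df["Number_of_Person"], errors="coerce")
bad_rows = df[as_numbers.isna()]
print(bad_rows)
```

Listing the offending rows first is a quick way to confirm that "Alone" is the only non-numeric value before replacing it.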


• TODO
  • Replace the "Alone" value in the Number_of_Person column with the number 1.
  • Save the modified data as the '[Link]' file to ensure the correct data structure.
  • Check your results.
  • Provide a brief explanation of the lab process and submit it.
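One possible way to carry out the Lab4-1 steps is sketched below. It is not the official solution. The toy rows and the Ticket_ID column are assumptions, and the cleaned table is written to an in-memory buffer instead of a named file, since the output file name is not given here.

```python
import io

import pandas as pd

# Toy rows for illustration only (assumption: not values from the real file).
raw = io.StringIO("Ticket_ID,Number_of_Person\nT1,2\nT2,Alone\nT3,4\n")
df = pd.read_csv(raw)

# Swap the text "Alone" for "1", then convert the whole column to numbers.
cleaned = df.copy()
cleaned["Number_of_Person"] = pd.to_numeric(
    cleaned["Number_of_Person"].replace("Alone", "1")
)

# Stand-in for saving the cleaned CSV: write to a buffer and read it back
# to confirm that the column is now parsed as an integer type.
out = io.StringIO()
cleaned.to_csv(out, index=False)
reloaded = pd.read_csv(io.StringIO(out.getvalue()))
print(reloaded.dtypes)
```

Passing index=False keeps pandas from adding an extra index column to the saved table, so the file keeps the same columns as the original.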
Lab4-2: Pivot table (1 point)
• Problem
  • Read the [Link] file that was created in Lab4-1 and then create [Link].
  • The [Link] file is structured as follows:
    • Genre: Movie genre
    • Count of Tickets: Number of tickets sold for that movie genre
    • Average Age: Average age of customers for that movie genre
    • Average_Number_of_People: Average number of people per ticket sale for that movie genre
• TODO
  • Read the "[Link]" file.
  • Group the data by Movie_Genre.
  • For each group, calculate the following:
    • Count of Tickets (int)
    • Average Age (float64)
    • Average_Number_of_People (float64)
  • Save the result as "[Link]".
  • After writing your code, execute this section. If the correct values appear in place of "FILL IN HERE", then you pass.
  • The order of the Genre rows can be changed. However, the order of the columns must be exactly:
    • Genre
    • Count of Tickets
    • Average Age
    • Average_Number_of_People
  • Provide a brief explanation of the lab process and submit it.
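The grouping step for Lab4-2 could look roughly like the sketch below, which uses pandas named aggregation. It is one possible approach, not the official solution. The toy rows are invented, and the source column name Age is an assumption about the cleaned file; Movie_Genre and Number_of_Person are the names used in the lab text.

```python
import pandas as pd

# Toy rows for illustration only (assumption: the real file has an "Age" column).
df = pd.DataFrame(
    {
        "Movie_Genre": ["Action", "Action", "Drama", "Comedy", "Drama", "Action"],
        "Age": [20, 30, 40, 25, 50, 40],
        "Number_of_Person": [1, 2, 3, 2, 1, 3],
    }
)

# Named aggregation: each output column is defined as (source column, function).
# The dict is unpacked with ** because the output names contain spaces.
pivot = (
    df.groupby("Movie_Genre")
    .agg(
        **{
            "Count of Tickets": ("Age", "size"),  # rows per genre
            "Average Age": ("Age", "mean"),
            "Average_Number_of_People": ("Number_of_Person", "mean"),
        }
    )
    .reset_index()
    .rename(columns={"Movie_Genre": "Genre"})
)
print(pivot)
```

Defining the output columns in the required order inside agg means no separate reordering step is needed before saving.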
Lab4-3: Pivot table - Genre split by seat types (1 point)
• Problem
  • Read the [Link] file created in Lab4-1 and then create [Link].
  • The [Link] file is structured as follows:
    • Genre: Movie genre
    • Standard, Premium, VIP: Seat types
• TODO
  • Read the "[Link]" file.
  • Create a pivot table between Movie_Genre and Seat_Type.
  • Sort the columns in the order: Standard, Premium, VIP.
  • Save the completed pivot table as the "[Link]" file.
  [Figure: aggregation of seat types by genre]
  • The order of the Genre rows can be changed.
  • The column labels must be exactly: Movie_Genre, Standard, Premium, VIP.
  • The Standard, Premium, and VIP columns should be of type integer.
  • Provide a brief explanation of the lab process and submit it.
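A genre-by-seat-type count table of the kind described in Lab4-3 can be sketched with pandas.crosstab. This is one possible approach, not the official solution. The toy rows are invented, and the exact column name Seat_Type is an assumption about the cleaned file.

```python
import pandas as pd

# Toy rows for illustration only (assumption: the column is named "Seat_Type").
df = pd.DataFrame(
    {
        "Movie_Genre": ["Action", "Action", "Drama", "Comedy", "Drama", "Action"],
        "Seat_Type": ["VIP", "Standard", "Premium", "Standard", "Standard", "VIP"],
    }
)

seat_order = ["Standard", "Premium", "VIP"]

table = (
    pd.crosstab(df["Movie_Genre"], df["Seat_Type"])  # counts per genre and seat type
    .reindex(columns=seat_order, fill_value=0)       # enforce the required column order
    .astype(int)                                     # keep the counts as integers
    .reset_index()                                   # make Movie_Genre a regular column
)
table.columns.name = None  # drop the leftover "Seat_Type" label on the column axis
print(table)
```

Using reindex with fill_value=0 also covers the case where a seat type never occurs in the data, so all three columns are always present.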
Lab4-4: Pivot table - Genre split by seat types (1 point)
• Problem
  • Open the "[Link]" file, sort the movie genres in descending order based on the "Count of Tickets", and display them as a bar graph.
• TODO
  1. The graph must be sorted in descending order.
  2. The graph must include all the essential elements (title, axis titles, and units).
  3. The most popular movie genre should be highlighted with an accent color.
  4. You are free to choose the colors.
  • Save the graph as "[Link]" and then run the code below.
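The Lab4-4 requirements could be met along the lines of the matplotlib sketch below. It is one possible approach, not the official solution. The toy counts, the chart texts, and the two colors are arbitrary choices, and the figure is written to an in-memory buffer rather than to a named image file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy counts for illustration only (assumption: not values from the real data).
pivot = pd.DataFrame(
    {"Genre": ["Drama", "Action", "Comedy"], "Count of Tickets": [2, 3, 1]}
)

# 1. Sort genres by ticket count, largest first.
ranked = pivot.sort_values("Count of Tickets", ascending=False).reset_index(drop=True)

# 3. Accent color for the top genre, a neutral color for the rest.
colors = ["tab:red"] + ["tab:gray"] * (len(ranked) - 1)

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(ranked["Genre"], ranked["Count of Tickets"], color=colors)

# 2. Title, axis titles, and units.
ax.set_title("Tickets sold by movie genre")
ax.set_xlabel("Movie genre")
ax.set_ylabel("Count of tickets (tickets)")
fig.tight_layout()

# Stand-in for saving the image file: write the PNG bytes to a buffer.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

Because the rows are sorted before plotting, the first bar is always the most popular genre, so the accent color only needs to be placed at the first position of the color list.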
