Data Science Lab: Week 4
BS202 / 2025 / Introduction to Data Science
Introduction to Data Science
Lecturer : Minsang Kim
(Chief TA; kimmsang96@[Link])
Notice
• For today's assignment, submit the ipynb file in place of the usual submission.
• After you have completed your assignment, please submit the ipynb file
via LMS under "Week 4 Lab Session/Lab4 submission".
• You may also modify the CSV file using external tools instead of Python
code, if you prefer.
Notice
• For each problem, run the "For submission confirmation" part to verify
the results.
Lab4-1: Cleaning the data table (1 point)
• Problem
• In the original CSV file ('cinema_hall_ticket_sales.csv'), the 'Number_of_Person'
column contains the string value "Alone" in addition to numeric data.
• As a result, this column is interpreted as an object type, which can lead to issues
with numerical calculations and data analysis.
Lab4-1: Cleaning the data table (1 point)
• TODO
• Replace the "Alone" value in the Number_of_Person column with the number 1.
• Save the modified data as the '[Link]' file to ensure the correct data structure.
Check your results.
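The cleaning step above can be sketched as follows. This is a minimal illustration, not the reference solution: the sample rows are hypothetical stand-ins for 'cinema_hall_ticket_sales.csv', and the result is written to an in-memory buffer because the output file name is not shown here.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real lab reads 'cinema_hall_ticket_sales.csv'.
SAMPLE_CSV = """Movie_Genre,Number_of_Person
Comedy,Alone
Drama,3
Action,2
"""


def clean_number_of_person(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where "Alone" becomes 1 and the column is int64."""
    cleaned = df.copy()
    as_text = cleaned["Number_of_Person"].replace("Alone", "1")
    cleaned["Number_of_Person"] = pd.to_numeric(as_text).astype("int64")
    return cleaned


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
cleaned = clean_number_of_person(raw)

# Write without the index so the saved file keeps the original column layout.
# In the lab, pass the required output file name instead of a buffer.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
saved_text = buffer.getvalue()

print(cleaned.dtypes)
```

Comparing `raw.dtypes` with `cleaned.dtypes` is a quick way to confirm that the column is now numeric.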
Lab4-1: Cleaning the data table (1 point)
• TODO
• Please provide a brief explanation of the lab process and submit it.
Lab4-2: Pivot table. (1 point)
• Problem
• Read the [Link] file that was created and then create [Link].
• The [Link] file is structured as follows:
• Genre: Movie genre
• Count of Tickets: Number of tickets sold for that movie genre
• Average Age: Average age of customers for that movie genre
• Average_Number_of_People: Average number of people per ticket sale for that movie genre
Lab4-2: Pivot table. (1 point)
• TODO
• Read the "[Link]" file.
• Group the data by Movie_Genre.
• For each group, calculate the following:
• Count of Tickets (int)
• Average Age (float64)
• Average_Number_of_People (float64)
• Save the result as "[Link]".
After writing your code, execute this section.
If the correct values appear in place of "FILL IN HERE," then you pass.
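The grouping steps for this problem can be sketched with pandas named aggregation. This is only an illustration under stated assumptions: the sample rows are hypothetical, and the `Age` column name is assumed because the slides do not name the age column explicitly.

```python
import pandas as pd

# Hypothetical sample rows standing in for the cleaned file from Lab4-1.
# The 'Age' column name is an assumption; adjust it to the real data.
tickets = pd.DataFrame(
    {
        "Movie_Genre": ["Comedy", "Comedy", "Drama", "Action"],
        "Age": [20, 30, 41, 18],
        "Number_of_Person": [1, 3, 2, 4],
    }
)

summary = (
    tickets.groupby("Movie_Genre")
    .agg(
        **{
            # "count" counts the non-missing Age values in each genre.
            "Count of Tickets": ("Age", "count"),
            "Average Age": ("Age", "mean"),
            "Average_Number_of_People": ("Number_of_Person", "mean"),
        }
    )
    .reset_index()
    .rename(columns={"Movie_Genre": "Genre"})
)

# Make the count an integer column and the averages float64, as required.
summary["Count of Tickets"] = summary["Count of Tickets"].astype(int)

print(summary)
```

The dictionary unpacking (`**{...}`) is used because two of the output column names contain spaces and cannot be written as keyword arguments.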
Lab4-2: Pivot table. (1 point)
• TODO
• The order of the Genre rows can be changed.
• However, the order of the columns must be exactly:
• Genre
• Count of Tickets
• Average Age
• Average_Number_of_People
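One way to guard against a wrong column order before saving is a small helper like the one below. The helper name `enforce_column_order` and the sample row are hypothetical; only the four column names come from the requirement above.

```python
import pandas as pd

EXPECTED_COLUMNS = [
    "Genre",
    "Count of Tickets",
    "Average Age",
    "Average_Number_of_People",
]


def enforce_column_order(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> pd.DataFrame:
    """Return df with exactly the expected columns, in the expected order."""
    missing = [name for name in expected if name not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df.loc[:, expected]


# Hypothetical result whose columns came out in a different order.
shuffled = pd.DataFrame(
    {
        "Average Age": [25.0],
        "Genre": ["Comedy"],
        "Average_Number_of_People": [2.0],
        "Count of Tickets": [2],
    }
)

ordered = enforce_column_order(shuffled)
print(list(ordered.columns))
```

Row order is left untouched, which matches the rule that the Genre rows may appear in any order.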
Lab4-2: Pivot table. (1 point)
• TODO
• Please provide a brief explanation of the lab process and submit it.
Lab4-3: Pivot table - Genre split by seat types (1 point)
• Problem
• Read the [Link] file created in Lab4-1 and then create [Link].
• The [Link] file is structured as follows:
• Genre: Movie genre
• Standard, Premium, VIP: Seat types
Lab4-3: Pivot table - Genre split by seat types (1 point)
• TODO
• Read the "[Link]" file.
• Create a pivot table of Movie_Genre (rows) against Seat_Type (columns).
• Sort the columns in the order: Standard, Premium, VIP.
• Save the completed pivot table as the "[Link]" file.
Aggregation of seat types by genre
Lab4-3: Pivot table - Genre split by seat types (1 point)
• TODO
• The order of the Genre rows can be changed.
• The column labels must be exactly: Movie_Genre, Standard, Premium, VIP.
• The Standard, Premium, and VIP columns should be of type integer.
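The requirements above can be sketched with `pd.crosstab`, which counts how often each genre and seat type occur together. The sample rows are hypothetical, and the `Seat_Type` column name is an assumption about the data file; this is one possible approach, not necessarily the intended one.

```python
import pandas as pd

# Hypothetical sample rows standing in for the cleaned file from Lab4-1.
tickets = pd.DataFrame(
    {
        "Movie_Genre": ["Comedy", "Comedy", "Drama", "Action", "Drama"],
        "Seat_Type": ["VIP", "Standard", "Premium", "Standard", "Premium"],
    }
)

SEAT_ORDER = ["Standard", "Premium", "VIP"]

# Count tickets for every (genre, seat type) pair.
counts = pd.crosstab(tickets["Movie_Genre"], tickets["Seat_Type"])

# Force the required column order; a seat type absent from the data becomes 0.
counts = counts.reindex(columns=SEAT_ORDER, fill_value=0).astype(int)

# Drop the "Seat_Type" label on the column axis, then turn the genre index
# into a regular 'Movie_Genre' column.
counts.columns.name = None
pivot = counts.reset_index()

print(pivot)
```

When saving, `pivot.to_csv(..., index=False)` keeps the header as Movie_Genre, Standard, Premium, VIP without an extra index column.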
Lab4-3: Pivot table - Genre split by seat types (1 point)
• TODO
• Please provide a brief explanation of the lab process and submit it.
Lab4-4: Pivot table - Genre split by seat types (1 point)
• Problem
• Open the "[Link]" file, sort the movie genres in descending order based on the
"Count of Tickets", and display them as a bar graph.
Lab4-4: Pivot table - Genre split by seat types (1 point)
• TODO
1. The graph must be sorted in descending order.
2. The graph must include all the essential elements (title, axis titles, and units).
3. The most popular movie genre should be highlighted with an accent color.
4. You are free to choose the colors.
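A bar graph that meets the four requirements above might look like the following matplotlib sketch. The genre counts are hypothetical, the colors and label wording are arbitrary choices, and the figure is saved to an in-memory buffer because the required file name is not shown here.

```python
import io

import matplotlib

matplotlib.use("Agg")  # Non-interactive backend, so the script runs headless.
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values standing in for the summary file from Lab4-2.
summary = pd.DataFrame(
    {
        "Genre": ["Action", "Comedy", "Drama", "Horror"],
        "Count of Tickets": [120, 310, 95, 200],
    }
)

# 1. Sort in descending order of ticket count.
ordered = summary.sort_values("Count of Tickets", ascending=False).reset_index(drop=True)

# 3. Accent color for the first (most popular) bar, neutral color for the rest.
colors = ["tab:red"] + ["lightgray"] * (len(ordered) - 1)

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(ordered["Genre"], ordered["Count of Tickets"], color=colors)

# 2. Title, axis titles, and units.
ax.set_title("Tickets sold by movie genre")
ax.set_xlabel("Movie genre")
ax.set_ylabel("Count of tickets (tickets)")
fig.tight_layout()

# In the lab, pass the required output file name to savefig instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Because the data is sorted before plotting, the highlighted bar is always the leftmost one.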
Lab4-4: Pivot table - Genre split by seat types (1 point)
• TODO
• Save the graph as "[Link]" and then run the code below.