APPLIED DATA SCIENCE
DS100-3 B8
Members
Alcantara, Klaire Ann D.
Arenas, Alyssa Sarah E.
Aspa, Llewellyn Ann N.
Cruz, Mary Clare Gianne P.
FINAL PROJECT
DISCOVERY
In today’s world, transportation takes up too much of our time. We seldom have time to eat breakfast before
going to school or work. Passenger congestion can also be observed across the different modes of
transportation. One of these is the rail transit system, specifically, for the sample program below, the
Manila Metro Rail Transit System (MRT). Descriptive analysis of the current data on passenger traffic in MRT
stations can be a starting point for improving efficiency and maximizing the utilization of the trains to avoid
further passenger congestion. Predictive analysis can help the management identify the busiest times of the day
so that it can give them more attention, for example by increasing the number of trains, to avoid further delays.
This project thus aims to look into the data through code and help the management observe its facilities more
closely and cater to the needs of its customers. Through the program, and with the help of graphs and similar
outputs, the management can easily monitor at which times and on which days trains should be sent more
frequently and in which time slots fewer trains can run. Of course, reducing trains in the quiet time slots should
only be considered when adding trains to the busy time slots affects the service in the slots with few passengers.
First and foremost, those who would benefit from this are the passengers. With an improved train schedule,
passenger congestion can be minimized, which saves passengers time and, in turn, increases their efficiency as
workers or students. If the system can accommodate all of its passengers, the number of frequent customers will
increase, and so will income. The money collected can then be used to improve the facilities of the rail
transit system or to increase the pay of its employees. Aside from this benefit, the management would also have a
much easier way of monitoring its services and hypothesizing the next steps for improving them. The
gathered information is important because it is what the current rail transit systems lack, and this lack causes the
usual problems such as passenger congestion and overuse of trains, which in turn causes the trains to overheat or
malfunction completely.
The data could be gathered through the access cards, or beep cards, that passengers use to ride the train.
Each tap enters the system of the MRT (and similar systems), which automatically records the time and date of that
passenger’s use of the train. This method can be called a frequency count, as used in quantitative research. By
recording all of the data, the number of passengers can be counted per time, date, and station, and these counts
become the data used to monitor the system and run the program. The structure of the data includes primitive data
types, specifically integers for the number of passengers and similar counts, and strings for the months.
Collectively, the data can also be viewed as a homogeneous linear structure: an array of records ordered by
month, day, time, and station.
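The frequency count described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a station name paired with a tap timestamp), the names `taps` and `count_taps`, and the sample values are hypothetical and do not come from the MRT system or from the project's own code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical tap records: (station, ISO timestamp of the card tap).
# These values are made up for illustration; they are not MRT data.
taps = [
    ("Taft", "2019-01-21T07:05:00"),
    ("Taft", "2019-01-21T07:40:00"),
    ("Taft", "2019-01-21T18:15:00"),
    ("Ayala", "2019-01-21T07:20:00"),
]


def count_taps(records):
    """Count taps per (station, date, hour) bucket."""
    counts = Counter()
    for station, stamp in records:
        moment = datetime.fromisoformat(stamp)
        counts[(station, moment.date().isoformat(), moment.hour)] += 1
    return counts


hourly_counts = count_taps(taps)
for key, total in sorted(hourly_counts.items()):
    print(key, total)
```

Each key combines a string (station, date) with an integer (hour), and each value is an integer count, which matches the string and integer types mentioned in the text.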
DATA PREPARATION
Raw Data
The raw data contains the hourly number of passengers who enter and exit each MRT station from
January 1, 2019 to August 31, 2019.
2019_mrt_hourly_daily_ridership.csv
Because the full data set is large, a sample was drawn from it. The sample contains the
data from the Taft Station on January 21, 2019.
2019_mrt_hourly_sample_ridership.csv
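One way to draw such a sample is to keep only the rows for a single station and date. The sketch below is not the group's original code. The column names (`date`, `hour`, `station`, `entry`, `exit`), the function name `select_sample`, and every number in it are assumptions made for illustration, since the real header of the CSV file is not shown in this report.

```python
import csv
import io

# Made-up rows standing in for the full file; the column names
# (date, hour, station, entry, exit) are assumptions, not the real header.
RAW_CSV = """date,hour,station,entry,exit
2019-01-21,7,Taft,120,30
2019-01-21,18,Taft,40,150
2019-01-21,7,Ayala,60,90
2019-01-22,7,Taft,110,25
"""


def select_sample(csv_file, station, date):
    """Return the rows for one station on one date, with counts as int."""
    sample = []
    for row in csv.DictReader(csv_file):
        if row["station"] == station and row["date"] == date:
            sample.append({
                "hour": int(row["hour"]),
                "entry": int(row["entry"]),
                "exit": int(row["exit"]),
            })
    return sample


sample = select_sample(io.StringIO(RAW_CSV), "Taft", "2019-01-21")
print(sample)
```

With the actual file, the `io.StringIO(RAW_CSV)` argument would be replaced by a file object opened on the CSV, and the column names would be adjusted to match its header.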
Importing Data Method/s
Cleaning Data Method/s
Exploratory Data Analysis
Visual Exploratory Data Analysis
MODEL BUILDING AND VALIDATION
Importing Data Method/s
Cleaning Data Method/s
Exploratory Data Analysis
Visual Exploratory Data Analysis
RESULTS AND KEY FINDINGS
● The busiest time for passengers entering the station in the morning is 7:00 am (represented by 7 on the x-axis).
● The busiest time for passengers leaving the station in the evening is 6:00 pm (represented by 18 on the x-axis).
● This data would help minimize passenger congestion in the MRT and make train arrivals and departures more
efficient, especially during the hours when most passengers use the trains.
● Adding trains at peak hours may affect the trains available during the hours when fewer people use the MRT,
which is why monitoring with this procedure is needed.
● The group would need more data, such as the total number of trains that the MRT has and a tally of
functioning and nonfunctioning trains, to extend the study in the future.
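The peak-hour findings above amount to taking the hour with the largest count within a time window. A minimal sketch follows, assuming the hourly counts are held in dictionaries keyed by hour of day. The numbers are placeholders chosen for illustration and are not the values from the Taft Station sample, and the name `busiest_hour` is hypothetical.

```python
# Placeholder hourly counts keyed by hour of day (24-hour clock).
# These numbers are invented for illustration, not taken from the data set.
entries = {5: 80, 6: 210, 7: 340, 8: 300, 17: 120, 18: 150, 19: 90}
exits = {5: 10, 6: 40, 7: 95, 8: 110, 17: 280, 18: 360, 19: 240}


def busiest_hour(counts, start, end):
    """Return the hour in [start, end) with the highest count."""
    window = {hour: n for hour, n in counts.items() if start <= hour < end}
    return max(window, key=window.get)


morning_peak = busiest_hour(entries, 0, 12)   # busiest hour for entries before noon
evening_peak = busiest_hour(exits, 12, 24)    # busiest hour for exits from noon onward
print(morning_peak, evening_peak)
```

Restricting the search to a window keeps a strong evening peak from hiding the morning one, and the reverse.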
SUMMARY
Component: Result/s
Discovery, Passengers Problem Framed: Trains must be distributed properly according to the times when most
people use the MRT; cooperation from both the passengers and the MRT staff is needed.
Initial Hypothesis: Providing more trains on the days and at the times when most people use the train stations.
Data: Summary of the number of passengers using the MRT at the different stations for each hour of
operation from January 1 to August 31, 2019.
Results and Key Findings:
1. Identifying the day and time with the most passengers would let the MRT authority know when to
provide more trains for the passengers.
2. This may also affect the trains that will be provided during the times when fewer people use the MRT.
3. The busiest time for passengers entering the station in the morning is 7:00 am (represented by 7 on the
x-axis).
4. The busiest time for passengers leaving the station in the evening is 6:00 pm (represented by 18 on the
x-axis).