
Enhancing Indian Railways e-Ticketing with Big Data

1. The document discusses using big data analytics to improve the e-ticketing system of Indian Railways by analysing past booking data to better predict customer preferences and handle high traffic volumes.
2. Key challenges include the high velocity and volume of booking data, as well as difficulty obtaining the data since it is not publicly available.
3. A distributed in-memory database could be used to implement a scalable solution to these problems.


BA501: BIG DATA ANALYTICS

ASSIGNMENT - 1

SUBMITTED TO: Mr. Mohit Bhatnagar
SUBMITTED BY: Sakshi Nijhawan (MB18GID247)

Business Problem:

Helping Indian Railways to get a more effective e-Ticketing System with the help of Big Data.

Motivation:
I belong to a very small town in Uttarakhand, and the only two ways to reach it are by road or by train. To
avoid the traffic, I always prefer taking a train. Booking a ticket through the IRCTC website, however, is a
pain. Although the site has improved over the years, it is still far from streamlined (considering that it is
one of the most frequently visited websites in the country). Booking a Tatkal ticket is even harder: the site
often will not open at all. The server errors and the slow speed become very annoying. Hence the motivation
to do something for Indian Railways using Big Data!

Industry Sector: Travel and Transport

How can Big Data help?

Why do we want to solve a problem? To make things easier and more effective for customers! With Big Data
and its tools, it is possible to handle more people logging on to the website simultaneously. Ease of
booking tickets is the major concern. One solution could be to analyse a person’s booking history for
frequently visited places and present the details for those places within a few seconds. Using past data, it
is not very difficult to predict a person’s preferences these days, and such predictions could be very handy
while designing such systems.
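
The booking-history idea above can be sketched in a few lines of Python. This is only an illustration under assumed names: the record fields (`user_id`, `origin`, `destination`), the function `top_routes` and the placeholder station names are hypothetical and do not reflect any actual IRCTC data format.

```python
from collections import Counter


def top_routes(bookings, user_id, k=3):
    """Return the k (origin, destination) pairs a user has booked most often."""
    counts = Counter(
        (b["origin"], b["destination"])
        for b in bookings
        if b["user_id"] == user_id
    )
    # most_common sorts by count, highest first
    return [route for route, _ in counts.most_common(k)]


# Hypothetical booking records; "A", "B" and "C" stand in for station names.
bookings = [
    {"user_id": "u1", "origin": "A", "destination": "B"},
    {"user_id": "u1", "origin": "A", "destination": "B"},
    {"user_id": "u1", "origin": "B", "destination": "A"},
    {"user_id": "u2", "origin": "A", "destination": "C"},
]

print(top_routes(bookings, "u1", k=1))
```

In a real system the same count would run over a much larger booking log, but the principle of ranking a user's past routes and pre-filling the most likely one stays the same.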

Similarly, during Tatkal booking, many people log in to the website at the same time, which leads to system
crashes or payment failures. With today’s Big Data technology, the in-memory capacity can be increased (by
adding memory) to absorb such peaks.

Looks like Indian Railways deserves a Big Data Junction itself!

What is the business value of such a solution (quantify in monetary terms)?

A Big Data solution involves the following processes:

1. Design and infrastructure.
2. Once the solution is designed, checking the hardware and network configuration.
3. Data integration, system development and training of the big data experts so that they can implement the
solution effectively.
4. Testing of the integrated system and deployment.
5. Improving the infrastructure.

Apart from this, the infrastructure and hardware needed to run big data workloads must also be acquired.

Data is increasing every second of every day, and as the data grows, so does the cost. As of now, I am not
aware of the monetary requirements for such a solution and hence cannot quantify its business value.

What kind of data source(s) will be used?

Indian Railways has not released any of its data into the public domain. The Open Government Data (OGD)
Platform India is the only source of Indian Government data as of now.

A distributed in-memory database will be required.


Any issues in obtaining data and how do you plan to address the same?

Since Indian Railways does not put its data in the public domain, there will be issues in obtaining it. As
mentioned in the previous answer, we can use the OGD Platform India to get whatever data is available,
although not much of it concerns the IRCTC website.

To design a proper, effective solution for Indian Railways, we will have to follow their process. Every 5
years, the Centre for Railway Information Systems (CRIS) issues a tender inviting solutions to their
problems. As part of an organization, you can bid for the tender. If they are satisfied with your solution,
you get the project and the data!

Comment on velocity, veracity, volume and variety of the data source?

Velocity: Data is generated and collected every second through the IRCTC website. It grows every second of
every day, and hence the velocity of IRCTC data is high.

Veracity: Accuracy of the data is very important; otherwise it would be difficult to predict a person’s
preferences, and we would not be able to provide customers with the details they need within a few seconds.

Volume: Millions of people, or even more, travel by Indian Railways frequently as it is comparatively
cheaper and faster. Hence the data generated every second is huge, and it is impractical to store and
analyse such a huge amount of data using traditional database technologies. Hence Big Data becomes very
important!

Variety: Since there is a set way of entering data on the website, we are most likely to have structured data
and hence low variety.

Comment on the adequacy of HADOOP ecosystem to implement a solution for the need identified?

A distributed system will provide scalability and high availability. All operational data can be served from
multiple nodes of the distributed in-memory operational database instead of querying the back-office system,
so performance is not impacted.
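
As a rough illustration of serving operational data from several in-memory nodes instead of the back-office system, the sketch below simulates the idea inside a single Python process. It is not Hadoop or any real product's API: the class `PartitionedCache`, the function `fake_backend`, the train number and the seat count are all hypothetical.

```python
import hashlib


class PartitionedCache:
    """Toy in-memory store that spreads keys across a fixed number of nodes."""

    def __init__(self, num_nodes, load_from_backend):
        self.nodes = [{} for _ in range(num_nodes)]  # one dict per simulated node
        self.load_from_backend = load_from_backend   # stands in for the slow back office
        self.backend_calls = 0

    def _node_for(self, key):
        # Hash the key so that the same key always lands on the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def get(self, key):
        node = self._node_for(key)
        if key not in node:
            # Cache miss: query the back office once and keep the result in memory.
            self.backend_calls += 1
            node[key] = self.load_from_backend(key)
        return node[key]


def fake_backend(train_no):
    """Hypothetical back-office lookup returning made-up availability data."""
    return {"train_no": train_no, "seats_available": 42}


cache = PartitionedCache(num_nodes=3, load_from_backend=fake_backend)
cache.get("12345")
cache.get("12345")          # second read is answered from memory
print(cache.backend_calls)  # 1
```

A production system would also need replication, expiry and invalidation when seat availability changes, none of which this sketch attempts.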

Include references (online links) to any similar implementations / previous attempts to address the need?

CRIS based its IRCTC application on Pivotal GemFire, a distributed in-memory database that is part of the
Pivotal Big Data Suite.

The link to the case study is: [Link]
