FRANCIS XAVIER ENGINEERING COLLEGE, TIRUNELVELI
(An Autonomous Institution)
DEPARTMENT OF INFORMATION TECHNOLOGY
Course Code | Name: 24IF2706 | BIG DATA ACQUISITION
Question Bank: Units 1, 2, 3
PART – A
S.No | Question | Max. Marks | Unit | BL | KC | PI
1 | A grocery chain receives massive amounts of transaction data daily. How can the 3Vs of Big Data be used to categorize and manage this data? | 2 | 1 | K2 | C | 2.1.2
2 | An e-commerce website faces high traffic fluctuations. How can cloud computing help in scaling big data analytics efficiently? | 2 | 1 | K1 | C | 2.2.2
3 | A global customer service chatbot receives queries in multiple languages. How does an NLP-based language detection system assist in routing these queries correctly? | 2 | 1 | K2 | C | 3.1.5
4 | A bank’s fraud detection system identifies an unusual transaction pattern in a customer’s account. How can anomaly detection algorithms help prevent fraudulent activities? | 2 | 1 | K1 | C | 2.2.4
5 | A media streaming platform stores petabytes of video data. How does HDFS ensure fault tolerance and data reliability in this case? | 2 | 1 | K2 | C | 3.1.5
6 | A weather forecasting agency needs to process terabytes of satellite images daily. How can MapReduce help in efficiently analyzing this data? | 2 | 1 | K1 | C | 2.2.4
7 | A flight booking system stores customer reservations and needs fast traversal between passengers, flights, and seat assignments. How does a navigational data model efficiently handle this data? | 2 | 2 | K2 | C | 2.1.2
8 | A university maintains student records, courses, and faculty details. How can a relational data model be used to manage these relationships effectively? | 2 | 2 | K2 | C | 2.1.3
9 | A social media platform experiences a sudden spike in user traffic. Why would a NoSQL database be preferred over a traditional relational database in this case? | 2 | 2 | K2 | C | 3.1.1
10 | An e-commerce website wants to store millions of user sessions quickly and retrieve shopping cart details in milliseconds. How does a key-value database help in achieving this? | 2 | 2 | K2 | C | 2.2.2
11 | A telecom company stores call records of millions of customers and frequently queries usage patterns. How can a column-based database optimize these queries? | 2 | 2 | K2 | C | 3.1.1
12 | A dating app wants to find mutual connections between users based on their social circles. How can a graph-based database improve the efficiency of this matchmaking process? | 2 | 2 | K2 | C | 2.2.2
13 | A retail chain wants to analyze daily sales across thousands of stores. How does the Hadoop Ecosystem help in efficiently processing this vast amount of data? | 2 | 3 | K2 | C | 1.1.1
14 | A climate research institute needs to process satellite images to predict weather patterns. How does parallel computation in Hadoop improve the speed of this task? | 2 | 3 | K1 | C | 1.1.2
15 | A financial institution runs fraud detection algorithms using MapReduce but finds it slow. How can recent improvements in MapReduce make this process more efficient? | 2 | 3 | K1 | C | 1.1.2
PART – B
S.No | Question | Max. Marks | Unit | BL | KC | PI
1 | A healthcare startup wants to predict disease outbreaks using social media posts and patient records. Design a big data pipeline integrating machine learning models to analyze these datasets. Explain how cloud computing can enhance the system’s efficiency. | 13 | 1 | K2 | F | 2.2.2
2 | A multinational company wants to store, retrieve, and process terabytes of log files generated by its servers. Compare the Google File System (GFS) and HDFS in terms of architecture, fault tolerance, and real-time accessibility. Which system would you recommend? Justify your answer with a real-time implementation example. | 13 | 1 | K3 | C | 2.4.1
3 | A political research firm wants to analyze public sentiment about election candidates using Twitter data. Describe how NLP techniques (including sentiment analysis and named entity recognition) can be used to classify tweets as positive, negative, or neutral. Discuss the challenges in handling slang, sarcasm, and multilingual data. | 13 | 1 | K2 | F | 3.4.3
4 | A news organization wants to extract trending topics from millions of online news articles daily. Explain how MapReduce can be applied for efficient text mining, breaking down the process into mapping, shuffling, and reducing phases. | 13 | 1 | K2 | F | 3.4.3
5 | An OTT platform (like Netflix or Hotstar) wants to enhance user experience by providing personalized movie recommendations. Explain how a collaborative filtering-based recommender system works, and discuss its limitations in real-world scenarios. | 13 | 1 | K2 | F | 2.3.1
6 | A hospital management system needs to track patients, doctors, treatments, and medical history. Compare how a navigational data model and a relational data model would structure and query this data. Which one would be better suited for this use case and why? | 13 | 2 | K2 | F | 3.1.1
7 | A ride-sharing app (like Uber or Ola) requires real-time data processing for tracking drivers and matching them with passengers. Explain how a NoSQL database (e.g., MongoDB or Cassandra) is more effective than a traditional SQL database for handling high-speed, geographically distributed data. | 13 | 2 | K3 | P | 2.3.1
8 | A music streaming service needs to store user playlists, which contain dynamic lists of song preferences and metadata. Compare how key-value databases (e.g., Redis) and document-based databases (e.g., MongoDB) would store and retrieve this data. Which one would be better and why? | 13 | 2 | K3 | C | 3.4.3
9 | A professional networking platform (like LinkedIn) wants to recommend job connections based on mutual colleagues, skills, and interests. Explain how a graph-based database (e.g., Neo4j) can efficiently model and query these relationships to provide meaningful recommendations. | 13 | 2 | K3 | C | 2.2.2
10 | A financial trading platform needs to analyze stock prices across multiple companies and time periods. Explain how columnar storage in databases like Apache HBase or Google Bigtable optimizes analytical queries compared to traditional row-based storage. | 13 | 2 | K3 | C | 2.2.2
11 | A healthcare organization wants to analyze millions of patient records to identify disease outbreaks. Explain how different components of the Hadoop Ecosystem (HDFS, MapReduce, Hive, HBase, Spark) can be used to store, process, and analyze this data efficiently. | 13 | 3 | K3 | C | 2.3.1
12 | A self-driving car company collects vast amounts of sensor data from vehicles in different locations. Describe how parallel computation and load balancing techniques in Hadoop help in processing this high-speed, large-scale data. | 13 | 3 | K2 | F | 2.2.4
PART – C
S.No | Question | Max. Marks | Unit | BL | KC | PI
1 | A cybersecurity firm is monitoring network logs in real time to detect potential cyber threats. ● Design a big data analytics framework that processes large volumes of log data using machine learning models. ● Explain how anomaly detection techniques (e.g., clustering, deep learning) can identify unusual patterns indicating cyberattacks. ● Discuss how cloud computing can help in handling and scaling real-time threat detection. | 15 | 1 | K3 | P | 3.4.3
2 | A smart city initiative aims to integrate real-time data from traffic sensors, citizen reports, public transport schedules, and emergency services. ● Design a hybrid database architecture that combines relational, NoSQL (key-value, document-based), and graph-based models to efficiently handle different types of data. ● Explain the role of each data model in storing, retrieving, and analyzing real-time information. ● Discuss the challenges and advantages of using a multi-model database approach in such a large-scale project. | 15 | 2 | K2 | P | 2.1.2