Enrolment No./Seat No_______________
GUJARAT TECHNOLOGICAL UNIVERSITY
BE - SEMESTER–VII EXAMINATION – SUMMER 2025
Subject Code: 3170722 Date: 14-05-2025
Subject Name: Big Data Analytics
Time: 02:30 PM TO 05:00 PM Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.
MARKS
Q.1 (a) What is Big Data? Explain the five V’s of Big Data. 03
(b) Discuss the key differences between structured, unstructured, and 04
semi-structured data with examples.
(c) Explain Hadoop’s architecture in detail with a diagram. How do 07
the NameNode and DataNode work together in Hadoop?
Q.2 (a) What is a MapReduce function? Explain the purpose of the Map 03
and Reduce phases.
(b) Discuss the concept of Data Locality in Hadoop. Why is it 04
significant?
(c) Write a detailed note on HDFS (Hadoop Distributed File System) 07
and explain its significance in Big Data processing.
OR
(c) Discuss how MapReduce handles failures during the execution of 07
jobs.
Q.3 (a) Define NoSQL databases and explain how they differ from 03
relational databases.
(b) Explain the Document store model in NoSQL databases. 04
(c) Compare and contrast different types of NoSQL databases: 07
Key-Value, Document, Column-family, and Graph databases.
OR
Q.3 (a) Describe how NoSQL databases handle scalability and high 03
availability.
(b) What is Sharding in NoSQL databases? How does it help in 04
handling Big Data?
(c) Describe the architecture of MongoDB. Discuss its data model and 07
how it handles queries and indexing.
Q.4 (a) Explain the role of Apache Spark in Big Data Analytics. How is it 03
different from Hadoop MapReduce?
(b) Discuss Spark’s RDD (Resilient Distributed Dataset) and its 04
features.
(c) Write and explain a simple Spark application for word count using 07
PySpark.
OR
Q.4 (a) What is in-memory processing? Why is it beneficial in Big Data 03
analytics?
(b) Explain Spark’s transformation and action operations with 04
examples.
(c) How is Spark used to perform Machine Learning tasks? Discuss 07
Spark MLlib with an example of a classification algorithm.
Q.5 (a) What is Mahout? Explain its use in scalable machine learning with 03
Big Data.
(b) Explain the concept of data streaming. How does Spark Streaming 04
process real-time data?
(c) Explain how data mining techniques are applied in Big Data 07
Analytics. Mention some of the challenges faced when mining
large datasets.
OR
Q.5 (a) Describe the use of Big Data in retail industries. How can 03
companies benefit from Big Data Analytics in decision-making?
(b) What are the security challenges associated with Big Data? 04
(c) Explain data warehousing in the context of Big Data. How does 07
Hive enable querying large datasets?
*************