
Big Data Analytics Exam Questions

The document outlines the Continuous Assessment Test for Big Data Analytics, detailing the structure, including Part A and Part B questions focused on key concepts like Hadoop, NoSQL databases, and case studies related to smart cities and customer retention. It emphasizes the importance of understanding big data's 4Vs and the application of various database models in real-world scenarios. The test aims to assess students' comprehension of big data concepts and their practical implications in different industries.

Uploaded by

Ravi Prakash

Reg. No.

Continuous Assessment Test- I [CAT - I]


Year : III
Semester : 06
Branch : B.E/B.Tech. - CSE/CSBS/AI&DS
Sub. Code : CCS334
Subject Name : BIG DATA ANALYTICS
QP Code : 226507

[Regulations 2021]

Date: 14-02-25 Time: 90 Min Marks: 50


Answer ALL Questions
Part A [6 x 2 = 12 Marks]
1.1 Explain the role of Hadoop in big data processing. [A2] CO1
1.2 Propose a scenario where the analysis of unstructured data such as social media
posts could benefit a business. [B2] CO1
1.3 Define Big Data. [A1] CO1
1.4 Explain the convergence of key trends that have contributed to the growth of
Big Data. [A2] CO1
1.5 Compare and contrast traditional data and unstructured data. [B2] CO1
1.6 Can you recall an industry that extensively uses big data? [A1] CO1
Part B [1 x 13 = 13 Marks]
1.7 a Case Study: Big Data Analytics in a Smart City [C1] CO1
Scenario: A Smart City initiative is collecting data from various sources,
such as traffic sensors, public transportation systems, environmental
monitoring stations, and social media platforms. This data is being used to
optimize traffic management, improve public health, enhance city services,
and support better decision-making by city planners.
Question:
How can the 4Vs of Big Data (Volume, Velocity, Variety, and Veracity) be
effectively managed and utilized in a Smart City project to improve urban
living conditions?
[OR]

b Case Study: Improving Customer Retention with Big Data Analytics [C1] CO1
A retail company has been experiencing a decline in customer retention over
the past year. The company collects vast amounts of data from its online and
in-store interactions with customers. This includes purchase histories,
website browsing behavior, social media interactions, and feedback surveys.
Question:
How can Big Data Analytics be used to identify patterns and insights from
this vast array of customer data to improve customer retention? Outline the
steps, tools, and techniques you would apply in analyzing the data.
Additionally, discuss potential challenges the company may face when
working with this data, and propose solutions for overcoming them.
Part A [6 x 2 = 12 Marks]
2.1 Explain the concept of NoSQL databases and highlight the main reasons behind
their emergence in the field of data management. [B1] CO2
2.2 Compare and contrast the key-value data model and the document data model
used in NoSQL databases. Provide examples of scenarios where each model
might be more suitable. [A2] CO2
2.3 Assess the advantages and disadvantages of schemaless databases. [B1] CO2
2.4 Discuss the concept of a graph database. [B2] CO2
2.5 Differentiate between schemaless databases and traditional relational databases. [A2] CO2
2.6 Describe the types of data and relationships that would benefit from a graph
database model. [A1] CO2
Part B [1 x 13 = 13 Marks]
2.7 a Case Study: Implementing a Graph-Based Database in Healthcare for
Disease Diagnosis and Treatment [C1] CO2
Scenario: A Healthcare Organization is looking to enhance its capabilities
for disease diagnosis, patient care, and treatment planning by using a graph-
based database to analyze complex relationships between patients, medical
conditions, symptoms, treatments, and healthcare providers. This system
aims to help in identifying potential disease patterns, providing personalized
treatment recommendations, and improving clinical decision-making.
Question:
How can a graph-based database be used in healthcare to improve disease
diagnosis, personalize treatment plans, and enhance medical research?
[OR]
b Case Study: Choosing the Right NoSQL Database for a Social Media
Platform [C1] CO2
A startup is building a new social media platform aimed at connecting users
globally. The platform must handle massive amounts of data, including user
profiles, real-time posts, comments, likes, multimedia uploads, and activity
logs. The company is considering implementing a NoSQL database solution
due to the scalability needs and diverse data types.

The startup's requirements are as follows:
1. User Profile Data: Highly structured data with a focus on
relationships between users (e.g., friends, followers, connections).
2. Real-Time Posts and Comments: Unstructured, high-volume data
with frequent reads and writes.
3. Multimedia Files: Large binary data (e.g., images, videos) stored and
retrieved.
4. Activity Logs: Time-series data generated by user interactions (e.g.,
clicks, views, shares).
Question:
Given the startup's diverse data needs, which types of NoSQL databases
would you recommend for each of the four requirements? Explain why you
would choose the specific NoSQL database type (Document Store, Key-
Value Store, Column-Family Store, or Graph Database) for each use case,
and how the features of these databases align with the company’s needs for
scalability, performance, and flexibility.

Common questions


By analyzing unstructured data such as social media posts, businesses can gain insight into consumer sentiment, preferences, and trends in real time. This helps them tailor marketing strategies, improve customer engagement, and respond quickly to market changes or customer feedback. Such analysis can also identify areas for product improvement and innovation, leading to higher customer satisfaction and a competitive advantage.

The emergence of NoSQL databases is primarily driven by the need to handle large volumes of unstructured data, to scale out, and to process data faster in real time. Unlike traditional relational databases, which enforce a fixed schema and are typically harder to scale horizontally, NoSQL databases are designed for flexible schemas and horizontal scaling, and each is optimized for a specific data model such as key-value pairs, documents, column families, or graphs. This makes them well suited to modern applications that need high performance and agility in handling diverse data types.

In a Smart City project, managing Volume involves using scalable storage and distributed systems like Hadoop to handle the vast amounts of data collected from multiple sources. Velocity is addressed by real-time data processing frameworks, which enable timely decisions and interventions in urban management. Variety is managed through integrated data platforms that can analyze both structured and unstructured data, giving a holistic view of the urban environment. Veracity is ensured by robust data quality checks and by integrating trustworthy data sources, which makes the resulting insights more reliable and so improves urban planning and services.

Graph databases are well suited to healthcare systems because they naturally represent and analyze complex networks of relationships between entities such as patients, treatments, and symptoms. Their flexible data model allows related data points to be integrated and explored together, which is essential for identifying disease patterns and correlations. This insight supports personalized treatment plans and better clinical decision-making, improving patient outcomes and the efficiency of healthcare services.
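The relationship-centric model described above can be illustrated with a small in-memory sketch. It is not a real graph database and not a clinical method: the node names, the relation labels (`HAS_SYMPTOM`, `PRESENTS_WITH`, `TREATED_BY`), and the overlap-based ranking are all assumptions made up for illustration.

```python
from collections import defaultdict


class Graph:
    """Toy labelled graph: (node, relation) -> set of neighbouring nodes."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, src, relation, dst):
        self.edges[(src, relation)].add(dst)

    def neighbours(self, node, relation):
        # .get avoids creating empty entries for unknown keys
        return self.edges.get((node, relation), set())


# Hypothetical sample data, invented for this sketch.
g = Graph()
g.add("patient:1", "HAS_SYMPTOM", "symptom:fever")
g.add("patient:1", "HAS_SYMPTOM", "symptom:cough")
g.add("condition:flu", "PRESENTS_WITH", "symptom:fever")
g.add("condition:flu", "PRESENTS_WITH", "symptom:cough")
g.add("condition:allergy", "PRESENTS_WITH", "symptom:cough")
g.add("condition:flu", "TREATED_BY", "treatment:rest")


def candidate_conditions(graph, patient, conditions):
    """Rank conditions by how many symptoms they share with the patient."""
    symptoms = graph.neighbours(patient, "HAS_SYMPTOM")
    scores = {
        c: len(symptoms & graph.neighbours(c, "PRESENTS_WITH"))
        for c in conditions
    }
    matched = [c for c in scores if scores[c] > 0]
    return sorted(matched, key=lambda c: (-scores[c], c))


ranked = candidate_conditions(
    g, "patient:1", ["condition:flu", "condition:allergy"]
)
treatments = g.neighbours(ranked[0], "TREATED_BY")
```

Each question here is answered by following edges from a node, which is the access pattern a graph database is built to make cheap; a relational design would typically need several joins for the same traversal.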

Key trends contributing to the growth of Big Data include the exponential increase in data generated by digital activities, advances in storage technology, improved data processing capabilities, and the proliferation of Internet of Things (IoT) devices. Together, these trends have changed data management practice by enabling efficient storage, real-time processing, and the integration of diverse data types. As a result, organizations can now use big data analytics to drive strategic decision-making, operational efficiency, and customer personalization.

Schemaless databases, like those used in NoSQL systems, offer flexibility by allowing each document to have a different structure, which is an advantage where data models change frequently. They are well suited to large-scale, varied datasets. The drawbacks are that data integrity is harder to maintain and queries can be slower: without a fixed schema, consistency must be enforced by the application, and efficient retrieval depends on well-chosen indexes. Careful design and indexing strategies are therefore necessary.
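The trade-off above can be sketched in a few lines of plain Python, under the assumption that documents are ordinary dictionaries. The field names, the `REQUIRED` set, and the helper functions are hypothetical and do not correspond to any particular database's API.

```python
# Three documents in one "collection", each with a different shape.
docs = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "phones": ["555-0100"]},
    {"_id": 3, "email": "no-name@example.com"},
]

# With no schema in the store, the application decides what is mandatory.
REQUIRED = {"_id", "name"}


def missing_fields(doc):
    """Return the required fields that this document lacks."""
    return REQUIRED - set(doc)


def build_index(documents, field):
    """Map each value of `field` to the ids of documents that contain it."""
    index = {}
    for doc in documents:
        if field in doc:
            index.setdefault(doc[field], []).append(doc["_id"])
    return index


invalid_ids = [d["_id"] for d in docs if missing_fields(d)]
email_index = build_index(docs, "email")
```

The store accepted all three documents, so the flexibility is real, but the integrity check and the index both had to be written by hand, which is the extra design work the paragraph refers to.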

Graph-based databases offer several benefits in healthcare, including improved data interoperability, a better ability to model complex relationships, and efficient querying of connected data, which together support better disease diagnosis and personalized treatment recommendations. They allow patterns and trends in patient data to be analyzed comprehensively, which can improve clinical decisions. The risks include data privacy concerns from handling sensitive patient information, increased complexity in database management, and the need for specialized skills to implement and maintain the system.

Hadoop is an open-source framework for storing and processing large datasets in a distributed computing environment. Its key components include the Hadoop Distributed File System (HDFS), which stores data across multiple machines, and MapReduce, which processes data in parallel across a cluster. HDFS provides scalability and fault tolerance by distributing data across multiple nodes, while MapReduce simplifies the processing of large volumes of data by breaking the work into smaller tasks that run in parallel. These features address big data challenges such as volume and variety.
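The MapReduce idea can be shown with a single-process word count. This is only a sketch of the programming model, assuming the input has already been split into chunks; it does not use Hadoop's API, and the function names are illustrative. In a real cluster the map and reduce calls would run on different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each chunk is mapped independently; this independence is what
    # allows the map step to be distributed across machines.
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    return reduce_phase(shuffle(mapped))


chunks = ["big data needs big clusters", "data moves fast"]
counts = word_count(chunks)
```

Because `map_phase` looks at one chunk at a time and `reduce_phase` looks at one key at a time, neither step needs the whole dataset in one place.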

Traditional structured data is organized in fixed schemas, typically tables with rows and columns, which makes it straightforward to store, query, and analyze with relational database systems. Unstructured data, by contrast, lacks a consistent format or schema and includes varied content such as text, images, and video. The distinction matters in big data analytics because structured data can be queried directly, whereas unstructured data requires more advanced techniques, such as natural language processing and image recognition, to extract actionable insights.

For user profile data, where the focus is on relationships between users, a Graph Database is suitable for efficiently managing connections such as friends and followers. Real-time posts and comments, which are high-volume and need fast reads and writes, can use a Document Store such as MongoDB, which supports loosely structured data with high read/write throughput. Multimedia files fit a Key-Value Store, or an object storage service such as Amazon S3 that retrieves large binary objects by key, giving scalable storage and fast retrieval. Activity logs, being time-series data, match a Column-Family Store such as Cassandra, which is optimized for sequential data ingestion and retrieval.
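The fit between activity logs and a column-family layout comes from keeping each partition sorted by time. The sketch below imitates that idea in memory, assuming integer timestamps and a hypothetical `ActivityLog` class; it is not Cassandra's API or storage engine.

```python
import bisect
from collections import defaultdict


class ActivityLog:
    """Toy time-series store: one partition per user, rows sorted by time."""

    def __init__(self):
        # user_id -> list of (timestamp, action), kept in sorted order
        self._partitions = defaultdict(list)

    def append(self, user_id, timestamp, action):
        bisect.insort(self._partitions[user_id], (timestamp, action))

    def between(self, user_id, start, end):
        """Return rows with start <= timestamp < end for one user."""
        rows = self._partitions.get(user_id, [])
        # A 1-tuple sorts before any 2-tuple sharing its first element,
        # so (start,) marks the first row at or after `start`.
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


log = ActivityLog()
log.append("u1", 10, "click")
log.append("u1", 30, "share")
log.append("u1", 20, "view")
log.append("u2", 15, "click")

window = log.between("u1", 10, 30)
```

A time-window query touches a single partition and reads one contiguous slice, which is why this layout suits append-heavy logs that are later read by time range.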
