MapReduce Concepts in NoSQL Databases

Uploaded by

Raghu Nayak

NOSQL Database 21CS745 Question Bank & Answers

MODULE 3

Question Bank with Answers

1 Explain with a neat diagram, the partitioning and combining in MapReduce

Parallelism with Partitioning:

• In a basic setup, the outputs of all mappers are concatenated and sent into a single
reduce function. This can become inefficient, especially as the size of the data grows.

• To increase parallelism and minimize bottlenecks, we partition the output of the
mappers. Because each reduce function operates only on the values for a single key,
the keys can be divided among several reducers, which then run in parallel.

• In this setup, the key-value pairs are grouped into partitions based on the key. These
partitions are then shuffled and distributed to the corresponding reducers. Multiple
reducers work on different partitions in parallel, and the results are merged at the end.
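The partitioning step can be sketched in a few lines of Python. This is a minimal illustration, not any framework's actual partitioner: the function names, the sample tea products, and the choice of `zlib.crc32` as the hash are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

def partition_for(key: str, num_reducers: int) -> int:
    """Pick a partition from the key alone, so equal keys always land together."""
    # crc32 is used instead of hash() because it is stable across processes.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(mapped_pairs, num_reducers):
    """Group mapper output into one {key: [values]} dict per partition."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        partitions[partition_for(key, num_reducers)][key].append(value)
    return partitions

def reduce_partition(partition):
    """Sum the values for each key in one partition (one reducer's work)."""
    return {key: sum(values) for key, values in partition.items()}

# Hypothetical mapper output: (product, quantity) pairs.
mapped = [("puerh", 3), ("dragonwell", 1), ("puerh", 2),
          ("genmaicha", 5), ("dragonwell", 4)]
partitions = shuffle(mapped, num_reducers=2)

# Each partition could be reduced on a different node; here they run in a loop.
totals = {}
for partition in partitions:
    totals.update(reduce_partition(partition))
```

Because the partition is chosen from the key alone, no key is split between reducers, so the per-partition results can simply be merged.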

Data Transfer Reduction with Combining:

• A significant issue in map-reduce jobs is the amount of data being transferred
between the map and reduce phases. Much of the data consists of repeated key-value
pairs for the same key.

• The solution to this is a combiner function, which processes the data on the map side
before it is transferred to the reducers. The combiner aggregates values for the same
key, reducing the amount of data transferred. This helps cut down on network
overhead.

• A combiner function is essentially a mini-reduce function. In many cases, the
reducer function itself can serve as the combiner, but with a constraint: its output
must have the same form as its input, so that combined results can be fed into the
reduce function again. Reducers with this property are called combinable reducers.
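As a rough sketch of a combinable reducer, the summing function below takes (key, number) pairs and returns (key, number) pairs, so it can run once per mapper as a combiner and again as the final reducer. The function name and the sample data are assumptions for illustration only.

```python
from collections import defaultdict

def sum_by_key(pairs):
    """Sum values per key. Input and output are both (key, number) pairs."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

# Hypothetical output of two mappers running on different nodes.
mapper_outputs = [
    [("puerh", 3), ("puerh", 2), ("dragonwell", 1)],
    [("puerh", 4), ("dragonwell", 4), ("dragonwell", 2)],
]

# Combiner: run the reducer on each mapper's output before sending it on.
combined = [sum_by_key(output) for output in mapper_outputs]

pairs_without_combiner = sum(len(output) for output in mapper_outputs)
pairs_with_combiner = sum(len(output) for output in combined)

# Reducer: the same function applied to the combined pairs.
final = sum_by_key(pair for output in combined for pair in output)
```

In this small case the combiner cuts the pairs crossing the network from six to four, and the final totals are the same as reducing the raw mapper output directly.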

Non-Combinable Reducers:

• Some reduce functions cannot be used as combiners. For instance, a reduce function
that counts unique customers for a product might not be combinable. This is because
the output of such a reduce function (the total count) differs from the input (individual
product-customer pairs).

• In such cases, a different approach is used, such as eliminating duplicates before they
reach the reducer, but this doesn’t combine the data in the same way as a combiner
would.
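The unique-customer case can be illustrated with a small sketch. The function names and the customer data are hypothetical; the point is only that adding partial counts over-counts, while de-duplicating pairs on the map side keeps the reducer's input shape intact.

```python
def count_unique_customers(pairs):
    """Reducer: number of distinct customers per product."""
    customers = {}
    for product, customer in pairs:
        customers.setdefault(product, set()).add(customer)
    return {product: len(names) for product, names in customers.items()}

def drop_duplicates(pairs):
    """Map-side step: remove repeated (product, customer) pairs, keeping the pair shape."""
    return sorted(set(pairs))

# Hypothetical output of two mappers.
mapper_a = [("puerh", "ann"), ("puerh", "ann"), ("puerh", "bob")]
mapper_b = [("puerh", "ann"), ("puerh", "cid")]

# Wrong: using the reducer as a combiner and adding the partial counts
# counts "ann" once per mapper.
naive = (count_unique_customers(mapper_a)["puerh"]
         + count_unique_customers(mapper_b)["puerh"])

# Right: de-duplicate on the map side, then count once in the reducer.
deduped = drop_duplicates(mapper_a) + drop_duplicates(mapper_b)
correct = count_unique_customers(deduped)["puerh"]
```

Here the naive sum reports four customers where there are only three, which is why the counting reducer cannot double as a combiner.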

Koustav Biswas. Dept. Of CSE, DSATM

Combining Across Mappers:

• When using combinable reducers, not only can the map-reduce job run in parallel on
different partitions, but combining can occur across nodes as well. This flexibility
allows for earlier combining before all the mappers have completed, and even allows
some data combining to happen on the map side before it’s sent over the network.

Framework Considerations:

• Some map-reduce frameworks require all reducers to be combinable, which
maximizes flexibility by allowing parallel and serial reductions. If a non-combinable
reducer is necessary, it’s typically handled by breaking the processing into pipelined
map-reduce steps.

2 Explain two stages Map reduce example, with neat diagram

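Map-reduce calculations can be composed in stages using a "pipes-and-filters" model, in
which the output of one stage serves as the input of the next. This is beneficial when a
processing task involves multiple phases, each of which builds upon the output of the
previous stage.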

Stage 1: Aggregate Monthly Sales

In the first stage, the goal is to summarize sales by product and month for each year. This
stage involves:

1. Mapping: Each input record (a single sale) is mapped to a key-value pair where the
key combines the year, month, and product, and the value is the quantity sold.

2. Reducing: All records with the same key (i.e., the same product in the same month of
the same year) are aggregated, summing up quantities. This gives the total sales for
each product in each month.

Example: For each sales record, the mapper might output:

• Key: the combination of year, month, and product (for example 2011, 12, puerh)

• Value: quantity

The reducer then aggregates these records to produce one record per product per month, such
as:

• {year: 2011, month: 12, product: puerh, quantity: 1200}.
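Stage 1 might look like the following sketch. The record fields match the example record above, but the function names and the individual sale quantities are assumptions chosen so that the December 2011 total comes to 1200.

```python
from collections import defaultdict

def map_sale(sale):
    """Emit ((year, month, product), quantity) for one sale record."""
    return (sale["year"], sale["month"], sale["product"]), sale["quantity"]

def reduce_monthly(pairs):
    """Sum quantities per (year, month, product) key."""
    totals = defaultdict(int)
    for key, quantity in pairs:
        totals[key] += quantity
    return [
        {"year": year, "month": month, "product": product, "quantity": quantity}
        for (year, month, product), quantity in sorted(totals.items())
    ]

# Hypothetical individual sales.
sales = [
    {"year": 2011, "month": 12, "product": "puerh", "quantity": 700},
    {"year": 2011, "month": 12, "product": "puerh", "quantity": 500},
    {"year": 2010, "month": 12, "product": "puerh", "quantity": 1000},
]

monthly = reduce_monthly(map_sale(sale) for sale in sales)
```

The resulting list holds one record per product per month, which is the shape the second stage consumes.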

Stage 2: Year-on-Year Comparison

In the second stage, the output from Stage 1 is processed to compare the sales of each product
in a given month with the previous year. This is achieved by:

1. Mapping: Each record is mapped, and the mapper identifies whether it belongs to the
current year (2011) or the previous year (2010).

2. Reducing: The reducer merges records for the same product and month from both
years, calculates the percentage increase or decrease, and produces a final record
showing the comparison.

Example: For the same product "puerh" in December 2011 and 2010, the reducer might
produce:

• {product: puerh, month: 12, current_quantity: 1200, prior_quantity: 1000, increase: 20%}.
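A minimal sketch of Stage 2, assuming the Stage 1 output is available as a list of records. The function names, the `increase_percent` field name, and the rule of skipping keys that lack one of the two years are assumptions for illustration.

```python
from collections import defaultdict

CURRENT_YEAR, PRIOR_YEAR = 2011, 2010

def map_year(record):
    """Key by (product, month); tag the quantity as current or prior year."""
    key = (record["product"], record["month"])
    if record["year"] == CURRENT_YEAR:
        return key, {"current_quantity": record["quantity"]}
    if record["year"] == PRIOR_YEAR:
        return key, {"prior_quantity": record["quantity"]}
    return None  # other years are ignored

def reduce_comparison(pairs):
    """Merge both years per key and compute the percentage change."""
    merged = defaultdict(dict)
    for key, value in pairs:
        merged[key].update(value)
    results = []
    for (product, month), values in sorted(merged.items()):
        current = values.get("current_quantity")
        prior = values.get("prior_quantity")
        if current is None or not prior:
            continue  # skip when a year is missing or the prior quantity is zero
        results.append({
            "product": product,
            "month": month,
            "current_quantity": current,
            "prior_quantity": prior,
            "increase_percent": (current - prior) * 100 / prior,
        })
    return results

# Records in the shape produced by the first stage.
stage1_output = [
    {"year": 2010, "month": 12, "product": "puerh", "quantity": 1000},
    {"year": 2011, "month": 12, "product": "puerh", "quantity": 1200},
]

mapped = [pair for pair in map(map_year, stage1_output) if pair is not None]
comparison = reduce_comparison(mapped)
```

For the puerh records this yields a 20 percent increase, matching the example record above.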

Benefits of the Two-Stage Approach

• Parallelism: Each map and reduce task can be executed in parallel, making it efficient
for large datasets.

• Reusability: The intermediate data can be stored, reused, or analyzed separately.

• Cluster-Suitability: The final outputs are ideal for distributed storage, which enables
quick data access for downstream processing.

Using tools like Apache Pig or Hive on Hadoop further simplifies this model by providing
high-level abstractions for MapReduce operations. This is particularly helpful as data scales
and demands for high-volume processing increase.

Reusable Intermediate Outputs: Intermediate results from MapReduce can be stored as
materialized views, saving time and resources for future calculations.

Optimizing Query Patterns: Build materialized views based on actual queries, as
speculative reuse can be inefficient.

Language Support: Tools like Apache Pig and Hive simplify MapReduce with user-friendly
scripting and SQL-like syntax, making it easier to use with Hadoop.

Beyond NoSQL: MapReduce is useful in many data environments, not just NoSQL, and is
ideal for distributed processing on large datasets.

Cluster-Friendly: MapReduce is well-suited for handling large volumes of data across
clusters, making it a crucial tool as data processing demands grow.

3 Explain basic map reduce, with neat diagram

The MapReduce framework is a programming model designed to handle large-scale data
processing across distributed systems. It allows complex computations on large datasets by
breaking down tasks into parallelizable units, making it especially effective for handling tasks
like data aggregation and analysis.

Core Components of MapReduce

1. Map Function: The first phase of MapReduce is the map function, which processes
each data record independently. Each record, or "aggregate" in database terms, is
converted into a series of key-value pairs. For example, when processing orders that
contain line items (product IDs, quantities, and prices), the map function emits one
pair per line item, with the product ID as the key and the quantity and price as the
value. This setup enables efficient data processing by focusing only on relevant
details for each record.

2. Parallelism and Independence: The map function processes each aggregate (order)
independently, making it highly parallelizable. Since each map operation works
without reference to others, the framework can assign these tasks across multiple
nodes in a cluster. This parallelism enables faster data processing by distributing tasks
across the system.

3. Reduce Function: The second phase, known as the reduce function, aggregates data
by combining all values associated with each unique key. The reduce function
processes collections of values with the same key—such as all orders containing a
specific product—and consolidates them into a single output. For example, if the map
phase produced several entries for a product (each detailing quantity and revenue
from different orders), the reduce function sums these values to yield total sales for
that product.

4. Framework Coordination: The MapReduce framework automatically manages data
flow between the map and reduce phases, including moving and sorting key-value
pairs and ensuring the appropriate data reaches the reduce function. This coordination
allows developers to focus on writing the map and reduce functions without needing
to handle data shuffling or parallel task management directly.
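The four components above can be sketched as a single-process Python example. The order layout, the function names, and the choice to compute revenue as quantity times price are assumptions for illustration; a real framework would run the map calls on different nodes and perform the grouping itself.

```python
from collections import defaultdict

def map_order(order):
    """Map: emit (product_id, {quantity, revenue}) for each line item of one order."""
    for item in order["line_items"]:
        yield item["product_id"], {
            "quantity": item["quantity"],
            "revenue": item["quantity"] * item["price"],  # assumed definition of revenue
        }

def reduce_product(product_id, values):
    """Reduce: combine all values for one product into a single total."""
    return {
        "product_id": product_id,
        "quantity": sum(v["quantity"] for v in values),
        "revenue": sum(v["revenue"] for v in values),
    }

def run(orders):
    """Stand-in for the framework: group map output by key, then reduce per key."""
    grouped = defaultdict(list)
    for order in orders:  # each order is mapped without reference to the others
        for key, value in map_order(order):
            grouped[key].append(value)
    return [reduce_product(key, values) for key, values in sorted(grouped.items())]

# Hypothetical orders.
orders = [
    {"id": 1, "line_items": [
        {"product_id": "puerh", "quantity": 8, "price": 3},
        {"product_id": "dragonwell", "quantity": 2, "price": 4},
    ]},
    {"id": 2, "line_items": [
        {"product_id": "puerh", "quantity": 4, "price": 3},
    ]},
]

product_totals = run(orders)
```

Only `map_order` and `reduce_product` carry application logic; `run` plays the role the text assigns to the framework.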

4 How are calculations composed in Map reduce? Explain with neat diagram

The MapReduce approach is a model designed for concurrent data processing, prioritizing
ease of parallelization over flexibility. Here’s an overview of its core principles and
limitations:

Constraints in MapReduce

• Single Aggregate per Map Task: Each map task can only work with individual
records or aggregates (e.g., single orders), meaning that processing must be designed
to operate independently on each data entry without reference to others.

• Single Key per Reduce Task: Each reduce task operates on values associated with a
specific key (e.g., one product ID), so computations must be structured around
aggregating values that share the same key.

Structuring Calculations

To use MapReduce effectively, calculations must fit within the model’s constraints. Here’s
how different calculations are handled:

1. Non-Composable Calculations (e.g., Averages):

o Calculating averages illustrates a limitation in MapReduce because averages
are not composable: you can’t merge two average values directly.

o Instead, each map task must output the total sum and count of quantities,
allowing the reduce function to combine these values. The final average is
computed from the combined sum and count, not from intermediate averages.

2. Counting Operations:

o Counts are straightforward in MapReduce. Each map task emits a count of 1
for each occurrence, and the reduce function simply sums these to get the total
count.

Example Workflows:

• In a product order analysis, each map function could output entries with a product ID
key, a count of 1, and a quantity. The reduce function then combines all entries with
the same key to produce total counts and quantities, enabling further calculations like
averages based on the combined data.
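The difference between combining sums and counts and combining averages can be seen in a short sketch. The function names and the quantities are hypothetical.

```python
def map_item(product_id, quantity):
    """Emit a count of 1 and the quantity for one order line."""
    return product_id, {"count": 1, "quantity": quantity}

def combine(a, b):
    """Counts and sums are composable: partial results can be added together."""
    return {"count": a["count"] + b["count"],
            "quantity": a["quantity"] + b["quantity"]}

def reduce_values(values):
    total = {"count": 0, "quantity": 0}
    for value in values:
        total = combine(total, value)
    return total

# Hypothetical quantities for one product, split across two nodes.
node_a = [map_item("puerh", q)[1] for q in (10, 20)]
node_b = [map_item("puerh", q)[1] for q in (60,)]

partial_a = reduce_values(node_a)
partial_b = reduce_values(node_b)
total = combine(partial_a, partial_b)

# Correct: divide the combined sum by the combined count.
average = total["quantity"] / total["count"]

# Incorrect: averaging the two per-node averages weights both nodes equally.
average_of_averages = (
    partial_a["quantity"] / partial_a["count"]
    + partial_b["quantity"] / partial_b["count"]
) / 2
```

With these numbers the true average is 30, while the average of the per-node averages is 37.5, which is why the reducer carries sum and count rather than an average.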

5 What are key-value stores? List some popular key-value databases. Explain how all
data is stored in a single bucket of a key-value data store

Key-value stores are among the simplest and most high-performing types of NoSQL
databases, using a straightforward API model focused on basic operations for managing data.

Core Characteristics:

1. Basic Operations:

o Get: Retrieve the value associated with a key.

o Put: Insert or update a value for a key.

o Delete: Remove a key and its associated value.

2. Data Structure:

o The value in a key-value store is an opaque blob (binary large object),
meaning the database stores it without needing to interpret its content.

o Responsibility for understanding and managing the structure of stored data lies
entirely with the application.

3. Primary-Key Access:

o Key-value stores operate solely on primary keys, allowing efficient, direct
access to data and making these databases highly performant and scalable.
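The three basic operations can be sketched with an in-memory stand-in. The class below is a hypothetical illustration, not the API of any of the products discussed here; it only shows that the store treats the value as opaque bytes and looks data up by key alone.

```python
from typing import Dict, Optional

class KeyValueStore:
    """In-memory stand-in for a key-value store: values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        """Insert the value, or overwrite it if the key already exists."""
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Return the value for the key, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: str) -> None:
        """Remove the key and its value; deleting a missing key is a no-op."""
        self._data.pop(key, None)

store = KeyValueStore()
store.put("session:a1b2c3", b'{"user": "ann"}')
first_read = store.get("session:a1b2c3")
store.put("session:a1b2c3", b'{"user": "ann", "cart": 2}')  # update in place
second_read = store.get("session:a1b2c3")
store.delete("session:a1b2c3")
after_delete = store.get("session:a1b2c3")
```

The store never inspects the bytes it holds, so interpreting them as JSON is left entirely to the application.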

Popular Key-Value Databases:

• Riak: Uses a "bucket" structure for segmenting keys, aiding organization.

• Redis: Often referred to as a data structure server, supports complex structures like
lists, sets, and hashes, enabling more versatile use.

• Memcached, Berkeley DB, HamsterDB, Amazon DynamoDB, Project Voldemort.

Advanced Features in Key-Value Databases:

• Some stores, such as Redis, offer data structure support for lists, sets, and hashes,
allowing for a range of operations like unions and intersections.

Bucket Organization in Key-Value Stores:

• Single Bucket Approach: All data (e.g., session data, shopping carts) can be stored
within a single bucket under one key-value pair, creating a unified object. However,
because one bucket then holds different types of objects, the chance of key conflicts
increases.

• Separate Buckets for Data Types: By appending object names to keys (e.g.,
sessionID_userProfile) or creating specific buckets for each data type, it’s possible
to avoid key conflicts and access only the necessary object types without needing
extensive key design changes.
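Both ways of avoiding key conflicts can be sketched with plain dictionaries standing in for buckets. The session ID, the object names, and the helper function are hypothetical.

```python
def object_key(session_id: str, object_name: str) -> str:
    """Build a key such as '<sessionID>_userProfile' for a shared bucket."""
    return f"{session_id}_{object_name}"

session_id = "a1b2c3"  # hypothetical session ID

# Option 1: one shared bucket, with the object name appended to each key.
shared_bucket = {}
shared_bucket[object_key(session_id, "userProfile")] = b'{"language": "en"}'
shared_bucket[object_key(session_id, "shoppingCart")] = b'{"items": []}'

# Option 2: one bucket per data type, keyed by the session ID alone.
typed_buckets = {"userProfile": {}, "shoppingCart": {}}
typed_buckets["userProfile"][session_id] = b'{"language": "en"}'
typed_buckets["shoppingCart"][session_id] = b'{"items": []}'

# Either way, the profile can be read without touching the cart.
profile_from_shared = shared_bucket[object_key(session_id, "userProfile")]
profile_from_typed = typed_buckets["userProfile"][session_id]
```

In both layouts the same session ID leads to two distinct entries, so the profile and the cart cannot overwrite each other.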

Example of Redis Use:

• Redis supports data structures such as lists, sets, and hashes, allowing it to store
more structured information like states, visit logs, or address types, making it well
suited to data that requires order or grouping.

6 What are the features of key-value stores? Explain in detail

The key-value store model provides a simple and efficient approach to data management,
offering features that differ significantly from those of traditional relational databases.

1. Consistency

• Key-value stores are typically optimized for high performance, particularly in
distributed settings, using an eventually consistent model. This means that changes
made to the data may take time to propagate across all nodes, which can lead to
temporary inconsistencies. For instance, in Riak, users can choose either "last write
wins" or "multiple values returned" for handling conflicting writes; the latter leaves
the resolution to the client.

• This flexibility in consistency settings can be defined at the bucket level, where
options such as allowSiblings, nVal (replication factor), and w (write quorum)
enable control over the balance between data consistency and performance.

2. Transactions

• Transaction support in key-value stores is limited or absent; in particular, multi-key
transactions are generally not supported. To manage transactional requirements,
some key-value stores, like Riak, employ a quorum model for writes and reads. By
configuring values like N (total replicas), W (write quorum), and R (read quorum),
users can achieve a level of reliability in write success and data availability.
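The quorum arithmetic can be sketched as below. The function names are hypothetical, and the overlap rule R + W > N is the commonly stated condition for a read quorum to include at least one replica holding the latest successful write; it is shown here as an illustration rather than as a description of Riak's internals.

```python
def write_succeeds(acknowledgements: int, w: int) -> bool:
    """A write is reported successful once at least W replicas acknowledge it."""
    return acknowledgements >= w

def tolerated_write_failures(n: int, w: int) -> int:
    """Number of replicas that can be down while writes still reach W nodes."""
    if not 1 <= w <= n:
        raise ValueError("W must be between 1 and N")
    return n - w

def read_overlaps_write(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n

# Hypothetical settings: 5 replicas, write quorum of 3.
n, w = 5, 3
failures_ok = tolerated_write_failures(n, w)
overlap_with_r3 = read_overlaps_write(n, r=3, w=w)
overlap_with_r1 = read_overlaps_write(n, r=1, w=w)
```

Raising W or R makes overlap more likely at the cost of waiting for more replicas, which is the consistency versus performance balance described above.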

3. Query Features

• Key-value stores primarily support direct key-based lookups, without the complex
query capabilities found in SQL databases. This design is fast but limits flexibility, as
querying by fields within the value requires either application-level filtering or special
indexing capabilities (like Riak Search, which enables Lucene-based querying).

• Key design becomes crucial, as the application must generate or derive meaningful
keys for efficient data retrieval. This constraint makes key-value stores ideal for
applications where queries are predictable, such as session storage or shopping carts.

4. Structure of Data

• The value part of key-value pairs is typically stored as a blob, leaving the content and
structure to the application. This flexibility allows for storing various data types (e.g.,
JSON, XML, text), but it also shifts the responsibility of data interpretation to the
client application.

• For instance, Riak allows users to specify data types in requests via the Content-Type
header, which can simplify deserialization but does not affect how the database stores
the blob.
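The division of labour between store and application can be sketched as follows. This does not use a Riak client: the `content_type` field is a hypothetical stand-in for the Content-Type header, and a plain dictionary stands in for the database.

```python
import json

def serialize(profile: dict) -> dict:
    """Application side: encode the object and record how it was encoded."""
    return {
        "content_type": "application/json",
        "body": json.dumps(profile, sort_keys=True).encode("utf-8"),
    }

def deserialize(stored: dict) -> dict:
    """Application side: use the recorded type to decide how to decode the blob."""
    if stored["content_type"] != "application/json":
        raise ValueError(f"unsupported content type: {stored['content_type']}")
    return json.loads(stored["body"].decode("utf-8"))

# The store keeps the bytes as-is and never looks inside them.
store = {}
store["user:42"] = serialize({"language": "en", "timezone": "UTC"})

blob = store["user:42"]["body"]
profile = deserialize(store["user:42"])
```

The stored body stays a byte string throughout; only the application turns it back into a structured profile.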

5. Scaling

• Sharding, or partitioning data across multiple nodes based on keys, enables key-value
stores to scale horizontally. Each node handles a subset of keys, based on a
deterministic function, allowing seamless expansion by adding more nodes to the
cluster.

• However, this approach also introduces risks; if a node responsible for certain keys
fails, data with those keys becomes unavailable until the node is restored. Key-value
stores address these issues with replication and settings for the CAP theorem (e.g., N,
R, and W values in Riak), offering a trade-off between consistency, availability, and
partition tolerance.
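A simplified sketch of key-based sharding with replication is shown below. Hash-modulo placement on a fixed node list is an assumption chosen for brevity and is not how any particular product places data; the node names, key, and function names are likewise hypothetical.

```python
import zlib

def replicas_for_key(key: str, nodes: list, n: int) -> list:
    """Deterministically pick N nodes for a key: a primary plus its successors."""
    start = zlib.crc32(key.encode("utf-8")) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(n)]

def node_for_key(key: str, nodes: list) -> str:
    """The primary node for a key (the first replica)."""
    return replicas_for_key(key, nodes, 1)[0]

def is_available(key: str, nodes: list, n: int, down: set) -> bool:
    """A key stays readable while at least one of its replicas is up."""
    return any(node not in down for node in replicas_for_key(key, nodes, n))

nodes = ["node-a", "node-b", "node-c"]
key = "session:a1b2c3"
primary = node_for_key(key, nodes)

# Without replication, losing the primary makes the key unavailable.
available_unreplicated = is_available(key, nodes, n=1, down={primary})

# With two replicas, the key survives the loss of its primary.
available_replicated = is_available(key, nodes, n=2, down={primary})
```

The same key always maps to the same nodes, which gives direct lookups, and the replica count decides how many node failures a key can survive.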

7 Explain suitable use cases of key-value stores

Key-value stores offer a simple and efficient storage model suitable for applications where
data can be represented as individual items with unique keys:

1. Storing Session Information:

• Use Case: Each web session is assigned a unique sessionid.

• Advantage: Fast retrieval and storage in a single PUT or GET request, ideal for
storing session data.

• Example Solution: Memcached or Riak can be used, with Riak being the better fit
when availability of the session data is important.

2. User Profiles and Preferences:

• Use Case: User-specific settings such as language, timezone, or access permissions.

• Advantage: All user profile data can be stored in a single object, allowing quick
retrieval of preferences.

• Example Solution: The profile can be stored with a unique user ID as the key,
making it simple to access user settings with a single GET.

3. Shopping Cart Data:

• Use Case: Shopping carts tied to individual users across sessions, browsers, and
devices.

• Advantage: All cart information is stored under a unique userid key, ensuring high
availability.

• Example Solution: A Riak cluster, which maintains availability and fault tolerance,
making it suitable for this application.

When Not to Use Key-Value Stores

While key-value stores are effective for certain types of data storage, they are not ideal for
every scenario:

1. Data Relationships:

• Challenge: Complex relationships or associations between data items are difficult to
model in a key-value store.

• Limitation: Key-value stores lack the querying capability and relational structure that
relational databases provide.

• Alternative: Consider a relational database or a graph database where relationships
among entities are critical.

----------------------------------------END OF MODULE 3----------------------------------------------

Cognizant Software Bootcamp
No ratings yet
Cognizant Software Bootcamp
36 pages
RMAN Archivelog Deletion Issue
No ratings yet
RMAN Archivelog Deletion Issue
2 pages
CSS Computer Science Syllabus 2025-26
No ratings yet
CSS Computer Science Syllabus 2025-26
4 pages
AWS RAG Architectures for Generative AI
No ratings yet
AWS RAG Architectures for Generative AI
34 pages
Digital Sovereignty Control Framework For Military AI-based Cyber Security
No ratings yet
Digital Sovereignty Control Framework For Military AI-based Cyber Security
10 pages

NOSQL Database 21CS745 Question Bank & Answers

Question Bank with Answers

1 Explain, with a neat diagram, partitioning and combining in MapReduce

• To increase parallelism and minimize bottlenecks, we partition the output of the

Data Transfer Reduction with Combining:

• A significant issue in map-reduce jobs is the amount of data being transferred

• A combiner function is essentially a mini-reduce function. In many cases, the

Combining Across Mappers:

• Some map-reduce frameworks require all reducers to be combinable, which
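
The partitioning and combining steps described in this answer can be sketched in plain Python. This is only an illustrative simulation under stated assumptions, not any framework's API: the function names (`map_order`, `combine`, `partition`, `run_job`), the order layout, and the sample quantities are all hypothetical.

```python
from collections import defaultdict

def map_order(order):
    """Map: emit one (product, quantity) pair per line item in an order."""
    return [(item["product"], item["quantity"]) for item in order["items"]]

def combine(pairs):
    """Combiner (a mini-reduce on the map side): sum quantities per key
    so fewer pairs have to be transferred to the reducers."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def partition(key, num_reducers):
    """Assign a key to a reducer. A byte sum stands in for a real hash
    because Python's built-in hash() of str varies between processes."""
    return sum(key.encode("utf-8")) % num_reducers

def run_job(mapper_inputs, num_reducers=2):
    """mapper_inputs holds one list of orders per simulated mapper node."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for orders in mapper_inputs:
        emitted = [pair for order in orders for pair in map_order(order)]
        for key, value in combine(emitted):
            partitions[partition(key, num_reducers)][key].append(value)
    result = {}
    for part in partitions:  # each partition could be reduced in parallel
        for key, values in part.items():
            result[key] = sum(values)  # reduce: same operation as combine
    return result

# Hypothetical sample data: two mapper nodes holding different orders.
node_a = [
    {"items": [{"product": "puerh", "quantity": 8},
               {"product": "genmaicha", "quantity": 2}]},
    {"items": [{"product": "puerh", "quantity": 4}]},
]
node_b = [
    {"items": [{"product": "genmaicha", "quantity": 5},
               {"product": "puerh", "quantity": 1}]},
]
totals = run_job([node_a, node_b])
print(totals)
```

Because summation is associative, the same operation can serve as both combiner and reducer here; node A sends two pre-summed pairs across the simulated network instead of three raw ones.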

2 Explain a two-stage MapReduce example, with a neat diagram

Stage 1: Aggregate Monthly Sales

Example: For each sales record, the mapper might output:

• Key: [Link] puerh

• {year: 2011, month: 12, product: puerh, quantity: 1200}.

Stage 2: Year-on-Year Comparison

• {product: puerh, month: 12, current_quantity: 1200, prior_quantity: 1000, increase:
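
A minimal sketch of the two stages as chained Python functions may help. The record fields follow the examples above; the individual sales records, the function names, and the `increase_pct` field (computed here as a percentage) are assumptions made for illustration.

```python
from collections import defaultdict

def stage1_monthly_totals(sales):
    """Stage 1: sum quantity per (year, month, product)."""
    totals = defaultdict(int)
    for rec in sales:
        totals[(rec["year"], rec["month"], rec["product"])] += rec["quantity"]
    return [
        {"year": y, "month": m, "product": p, "quantity": q}
        for (y, m, p), q in sorted(totals.items())
    ]

def stage2_year_on_year(monthly, current_year):
    """Stage 2: pair each product/month with the same month a year earlier."""
    grouped = defaultdict(dict)
    for rec in monthly:
        grouped[(rec["product"], rec["month"])][rec["year"]] = rec["quantity"]
    comparisons = []
    for (product, month), by_year in sorted(grouped.items()):
        current = by_year.get(current_year)
        prior = by_year.get(current_year - 1)
        if current is None or not prior:
            continue  # skip months without both years (or a zero baseline)
        comparisons.append({
            "product": product,
            "month": month,
            "current_quantity": current,
            "prior_quantity": prior,
            # Assumed definition: percentage change over the prior year.
            "increase_pct": 100.0 * (current - prior) / prior,
        })
    return comparisons

# Hypothetical raw sales records that add up to the totals quoted above.
sales = [
    {"year": 2011, "month": 12, "product": "puerh", "quantity": 700},
    {"year": 2011, "month": 12, "product": "puerh", "quantity": 500},
    {"year": 2010, "month": 12, "product": "puerh", "quantity": 1000},
]
monthly = stage1_monthly_totals(sales)          # reusable intermediate output
report = stage2_year_on_year(monthly, 2011)
print(report)
```

The `monthly` list is the intermediate output of the first stage; it can be stored and fed into other jobs without recomputing the aggregation.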

Benefits of the Two-Stage Approach

• Reusability: The intermediate data can be stored, reused, or analyzed separately.

Reusable Intermediate Outputs: Intermediate results from MapReduce can be stored as

Optimizing Query Patterns: Build materialized views based on actual queries, as

Cluster-Friendly: MapReduce is well-suited for handling large volumes of data across

3 Explain basic MapReduce, with a neat diagram

Core Components of MapReduce

4. Framework Coordination: The MapReduce framework automatically manages data

1. Non-Composable Calculations (e.g., Averages):

o Calculating averages illustrates a limitation in MapReduce because averages

o Counts are straightforward in MapReduce. Each map task emits a count of 1
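
To illustrate why averages do not compose while sums and counts do, here is a small Python sketch. The function names and the sample quantities are hypothetical; the idea is that each map task emits a (sum, count) pair, with a count of 1 per record, and the average is derived only at the end.

```python
from functools import reduce

def map_quantity(order_line):
    """Emit (product, (quantity_sum, count)); each record counts as 1."""
    return order_line["product"], (order_line["quantity"], 1)

def reduce_sum_count(a, b):
    """Composable reduce: add sums and counts component-wise."""
    return a[0] + b[0], a[1] + b[1]

def average(pair):
    """Derive the average from a final (sum, count) pair."""
    total, count = pair
    return total / count

# Hypothetical data: the same product recorded on two different nodes.
node_a = [{"product": "puerh", "quantity": q} for q in (10, 20, 30)]
node_b = [{"product": "puerh", "quantity": 50}]

partial_a = reduce(reduce_sum_count, (map_quantity(o)[1] for o in node_a))
partial_b = reduce(reduce_sum_count, (map_quantity(o)[1] for o in node_b))

# Averaging the two per-node averages ignores how many records each node had.
naive = (average(partial_a) + average(partial_b)) / 2

# Combining (sum, count) pairs first gives the true overall average.
correct = average(reduce_sum_count(partial_a, partial_b))

print(naive, correct)
```

The naive result weights both nodes equally even though one holds three records and the other holds one, which is why the pairs have to be carried through the reduce step.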

o Get: Retrieve the value associated with a key.

o Put: Insert or update a value for a key.

o Delete: Remove a key and its associated value.

o The value in a key-value store is an opaque blob (binary large object),

o Key-value stores operate solely on primary keys, allowing efficient, direct
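
The three operations listed above can be sketched as a tiny in-memory store in Python. This is a toy model under stated assumptions, not the API of Redis, Riak, or any real product; the class name, the `session:42` key, and the session contents are hypothetical.

```python
import json
from typing import Dict, Optional

class KeyValueStore:
    """Toy in-memory key-value store: access is by primary key only."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        """Insert or overwrite the value stored under key."""
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Return the value for key, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        """Remove key and its value; report whether anything was removed."""
        return self._data.pop(key, None) is not None

store = KeyValueStore()

# The store treats the value as an opaque blob; only the application
# knows it is JSON and how to interpret it.
session = {"user": "alice", "language": "en"}
store.put("session:42", json.dumps(session).encode("utf-8"))

loaded = json.loads(store.get("session:42").decode("utf-8"))
removed = store.delete("session:42")
print(loaded, removed, store.get("session:42"))
```

Note that there is no way to query by a field inside the value, such as finding all sessions for a given user; that restriction is what keeps lookups by key simple and fast.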

Popular Key-Value Databases:

• Riak: Uses a "bucket" structure for segmenting keys, aiding organization.

• Memcached, Berkeley DB, HamsterDB, Amazon DynamoDB, Project

Advanced Features in Key-Value Databases:

Bucket Organization in Key-Value Stores:

Example of Redis Use:

6 What are the features of key-value stores? Explain in detail

• Key-value stores are typically optimized for high performance, particularly in

7 Explain suitable use cases of key-value stores

1. Storing Session Information:

• Use Case: Each web session is assigned a unique sessionid.

2. User Profiles and Preferences:

• Use Case: User-specific settings such as language, timezone, or access permissions.

3. Shopping Cart Data:

When Not to Use Key-Value Stores

• Challenge: Complex relationships or associations between data items are difficult to

• Alternative: Consider a relational database or a graph database where relationships

----------------------------------------END OF MODULE 3----------------------------------------------

Common questions

What are the main features of key-value stores that distinguish them from traditional relational databases?

How do MapReduce frameworks handle non-combinable reducers, and what benefits does the pipes-and-filters model provide in such cases?

In what scenarios might key-value stores be unsuitable, and what alternatives exist?

What is the role and importance of key design in key-value stores, and what challenges does it present?

How does partitioning enhance parallelism in a MapReduce task, and what is the role of key-value pairs in this process?

What is a combiner function in MapReduce, and how does it help reduce data transfer between map and reduce phases?

How does the MapReduce framework coordinate between map and reduce phases, and what benefits does this coordination provide?

Why can some reduce functions not be used as combiners in MapReduce, and what alternative approach is taken in such cases?

Describe the typical use cases for key-value stores and the advantages they offer for each scenario.

Explain the two stages in a MapReduce job aimed at summarizing and comparing sales data and their benefits.
