MapReduce Fundamentals and Examples

MapReduce is a programming model used for parallel and distributed processing of large datasets. It consists of two distinct tasks - the map task and the reduce task. The map task processes the data and generates intermediate key-value pairs. The reduce task aggregates the intermediate key-value pairs into a smaller set of key-value pairs that represent the final output. MapReduce was created by Google to solve the issue of bottlenecking that occurs when trying to process large, complex datasets on centralized systems. It divides tasks into smaller parts that are distributed across many computers to be processed in parallel.
Fundamentals of MapReduce with Example

MapReduce is one of the core building blocks of processing in the Hadoop
framework, and it was the starting point of the Hadoop processing model. It is a
programming model that allows us to perform parallel and distributed processing
on huge data sets.

MapReduce consists of two distinct tasks: Map and Reduce. As the name
suggests, the reducer phase takes place after the mapper phase has been
completed. First comes the map job, in which a block of data is read and processed
to produce key-value pairs as intermediate output. The output of the mapper
(key-value pairs) is the input to the reducer. The reducer then aggregates those
intermediate key-value pairs into a smaller set of key-value pairs, which is the
final output.

But why did MapReduce come into the picture? The answer is simple. Traditional
enterprise systems normally have a centralized server to store and process data.
This approach is not suitable for data that has one or more of the following
characteristics: high velocity, variety, volume or complexity. The centralized
server becomes a bottleneck when it has to process such data.

Google solved this bottleneck issue with a programming model called MapReduce.

MapReduce divides a task into small parts and assigns them to many computers.
Later, the results are collected in one place and integrated to form the result
dataset.
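
The divide, assign and collect idea described above can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: threads stand in for the separate computers of a cluster, and the function names (split_into_chunks, count_chunk, run_job) and the sample word list are assumptions made for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_chunks):
    """Divide the task: cut the input into roughly equal parts."""
    size = max(1, -(-len(items) // num_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_chunk(chunk):
    """Work done by one worker on its own part of the data."""
    return Counter(chunk)


def run_job(items, num_workers=3):
    chunks = split_into_chunks(items, num_workers)
    # Assumption: threads stand in for the many computers of a real cluster.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(count_chunk, chunks))
    # Collect the partial results in one place and integrate them.
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return dict(total)


# Hypothetical sample input.
words = ["bus", "car", "train", "car", "bus", "car"]
result = run_job(words)
print(result)
```

Each worker only ever sees its own chunk, so the per-chunk work can run in parallel. Only the small partial results need to be brought together at the end.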

In a word-count job over tweets, for example, the MapReduce algorithm performs
the following actions:
• Tokenize – Tokenizes the tweets into maps of tokens and writes them as
key-value pairs.
• Filter – Filters unwanted words from the maps of tokens and writes the
filtered maps as key-value pairs.
• Count – Generates a token counter per word.
• Aggregate Counters – Prepares an aggregate of similar counter values into
small manageable units.
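
The four actions listed above can be sketched as four small Python functions chained together. The sample tweets, the stop-word list and the function names are hypothetical and chosen only for illustration. A real job would run these stages across many machines.

```python
from collections import Counter

# Hypothetical sample tweets and stop-word list, used only for illustration.
TWEETS = [
    "the bus is late",
    "the train is on time",
    "bus and train are full",
]
STOP_WORDS = {"the", "is", "on", "and", "are"}


def tokenize(tweet):
    """Tokenize: split one tweet into (token, 1) key-value pairs."""
    return [(token.lower(), 1) for token in tweet.split()]


def filter_tokens(pairs):
    """Filter: drop unwanted words from the tokenized pairs."""
    return [(token, n) for token, n in pairs if token not in STOP_WORDS]


def count(pairs):
    """Count: build a counter per word for one tweet."""
    counter = Counter()
    for token, n in pairs:
        counter[token] += n
    return counter


def aggregate(counters):
    """Aggregate counters: merge the per-tweet counters into one result."""
    total = Counter()
    for counter in counters:
        total.update(counter)
    return dict(total)


per_tweet = [count(filter_tokens(tokenize(t))) for t in TWEETS]
word_counts = aggregate(per_tweet)
print(word_counts)
```
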
MapReduce consists of 2 steps:
• Map Function – It takes a set of data and converts it into another set of data,
where individual elements are broken down into tuples (key-value pairs).
Example -
Input - Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN, BUS, buS,
caR, CAR, car, BUS, TRAIN.
Output, as (Key, Value) tuples - (Bus,1), (Car,1), (bus,1),
(car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1), (TRAIN,1),
(BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1).
• Reduce Function – Takes the output from Map as an input and combines
those data tuples into a smaller set of tuples.
Example -
Input – Set of tuples from the previous step.
Output – Smaller set of tuples, with keys compared case-insensitively –
(BUS,7), (CAR,7), (TRAIN,4)
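
The Bus/Car/Train example can be reproduced with a minimal Python sketch of the two steps. It rests on two assumptions that the text does not spell out: the map function upper-cases each key so that Bus, bus and BUS end up under the same key, and a small grouping step (called shuffle here) collects the values for each key before the reduce step. Of the 18 input words, 7 are spellings of bus, 7 of car and 4 of train.

```python
from collections import defaultdict

RAW_INPUT = ("Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN,BUS, "
             "buS, caR, CAR, car, BUS, TRAIN")


def map_function(word):
    """Map: emit one (key, 1) tuple per word.

    Assumption: the key is upper-cased so different spellings share a key.
    """
    return (word.upper(), 1)


def shuffle(pairs):
    """Group the intermediate values by key before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_function(key, values):
    """Reduce: combine all values for one key into a single tuple."""
    return (key, sum(values))


words = [w.strip() for w in RAW_INPUT.split(",")]
intermediate = [map_function(w) for w in words]
output = [reduce_function(k, v) for k, v in sorted(shuffle(intermediate).items())]
print(output)  # [('BUS', 7), ('CAR', 7), ('TRAIN', 4)]
```
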
