0% found this document useful (0 votes)

7 views41 pages

Big Data Analytics and Hadoop Ecosystem

The document outlines the Big Data Analytics Ecosystem, detailing components such as distributed servers, storage, programming models, and processing operations. It emphasizes the Hadoop ecosystem, including its core components like HDFS, YARN, and MapReduce, and explains the MapReduce programming model for processing large datasets. Additionally, it discusses when to use Hadoop and introduces Apache Pig as a higher-level data processing language to simplify Hadoop's complexity.

Uploaded by

MOHIT SOLANKI

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

7 views41 pages

Big Data Analytics and Hadoop Ecosystem

Uploaded by

MOHIT SOLANKI

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

Big Data Analytics Ecosystem

Dr. Ashish Kumar

1
Big Data Analytics Ecosystem
• Where processing is hosted?
– Distributed Servers / Cloud (e.g. Amazon EC2)

• Where data is stored?

– Distributed Storage (e.g. Amazon S3)

• What is the programming model?

– Distributed Processing (e.g. MapReduce)

• How data is stored & indexed?

– High-performance schema-free databases (e.g. MongoDB)

• What operations are performed on data?

– Analytic / Semantic Processing
2
Big Data Analytics Ecosystem
 These is not any standard combination for the above question.
 It varies from project to project.
 Whereas go check this website for Big data tools and create
your own ecosystem depends upon your project:
[Link]

3
Hadoop Distributions / Ecosystem
It's a distributed framework for running
applications on large clusters of commodity
hardware which produces huge data and to
process it.
Apache Software Foundation Project
Java-based implementation of MapReduce
Open source
“Core” Hadoop Components :
 Hadoop Common (formerly Hadoop Core)
 Hadoop MapReduce
 Hadoop YARN (MapReduce 2.0)
 Hadoop Distributed File System (HDFS)

Concept : Moving computation is more

4
Why Hadoop?
Need to process Multi Petabyte Datasets
Expensive to build reliability in each
application.
Nodes fail every day
– Failure is expected, rather than exceptional.
– The number of nodes in a cluster is not
constant.
Need common infrastructure
– Efficient, reliable, Open Source Apache License
The above goals are same as Condor, but
Workloads are IO bound and not CPU bound

5
The Hadoop Ecosystem

6
The Hadoop Ecosystem - New

7
The Hadoop Ecosystem
Following are the components that collectively form
a Hadoop ecosystem:
 HDFS: Hadoop Distributed File System
 YARN: Yet Another Resource Negotiator
 MapReduce: Programming based Data Processing
 Spark: In-Memory data processing
 PIG, HIVE: Query based processing of data services
 HBase: NoSQL Database
 Mahout, Spark MLLib: Machine Learning algorithm
libraries
 Solar, Lucene: Searching and Indexing
 Zookeeper: Managing cluster
 Oozie: Job Scheduling
8
Typical Hadoop Cluster
Aggregation
switch

Rack
switch

 Typically in 2 level architecture

 40 nodes/rack, 1000-4000 nodes in cluster
 1 Gbps bandwidth within rack, 8 Gbps out of
rack
 Node specs (Yahoo terasort): 8 x 2GHz cores, 8
GB RAM, 4 disks (= 4 TB?)
9
Typical Hadoop Cluster

10
Hadoop Distributed File System
Files split into 128MB blocks
Blocks replicated across several datanodes
(usually 3)
Single namenode stores metadata (file names,
block locations, etc)
Optimized for large files, sequential reads
Files are append-only

HDFS

11
MapReduce
MapReduce is a software framework and programming model
used for processing huge amounts of data.
MapReduce program work in two phases:
 Map- tasks deal with splitting and mapping of data.
Reduce- tasks shuffle and reduce the data.
MapReduce programs are parallel in nature. Very useful for
performing large-scale data analysis using multiple machines in
the cluster.
MapReduce is a programming model or pattern within the
Hadoop framework that is used to access big data stored in the
Hadoop File System (HDFS).
It is a core component, integral to the functioning of the Hadoop
framework.
12
MapReduce
Goal: count the number of books in the
library.

Map:
You count up shelf #1, I count up shelf #2.
(The more people we get, the faster this
part goes)

Reduce:
We all get together and add up our
individual counts.
13
MapReduce In A Nutshell

14
A Word Count Example of MapReduce
Let us understand, how a MapReduce works
by taking an example where I have a text file
called [Link] whose contents are as
follows:

Deer, Bear, River, Car, Car, River, Deer, Car

and Bear

Now, suppose, we must perform a word count

on the [Link] using MapReduce. So, we
will be finding the unique words and the
number of occurrences of those unique 15
A Word Count Example of MapReduce

16
Another Example

17
MapReduce Jobs
Input Splits: An input to a MapReduce job is
divided into fixed-size pieces called input
splits Input split is a chunk of the input that is
consumed by a single map
Mapping: This is the very first phase in the
execution of map-reduce program. In this phase
data in each split is passed to a mapping function
to produce output values.
Shuffling: This phase consumes the output of
Mapping phase. Its task is to consolidate the
relevant records from Mapping phase output.
Reducing: In this phase, output values from the
Shuffling phase are aggregated. This phase
combines values from Shuffling phase and18
MapReduce Tasks:
Hadoop divides these jobs into tasks.

1. Map tasks (Splits & Mapping)

2. Reduce tasks (Shuffling, Reducing)

19
The MapReduce Paradigm
Platform for reliable, scalable parallel
computing (Programing)
Abstracts issues of distributed and parallel
environment from programmer.
Runs over distributed file systems
 Google File System
 Hadoop Distributed File System (HDFS)

20
MapReduce Programming Model
Inspired from map and reduce operations
commonly used in functional programming
languages like Lisp.
Input: a set of key/value pairs
User supplies two functions:
map(k,v)  list(k1,v1)
reduce(k1, list(v1))  v2
(k1,v1) is an intermediate key/value pair
Output is the set of (k1,v2) pairs

21
MapReduce: The Map Step

22
MapReduce: The Reduce Step

23
Pseudo-code
map(String input_key, String input_value):
// input_key: document name
// input_value: document contents
for each word w in input_value:
EmitIntermediate(w, "1");
// Group by step done by system on key of intermediate Emit
above, and // reduce called on list of values in each group.
reduce(String output_key, Iterator
intermediate_values):
// output_key: a word
// output_values: a list of counts
int result = 0;
for each v in intermediate_values:
result += ParseInt(v);
Emit(AsString(result));

24
MapReduce: Execution Overview

25
MapReduce Characteristics
 Very large scale data: peta, exa bytes
 Write once and read many data: allows for parallelism
without mutexes
 Map and Reduce are the main operations: simple code
 There are other supporting operations such as combine and
partition (out of the scope of this talk).
 All the map should be completed before reduce operation
starts.
 Map and reduce operations are typically performed by the
same physical processor.
 Number of map tasks and reduce tasks are configurable.
 Operations are provisioned near the data.
 Commodity hardware and storage.
 Runtime takes care of splitting and moving data for
operations.
 Special distributed file system. Example: Hadoop26
Classes of problems “MapReducable”
Benchmark for comparing: Jim Gray’s challenge
on data-intensive computing. Ex: “Sort”
Google uses it (we think) for wordcount,
adwords, pagerank, indexing data.
Simple algorithms such as grep, text-indexing,
reverse indexing
Bayesian classification: data mining domain
Facebook uses it for various operations:
demographics
Financial services use it for analytics
Astronomy: Gaussian analysis for locating extra-
terrestrial objects.
Expected to play a critical role in semantic web
27
Hadoop Cluster Architecture

28
MapReduce vs YARN

29
When to use Hadoop?
Generally, always when “standard tools”
don’t work anymore because of sheer data
size
(rule of thumb: if your data fits on a
regular hard drive, your better off sticking
to Python/SQL/Bash/etc.!)

Aggregation across large data sets: use

the power of Reducers!

Large-scale ETL operations (extract,

transform, load) 30
What is Apache Pig?
Making Hadoop Easy.
Pig Latin, a high level data processing language.
An engine that executes Pig Latin locally or on a
Hadoop cluster.
Pig is Hadoop Subproject.
Apache Incubator: October’07-October’08
Higher level languages:
 Increase programmer productivity
 Decrease duplication of effort
 Open the system to more users
Pig insulates you against hadoop complexity
 Hadoop version upgrades
 JobConf configuration tuning
 Job chains
31
An Example Problem
• Data
– User records
– Pages served
• Question: the 5
pages most
visited by users
aged 18 - 25.

32
In Map Reduce
import
import
[Link];
[Link]; }
[Link]("OK"); [Link]([Link]);
[Link]([Link]);
import [Link]; [Link]([Link]);
import [Link]; // Do the cross product and collect the values [Link](lp, new
for (String s1 : first) { Path("/user/gates/pages"));
import [Link]; for (String s2 : second) { [Link](lp,
import [Link]; String outval = key + "," + s1 + "," + s2; new Path("/user/gates/tmp/indexed_pages"));
import [Link]; [Link](null, new Text(outval)); [Link](0);
import [Link]; [Link]("OK"); Job loadPages = new Job(lp);
import [Link]; }
import [Link]; } JobConf lfu = new JobConf([Link]);
import [Link]; } lfu.s
e tJobName("Load and Filter Users");
import [Link]; } [Link]([Link]);
import [Link]; public static class LoadJoined extends MapReduceBase [Link]([Link]);
import org.a
p [Link]; implements Mapper<Text, Text, Text, LongWritable> { [Link]([Link]);
import [Link]; [Link]([Link]);
import [Link]; public void map( [Link]
InputPath(lfu, new
import [Link]; Text k, Path("/user/gates/users"));
import [Link]; Text val, [Link](lfu,
import [Link]; OutputColle
ctor<Text, LongWritable> oc, new Path("/user/gates/tmp/filtered_users"));
import [Link]; Reporter reporter) throws IOException { [Link](0);
import [Link]; // Find the url Job loadUsers = new Job(lfu);
import [Link]; String line = [Link]();
import [Link]; int firstComma = [Link](','); JobConf join = new JobConf(
[Link]);
import [Link]
ontrol; int secondComma = [Link](',', first
Comma); [Link]("Join Users and Pages");
import [Link]; String key = [Link](firstComma, secondComma); [Link]([Link]);
// drop the rest of the record, I don't need it anymore, [Link]([Link]);
public class MRExample { // just pass a 1 for the combiner/reducer to sum instead. [Link]([Link]);
public static class LoadPages extends MapReduceBase Text outKey = new Text(key); [Link](IdentityMap
[Link]);
implements Mapper<LongWritable, Text, Text, Text> { [Link](outKey, new LongWritable(1L)); [Link]([Link]);
} [Link](join, new
public void map(LongWritable k, Text val, } Path("/user/gates/tmp/indexed_pages"));
OutputCollector<Text, Text> oc, public static class ReduceUrls extends MapReduceBase [Link](join, new
Reporter reporter) throws IOException { implements Reducer<Text, LongWritable, WritableComparable, Path("/user/gates/tmp/filtered_users"));
// Pull the key out Writable> { [Link]
tOutputPath(join, new
String line = [Link](); Path("/user/gates/tmp/joined"));
int firstComma = [Link](','); public void reduce( [Link](50);
String key = [Link]
string(0, firstComma); Text ke
y, Job joinJob = new Job(join);
String value = [Link](firstComma + 1); Iterator<LongWritable> iter, [Link](loadPages);
Text outKey = new Text(key); OutputCollector<WritableComparable, Writable> oc, [Link](loadUsers);
// Prepend an index to the value so we know which file Reporter reporter) throws IOException {
// it came from. // Add up all the values we see JobConf group = new JobConf(MRE
[Link]);
Text outVal = new Text("1
" + value); [Link]("Group URLs");
[Link](outKey, outVal); long sum = 0; [Link]([Link]);
} wh
ile ([Link]()) { [Link]([Link]);
} sum += [Link]().get(); [Link]([Link]);
public static class LoadAndFilterUsers extends MapReduceBase [Link]("OK"); [Link](SequenceFi
[Link]);
implements Mapper<LongWritable, Text, Text, Text> { } [Link]([Link]);
[Link]([Link]);
public void map(LongWritable k, Text val, [Link](key, new LongWritable(sum)); [Link]([Link]);
OutputCollector<Text, Text> oc, } [Link](group, new
Reporter reporter) throws IOException { } Path("/user/gates/tmp/joined"));
// Pull the key out public static class LoadClicks extends MapReduceBase [Link](group, new
String line = [Link](); i
mplements Mapper<WritableComparable, Writable, LongWritable, Path("/user/gates/tmp/grouped"));
int firstComma = [Link](','); Text> { [Link](50);
String value = [Link](
firstComma + 1); Job groupJob = new Job(group);
int age = [Link](value); public void map( [Link](joinJob);
if (age < 18 || age > 25) return; WritableComparable key,
String key = [Link](0, firstComma); Writable val, JobConf top100 = new JobConf([Link]);
Text outKey = new Text(key); OutputCollector<LongWritable, Text> oc, [Link]("Top 100 sites");
// Prepend an index to the value esoknow
w which file Reporter reporter)
throws IOException { [Link]([Link]);
// it came from. [Link]((LongWritable)val, (Text)key); [Link]([Link]);
Text outVal = new Text("2" + value); } [Link]([Link]);
[Link](outKey, outVal); } [Link](SequenceFileOutputF
[Link]);
} public static class LimitClicks extends MapReduceBase [Link]([Link]);
} implements Reducer<LongWritable, Text, LongWritable, Text> { [Link]([Link]);
public static class Join extends MapReduceBase [Link]([Link]);
implements Reducer<Text, Text, Text, Text> { int count = 0; [Link](top100, new
publicvoid reduce( Path("/user/gates/tmp/grouped"));
public void reduce(Text key, LongWritable key, [Link](top100, new
Iterator<Text> iter, Iterator<Text> iter, Path("/user/gates/top100sitesforusers18to25"));
OutputCollector<Text, Text> oc, OutputCollector<LongWritable, Text> oc, [Link](1);
Reporter reporter) throws IOException { Reporter reporter) throws IOException { Job limit = new Job(top100);
// For each value, figure out which file it's from and [Link](groupJob);
store it // Only output the first 100 records
// accordingly. while (count
< 100 && [Link]()) { JobControl jc = new JobControl("Find 100
top sites for users
List<String> first = new ArrayList<String>(); [Link](key, [Link]()); 18 to 25");
List<String> second = new ArrayList<String>(); count++; [Link](loadPages);
} [Link](loadUsers);
while ([Link]()) { } [Link](joinJob);
Text t = [Link](); } [Link](groupJob);
String value = [Link]
String(); public static void main(String[] args) throws IOException { [Link](limit);
if ([Link](0) == '1') JobConf lp = new JobConf([Link]); [Link]();
[Link]([Link](1)); [Link]
t JobName("Load Pages"); }
else [Link]([Link](1)); [Link]([Link]); }

33
In Pig Latin
Users = load ‘users’ as (name, age);
Fltrd = filter Users by
age >= 18 and age <= 25;
Pages = load ‘pages’ as (user, url);
Jnd = join Fltrd by name, Pages by user;
Grpd = group Jnd by url;
Smmd = foreach Grpd generate group,
COUNT(Jnd) as clicks;
Srtd = order Smmd by clicks desc;
Top5 = limit Srtd 5;
store Top5 into ‘top5sites’;

34
How it Works
Pig Latin script is translated to a set of
operators which are placed in one or more
MR jobs and executed.
Filter $1 > 0
A =
load ‘myfile’;
B =
filter A by $1 > 0;
C =
group B by $0;
D =
foreach C generate Combiner
group, COUNT(B) as cnt; COUNT(B)
E = filter D by cnt > 5;
dump E;
Reducer
SUM(COUNT(B))
Filter cnt > 5

35
Ease of Translation
Notice how naturally the components of
the job translate into Pig Latin.
Load Load
Users Pages
Users = load …
Filter by
age
Fltrd = filter …
Pages = load …
Join on Jnd = join …
name
Group on
Grpd = group …
url Smmd = … COUNT()…
Count Srtd = order …
clicks Top100 = limit …
Order by
clicks
Take top 5 36
Apache Pig Architecture
[Link]: When a Pig Latin script is sent to
Hadoop Pig, it is first handled by the parser.
The parser is responsible for checking the
syntax of the script, along with other
miscellaneous checks. Parser gives an
output in the form of a Directed Acyclic
Graph (DAG) that contains Pig Latin
statements, together with other logical
operators represented as nodes.
[Link]: After the output from the
parser is retrieved, a logical plan for DAG is
passed to a logical optimizer. The optimizer
is responsible for carrying out the logical
optimizations.
[Link]: The role of the compiler comes
in when the output from the optimizer is
received. The compiler compiles the logical
plan sent by the optimizing The logical plan
is then converted into a series of
MapReduce tasks or jobs.
[Link] Engine: After the logical plan is
37
converted to MapReduce jobs, these jobs
Pig Vs Map Reduce
Faster development time
Data flow versus programming logic
Many standard data operations (e.g. join)
included
Manages all the details of connecting jobs
and data flow
Copes with Hadoop version change issues

38
Pig Commands
load Read data from file system.
store Write data to file system.
foreach Apply expression to each record and output one or more
records.
filter Apply predicate and remove records that do not return true.
group/cogroup Collect records with the same key from one or more inputs.
join Join two or more inputs based on a key. Various join
algorithms available.
order Sort records based on a key.
distinct Remove duplicate records.
union Merge two data sets.
split Split data into 2 or more sets, based on filter conditions.
stream Send all records through a user provided executable.
sample Read a random sample of the data.
limit Limit the number of records.
39
Users Extending Pig: PigPy
Created by Mashall Weir at Zattoo
Uses Python to create Pig Latin scripts on
the fly
Enables looping
Branching based on job results
Submits Pig jobs from Python scripts
Cache intermediate calculations
Avoid variable name collisions in large
scripts

40
Thank You

Vietnam Real Estate Big Data Insights
No ratings yet
Vietnam Real Estate Big Data Insights
44 pages
Overview of Hadoop Ecosystem Components
No ratings yet
Overview of Hadoop Ecosystem Components
41 pages
Understanding Hadoop Ecosystem Basics
No ratings yet
Understanding Hadoop Ecosystem Basics
38 pages
MapReduce and Hadoop Overview Guide
No ratings yet
MapReduce and Hadoop Overview Guide
69 pages
MapReduce in Cloud Computing
No ratings yet
MapReduce in Cloud Computing
138 pages
Understanding Hadoop and MapReduce
No ratings yet
Understanding Hadoop and MapReduce
31 pages
LM - Unit-3 Sec
No ratings yet
LM - Unit-3 Sec
12 pages
Big Data 4.1
No ratings yet
Big Data 4.1
39 pages
Understanding Hadoop and MapReduce Basics
No ratings yet
Understanding Hadoop and MapReduce Basics
31 pages
Introduction to Hadoop and Its Ecosystem
No ratings yet
Introduction to Hadoop and Its Ecosystem
43 pages
Introduction to Hadoop Architecture
No ratings yet
Introduction to Hadoop Architecture
12 pages
Understanding the Hadoop Ecosystem
No ratings yet
Understanding the Hadoop Ecosystem
30 pages
Introduction To Map Reduce
No ratings yet
Introduction To Map Reduce
50 pages
CH 2. HADOOP-Mapreduce.
No ratings yet
CH 2. HADOOP-Mapreduce.
102 pages
Introduction to Hadoop and Big Data
No ratings yet
Introduction to Hadoop and Big Data
20 pages
Understanding MapReduce in Hadoop
No ratings yet
Understanding MapReduce in Hadoop
23 pages
CC - Unit-5 - Map Reduce
No ratings yet
CC - Unit-5 - Map Reduce
20 pages
Big Data and Hadoop Overview
No ratings yet
Big Data and Hadoop Overview
32 pages
Hadoop Ecosystem: MapReduce Overview
No ratings yet
Hadoop Ecosystem: MapReduce Overview
83 pages
BDA3rdunit Final
No ratings yet
BDA3rdunit Final
41 pages
Introduction to Hadoop Architecture
No ratings yet
Introduction to Hadoop Architecture
101 pages
Data Analytics Frameworks & Visualization
No ratings yet
Data Analytics Frameworks & Visualization
82 pages
MapReduce: Key-Value Processing Explained
No ratings yet
MapReduce: Key-Value Processing Explained
44 pages
Da Unit 5
No ratings yet
Da Unit 5
52 pages
Overview of Hadoop Framework
No ratings yet
Overview of Hadoop Framework
2 pages
Data Analytics: MapReduce & Visualization
No ratings yet
Data Analytics: MapReduce & Visualization
58 pages
Cloud Computing Technologies Overview
No ratings yet
Cloud Computing Technologies Overview
9 pages
Hadoop Unit 2
No ratings yet
Hadoop Unit 2
89 pages
History and Overview of Hadoop Framework
No ratings yet
History and Overview of Hadoop Framework
13 pages
CAP Theorem in Big Data and NoSQL
No ratings yet
CAP Theorem in Big Data and NoSQL
16 pages
Understanding Apache Hadoop Framework
No ratings yet
Understanding Apache Hadoop Framework
19 pages
Hadoop and Big Data
No ratings yet
Hadoop and Big Data
41 pages
Understanding MapReduce Counters in Hadoop
100% (1)
Understanding MapReduce Counters in Hadoop
31 pages
Understanding the MapReduce Framework
No ratings yet
Understanding the MapReduce Framework
55 pages
NoSQL and Hadoop for Big Data Processing
No ratings yet
NoSQL and Hadoop for Big Data Processing
19 pages
Understanding Hadoop: Big Data Framework
No ratings yet
Understanding Hadoop: Big Data Framework
13 pages
Hadoop
No ratings yet
Hadoop
4 pages
Overview of Hadoop Ecosystem Components
No ratings yet
Overview of Hadoop Ecosystem Components
38 pages
Introduction to Hadoop Ecosystem
No ratings yet
Introduction to Hadoop Ecosystem
50 pages
Big Data Processing with Apache Hadoop
No ratings yet
Big Data Processing with Apache Hadoop
40 pages
Overview of Hadoop Framework and Components
No ratings yet
Overview of Hadoop Framework and Components
24 pages
Unit 5 Complete
No ratings yet
Unit 5 Complete
63 pages
Understanding Hadoop and MapReduce
No ratings yet
Understanding Hadoop and MapReduce
44 pages
Big Data Processing with Hadoop & MapReduce
No ratings yet
Big Data Processing with Hadoop & MapReduce
40 pages
Introduction to Hadoop Framework
No ratings yet
Introduction to Hadoop Framework
55 pages
Understanding Hadoop and MapReduce Basics
No ratings yet
Understanding Hadoop and MapReduce Basics
52 pages
BDS - Chapter 5 - Part 1 - Hadoop
No ratings yet
BDS - Chapter 5 - Part 1 - Hadoop
45 pages
Introduction To Hadoop
No ratings yet
Introduction To Hadoop
11 pages
Week 2 - Hadoop Slides
No ratings yet
Week 2 - Hadoop Slides
44 pages
Overview of the Hadoop Ecosystem
No ratings yet
Overview of the Hadoop Ecosystem
26 pages
Apache Hadoop Overview and MapReduce Guide
No ratings yet
Apache Hadoop Overview and MapReduce Guide
35 pages
Understanding Hadoop Ecosystem Basics
No ratings yet
Understanding Hadoop Ecosystem Basics
87 pages
Cloud Computing with MapReduce & Hadoop
No ratings yet
Cloud Computing with MapReduce & Hadoop
53 pages
Overview of Hadoop Ecosystem
No ratings yet
Overview of Hadoop Ecosystem
7 pages
Overview of the Hadoop Ecosystem
No ratings yet
Overview of the Hadoop Ecosystem
13 pages
3-4. Hadoop & Map Reduce
No ratings yet
3-4. Hadoop & Map Reduce
73 pages
Overview of Hadoop and Big Data Analytics
100% (1)
Overview of Hadoop and Big Data Analytics
25 pages
BigData - Hadoop - MapReduce - Assignment (1)
No ratings yet
BigData - Hadoop - MapReduce - Assignment (1)
11 pages
Overview of Hadoop Modules and Ecosystem
No ratings yet
Overview of Hadoop Modules and Ecosystem
33 pages
Apache Spark: Fast Big Data Processing
No ratings yet
Apache Spark: Fast Big Data Processing
45 pages
Understanding CORBA Architecture in Java
No ratings yet
Understanding CORBA Architecture in Java
3 pages
Microcontroller Integration and Basics
No ratings yet
Microcontroller Integration and Basics
14 pages
RTOS for ARM Microcontrollers
No ratings yet
RTOS for ARM Microcontrollers
19 pages
UAC Virtualisation Error in Sage 300 Ops
No ratings yet
UAC Virtualisation Error in Sage 300 Ops
5 pages
Computer Fundamentals: 20CS11T Guide
67% (3)
Computer Fundamentals: 20CS11T Guide
10 pages
GP/LP Communication Cable Details
No ratings yet
GP/LP Communication Cable Details
6 pages
GameCenter Startup Log Analysis
No ratings yet
GameCenter Startup Log Analysis
9 pages
Site Quality Acceptance Certificate
No ratings yet
Site Quality Acceptance Certificate
43 pages
HPE Aruba's Secure Access Edge Solutions
No ratings yet
HPE Aruba's Secure Access Edge Solutions
25 pages
Ransomware Simulation Script for Windows
No ratings yet
Ransomware Simulation Script for Windows
2 pages
Understanding Citrix Personal vDisk
No ratings yet
Understanding Citrix Personal vDisk
53 pages
Computer Architecture Final Exam Questions
No ratings yet
Computer Architecture Final Exam Questions
3 pages
Aruba 7200 Series Mobility Controllers
No ratings yet
Aruba 7200 Series Mobility Controllers
7 pages
Prelim Lab Quiz 2 Review
No ratings yet
Prelim Lab Quiz 2 Review
18 pages
BMC Remedy AR Systems 8.1.00 Online Documentation PDF
No ratings yet
BMC Remedy AR Systems 8.1.00 Online Documentation PDF
4,492 pages
Understanding Software Engineering Basics
No ratings yet
Understanding Software Engineering Basics
5 pages
Sales Order Z-Report Functional Spec
No ratings yet
Sales Order Z-Report Functional Spec
10 pages
Yaskawa A1000 Instruction
100% (1)
Yaskawa A1000 Instruction
12 pages
An Introduction To Programming For Hackers
No ratings yet
An Introduction To Programming For Hackers
62 pages
2635186395e6858091f3622fd9c1e340
No ratings yet
2635186395e6858091f3622fd9c1e340
257 pages
Cloud Computing Overview for BCA Students
No ratings yet
Cloud Computing Overview for BCA Students
84 pages
Sensors: Learning To Detect Cracks On Damaged Concrete Surfaces Using Two-Branched Convolutional Neural Network
No ratings yet
Sensors: Learning To Detect Cracks On Damaged Concrete Surfaces Using Two-Branched Convolutional Neural Network
18 pages
NCA-5710 BIOS User Manual Guide
No ratings yet
NCA-5710 BIOS User Manual Guide
73 pages
Understanding Flex Fields in Oracle Fusion
No ratings yet
Understanding Flex Fields in Oracle Fusion
3 pages
Chinese CLIP Model Implementation Guide
No ratings yet
Chinese CLIP Model Implementation Guide
8 pages
Book Billing System Project Report
No ratings yet
Book Billing System Project Report
20 pages
Ethernet to RS485 Bridge ASIC Design
No ratings yet
Ethernet to RS485 Bridge ASIC Design
30 pages
ECG-2250: Reliable Diagnosis & Monitoring
No ratings yet
ECG-2250: Reliable Diagnosis & Monitoring
4 pages
Instruction Set Architecture Overview
No ratings yet
Instruction Set Architecture Overview
62 pages

Big Data Analytics and Hadoop Ecosystem

Uploaded by

Big Data Analytics and Hadoop Ecosystem

Uploaded by

Big Data Analytics Ecosystem

Dr. Ashish Kumar

• Where data is stored?

• What is the programming model?

• How data is stored & indexed?

• What operations are performed on data?

Concept : Moving computation is more

 Typically in 2 level architecture

Deer, Bear, River, Car, Car, River, Deer, Car

Now, suppose, we must perform a word count

1. Map tasks (Splits & Mapping)

2. Reduce tasks (Shuffling, Reducing)

Aggregation across large data sets: use

Large-scale ETL operations (extract,

You might also like