Hadoop MapReduce Lab Manual Guide

Uploaded by

Vishnu Kanth

DEPARTMENT OF DATA SCIENCE AND ANALYTICS

24BDA6C20-MAP REDUCE PROGRAMMING LAB MANUAL

LIST OF EXPERIMENTS
1. Exercises to implement a stock count MapReduce program
2. Exercises for implementing a sorting technique using MapReduce
3. Exercises to implement file management tasks using Hadoop
4. Exercises for implementing two different MapReduce programs using joins
5. Create a student database in MongoDB with the fields (SRN, Sname, Degree, Sem, CGPA) and use:
create collection, insert data, find, find one, sort, limit, skip, distinct, projection (CRUD operations)
6. Create an employee database in MongoDB and use update modifiers
7. Create an employee database in MongoDB and use: create collection, insert data, find, find one,
update, upsert, multi (CRUD operations)
8. Exercise for implementing general commands in HBase
9. Exercise for table management commands in HBase
10. Exercises for data manipulation commands in HBase

REFERENCES:
Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, Professional Hadoop® Solutions,
Wiley, ISBN: 9788126551071, 2015.
HADOOP INTRODUCTION:
Hadoop is an open-source framework for storing and processing big data in a distributed
environment across clusters of computers using simple programming models. It is designed to
scale up from single servers to thousands of machines, each offering local computation and
storage.

Hadoop Architecture:

The Apache Hadoop framework includes the following four modules:

Hadoop Common: Contains the Java libraries and utilities needed by the other Hadoop modules.
These libraries provide file system and OS-level abstractions and contain the essential Java
files and scripts required to start Hadoop.

Hadoop Distributed File System (HDFS): A distributed file system that provides
high-throughput access to application data on commodity machines, thus providing very high
aggregate bandwidth across the cluster.

Hadoop YARN: A resource-management framework responsible for job scheduling and
cluster resource management.

Hadoop MapReduce: A YARN-based programming model for parallel processing of
large data sets.

Hadoop Ecosystem:
Hadoop has gained popularity because it can store, analyze, and access large
amounts of data quickly and cost-effectively on clusters of commodity hardware. Apache
Hadoop is in fact a collection of several components rather than a single product.

The Hadoop ecosystem includes several commercial as well as open-source products that
are widely used to make Hadoop more accessible and usable for non-specialists.

MapReduce

Hadoop MapReduce is a software framework for easily writing applications that process large
amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant
manner. In terms of programming, two functions are most common in
MapReduce.
The Map Task: The master node takes the input, divides it into smaller
parts, and distributes them to worker nodes. Each worker node solves its own small problem
and returns its answer to the master node.

The Reduce Task: The master node combines the answers coming from the worker nodes into
some form of output, which is the answer to the original distributed problem.

Generally both the input and the output are stored in a file system. The framework is
responsible for scheduling tasks, monitoring them, and re-executing failed tasks.

Hadoop Distributed File System (HDFS)

HDFS is a distributed file system that provides high-throughput access to data. When data is
pushed to HDFS, it is automatically split into multiple blocks, which are stored and replicated
across the cluster, thus ensuring high availability and fault tolerance.

Note: A file consists of many blocks (large blocks of 64MB and above).
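
As a rough illustration of the note above, the following sketch computes how many blocks a file would occupy and how much raw storage its replicas would use. The 64 MB block size comes from the note; the 200 MB file, the replication factor of 3, and the class and method names are assumptions made for illustration, not values read from a real cluster.

```java
// Illustrative arithmetic only; this sketch does not talk to HDFS.
class HdfsBlockMath {
    static final long MB = 1024L * 1024L;

    // Number of blocks needed: ceiling of fileSize / blockSize.
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Raw bytes stored across the cluster when every block is replicated.
    static long rawStorageBytes(long fileSizeBytes, int replication) {
        return fileSizeBytes * replication;
    }

    public static void main(String[] args) {
        long fileSize = 200 * MB;  // hypothetical 200 MB file
        long blockSize = 64 * MB;  // block size taken from the note above
        int replication = 3;       // assumed replication factor
        System.out.println("blocks = " + blockCount(fileSize, blockSize));             // blocks = 4
        System.out.println("raw MB = " + rawStorageBytes(fileSize, replication) / MB); // raw MB = 600
    }
}
```

The last block holds only the remaining 8 MB, which is why the raw storage is computed from the file size rather than from the block count.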

Here are the main components of HDFS:

Name Node: It acts as the master of the system. It maintains the name system,
i.e., directories and files, and manages the blocks which are present on the Data Nodes.

Data Nodes: They are the slaves which are deployed on each machine and provide
the actual storage. They are responsible for serving read and write requests for the
clients.

Secondary Name Node: It is responsible for performing periodic checkpoints. In the
event of Name Node failure, you can restart the Name Node using the checkpoint.

Hive
Hive is part of the Hadoop ecosystem and provides an SQL-like interface to Hadoop. It is a
data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries,
and the analysis of large datasets stored in Hadoop-compatible file systems.

HBase (Hadoop DataBase)

HBase is a distributed, column-oriented database that is built on top of HDFS and uses it for
the underlying storage. HDFS follows a write-once, read-many-times pattern, but this is not
always sufficient. We may require real-time random read/write access to a huge dataset; this is
where HBase comes into the picture.
INSTALLATION STYLE 1:

i) Set up and install Hadoop in its three operating modes:

• Standalone
• Pseudo Distributed
• Fully Distributed

DESCRIPTION:

Hadoop is written in Java, so you will need to have Java installed on your machine, version 6
or later. Sun's JDK is the one most widely used with Hadoop, although others have been
reported to work.

Hadoop runs on Unix and on Windows. Linux is the only supported production platform, but
other flavors of Unix (including Mac OS X) can be used to run Hadoop for development.
Windows is only supported as a development platform, and additionally requires Cygwin to
run. During the Cygwin installation process, you should include the openssh package if you
plan to run Hadoop in pseudo-distributed mode.

ALGORITHM

STEPS INVOLVED IN INSTALLING HADOOP IN STANDALONE MODE:-

1. Install ssh with the command "sudo apt-get install ssh".

2. Generate a key with the command ssh-keygen -t rsa -P "".

3. Store the key into [Link] by using the command
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

4. Extract Java by using the command tar xvfz [Link].

5. Extract Eclipse by using the command tar xvfz eclipse-jee-mars-R-linux-[Link]

6. Extract Hadoop by using the command tar xvfz [Link]

7. Move Java to /usr/lib/jvm/ and Eclipse to /opt/. Configure the Java path in
the [Link] file.

8. Export the Java path and the Hadoop path in ~/.bashrc.

9. Check whether the installation succeeded by checking the Java version and the Hadoop
version.

10. Check whether the Hadoop instance works correctly in standalone mode by running the
word count example from the jar file bundled with Hadoop.

11. If the word count (EXAMPLE PROGRAM) result is displayed correctly in the part-r-00000 file,
standalone mode is installed successfully.

ALGORITHM

STEPS INVOLVED IN INSTALLING HADOOP IN PSEUDO DISTRIBUTED MODE:-

1. To install pseudo-distributed mode, we need to configure the Hadoop configuration
files that reside in the directory /home/lendi/hadoop-2.7.1/etc/hadoop.

2. First configure the [Link] file by changing the Java path.

3. Configure [Link], which contains a property tag with a name and a value:
name [Link] and value hdfs://localhost:9000.

4. Configure [Link].

5. Configure [Link].

6. Configure [Link]; before configuring it, copy [Link] to mapred-[Link].

7. Now format the name node by using the command hdfs namenode -format.

8. Type the commands [Link] and [Link], which start the daemons
NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager.

9. Run jps, which lists all running daemons. Create a directory in HDFS by using the command
hdfs dfs -mkdir /csedir, enter some data into [Link] using the command nano [Link],
copy it from the local directory to HDFS using the command hdfs dfs -copyFromLocal [Link]
/csedir/, and run the sample wordcount jar file to check whether pseudo-distributed mode is
working.

10. Display the contents of the output file by using the command hdfs dfs -cat /newdir/part-r-00000.

FULLY DISTRIBUTED MODE INSTALLATION: ALGORITHM

1. Stop all single node clusters

$ [Link]

2. Decide one as NameNode (Master) and remaining as DataNodes (Slaves).

3. Copy the public key to all three hosts to get passwordless SSH access

$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub lendi@l5sys24

4. Configure all configuration files to name the Master and Slave nodes.

$ cd $HADOOP_HOME/etc/hadoop

$ nano [Link]

$ nano [Link]

5. Add hostnames to the file slaves and save it.

$ nano slaves

6. Configure $ nano [Link]

7. Do in Master Node

$ hdfs namenode -format

$ [Link]

$ [Link]

8. Format NameNode

9. Daemons Starting in Master and Slave Nodes

10. END

INPUT

ubuntu@localhost> jps

OUTPUT:

DataNode, NameNode, SecondaryNameNode, NodeManager, ResourceManager
INSTALLATION STYLE 2:

Installation Steps –

Following are the steps to install Apache Hadoop on Ubuntu 14.04

Step1. Install Java (OpenJDK) - Since Hadoop is based on Java, make sure you have
the Java JDK installed on the system. Please check the version of Java (it should be 1.7 or
above):
$ java -version
If it returns "The program java can be found in the following packages", Java
has not been installed yet, so execute the following command:
$ sudo apt-get install default-jdk

Step2: Configure Apache Hadoop

1. Open the bashrc file in gedit

$ sudo gedit ~/.bashrc

2. Set java environment variable

export JAVA_HOME=/usr/jdk1.7.0_45/

3. Set Hadoop environment variable

export HADOOP_HOME=/usr/Hadoop 2.6/

4. Apply environment variables

$ source ~/.bashrc

Step3: Install eclipse

Step4: Copy Hadoop plug-ins such as
• [Link]
• [Link]
• [Link]
from the release folder of hadoop2x-eclipse-plugin-master to the Eclipse plugins folder

Step5: In Eclipse, start a new MapReduce project

File->new->other->MapReduce project

Step 6: Copy Hadoop packages such as [Link] [Link]
into the src folder of the MapReduce project

Step 7: Create Mapper, Reducer, and Driver
Inside a project->src->File->new->other->Mapper/Reducer/Driver

Step 8: Copy the log file [Link] from the src folder of Hadoop into the src folder of the MapReduce project

Hadoop is powerful because it is extensible and easy to integrate with other components. Its
popularity is due in part to its ability to store, analyze, and access large amounts of data quickly
and cost-effectively across clusters of commodity hardware.

Apache Hadoop is not a single product but a collection of several
components. Combined, these components make Hadoop very user friendly.

Adding Files and Directories to HDFS

Before you can run Hadoop programs on data stored in HDFS, you'll need to put the data into
HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory of
/user/$USER, where $USER is your login user name. This directory isn't automatically created for
you, though, so let's create it with the mkdir command. For the purpose of illustration, we use
chuck. You should substitute your user name in the example commands.

hadoop fs -mkdir /user/chuck

hadoop fs -put [Link]

hadoop fs -put [Link] /user/chuck

Retrieving Files from HDFS

The Hadoop command get copies files from HDFS back to the local filesystem. To display the
contents of [Link] without copying it, we can run the cat command:

hadoop fs -cat [Link]

Deleting Files from HDFS

hadoop fs -rm [Link]

• Command for creating a directory in HDFS is "hdfs dfs -mkdir /lendicse".
• Adding a directory is done through the command "hdfs dfs -put lendi_english /".

Copying Data from NFS to HDFS

• Command for copying from a directory is "hdfs dfs -copyFromLocal /home/lendi/Desktop/shakes/glossary /lendicse/".
• View the file by using the command "hdfs dfs -cat /lendi_english/glossary".
• Command for listing items in Hadoop is "hdfs dfs -ls hdfs://localhost:9000/".
• Command for deleting files is "hdfs dfs -rm -r /kartheek".

EXPECTED OUTPUT:
SAMPLE EXERCISE-

AIM: To develop a MapReduce program to calculate the frequency of a given word in
a given file

Map Function – It takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key-value pairs).

Example – (Map function in Word Count)

Input
Set of data
Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN,BUS, buS, caR, CAR, car, BUS,
TRAIN

Output
Convert into another set of data
(Key,Value)
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1),
(TRAIN,1),(BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)

Reduce Function – Takes the output from Map as an input and combines those data tuples
into a smaller set of tuples.

Example – (Reduce function in Word Count)

Input Set of Tuples

(output of Map function)
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1),
(TRAIN,1),(BUS,1),
(buS,1),(caR,1),(CAR,1), (car,1), (BUS,1), (TRAIN,1)

Output Converts into smaller set of tuples

(BUS,7), (CAR,7), (TRAIN,4)
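
The two examples above can be sketched in plain Java without any Hadoop classes. This is a minimal simulation under stated assumptions: the class and method names are hypothetical, the input is the comma-separated list from the Map example, and keys are converted to upper case inside the map step so that the reduce step can merge Bus, bus, and BUS into one key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the word count example; no Hadoop classes are used.
class WordCountSketch {

    // Map step: emit one (KEY, 1) pair per comma-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.split(",")) {
            String key = token.trim().toUpperCase();
            if (!key.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(key, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum the values of all pairs that share a key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String input = "Bus, Car, bus, car, train, car, bus, car, train, bus, "
                + "TRAIN,BUS, buS, caR, CAR, car, BUS, TRAIN";
        System.out.println(reduce(map(input))); // {BUS=7, CAR=7, TRAIN=4}
    }
}
```

The map step produces 18 pairs, one per input word, and the reduce step collapses them into three totals.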

Work Flow of Program

Workflow of MapReduce consists of 5 steps
1. Splitting – The splitting parameter can be anything, e.g. splitting by space,
comma, semicolon, or even by a new line ('\n').
2. Mapping – as explained above.
3. Intermediate splitting – the entire process runs in parallel on different nodes of the cluster. In order
to group the data in the "Reduce Phase", records with the same KEY must be on the same node.
4. Reduce – essentially a group-by phase.
5. Combining – The last phase, where all the data (the individual result set from each
node) is combined together to form the result.
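
The requirement in step 3, that records with the same key end up in the same place, can be sketched as a hash partitioning function. The formula below matches the one used by Hadoop's default HashPartitioner (mask the sign bit of the key's hash code, then take the remainder by the number of reducers); everything else is an assumption for illustration: the class and method names are hypothetical, and plain String keys stand in for Hadoop's Text keys, whose hash codes differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of how keys can be routed to reducers during the intermediate step.
class ShuffleSketch {

    // Mask the sign bit so the result is never negative, then take the modulus.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Distribute keys into one bucket per reducer.
    static List<List<String>> shuffle(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReducers)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("BUS", "CAR", "BUS", "TRAIN", "CAR", "BUS");
        List<List<String>> buckets = shuffle(keys, 2);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer " + i + " receives " + buckets.get(i));
        }
    }
}
```

Because the partition depends only on the key, every occurrence of a given word lands in the same bucket, so a single reducer sees all of its values.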

Now Let’s See the Word Count Program in Java

Make sure that Hadoop is installed on your system along with the Java JDK.
Steps to follow
Step 1. Open Eclipse > File > New > Java Project > (Name it – MRProgramsDemo) >
Finish
Step 2. Right Click > New > Package (Name it - PackageDemo) > Finish
Step 3. Right Click on Package > New > Class (Name it - WordCount)
Step 4. Add Following Reference Libraries –
Right Click on Project > Build Path > Add External Archives
• /usr/lib/hadoop-0.20/[Link]
• /usr/lib/hadoop-0.20/lib/[Link]
SOURCE CODE:
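package PackageDemo;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Driver: configures and submits the job.
    // The first remaining argument is the input path, the second the output directory.
    public static void main(String[] args) throws Exception {
        Configuration c = new Configuration();
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
        Path input = new Path(files[0]);
        Path output = new Path(files[1]);
        Job j = new Job(c, "wordcount");
        j.setJarByClass(WordCount.class);
        j.setMapperClass(MapForWordCount.class);
        j.setReducerClass(ReduceForWordCount.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, input);
        FileOutputFormat.setOutputPath(j, output);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }

    // Mapper: splits each line on commas and emits (WORD, 1) for every token.
    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context con)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(",");
            for (String word : words) {
                // Upper-case and trim so that "Bus", "bus" and " BUS" share one key.
                Text outputKey = new Text(word.toUpperCase().trim());
                IntWritable outputValue = new IntWritable(1);
                con.write(outputKey, outputValue);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text word, Iterable<IntWritable> values, Context con)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}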

Make Jar File

Right Click on Project > Export > Select export destination as Jar File > Next > Finish
To move the input file into HDFS, open the terminal and enter the following
command:
[training@localhost ~]$ hadoop fs -put wordcountFile wordCountFile

Run Jar file

(hadoop jar [Link] [Link] PathToInputTextFile
PathToOutputDirectory)
[training@localhost ~]$ hadoop jar [Link] [Link] wordCountFile MRDir1
Result: Open Result
[training@localhost ~]$ hadoop fs -ls MRDir1
Found 3 items
-rw-r--r-- 1 training supergroup 0 2016-02-23 03:36 /user/training/MRDir1/_SUCCESS
drwxr-xr-x - training supergroup 0 2016-02-23 03:36 /user/training/MRDir1/_logs
-rw-r--r-- 1 training supergroup 20 2016-02-23 03:36 /user/training/MRDir1/part-r-00000
[training@localhost ~]$ hadoop fs -cat MRDir1/part-r-00000
BUS 7
CAR 4
TRAIN 6
