
Big Data Analytics Hadoop Installation Guide

Big Data Analytics File

Uploaded by

Kartikey TRIPATHI


NETAJI SUBHAS UNIVERSITY OF TECHNOLOGY

EAST CAMPUS
Geeta Colony, New Delhi - 110031

BIG DATA ANALYTICS

Course Code – CBCPC11

PRACTICAL FILE

Submitted By: Aarav Jain
Submitted To: Shajal Afaq

Roll No: 2022UCB6063

INDEX
S.No. | Experiment | Date | Submission | Sign
Experiment – 1
AIM: Installation of VMware to set up the Hadoop environment and its ecosystem.

OUTPUT:
Experiment – 2
AIM: To set up Hadoop in its three operating modes: (a) standalone,
(b) pseudo-distributed, (c) fully distributed.

DESCRIPTION:
Hadoop is written in Java, so you need Java installed on your machine. Hadoop 2.7.x, the version used in this file, requires Java 7 or later. Sun's (now Oracle's) Java Development Kit is the one most widely used with Hadoop, although others have been reported to work.

Hadoop runs on Unix and Windows. Linux is the only supported production platform, but other flavours of Unix (including Mac OS X) can be used to run Hadoop for development. Windows is supported only as a development platform and additionally requires Cygwin. During installation, include the OpenSSH package if you plan to run Hadoop in pseudo-distributed mode.
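The Java requirement can also be checked from a script. The following is a minimal Python sketch. It assumes that `java -version` prints a string such as `java version "1.8.0_60"` (Java 8 and earlier) or `openjdk version "11.0.2"` (Java 9 and later), usually on stderr. The function names and the minimum of 7 are illustrative assumptions, with the minimum based on Hadoop 2.7.x dropping Java 6 support.

```python
import re
import shutil
import subprocess
from typing import Optional

MIN_JAVA_MAJOR = 7  # assumption: minimum Java major version for Hadoop 2.7.x


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output text."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier report "1.<major>.x"; Java 9+ report "<major>.x.y".
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_ok() -> bool:
    """Run `java -version` if Java is on the PATH and compare with the minimum."""
    if shutil.which("java") is None:
        return False
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    major = java_major_version(result.stderr + result.stdout)
    return major is not None and major >= MIN_JAVA_MAJOR


if __name__ == "__main__":
    print(java_major_version('java version "1.8.0_60"'))  # 8
```

The regular expression handles both version schemes, so one check works whether the machine has the JDK 8 build used in this file or a newer OpenJDK.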

ALGORITHM:
a) STEPS INVOLVED IN INSTALLING HADOOP IN STANDALONE MODE:

1. Install SSH using the command: sudo apt-get install ssh
2. Generate an RSA key pair with an empty passphrase using the command: ssh-keygen -t rsa -P ''
3. Store the public key in the authorized keys file using the command:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
4. Extract Java using the command: tar -xzf jdk-8u60-linux-i586.tar.gz
5. Extract Eclipse using tar -xzf followed by the name of the downloaded Eclipse archive.
6. Extract Hadoop using the command: tar -xzvf hadoop-2.7.1.tar.gz
7. Move Java to /usr/lib/jvm and Eclipse to /opt, and configure the Java path.
8. Export the Java path and the Hadoop path in ~/.bashrc.
9. Check whether the installation is successful by checking the Java version (java -version) and the Hadoop version (hadoop version).
10. Check that Hadoop works correctly in standalone mode by running the word count example from the Hadoop examples JAR file bundled with the distribution.
11. If the word counts are displayed correctly in the part-r-00000 output file, standalone mode is installed successfully.
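The word count check in steps 10 and 11 relies on the map, shuffle and reduce phases of MapReduce. The Python sketch below imitates those phases in memory on a small piece of text, only to show what the job computes. It is not Hadoop's implementation. It assumes whitespace tokenization, roughly what the bundled example does, and prints tab-separated `word<TAB>count` lines sorted by word, which is the layout typically seen in `part-r-00000`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated token, like the mapper."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> List[str]:
    """Sum the counts per word and format 'word<TAB>count' lines sorted by key."""
    return [f"{word}\t{sum(counts)}" for word, counts in sorted(groups.items())]


def word_count(lines: Iterable[str]) -> List[str]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    for row in word_count(["hello hadoop", "hello world"]):
        print(row)  # hadoop 1, hello 2, world 1 (tab-separated)
```

Comparing a few lines of the real `part-r-00000` file against this sketch's result for the same small input is a quick sanity check that the installation produced sensible counts.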
b) STEPS INVOLVED IN INSTALLING HADOOP IN PSEUDO-DISTRIBUTED MODE:

1. To install Hadoop in pseudo-distributed mode, we need to edit the Hadoop configuration files, which reside in the directory /home/systemname/hadoop-2.7.1/etc/hadoop.
2. First, configure hadoop-env.sh by setting the Java path.
3. Configure core-site.xml, whose property tag contains a name and a value: set the default file system name to the value hdfs://localhost:9000. Then configure yarn-site.xml.
4. Configure mapred-site.xml. Before configuring it, copy mapred-site.xml.template to mapred-site.xml.
5. Format the NameNode using the command hdfs namenode -format. Then start the daemons (NameNode, DataNode and the others) with start-dfs.sh and start-yarn.sh, and run jps, which lists all running daemons.
6. Create a directory using the command hdfs dfs -mkdir /csedir and enter some data into a file on the local system. Copy the file from the local directory into HDFS using hdfs dfs -copyFromLocal with /csedir/ as the destination, and run the sample JAR to check whether pseudo-distributed mode is working.
7. Display the contents of the output file using the command hdfs dfs -cat /newdirectory/part-r-00000.
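Steps 2 to 4 edit XML files whose `<configuration>` element holds `<property>` entries, each with a `<name>` and a `<value>`. The Python sketch below writes such files with the standard library. The property names and values are assumptions taken from the usual Apache single-node setup for Hadoop 2.x (`fs.defaultFS`, `mapreduce.framework.name`, `yarn.nodemanager.aux-services`), not from this file. Only `hdfs://localhost:9000` comes from step 3. The helper names and the output directory are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Dict

# Assumed property values for a pseudo-distributed Hadoop 2.x setup.
PSEUDO_DISTRIBUTED_CONFIG: Dict[str, Dict[str, str]] = {
    "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
    "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
    "yarn-site.xml": {"yarn.nodemanager.aux-services": "mapreduce_shuffle"},
}


def build_configuration(properties: Dict[str, str]) -> ET.Element:
    """Build a <configuration> element with one <property> per name/value pair."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def write_config_files(conf_dir: str) -> None:
    """Write each configuration file into conf_dir (e.g. a copy of etc/hadoop)."""
    for filename, properties in PSEUDO_DISTRIBUTED_CONFIG.items():
        tree = ET.ElementTree(build_configuration(properties))
        tree.write(os.path.join(conf_dir, filename),
                   encoding="utf-8", xml_declaration=True)


def read_property(path: str, name: str) -> str:
    """Return the value of a named property from a Hadoop-style XML file."""
    for prop in ET.parse(path).getroot().iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value", default="")
    raise KeyError(name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        write_config_files(scratch)
        print(read_property(os.path.join(scratch, "core-site.xml"), "fs.defaultFS"))
```

The sketch writes into a scratch directory on purpose. Pointing it at a real `etc/hadoop` directory would overwrite any existing properties in those three files.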

c) STEPS INVOLVED IN INSTALLING HADOOP IN FULLY DISTRIBUTED MODE:

1. Stop all single-node clusters:

$ stop-all.sh

2. Decide one node as the NameNode (master) and the remaining nodes as DataNodes (slaves). Copy the public key to all three hosts to get password-less SSH access:

$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub systemname@systemno

3. Configure all configuration files to name the master and slave nodes:

$ cd $HADOOP_HOME/etc/hadoop
$ nano core-site.xml
$ nano hdfs-site.xml

4. Add the host names to the file slaves and save it:

$ nano slaves

5. Configure yarn-site.xml:

$ nano yarn-site.xml

6. On the master node, run:

$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh

[Link] namenode

8. The daemons start on the master and slave nodes.

9. End
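Steps 2 and 4 amount to a small amount of bookkeeping: one `ssh-copy-id` command per host, and one host name per line in the `slaves` file. The Python sketch below generates both from a host list. The host names and the user name are hypothetical placeholders standing in for `systemname@systemno`. The sketch only builds the commands and does not run them.

```python
from typing import List, Sequence


def ssh_copy_id_commands(user: str, hosts: Sequence[str],
                         key_path: str = "$HOME/.ssh/id_rsa.pub") -> List[str]:
    """Build one ssh-copy-id command per host, to be run on the master node."""
    return [f"ssh-copy-id -i {key_path} {user}@{host}" for host in hosts]


def slaves_file_content(master: str, hosts: Sequence[str]) -> str:
    """List every DataNode host, one per line, leaving out the master."""
    slaves = [host for host in hosts if host != master]
    return "\n".join(slaves) + "\n"


if __name__ == "__main__":
    cluster = ["master-node", "slave-1", "slave-2"]  # hypothetical host names
    for command in ssh_copy_id_commands("hduser", cluster):
        print(command)
    print(slaves_file_content("master-node", cluster), end="")
```

This sketch assumes the master does not also run a DataNode. If it should, keep the master's host name in the `slaves` file as well.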

INPUT FORMAT:
ubuntu@localhost> jps

OUTPUT FORMAT:

DataNode, NameNode, SecondaryNameNode, NodeManager, ResourceManager.
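The expected output can be checked automatically. `jps` prints one line per running JVM: a process ID followed by the main class name. The Python sketch below parses that text and reports which of the five expected daemons are missing. The process IDs in the sample are made up.

```python
from typing import Iterable, Set

EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode",
                    "NodeManager", "ResourceManager"}


def running_daemons(jps_output: str) -> Set[str]:
    """Collect class names from lines of the form '<pid> <ClassName>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str,
                    expected: Iterable[str] = EXPECTED_DAEMONS) -> Set[str]:
    """Return the expected daemons that do not appear in the jps output."""
    return set(expected) - running_daemons(jps_output)


if __name__ == "__main__":
    sample = ("4081 NameNode\n4210 DataNode\n4402 SecondaryNameNode\n"
              "4571 ResourceManager\n4701 Jps\n")
    print(sorted(missing_daemons(sample)))  # ['NodeManager']
```

On a fully distributed cluster, the daemons are split across machines. The master typically shows NameNode, SecondaryNameNode and ResourceManager, while each slave shows DataNode and NodeManager. For a per-host check, pass a smaller `expected` set.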

Experiment – 3
AIM: Implement the following file management tasks in Hadoop: (a) adding files and
directories, (b) retrieving files, (c) deleting files.

DESCRIPTION:
HDFS is a scalable distributed file system designed to scale to petabytes of data while running on top of the underlying file system of the operating system. HDFS keeps track of where data resides in a network by associating the name of its rack (or network switch) with the dataset. This allows Hadoop to schedule tasks efficiently on the nodes that contain the data, or on those nearest to it, which optimizes bandwidth utilization. Hadoop provides a set of command-line utilities that work similarly to the Linux file commands and serve as the primary interface to HDFS. We will explore HDFS by interacting with it from the command line, looking at the most common file management tasks in Hadoop, which include:
a) adding files and directories to HDFS,
b) retrieving files from HDFS to the local file system,
c) deleting files from HDFS.
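The locality idea in the description can be illustrated with a simplified scheduling choice. The scheduler prefers a free node that stores a replica of the block. Failing that, it picks a free node on the same rack as a replica, and only then any other free node. The Python sketch below is a toy model under that assumption. The node and rack names are hypothetical, and Hadoop's real schedulers are considerably more involved.

```python
from typing import Dict, Optional, Sequence, Tuple


def pick_node(replica_nodes: Sequence[str],
              free_nodes: Sequence[str],
              rack_of: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Choose a free node for a task reading one block; return (node, locality)."""
    # 1. Node-local: a free node already stores a replica of the block.
    for node in free_nodes:
        if node in replica_nodes:
            return node, "node-local"
    # 2. Rack-local: a free node behind the same switch as some replica.
    replica_racks = {rack_of[node] for node in replica_nodes if node in rack_of}
    for node in free_nodes:
        if rack_of.get(node) in replica_racks:
            return node, "rack-local"
    # 3. Off-rack: any free node; the block must cross between racks.
    if free_nodes:
        return free_nodes[0], "off-rack"
    # No free node at all: the task has to wait.
    return None


if __name__ == "__main__":
    racks = {"n1": "rack1", "n2": "rack1", "n3": "rack2"}  # hypothetical topology
    print(pick_node(["n1"], ["n3", "n2"], racks))  # ('n2', 'rack-local')
```

Each fallback level moves the data across more of the network, which is why the preference order saves bandwidth.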

ALGORITHM:
1) Adding Files and Directories to HDFS:

Before you can run Hadoop programs on data stored in HDFS, you need to put the data into HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory of /user/$USER, where $USER is your login user name. This directory isn't automatically created for you, though, so let's create it with the mkdir command. For the purpose of illustration we use the user name chuck; substitute your own user name in the example commands.
hadoop fs -mkdir /user/chuck
hadoop fs -put [Link] .
hadoop fs -put [Link] /user/chuck
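Relative HDFS paths are resolved against the default working directory /user/$USER. The Python sketch below shows how a relative path maps to an absolute one under that rule. It models path resolution only and never contacts HDFS. It handles plain paths, not full `hdfs://` URIs. The function name and the file name `notes.txt` are hypothetical.

```python
import posixpath


def resolve_hdfs_path(path: str, user: str) -> str:
    """Map a plain HDFS path to an absolute one, with /user/<user> as working dir."""
    if path.startswith("/"):
        return posixpath.normpath(path)
    working_dir = posixpath.join("/user", user)
    return posixpath.normpath(posixpath.join(working_dir, path))


if __name__ == "__main__":
    print(resolve_hdfs_path(".", "chuck"))          # /user/chuck
    print(resolve_hdfs_path("notes.txt", "chuck"))  # /user/chuck/notes.txt
```

This is why `.` as a destination and `/user/chuck` as a destination refer to the same place for the user chuck.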
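2) Retrieving Files from HDFS:

The Hadoop command get copies files from HDFS back to the local file system. To retrieve [Link], we can run the following command:
hadoop fs -get [Link] .
To display the file's contents on the terminal instead of copying it, use:
hadoop fs -cat [Link]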

3) Deleting Files from HDFS:

hadoop fs -rm [Link]
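The three tasks map onto `hadoop fs` subcommands. A common slip in hand-typed commands is omitting the space between `fs` and the option (`fs-put` instead of `fs -put`). The Python sketch below builds the argument lists for `subprocess`, so the spacing is always correct, and it refuses a recursive delete of `/`. The wrapper names are hypothetical. Executing the commands with `run` requires the `hadoop` binary on the PATH.

```python
import shutil
import subprocess
from typing import List


def fs_command(option: str, *paths: str) -> List[str]:
    """Return an argv list such as ['hadoop', 'fs', '-put', 'src', 'dst']."""
    return ["hadoop", "fs", f"-{option}", *paths]


def put(local: str, hdfs_dst: str) -> List[str]:
    return fs_command("put", local, hdfs_dst)


def get(hdfs_src: str, local_dst: str) -> List[str]:
    return fs_command("get", hdfs_src, local_dst)


def remove(hdfs_path: str, recursive: bool = False) -> List[str]:
    """Build an -rm command; refuse to wipe the HDFS root recursively."""
    if recursive and hdfs_path.rstrip("/") == "":
        raise ValueError("refusing to delete the HDFS root recursively")
    if recursive:
        return fs_command("rm", "-r", hdfs_path)
    return fs_command("rm", hdfs_path)


def run(argv: List[str]) -> int:
    """Execute the command if Hadoop is installed; return its exit status."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} is not on the PATH")
    return subprocess.run(argv).returncode
```

For example, `run(put("notes.txt", "/user/chuck"))` would upload a hypothetical local file `notes.txt`. Passing a list instead of a shell string also avoids quoting problems with unusual file names.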

The command to create a directory in HDFS is:
hdfs dfs -mkdir /lendicse
The command to add a directory to HDFS is:
hdfs dfs -put lendi_english

4) Copying from the Local File System to HDFS:

hdfs dfs -copyFromLocal /home/lendi/desktop/shakes/glossary /lendicse/

View the file with the command:
hdfs dfs -cat /lendi_english/glossary

The command for listing items in HDFS is:

hdfs dfs -ls hdfs://localhost:9000/
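The listing printed by `hdfs dfs -ls` starts with a `Found N items` line. After that, it prints one line per entry with these fields:

- permissions
- replication factor (`-` for directories)
- owner
- group
- size in bytes
- modification date and time
- path

The Python sketch below parses that layout into records, assuming paths without embedded newlines. The sample listing is made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HdfsEntry:
    permissions: str
    replication: str  # "-" for directories
    owner: str
    group: str
    size: int         # bytes
    modified: str     # "YYYY-MM-DD HH:MM"
    path: str

    @property
    def is_dir(self) -> bool:
        return self.permissions.startswith("d")


def parse_ls(output: str) -> List[HdfsEntry]:
    """Parse `hdfs dfs -ls` text into entries, skipping the 'Found N items' header."""
    entries: List[HdfsEntry] = []
    for line in output.splitlines():
        if not line.strip() or line.startswith("Found "):
            continue
        # Split into at most 8 fields so a path containing spaces stays intact.
        perms, repl, owner, group, size, date, time, path = line.split(maxsplit=7)
        entries.append(HdfsEntry(perms, repl, owner, group, int(size),
                                 f"{date} {time}", path))
    return entries


if __name__ == "__main__":
    sample = (
        "Found 2 items\n"
        "drwxr-xr-x   - hduser supergroup          0 2015-09-20 10:00 /lendicse\n"
        "-rw-r--r--   1 hduser supergroup       1366 2015-09-20 10:01 /notes.txt\n"
    )
    for entry in parse_ls(sample):
        print(entry.path, "dir" if entry.is_dir else f"{entry.size} bytes")
```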

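The command for recursively deleting a file or directory is hdfs dfs -rm -r followed by the HDFS path; the older form hdfs dfs -rmr is deprecated. Running it on / (as in hdfs dfs -rmr /) removes everything stored in HDFS.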

INPUT: Any data, whether structured, semi-structured or unstructured.

EXPECTED OUTPUT:
