Installing Apache Hadoop 3.2.3 Guide

The document provides a step-by-step guide for installing Hadoop, including prerequisites like Java JDK 8 and necessary configurations in various files such as .bashrc, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. It also includes instructions for setting up SSH and starting Hadoop. Additionally, the document briefly mentions Hadoop commands for file management and directory operations.

Uploaded by

sudhakarpappu123
Copyright
© All Rights Reserved

Big Data DA-1

[Link] Sankar

22BIT0222

Installing Hadoop-
Step 1: Install Java JDK 8
sudo apt install openjdk-8-jdk

Step 2: Add this configuration to your bash file

Now open the .bashrc file and paste these commands.

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:/usr/lib/jvm/java-8-openjdk-amd64/bin
export HADOOP_HOME=~/hadoop-3.2.3/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-[Link]=$HADOOP_HOME/lib/native"
export HADOOP_STREAMING=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-[Link]
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export PDSH_RCMD_TYPE=ssh
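
After reloading the shell, it can help to confirm that the exports above actually took effect. The following is a minimal sketch, not part of the original guide: `check_hadoop_env` is a hypothetical helper name, and the list of variables it checks is an assumption taken from the .bashrc block above.

```shell
# Hypothetical helper: report which variables from the .bashrc block
# are unset or empty. Returns 0 if all are set, 1 otherwise.
check_hadoop_env() {
  missing=0
  for var in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR HADOOP_MAPRED_HOME YARN_HOME; do
    # eval reads the variable whose name is stored in $var (POSIX-compatible).
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var"
      missing=1
    else
      echo "ok: $var=$value"
    fi
  done
  return "$missing"
}
```

A typical use would be to run `source ~/.bashrc` and then `check_hadoop_env`; any line starting with `missing:` points at an export that was not picked up.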

sudo apt-get install ssh

Now go to the [Link] website and download the tar file.

tar -zxvf ~/Downloads/[Link]

cd hadoop-3.2.3/etc/hadoop

Now open hadoop-env.sh and set JAVA_HOME:

sudo nano hadoop-env.sh

JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Step 3: Edit core-site.xml

Now add this configuration to the core-site.xml file.

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>[Link]</name>
<value>*</value>
</property>
<property>
<name>[Link]</name>
<value>*</value>
</property>
<property>
<name>[Link]</name>
<value>*</value>
</property>
<property>
<name>[Link]</name>
<value>*</value>
</property>
</configuration>
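
To double-check what a site file actually contains after editing, a small lookup can be sketched in shell. This is an illustration only: `get_hadoop_prop` is a hypothetical helper, it relies on plain text matching rather than a real XML parser, and the property name used in the example (fs.defaultFS) is assumed from the description of core-site.xml later in this document.

```shell
# Hypothetical helper: print the <value> of one property in a *-site.xml file.
# $1 = path to the XML file, $2 = property name.
# Joins the file into one line so that <name> and <value> may sit on the
# same line or on separate lines.
get_hadoop_prop() {
  tr -d '\n' < "$1" \
    | grep -o "<name>$2</name>[[:space:]]*<value>[^<]*</value>" \
    | sed -e 's/.*<value>//' -e 's/<\/value>//'
}
```

For example, `get_hadoop_prop core-site.xml fs.defaultFS` should print the configured file system URI. Once Hadoop itself is on the PATH, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop has actually loaded.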

Step 4: Edit hdfs-site.xml

Now add this configuration to the hdfs-site.xml file.

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

Step 5: Edit mapred-site.xml

<configuration>
<property>
<name>[Link]</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>

Step 6: Edit yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>[Link]-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>

SSH-

ssh localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
hadoop-3.2.3/bin/hdfs namenode -format

export PDSH_RCMD_TYPE=ssh
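
Running the key commands above a second time prompts to overwrite the key and appends a duplicate line to authorized_keys. A re-runnable variant could look like the sketch below; `setup_passwordless_ssh` is a hypothetical helper name, and the default key path is assumed to be the same ~/.ssh/id_rsa used above.

```shell
# Hypothetical helper: create the key only if it is missing, and add the
# public key to authorized_keys only if it is not already there.
# $1 = private key path (defaults to ~/.ssh/id_rsa, as in the steps above).
setup_passwordless_ssh() {
  key="${1:-$HOME/.ssh/id_rsa}"
  auth="$(dirname "$key")/authorized_keys"
  mkdir -p "$(dirname "$key")"
  # Generate an RSA key with an empty passphrase only when none exists yet.
  [ -f "$key" ] || ssh-keygen -q -t rsa -P '' -f "$key"
  # Append the public key unless an identical line is already present.
  grep -qxF "$(cat "$key.pub")" "$auth" 2>/dev/null || cat "$key.pub" >> "$auth"
  chmod 0600 "$auth"
}
```

After calling `setup_passwordless_ssh`, `ssh -o BatchMode=yes localhost true` should return without asking for a password if the setup worked.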

Step 7: Start Hadoop


[Link]
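
Once the start script has run, the JDK's `jps` tool lists the running Java processes, which gives a quick way to see whether every daemon came up. The sketch below is an assumption-laden illustration: `missing_daemons` is a hypothetical helper, and the five daemon names are the ones a single-node setup with HDFS and YARN is expected to show.

```shell
# Hypothetical helper: read `jps` output on stdin ("<pid> <name>" per line)
# and print the name of every expected Hadoop daemon that is not running.
missing_daemons() {
  # Keep only the process-name column.
  names=$(awk '{print $2}')
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x matches the whole line, so "NameNode" does not match "SecondaryNameNode".
    echo "$names" | grep -qx "$d" || echo "$d"
  done
}
```

Usage would be `jps | missing_daemons`; empty output means all five daemons were found, and any printed name is a daemon whose log under $HADOOP_LOG_DIR is worth checking.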

Hadoop Commands
Creating a directory-
Creating a file in Hadoop and uploading a file from the local system to Hadoop-

Downloading a file from Hadoop to the local system and viewing what is in that file-

Showing what is in the file-

Size of the directory-

Last modified details-

Delete the directory-
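
The operations listed above can be sketched with the `hdfs dfs` command line as follows. This is an illustration rather than the author's original commands: the function name `hdfs_demo`, the /demo directory, and the file names are all hypothetical, and the function assumes the Hadoop daemons are running and `hdfs` is on the PATH.

```shell
# Hypothetical walk-through of the operations listed above.
# Assumes a running HDFS; /demo and the file names are placeholders.
hdfs_demo() {
  work=$(mktemp -d)
  echo "hello hadoop" > "$work/sample.txt"

  hdfs dfs -mkdir -p /demo                         # create a directory
  hdfs dfs -touchz /demo/empty.txt                 # create an empty file in HDFS
  hdfs dfs -put "$work/sample.txt" /demo/          # upload from the local system
  hdfs dfs -get /demo/sample.txt "$work/copy.txt"  # download to the local system
  cat "$work/copy.txt"                             # view the downloaded copy
  hdfs dfs -cat /demo/sample.txt                   # show what is in the file
  hdfs dfs -du -s -h /demo                         # size of the directory
  hdfs dfs -stat "%y %n" /demo/sample.txt          # last modified time and name
  hdfs dfs -rm -r /demo                            # delete the directory

  rm -rf "$work"
}
```

Calling `hdfs_demo` runs the steps in order. Each line can also be run on its own at the prompt, which is usually the easier way to explore the commands.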

Common questions

A misconfigured dfs.replication parameter can lead to inefficient data redundancy or increased risk of data loss. If set too low, it reduces fault tolerance by minimizing the number of data copies, exposing the system to data loss risks. Conversely, setting it too high can strain the storage resources with unnecessary replication, diminishing the cluster's efficiency.

The yarn.nodemanager.aux-services property configures auxiliary services such as mapreduce_shuffle, which is fundamental for supporting MapReduce operations within the YARN framework. It allows data to be shuffled between mappers and reducers, thus enabling the distributed processing of large datasets.

The configuration settings in core-site.xml define fundamental Hadoop properties such as the default file system (fs.defaultFS), which is necessary for identifying the NameNode address. In hdfs-site.xml, settings like dfs.replication configure how data is stored across the Hadoop Distributed File System by defining the number of data replicas. These configurations ensure that Hadoop operates correctly and efficiently utilizes its distributed architecture.

The mapreduce.application.classpath configuration specifies the directories containing the MapReduce libraries and dependencies needed to execute jobs. It influences MapReduce operations by ensuring that all necessary jar files are available during job execution, thus preventing class loading errors and facilitating smooth execution of MapReduce applications.

HADOOP_MAPRED_HOME specifies the location of the MapReduce framework within the Hadoop installation. It is used to configure the classpath for running MapReduce jobs, thus ensuring that all necessary libraries are accessible for job execution and facilitating proper map and reduce task management.

Setting the JAVA_HOME environment variable is crucial because Hadoop requires Java for its execution. This variable tells the Hadoop installation where to find the Java runtime. If it is incorrectly configured, Hadoop might not start, as it would not be able to locate the Java binaries needed for its operation.

Incorrect SSH key setup could result in authentication issues when attempting to remotely execute commands across the Hadoop cluster. This could prevent the automation of service management and make it difficult to start or stop Hadoop services, leading to potential operational inefficiencies and manual overhead.

SSH configuration is essential for Hadoop setup because it allows for the secure and automated execution of commands across different nodes within the Hadoop cluster. It facilitates operations by enabling password-less login between nodes, which is critical for initiating Hadoop services and managing the cluster without manual intervention.

Setting PDSH_RCMD_TYPE to 'ssh' ensures that all parallel command executions in Hadoop use SSH as the remote command execution method. This setting is crucial for tasks that require remote execution, like starting or monitoring distributed processes within the cluster, enhancing secure and efficient management.

An incorrect fs.defaultFS setting can significantly impact a Hadoop cluster by directing data storage and processing commands to an incorrect or non-existent NameNode. This misconfiguration can lead to failures in file operations, as Hadoop clients would be unable to locate the primary coordination point for distributed storage and management.
