Lahore Garrison University
Virtual Systems & Services - Lab
Assignment # 01
Submitted To:
Mam Alishwa Amin
Submitted By:
Syed M. Ali Hamza
Roll No:
Fa-21/BS-IT/057
(Sec-B)
Installation and Setup of Hadoop on Ubuntu
To install Hadoop, you will have to go through various steps, which include:
• Installing Java and configuring environment variables
• Creating a user and configuring SSH
• Installation and configuration of Hadoop
Step 1: Installing Java on Ubuntu
To install Java on Ubuntu, run:
sudo apt install -y default-jdk
sudo apt install -y default-jre
To verify the installation, check the Java version:
java -version
Step 2: Create a user for Hadoop and configure SSH
First, create a new user named hadoop:
sudo adduser hadoop
To give the new user superuser privileges, add it to the sudo group:
sudo usermod -aG sudo hadoop
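To confirm that the group change took effect, a small check along these lines may help. This is only a sketch: the helper name in_group is hypothetical and not part of Ubuntu or Hadoop.

```shell
# Succeed if USER belongs to GROUP (hypothetical helper for illustration).
in_group() {
  local user="$1" group="$2"
  # id -nG prints the user's group names separated by spaces;
  # tr puts one name per line so grep -x can match a whole name.
  id -nG "$user" | tr ' ' '\n' | grep -qx -- "$group"
}

# Example (run after the usermod command above):
# in_group hadoop sudo && echo "hadoop is in the sudo group"
```

Group changes normally apply to new login sessions, so run the check from a fresh login.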
Once done, switch to the hadoop user:
sudo su - hadoop
Next, install the OpenSSH server and client:
sudo apt install openssh-server openssh-client -y
Now, use the following command to generate private and public keys:
ssh-keygen -t rsa
Here, it will ask you:
• Where to save the key (hit enter to save it inside your home directory)
• Create passphrase for keys (leave blank for no passphrase)
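The two prompts above can also be answered on the command line, which is convenient in scripts. The sketch below assumes an empty passphrase is acceptable for a single-node lab setup. The function name generate_key and its key_path parameter are hypothetical.

```shell
# Generate an RSA key pair without interactive prompts (illustrative helper).
generate_key() {
  local key_path="$1"
  mkdir -p "$(dirname "$key_path")"
  # -N "" sets an empty passphrase, -f names the private key file,
  # -q suppresses the informational output.
  ssh-keygen -q -t rsa -N "" -f "$key_path"
}

# Example: generate_key "$HOME/.ssh/id_rsa"
```

If the key file already exists, ssh-keygen asks before overwriting it, so the sketch is meant for a first-time setup.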
Now, add the public key to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Use the chmod command to change the file permissions of authorized_keys:
sudo chmod 640 ~/.ssh/authorized_keys
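With its default StrictModes setting, sshd ignores authorized_keys when the file or its parent directories are writable by group or others. The following sketch checks for that condition. The helper name is hypothetical, and stat -c is the GNU form available on Ubuntu.

```shell
# Succeed if the given file or directory is NOT writable by group or others.
not_group_or_world_writable() {
  local mode
  # stat -c '%a' prints the permission bits in octal, for example 640.
  mode=$(stat -c '%a' "$1") || return 2
  # The leading 0 makes the shell read both numbers as octal;
  # 022 masks the group-write and other-write bits.
  [ "$(( 0$mode & 022 ))" -eq 0 ]
}

# Example:
# not_group_or_world_writable ~/.ssh/authorized_keys || echo "permissions too open"
```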
Finally, verify the SSH configuration:
ssh localhost
If you did not set a passphrase, all you have to do is type yes and hit Enter to accept the host key.
If you added a passphrase for the keys, you will be asked to enter it here.
Step 3: Download and install Apache Hadoop on Ubuntu
If you have created a user for Hadoop, first, log in as the hadoop user:
sudo su - hadoop
Download this release:
wget [Link]
Once you are done with the download, extract the file using the following command:
tar -xvzf [Link]
Next, move the extracted directory to /usr/local/hadoop using the following command:
sudo mv hadoop-3.4.1 /usr/local/hadoop
Now, create a directory to store logs using the mkdir command:
sudo mkdir /usr/local/hadoop/logs
Finally, change the ownership of /usr/local/hadoop to the hadoop user:
sudo chown -R hadoop:hadoop /usr/local/hadoop
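The extract and move steps can also be combined by unpacking the archive straight into the target directory. This is a sketch of an alternative, not the method used above. The function name unpack_into is hypothetical, and writing under /usr/local would still need sudo and the chown step.

```shell
# Unpack TARBALL into DEST, dropping the archive's top-level directory
# (for example hadoop-3.4.1/), so that DEST itself holds bin/, etc/, and so on.
unpack_into() {
  local tarball="$1" dest="$2"
  mkdir -p "$dest"
  # --strip-components=1 removes the first path element of every archive member.
  tar -xzf "$tarball" -C "$dest" --strip-components=1
}

# Example (archive name left as a placeholder):
# unpack_into <downloaded-archive>.tar.gz "$HOME/hadoop"
```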
Step 4: Configure Hadoop on Ubuntu
First, open the .bashrc file using the following command:
sudo nano ~/.bashrc
Jump to the end of the file in the nano text editor and paste the following lines:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-[Link]=$HADOOP_HOME/lib/native"
Save changes and exit from the nano text editor.
To enable the changes, source the .bashrc file:
source ~/.bashrc
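If you repeat this step, pasting the block again leaves duplicate export lines in .bashrc. One way to avoid that is to append each line only when it is missing. This is a minimal sketch, and the helper name append_once is hypothetical.

```shell
# Append LINE to FILE only if FILE does not already contain that exact line.
append_once() {
  local file="$1" line="$2"
  touch "$file"
  # -x matches whole lines only, -F treats the text as a fixed string (no regex).
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example:
# append_once "$HOME/.bashrc" 'export HADOOP_HOME=/usr/local/hadoop'
```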
Step 5: Configure Java environment variables
To use Hadoop, you must enable its core components, which include YARN, HDFS,
MapReduce, and the Hadoop-related project settings.
To do that, define the Java environment variables in [Link].
Edit the [Link]
First, open the [Link]:
sudo nano $HADOOP_HOME/etc/hadoop/[Link]
Press Alt + / to jump to the end of the file, then paste the following lines to set the
Java path (replace java-21 with the JDK version installed on your system):
export JAVA_HOME=/usr/lib/jvm/java-21-openjdk-amd64
export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"
Save changes and exit from the text editor.
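If you are unsure which JDK directory to use for JAVA_HOME, it can usually be derived from the java binary itself. This sketch assumes the usual Ubuntu layout, where /usr/bin/java is a chain of symlinks ending in <jdk-dir>/bin/java. The function name java_home_of is hypothetical.

```shell
# Print the JDK directory that contains the given java binary
# (defaults to the java found on PATH).
java_home_of() {
  local bin="${1:-$(command -v java)}"
  # readlink -f follows every symlink in the chain to the real file.
  bin=$(readlink -f "$bin") || return 1
  # The JDK home is two levels above .../bin/java.
  dirname "$(dirname "$bin")"
}

# Example: java_home_of
```

The printed directory is a candidate value for the JAVA_HOME line above.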
Next, change your current working directory to /usr/local/hadoop/lib:
cd /usr/local/hadoop/lib
Here, download the javax activation file:
sudo wget [Link]api/1.2.0/[Link]
Once done, check the Hadoop version in Ubuntu:
hadoop version
Next, you will have to edit the [Link] to specify the URL for the NameNode.
Edit the [Link]
First, open the [Link] using the following command:
sudo nano $HADOOP_HOME/etc/hadoop/[Link]
And add the following lines in between <configuration> </configuration>:
<property>
<name>[Link]</name>
<value>hdfs://[Link]:9000</value>
<description>The default file system URI</description>
</property>
Save the changes and exit from the text editor.
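For a scripted setup, a file with this shape can be generated instead of being edited by hand. The sketch below writes a complete configuration file holding a single property. The function name write_site_file is hypothetical, the property name and value are passed in rather than assumed, and values containing &, < or > would need XML escaping first.

```shell
# Write a minimal Hadoop-style configuration file containing one property.
write_site_file() {
  local out="$1" name="$2" value="$3"
  # The unquoted EOF delimiter lets $name and $value expand inside the here-document.
  cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# Example (placeholder arguments):
# write_site_file /tmp/example-site.xml example.key example-value
```

The function overwrites the target file, so keep a backup of any existing configuration.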
Next, create a directory to store node metadata using the following command:
sudo mkdir -p /home/hadoop/hdfs/{namenode,datanode}
And change the ownership of the created directory to the hadoop user:
sudo chown -R hadoop:hadoop /home/hadoop/hdfs
Edit the [Link] configuration file
By configuring the [Link], you will define the location for storing node
metadata and the fsimage file.
So first open the configuration file:
sudo nano $HADOOP_HOME/etc/hadoop/[Link]
And paste the following lines in between <configuration> ... </configuration>:
<property>
<name>[Link]</name>
<value>1</value>
</property>
<property>
<name>[Link]</name>
<value>[Link]</value>
</property>
<property>
<name>[Link]</name>
<value>[Link]</value>
</property>
Save changes and exit from the [Link].
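After editing, it can be useful to read a property back and confirm what was saved. This sketch assumes the simple layout used here, with each <name> and <value> on its own line. It is not a full XML parser, and the function name get_property is hypothetical.

```shell
# Print the <value> that follows a given <name> in a Hadoop-style XML file.
get_property() {
  local file="$1" name="$2"
  awk -v want="$name" '
    # On a <name> line, strip the tags and remember whether it is the wanted name.
    /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n); found = (n == want) }
    # On the first <value> line after the wanted name, strip the tags and print.
    found && /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v); print v; exit }
  ' "$file"
}

# Example (placeholder arguments):
# get_property /path/to/config.xml some.property.name
```

An empty result means the property was not found in the file.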
Edit the [Link] file
By editing the [Link], you can define the MapReduce values.
To do that, first, open the configuration file using the following command:
sudo nano $HADOOP_HOME/etc/hadoop/[Link]
And paste the following lines in between <configuration> ... </configuration>:
<property>
<name>[Link]</name>
<value>yarn</value>
</property>
Save and exit from the nano text editor.
Edit the [Link] file
This is the last configuration file that needs to be edited to use the Hadoop service.
The purpose of editing this file is to define the YARN settings.
First, open the configuration file:
sudo nano $HADOOP_HOME/etc/hadoop/[Link]
Paste the following in between <configuration> ... </configuration>:
<property>
<name>[Link]-services</name>
<value>mapreduce_shuffle</value>
</property>
Save changes and exit from the config file.
Finally, use the following command to validate the Hadoop configuration and to format the
HDFS NameNode:
hdfs namenode -format
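Formatting erases the existing HDFS metadata, so it is worth guarding against running it twice. A formatted NameNode directory contains a current/VERSION file, which the sketch below checks for. The function name namenode_formatted is hypothetical, and the example assumes that the NameNode directory configured above is /home/hadoop/hdfs/namenode.

```shell
# Succeed if the given NameNode directory has already been formatted.
namenode_formatted() {
  # Formatting creates <dir>/current/VERSION.
  [ -f "$1/current/VERSION" ]
}

# Example guard (directory path is an assumption, adjust to your configuration):
# namenode_formatted /home/hadoop/hdfs/namenode || hdfs namenode -format
```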
Step 6: Start the Hadoop cluster
To start the Hadoop cluster, you will have to start the previously configured nodes.
Begin with the NameNode and DataNode:
[Link]
Next, start the NodeManager and ResourceManager:
[Link]
To verify whether the services are running as intended, use the following command:
jps
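Reading the jps output by eye works, but a small filter can list what is missing. The sketch below checks only the four daemons named in this step. The function name missing_daemons is hypothetical, and it assumes the usual jps output of one "<pid> <name>" pair per line.

```shell
# Read jps output on stdin and print each expected daemon that is not listed.
missing_daemons() {
  local seen d
  # Keep only the second column (the process name) of every line.
  seen=$(awk '{print $2}')
  for d in NameNode DataNode ResourceManager NodeManager; do
    # -x matches the whole name, so SecondaryNameNode does not count as NameNode.
    printf '%s\n' "$seen" | grep -qx -- "$d" || printf '%s\n' "$d"
  done
}

# Example: jps | missing_daemons
```

No output means all four daemons appear in the list.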
Step 7: Access the Hadoop Web Interface
To access the Hadoop web interface, find your machine's IP address and append port number
9870 to it in your browser's address bar.
My IP is [Link], so I will enter the following:
[Link]
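The same check can be made from the terminal, which helps when no browser is available. This is a sketch under the assumption that curl is installed. Both function names are hypothetical, and the host argument stands for your own IP address.

```shell
# Build the web interface address for a host, using port 9870 as above.
namenode_ui_url() {
  printf 'http://%s:9870\n' "$1"
}

# Succeed if the web interface answers without an HTTP error.
namenode_ui_up() {
  # -f fails on HTTP errors, -s hides progress output, -L follows redirects,
  # --max-time limits the wait to 5 seconds, -o /dev/null discards the page body.
  curl -fsL --max-time 5 -o /dev/null "$(namenode_ui_url "$1")"
}

# Example:
# namenode_ui_up localhost && echo "web interface is reachable"
```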