Configuring Hadoop for Single-Node Cluster Operations


From the Apache Hadoop website:

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

I hope Dr Imran will give me additional marks for this effort.. :)

Ok, so now to installing Hadoop. I assume that you have installed Ubuntu or some other Linux distribution. I will go through the procedure step by step. Remember, installing Hadoop is easy, but you need to follow the steps in order.

And you must know how to use the terminal.

Remember, you should be connected to the Internet for these commands to work.
So log in to Ubuntu, start a terminal, and enter the following command. (Remember, Linux is case sensitive, so type the commands exactly as shown.)

Install JDK

sudo apt-get install openjdk-7-jdk  
It will ask for your password and then install the Java Development Kit.

Install SSH Server

From Wikipedia
Secure Shell, or SSH, is a cryptographic (encrypted) network protocol to allow remote login and other network services to operate securely over an unsecured network.
With the JDK installed, proceed to install the OpenSSH server:

sudo apt-get install openssh-server
This installs the SSH server, which lets other machines open secure shell sessions on this one. Hadoop's start-up scripts also use SSH to reach the nodes of the cluster (here, just localhost), so the server is needed even on a single node.

Now for some system administration.

Create Users and Groups

The command to add a group is:

addgroup <GROUP-NAME>

We want to add a group named hadoop, and then a user hduser inside it. First, fire the following command:

sudo addgroup hadoop
After asking for your password, this will create a group named hadoop.

Now we need to create a new user and add it to the hadoop group:

sudo adduser --ingroup hadoop hduser

This will ask you a few questions. Except for the password, the defaults are fine in most cases, so keep pressing Enter.

Now we have an hduser added to the hadoop group.

Now log out, then log back in as hduser with the password you just set.

You should not encounter any problems; if you do, re-check the commands and their case.

Set Up SSH Access by Creating a Key Pair

Issue the following command:
ssh-keygen -t rsa -P ''
This generates an RSA key pair with an empty passphrase, so SSH logins will not prompt for one. Now append the public key to the list of keys allowed to log in:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

 Test that your access is working by executing the command

ssh localhost

This will ask you to confirm and store the host key, and then log you in; with the empty-passphrase key it should not ask for a password. If you manage to log in using the above command, your SSH server is working. Otherwise restart your computer and try again.
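If ssh localhost keeps prompting for a password even though the key was added, the usual culprit is file permissions: sshd ignores key files it considers too open. Tightening them is safe in any case (a small sketch; run it as hduser):

```shell
# sshd refuses to read authorized_keys when ~/.ssh or the file itself
# is group- or world-accessible, so lock both down.
mkdir -p ~/.ssh                      # already exists once ssh-keygen has run
chmod 700 ~/.ssh
touch ~/.ssh/authorized_keys         # already exists after the cat command above
chmod 600 ~/.ssh/authorized_keys
```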


Now things are set. Next we need to download Hadoop. I downloaded it from the following link:


http://www.eu.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz

Download Hadoop

You can use the browser to download it, or, if you don't want to leave the console, issue the following command:


wget http://www.eu.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz

Once the download is complete, issue the following command:


sudo tar xvzf hadoop-2.7.1.tar.gz -C /usr/local
This command extracts the files to the /usr/local directory.

Change directory to /usr/local:

cd /usr/local
Rename hadoop-2.7.1 to hadoop (sudo is needed because the extracted files are owned by root):

sudo mv hadoop-2.7.1 hadoop
For hduser to be able to access the Hadoop directory, change its ownership:

sudo chown -R hduser:hadoop hadoop
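The hduser:hadoop form sets the owner and the group in one pass, and -R applies the change to everything underneath. A self-contained illustration in a scratch directory (the names here are throwaway; in the tutorial the target is /usr/local/hadoop):

```shell
# Demonstrate chown's user:group syntax with -R on a scratch tree.
# Re-asserting our own user and group needs no root; changing ownership
# to another user (as in the tutorial) is why sudo is required there.
d=$(mktemp -d)
mkdir -p "$d/sub"
touch "$d/sub/file"
chown -R "$(id -un):$(id -gn)" "$d"
stat -c '%U:%G' "$d/sub/file"        # shows owner:group, proving -R reached the file
rm -rf "$d"
```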
That's the easy part and it is done. Congrats. Take a break, drink a cup of coffee or root beer.. :)


Now to configure Environment variables for Hadoop

Setting User Variables 

Switch to your home directory and open .bashrc by issuing the following commands:
cd ~
gedit .bashrc

This will open the file .bashrc, which holds your per-user shell configuration. You will see plenty of existing lines; don't change any of them. Move to the end of the file and paste the following:

#Hadoop variables
# JAVA_HOME must point at the JDK install directory. On 64-bit Ubuntu the
# openjdk-7 package installs to the path below; confirm on your machine
# with: readlink -f $(which java)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
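The variables only take effect in shells started after you save .bashrc (or after running source ~/.bashrc). As a quick self-contained sanity check of what the PATH lines above do (paths assume the layout from this guide):

```shell
# Re-create the PATH additions in the current shell and confirm
# that the hadoop bin directory ends up on the search path.
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
echo "$HADOOP_INSTALL"
case ":$PATH:" in
  *":$HADOOP_INSTALL/bin:"*) echo "PATH OK" ;;
  *)                         echo "PATH missing hadoop bin" ;;
esac
```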

Setting Hadoop Variables 

Switch to the hadoop configuration directory with this command:
cd /usr/local/hadoop/etc/hadoop
Open hadoop-env.sh by issuing the command
gedit hadoop-env.sh
Add the following code at the end of the file
#modify JAVA_HOME (same JDK path as in .bashrc)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Log out and then log in once again as hduser.

Now open a terminal and fire the following command to test the Hadoop version:

hadoop version
If it shows the version, Hadoop is installed. Happy coding!





