From the Apache Hadoop website:
The Apache Hadoop software library is a framework that
allows for the distributed processing of large data sets across clusters
of computers using simple programming models. It is designed to scale up
from single servers to thousands of machines, each offering local
computation and storage. Rather than rely on hardware to deliver
high-availability, the library itself is designed to detect and handle
failures at the application layer, so delivering a highly-available
service on top of a cluster of computers, each of which may be prone to
failures.
I hope Dr Imran will give me additional marks for this effort.. :)
OK, now on to installing Hadoop. I assume that you have installed Ubuntu or another Debian-based Linux distribution, since the commands below use apt-get. I will go through the procedure step by step. Installing Hadoop is easy, but you need to follow the steps in order.
You must also know how to use the terminal.
You need to be connected to the Internet for these commands.
Log in to Ubuntu, start a terminal, and enter the following commands. (Remember that Linux is case sensitive, so the case of the commands must be exactly as shown.)
Install JDK
sudo apt-get install openjdk-7-jdk
It will ask for your password and then install the Java Development Kit.
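To confirm that the JDK really landed on your machine, a small check like the one below can help. This is only a sketch: have_cmd is a helper name I made up for this guide, not a standard command.

```shell
# Return success if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java prints its version on stderr, so redirect it to stdout
    java -version 2>&1 | head -n 1
else
    echo "java not found; re-run the install command above"
fi
```

If the first branch runs, you should see a single line naming the installed Java version.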
Install SSH Server
From Wikipedia:
Secure Shell, or SSH, is a cryptographic (encrypted) network protocol to allow remote login and other network services to operate securely over an unsecured network.
Once the JDK is installed, proceed to install the OpenSSH server
sudo apt-get install openssh-server
This installs the SSH server, which lets other machines log in to this one securely. Hadoop's start-up scripts also use SSH to launch its daemons, even on a single machine.
Now for some system administration.
Create Users and Groups
The command to add a group is
addgroup <GROUP-NAME>
We want a group named hadoop and a user named hduser. First create the group
sudo addgroup hadoop
After asking for your password, this will create the group hadoop.
Now we need to create a new user and add it to the group hadoop
sudo adduser --ingroup hadoop hduser
This will ask you a few questions. Except for the password, the default is fine in most cases, so keep pressing Enter.
Now we have the user hduser in the group hadoop. The later steps use sudo, and a newly created user has no sudo rights by default, so before logging out you may want to run sudo adduser hduser sudo from your current account.
Now log out, and then log in as hduser with the password you chose for that account.
You should not encounter any problems. If you do, check the commands and their case.
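If you want to double-check the result, the sketch below tests whether a user belongs to a group by reading the output of id. The function name in_group is my own invention for this guide; hduser and hadoop are the names used above.

```shell
# Succeed if user $1 belongs to group $2.
in_group() {
    # id -nG prints the user's group names separated by spaces
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# hduser and hadoop are the names used in this guide
if in_group hduser hadoop; then
    echo "hduser is in group hadoop"
else
    echo "hduser is missing or not in group hadoop"
fi
```

The same function can be reused to check the sudo group, for example in_group hduser sudo.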
Setup SSH Access by Creating a Key Pair
Issue the following command
ssh-keygen -t rsa -P ''
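This will generate an RSA key pair with an empty passphrase for SSH access. Press Enter to accept the default file location. Next, authorize the new public key for logins to this machine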
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Test that your access is working by executing the command
ssh localhost
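The first time, this will ask you to confirm the host key; type yes to store it. Because the key has an empty passphrase and is now listed in authorized_keys, you should be logged in without being asked for a password. If you manage to log in, your SSH server is working. Otherwise restart your computer and try again.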
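One small catch: if you repeat the cat command above, the same key is appended to authorized_keys again. The sketch below appends a key only when it is not already there. The name add_key_once is hypothetical, chosen just for this example.

```shell
# Append the public key in file $1 to the authorized_keys file $2,
# but only if that exact line is not already present.
add_key_once() {
    touch "$2"
    chmod 600 "$2"    # sshd may refuse the file if other users can write to it
    grep -qxF "$(cat "$1")" "$2" || cat "$1" >> "$2"
}

# Usage with the paths from this guide:
# add_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```

Running it twice leaves a single copy of the key in the file, so it is safe to repeat.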
Now everything is set up, and we need to download Hadoop. I downloaded Hadoop 2.7.1 from the following link
http://www.eu.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Download Hadoop
You can use the browser to download it. If you don't want to leave the console, issue the following command instead
wget http://www.eu.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz
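Downloads sometimes get cut off, so it can be worth checking the archive before you unpack it. Here is a minimal sketch that asks tar to list the contents without extracting anything. The helper name archive_ok is an assumption of mine, not part of Hadoop or tar.

```shell
# Succeed if file $1 is a readable gzip-compressed tar archive.
archive_ok() {
    # -t lists the contents without extracting; a truncated or
    # corrupt download makes tar exit with an error
    tar -tzf "$1" >/dev/null 2>&1
}

# Usage with the file from this guide:
# archive_ok hadoop-2.7.1.tar.gz && echo "archive looks fine"
```

This only shows that the file is a complete archive. It does not verify the checksum or signature published by Apache.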
Once the download is complete, issue the following command
sudo tar vxzf hadoop-2.7.1.tar.gz -C /usr/local
This command extracts the files to the /usr/local directory.
Change directory to /usr/local
cd /usr/local
Rename hadoop-2.7.1 to hadoop (sudo is needed because /usr/local belongs to root)
sudo mv hadoop-2.7.1 hadoop
So that hduser can access the hadoop directory, change its ownership
sudo chown -R hduser:hadoop hadoop
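To see whether the ownership change worked, you can read the owner and group back from ls. The sketch below does that with a helper I have called owner_of (a made-up name for this guide).

```shell
# Print "owner:group" for the file or directory $1.
owner_of() {
    # in ls -ld output, field 3 is the owner and field 4 the group
    ls -ld "$1" | awk '{ print $3 ":" $4 }'
}

# Usage with the directory from this guide (expected: hduser:hadoop):
# owner_of /usr/local/hadoop
```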
That's the easy part, and it is done. Congrats. Take a break and drink a cup of coffee or root beer. :)
Now to configure the environment variables for Hadoop.
Setting User Variables
Switch to your home directory and open .bashrc by issuing the following commands
cd ~
gedit .bashrc
This will open the file .bashrc, which holds the settings for your shell. You will see a lot of lines of code. Don't change any of them. Move to the end of the file and paste the following
#Hadoop variables
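# Adjust to your JDK directory (list /usr/lib/jvm to find it); for
# openjdk-7-jdk on 64-bit Ubuntu it is usually /usr/lib/jvm/java-7-openjdk-amd64
export JAVA_HOME=/usr/lib/jvm/jdk/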
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
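A side note on the two PATH lines: every nested shell that reads .bashrc appends the Hadoop directories again, so PATH slowly fills with duplicates. If that bothers you, something like the following sketch could replace those two lines. The function path_append is my own helper, not a shell builtin.

```shell
# Append directory $1 to PATH unless it is already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

HADOOP_INSTALL=/usr/local/hadoop
path_append "$HADOOP_INSTALL/bin"
path_append "$HADOOP_INSTALL/sbin"
export PATH
```

Wrapping PATH in colons on both sides lets the pattern match an entry at the start, middle, or end of the list.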
Setting Hadoop Variables
Switch to the Hadoop configuration directory with this command
cd /usr/local/hadoop/etc/hadoop
Open hadoop-env.sh by issuing the command
gedit hadoop-env.sh
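Add the following lines at the end of the file, using the same JDK directory as in .bashrc (it must be a directory that actually exists under /usr/lib/jvm)
#modify JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/jdk/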
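If you are not sure which directory to use for JAVA_HOME, you can work it out from the java binary itself. The sketch below assumes the usual layout where java lives in bin (or jre/bin) under the JDK directory. java_home_from_bin is a name I made up for this guide.

```shell
# Derive a JAVA_HOME candidate from the full path of a java binary ($1).
java_home_from_bin() {
    home="${1%/bin/java}"    # drop the trailing /bin/java
    home="${home%/jre}"      # some JDK layouts keep java under jre/bin
    printf '%s\n' "$home"
}

if command -v java >/dev/null 2>&1; then
    # readlink -f follows the /usr/bin/java symlinks to the real binary
    java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

Treat the printed path as a suggestion and confirm that the directory exists before putting it in hadoop-env.sh.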
Log out and then log in once again as hduser so that the new variables take effect.
Now open a terminal and run this command to check the Hadoop version
hadoop version
If it shows the version, Hadoop is installed. Happy coding.
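If the command is not found or complains about Java, the usual culprit is a variable that never got set. Here is a rough sketch that lists which of the variables from this guide are still empty. The helper missing_vars is hypothetical and not part of Hadoop.

```shell
# Print the name of every variable in the argument list that is
# unset or empty.
missing_vars() {
    for name in "$@"; do
        # eval is used for indirect lookup; only pass trusted names
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}

missing=$(missing_vars JAVA_HOME HADOOP_INSTALL)
if [ -n "$missing" ]; then
    echo "Not set:" $missing
else
    echo "JAVA_HOME and HADOOP_INSTALL are both set"
fi
```

If a name shows up as not set, recheck the lines you pasted into .bashrc and log in again.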