Big Data Series; Part 1: Set up Hadoop in ubuntu

Image

The reference given at the bottom most of this page can give you a detailed description on setup of Hadoop. I will take you through my experience in setting it up in Ubuntu.

You should have a linux/unix system with jvm installed and password-less ssh enabled.

Download the latest release of hadoop FROM

I prefer *.tar.gz to other installable packages because once you setup hadoop with installable packages, it will be hard for you to find the configuration files for any editing(from my experience; I removed it and installed with *.tar.gz).

Assuming that your browser downloaded the hadoop tar file to Downloads folder.

(in my case Image)

I chose /app folder to setup hadoop. So move the tar file to /app

Image

Unzip and un-tar the file there:

Image

You will need to edit the hadoop-env.sh file to set the JAVA_HOME environment variable.

If you try to start hadoop without this modification, hadoop will fail to start throwing the below error:

Image

gedit is a text editor I am using. You can prefer your favourite(vi/vim/textedit/…)

location of hadoop-env.sh (hadoop<version>/conf)

Image

You will find below lines in hadoop-env.sh

Image

Either edit the already existing line of add a new line as I did:

Image

You can know about your specific location with following commands:

Image

As you can see, I highlighted /usr/lib/jvm/java-7-oracle/jre/bin/java. hadoop expects us to specify the path till java-7-oracle ie. “/usr/lib/jvm/java-7-oracle”

This will be enough to kick-start your hadoop in stand-alone mode.

Since I plan to install Apache Pig for scripting, I will setup hadoop in pseudo Distributed mode. For that I need to edit three files: core-site.xml, hdfs-site.xml and mapred-site.xml which can be found in “hadoop<version>/conf/” directory. The same information can be found in the reference as well.

ImageImageImage

 

 

Now the recipe is ready. Before I can start hadoop there is this one final thing to be done: formatting of name-node. Assuming that you are in the hadoop main directory, run the command: “bin/hadoop namenode -format”

And you will see logs like below:Image

 

Done with the waiting part. Run the command “bin/start-all.sh” to run NameNode, Secondary NameNode, Data Node, Task Tracker and Job Tracker as back-end processes.

Image

To ensure that all five services are running, use jps command. If you see the below output, “ALL IS WELL..”

 

Image

Out of my experience in setting it up in different linux and unix variants including Mac, I can say, the same steps can be repeated in any *nix variants.

Big Data is a Big Deal.. 🙂

Reference:

http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html

Published by

Sreejith

A strong believer of: 1. Knowledge is power 2. Progress comes from proper application of knowledge 3. Reverent attains wisdom 4. For one's own salvation, and for the welfare of the world

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s