Big Data Series, Part 1: Set Up Hadoop on Ubuntu


The reference at the bottom of this page gives a detailed description of how to set up Hadoop. Here I will take you through my own experience of setting it up on Ubuntu.

You should have a Linux/Unix system with a JVM installed and password-less SSH enabled.
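As a quick sanity check of these prerequisites, something like the sketch below can help. The helper name check_tool is made up for this post; it only reports whether a command is on the PATH and does not verify the Java version or the SSH configuration.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

check_tool java
check_tool ssh

# To confirm that SSH to this machine works without a password prompt,
# you could run the following by hand (BatchMode makes ssh fail instead of asking):
#   ssh -o BatchMode=yes localhost true && echo "password-less ssh OK"
```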

Download the latest release of Hadoop from the Apache Hadoop releases page.

I prefer the *.tar.gz archive to other installable packages: once you set up Hadoop from an installable package, it is hard to find the configuration files when you need to edit them. (This is from my own experience; I removed the package and reinstalled from the *.tar.gz.)

I assume that your browser downloaded the Hadoop tar file to the Downloads folder.


I chose the /app folder for the Hadoop installation, so move the tar file to /app.


Unzip and un-tar the file there:
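The move and extract steps can be sketched as a small shell function. The function name install_tarball and the example paths are assumptions for illustration; the actual file name depends on the release you downloaded, and writing to /app will usually need root privileges.

```shell
# Hypothetical helper: move a tarball into a target directory and unpack it there.
install_tarball() {
    tarball="$1"
    target="$2"
    mkdir -p "$target" || return 1        # create the install directory if needed
    mv "$tarball" "$target"/ || return 1  # move the archive out of Downloads
    # -x extract, -z gunzip, -f archive file, -C unpack inside the target directory
    tar -xzf "$target/$(basename "$tarball")" -C "$target"
}

# Example (run as a user who can write to /app; <version> is left as a placeholder):
#   install_tarball ~/Downloads/hadoop-<version>.tar.gz /app
```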


You will need to edit the file to set the JAVA_HOME environment variable.

If you try to start Hadoop without this modification, it will fail to start and throw the error below:


gedit is the text editor I use. Feel free to use your favourite (vi, vim, TextEdit, …).

The file is located in hadoop<version>/conf.


You will find the following lines in that file:


Either edit the already existing line or add a new line, as I did:
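A minimal sketch of such a line, assuming the file in question is conf/hadoop-env.sh (where Hadoop 1.x reads JAVA_HOME) and using the Java location discussed in this post; your own path will likely differ:

```shell
# Assumed file: hadoop<version>/conf/hadoop-env.sh
# Point JAVA_HOME at the JDK directory, not at the java binary itself.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
```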


You can find your specific Java location with the following commands:


As you can see, I highlighted /usr/lib/jvm/java-7-oracle/jre/bin/java. Hadoop expects us to specify the path only up to java-7-oracle, i.e. “/usr/lib/jvm/java-7-oracle”.
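One way to do this trimming in the shell is sketched below. The function name java_home_from_binary is made up for illustration, and the sketch assumes the resolved binary path ends in /bin/java, optionally preceded by /jre, as in the path above.

```shell
# Hypothetical helper: turn the full path of the java binary into a JAVA_HOME value.
java_home_from_binary() {
    path="$1"
    path="${path%/bin/java}"   # drop the trailing /bin/java
    path="${path%/jre}"        # drop a trailing /jre, if there is one
    printf '%s\n' "$path"
}

# Resolve the symlink chain behind "java" and print the matching JAVA_HOME.
# readlink -f is the GNU coreutils form, which Ubuntu ships.
if command -v java >/dev/null 2>&1; then
    java_home_from_binary "$(readlink -f "$(command -v java)")"
fi
```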

This is enough to kick-start Hadoop in stand-alone mode.

Since I plan to install Apache Pig for scripting, I will set up Hadoop in pseudo-distributed mode. For that I need to edit three files: core-site.xml, hdfs-site.xml and mapred-site.xml, which can be found in the “hadoop<version>/conf/” directory. The same information can be found in the reference as well.
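As a rough sketch, the three files can be written from the shell as shown here. The property values are the ones commonly used in the Hadoop 1.x single-node setup guide (localhost, ports 9000 and 9001, replication factor 1) and are assumptions, not necessarily the exact values from my setup. The function name is also made up, and it overwrites any existing files of the same name, so back them up first.

```shell
# Hypothetical helper: write minimal pseudo-distributed configs into a conf directory.
# WARNING: overwrites core-site.xml, hdfs-site.xml and mapred-site.xml in that directory.
write_pseudo_distributed_conf() {
    conf_dir="$1"

    # Where clients find the HDFS NameNode.
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # A single machine can hold only one copy of each block.
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

    # Where MapReduce clients find the JobTracker.
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}

# Example (<version> is left as a placeholder):
#   write_pseudo_distributed_conf /app/hadoop<version>/conf
```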




Now the recipe is ready. Before starting Hadoop there is one final thing to do: format the name-node. From the Hadoop main directory, run the command: “bin/hadoop namenode -format”

You will see logs like the ones below: [Image: name-node format logs]


That is the end of the waiting. Run the command “bin/” to start the NameNode, Secondary NameNode, DataNode, TaskTracker and JobTracker as background processes.


To ensure that all five services are running, use the jps command. If you see the output below, “ALL IS WELL..”
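If you would rather not check the list by eye, a small sketch like this one can do it for you. The function name missing_daemons is made up; it assumes the five daemons appear in the jps listing under their Hadoop 1.x class names and prints the name of every daemon that is not listed.

```shell
# Hypothetical helper: read jps output on stdin and print each expected daemon that is absent.
missing_daemons() {
    jps_output="$(cat)"
    for name in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # -w matches whole words only, so "NameNode" is not satisfied by "SecondaryNameNode"
        if ! printf '%s\n' "$jps_output" | grep -qw "$name"; then
            echo "$name"
        fi
    done
}

# No output here means all five daemons are running.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```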



From my experience of setting it up on different Linux and Unix variants, including Mac, I can say that the same steps work on any *nix variant.

Big Data is a Big Deal.. 🙂

