Tuesday, March 18, 2014

Install Hadoop on Ubuntu

Michael Noll wrote two very comprehensive tutorials about running Hadoop on Ubuntu Linux: one for a single-node cluster and one for a multi-node cluster.

I gave the single-node tutorial a try and installed Hadoop on one Ubuntu box.

Hadoop version : 1.2.1
Ubuntu version : 12.04

Installation

Where I deviated from the tutorial:
1 The Hadoop version from the download site is 1.2.1.
2 Java 6 is installed via the openjdk-6-jdk package:

hduser@ubuntu:/var/log$ java -version
java version "1.6.0_30"
OpenJDK Runtime Environment (IcedTea6 1.13.1) (6b30-1.13.1-1ubuntu2~0.12.04.1)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)

3 IPv6 is not disabled
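
To confirm the IPv6 state before going further, something like the following can be used. This is only a sketch: the function name ipv6_disabled is made up for illustration, the /proc path is the usual Linux sysctl location, and the optional argument is there only so the check can be pointed at a different file.

```shell
# Return success (0) if the kernel reports IPv6 as disabled on all interfaces.
# The optional argument overrides the sysctl file to read (illustrative helper).
# Note: a missing or unreadable file is reported as "not disabled".
ipv6_disabled() {
    flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Usage:
#   if ipv6_disabled; then echo "IPv6 disabled"; else echo "IPv6 still enabled"; fi
```

On a box set up as described here, the second branch would be expected, since IPv6 was left enabled.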

Configuration

For the following configuration files I followed the single-node tutorial:

conf/core-site.xml
conf/mapred-site.xml
conf/hdfs-site.xml
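
To double-check what ended up in these files, a small helper can print each property as name=value. This is a rough sketch, not a real XML parser: list_props is a hypothetical name, and it assumes each <name> and <value> element sits on its own line, which is how the Hadoop site files are normally laid out.

```shell
# Print "name=value" for every property in a Hadoop *-site.xml file.
# Assumes one <name> or <value> element per line (not a general XML parser).
list_props() {
    awk '
        /<name>/  { gsub(/.*<name>|<\/name>.*/, "");   name = $0 }
        /<value>/ { gsub(/.*<value>|<\/value>.*/, ""); print name "=" $0 }
    ' "$1"
}

# Usage:
#   for f in core-site.xml mapred-site.xml hdfs-site.xml; do
#       list_props "/usr/local/hadoop/conf/$f"
#   done
```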

Since I used a different Java 6 package, JAVA_HOME is set as follows in conf/hadoop-env.sh:

JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
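
If the JDK directory is not known up front, it can usually be worked out from the java binary on the PATH. The helper below is a sketch under that assumption: java_home_from_binary is an illustrative name, and it simply strips a trailing /jre/bin/java or /bin/java from the resolved path.

```shell
# Derive a JDK home directory from the resolved path of a java binary by
# stripping a trailing /jre/bin/java or /bin/java (illustrative helper).
java_home_from_binary() {
    printf '%s\n' "$1" | sed -e 's#/jre/bin/java$##' -e 's#/bin/java$##'
}

# Usage:
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
```

The printed directory is only a candidate; it is worth checking that it exists before putting it into conf/hadoop-env.sh.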

Format the HDFS filesystem

hduser@ubuntu:/usr/local/hadoop/conf$ /usr/local/hadoop/bin/hadoop namenode -format
14/03/18 00:29:26 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.6.0_30
************************************************************/
14/03/18 00:29:26 INFO util.GSet: Computing capacity for map BlocksMap
14/03/18 00:29:26 INFO util.GSet: VM type       = 64-bit
14/03/18 00:29:26 INFO util.GSet: 2.0% max memory = 932118528
14/03/18 00:29:26 INFO util.GSet: capacity      = 2^21 = 2097152 entries
14/03/18 00:29:26 INFO util.GSet: recommended=2097152, actual=2097152
14/03/18 00:29:26 INFO namenode.FSNamesystem: fsOwner=hduser
14/03/18 00:29:26 INFO namenode.FSNamesystem: supergroup=supergroup
14/03/18 00:29:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/03/18 00:29:26 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/03/18 00:29:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/03/18 00:29:26 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/03/18 00:29:26 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/03/18 00:29:26 INFO common.Storage: Image file /app/hadoop/tmp/dfs/name/current/fsimage of size 112 bytes saved in 0 seconds.
14/03/18 00:29:27 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/app/hadoop/tmp/dfs/name/current/edits
14/03/18 00:29:27 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/app/hadoop/tmp/dfs/name/current/edits
14/03/18 00:29:27 INFO common.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.
14/03/18 00:29:27 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
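
Formatting wipes the existing HDFS namespace, so it is worth checking whether the storage directory has already been formatted before running the command again. A minimal sketch, assuming the directory and file names shown in the log above; namenode_formatted is a hypothetical helper name.

```shell
# Return success (0) if the NameNode storage directory already contains
# the fsimage and edits files that the format step writes.
# The default path is the one from the log output above.
namenode_formatted() {
    name_dir="${1:-/app/hadoop/tmp/dfs/name}"
    [ -f "$name_dir/current/fsimage" ] && [ -f "$name_dir/current/edits" ]
}

# Usage:
#   if namenode_formatted; then
#       echo "already formatted, skipping"
#   else
#       /usr/local/hadoop/bin/hadoop namenode -format
#   fi
```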

Start single-node cluster

hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-hduser-tasktracker-ubuntu.out
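
The start script only reports where each daemon logs, not whether it came up cleanly. One way to spot trouble is to look for ERROR or FATAL entries in the log directory. This sketch assumes the daemons also write hadoop-*.log files next to the .out files shown above (only the .out names appear in the output here); logs_with_errors is an illustrative name.

```shell
# List daemon log files that contain at least one ERROR or FATAL entry.
# The default directory is taken from the start-all.sh output above;
# the hadoop-*.log file naming is an assumption.
logs_with_errors() {
    log_dir="${1:-/usr/local/hadoop-1.2.1/logs}"
    grep -lE ' (ERROR|FATAL) ' "$log_dir"/hadoop-*.log 2>/dev/null
}

# Usage:
#   logs_with_errors || echo "no errors found"
```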

Check Hadoop processes

hduser@ubuntu:~$ jps
17320 NameNode
17559 DataNode
17810 SecondaryNameNode
17894 JobTracker
18292 Jps
18136 TaskTracker
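
All five daemons should show up in the jps listing. Rather than comparing by eye, the output can be piped through a small filter that prints whichever daemons are missing. This is just a sketch; missing_daemons is a made-up helper name, and the daemon list is the one from the output above.

```shell
# Read jps output on stdin and print the name of every expected Hadoop 1.x
# daemon that is not listed. Prints nothing when all five are running.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare against the whole second field so that "NameNode"
        # does not also match "SecondaryNameNode".
        printf '%s\n' "$jps_output" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || echo "$daemon"
    done
}

# Usage:
#   jps | missing_daemons
```

An empty result means everything is up; any name printed is a daemon whose log is worth a look.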

Stop single-node cluster

hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode

hduser@ubuntu:~$ jps
21570 Jps



