- (Extract from Dharmesh Kakadia's "Apache Mesos Essentials")
- Installing Hadoop on Mesos
- ==========================
- Hadoop on Mesos (https://github.com/mesos/hadoop) relies on extensions to
- Hadoop, such as a Mesos executor to launch TaskTrackers and a Mesos
- scheduler for the Hadoop JobTracker, to run Hadoop as a Mesos framework.
- We will run Hadoop 1.x on Mesos:
- 1. Install and run Mesos by following the instructions in Chapter 1,
- Running Mesos.
- 2. We need to compile the Hadoop on Mesos library. The project uses Maven
- to manage dependencies, which we will need to install along with Java
- and Git:
- ubuntu@master:~ $ sudo apt-get install maven openjdk-7-jdk git
- 3. Let's clone the Hadoop on Mesos source code from https://github.com/
- mesos/hadoop and navigate to it using the following commands:
- ubuntu@master:~ $ git clone https://github.com/mesos/hadoop/
- ubuntu@master:~ $ cd hadoop
- 4. Build the Hadoop on Mesos binaries from the code using the following
- command. By default, it will build against the latest versions of Mesos and
- Hadoop. If required, we can adjust the versions in the pom.xml file:
- ubuntu@master:~ $ mvn package
- This will build hadoop-mesos-VERSION.jar in the target folder.
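Instead of editing pom.xml, the versions can often be pinned on the Maven command line. The property names below (`mesos.version`, `hadoop.version`) are assumptions based on common Maven conventions; check the project's pom.xml for the exact names before relying on them:

```shell
# Build against pinned versions; property names assumed -- verify in pom.xml.
mvn package -Dmesos.version=0.21.0 -Dhadoop.version=2.5.0-mr1-cdh5.2.0
```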
- 5. Download the Hadoop distribution, extract it, and navigate to it. We can
- use the vanilla Apache distribution, Cloudera Distribution Hadoop (CDH), or
- any other Hadoop distribution. We can download and extract the latest CDH
- distribution using the following commands:
- ubuntu@master:~ $ wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.2.0.tar.gz
- ubuntu@master:~ $ tar xzf hadoop-*.tar.gz
- 6. We need to put the Hadoop on Mesos JAR that we just built in a location
- where it is accessible to Hadoop via the Hadoop CLASSPATH. We will copy it
- to the Hadoop lib folder, which by default is share/hadoop/common/lib
- inside the Hadoop distribution (note that step 3 cloned the project into
- the hadoop directory):
- ubuntu@master:~ $ cp hadoop/target/hadoop-mesos-*.jar hadoop-*/share/hadoop/common/lib
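As a quick sanity check, we can confirm the copy landed where Hadoop will pick it up by inspecting the classpath from inside the distribution directory (`hadoop classpath` is a standard subcommand in this Hadoop generation):

```shell
# From the extracted distribution directory: the hadoop-mesos JAR should be listed.
bin/hadoop classpath | tr ':' '\n' | grep hadoop-mesos
```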
- 7. By default, the CDH distribution is configured to use MapReduce Version 2
- (MRv2) with YARN, so we need to update it to point to MRv1:
- ubuntu@master:~ $ cd hadoop-*
- ubuntu@master:~ $ mv bin bin-mapreduce2
- ubuntu@master:~ $ ln -s bin-mapreduce1 bin
- ubuntu@master:~ $ cd etc
- ubuntu@master:~ $ mv hadoop hadoop-mapreduce2
- ubuntu@master:~ $ ln -s hadoop-mapreduce1 hadoop
- ubuntu@master:~ $ cd -
- Optionally, we can also update examples to point to the MRv1 examples:
- ubuntu@master:~ $ mv examples examples-mapreduce2
- ubuntu@master:~ $ ln -s examples-mapreduce1 examples
- 8. Now, we will configure Hadoop to recognize that it should use the Mesos
- scheduler that we just built. Set the following mandatory configuration
- options in etc/hadoop/mapred-site.xml by adding them to the
- <configuration> and </configuration> tags:
- <property>
- <name>mapred.job.tracker</name>
- <value>localhost:9001</value>
- </property>
- <property>
- <name>mapred.jobtracker.taskScheduler</name>
- <value>org.apache.hadoop.mapred.MesosScheduler</value>
- </property>
- <property>
- <name>mapred.mesos.taskScheduler</name>
- <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
- </property>
- <property>
- <name>mapred.mesos.master</name>
- <value>zk://localhost:2181/mesos</value>
- </property>
- <property>
- <name>mapred.mesos.executor.uri</name>
- <value>hdfs://localhost:9000/hadoop.tar.gz</value>
- </property>
- We tell Hadoop to use Mesos for scheduling tasks by setting the
- mapred.jobtracker.taskScheduler property. The Mesos master address is
- specified via mapred.mesos.master, which we have set to the local
- ZooKeeper address. mapred.mesos.executor.uri points to the Hadoop
- distribution archive that we will upload to HDFS, which is used for
- executing tasks.
- 9. We have to ensure that Hadoop on Mesos is able to find the Mesos native
- library, which by default is located at /usr/local/lib/libmesos.so.
- We need to export the location of the Mesos native library by adding the
- following line at the start of the bin/hadoop-daemon.sh script:
- export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
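The edit in step 9 can also be scripted. The sketch below demonstrates the insertion on a scratch copy of the script; in practice, point it at bin/hadoop-daemon.sh (GNU sed's one-line `i` syntax assumed):

```shell
# Demonstrate on a scratch copy; run against bin/hadoop-daemon.sh in practice.
printf '#!/usr/bin/env bash\necho starting daemon\n' > /tmp/hadoop-daemon.sh
# Insert the export as line 2, right after the shebang (GNU sed '2i' syntax).
sed -i '2i export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so' /tmp/hadoop-daemon.sh
# Shows the shebang followed by the newly inserted export line.
head -n 2 /tmp/hadoop-daemon.sh
```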
- 10. We need a location from where the distribution can be accessed by Mesos
- when launching Hadoop tasks. We can put it in HDFS, S3, or any other
- accessible location, such as an NFS server. We will put it in HDFS, and for
- this, we need to install HDFS on the cluster. We need to start the
- Namenode daemon on the HDFS master node. Note that the HDFS master
- node is independent of the Mesos master. Copy the Hadoop distribution
- to the node, and start the Namenode using the following command:
- ubuntu@master:~$ bin/hadoop-daemon.sh start namenode
- We need to start the Datanode daemons on each node that we want to make
- an HDFS slave node (which is independent of the Mesos slave nodes). Copy the
- created Hadoop distribution to all HDFS slave nodes, and start the Datanode
- on each using the following command:
- ubuntu@master:~$ bin/hadoop-daemon.sh start datanode
- Note that, for first-time use, the Namenode must be formatted before it is
- started, using the following command on the HDFS master node:
- ubuntu@master:~$ bin/hadoop namenode -format
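Before moving on, it is worth verifying that HDFS is healthy and that the Datanodes have registered, using the standard HDFS admin report for this Hadoop generation:

```shell
# Run on the HDFS master; the report should list each Datanode as a live node.
bin/hadoop dfsadmin -report
```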
- 11. Hadoop is now ready to run on Mesos. Let's package it and upload it on
- HDFS:
- ubuntu@master:~ $ tar cfz hadoop.tar.gz hadoop-*
- ubuntu@master:~ $ bin/hadoop dfs -put hadoop.tar.gz /hadoop.tar.gz
- ubuntu@master:~ $ bin/hadoop dfs -chmod 777 /hadoop.tar.gz
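We can confirm the upload succeeded and the permissions took effect by listing the archive in HDFS:

```shell
# The tarball should appear at the HDFS root with world-readable permissions.
bin/hadoop dfs -ls /hadoop.tar.gz
```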
- 12. Now, we can start the JobTracker. Note that we don't need to start
- TaskTrackers manually, as they will be started by Mesos when we submit a
- Hadoop job:
- ubuntu@master:~ $ bin/hadoop jobtracker
- Hadoop is now running, and we are ready to run Hadoop jobs.
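As a quick end-to-end check, we can submit one of the bundled example jobs; Mesos should then launch TaskTrackers to run it. The examples JAR path below is an assumption (it relies on the optional examples symlink from step 7 and varies across distributions), so adjust the glob to match yours:

```shell
# Submit a small pi-estimation job; adjust the examples JAR path to your distro.
bin/hadoop jar examples/hadoop-examples*.jar pi 2 10
```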