- (Extract from Dharmesh Kakadia's "Apache Mesos Essentials")
- Installing Hadoop on Mesos
- ==========================
- Hadoop on Mesos (https://github.com/mesos/hadoop) relies on extensions to
- Hadoop, such as a Mesos executor to launch TaskTrackers and a Mesos
- scheduler for the Hadoop JobTracker, to run Hadoop as a Mesos framework.
- We will run Hadoop 1.x on Mesos:
- 1. Install and run Mesos by following the instructions in Chapter 1,
- Running Mesos.
- 2. We need to compile the Hadoop on Mesos library. The project uses Maven
- to manage dependencies, which we will need to install along with Java
- and Git:
- ubuntu@master:~ $ sudo apt-get install maven openjdk-7-jdk git
- 3. Let's clone the Hadoop on Mesos source code from https://github.com/
- mesos/hadoop and navigate to it using the following commands:
- ubuntu@master:~ $ git clone https://github.com/mesos/hadoop/
- ubuntu@master:~ $ cd hadoop
- 4. Build the Hadoop on Mesos binaries from the code using the following
- command. By default, it will build against the latest versions of Mesos and
- Hadoop. If required, we can adjust the versions in the pom.xml file:
- ubuntu@master:~ $ mvn package
- This will build hadoop-mesos-VERSION.jar in the target folder.
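Instead of editing pom.xml, the versions can often be pinned on the Maven command line. The property names below (`mesos.version`, `hadoop.version`) are assumptions based on common Maven conventions; check the project's pom.xml for the exact names before relying on them:

```shell
# Build against pinned versions; property names assumed -- verify in pom.xml.
mvn package -Dmesos.version=0.21.0 -Dhadoop.version=2.5.0-mr1-cdh5.2.0
```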
- 5. Download the Hadoop distribution, extract it, and navigate to it. We can
- use the vanilla Apache distribution, Cloudera Distribution Hadoop (CDH), or
- any other Hadoop distribution. We can download and extract the latest CDH
- distribution using the following commands:
- ubuntu@master:~ $ wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.2.0.tar.gz
- ubuntu@master:~ $ tar xzf hadoop-*.tar.gz
- 6. We need to put the Hadoop on Mesos JAR that we just built in a location
- where it is accessible to Hadoop via the Hadoop CLASSPATH. We will copy it
- to the Hadoop lib folder, which by default is share/hadoop/common/lib
- inside the Hadoop distribution (note that step 3 cloned the project into
- the hadoop directory):
- ubuntu@master:~ $ cp hadoop/target/hadoop-mesos-*.jar hadoop-*/share/hadoop/common/lib
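As a quick sanity check, we can confirm the copy landed where Hadoop will pick it up by inspecting the classpath from inside the distribution directory (`hadoop classpath` is a standard subcommand in this Hadoop generation):

```shell
# From the extracted distribution directory: the hadoop-mesos JAR should be listed.
bin/hadoop classpath | tr ':' '\n' | grep hadoop-mesos
```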
- 7. By default, the CDH distribution is configured to use MapReduce Version 2
- (MRv2) with YARN, so we need to update it to point to MRv1:
- ubuntu@master:~ $ cd hadoop-*
- ubuntu@master:~ $ mv bin bin-mapreduce2
- ubuntu@master:~ $ ln -s bin-mapreduce1 bin
- ubuntu@master:~ $ cd etc
- ubuntu@master:~ $ mv hadoop hadoop-mapreduce2
- ubuntu@master:~ $ ln -s hadoop-mapreduce1 hadoop
- ubuntu@master:~ $ cd -
- Optionally, we can also update examples to point to the MRv1 examples:
- ubuntu@master:~ $ mv examples examples-mapreduce2
- ubuntu@master:~ $ ln -s examples-mapreduce1 examples
- 8. Now, we will configure Hadoop to recognize that it should use the Mesos
- scheduler that we just built. Set the following mandatory configuration
- options in etc/hadoop/mapred-site.xml by adding them to the
- <configuration> and </configuration> tags:
- <property>
- <name>mapred.job.tracker</name>
- <value>localhost:9001</value>
- </property>
- <property>
- <name>mapred.jobtracker.taskScheduler</name>
- <value>org.apache.hadoop.mapred.MesosScheduler</value>
- </property>
- <property>
- <name>mapred.mesos.taskScheduler</name>
- <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
- </property>
- <property>
- <name>mapred.mesos.master</name>
- <value>zk://localhost:2181/mesos</value>
- </property>
- <property>
- <name>mapred.mesos.executor.uri</name>
- <value>hdfs://localhost:9000/hadoop.tar.gz</value>
- </property>
- We tell Hadoop to use Mesos for scheduling tasks by setting the
- mapred.jobtracker.taskScheduler property. The Mesos master address is
- specified via mapred.mesos.master, which we have set to the local
- ZooKeeper address. mapred.mesos.executor.uri points to the Hadoop
- distribution archive that we will upload to HDFS, which is used for
- executing tasks.
- 9. We have to ensure that Hadoop on Mesos is able to find the Mesos native
- library, which by default is located at /usr/local/lib/libmesos.so.
- We need to export the location of the Mesos native library by adding the
- following line at the start of the bin/hadoop-daemon.sh script:
- export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
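The edit in step 9 can also be scripted. The sketch below demonstrates the insertion on a scratch copy of the script; in practice, point it at bin/hadoop-daemon.sh (GNU sed's one-line `i` syntax assumed):

```shell
# Demonstrate on a scratch copy; run against bin/hadoop-daemon.sh in practice.
printf '#!/usr/bin/env bash\necho starting daemon\n' > /tmp/hadoop-daemon.sh
# Insert the export as line 2, right after the shebang (GNU sed '2i' syntax).
sed -i '2i export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so' /tmp/hadoop-daemon.sh
# Shows the shebang followed by the newly inserted export line.
head -n 2 /tmp/hadoop-daemon.sh
```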
- 10. We need a location from where the distribution can be accessed by Mesos
- when launching Hadoop tasks. We can put it in HDFS, S3, or any other
- accessible location, such as an NFS server. We will put it in HDFS, and for
- this, we need to install HDFS on the cluster. We need to start the
- Namenode daemon on the HDFS master node. Note that the HDFS master
- node is independent of the Mesos master. Copy the Hadoop distribution
- to the node, and start the Namenode using the following command:
- ubuntu@master:~$ bin/hadoop-daemon.sh start namenode
- We need to start the Datanode daemons on each node that we want to make
- an HDFS slave node (which is independent of the Mesos slave nodes). Copy the
- created Hadoop distribution to all HDFS slave nodes, and start the Datanode
- on each using the following command:
- ubuntu@master:~$ bin/hadoop-daemon.sh start datanode
- Note that, for first-time use, the Namenode must be formatted before it is
- started, using the following command on the HDFS master node:
- ubuntu@master:~$ bin/hadoop namenode -format
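Before moving on, it is worth verifying that HDFS is healthy and that the Datanodes have registered, using the standard HDFS admin report for this Hadoop generation:

```shell
# Run on the HDFS master; the report should list each Datanode as a live node.
bin/hadoop dfsadmin -report
```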
- 11. Hadoop is now ready to run on Mesos. Let's package it and upload it on
- HDFS:
- ubuntu@master:~ $ tar cfz hadoop.tar.gz hadoop-*
- ubuntu@master:~ $ bin/hadoop dfs -put hadoop.tar.gz /hadoop.tar.gz
- ubuntu@master:~ $ bin/hadoop dfs -chmod 777 /hadoop.tar.gz
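We can confirm the upload succeeded and the permissions took effect by listing the archive in HDFS:

```shell
# The tarball should appear at the HDFS root with world-readable permissions.
bin/hadoop dfs -ls /hadoop.tar.gz
```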
- 12. Now, we can start the JobTracker. Note that we don't need to start
- TaskTrackers manually, as they will be started by Mesos when we submit a
- Hadoop job:
- ubuntu@master:~ $ bin/hadoop jobtracker
- Hadoop is now running, and we are ready to run Hadoop jobs.
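As a quick end-to-end check, we can submit one of the bundled example jobs; Mesos should then launch TaskTrackers to run it. The examples JAR path below is an assumption (it relies on the optional examples symlink from step 7 and varies across distributions), so adjust the glob to match yours:

```shell
# Submit a small pi-estimation job; adjust the examples JAR path to your distro.
bin/hadoop jar examples/hadoop-examples*.jar pi 2 10
```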