{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Control Cluster Creation via API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import digitalocean\n",
    "import base64\n",
    "import paramiko\n",
    "from datetime import datetime"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Init API\n",
    "* Configure **token_secret** with your DigitalOcean personal access token"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "token_secret = \"REPLACE_WITH_YOUR_TOKEN\"\n",
    "manager = digitalocean.Manager(token=token_secret)"
   ]
  },
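  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hardcoding the token risks leaking it together with the notebook. Below is a minimal sketch of loading it from an environment variable instead; the variable name **DO_TOKEN** is an assumption, not part of the original setup."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "# DO_TOKEN is a hypothetical variable name; export it in your shell first\n",
    "token_secret = os.environ.get('DO_TOKEN', token_secret)\n",
    "manager = digitalocean.Manager(token=token_secret)"
   ]
  },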
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get snapshot image\n",
    "* Assumes a Spark slave droplet was prebuilt and snapshotted\n",
    "* Store the snapshot's image id in **image_id**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "snapshots = manager.get_my_images()\n",
    "# assumes the first snapshot on the account is the prebuilt Spark slave image\n",
    "image_id = snapshots[0].id\n",
    "print(snapshots[0].id)"
   ]
  },
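  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Picking `snapshots[0]` is fragile once the account holds more than one image. A sketch of selecting the snapshot by name instead; the name **spark-slave** is hypothetical, replace it with whatever you called your snapshot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "snapshot_name = 'spark-slave' # hypothetical snapshot name\n",
    "matches = [s for s in snapshots if s.name == snapshot_name]\n",
    "if matches:\n",
    "    image_id = matches[0].id\n",
    "    print(image_id)\n",
    "else:\n",
    "    print('no snapshot named ' + snapshot_name)"
   ]
  },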
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create instances\n",
    "* Based on the snapshot image retrieved above\n",
    "* Specify the **total_instance** count to be created\n",
    "* Size and region can be configured too\n",
    "* Assumes you already have an SSH key registered on the account"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## create instances\n",
    "keys = manager.get_all_sshkeys()\n",
    "total_instance = 5\n",
    "size_slug = 's-1vcpu-2gb'\n",
    "region = 'sgp1' # Singapore\n",
    "names = []\n",
    "for i in range(total_instance):\n",
    "    # zero-pad the index to two digits, e.g. hadoop-spark-slave-00\n",
    "    names.append('hadoop-spark-slave-' + str(i).zfill(2))\n",
    "output = digitalocean.Droplet.create_multiple(\n",
    "    token=token_secret,\n",
    "    names=names,\n",
    "    region=region,\n",
    "    image=image_id,\n",
    "    #size_slug='s-2vcpu-2gb', # 2 GB RAM, 2 vCPU\n",
    "    size_slug=size_slug,\n",
    "    backups=False,\n",
    "    ssh_keys=keys,\n",
    "    ipv6=True,\n",
    "    private_networking=True,\n",
    "    tags=['hadoop-slave']\n",
    ")\n",
    "print(output)\n",
    "print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))"
   ]
  },
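  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`create_multiple` returns before the droplets finish provisioning. A minimal sketch that polls until every **hadoop-slave** droplet reports `active`; the 15-second interval is an arbitrary choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "while True:\n",
    "    pending = [d for d in manager.get_all_droplets(tag_name='hadoop-slave')\n",
    "               if d.status != 'active']\n",
    "    if not pending:\n",
    "        print('all slave droplets active')\n",
    "        break\n",
    "    print(str(len(pending)) + ' droplet(s) still provisioning..')\n",
    "    time.sleep(15)"
   ]
  },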
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get IP addresses of the instances\n",
    "* Assumes all droplets on the account are part of the cluster\n",
    "* The master is tagged with **hadoop**\n",
    "* Slaves are tagged with **hadoop-slave**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_droplets = manager.get_all_droplets()\n",
    "slaves = []\n",
    "master = {}\n",
    "slaves_str = ''\n",
    "for i in my_droplets:\n",
    "    node = {'name': i.name, 'id': i.id, 'pure_slave': False}\n",
    "    if 'hadoop-slave' in i.tags:\n",
    "        node['pure_slave'] = True\n",
    "\n",
    "    # record both the private and public IPv4 addresses\n",
    "    for j in i.networks['v4']:\n",
    "        if j['type'] == 'private':\n",
    "            slaves_str = slaves_str + j['ip_address'] + \" #\" + i.name + \"\\n\"\n",
    "            node['private_ip'] = j['ip_address']\n",
    "        else:\n",
    "            node['public_ip'] = j['ip_address']\n",
    "    if node['pure_slave']:\n",
    "        slaves.append(node)\n",
    "    else:\n",
    "        master = node"
   ]
  },
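  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick sanity check before opening SSH connections: print the roles and addresses that were just collected (this only reads the dictionaries built above)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('master: ' + master.get('name', '?') + ' ' + master.get('public_ip', '?'))\n",
    "for s in slaves:\n",
    "    print('slave : ' + s['name'] + ' ' + s.get('private_ip', '?'))\n",
    "# slaves_str holds one 'private_ip #hostname' line per droplet\n",
    "print(slaves_str)"
   ]
  },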
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Init SSH\n",
    "* Replace the path to the private key file (the spark.pem below)\n",
    "* Replace the Hadoop and Spark base paths"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "k = paramiko.RSAKey.from_private_key_file(\"/Users/mahadir/.ssh/spark.pem\")\n",
    "user = 'root'\n",
    "hadoop_path = '/root/hadoop-3.1.0'\n",
    "spark_path = '/root/spark-2.3.0-bin-hadoop2.7'\n",
    "spark_master_port = 7077\n",
    "\n",
    "def send_command(private_ip, public_ip, user, k, cmds):\n",
    "    # open an SSH session and run each command, echoing stdout and stderr\n",
    "    print(\"connecting to \" + public_ip + \"..\")\n",
    "    client = paramiko.SSHClient()\n",
    "    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())\n",
    "    client.connect(public_ip, username=user, pkey=k)\n",
    "    for cmd in cmds:\n",
    "        print(cmd)\n",
    "        stdin, stdout, stderr = client.exec_command(cmd)\n",
    "        for line in stdout:\n",
    "            print('... ' + line.strip('\\n'))\n",
    "        for line in stderr:\n",
    "            print('... ' + line.strip('\\n'))\n",
    "    client.close()\n",
    "    print(\"closed connection to \" + public_ip + \"..\")"
   ]
  },
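  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A harmless smoke test of the helper before touching any services: run `hostname` on the master over SSH. Assumes the master dictionary was populated in the previous steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# should print the master droplet's hostname if SSH is working\n",
    "send_command(master['private_ip'], master['public_ip'], user, k, ['hostname'])"
   ]
  },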
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Start Master Node\n",
    "* You may skip this step if the master node is already up and running with Spark & Hadoop\n",
    "* Note that I only have one HDFS instance and I don't start a Spark slave process here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# start master processes only\n",
    "cmds = [\n",
    "    spark_path + '/sbin/start-master.sh',\n",
    "    hadoop_path + '/sbin/start-dfs.sh',\n",
    "    hadoop_path + '/sbin/start-yarn.sh'\n",
    "]\n",
    "private_ip = master['private_ip']\n",
    "public_ip = master['public_ip']\n",
    "send_command(private_ip, public_ip, user, k, cmds)"
   ]
  },
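  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally confirm the daemons came up by running `jps` on the master; this assumes the JDK's `jps` tool is on the PATH of the image. Expect entries such as Master, NameNode and ResourceManager, though the exact names depend on the install."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# list running JVM processes on the master\n",
    "send_command(master['private_ip'], master['public_ip'], user, k, ['jps'])"
   ]
  },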
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Start Slave Nodes\n",
    "* Only start the Spark slave processes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# start Apache Spark on slaves only\n",
    "for node in slaves:\n",
    "    if not node['pure_slave']:\n",
    "        continue\n",
    "    private_ip = node['private_ip']\n",
    "    public_ip = node['public_ip']\n",
    "    # point each worker at the master's private address\n",
    "    cmds = [spark_path + '/sbin/start-slave.sh spark://' + master['private_ip'] + ':' + str(spark_master_port)]\n",
    "    send_command(private_ip, public_ip, user, k, cmds)"
   ]
  },
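  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of verifying that the workers registered, by reading the standalone master's JSON status endpoint. This assumes the master web UI is reachable on its public IP at the default port 8080; adjust if the UI is bound elsewhere or firewalled."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import urllib.request\n",
    "url = 'http://' + master['public_ip'] + ':8080/json'\n",
    "state = json.loads(urllib.request.urlopen(url).read().decode('utf-8'))\n",
    "print(str(len(state['workers'])) + ' worker(s) registered with the master')"
   ]
  },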
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete droplets\n",
    "* Delete only the droplets tagged with **hadoop-slave**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "### delete instances\n",
    "disposable_droplets = manager.get_all_droplets(tag_name='hadoop-slave')\n",
    "for i in disposable_droplets:\n",
    "    i.destroy()\n",
    "print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))"
   ]
  },
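  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Destroys are asynchronous, so the droplets may linger for a short while. A minimal sketch that re-queries the tag to confirm the slaves are gone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "remaining = manager.get_all_droplets(tag_name='hadoop-slave')\n",
    "print(str(len(remaining)) + ' hadoop-slave droplet(s) remaining')"
   ]
  }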
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}