lvalnegri

Setup Analytics Cloud Machine.md

Mar 12th, 2018

System

While the following notes apply to any unmanaged server with full root access, the provider used here is Digital Ocean, chosen for its good balance of price, performance, usability, and scalability (the previous link is a referral link that gives you $10 in credits, enough to run a viable first server ‒ 2GB-1CPU ‒ for at least one month). Note that monthly billed plans don't usually allow scaling the machine up or down, which defeats much of the point of cloud computing.

Create droplet

  • choose Linux Distribution, these notes are based on Ubuntu 16.04 LTS but any other distro would work smoothly, apart from the package management commands

  • choose CPU & RAM; for better results I suggest at least 2GB of RAM during the installation, and you can change the size of your droplet at any time

  • choose the Region related to most of your traffic

  • choose a recognizable hostname (the default should be: keyname-distroname-ncpu-ngb-reg). This has nothing to do with a possible future domain name, it is simply a mnemonic name for you.

  • it's OK to skip all of the other options: Block Storage, Additional Options, and SSH keys

  • click Create to let the system start building the VPS

  • In the new page that shows up, wait for a small green dot to appear on the left of the hostname in the list of droplets. This indicates that your new VPS is up and running. Meanwhile, check for a new email from Digital Ocean containing all the info you need to access the VPS via SSH: server IP and password. The user will always be root.

First time login

Linux and MacOS users

For Windows users, I suggest the software MobaXTerm, which has a more modern look and feel than the usual PuTTY. It can also save sessions.

Once connected, run these commands in order to bring the system up to date:

apt-get update
apt-get upgrade
apt-get dist-upgrade

Before starting to configure the server, run the date command to check if the clock is configured with the right timezone. If not, run the following command to fix it:

dpkg-reconfigure tzdata

Create SSH key pair

Windows

I suggest you use the software MobaXTerm, which has a more modern look and feel than the usual PuTTY.

Once it is installed, run the software and follow these steps:

    Tools > 
    MobaKeyGen > 
    (leave parameters as default) > 
    Generate > 
    Move the mouse around in the big empty area over the **Generate** button >
    insert a password twice in the textboxes called **passphrase** (you can generate a suitable one [here](https://www.random.org/passwords/?num=1&len=15&format=html&rnd=new)) >
    Save both public and private keys >
    Close

Any public key can later be regenerated as many times as needed by first loading the corresponding private key.

Linux/MacOS

  • Open the terminal, and run the command
    ssh-keygen

    You can safely press enter to accept the suggested path and filename, unless you have a reason to change them. This creates two files in the ~/.ssh/ directory (which is created automatically if not found): id_rsa, the private key, and id_rsa.pub, the public key.

  • run the command:
    cat ~/.ssh/id_rsa.pub

    to print the public key on the screen, copy it and paste it into the text area.

Add admin user (to substitute root)

  • create new user (change usrname with the actual user name)

    adduser usrname

    Enter a password twice (you can generate a suitable one here), and then the requested information (you can simply leave all fields empty)

  • add the new user to the sudo group

    usermod -aG sudo usrname
  • change shell to login as the new user

    su - usrname
  • check that the new user can actually run admin commands

    sudo su

    You should see the first part of the prompt changing from usrname to root.

  • exit sudo (always remember to exit sudo!)

    exit

    From now on you should forget there exists a user called root, and always use usrname to run admin commands.

  • set up the desired authentication method for the new user

    • using the user's password:

      • Open SSH configuration file (if nano: command not found then sudo apt-get install nano)
        sudo nano /etc/ssh/sshd_config
      • find and change the following lines (to save the file use: CTRL+x ==> y ==> Enter)
        PasswordAuthentication yes
      • Restart the service
        sudo systemctl restart ssh
    • using a ssh key-pair:

      • generate a new SSH key (see again the above instructions using MobaXTerm under Windows)
      • add the public key to the server
      • switch shell to the user:
        su - usrname
      • add the directory that will contain the SSH keys:
        mkdir ~/.ssh
      • change permission on the directory:
        chmod 700 ~/.ssh
      • paste the content of the PUBLIC key in the appropriate container file (if nano: command not found then sudo apt-get install nano):
        nano ~/.ssh/authorized_keys
      • change permission on the key file
        chmod 600 ~/.ssh/authorized_keys
      • test that the new user is able to ssh into the machine.
      • once everything works fine, disable the possibility to log in to the system using a password:
      • Open SSH configuration file
        sudo nano /etc/ssh/sshd_config
      • find and change the following lines (uncomment if needed), then save the file (CTRL+x ==> y ==> Enter)
        PasswordAuthentication no
      • Restart the service
        sudo systemctl restart ssh
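
The key-pair steps above (create the directory, restrict its permissions, paste the public key, restrict the key file) could be sketched as one small shell function that is safe to run more than once. This is only a sketch under assumptions: the function name install_pubkey and the example key string are hypothetical, and the function must be run as the user who owns the directory.

```shell
#!/usr/bin/env bash
# install_pubkey SSH_DIR PUBKEY
# Hypothetical helper: creates SSH_DIR (mode 700) and SSH_DIR/authorized_keys
# (mode 600), then appends PUBKEY only if that exact line is not already there.
install_pubkey() {
    local ssh_dir="$1" pubkey="$2"
    local keyfile="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    touch "$keyfile" || return 1
    chmod 600 "$keyfile" || return 1

    # -x: match the whole line, -F: treat the key as a fixed string
    if ! grep -qxF -- "$pubkey" "$keyfile"; then
        printf '%s\n' "$pubkey" >> "$keyfile"
    fi
}

# Example (as usrname on the server; the key text is a placeholder):
#   install_pubkey "$HOME/.ssh" "ssh-rsa AAAA... usrname@laptop"
```

OpenSSH also ships ssh-copy-id, which does a similar job from the client side while password login is still enabled.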

Add security

  • deny root access via SSH:

    # Open SSH configuration file
    sudo nano /etc/ssh/sshd_config
    # Change and Insert the following lines, then save the file (CTRL+x ==> y ==> Enter) 
    PermitRootLogin no
    # Restart the service 
    sudo systemctl restart ssh

    Now test that the root user is NOT able to ssh into the machine

  • change the standard ssh port 22 to a random integer number between 1024 and 65535 (see here for a list of known ports used by various services)

    # Open SSH configuration file
    sudo nano /etc/ssh/sshd_config
    # Change the following line as desired, then save the file (CTRL+x ==> y ==> Enter)
    Port xxxx
    # Restart the service 
    sudo systemctl restart ssh

    Now, without logging out from the current session, test that the new user is able to ssh into the machine using the new port, but not through the standard port 22.

  • enable the ufw firewall, allowing the new port above BEFORE turning the firewall on (THIS IS IMPORTANT!!! otherwise you risk locking yourself out)

    # allow the ssh port first (the above xxxx if it's been changed, or the standard 22 if it's not been changed)
    sudo ufw allow xxxx
    # enable firewall
    sudo ufw enable
    # check that the rule has been correctly applied, and check again that the number is correct!
    sudo ufw status

    Now, using a different session as earlier, test that the new user is still able to ssh into the machine.
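
The repeated "open sshd_config, change one line, restart" steps above can also be scripted. What follows is a minimal sketch, not a definitive implementation: the function names set_sshd_option and pick_port are hypothetical, and the code assumes GNU sed and shuf as shipped with Ubuntu. The first function rewrites an existing (possibly commented-out) directive or appends it; the second prints a random port between 1024 and 65535 that is not listed in a services file.

```shell
#!/usr/bin/env bash
# set_sshd_option FILE KEY VALUE
# Replace any line starting with KEY (commented out or not) by "KEY VALUE",
# or append "KEY VALUE" when no such line exists. Assumes GNU sed (-i, -E).
set_sshd_option() {
    local file="$1" key="$2" value="$3"
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$file"; then
        sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# pick_port [SERVICES_FILE]
# Print a random port in 1024-65535 that is not registered in SERVICES_FILE
# (default: /etc/services). Assumes shuf from GNU coreutils.
pick_port() {
    local services="${1:-/etc/services}" port
    while :; do
        port=$(shuf -i 1024-65535 -n 1)
        if [ -r "$services" ] && grep -Eq "[[:space:]]${port}/(tcp|udp)" "$services"; then
            continue    # port is known to be used by some service, try again
        fi
        printf '%s\n' "$port"
        return 0
    done
}

# Example (needs root on the server; keep your current session open):
#   set_sshd_option /etc/ssh/sshd_config PermitRootLogin no
#   set_sshd_option /etc/ssh/sshd_config Port "$(pick_port)"
#   sshd -t && systemctl restart ssh    # validate the file, then restart
```

Running sshd -t before the restart is a cheap safeguard: it only checks the configuration file for errors and changes nothing.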

If anything goes wrong and you can't log in through SSH anymore, most VPS providers allow you to open a shell from the account dashboard. For example, on DO at the top right of the droplet dashboard there is a Console button that lets you log in directly using password authentication. If you lack the root password, or you've never set it, you can reset it from the Access item on the left menu. Once you log in, if not asked by the system itself, you should reset the password again using the following commands:

sudo -i
passwd

Webmin

  • Add the Webmin repository to the package manager sources list:
    echo -e "\n# WEBMIN\ndeb http://download.webmin.com/download/repository sarge contrib\n" | sudo tee -a /etc/apt/sources.list    
  • add the public key of the Webmin developer Jamie Cameron to secure the package manager, then run an update command to make apt aware of the changes so it includes the Webmin package:
    wget http://www.webmin.com/jcameron-key.asc
    sudo apt-key add jcameron-key.asc
    sudo apt-get update
  • install webmin:
    sudo apt-get install webmin
  • allow access to the standard 10000 port:
    sudo ufw allow 10000
  • navigate to the URL https://server_ip:10000/, don't worry for now about the warnings, then enter the username and password to log in to the Webmin console
  • change Default Port to some random number XXXX:
    Webmin > 
        Webmin Configuration >
        Ports and Addresses >
        Listen on IPs and ports >
        Listen on port

    Also:

    • check NO for Accept IPv6 connections?
    • check Don't listen for Listen for broadcasts on UDP port
  • allow access to the new XXXX port, and delete the previous rule for the standard 10000 port:
    sudo ufw delete allow 10000
    sudo ufw allow XXXX

Nginx

Nginx is a free, open-source, high-performance HTTP server that also works as a reverse proxy and load balancer. It was developed to run on small resources, yet with the capacity to handle a large volume of concurrent connections. For these reasons, it is a great alternative to the more commonly used Apache web server.

  • Just in case Apache is already installed, stop the server and remove the package:
    sudo systemctl stop apache2
    sudo apt-get remove -y 'apache2*'
  • install the server:
    sudo apt-get install nginx
  • check that Nginx is in the list of applications that have registered their profile(s) with the firewall
    sudo ufw app list

    There should now be one called 'Nginx Full', which opens both ports 80 (for unencrypted traffic) and 443 (for SSL/TLS encrypted traffic). Let's allow it by executing the following command:

    sudo ufw allow 'Nginx Full'
  • Now verify whether changes have been made:
    sudo ufw status verbose
  • After the completion of the installation process, the Nginx web server should start and run automatically. To ensure that the service is actually up and running, run the following command:
    sudo systemctl status nginx

    To test that the service is actually working, enter the server_ip directly into the browser's address bar, and you should see the default Nginx landing page.

  • To start, stop, restart, or reload the web server, run respectively the following standard commands:
    sudo systemctl start nginx
    sudo systemctl stop nginx
    sudo systemctl restart nginx
    sudo systemctl reload nginx
  • Nginx is automatically started when the server boots. To prevent that, simply disable the service:
    sudo systemctl disable nginx

    and enable it again if you want to start the Nginx server at boot:

    sudo systemctl enable nginx
  • The following are the location and names of the configuration and logs files:
    • /etc/nginx the Nginx parent directory that contains all the server configuration files
      • /etc/nginx/nginx.conf the main configuration file of Nginx
      • /etc/nginx/sites-available/ you can store the "server blocks" in this directory. Its configuration files are not used until they are linked from the sites-enabled directory.
      • /etc/nginx/sites-enabled/ this directory stores the enabled "server blocks", as links to the configuration files in the sites-available directory.
      • /etc/nginx/snippets/ Here the configuration fragments are stored and they can be used anywhere in the Nginx Configuration. If you are using specific configuration segments repeatedly, then they can be added to this directory.
    • /var/log/nginx/ the Nginx parent directory for the server log files
      • /var/log/nginx/access.log stores all the entry requests to the web server (it has to be configured to do that).
      • /var/log/nginx/error.log Nginx errors are recorded in this file
    • /var/www/html/ the default directory for the content of the website(s)

The default Nginx installation will have only one default server block, enabled with a document root set to /var/www/html/. It is possible to add as many blocks as desired as follows:

  • create a new domain document root:
    sudo mkdir -p /var/www/newdomain.com
  • in the above folder, create a basic welcome web page:
    sudo nano /var/www/newdomain.com/index.html

    like the following:

    <html>
        <head>
            <title>Welcome to the "newdomain.com" nginx webserver!</title>
        </head>
        <body bgcolor="white" text="black">
            <center><h1>newdomain.com is working!</h1></center>
        </body>
    </html>
  • create a new server block:

    sudo nano /etc/nginx/sites-available/newdomain.com.conf

    and add the following content:

    server {
        listen 80;
        listen [::]:80;
        server_name newdomain.com www.newdomain.com;
        root /var/www/newdomain.com;
    
        index index.html;
    
        location / {
            try_files $uri $uri/ =404;
        }
    }
  • Activate the server block by creating a symbolic link in the directory of enabled websites:
    sudo ln -s /etc/nginx/sites-available/newdomain.com.conf /etc/nginx/sites-enabled/newdomain.com.conf
  • then test that the above configuration is actually correct:
    sudo nginx -t
  • restart the nginx web server:
    sudo systemctl restart nginx
  • check with the browser that newdomain.com is working as desired.
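
When several domains are needed, the server block above can be generated rather than typed. The following is a sketch only: the function name make_server_block is hypothetical, and the block it prints simply mirrors the one shown above with the domain substituted.

```shell
#!/usr/bin/env bash
# make_server_block DOMAIN
# Hypothetical helper: print a minimal Nginx server block for DOMAIN,
# with the document root under /var/www/DOMAIN.
make_server_block() {
    local domain="$1"
    # \$uri is escaped so that the shell leaves it for Nginx to interpret
    cat <<EOF
server {
    listen 80;
    listen [::]:80;
    server_name ${domain} www.${domain};
    root /var/www/${domain};

    index index.html;

    location / {
        try_files \$uri \$uri/ =404;
    }
}
EOF
}

# Example (on the server):
#   make_server_block newdomain.com | sudo tee /etc/nginx/sites-available/newdomain.com.conf
#   sudo nginx -t && sudo systemctl reload nginx
```

Printing to standard output keeps the function free of side effects, so the result can be inspected before it is written under /etc/nginx.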

Domain Name System (DNS)

An IP address is a numeric identifier that identifies and locates a particular computer/server/router/device connected to a TCP/IP network (private local and wide, or the public internet). The most used version of the protocol is still version 4. IPv4 addresses may be represented in any notation expressing a 32-bit integer, but the typical format is the so-called dot-decimal notation, which consists of four integers between 0 and 255 separated by full stops, like www.xxx.yyy.zzz. It's worth noting that some ranges are reserved for special purposes, like 127.0.0.1 for the local machine, usually called localhost, and 10.xxx.yyy.zzz, 172.16-31.yyy.zzz, 192.168.yyy.zzz for private networks, whose packets are not routed on the public internet. IPv6 is meant to eventually replace IPv4.
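
To make the link between dot-decimal notation and the underlying 32-bit integer concrete, here is a small sketch in bash. The function names ipv4_to_int and is_private_ipv4 are hypothetical; the private ranges checked are the three mentioned above (10.x.y.z, 172.16-31.y.z, 192.168.y.z).

```shell
#!/usr/bin/env bash
# ipv4_to_int ADDRESS
# Print the 32-bit integer behind a dot-decimal IPv4 address,
# or return 1 if ADDRESS is not four integers between 0 and 255.
ipv4_to_int() {
    local re='^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$'
    [[ $1 =~ $re ]] || return 1
    local i octet n=0
    for i in 1 2 3 4; do
        octet=$((10#${BASH_REMATCH[i]}))   # force base 10, so "010" is ten, not octal
        [ "$octet" -le 255 ] || return 1
        n=$(( (n << 8) | octet ))          # shift previous octets left, add this one
    done
    printf '%s\n' "$n"
}

# is_private_ipv4 ADDRESS
# Succeed if ADDRESS lies in 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
is_private_ipv4() {
    local n
    n=$(ipv4_to_int "$1") || return 1
    (( (n >> 24) == 10 || (n >> 20) == 0xAC1 || (n >> 16) == 0xC0A8 ))
}

# Example:
#   ipv4_to_int 127.0.0.1
#   is_private_ipv4 192.168.1.10 && echo "private"
```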

A domain name is a human-readable name that identifies an IP address. Every domain name must end in a Top Level Domain, or TLD, such as .com, .org, ... (notice that strings like .co.uk, .ac.uk, ... are not proper TLDs, but Second-level Domains). Domain names are case insensitive, should contain only alphanumeric characters plus the hyphen, and each dot-separated label must be shorter than 64 characters.
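
The naming rules above can be checked mechanically. The sketch below is an assumption-laden illustration rather than a complete validator: the function name is_valid_domain is hypothetical, and beyond the rules stated above it also assumes the usual conventions that a label may not start or end with a hyphen and that the whole name is at most 253 characters long.

```shell
#!/usr/bin/env bash
# is_valid_domain NAME
# Succeed if NAME is made of dot-separated labels of 1 to 63 characters,
# each using only letters, digits and hyphens (no leading/trailing hyphen).
is_valid_domain() {
    local name="$1"
    local label='[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?'
    local re="^(${label}\.)*${label}\$"
    [ "${#name}" -le 253 ] && [[ $name =~ $re ]]
}

# Example:
#   is_valid_domain newdomain.com && echo "looks fine"
#   is_valid_domain -bad-.com     || echo "rejected"
```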

A nameserver is a repository of lookups between domain names and corresponding IPs ...
Notice that VPS providers and domain registrars each have their own way to manage and store the information on how to point domain names to IPs. Moreover, any information is ... to get outdated because the required steps change frequently.

  • Sign up at freenom (or any other providers you're happy with), and grab an available domain for free.

  • Under Services > My Domains click the Manage Domain button of the domain you want to move.

  • Under Management Tools > Nameservers click on Use custom nameservers and enter the following strings:

    • ns1.digitalocean.com
    • ns2.digitalocean.com
    • ns3.digitalocean.com
  • Log in to Digital Ocean, and open the Networking > Domains menu item

  • Under the A tab: In the Domain text-box inside the Add a domain sub-panel enter the above domain name you've just grabbed from freenom, while in the Droplet text-box you should have the possibility to choose directly the droplet you've just created.

  • Under the CNAME tab:

Encryption https

Web pages are identified by a [URL]() and are mostly accessed using a browser through [HTTP]() requests. To ensure encryption of the information sent back and forth between the client and the server, it is highly recommended to install an [SSL]() certificate and use the [HTTPS]() protocol instead.

PHP

Docker

Node.js

KVM

LaTeX


R

Core R

  • add the CRAN repository to the list used by the Ubuntu package manager apt:

    echo -e "\n# CRAN REPOSITORY\ndeb http://cran.rstudio.com/bin/linux/ubuntu xenial/\n" | sudo tee -a /etc/apt/sources.list

    The above command:

    • presumes that the installed OS version is 16.04 LTS. For different versions, replace the word xenial with the correct codename, using this list as a reference. In particular, the newest LTS version 18.04 is named bionic.
    • connects to cran.rstudio.com, which is the generic redirection service from RStudio, but it's also possible to switch to a static closer location (according to the chosen VM region, not the user's location!) using this list.
  • add the public key of the CRAN maintainer Michael Rutter to secure the package manager, then run an update command to make apt aware of the changes:

    gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
    gpg -a --export E084DAB9 | sudo apt-key add -
    sudo apt-get update
  • install R:

    sudo apt-get install r-base r-base-dev
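
Instead of editing the codename by hand, the repository line can be built from the codename reported by the system. This is a small sketch: the function name cran_repo_line is hypothetical, and the line format is simply the one used in the echo command above.

```shell
#!/usr/bin/env bash
# cran_repo_line CODENAME
# Hypothetical helper: print the apt source line for the CRAN Ubuntu
# repository matching the given release codename (xenial, bionic, ...).
cran_repo_line() {
    local codename="$1"
    printf 'deb http://cran.rstudio.com/bin/linux/ubuntu %s/\n' "$codename"
}

# Example (on the server; lsb_release -cs prints the codename of the
# installed release):
#   cran_repo_line "$(lsb_release -cs)" | sudo tee -a /etc/apt/sources.list
```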

RStudio Server

  • install prerequisites Ubuntu libraries:

    sudo apt-get install gdebi-core
    sudo apt-get install libapparmor1
  • download and install Rstudio Server:

    wget -O rstudio https://s3.amazonaws.com/rstudio-dailybuilds/rstudio-server-1.1.442-amd64.deb 
    sudo gdebi rstudio

    Please note that the above command presumes the preview version at the time of writing. It's worth checking this page for the newest version and substituting it if needed. Moreover, if you prefer to stay on the safer side and install the stable release, change s3.amazonaws.com/rstudio-dailybuilds to download2.rstudio.org, checking instead this page for the newest version.

  • test that the server is actually running browsing to http://server_ip:8787/

  • modify some configurations for security and access (to dig deeper see this page)

    # open the configuration file
    sudo nano /etc/rstudio/rserver.conf
    # add the following line to change the port
    www-port=xxxx
    # restart the server
    sudo rstudio-server restart
    # open the port in firewall
    sudo ufw allow xxxx

    Now, check that the server is actually working opening the URL http://server_ip:xxxx

  • check that git is recognized:
    Tools >
    Global Options >
    Git/SVN >
    Enable version control interface for RStudio projects must be checked >
    Git executable should point to /usr/bin/git

  • create a GitHub token to use instead of the password

  • add a SSH key to the GitHub account

Shiny Server

  • Install first the shiny package from inside R with admin privileges (see below for the reason):

    sudo su
    R
    install.packages('shiny')
    q()
    exit
  • download and install Shiny Server:

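    # NOTE: this URL points to the RStudio Server package (it looks like a copy-paste slip);
    # replace it with the Shiny Server .deb link from the RStudio download page
    wget -O shiny https://s3.amazonaws.com/rstudio-dailybuilds/rstudio-server-1.1.442-amd64.deb 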
    sudo gdebi shiny
  • modify some configurations for access (to dig deeper see [this page]())

    # open the configuration file
    sudo nano /etc/shiny-server/shiny-server.conf
    # change the following line adding 127.0.0.1
    listen 3838 127.0.0.1
    # restart the server
    sudo shiny-server restart
    # open the port in firewall
    sudo ufw allow 80

Packages

All packages should be installed with admin privileges, to ensure a unique shared library between all normal users and the shiny user, and avoid duplication and possible mismatches in versions. The single installation line could be replaced by the following in case of multiple installations:

dep.pkg <- c(...) # list of packages
pkgs.not.installed <- dep.pkg[!sapply(dep.pkg, function(p) require(p, character.only = TRUE))]
if( length(pkgs.not.installed) > 0 ) install.packages(pkgs.not.installed, dependencies = TRUE)
  • install prerequisite Ubuntu libraries
    • devtools:
      sudo apt-get install curl libssl-dev libcurl4-gnutls-dev
    • RMySQL:
      sudo apt-get install libmysqlclient-dev
    • rgdal/rgeos/spdplyr (in this order):
      sudo add-apt-repository ppa:ubuntugis/ppa 
      sudo apt-get update 
      sudo apt-get install gdal-bin
      sudo apt-get install libgdal-dev libgeos-dev libproj-dev 
    • sf (must be installed AFTER previous deps):
      sudo apt-get install libudunits2-dev
    • geojsonio/tmap (must be installed AFTER previous deps):
      sudo apt-get install libv8-3.14-dev
    • Cairo/gdtools:
      sudo apt-get install libcairo2-dev libxt-dev
    • RcppGSL:
      sudo apt-get install libgsl0-dev
    • GMP:
      sudo apt-get install libgmp3-dev 
    • rgl:
      sudo apt-get install r-cran-rgl libcgal-dev libgl1-mesa-dev libglu1-mesa-dev
    • rJava:
      sudo apt-get install openjdk-8-*
      sudo apt-get install r-cran-rjava
      sudo R CMD javareconf
    • nloptr (dependency, the MIT website some R packages rely on for the download is often down):
      sudo apt-get install libnlopt-dev 

Before proceeding, it is important to install devtools as the first package. Even if not directly needed for installing packages from CRAN, some of these packages need to install dependencies that need to be compiled from source.

sudo su
R
install.packages('devtools')
q()
exit
  • install fundamental set of packages for data science

  • install set of packages for data visualization and

Shared Repository

Additional Fonts


Relational Databases

[MySQL]()

Configuration

  • To know the list of all locations where MySQL searches for its configuration files simply execute:

    mysqld --help --verbose

    In the very first lines you will find a message with a list of all my.cnf locations it looks for.

  • To set the default charset to UTF-8 and the default storage engine to MYISAM, you want to add the following to my.cnf:

    [client]
    default-character-set=utf8
    
    [mysql]
    default-character-set=utf8
    
    [mysqld]
    collation-server = utf8_unicode_ci
    init-connect='SET NAMES utf8'
    character-set-server = utf8
    default-storage-engine=MYISAM

[PostgreSQL]()

[Microsoft SQL Server]()

NoSQL Databases

There are 4 basic types of NoSQL databases:

  • Key-Value It has a Big Hash Table of keys & values
  • Document-based It stores documents made up of tagged elements
  • Column-based Each storage block contains data from only one column
  • Graph-based A network database that uses edges and nodes to represent and store data

Redis (In-memory Key-Value Store)

MonetDBLite (The R version of the column-store pioneer)

MongoDB ()

Neo4j (Graph database for Network Analysis)

Hive (The SQL-like Hadoop database)

Despite providing SQL functionality, Hive does not provide interactive querying yet; it only runs batch processes on Hadoop.

HBase (The key/value Hadoop store)

Unlike its sibling Hive, HBase operations run in real-time on its database rather than MapReduce jobs.

Influx Db (Time Series database platform for Metrics & Events)


Additional Languages

Python

# Update the ubuntu package manager 
sudo apt-get update
# Install language, development tools, and package manager
sudo apt-get install python3 python3-dev python3-pip
# ensure the package manager pip is the latest version; older versions may have trouble with some dependencies
sudo -H pip3 install --upgrade pip
# install seaborn prerequisite
sudo apt-get install python3-tk
# Install BLAS to improve Theano and Keras performance
sudo apt-get install libblas-dev
# Install python libraries using list from LV (all requires 1GB but xgboost that needs 2GB)
wget https://raw.githubusercontent.com/lvalnegri/whatever/master/python-libs.lst
python3 -m pip install --user -r python-libs.lst

Jupyter Notebook

  • Install the software:

    sudo apt-get install ipython ipython-notebook (?)
    sudo -H pip3 install jupyter (?)
    • Start the Notebook, using a different port than the standard 8888
      jupyter notebook --no-browser --port xxxx
  • Create a SSH Tunnel using Windows MobaXTerm:
    Tools >
    MobaSSHTunnel >
    New SSH tunnel >
    <Forwarded Port> = 8000, or whatever other port number, but be careful not to interfere with other services
    <SSH Server> = the IP of the droplet
    <SshUsername> = the user that started jupyter
    <SSH port> = 22 or a different port if it's been set up
    <Remote port> = 8888 or a different port if jupyter has been so instructed
    <Remote server> = localhost

    Save

    • On the line related to the new tunnel, click the key icon on the right under the Settings tab and attach the private key of the user.
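
On Linux and MacOS the same tunnel can be opened with the plain ssh client. The sketch below only builds and prints the command so it can be reviewed first; the function name tunnel_cmd and the example address are hypothetical, while -N (no remote command), -L (local port forwarding) and -p (port) are standard OpenSSH options.

```shell
#!/usr/bin/env bash
# tunnel_cmd LOCAL_PORT REMOTE_PORT USER HOST [SSH_PORT]
# Hypothetical helper: print the ssh command that forwards LOCAL_PORT on
# this machine to REMOTE_PORT on the server, where Jupyter is listening.
tunnel_cmd() {
    local local_port="$1" remote_port="$2" user="$3" host="$4" ssh_port="${5:-22}"
    printf 'ssh -N -L %s:localhost:%s -p %s %s@%s\n' \
        "$local_port" "$remote_port" "$ssh_port" "$user" "$host"
}

# Example (203.0.113.10 stands in for the IP of the droplet):
#   tunnel_cmd 8000 8888 usrname 203.0.113.10
# Run the printed command, then browse to http://localhost:8000/
```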

Elixir

Scala


Additional Services

Next Cloud


|  service  | default |  new  |
| --------- | ------- | ----- |
| SSH       |      22 |   |
| JUPYTER   |    8888 |   |
| RSTUDIO   |    8787 |   |
| SHINY     |    3838 |    80 |
| MYSQL     |    3306 |   |

Disclaimer

I’m not a devOps or sysAdmin, and most of this document has been , so it’s very possible that some steps here are not the very best way of performing some tasks. If anyone has any comments on anything in this document, I’d love to hear about it!
