lvalnegri

geoserver_nominatim_installation.md

Apr 30th, 2020 (edited)

Introduction

When doing geo-analytics, you often need to geocode thousands of addresses, if not hundreds of thousands or even millions, and you obviously want the process to run as an automated backend operation. Google Maps is widely regarded as the gold standard for this job, but it becomes fairly expensive beyond its free quota, and its API conditions are quite strict: you are only supposed to geocode addresses that you will display on a Google map. Moreover, it doesn't lend itself easily to bulk geocoding.

This is where a private geoserver based on open source efforts comes in. The official instructions for installing Nominatim are fairly complete, but they are brief in places and scattered across different pages, and some steps must be changed or reordered to get to the end of the installation as quickly as possible, ready to geocode!

The following notes assume that you have a VPS running Ubuntu 20.04 LTS and Postgres 12, with a sudoer already created; a dedicated nominatim user will also be created solely to run the server. Never, ever run the installation script as root. You have been warned.

The server directory is: /srv/nominatim
The server username is: nominatim
The data will be downloaded in: /srv/nominatim/Nominatim-3.7.0/data/ (but check version in folder name!)
The software will be built in and run from: /srv/nominatim/build/
The machine specs that I've used for the process are: 6 vCPUs, 16GB RAM, 80GB SSD.

Data have been limited to Italy (a 1.5GB file), and the whole process took about 200 minutes. After the first step, which sets up the db server and loads the data, the process is divided into 30 so-called ranks. The two ranks numbered 26 and 30 take most of the time (roughly one third each), and an ETA is shown while they run. That's where you should look to understand whether your machine is correctly specced for the job: the entire planet should take about 7 days, while for Italy alone I had ~3K seconds for each of those two ranks. If the machine is badly under-specced, you won't even get through the first rank!

Create and Setup droplet

  • From the Basic plan, choose 4 GB RAM / 2 vCPUs / 80 GB SSD Disk, with a $20/mo price tag at the time of writing. This is an OKish machine to start with for a single country like Italy; it will be resized to 6 vCPUs / 16 GB RAM before running the import script (and downsized again once the script has finished). If you plan to load more countries or a bigger country, start with more storage, roughly proportional to the number and size of the countries involved. The entire planet needs up to 1TB, depending on optional additional data such as postcodes, Wikipedia, ...

  • if you have a domain to attach, do it now. Remember to set the DigitalOcean (DO) nameservers at your original domain provider.

  • enable monitoring from the DO dashboard:

    curl -sSL https://repos.insights.digitalocean.com/install.sh | bash
  • update the timezone:

    dpkg --configure -a
    dpkg-reconfigure tzdata
  • upgrade the system:

    apt-get update
    apt-get -y full-upgrade
    apt-get -y autoremove
    reboot
  • change the ssh port: open the file /etc/ssh/sshd_config for editing, then uncomment the following line:

    #Port 22

    replacing the default value 22 with another port of your choice. Remember to restart the ssh service:

        sudo systemctl restart ssh
  • open the ssh and http ports, then enable the firewall (allowing the ssh port first avoids locking yourself out):

    ufw allow XXXX
    ufw allow http
    ufw enable

    where XXXX is the port number you set in the ssh configuration above

  • install dependencies (cmake, libxml2, c++ compiler, build-essential, osm2pgsql, postgresql, postgis, apache, php, php-pgsql)

    sudo apt-get install -y acl build-essential clang-tidy cmake g++ git libboost-dev libboost-system-dev \
                        libboost-filesystem-dev libexpat1-dev zlib1g-dev libbz2-dev libpq-dev libproj-dev libicu-dev \
                        postgresql-server-dev-12 postgresql-12-postgis-3 postgresql-contrib postgresql-12-postgis-3-scripts \
                        apache2 php php-pgsql libapache2-mod-php php-cgi php-intl \
                        python3-setuptools python3-dev python3-pip python3-psycopg2 python3-tidylib \
                        python3-psutil python3-jinja2 python3-icu python3-argparse-manpage 
  • create admin user:

    adduser usrname
    usermod -aG sudo usrname
    su - usrname
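
The ssh port change described above can also be scripted. The following is only a minimal sketch, assuming GNU sed (the default on Ubuntu) and a stock sshd_config that still contains the commented `#Port 22` line; the function name `set_ssh_port` is made up for this example and is not part of any package.

```shell
#!/usr/bin/env bash
# set_ssh_port FILE PORT  (hypothetical helper for illustration only)
# Rewrites the commented or active "Port" line of an sshd_config-style file.
set_ssh_port() {
    local file="$1" port="$2"
    # Accept 1 to 5 digits only, then check the valid TCP port range.
    [[ "$port" =~ ^[0-9]{1,5}$ ]] || { echo "invalid port: $port" >&2; return 1; }
    (( 10#$port >= 1 && 10#$port <= 65535 )) || { echo "port out of range: $port" >&2; return 1; }
    port=$(( 10#$port ))   # strip any leading zeros
    # Turn "#Port 22" (or an existing "Port N") into "Port <new value>".
    sed -i -E "s/^#?[[:space:]]*Port[[:space:]]+[0-9]+.*/Port ${port}/" "$file"
    # Succeed only if the file now contains the requested setting.
    grep -q "^Port ${port}\$" "$file"
}
```

Run as root, it would be used as `set_ssh_port /etc/ssh/sshd_config XXXX && systemctl restart ssh`, with XXXX being the same port that is opened in the firewall.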

POSTGRES configuration

  • Create postgres users:

    sudo -u postgres createuser -s nominatim
    sudo -u postgres createuser www-data
  • Open the postgres server file configuration:

    sudo nano /etc/postgresql/12/main/postgresql.conf
  • Edit the following parameters:

    shared_buffers = 2GB
    work_mem = (50MB)
    maintenance_work_mem = (10GB)
    autovacuum_work_mem = 2GB
    fsync = off
    synchronous_commit = off
    full_page_writes = off
    checkpoint_timeout = 10min
    max_wal_size = 1GB 
    checkpoint_completion_target = 0.9
    effective_cache_size = (24GB)

    The settings above are listed in the order in which they appear in the file.

    Settings in parentheses are for 64GB RAM, and should be adapted to the RAM actually available.

    Some settings are commented out in the default file, and must be uncommented to take effect.

    fsync and full_page_writes should be turned back ON after the installation.

  • Restart the postgresql service after updating the config file:

    sudo systemctl restart postgresql
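
To adapt the three settings given in parentheses to a machine with less than 64GB RAM, one rough option is to scale them linearly. The sketch below does just that; linear scaling is my own assumption rather than a rule from the Nominatim or PostgreSQL documentation, and the function name `scale_pg_settings` is hypothetical.

```shell
#!/usr/bin/env bash
# scale_pg_settings RAM_IN_GB  (hypothetical helper for illustration only)
# Prints the three parenthesised settings scaled from their 64GB reference
# values. ASSUMPTION: plain linear scaling, a rough heuristic only.
scale_pg_settings() {
    local ram_gb="$1"
    # Require a plain positive integer number of gigabytes.
    [[ "$ram_gb" =~ ^[0-9]+$ ]] || { echo "usage: scale_pg_settings RAM_IN_GB" >&2; return 1; }
    ram_gb=$(( 10#$ram_gb ))
    (( ram_gb > 0 )) || { echo "RAM must be greater than zero" >&2; return 1; }
    # Reference values for 64GB RAM, expressed in MB.
    local ref_ram=64 ref_work=50 ref_maint=10240 ref_cache=24576
    echo "work_mem = $(( ref_work * ram_gb / ref_ram ))MB"
    echo "maintenance_work_mem = $(( ref_maint * ram_gb / ref_ram ))MB"
    echo "effective_cache_size = $(( ref_cache * ram_gb / ref_ram ))MB"
}
```

For the 16GB machine used in these notes, `scale_pg_settings 16` prints 12MB, 2560MB and 6144MB respectively. Treat the numbers as a starting point to be checked against the actual memory usage during the import.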

APACHE configuration

  • Create the Apache configuration file for the website:

    sudo tee /etc/apache2/conf-available/nominatim.conf << EOFAPACHECONF
    <Directory "/srv/nominatim/build/website">
      Options FollowSymLinks MultiViews
      AddType text/html   .php
      DirectoryIndex search.php
      Require all granted
    </Directory>
    
    Alias /nominatim /srv/nominatim/build/website
    EOFAPACHECONF
  • Enable the configuration:

    sudo a2enconf nominatim
  • Reload the server to activate the new configuration:

    sudo systemctl reload apache2
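
Since the alias and the website directory appear in several places, it can help to render the Apache configuration from two variables. This is a minimal sketch under that assumption; the function name `render_nominatim_conf` is hypothetical, and its defaults simply repeat the values used in these notes.

```shell
#!/usr/bin/env bash
# render_nominatim_conf [ALIAS] [WEBSITE_DIR]  (hypothetical helper)
# Prints the same Apache configuration as above, with the alias and the
# website directory taken from the arguments.
render_nominatim_conf() {
    local alias_path="${1:-/nominatim}"
    local website_dir="${2:-/srv/nominatim/build/website}"
    cat <<EOF
<Directory "${website_dir}">
  Options FollowSymLinks MultiViews
  AddType text/html   .php
  DirectoryIndex search.php
  Require all granted
</Directory>

Alias ${alias_path} ${website_dir}
EOF
}
```

The output can be piped into the same file as before, e.g. `render_nominatim_conf | sudo tee /etc/apache2/conf-available/nominatim.conf`. If you change the alias, the base URL in the Nominatim configuration must change with it.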

Software

  • switch to the root user:

    su -
  • create a dedicated user for the geoserver:

    useradd -d /srv/nominatim -s /bin/bash -m nominatim
  • download source code:

    cd /srv/nominatim/
    wget -O nominatim.tar.bz2 https://nominatim.org/release/Nominatim-3.7.0.tar.bz2
    tar xf nominatim.tar.bz2
  • build the server software:

    mkdir build
    cd build
    cmake /srv/nominatim/Nominatim-3.7.0
    make
    make install
  • create a minimal configuration file (the base URL below must match the Apache alias above):

    tee /srv/nominatim/build/settings/local.php << EOF
    <?php
    @define('CONST_Website_BaseURL', '/nominatim/');
    EOF
  • download data:

    cd /srv/nominatim/Nominatim-3.7.0/data/
    wget -O gb.osm.pbf https://download.geofabrik.de/europe/great-britain-latest.osm.pbf
    wget https://www.nominatim.org/data/gb_postcode_data.sql.gz
  • shut down the machine and upsize the droplet. In the current process, involving only one country, an upsize to 6 vCPUs / 16 GB RAM is sufficient, with the entire process lasting less than 4 hours

  • once the new machine is available, turn it on, ssh into it, then run the script to create the database:

    sudo su nominatim
    cd /srv/nominatim/build/
    ./utils/setup.php --drop --osm-file /srv/nominatim/Nominatim-3.7.0/data/gb.osm.pbf --all 2>&1 | tee ../setup.log

    Note that utils/setup.php is the import script of the Nominatim releases up to 3.6. From release 3.7 the import is run with the nominatim command-line tool instead (nominatim import --osm-file <file>), so check which of the two your release provides.

    Notice that if no data extract file is included in the command, the script will assume the entire planet!

    If you ever need to rerun the script, you must first drop the database (while logged in as nominatim):

    dropdb nominatim
  • when finished, shut down the machine and downsize the droplet

  • now try to connect to http://ip_address/nominatim and you should see the Nominatim search page

  • you can also install R/RStudio Server and build an API that can provide bulk automated feedback. See ...
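
Once the search page answers, the server can also be queried from a script, which is the whole point of the exercise. Below is a minimal sketch that builds a search URL for a free-form address; the helper names `urlencode` and `nominatim_search_url` are made up for this example, and it assumes the /nominatim alias configured above together with the standard `q`, `format` and `limit` query parameters.

```shell
#!/usr/bin/env bash
# urlencode STRING  (hypothetical helper)
# Percent-encodes every byte except unreserved URL characters.
urlencode() {
    local LC_ALL=C s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) out+="$c" ;;
            *) out+=$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

# nominatim_search_url BASE_URL ADDRESS  (hypothetical helper)
# Prints the search URL returning at most one JSON result for ADDRESS.
nominatim_search_url() {
    local base="${1%/}" query="$2"
    printf '%s/search.php?q=%s&format=json&limit=1\n' "$base" "$(urlencode "$query")"
}
```

A single address can then be geocoded with `curl -s "$(nominatim_search_url http://ip_address/nominatim 'Via Roma 1, Milano')"`, and a file of addresses by calling the same function in a loop.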
