
part 0.1 :
creating a VM instance


in google cloud, open the console :
https://console.cloud.google.com/home/

then click on the hamburger menu at the top left -> compute engine -> VM instances
then create a new VM instance, choose a custom one with these settings :

- scroll through the regions and zones until you find one that has a V100 available, then choose :
- 2 vCPU (to avoid nvidia V100 throttling)
- 3.75 GB RAM
- 15 GB HDD
- 1 GPU : Tesla V100 (maximum for free trial)
- system : ubuntu 18.04 LTS
- allow HTTP and HTTPS requests
- preemptible settings (about 60% discount on the free credit consumption, in exchange for occasional power off of the instance by google; it takes 2 minutes to start again, totally worth it)
to activate the preemptible setting :
       Click Management, security, disks, networking, sole tenancy.
       Under Availability policy, set the Preemptibility option to On. This setting disables automatic restart for the instance, and sets the host maintenance action to Terminate.
       Click Create to create the instance.
more details about preemptible instances here : https://cloud.google.com/compute/docs/instances/create-start-preemptible-instance
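
for information : the same settings can also be written as a gcloud command instead of clicking through the console. This is only a sketch and not part of the original steps. It assumes the gcloud CLI is installed and logged in. The instance name (leela-v100) and the zone (us-central1-a) are placeholders, and the image family and accelerator type names are my assumption of the values matching the list above; you can check them with "gcloud compute images list" and "gcloud compute accelerator-types list". By default the function only prints the command, set DRY_RUN=0 to really run it.

```shell
# Sketch: create the preemptible V100 instance from the command line.
# INSTANCE_NAME and ZONE are placeholders (assumptions), replace them with your own.
INSTANCE_NAME="${INSTANCE_NAME:-leela-v100}"
ZONE="${ZONE:-us-central1-a}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = run it

create_v100_instance() {
  # 2 vCPU, 3.75 GB RAM (3840 MB), 15 GB disk, 1 Tesla V100, ubuntu 18.04 LTS,
  # preemptible, host maintenance action set to terminate.
  # The tags only mark the instance for HTTP/HTTPS; the matching firewall
  # rules must already exist in the project.
  set -- gcloud compute instances create "$INSTANCE_NAME" \
    --zone="$ZONE" \
    --custom-cpu=2 \
    --custom-memory=3840MB \
    --boot-disk-size=15GB \
    --accelerator=type=nvidia-tesla-v100,count=1 \
    --image-family=ubuntu-1804-lts \
    --image-project=ubuntu-os-cloud \
    --tags=http-server,https-server \
    --preemptible \
    --maintenance-policy=TERMINATE
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_v100_instance
```

Printing the command first lets you compare every flag with the console settings before anything is created or charged.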


for information :
since preemptible instances are much cheaper (if i remember well, around 650 dollars vs 1400 dollars per month of free credit consumed on a V100 VM instance), choosing a preemptible instance when you create it is a no brainer.
From my experience, the VM instance will be stopped by Google 1 or 2 times, very rarely 3 times, per 24 hours, which leaves on average at least 5-10 hours to use it in a row before the first stop.
Restarting the instance then only takes 1 click and 2 minutes, as we will see later, and you are good to go again for many hours.
Note that whenever a preemptible instance is stopped by Google, the credit stops being consumed too, so you don't have to worry about wasting credit.
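
As a rough worked example of what the price difference means, here is a small calculation using the approximate monthly figures quoted above (remembered values, not official prices) and the 300 dollars of free credit. The 30-day month and the helper name days_of_credit are assumptions made for illustration.

```shell
# Rough estimate: how many days of continuous V100 use the free credit covers.
# The monthly costs are the approximate figures from the text, not official prices.
days_of_credit() {
  # $1 = credit in dollars, $2 = cost in dollars per 30-day month
  awk -v credit="$1" -v monthly="$2" 'BEGIN { printf "%.1f\n", credit / monthly * 30 }'
}

echo "preemptible : $(days_of_credit 300 650) days"
echo "standard    : $(days_of_credit 300 1400) days"
```

With these figures the credit lasts roughly 14 days of running time on a preemptible instance, against roughly 6 days on a standard one.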


part 0.2 :
preparations

go to the google console (compute engine) -> VM instances -> click on the SSH button to connect to the instance via SSH (embedded in chrome)

To read before starting :
The instance will be opened in a new chrome window that uses SSH protocol to connect to your VM instance.
To make copying commands easier, in the new SSH chrome window, go to : settings → copy paste with ctrl+shift+c/v : click ok
From now on, we will use ctrl+c to copy a command from this text, but ctrl+shift+v to paste it into the SSH window (because it is an ubuntu terminal)


Finally, for information, before starting part 1.1 and part 1.2 :
About preemptible instances again : if the instance is stopped by google while we are installing system packages (parts 1.1 + 1.2 + 2), the probability of system corruption is high. This is very rare, but if it happens, i advise you to delete the instance and create a new one. Unless you are very unlucky (in which case, try again !), the new instance will not be stopped during the install.


part 1 :
installing all drivers


sudo apt-get update && sudo apt-get -y upgrade && sudo apt-get -y dist-upgrade && sudo reboot


for information : after the first reboot, you lose the connection with the VM instance : wait 1-2 minutes and click retry 1 or 2 times, and it should reconnect as long as the server is ON (see this page to check that the server is ON with a green circle, and that SSH is clickable) :
https://console.cloud.google.com/compute/instances
if "retry" does not work and the instance is still ON (green circle), keep retrying until you manage to connect via SSH again.


for information 2, before starting next command :
official website : https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa
Current long-lived branch release at the time of this tutorial, as listed on the ppa website linked above : `nvidia-390` (390.48)
if in the future the Current long-lived branch gets an update, you should replace 390 (in nvidia-390 of the 1st command and in libnvidia-compute-390 of the 2nd command) by the driver version number mentioned there.
For example, if the next driver update of the nvidia "Current long-lived branch release" is nvidia-393, you would have to replace nvidia-390 and libnvidia-compute-390 by nvidia-393 and libnvidia-compute-393, without changing the rest of the command.
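
To make this substitution less error-prone, the version number can be kept in one variable so that both commands always use the same value. This is a sketch only: the helper name print_driver_commands is made up for illustration, and it just prints the two commands of this part for a given version so you can check them before pasting them.

```shell
# Sketch: keep the driver version in one place so both commands stay in sync.
# 390 is the version used in this tutorial; change it if the ppa lists a newer
# "Current long-lived branch release".
DRIVER_VERSION="${DRIVER_VERSION:-390}"

print_driver_commands() {
  # $1 = driver version number, e.g. 390
  echo "sudo add-apt-repository -y ppa:graphics-drivers/ppa && sudo apt-get update && sudo apt-get -y install nvidia-$1 && sudo reboot"
  echo "sudo apt-get -y install linux-headers-generic nvidia-opencl-dev libnvidia-compute-$1 && sudo reboot"
}

# Prints the commands only, nothing is installed by this snippet.
print_driver_commands "$DRIVER_VERSION"
```

Calling print_driver_commands 393 would print the same two commands with nvidia-393 and libnvidia-compute-393.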


sudo add-apt-repository -y ppa:graphics-drivers/ppa && sudo apt-get update && sudo apt-get -y install nvidia-390 && sudo reboot


sudo apt-get -y install linux-headers-generic nvidia-opencl-dev libnvidia-compute-390 && sudo reboot
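
for information : once the instance is back after this reboot, you can check that the driver is loaded before continuing. This is an optional sketch, not part of the original steps. It assumes nvidia-smi was installed together with the driver package, which is usually the case but is not guaranteed; the helper name check_gpu_driver is made up for illustration.

```shell
# Optional check: print the GPU name and driver version if nvidia-smi is available.
check_gpu_driver() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
  else
    echo "nvidia-smi not found: the driver is probably not installed yet"
    return 1
  fi
}

check_gpu_driver || echo "driver check failed, repeat the install commands above"
```

If the check works, the line printed should name the Tesla V100 and a driver version from the branch you installed.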


part 2 :
installing all other prerequisite packages


sudo apt-get -y install clinfo cmake git libboost-all-dev libopenblas-dev zlib1g-dev build-essential qtbase5-dev qttools5-dev qttools5-dev-tools libboost-dev libboost-program-options-dev opencl-headers ocl-icd-libopencl1 ocl-icd-opencl-dev qt5-default qt5-qmake curl && clinfo


part 3 :
compiling leela z, autogtp, and running autogtp

select all commands below and copy/paste all the selection :


git clone https://github.com/gcp/leela-zero
cd leela-zero/src
make
cd ..
cd ./autogtp
qmake -qt5
make
cp ../src/leelaz .
./autogtp


and autogtp finally runs !
AutoGTP v16
Using 1 thread(s) for GPU(s).
Starting tuning process, please wait...
Net filename: networks/25c2313d8c11b9320de4795cf593f237f32e8a61c4524a6305ff30073b760132
net: 25c2313d8c11b9320de4795cf593f237f32e8a61c4524a6305ff30073b760132.
./leelaz --tune-only -w networks/25c2313d8c11b9320de4795cf593f237f32e8a61c4524a6305ff30073b760132
./leelaz --tune-only -w networks/25c2313d8c11b9320de4795cf593f237f32e8a61c4524a6305ff30073b760132
Using 2 thread(s).
RNG seed: 17613292517859834344
Detecting residual layers...v1...256 channels...40 blocks.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 9.1.84
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     0
Device name:   Tesla V100-SXM2-16GB
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 390.87
Device speed:  1530 MHz
Device cores:  80 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: Tesla V100-SXM2-16GB
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
Will try 290 valid configurations.
(1/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0689 ms (3045.4 GFLOPS)
(2/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0548 ms (3828.0 GFLOPS)
(5/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0538 ms (3901.0 GFLOPS)
(21/290) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0509 ms (4116.6 GFLOPS)
(27/290) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0489 ms (4289.0 GFLOPS)
(88/290) KWG=32 KWI=8 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0479 ms (4380.7 GFLOPS)
(131/290) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=4 VWN=2 0.0471 ms (4452.2 GFLOPS)
(135/290) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=4 VWN=2 0.0456 ms (4602.2 GFLOPS)
(205/290) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=4 0.0445 ms (4708.0 GFLOPS)
(238/290) KWG=32 KWI=8 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=4 0.0438 ms (4790.6 GFLOPS)
Tuning process finished


part 4 :
how to run autogtp again (if exited)


cd leela-zero/autogtp
./autogtp


part 5 :
monitoring of the VM


a) check if instance is ON or OFF, and manage your instance :
https://console.cloud.google.com/compute/instances
for information : your free credit does NOT get charged as long as the machine is off; the only point of staying always ON is to compute as many selfplay games as possible.

as we said earlier, if the VM instance is stopped because it was preempted :
the free credit stops being consumed and is not wasted, simply because the instance is now stopped (grey square)
to restart selfplay, restart the machine, reconnect via SSH, and do part 4 again


b) you can check how much of the free 300 dollars credit you have left :
https://console.cloud.google.com/billing/

(this credit is totally free, you don't get charged at the end of the free trial)


c) unlike RDP (windows server 2016), the SSH protocol (used to access the ubuntu VM remotely) seems to terminate what is running on the server when you close the SSH window of google chrome (correct me if i'm wrong), so you need to always keep the SSH window of google chrome open, or the server will be ON but will not produce any games, which brings us to point d)
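
for information : one common way around this is to start autogtp inside a tmux session, which keeps running on the server after the SSH window is closed. This is a sketch and was not tested as part of this tutorial. It assumes tmux is installed on the instance (if not : sudo apt-get -y install tmux); the helper name start_in_tmux and the session name autogtp are arbitrary choices.

```shell
# Sketch: run a command inside a detached tmux session so it survives SSH disconnects.
start_in_tmux() {
  # $1 = session name, $2 = command to run inside the session
  if ! command -v tmux >/dev/null 2>&1; then
    echo "tmux not found, install it with: sudo apt-get -y install tmux"
    return 1
  fi
  tmux new-session -d -s "$1" "$2"
}

# Usage on the VM (kept as comments so nothing starts by accident):
#   start_in_tmux autogtp "cd ~/leela-zero/autogtp && ./autogtp"
#   tmux attach -t autogtp      # look at the autogtp output again later
#   (press ctrl+b then d to detach without stopping autogtp)
```

Even with this, the VM instance keeps consuming credit while it is ON, so stopping it when you are done still applies.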


d) as we said earlier, when you have finished producing games for the day (or night), first close the SSH window of google chrome (or sudo shutdown). The server won't produce games anymore, BUT the VM instance will still be ON (whether you are running a command or not does not matter to google) and will keep consuming your free credit for nothing. So it is very important to stop the VM instance once you have exited it and finished producing games for the day !

to do so, go to the VM instances page of the console :
https://console.cloud.google.com/compute/instances

then, click on the menu next to your instance (the 3 dots at the right of the SSH button)
and choose STOP

again, this is very important in order to avoid wasting the free credit without producing any game with it, just because you left the server ON after closing your SSH chrome window !
The grey square shows that the instance is now stopped (not consuming free credit anymore), and the green circle shows that the instance is running (thus consuming the free credit)
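
for information : stopping and restarting the instance can also be done with gcloud instead of the console menu. This is a sketch, assuming the gcloud CLI is installed and logged in on your own computer. The instance name (leela-v100) and the zone (us-central1-a) are placeholders to replace with yours, and the helper name instance_power is made up for illustration. With DRY_RUN=1 (the default) the commands are only printed.

```shell
# Sketch: stop or start the VM instance from the command line.
# INSTANCE_NAME and ZONE are placeholders (assumptions), replace them with your own.
INSTANCE_NAME="${INSTANCE_NAME:-leela-v100}"
ZONE="${ZONE:-us-central1-a}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = run it

instance_power() {
  # $1 = stop or start
  case "$1" in
    stop|start) ;;
    *) echo "usage: instance_power stop|start"; return 2 ;;
  esac
  set -- gcloud compute instances "$1" "$INSTANCE_NAME" --zone="$ZONE"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

instance_power stop    # when you have finished producing games
instance_power start   # before the next session, then reconnect via SSH and redo part 4
```

After a stop, the instances page should show the grey square, which is still the simplest way to confirm that no credit is being consumed.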


e) not needed, but if you want, you can monitor your VM cpu usage on the console :
https://console.cloud.google.com/home/