a guest
Oct 4th, 2018
part 0.1 :
creating a VM instance

in Google Cloud, open the console :
https://console.cloud.google.com/home/

then click on the hamburger menu at the top left -> Compute Engine -> VM instances
then create a new VM instance and choose a custom machine type with these settings :

- scroll through the regions and zones until you find one that has a V100 available, then choose :
- 2 vCPU (to avoid throttling the NVIDIA V100)
- 3.75 GB RAM
- 15 GB HDD
- 1 GPU : Tesla V100 (the maximum for the free trial)
- system : Ubuntu 18.04 LTS
- allow HTTP and HTTPS traffic
- preemptible setting : about 60% discount on the free credit consumption, in exchange for an occasional power off of the instance. Restarting takes 2 minutes, so it is totally worth it.

to activate the preemptible option :
Click Management, security, disks, networking, sole tenancy.
Under Availability policy, set the Preemptibility option to On. This setting disables automatic restart for the instance, and sets the host maintenance action to Terminate.
Click Create to create the instance.
more details about preemptible instances here : https://cloud.google.com/compute/docs/instances/create-start-preemptible-instance
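
For those who prefer the command line, the console settings above can probably be expressed as one gcloud command. This is an untested sketch, not part of the original tutorial: the function name vm_create_cmd, the instance name and the zone are placeholders, and the image family and network tags are my assumptions for "Ubuntu 18.04 LTS" and "allow HTTP and HTTPS". The function only prints the command so that you can review it before pasting it.

```shell
# Sketch: print (do not run) a gcloud command matching the settings above.
# $1 = instance name, $2 = zone where you found a V100 (both placeholders).
# 3840MB is 3.75 GB; image family and tags are assumptions.
vm_create_cmd() {
  name="$1"
  zone="$2"
  printf '%s ' \
    gcloud compute instances create "$name" \
    --zone="$zone" \
    --custom-cpu=2 \
    --custom-memory=3840MB \
    --accelerator=type=nvidia-tesla-v100,count=1 \
    --maintenance-policy=TERMINATE \
    --preemptible \
    --image-family=ubuntu-1804-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=15GB \
    --tags=http-server,https-server
  printf '\n'
}
```

Example : vm_create_cmd my-instance my-zone prints the full command on one line; replace my-instance and my-zone with your own values.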

for information :
preemptible instances are much cheaper (if I remember correctly, around 650 dollars vs 1400 dollars of free credit consumed per month on a V100 VM instance), so choosing a preemptible instance when you create it is a no-brainer.
In my experience, Google stops the VM instance at most 1 or 2, very rarely 3, times per 24 hours, which leaves on average at least 5-10 hours of use in a row before the first stop.
Restarting the instance then only takes 1 click and 2 minutes, as we will see later, and you are good to go again for many hours.
Note that when Google stops a preemptible instance, the credit stops being consumed too, so you don't have to worry about wasting credit.
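
To see what the price difference means in practice, here is a small worked calculation. It is only a rough sketch: the 650 and 1400 dollar figures are the from-memory estimates above, the 300 dollars is the free trial credit mentioned in part 5, and the 730-hour month is my assumption.

```shell
# Hours of V100 time that a credit buys at a given monthly price.
# Assumes a 730-hour month; integer division rounds down.
credit_hours() {
  credit="$1"        # free credit in dollars
  monthly_cost="$2"  # cost of running the VM for a full month, in dollars
  echo $(( credit * 730 / monthly_cost ))
}

credit_hours 300 650    # preemptible : prints 336
credit_hours 300 1400   # not preemptible : prints 156
```

So, with these estimates, the same credit lasts roughly twice as long on a preemptible instance.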

part 0.2 :
preparations

go to the Google console (Compute Engine) -> VM instances -> click on the SSH button to connect to the instance via SSH (embedded in Chrome)

To read before starting :
The instance opens in a new Chrome window that uses the SSH protocol to connect to your VM instance.
To make copying commands easier, in the new SSH window, go to : settings → copy paste with ctrl+shift+c/v, then click OK
From now on, we will use ctrl+c to copy a command from this text, but ctrl+shift+v to paste it into the SSH window (because it is an Ubuntu terminal)

Finally, for information, before starting part 1 :
About preemptible instances again : if, unfortunately, Google stops the instance while we are installing system packages (parts 1 and 2), the probability of system corruption is high. If this very rare case happens, I advise you to delete the instance and create a new one. Unless you are very unlucky (in which case, try again !), the new instance will not be stopped during the install.

part 1 :
installing all drivers

sudo apt-get update && sudo apt-get -y upgrade && sudo apt-get -y dist-upgrade && sudo reboot

for information : after the first reboot, you lose the connection to the VM instance. Wait 1-2 minutes and retry 1 or 2 times, and it should reconnect as long as the server is ON (check this page to see whether the server is ON, shown by a green circle, and whether the SSH button is clickable) :
https://console.cloud.google.com/compute/instances
if "retry" does not work and the instance is still ON (green circle), keep retrying until you manage to connect via SSH again.
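
If you connect from your own terminal instead of the Chrome SSH window, the "wait and retry" routine above can be sketched as a small loop. This is only an illustration: the retry function is my own helper, not something from the tutorial.

```shell
# retry MAX DELAY CMD... : run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 1 if it never succeeds.
retry() {
  max="$1"; delay="$2"; shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}
```

Example, with placeholder instance name and zone : retry 10 15 gcloud compute ssh my-instance --zone my-zone --command true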

for information 2, before starting the next commands :
official website : https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa
Current long-lived branch release at the time of this tutorial, according to the PPA website linked above : `nvidia-390` (390.48)
if the Current long-lived branch gets an update in the future, you should replace 390 (in nvidia-390 in the 1st command and in libnvidia-compute-390 in the 2nd command) with the driver version number mentioned there.
For example, if the next "Current long-lived branch release" is nvidia-393, you would replace nvidia-390 and libnvidia-compute-390 with nvidia-393 and libnvidia-compute-393 without changing the rest of the commands.

sudo add-apt-repository -y ppa:graphics-drivers/ppa && sudo apt-get update && sudo apt-get -y install nvidia-390 && sudo reboot

sudo apt-get -y install linux-headers-generic nvidia-opencl-dev libnvidia-compute-390 && sudo reboot
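
The version substitution described above can be sketched as a small helper that prints both driver commands for a given version number. The function name driver_cmds is mine, and 393 is only the hypothetical version from the example, not a real release I know of.

```shell
# Print the two driver install commands for a given driver version.
# Only the version number changes; the rest of each command stays the same.
driver_cmds() {
  v="$1"
  echo "sudo add-apt-repository -y ppa:graphics-drivers/ppa && sudo apt-get update && sudo apt-get -y install nvidia-$v && sudo reboot"
  echo "sudo apt-get -y install linux-headers-generic nvidia-opencl-dev libnvidia-compute-$v && sudo reboot"
}

driver_cmds 390   # prints the two commands used in this tutorial
```

Run the first printed command, wait for the reboot, reconnect, then run the second one.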

part 2 :
installing all other prerequisite packages

sudo apt-get -y install clinfo cmake git libboost-all-dev libopenblas-dev zlib1g-dev build-essential qtbase5-dev qttools5-dev qttools5-dev-tools libboost-dev libboost-program-options-dev opencl-headers ocl-icd-libopencl1 ocl-icd-opencl-dev qt5-default qt5-qmake curl && clinfo
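
The clinfo at the end of the command above prints a long report. A quick way to check it, sketched here under the assumption that clinfo prints a line of the form "Number of platforms   N", is to look at that count. The helper name has_opencl_platform is mine.

```shell
# Succeeds if clinfo output (read from stdin) reports at least one platform.
# Assumes a line starting with "Number of platforms" that ends with the count.
has_opencl_platform() {
  awk '/^Number of platforms/ { n = $NF + 0 } END { exit !(n >= 1) }'
}
```

Example : clinfo | has_opencl_platform && echo "OpenCL platform found"
If nothing is printed, the driver install of part 1 probably did not work.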

part 3 :
compiling leela z, autogtp, and running autogtp

select all commands below and copy/paste all the selection :

git clone https://github.com/gcp/leela-zero
cd leela-zero/src
make
cd ..
cd ./autogtp
qmake -qt5
make
cp ../src/leelaz .
./autogtp

and autogtp finally runs !
AutoGTP v16
Using 1 thread(s) for GPU(s).
Starting tuning process, please wait...
Net filename: networks/25c2313d8c11b9320de4795cf593f237f32e8a61c4524a6305ff30073b760132
net: 25c2313d8c11b9320de4795cf593f237f32e8a61c4524a6305ff30073b760132.
./leelaz --tune-only -w networks/25c2313d8c11b9320de4795cf593f237f32e8a61c4524a6305ff30073b760132
./leelaz --tune-only -w networks/25c2313d8c11b9320de4795cf593f237f32e8a61c4524a6305ff30073b760132
Using 2 thread(s).
RNG seed: 17613292517859834344
Detecting residual layers...v1...256 channels...40 blocks.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 9.1.84
Platform profile: FULL_PROFILE
Platform name: NVIDIA CUDA
Platform vendor: NVIDIA Corporation
Device ID: 0
Device name: Tesla V100-SXM2-16GB
Device type: GPU
Device vendor: NVIDIA Corporation
Device driver: 390.87
Device speed: 1530 MHz
Device cores: 80 CU
Device score: 1112
Selected platform: NVIDIA CUDA
Selected device: Tesla V100-SXM2-16GB
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
Will try 290 valid configurations.
(1/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0689 ms (3045.4 GFLOPS)
(2/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0548 ms (3828.0 GFLOPS)
(5/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0538 ms (3901.0 GFLOPS)
(21/290) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0509 ms (4116.6 GFLOPS)
(27/290) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0489 ms (4289.0 GFLOPS)
(88/290) KWG=32 KWI=8 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 0.0479 ms (4380.7 GFLOPS)
(131/290) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=4 VWN=2 0.0471 ms (4452.2 GFLOPS)
(135/290) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=4 VWN=2 0.0456 ms (4602.2 GFLOPS)
(205/290) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=4 0.0445 ms (4708.0 GFLOPS)
(238/290) KWG=32 KWI=8 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=4 0.0438 ms (4790.6 GFLOPS)
Tuning process finished
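
If you want to compare your own tuning result with the log above, the best speed can be pulled out of the tuner lines with a short awk filter. This is a sketch that only assumes the "(NNNN.N GFLOPS)" format shown in the log; the function name best_gflops and the file name autogtp.log are mine.

```shell
# Read tuner output on stdin and print the highest GFLOPS value found.
# Matches the "(4790.6 GFLOPS)" pattern at the end of each tuner line.
best_gflops() {
  awk '
    match($0, /\([0-9.]+ GFLOPS\)/) {
      # drop the leading "(" and the trailing " GFLOPS)" (9 characters total)
      g = substr($0, RSTART + 1, RLENGTH - 9) + 0
      if (g > best) { best = g }
    }
    END { if (best > 0) print best }
  '
}
```

Example : save the output with ./autogtp | tee autogtp.log, then run best_gflops < autogtp.log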

part 4 :
how to run autogtp again (if it exited)

cd leela-zero/autogtp
./autogtp

part 5 :
monitoring the VM

a) check if the instance is ON or OFF, and manage your instance :
https://console.cloud.google.com/compute/instances
for information : your free credit is NOT charged for the VM while the machine is off (apart from a small charge for the disk), so the only point of staying always ON is to compute as many selfplay games as possible.

as we said earlier, if the VM instance is stopped because of the preemptible rules :
the free credit stops being consumed and is not wasted, simply because the instance is now stopped (grey square)
to restart selfplay, start the machine again, reconnect via SSH, and redo part 4
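
The same check can be done from a terminal, since gcloud reports a status string for each instance. The sketch below maps that string to the green circle / grey square meaning described above. The function name credit_state is mine, and I only handle the two states discussed in this tutorial.

```shell
# Map a Compute Engine status string to what it means for the free credit.
# RUNNING is the green circle, TERMINATED is the grey square (stopped).
credit_state() {
  case "$1" in
    RUNNING)    echo "running: credit is being consumed" ;;
    TERMINATED) echo "stopped: VM not charged, restart it and redo part 4" ;;
    *)          echo "in transition ($1): check again in a minute" ;;
  esac
}
```

Example, with placeholder instance name and zone : credit_state "$(gcloud compute instances describe my-instance --zone my-zone --format='value(status)')"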

b) you can check how much of the free 300 dollars credit you have left :
https://console.cloud.google.com/billing/

(this credit is totally free, you don't get charged at the end of the free trial)

c) unlike RDP (Windows Server 2016), the SSH session (used to access the Ubuntu VM remotely) seems to terminate the running program (autogtp) when you close the SSH window of Google Chrome (correct me if I'm wrong), so you need to keep the SSH window open at all times, or the server will stay on without producing any games, which brings us to point d)
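
A possible workaround, which I have not tested with autogtp, is to start it inside a detached tmux session so that it does not depend on the SSH window. The sketch below only prints the command to run. The session name "autogtp" and the ~/leela-zero path are assumptions, and tmux may have to be installed first with sudo apt-get -y install tmux.

```shell
# Print the tmux command that starts autogtp in a detached session.
# $1 = session name (optional, defaults to "autogtp").
detached_cmd() {
  session="${1:-autogtp}"
  echo "tmux new-session -d -s $session 'cd ~/leela-zero/autogtp && ./autogtp'"
}

detached_cmd
```

To look at the running session again after reconnecting : tmux attach -t autogtp
Point d) still applies : the VM instance keeps consuming credit until you stop it.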

d) as we said earlier, when you have finished producing games for the day (or night), first close the SSH window of Google Chrome (or run sudo shutdown). The server won't produce games anymore, BUT the VM instance will still be running (whether you are running a command or not does not matter to Google) and consuming your free credit for nothing, so it is very important to stop the VM instance once you have finished producing games for the day !

to do so, go to the VM instances page of the console :
https://console.cloud.google.com/compute/instances

then click on the menu next to your instance (the 3 dots to the right of the SSH button)
and choose STOP

again, this is very important to avoid wasting the free credit without producing any games, just because you left the server running after closing your SSH Chrome window !
The grey square shows that the instance is stopped (no longer consuming free credit), and the green circle shows that the instance is running (thus consuming the free credit)

e) not needed, but if you want, you can monitor your VM's CPU usage in the console :
https://console.cloud.google.com/home/