  1.  
  2. jacek 12:41PM
  3. ohai
  4. MSmits 12:41PM
  5. hey jacek
  6. KiwiTae 12:41PM
  7. i realised there are many tricks to stl when i tried to use c instead of cpp
  8. MSmits 12:42PM
  9. so jacek, i was meaning to ask you some questions about ML in oware specifically, but not sure how much you are sharing about this
  10. jacek 12:43PM
  11. its alright
  12. MSmits 12:43PM
  13. ok, so do you use selfplay or supervised in oware?
  14. Doju 12:43PM
  15. I just wish there was an option to select the interpreter for python here
  16. jacek 12:43PM
  17. selfplay
  18. MSmits 12:43PM
  19. when you train, do you play real games, or games with more depth or less depth?
  20. jacek 12:43PM
  21. or maybe 'self-supervised' would be more accurate
  22. MSmits 12:43PM
  23. when you generate data i mean
  24. jacek 12:44PM
  25. yes, real games with fast mcts
  26. MSmits 12:45PM
  27. what do you mean fast? Lower calculation time?
  28. jacek 12:45PM
  29. then i have position -> value of the mcts' root
  30. MSmits 12:45PM
  31. dont you use the endgame result as value to train on?
  32. jacek 12:45PM
  33. lower time, or even just fixed iterations count
  34. i used that in the past, but using value from shallow search is faster and gave better results
  35. even if at first generations it comes from random network
  36. at the very least, positions near the endgame have accurate values
  37. MSmits 12:47PM
  38. well it makes sense that the endgame result is a poor value to use for early game states
  39. because it is too far away
  40. and in later states, the mcts root value should coincide with the endgame result anyway
  41. ok, so you don't have 6 outputs, one for each of the moves, like some other implementations
  42. jacek 12:48PM
  43. my training pipeline is similar to a0's now, except i train only value, and my target value is the mcts result, not the final game outcome
  44. yeah, only value
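
(A rough idea of what that data-generation loop could look like, as a self-contained C++ toy. The game and the "fast search" below are dummy stand-ins and the names are illustrative, not jacek's actual code; the point is only that each visited position gets stored with the root value of a shallow search rather than the final game outcome.)

    #include <cstdlib>
    #include <vector>

    struct Sample {
        std::vector<float> features;  // encoded position
        float target;                 // root value of the shallow search
    };

    // Dummy stand-in for a real game state.
    struct State {
        int turn = 0;
        bool terminal() const { return turn >= 20; }
        std::vector<float> encode() const { return { float(turn) }; }
        State play(int /*move*/) const { State s = *this; ++s.turn; return s; }
    };

    // Dummy stand-in for a fixed-iteration MCTS: (chosen move, root value).
    struct SearchResult { int move; float rootValue; };
    SearchResult fastSearch(const State&) {
        return { std::rand() % 6, (std::rand() % 201 - 100) / 100.0f };
    }

    // One self-play game: every visited position is stored with the value of
    // the shallow search from that position, not with the final game result.
    std::vector<Sample> selfPlayGame() {
        std::vector<Sample> data;
        State s;
        while (!s.terminal()) {
            SearchResult r = fastSearch(s);
            data.push_back({ s.encode(), r.rootValue });
            s = s.play(r.move);
        }
        return data;
    }

    int main() {
        std::vector<Sample> data = selfPlayGame();
        return data.empty() ? 1 : 0;
    }
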
  45. MSmits 12:49PM
  46. ok, so do you use convolution layers?
  47. jacek 12:49PM
  48. still simple MLP
  49. MSmits 12:49PM
  50. seems less useful in oware because it is not a 2D game
  51. in what way is it close to azero then?
  52. azero uses convolution and resnet
  53. jacek 12:49PM
  54. i mean the training pipeline
  55. MSmits 12:50PM
  56. oh ok
  57. generating data in batches, training, validation etc.
  58. jacek 12:50PM
  59. yes
  60. MSmits 12:50PM
  61. well you do exactly what i would prefer to try, it seems like a good baseline to experiment with
  62. jacek 12:50PM
  63. it is evaltype-agnostic
  64. could be nn, n-tuple or handcrafted features
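
(One way an eval-type-agnostic setup like that is often structured, sketched in C++: the trainer only sees "something that maps features to a value and exposes parameters". The interface and class names here are illustrative assumptions, not jacek's code.)

    #include <cstddef>
    #include <vector>

    // All the pipeline needs: a position -> value mapping with tunable parameters.
    struct Evaluator {
        virtual ~Evaluator() = default;
        virtual float evaluate(const std::vector<float>& features) const = 0;
        virtual std::vector<float>& parameters() = 0;   // whatever gets adjusted
    };

    // One concrete choice: a linear combination of handcrafted features.
    // An MLP or an n-tuple network would simply be another subclass.
    struct LinearEval : Evaluator {
        std::vector<float> w;
        explicit LinearEval(std::size_t n) : w(n, 0.0f) {}
        float evaluate(const std::vector<float>& f) const override {
            float v = 0.0f;
            for (std::size_t i = 0; i < f.size() && i < w.size(); ++i)
                v += w[i] * f[i];
            return v;
        }
        std::vector<float>& parameters() override { return w; }
    };

    int main() {
        LinearEval eval(3);
        eval.parameters() = { 0.5f, -0.25f, 1.0f };
        return eval.evaluate({ 1.0f, 2.0f, 3.0f }) > 0.0f ? 0 : 1;
    }
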
  65. MSmits 12:51PM
  66. yeah it's a nn, it could be whatever :)
  67. oh
  68. you mean the pipeline
  69. yeah
  70. jacek 12:51PM
  71. well anything that has adjustable parameters
  72. MSmits 12:51PM
  73. you could use it to train handcrafted features
  74. parameters
  75. yea
  76. thats something i need also
  77. so could learn 2 things at once here
  78. if i stick with it
  79. kovi 12:52PM
  80. bookmark
  81. MSmits 12:52PM
  82. do you use anything like tensorflow?
  83. or is it fully homemade
  84. jacek 12:52PM
  85. no, i made things in c++ from scratch
  86. MSmits 12:52PM
  87. thats great, also what i want to try
  88. that way you can more easily get it into CG
  89. kovi 12:53PM
  90. not sure if c++ tensor fits into cg
  91. yeah
  92. MSmits 12:53PM
  93. he can just make a long string and then convert it into tensors
  94. thats not the hard part
  95. kovi 12:53PM
  96. yeah but runtime lib
  97. MSmits 12:53PM
  98. the hard part is writing efficient matrix calculations
  99. jacek 12:53PM
  100. it all started when i finally rewrote the xor example from python to c++
  101. kovi 12:53PM
  102. it was not written with the 100k limit in mind
  103. MSmits 12:53PM
  104. right, thats good
  105. jacek 12:53PM
  106. i think i have somewhere the python example without using np
  107. MSmits 12:54PM
  108. when you say "the python example" which one do you mean?
  109. Doju 12:54PM
  110. Hmm, I must be doing something wrong because I want to make pretty much everything protected instead of private
  111. kovi 12:54PM
  112. there are a0 examples
  113. MSmits 12:54PM
  114. yeah, but we are doing this far below a0 level :)
  115. i dont want to touch that anymore
  116. jacek 12:54PM
  117. http://chat.codingame.com/pastebin/429ca1e6-890a-4b94-9773-49404526b36a
  118. MSmits 12:54PM
  119. a0 is so complicated
  120. kovi 12:55PM
  121. true, jacek's value-based thing is pretty wise
  122. jacek 12:55PM
  123. XOR mlp, 1 hidden layer
  124. no fancy numpy
  125. Doju 12:55PM
  126. Oh jeez
  127. are you doing neural nets without numpy?
  128. MSmits 12:55PM
  129. thats great jacek, finally something without numpy
  130. jacek 12:55PM
  131. just for learning for myself
  132. Doju 12:55PM
  133. thats nuts :o
  134. jacek 12:56PM
  135. no numpy in c++ standard libs
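
(For reference, a self-contained C++ take on the same exercise: a 2-8-1 MLP learning XOR with sigmoid activations, plain arrays and for loops, no libraries. Hidden size, learning rate and epoch count are arbitrary toy choices.)

    #include <cmath>
    #include <cstdio>
    #include <cstdlib>

    const int IN = 2, HID = 8;
    double w1[HID][IN], b1[HID];   // input -> hidden
    double w2[HID], b2;            // hidden -> output

    double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }
    double frand() { return (std::rand() / (double)RAND_MAX) * 2.0 - 1.0; }

    double forward(const double x[IN], double h[HID]) {
        for (int j = 0; j < HID; ++j) {
            double s = b1[j];
            for (int i = 0; i < IN; ++i) s += w1[j][i] * x[i];
            h[j] = sigmoid(s);
        }
        double s = b2;
        for (int j = 0; j < HID; ++j) s += w2[j] * h[j];
        return sigmoid(s);
    }

    int main() {
        const double X[4][2] = { {0,0}, {0,1}, {1,0}, {1,1} };
        const double Y[4]    = {  0,     1,     1,     0   };
        for (int j = 0; j < HID; ++j) {
            b1[j] = frand(); w2[j] = frand();
            for (int i = 0; i < IN; ++i) w1[j][i] = frand();
        }
        b2 = frand();

        const double lr = 0.5;
        for (int epoch = 0; epoch < 20000; ++epoch) {
            for (int n = 0; n < 4; ++n) {
                double h[HID];
                double out = forward(X[n], h);
                // gradient of 0.5*(out - y)^2 w.r.t. the output pre-activation
                double dOut = (out - Y[n]) * out * (1.0 - out);
                for (int j = 0; j < HID; ++j) {
                    double dHid = dOut * w2[j] * h[j] * (1.0 - h[j]);
                    w2[j] -= lr * dOut * h[j];
                    for (int i = 0; i < IN; ++i) w1[j][i] -= lr * dHid * X[n][i];
                    b1[j] -= lr * dHid;
                }
                b2 -= lr * dOut;
            }
        }
        for (int n = 0; n < 4; ++n) {
            double h[HID];
            std::printf("%g xor %g -> %.3f\n", X[n][0], X[n][1], forward(X[n], h));
        }
        return 0;
    }
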
  136. MSmits 12:56PM
  137. numpy makes things faster, but it doesnt make it more clear to learn
  138. Doju 12:56PM
  139. great if you want to learn
  140. yeah thats true
  141. if the objective is to learn instead of making a fast thing then that makes sense
  142. kovi 12:56PM
  143. but without numpy usually 1/10 speed
  144. MSmits 12:56PM
  145. jacek for the matrix calcs, did you use any c++ library, or did you just figure out what intrinsics and other tricks to use, yourself?
  146. kovi 12:57PM
  147. and without tf/gpu 1/100
  148. or worse
  149. MSmits 12:57PM
  150. kovi if i take weeks to code something and the training takes 24 hrs instead of 1 hr, it's fine :)
  151. jacek 12:57PM
  152. i use good old for loops, not even intrinsics
  153. kovi 12:57PM
  154. oh, you do c++ calc, sorry
  155. then its ok
  156. MSmits 12:58PM
  157. I see, doesn't it bother you that it might be much faster jacek, with some tricks?
  158. i mean obviously you dont need more speed atm
  159. jacek 12:58PM
  160. yeah, NN eval is most consuming part of my code
  161. and i tried several times. im just too dumb
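
(The "good old for loops" approach amounts to something like the dense-layer routine below, no intrinsics or external libraries. With -O3, and possibly -march=native, compilers will usually auto-vectorize loops like this on their own, which tends to be the easiest first speedup before hand-written intrinsics.)

    #include <cstddef>
    #include <vector>

    // out[j] = bias[j] + sum_i w[j][i] * in[i], written with plain loops.
    std::vector<float> denseLayer(const std::vector<std::vector<float>>& w,
                                  const std::vector<float>& bias,
                                  const std::vector<float>& in) {
        std::vector<float> out(bias);            // start from the bias
        for (std::size_t j = 0; j < w.size(); ++j)
            for (std::size_t i = 0; i < in.size(); ++i)
                out[j] += w[j][i] * in[i];
        return out;
    }

    int main() {
        std::vector<std::vector<float>> w = { { 1.0f, 2.0f }, { 3.0f, 4.0f } };
        std::vector<float> bias = { 0.5f, -0.5f };
        std::vector<float> out = denseLayer(w, bias, { 1.0f, 1.0f });
        return out.size() == 2 ? 0 : 1;
    }
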
  162. MSmits 12:59PM
  163. well if I ever get to the point where I can write this stuff and it's better than what you have, I will share it with you
  164. might be a couple months. I want to get something before the end of my summer vacation. Trying to be generous with my time estimate here, apparently it's hard to learn
  165. jacek 01:00PM
  166. and i finally got this to work for Y. its 5th without an opening book
  167. MSmits 01:01PM
  168. nice one, Robo did as well
  169. but yavalath has huge problems with determinism because of the early game endings
  170. jacek 01:01PM
  171. its N-tuple with small hidden layer, MLP-tuple :v
  172. MSmits 01:01PM
  173. ohh ok
  174. I am going to be solving connect4 i think
  175. Doju 01:02PM
  176. welp now i've got circular dependencies
  177. jacek 01:03PM
  178. determinism... in training i choose final moves according to softmax
  179. it was another thing that i lacked before
  180. allows for exploration but not too dumb exploration
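
(A small self-contained version of that move-selection rule: sample a move with probability proportional to exp(score / T). The scores could be root visit counts or root values; the numbers and temperature below are made up for illustration.)

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    // Sample an index with probability proportional to exp(score / T).
    // Lower T -> greedier choice, higher T -> more uniform/exploratory.
    int softmaxSample(const std::vector<double>& scores, double T, std::mt19937& rng) {
        const double maxScore = *std::max_element(scores.begin(), scores.end());
        std::vector<double> p(scores.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < scores.size(); ++i) {
            p[i] = std::exp((scores[i] - maxScore) / T);   // subtract max for stability
            sum += p[i];
        }
        std::uniform_real_distribution<double> dist(0.0, sum);
        double r = dist(rng);
        for (std::size_t i = 0; i < p.size(); ++i) {
            if (r < p[i]) return static_cast<int>(i);
            r -= p[i];
        }
        return static_cast<int>(p.size()) - 1;   // fallback for rounding at the edge
    }

    int main() {
        std::mt19937 rng(42);
        // e.g. root values for the 6 oware moves from the current position
        std::vector<double> rootScores = { 0.10, 0.40, 0.35, 0.05, 0.20, 0.30 };
        return softmaxSample(rootScores, /*T=*/0.5, rng);
    }
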
  181. MSmits 01:03PM
  182. I see
  183. hey, you train on cpu right?
  184. jacek 01:04PM
  185. yeah
  186. MSmits 01:04PM
  187. i read about gpu being 20-100 times faster
  188. but i feel that's probably also because when people do that they have 4 really expensive ones running at once
  189. doubt i'd achieve that factor with mine
  190. DomiKo 01:05PM
  191. not really
  192. jacek 01:05PM
  193. i have rather small nn
  194. WOLFRAHH 01:05PM
  195. hii guys what's going on
  196. jacek 01:05PM
  197. not quite parallelizable
  198. MSmits 01:05PM
  199. ahh ok
  200. jacek 01:05PM
  201. well maybe for training batch itself, the gpu would come in handy
  202. MSmits 01:05PM
  203. seems so difficult to write that yourself
  204. i would prefer to do it with tensorflow then and just convert their models somehow
  205. jacek 01:06PM
  206. thats why i havent written convnets yet. i could write a gazillion layers etc. in python but at first i want to make something small myself
  207. MSmits 01:06PM
  208. and resnets?
  209. jacek 01:07PM
  210. too
  211. WOLFRAHH 01:07PM
  212. can anybody tell what's going on
  213. MSmits 01:07PM
  214. convnets supposedly help for games like othello/yavalath, where the surroundings of a hex/square are important
  215. doubt it would help much for oware
  216. jacek 01:08PM
  217. my NNs so far have at most 2 hidden layers, so resnets are pointless.
  218. MSmits 01:08PM
  219. yeah
  220. 2 is not much at all
  221. jacek 01:08PM
  222. also i mostly exploit the fact that there is little change between game states, i.e. only a few squares are affected
  223. MSmits 01:08PM
  224. did you experiment with trading layer size for depth?
  225. how do you exploit this?
  226. jacek 01:09PM
  227. yeah, and this is what i came up with for my framework and cg constraints
  228. for input/first hidden layer you need to only update the values instead of summing everything all over again
  229. partial updates, the main idea behind nnue
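
(The partial-update idea in miniature: keep the first layer's pre-activation sums as an "accumulator", and when a move changes only a few input features, add the weight deltas for just those features instead of redoing the whole matrix-vector product. Sizes and weights below are made up; the code checks the incremental result against a full recomputation.)

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Full recomputation: acc[j] = sum_i w[j][i] * x[i]
    std::vector<float> fullAccumulator(const std::vector<std::vector<float>>& w,
                                       const std::vector<float>& x) {
        std::vector<float> acc(w.size(), 0.0f);
        for (std::size_t j = 0; j < w.size(); ++j)
            for (std::size_t i = 0; i < x.size(); ++i)
                acc[j] += w[j][i] * x[i];
        return acc;
    }

    // Incremental update after input feature i changes from oldVal to newVal.
    void updateAccumulator(std::vector<float>& acc,
                           const std::vector<std::vector<float>>& w,
                           std::size_t i, float oldVal, float newVal) {
        for (std::size_t j = 0; j < acc.size(); ++j)
            acc[j] += w[j][i] * (newVal - oldVal);
    }

    int main() {
        std::vector<std::vector<float>> w = { { 1, 2, 3 }, { 4, 5, 6 } };  // 2 hidden, 3 inputs
        std::vector<float> x = { 1, 0, 2 };

        std::vector<float> acc = fullAccumulator(w, x);
        // A "move" changes only input 1 (say a square gains a stone): 0 -> 1.
        updateAccumulator(acc, w, 1, x[1], 1.0f);
        x[1] = 1.0f;

        std::vector<float> check = fullAccumulator(w, x);
        std::printf("incremental: %g %g   full: %g %g\n", acc[0], acc[1], check[0], check[1]);
        return 0;
    }
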
  230. MSmits 01:11PM
  231. oh you mean it's a performance improvement
  232. do you mean during training or running a game?
  233. jacek 01:11PM
  234. yes
  235. well both
  236. MSmits 01:12PM
  237. well that seems useful and alleviates the problem with you just using for loops
  238. jacek 01:12PM
  239. though i do not use that in oware
  240. MSmits 01:12PM
  241. it's easier to implement improvements like this when your code is not stuck in weird intrinsics and avx stuff
  242. proace21 01:13PM
  243. hi
  244. MSmits 01:13PM
  245. once you've gotten into that, you generally dont touch the code anymore. At least I dont