- jacek 12:41PM
- ohai
- MSmits 12:41PM
- hey jacek
- KiwiTae 12:41PM
- i realised there are many tricks to stl when i tried to use c instead of cpp
- MSmits 12:42PM
- so jacek, i was meaning to ask you some questions about ML in oware specifically, but not sure how much you are sharing about this
- jacek 12:43PM
- its alright
- MSmits 12:43PM
- ok, so do you use selfplay or supervised in oware?
- Doju 12:43PM
- I just wish there was an option to select the interpreter for python here
- jacek 12:43PM
- selfplay
- MSmits 12:43PM
- when you train, do you play real games, or games with more depth or less depth?
- jacek 12:43PM
- or maybe 'self-supervised' would be more accurate
- MSmits 12:43PM
- when you generate data i mean
- jacek 12:44PM
- yes, real games with fast mcts
- MSmits 12:45PM
- what do you mean fast? Lower calculation time?
- jacek 12:45PM
- then i have position -> value of the mcts' root
- MSmits 12:45PM
- dont you use the endgame result as value to train on?
- jacek 12:45PM
- lower time, or even just fixed iterations count
- i used that in the past, but using value from shallow search is faster and gave better results
- even if in the first generations it comes from a random network
- at the very least, near endgames have accurate values
- MSmits 12:47PM
- well it makes sense that the endgame result is a poor value to use for early game states
- because it is too far away
- and in later states, the mcts root value should coincide with the endgame result anyway
- ok, so you don't have 6 outputs, one for each of the moves, like some other implementations
- jacek 12:48PM
- my training pipeline is similar to a0's now, just i train only value, and my target value is mcts result, not game final outcome
- yeah, only value
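The scheme jacek describes above — labeling each self-play position with the root value of a fast, fixed-budget MCTS rather than the final game outcome — could be sketched like this in Python; the game interface and `fast_mcts` are hypothetical stand-ins, not jacek's actual code:

```python
def generate_training_data(initial_state, play_move, legal_moves,
                           fast_mcts, num_games=10):
    """Self-play data generation: each visited position is labeled with
    the root value of a shallow MCTS search, not the final game result."""
    data = []
    for _ in range(num_games):
        state = initial_state()
        while legal_moves(state):
            # fast_mcts runs with lower time / fixed iteration count
            value, best_move = fast_mcts(state)
            data.append((state, value))   # target = search root value
            state = play_move(state, best_move)
    return data
```

The (state, value) pairs would then feed the usual batch-train / validate loop the pipeline discussion below refers to.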
- MSmits 12:49PM
- ok, so do you use convolution layers?
- jacek 12:49PM
- still simple MLP
- MSmits 12:49PM
- seems less useful in oware because it is not a 2D game
- in what way is it close to azero then?
- azero uses convolution and resnet
- jacek 12:49PM
- i mean the training pipeline
- MSmits 12:50PM
- oh ok
- generating data in batches, training, validation etc.
- jacek 12:50PM
- yes
- MSmits 12:50PM
- well you do exactly what i would prefer to try, it seems like a good baseline to experiment with
- jacek 12:50PM
- it is evaltype-agnostic
- could be nn, n-tuple or handcrafted features
- MSmits 12:51PM
- yeah it's a nn, it could be whatever :)
- oh
- you mean the pipeline
- yeah
- jacek 12:51PM
- well anything that has adjustable parameters
- MSmits 12:51PM
- you could use it to train handcrafted features
- parameters
- yea
- thats something i need also
- so could learn 2 things at once here
- if i stick with it
- kovi 12:52PM
- bookmark
- MSmits 12:52PM
- do you use anything like tensorflow?
- or is it fully homemade
- jacek 12:52PM
- no, i made things in c++ from scratch
- MSmits 12:52PM
- thats great, also what i want to try
- that way you can more easily get it into CG
- kovi 12:53PM
- not sure if c++ tensor fits into cg
- yeah
- MSmits 12:53PM
- he can just make a long string and then convert it into tensors
- thats not the hard part
- kovi 12:53PM
- yeah but runtime lib
- MSmits 12:53PM
- the hard part is writing efficient matrix calculations
- jacek 12:53PM
- it all started when i finally rewrote the xor example from python to c++
- kovi 12:53PM
- it was not written with the 100k limit in mind
- MSmits 12:53PM
- right, thats good
- jacek 12:53PM
- i think i have somewhere the python example without using np
- MSmits 12:54PM
- when you say "the python example" which one do you mean?
- Doju 12:54PM
- Hmm, I must be doing something wrong because I want to make pretty much everything protected instead of private
- kovi 12:54PM
- there are a0 examples
- MSmits 12:54PM
- yeah, but we are doing this far below a0 level :)
- i dont want to touch that anymore
- jacek 12:54PM
- http://chat.codingame.com/pastebin/429ca1e6-890a-4b94-9773-49404526b36a
- MSmits 12:54PM
- a0 is so complicated
- kovi 12:55PM
- true, jacek value based thing is pretty wise
- jacek 12:55PM
- XOR mlp, 1 hidden layer
- no fancy numpy
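The paste jacek links is described as an XOR MLP with one hidden layer and no numpy; a minimal pure-Python version of that idea (the hidden size, learning rate, and epoch count are my own guesses, not the paste's contents) could look like:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class XorMLP:
    """2-input MLP with one sigmoid hidden layer, plain-list weights,
    trained by per-sample backprop on squared error."""
    def __init__(self, hidden=4, seed=0):
        rnd = random.Random(seed)
        self.w1 = [[rnd.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rnd.uniform(-1, 1) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        self.h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        self.o = sigmoid(sum(w * h for w, h in zip(self.w2, self.h)) + self.b2)
        return self.o

    def train_step(self, x, target, lr=0.5):
        o = self.forward(x)
        d_o = (o - target) * o * (1 - o)          # output delta (MSE + sigmoid)
        for j, h in enumerate(self.h):
            d_h = d_o * self.w2[j] * h * (1 - h)  # hidden delta, pre-update w2
            self.w2[j] -= lr * d_o * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_h * xi
            self.b1[j] -= lr * d_h
        self.b2 -= lr * d_o

data = [([0, 0], 0.0), ([0, 1], 1.0), ([1, 0], 1.0), ([1, 1], 0.0)]
net = XorMLP()
initial_loss = sum((net.forward(x) - t) ** 2 for x, t in data)
for _ in range(5000):
    for x, t in data:
        net.train_step(x, t)
final_loss = sum((net.forward(x) - t) ** 2 for x, t in data)
```

Slow compared to numpy, but every multiply-add is visible, which is the point made below about learning.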
- Doju 12:55PM
- Oh jeez
- are you doing neural nets without numpy?
- MSmits 12:55PM
- thats great jacek, finally something without numpy
- jacek 12:55PM
- just for learning for myself
- Doju 12:55PM
- thats nuts :o
- jacek 12:56PM
- no numpy in c++ standard libs
- MSmits 12:56PM
- numpy makes things faster, but it doesnt make it more clear to learn
- Doju 12:56PM
- great if you want to learn
- yeah thats true
- if the objective is to learn instead of making a fast thing then that makes sense
- kovi 12:56PM
- but without numpy usually 1/10 speed
- MSmits 12:56PM
- jacek for the matrix calcs, did you use any c++ library, or did you just figure out what intrinsics and other tricks to use, yourself?
- kovi 12:57PM
- and without tf/gpu 1/100
- or worse
- MSmits 12:57PM
- kovi if i take weeks to code something and the training takes 24 hrs instead of 1 hr, it's fine :)
- jacek 12:57PM
- i use good old for loops, not even intrinsics
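The "good old for loops" layer evaluation jacek means — no intrinsics, no BLAS — amounts to a plain matrix-vector product plus bias; an illustrative version:

```python
def dense_forward(weights, bias, x):
    """One dense layer evaluated with plain for loops:
    out[j] = bias[j] + sum_i weights[j][i] * x[i]."""
    out = []
    for row, b in zip(weights, bias):
        s = b
        for w, xi in zip(row, x):
            s += w * xi
        out.append(s)
    return out
```

Compilers auto-vectorize loops like this reasonably well in C++, which is partly why skipping intrinsics can still be workable.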
- kovi 12:57PM
- oh, you do c++ calc, sorry
- then its ok
- MSmits 12:58PM
- I see, doesn't it bother you that it might be much faster jacek, with some tricks?
- i mean obviously you dont need more speed atm
- jacek 12:58PM
- yeah, NN eval is most consuming part of my code
- and i tried several times. im just too dumb
- MSmits 12:59PM
- well if I ever get to the point where I can write this stuff and it's better than what you have, I will share it with you
- might be a couple months. I want to get something before the end of my summer vacation. Trying to be generous with my time estimate here, apparently it's hard to learn
- jacek 01:00PM
- and i finally got this to work for Y. its 5th without an opening book
- MSmits 01:01PM
- nice one, Robo did as well
- but yavalath has huge problems with determinism because of the early game endings
- jacek 01:01PM
- its N-tuple with small hidden layer, MLP-tuple :v
- MSmits 01:01PM
- ohh ok
- I am going to be solving connect4 i think
- Doju 01:02PM
- welp now i've got circular dependencies
- jacek 01:03PM
- determinism... in training i choose final moves according to softmax
- it was another thing that i lacked before
- allows for exploration but not too dumb exploration
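Choosing training moves "according to softmax" as jacek describes could be sketched like this: sample a root move with probability proportional to exp(score / T). The scoring input and temperature value are assumptions on my part:

```python
import math
import random

def softmax_pick(moves, scores, temperature=1.0, rnd=random):
    """Sample a move with probability proportional to exp(score / T).
    High T -> near-uniform exploration; low T -> near-greedy play."""
    m = max(scores)
    weights = [math.exp((s - m) / temperature) for s in scores]  # shift max for stability
    total = sum(weights)
    r = rnd.random() * total
    acc = 0.0
    for move, w in zip(moves, weights):
        acc += w
        if acc >= r:
            return move
    return moves[-1]  # guard against float rounding
```

This is the "not too dumb exploration" property: bad moves are still tried occasionally, but in proportion to how bad the search thinks they are.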
- MSmits 01:03PM
- I see
- hey, you train on cpu right?
- jacek 01:04PM
- yeah
- MSmits 01:04PM
- i read about gpu being 20-100 times faster
- but i feel that's probably also because when people do that they have 4 really expensive ones running at once
- doubt i'd achieve that factor with mine
- DomiKo 01:05PM
- not really
- jacek 01:05PM
- i have rather small nn
- WOLFRAHH 01:05PM
- hii guys what is going on
- jacek 01:05PM
- not quite parallelizable
- MSmits 01:05PM
- ahh ok
- jacek 01:05PM
- well maybe for training batch itself, the gpu would come in handy
- MSmits 01:05PM
- seems so difficult to write that yourself
- i would prefer to do it with tensorflow then and just convert their models somehow
- jacek 01:06PM
- thats why i havent written convnets yet. i could write gazillion layers etc. in python but at first i want to make something small myself
- MSmits 01:06PM
- and resnets?
- jacek 01:07PM
- too
- WOLFRAHH 01:07PM
- can anybody tell what is going on
- MSmits 01:07PM
- convnets supposedly help for games like othello/yavalath, where the surroundings of a hex/square are important
- doubt it would help much for oware
- jacek 01:08PM
- my NNs so far have at most 2 hidden layers, so resnets are pointless.
- MSmits 01:08PM
- yeah
- 2 is not much at all
- jacek 01:08PM
- also i mostly exploit the fact that there is little change between game states, i.e. only few squares are affected
- MSmits 01:08PM
- did you experiment with trading layer size for depth?
- how do you exploit this?
- jacek 01:09PM
- yeah, and this is what i came up with for my framework and cg constraints
- for input/first hidden layer you need to only update the values instead of summing everything all over again
- partial updates, the main idea behind nnue
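The partial-update idea (the core of NNUE) can be illustrated as follows: keep the first hidden layer's pre-activation accumulator, and when a move toggles a few input features, subtract and add only the affected weight columns instead of re-summing everything. The feature encoding here is a made-up illustration, not a real game's:

```python
def full_accumulator(weights, active_features):
    """Recompute first-hidden-layer pre-activations from scratch."""
    hidden = len(weights[0])
    acc = [0.0] * hidden
    for f in active_features:
        for j in range(hidden):
            acc[j] += weights[f][j]
    return acc

def apply_move(acc, weights, removed, added):
    """NNUE-style partial update: only features toggled by the move
    are subtracted or added; cost is O(changed features), not O(all)."""
    for f in removed:
        for j in range(len(acc)):
            acc[j] -= weights[f][j]
    for f in added:
        for j in range(len(acc)):
            acc[j] += weights[f][j]
    return acc
```

Since a move usually touches only a few squares, the incremental path is much cheaper than a full recompute, during both search and training-data generation.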
- MSmits 01:11PM
- oh you mean it's a performance improvement
- do you mean during training or running a game?
- jacek 01:11PM
- yes
- well both
- MSmits 01:12PM
- well that seems useful and alleviates the problem with you just using for loops
- jacek 01:12PM
- though i do not use that in oware
- MSmits 01:12PM
- it's easier to implement improvements like this when your code is not stuck in weird intrinsic and avx stuff
- proace21 01:13PM
- hi
- MSmits 01:13PM
- once you've gotten into that, you generally dont touch the code anymore. At least I dont