These are my personal translation and slide notes. I apologize ahead of time if many of the topics are unclear or don't go into enough detail. Feel free to contact me on Twitter: https://twitter.com/CMZinac

Get a copy of the slides for reference here: https://docs.google.com/presentation/d/1_qlTcEW-PzB_hhi2HZx6kA8mez2j-WtA4wKEzkIPiW8/edit?usp=sharing

Using Neural Networks for Fighting Game AI. Hisanobu Tomari 泊久信
Game Creators Conference '19. Mar 30, 2019. Osaka

- What they did in their game
> Included a neural network in the game itself (they wanted to be able to play with machine learning like a toy)

- Real-time machine learning while the game is running
> In order to run it on consoles, they used a configuration that was as small as they could make it.
> A technique that allows efficient machine learning even from short play sessions.

- What they wanted to do
> First, train a model on Player A's behavior during a match.
> Second, transmit that model over the internet to Player B.
> Finally, make it possible for Player B to have matches against the trained model imitating Player A.

- They looked at existing machine learning experiments. The first was OpenAI Five, which trained bots in Dota 2 using the game's bot API.
- Hardware for that experiment used 128,000 preemptible CPU cores on GCP and 256 GPUs.
- One observation of game state was ~36.8 KB, and they could collect 7.5 observations for each second of gameplay.

- They considered various neural network models (CNNs and RNNs).
- CNNs are good at handling data that covers spatial ranges. Lately there's a lot of research into whether or not they can deal with sequential data too.
- RNNs are good at handling sequential data.
- They went with RNNs this time.

- RNNs
- LSTM: Works well even when there is a gap in time from a cause to its effect.
- GRU: Similar to LSTM, but structurally simpler.
- Clockwork RNN: Can handle data that covers a longer gap of time.
- Tests showed that this method (the Clockwork RNN) was best for their use case. A rough sketch of the idea follows below.

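(Translator's addition, not from the slides: a minimal C++/Eigen sketch of the Clockwork RNN idea from Koutník et al. 2014, since Eigen comes up later in the talk. The module count, sizes, and clock periods here are made up for illustration.)

    // Clockwork RNN step: the hidden state is split into modules, and module m only
    // updates on frames where frame % period[m] == 0. Slow modules carry their state
    // across long stretches of time, which is what handles long cause-to-effect gaps.
    #include <Eigen/Dense>
    #include <vector>

    struct ClockworkRNN {
        std::vector<int> period;   // clock period per module, e.g. {1, 2, 4, 8}
        int moduleSize = 0;        // hidden units per module (illustrative)
        Eigen::MatrixXf Wx, Wh;    // input->hidden and hidden->hidden weights
        Eigen::VectorXf h;         // hidden state, size = period.size() * moduleSize

        void step(const Eigen::VectorXf& x, long frame) {
            Eigen::VectorXf pre = Wx * x + Wh * h;      // same pre-activation as a plain RNN
            for (int m = 0; m < (int)period.size(); ++m) {
                if (frame % period[m] == 0) {           // this module's clock fires this frame
                    h.segment(m * moduleSize, moduleSize) =
                        pre.segment(m * moduleSize, moduleSize).array().tanh().matrix();
                }                                       // otherwise the module keeps its old state
            }
        }
    };

(In the paper, Wh is restricted so that faster modules receive input from slower ones; that detail is omitted here.)
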
- Game Overview (Samurai Shodown)
- Competitive fighting game
- 1v1 battles between you and your opponent
- You win when your opponent's life hits 0.

- Input Commands in Fighting Games
- "Commands" are sequences of inputs that cause a move to be executed when they are read by the game.
- The behavior of each move differs and leads to certain tradeoffs.
- Blocks reduce damage.
- Moves that take a large chunk of the opponent's life can have a lot of recovery, which prevents the player from guarding for a long time.
- Some moves damage an opponent nearby; others hit an opponent far away.
- During normal gameplay, players look at the state of the game screen and input the appropriate buttons.

- Secret of input commands
- The commands are stored as tables in the game code.
- A command is matched against the player's recent inputs.
- The match still succeeds even when unneeded inputs are mixed in.
- Commands are stored in the table based on priority.
- When multiple commands are detected, the one with the highest priority is used.
- Each command depends on the character's current state. Because of this, there are times when commands can't be triggered. (A rough sketch of this kind of matching follows below.)

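(Translator's addition: the slides don't show code, but this is roughly how a priority-ordered command table with tolerance for extra inputs tends to work. The enum values, move name, and function names below are invented for illustration, not SNK's actual code.)

    // Each command is an ordered input sequence. The input history is scanned so that
    // extra, unneeded inputs in between do not break the match, and the table is kept
    // sorted by priority so the first command that matches wins.
    #include <deque>
    #include <string>
    #include <vector>

    enum Input { DOWN, DOWN_FWD, FWD, PUNCH /* ... */ };

    struct Command {
        std::string move;         // e.g. a projectile special (hypothetical)
        std::vector<Input> seq;   // required inputs in order, e.g. {DOWN, DOWN_FWD, FWD, PUNCH}
        int priority;             // higher-priority commands are listed first in the table
    };

    // True if `seq` appears in order inside `history`; other inputs may appear in between.
    static bool matches(const std::deque<Input>& history, const std::vector<Input>& seq) {
        size_t need = 0;
        for (Input in : history)
            if (need < seq.size() && in == seq[need]) ++need;
        return need == seq.size();
    }

    // Table is assumed sorted by descending priority, so the first hit is the one that is used.
    const Command* detect(const std::vector<Command>& table, const std::deque<Input>& history) {
        for (const Command& c : table)
            if (matches(history, c.seq)) return &c;
        return nullptr;   // the real game also checks the character's current state here
    }
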
- Plan for implementing machine learning
- Make it so you can play against a model that imitates a player's behavior.
- The goal was to choose the keys that the player would want to press depending on the game state.
- They also wanted to create the model from the player's actions while running in real time on consoles.
- Come up with a simple method for creating these models in real time.
- Make it possible to share these models online.

- How this was to be implemented as a game feature
- Dojo Mode
- Download ghost data (a machine learning model) from the leaderboard.
- You can play against this ghost in a match.
- During normal play (offline/online), these ghosts were to be generated and uploaded to a server.
- Ghosts of a player's most frequently used characters would be uploaded to the leaderboard.

- On the actual implementation...
- Prototype Creation
- Desire to quickly use an already developed machine learning solution
- The game is being developed with UE4
- TensorFlow uses Python, so implementing it inside UE4 would require some time and effort.
> Embedding Python on consoles is especially annoying.
- Because of this, we decided to connect to UE4 over a TCP connection.

- Implementing an external machine learning solution over TCP
- Able to make large changes without them affecting the game itself.
> Changing the neural net configuration
> Changes to pre-processing and post-processing
- The machine learning part can use TensorFlow from Python.
> The machine learning portion is written only in Python. (A sketch of the game-side exchange follows below.)

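(Translator's addition: a sketch of what the game-side half of that TCP bridge might look like, assuming a simple length-prefixed framing. The port, message layout, and reply sizes are my own assumptions rather than details from the talk; the other end would be the prototype's Python/TensorFlow process.)

    // Each frame the game process sends its game-state feature vector to the external
    // Python/TensorFlow process over TCP and reads back per-key probabilities.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdint>
    #include <vector>

    int connect_to_trainer(const char* host, uint16_t port) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, host, &addr.sin_addr);
        if (connect(fd, (sockaddr*)&addr, sizeof(addr)) != 0) { close(fd); return -1; }
        return fd;
    }

    // Send the game-state floats with a length prefix, then read back 9 + 16 key probabilities
    // (the output sizes that appear later in the slides).
    bool exchange_frame(int fd, const std::vector<float>& state, std::vector<float>& keyProbs) {
        uint32_t bytes = htonl(uint32_t(state.size() * sizeof(float)));
        if (send(fd, &bytes, sizeof(bytes), 0) != (ssize_t)sizeof(bytes)) return false;
        if (send(fd, state.data(), state.size() * sizeof(float), 0) < 0) return false;

        keyProbs.assign(9 + 16, 0.0f);
        size_t want = keyProbs.size() * sizeof(float), got = 0;
        while (got < want) {                               // keep reading until the reply is complete
            ssize_t r = recv(fd, (char*)keyProbs.data() + got, want - got, 0);
            if (r <= 0) return false;
            got += size_t(r);
        }
        return true;
    }
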
Shortcomings
- Need to rewrite it as a single process before the actual release.
> Console support

- We tried various configurations in the prototype
- After each change to the configuration, we tested it.
- The 4-layer LSTM configuration didn't learn very well.
- GRU was flexible (catches many cases?)
- Tested an activation prefilter (those more familiar with ANNs will probably understand what this is about)
- Optimization algorithms:
> RMSprop
> Adam (Note: Probably this technique: https://towardsdatascience.com/adam-latest-trends-in-deep-learning-optimization-6be9a291375c)
- With Adam we were able to achieve satisfactory performance this time. (The standard Adam update is written out below for reference.)

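(Translator's addition, for reference: the standard Adam update from Kingma and Ba, not anything specific shown on the slides. Here g_t is the gradient, \alpha the learning rate, and \beta_1, \beta_2, \epsilon the usual hyperparameters.)

    m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
    v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
    \hat{m}_t = m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t)
    \theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)
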
- Testing the prototype: input and output
- We input data that we can get from the game system.
- Character position, part of the collision boxes, animation state, gauge (Meter/HP?) state, time left, last round?, etc.
- The output was a percentage distribution of how much the player would want to press each button combination.
- How did it do? (Graph showing how closely the output matched the player's inputs depending on the initial learning rate)
- (Not too sure about this explanation of the graph, but basically they're saying that if the learning rate is too big it won't ever converge to the desired output, and if it's too small it'll settle on a local minimum.)

- Implementing the real thing
- Reimplemented the processing that TensorFlow handled in the prototype:
> Matrices
> Adam optimization
- Inference-time processing
- Learning-time processing
- Used a C++ template library called Eigen to handle matrices.
- There's quite a speed increase that comes from compiler optimization.
> But trouble with slowdowns in debug builds
- Got quite a speedup from throwing away denormalized numbers (via the MXCSR register; a sketch follows below).

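(Translator's addition: "throwing away denormalized numbers" refers to flushing subnormal floats to zero through the MXCSR register on x86. Below is the standard way to do that with intrinsics; it isn't necessarily the exact code used in the game.)

    // Subnormal floats are extremely slow on many CPUs, so the matrix-heavy learning code
    // gets a large speedup by treating them as zero.
    #include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE
    #include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE

    void enable_flush_to_zero() {
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);           // results that would be subnormal become 0
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);   // subnormal inputs are treated as 0
    }
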
- Neural Network Configuration (diagram showing the neural network's structure)
- 76 parameters (game-state inputs)
- Feeds into a Clockwork RNN
- Feeds into a dense layer
- Feeds into a softmax and outputs key codes: directional keys (9 outputs)
- Feeds into a softmax and outputs key codes: buttons (16 outputs)
> (A rough sketch of this forward pass follows below.)

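(Translator's addition: a rough C++/Eigen sketch of that forward pass. The 76 / 9 / 16 sizes come from the slide; the dense-layer size, activation, and struct layout are my assumptions.)

    // 76 game-state inputs -> recurrent layer (e.g. the Clockwork RNN sketched earlier)
    // -> dense layer -> two softmax heads: one over 9 directions, one over 16 button combos.
    #include <Eigen/Dense>

    Eigen::VectorXf softmax(const Eigen::VectorXf& z) {
        Eigen::ArrayXf e = (z.array() - z.maxCoeff()).exp();   // subtract max for numerical stability
        return (e / e.sum()).matrix();
    }

    struct PolicyHeads {
        Eigen::MatrixXf Wdense; Eigen::VectorXf bdense;   // recurrent output -> dense layer
        Eigen::MatrixXf Wdir;   Eigen::VectorXf bdir;     // dense -> 9 directional outputs
        Eigen::MatrixXf Wbtn;   Eigen::VectorXf bbtn;     // dense -> 16 button outputs

        // `h` is the recurrent layer's hidden state for the current frame.
        void forward(const Eigen::VectorXf& h,
                     Eigen::VectorXf& dirProbs, Eigen::VectorXf& btnProbs) const {
            Eigen::VectorXf d = (Wdense * h + bdense).array().max(0.0f).matrix();   // ReLU (assumed)
            dirProbs = softmax(Wdir * d + bdir);   // probability of each of the 9 directions
            btnProbs = softmax(Wbtn * d + bbtn);   // probability of each of the 16 button combos
        }
    };
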
Behavior of inference-time processing (diagram showing a screenshot and a box labeled "Learning Thread")
- Game state feeds into the learning thread each frame.
- The learning thread generates key codes and then feeds them into the game thread. (A sketch of this exchange follows below.)

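(Translator's addition: my own minimal sketch of how that per-frame exchange between the game thread and the learning thread could be structured. The mailbox design and names are invented; the talk only shows the diagram.)

    // The game thread drops in the latest game state each frame; the learning thread runs
    // inference on it and leaves the generated key code for the game thread to pick up.
    #include <Eigen/Dense>
    #include <mutex>
    #include <optional>

    struct InferenceMailbox {
        std::mutex m;
        std::optional<Eigen::VectorXf> latestState;   // written by the game thread each frame
        int latestKey = -1;                           // written by the learning thread (-1 = none yet)

        // Game thread: publish this frame's state, read back the most recent inferred key.
        int exchange(const Eigen::VectorXf& state) {
            std::lock_guard<std::mutex> lock(m);
            latestState = state;
            return latestKey;
        }

        // Learning thread: take the newest state (if any) and run inference on it.
        std::optional<Eigen::VectorXf> takeState() {
            std::lock_guard<std::mutex> lock(m);
            auto s = latestState;
            latestState.reset();
            return s;
        }

        // Learning thread: publish the key code that inference produced.
        void publishKey(int key) {
            std::lock_guard<std::mutex> lock(m);
            latestKey = key;
        }
    };
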
- Tricks for generating key code inputs
- The inferred results are the probability of each key being pressed.
- The results are output each frame.

- When a key wasn't learned correctly, the character would just swing in place no matter what key was pressed.
> Kind of like they were mashing, which was undesirable.
- So we made sure a key wasn't accepted when it didn't pass a certain probability threshold. (A sketch follows below.)

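(Translator's addition: a small sketch of the threshold trick; the 0.6 value is made up, the slides don't give a number.)

    // Each frame the net outputs a distribution over key choices. The most likely key is
    // only accepted when the net is confident enough; otherwise nothing is pressed, which
    // avoids the "mashing" behavior described above.
    #include <Eigen/Dense>

    // Returns the index of the key to press, or -1 for "press nothing this frame".
    int pick_key(const Eigen::VectorXf& probs, float threshold = 0.6f) {
        Eigen::Index best = 0;
        float p = probs.maxCoeff(&best);            // most probable key this frame
        return (p >= threshold) ? int(best) : -1;   // below the threshold, the key is not accepted
    }
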
- How the machine learning process was handled
- After each round ends, we process just that round's data.
- The battle history is sent to the machine learning thread for processing. (Note: replay data)
- Once that is complete, the neural network model is uploaded to the internet.

- Machine Learning Training Method
- We use training techniques to help it recognize the game screen (the game state).
- We accumulate a batch and then have the network learn from the combined data and results.
- Hide some of the input (hide some of the game state).
- We simply applied a low-pass filter over the game state. (A sketch of these last two tricks follows below.)

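(Translator's addition: how I read those last two tricks, sketched with made-up parameter values. Randomly hiding input features is a dropout-style augmentation; the low-pass filter is shown here as a simple exponential moving average, which may not be the exact filter they used.)

    #include <Eigen/Dense>
    #include <random>

    // Zero out each game-state feature with probability hideProb (training-time augmentation).
    Eigen::VectorXf hide_inputs(const Eigen::VectorXf& state, float hideProb, std::mt19937& rng) {
        std::bernoulli_distribution hide(hideProb);
        Eigen::VectorXf out = state;
        for (Eigen::Index i = 0; i < out.size(); ++i)
            if (hide(rng)) out[i] = 0.0f;   // this feature is hidden from the network for this sample
        return out;
    }

    // One-pole low-pass filter over the per-frame game state: filtered += alpha * (state - filtered).
    struct LowPassState {
        float alpha = 0.2f;         // smoothing factor (made-up value)
        Eigen::VectorXf filtered;   // running filtered state
        const Eigen::VectorXf& update(const Eigen::VectorXf& state) {
            if (filtered.size() == 0) filtered = state;    // initialize on the first frame
            else filtered += alpha * (state - filtered);
            return filtered;
        }
    };
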
- What we found difficult about machine learning

- Getting play data from humans is essential
- Log data from an environment that is as close as possible to an actual match.
> Human vs. Human
- We had 2 designers play about 50 rounds per set.
> Whenever we made changes to the neural network's input, we had to redo the matches.
- This isn't ideal because it takes time.

- Move input commands and properties undergo a lot of changes during development.
- When they change, the inputs the neural network produces no longer trigger those commands.
- Then we have to recapture the data from real people.

- Making small adjustments is not effective (Note: probably means that it's not easy to make direct, granular changes to behavior)
> "Make sure the CPU won't keep jumping in the corner when the other character is far away."
> "Want to make the CPU more reliably use this move. It's very important!"
> "I want it to guard more!"
- Trouble getting the neural network to fulfill these requests.
- Questions: "Can we make it work with training data?" "HOW do we make it work?"

- Things we were not able to do
- We wanted to generate output that resembled human inputs more.
- Limitations on the file size that can be shared over the network
- Limitations on the size of the neural network trained in real time
- We also wanted to be able to offload the training to a Google Edge TPU.
- Then training would occur in the cloud.

- Conclusions
- Implemented in the game a system that learns from player actions in real time using a neural network.
- Made it possible to play against a neural network model.
- With ingenuity and the right solutions, we were able to achieve this on current hardware.
- Furthermore, we hope to increase precision with access to larger models.
> Processing speed
> File size that can be shared online
> We have to consider the difficult parts of machine learning during the planning stage.