- These are my personal translation and slide notes. I apologize ahead of time if many of the topics are unclear or don't go into enough detail. Feel free to contact me on Twitter: https://twitter.com/CMZinac
- Get a copy of the slides for reference here: https://docs.google.com/presentation/d/1_qlTcEW-PzB_hhi2HZx6kA8mez2j-WtA4wKEzkIPiW8/edit?usp=sharing
- Using Neural Network for Fighting Game AI. Hisanobu Tomari 泊久信
- Game Creators Conference'19. May 30, 2019. Osaka
- - What they did in their game.
- > Include a neural network in the game itself (Wanted to be able to play with machine learning like a toy)
- - Real time machine learning while running the game
- > In order to run it on consoles, they used as small a configuration as they could.
- > A technique to allow efficient machine learning even for short play sessions.
- - What they wanted to do
- > First, train a model on Player A's behavior during a match.
- > Second, transmit that model over the internet to Player B.
- > Finally, make it possible for Player B to have matches using the trained model imitating Player A.
- - They looked at existing machine learning experiments. First was OpenAI Five, which trained bots in DOTA2 using the Steam BOT API.
- - Hardware for that experiment used 128,000 preemptible CPU cores on GCP and 256 GPUs.
- - One observation of game state was ~36.8 KB, and they could achieve 7.5 observations per second of gameplay.
- - They considered various neural network models (CNNs and RNNs).
- - CNNs are good at handling data covering spatial ranges. Lately there's a lot of research into whether they can deal with sequential data too.
- - RNNs are good at handling sequential data.
- - They went with RNNs this time.
- - RNNs
- - LSTM: Works well even when there is a gap in time from a cause to its effect.
- - GRU: Similar to LSTMs, but structurally simpler.
- - Clockwork RNN: Can handle data that covers a longer gap of time.
- - Tests showed that the Clockwork RNN was best for their use case.
- - Game Overview (Samurai Shodown)
- - Competitive Fighting Game
- - 1 v 1 battles between you and your opponent
- - You win when your opponent's life hits 0.
- - Input Commands in Fighting Games
- - "Commands" are a sequence of inputs that cause a move to be executed when they are read by the game.
- - The behavior of each move differs and leads to certain tradeoffs.
- - Blocks reduce damage.
- - Moves that take a large chunk of the opponent's life can have a lot of recovery, preventing the player from guarding for a long time.
- - Some moves damage a nearby opponent, while others hit an opponent far away.
- - During normal game play, players will look at the state of the game screen and input the appropriate buttons.
- - Secret of input commands
- - The commands are stored as tables in the game code.
- - A command is matched against the player's input history.
- - Matching still succeeds even when extra, unneeded inputs are mixed in between.
- - Commands are stored in the table in priority order.
- - When multiple commands are detected, the one with the highest priority is used (see the sketch after this list).
- - Each command also depends on the character's current state. Because of this, there are times when a command can't be triggered.
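- (Note: a minimal sketch of how such a priority-ordered command table could work — my own reconstruction, not code from the talk. The input values, gap-tolerant matching, and the `MatchCommand` name are all assumptions.)

```cpp
#include <cstdint>
#include <deque>
#include <vector>

enum Input : uint8_t { DOWN, DOWN_FWD, FWD, PUNCH };

struct Command {
    std::vector<Input> sequence;  // required inputs, oldest first
    int moveId;                   // move to execute on a match
};

// The table is assumed to be sorted by priority, so the first command whose
// sequence appears in the input history (in order, with unneeded inputs
// allowed in between) wins.
int MatchCommand(const std::vector<Command>& table,
                 const std::deque<Input>& history) {
    for (const Command& cmd : table) {
        size_t need = 0;
        for (Input in : history) {            // extra inputs are skipped over
            if (in == cmd.sequence[need] && ++need == cmd.sequence.size())
                return cmd.moveId;            // full sequence found
        }
    }
    return -1;  // no command matched this frame
}
```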
- - Plan for implementing machine learning.
- - Make it so you can play against a model that imitates a player's behavior.
- - The goal was to choose the keys that players would want to press depending on the game state.
- - They also wanted to create the model from the player's actions while running in real time on consoles.
- - Come up with a simple method for creating these models in real time.
- - Make it possible to share these models online.
- - How this was to be implemented as a game feature.
- - Dojo Mode
- - Download ghost data (a machine learning model) from the leader board.
- - You can play against this ghost in a match.
- - During normal play (offline/online), these ghosts were to be generated and uploaded to a server.
- - Ghosts of a player's most frequently used characters would be uploaded to the leader board.
- - On the actual implementation...
- - Prototype Creation
- - Desire to quickly use an already developed machine learning solution
- - The game is being developed with UE4.
- - TensorFlow uses Python, so implementing it in UE4 would require some time and effort.
- > embedding Python on consoles is especially annoying
- - Because of this, we decided to connect to UE4 over a TCP connection (see the sketch after this section).
- - Implementing an external machine learning solution with TCP
- - Able to make large changes without affecting the game itself.
- > Changing the neural net configuration
- > Changes to pre-processing and post-processing
- - For the machine learning part, they could use TensorFlow from Python.
- > The machine learning portion is written entirely in Python.
- Shortcomings
- - Need to rewrite it as a single process before the actual release.
- > Console support
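- (Note: a minimal sketch of what the game side of that TCP prototype could look like — my assumption, not SNK's code. POSIX sockets for brevity; the wire format of raw floats out and one key code back, and both function names, are invented for illustration.)

```cpp
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Connect to the external Python/TensorFlow process.
int ConnectToLearner(const char* host, uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, host, &addr.sin_addr);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// One frame's round trip: send the game-state vector, read back a key code.
// (Assumes both ends share the same float layout and byte order.)
uint32_t ExchangeFrame(int fd, const std::vector<float>& state) {
    send(fd, state.data(), state.size() * sizeof(float), 0);
    uint32_t keyCode = 0;
    recv(fd, &keyCode, sizeof(keyCode), MSG_WAITALL);
    return keyCode;
}
```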
- - We tried various configurations in the prototype
- - After each change to the configuration, they tested again.
- - A 4-layer LSTM configuration didn't learn very well.
- - GRU was flexible (catches many cases?)
- - Tested an activation prefilter (those more familiar with ANNs will probably understand what this is about)
- - Optimization algorithms tried:
- > RMSprop
- > Adam (Note: Probably this technique: https://towardsdatascience.com/adam-latest-trends-in-deep-learning-optimization-6be9a291375c)
- - With Adam we were able to achieve satisfactory performance this time.
- - Testing the prototype. Input and Output
- - We input data that we can get from the game system.
- - Character position, part of the collision boxes, animation state, gauge (Meter/HP?) state, time left, last round?, etc.
- - The output was a percentage distribution of how long a player wanted to press each button combination.
- - How did it do? (Graph showing how closely the output matched the player's inputs based on the initial learning rate)
- - (Not too sure about this explanation of the graph, but basically they're saying that if the learning rate is too big it will never converge to the desired output, but if it's too small it will settle into a local minimum.)
- - Implementing the real thing
- - Implemented by hand the processing that TensorFlow had handled in the prototype:
- > Matrix operations
- > Adam Optimization
- - Inference time processing
- - Learning time processing
- - Used a C++ template library called Eigen to handle matrices
- - There's quite a speed increase that comes from compiler optimization.
- > But there was trouble with slowdowns in debug builds.
- - Got quite a speedup by flushing denormal (un-normalized) numbers to zero via the MXCSR register (see the sketch below).
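- (Note: a sketch of what a hand-rolled Adam step on Eigen matrices plus the MXCSR trick might look like — my reconstruction from the talk's description, not their code. The FTZ/DAZ flags make SSE math treat denormal floats as zero, skipping the slow microcoded path; the hyperparameter defaults are the usual Adam ones.)

```cpp
#include <Eigen/Dense>
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE
#include <cmath>

// "Throw away un-normalized numbers": flush denormals via the MXCSR register.
void EnableFlushToZero() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          // denormal results -> 0
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  // denormal inputs  -> 0
}

struct AdamState {
    Eigen::MatrixXf m, v;  // first/second moment estimates
    int t = 0;             // step counter for bias correction
};

// One Adam update for a single weight matrix W given its gradient g.
void AdamStep(Eigen::MatrixXf& W, const Eigen::MatrixXf& g, AdamState& s,
              float lr = 1e-3f, float b1 = 0.9f, float b2 = 0.999f,
              float eps = 1e-8f) {
    if (s.t == 0) {  // lazily size the moment buffers
        s.m = Eigen::MatrixXf::Zero(g.rows(), g.cols());
        s.v = s.m;
    }
    ++s.t;
    s.m = b1 * s.m + (1.f - b1) * g;
    s.v = b2 * s.v + (1.f - b2) * g.cwiseProduct(g);
    const float c1 = 1.f - std::pow(b1, s.t);  // bias corrections
    const float c2 = 1.f - std::pow(b2, s.t);
    W -= (lr / c1) * (s.m.array() / ((s.v.array() / c2).sqrt() + eps)).matrix();
}
```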
- - Neural Network configuration (Diagram Showing the neural network's structure)
- - 76 Parameters (Game State Inputs)
- - Feeds into a Clockwork RNN
- - Feeds into a dense layer
- - Feeds into a softmax layer that outputs directional key codes (9 outputs)
- - Feeds into a softmax layer that outputs button key codes (16 outputs)
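- (Note: a forward-pass sketch of the topology on the slide — the layer sizes come from the talk, but the clock periods, tanh activations, and module layout are my assumptions. In a Clockwork RNN the hidden state is split into modules with clock periods; at step t only the modules whose period divides t update, which is how slow modules keep context across long gaps in time.)

```cpp
#include <Eigen/Dense>
#include <vector>

using Vec = Eigen::VectorXf;
using Mat = Eigen::MatrixXf;

static Vec Softmax(const Vec& z) {
    Vec e = (z.array() - z.maxCoeff()).exp();  // stabilized exponentials
    return e / e.sum();
}

struct Module {
    int period;  // this module updates only when t % period == 0
    Mat Wx, Wh;  // input and recurrent weights
    Vec h;       // persistent hidden state (zero-initialized)
};

struct PolicyNet {
    std::vector<Module> mods;  // e.g. periods {1, 2, 4, 8}
    Mat Wd, Wdir, Wbtn;        // dense layer and the two output heads
    int t = 0;

    void Step(const Vec& state76, Vec& dir9, Vec& btn16) {
        std::vector<float> cat;
        for (Module& m : mods) {
            if (t % m.period == 0)  // only fire this module on its clock
                m.h = (m.Wx * state76 + m.Wh * m.h).array().tanh().matrix();
            cat.insert(cat.end(), m.h.data(), m.h.data() + m.h.size());
        }
        ++t;
        Vec hidden = Eigen::Map<Vec>(cat.data(), cat.size());
        Vec dense = (Wd * hidden).array().tanh().matrix();
        dir9  = Softmax(Wdir * dense);  // 9-way directional distribution
        btn16 = Softmax(Wbtn * dense);  // 16-way button distribution
    }
};
```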
- Behavior of Inference-Time Processing (Diagram showing a screenshot and a box labeled "Learning Thread")
- - The game state feeds into the learning thread each frame.
- - The learning thread generates key codes and feeds them back into the game thread (see the sketch below).
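- (Note: a sketch of that per-frame handoff — the diagram only shows the two thread boxes, so the mutex-guarded mailbox below, and its names, are purely my assumption about one simple way to wire it.)

```cpp
#include <Eigen/Dense>
#include <mutex>
#include <optional>

struct Mailbox {
    std::mutex mu;
    std::optional<Eigen::VectorXf> state;  // game thread -> learning thread
    std::optional<int> key;                // learning thread -> game thread

    // Called by the game thread once per frame.
    void PostState(const Eigen::VectorXf& s) {
        std::lock_guard<std::mutex> lk(mu);
        state = s;
    }
    std::optional<int> TakeKey() {
        std::lock_guard<std::mutex> lk(mu);
        auto k = key;
        key.reset();
        return k;
    }

    // Called by the learning thread in its own loop.
    std::optional<Eigen::VectorXf> TakeState() {
        std::lock_guard<std::mutex> lk(mu);
        auto s = state;
        state.reset();
        return s;
    }
    void PostKey(int k) {
        std::lock_guard<std::mutex> lk(mu);
        key = k;
    }
};
```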
- - Tricks for generating key code inputs
- - The inferred results are the probability of each key being pressed
- - The results are outputted each frame.
- - When raw inferred keys were used directly, the character would just swing in place, no matter which key came out.
- > It looked like button mashing, which was undesirable.
- - So we made sure a key wasn't accepted unless its probability passed a certain threshold (see the sketch below).
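- (Note: a sketch of that threshold trick — the cutoff value is my placeholder; the talk doesn't give one.)

```cpp
#include <Eigen/Dense>

// Returns the index of the chosen key, or -1 for "press nothing" when even
// the most likely key doesn't clear the probability floor.
int ChooseKey(const Eigen::VectorXf& probs, float threshold = 0.35f) {
    Eigen::Index best;
    const float p = probs.maxCoeff(&best);
    return p >= threshold ? static_cast<int>(best) : -1;
}
```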
- - How the machine learning process was handled.
- - Once each round is over, we process just that round's data.
- - The battle history is sent to the machine learning thread for processing. (Note: Replay data)
- - Once that is complete, the neural network model is uploaded to the internet.
- - Machine Learning Training Method
- - We support a training technique that works from what the game screen shows.
- - We accumulate a batch and then train on the combined data.
- - Hide some of the input (Hide some of the game state)
- - We simply applied a low-pass filter over the game state (see the sketch below).
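- (Note: a sketch of one common way to low-pass filter a state vector — an exponential moving average. The talk only says "low-pass filter"; the EMA choice and the alpha value are my assumptions.)

```cpp
#include <Eigen/Dense>

struct StateFilter {
    Eigen::VectorXf smoothed;  // one running value per input feature
    float alpha = 0.1f;        // smaller alpha = heavier smoothing

    Eigen::VectorXf Apply(const Eigen::VectorXf& raw) {
        if (smoothed.size() == 0) smoothed = raw;  // seed on the first frame
        smoothed = alpha * raw + (1.f - alpha) * smoothed;
        return smoothed;
    }
};
```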
- - What we learned was difficult about machine learning.
- - Getting play data from humans is essential.
- - Log data from an environment that is as close as possible to an actual match.
- > Human vs Human
- - We had 2 designers play about 50 rounds per set
- > Whenever we made changes to the neural network's input, we had to redo matches.
- - This isn't ideal because it takes time
- - Move input commands and properties undergo a lot of changes during development.
- - When commands change, the inputs the neural network learned will no longer trigger them.
- - Then the data has to be re-recorded from real people.
- - Making small adjustments are not effective (Note: Probably means that it's not easy to make direct, granular changes to behavior)
- > "Make sure the CPU won't continue to jump in the corner when a character is far away."
- > "Want to make the CPU more reliably use this move. It's very important!"
- > "I want it to guard more!"
- - They had trouble getting the neural network to fulfill these requests.
- - Questions: "Can we make it work with training data?" "HOW do we make it work?"
- - Things we were not able to do.
- - We wanted to generate output that more closely resembled human inputs.
- - Limitations on file size shared on the network
- - Limitations on the size of the neural network trained in real time
- - We also wanted to be able to offload the training to a Google Edge TPU.
- - Then training would occur on the cloud
- - Conclusions
- - Implemented an in-game system that learns from player actions in real time using a neural network.
- - Made it possible to play against a neural network model.
- - With ingenuity and clear solutions, they were able to achieve this on current hardware.
- - Furthermore, we hope to increase precision with access to larger models.
- > Processing speed
- > File size that can be shared online
- > We have to consider the difficult parts of machine learning during the planning stage.