These are my personal translation and slide notes. I apologize ahead of time if many of the topics are unclear or don't go into enough detail. Feel free to contact me on Twitter: https://twitter.com/CMZinac

Get a copy of the slides for reference here: https://docs.google.com/presentation/d/1_qlTcEW-PzB_hhi2HZx6kA8mez2j-WtA4wKEzkIPiW8/edit?usp=sharing

Using Neural Networks for Fighting Game AI. Hisanobu Tomari 泊久信
Game Creators Conference '19. Mar 30, 2019. Osaka

- What they did in their game
> Included a neural network in the game itself (they wanted to be able to play with machine learning like a toy)

- Real-time machine learning while the game is running
> In order to run it on consoles, they used a configuration that was as small as they could make it.
> A technique that allows efficient machine learning even from short play sessions.

- What they wanted to do
> First, train a model on Player A's behavior during a match.
> Second, transmit that model over the internet to Player B.
> Finally, make it possible for Player B to have matches against the trained model imitating Player A.

- They looked at existing machine learning experiments. The first was OpenAI Five, which trained bots in Dota 2 using the game's bot API.
- Hardware for that experiment used 128,000 preemptible CPU cores on GCP and 256 GPUs.
- One observation of game state was ~36.8 KB, and they could collect 7.5 observations for each second of gameplay.

- They considered various neural network models (CNNs and RNNs).
- CNNs are good at handling data that covers spatial ranges. Lately there's a lot of research into whether or not they can deal with sequential data too.
- RNNs are good at handling sequential data.
- They went with RNNs this time.

- RNNs
- LSTM: Works well even when there is a gap in time from a cause to its effect.
- GRU: Similar to LSTM, but structurally simpler.
- Clockwork RNN: Can handle data that covers a longer gap of time.
- Tests showed that this method (the Clockwork RNN) was best for their use case. A rough sketch of the idea follows below.

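(Translator's addition, not from the slides: a minimal C++/Eigen sketch of the Clockwork RNN idea from Koutník et al. 2014, since Eigen comes up later in the talk. The module count, sizes, and clock periods here are made up for illustration.)

    // Clockwork RNN step: the hidden state is split into modules, and module m only
    // updates on frames where frame % period[m] == 0. Slow modules carry their state
    // across long stretches of time, which is what handles long cause-to-effect gaps.
    #include <Eigen/Dense>
    #include <vector>

    struct ClockworkRNN {
        std::vector<int> period;   // clock period per module, e.g. {1, 2, 4, 8}
        int moduleSize = 0;        // hidden units per module (illustrative)
        Eigen::MatrixXf Wx, Wh;    // input->hidden and hidden->hidden weights
        Eigen::VectorXf h;         // hidden state, size = period.size() * moduleSize

        void step(const Eigen::VectorXf& x, long frame) {
            Eigen::VectorXf pre = Wx * x + Wh * h;      // same pre-activation as a plain RNN
            for (int m = 0; m < (int)period.size(); ++m) {
                if (frame % period[m] == 0) {           // this module's clock fires this frame
                    h.segment(m * moduleSize, moduleSize) =
                        pre.segment(m * moduleSize, moduleSize).array().tanh().matrix();
                }                                       // otherwise the module keeps its old state
            }
        }
    };

(In the paper, Wh is restricted so that faster modules receive input from slower ones; that detail is omitted here.)
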
- Game Overview (Samurai Shodown)
- Competitive fighting game
- 1v1 battles between you and your opponent
- You win when your opponent's life hits 0.

- Input Commands in Fighting Games
- "Commands" are sequences of inputs that cause a move to be executed when they are read by the game.
- The behavior of each move differs and leads to certain tradeoffs.
- Blocks reduce damage.
- Moves that take a large chunk of the opponent's life can have a lot of recovery, which prevents the player from guarding for a long time.
- Some moves damage an opponent nearby; others hit an opponent far away.
- During normal gameplay, players look at the state of the game screen and input the appropriate buttons.

- Secret of input commands
- The commands are stored as tables in the game code.
- A command is matched against the player's recent inputs.
- The match still succeeds even when unneeded inputs are mixed in.
- Commands are stored in the table based on priority.
- When multiple commands are detected, the one with the highest priority is used.
- Each command depends on the character's current state. Because of this, there are times when commands can't be triggered. (A rough sketch of this kind of matching follows below.)

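(Translator's addition: the slides don't show code, but this is roughly how a priority-ordered command table with tolerance for extra inputs tends to work. The enum values, move name, and function names below are invented for illustration, not SNK's actual code.)

    // Each command is an ordered input sequence. The input history is scanned so that
    // extra, unneeded inputs in between do not break the match, and the table is kept
    // sorted by priority so the first command that matches wins.
    #include <deque>
    #include <string>
    #include <vector>

    enum Input { DOWN, DOWN_FWD, FWD, PUNCH /* ... */ };

    struct Command {
        std::string move;         // e.g. a projectile special (hypothetical)
        std::vector<Input> seq;   // required inputs in order, e.g. {DOWN, DOWN_FWD, FWD, PUNCH}
        int priority;             // higher-priority commands are listed first in the table
    };

    // True if `seq` appears in order inside `history`; other inputs may appear in between.
    static bool matches(const std::deque<Input>& history, const std::vector<Input>& seq) {
        size_t need = 0;
        for (Input in : history)
            if (need < seq.size() && in == seq[need]) ++need;
        return need == seq.size();
    }

    // Table is assumed sorted by descending priority, so the first hit is the one that is used.
    const Command* detect(const std::vector<Command>& table, const std::deque<Input>& history) {
        for (const Command& c : table)
            if (matches(history, c.seq)) return &c;
        return nullptr;   // the real game also checks the character's current state here
    }
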
- Plan for implementing machine learning
- Make it so you can play against a model that imitates a player's behavior.
- The goal was to choose the keys that the player would want to press depending on the game state.
- They also wanted to create the model from the player's actions while running in real time on consoles.
- Come up with a simple method for creating these models in real time.
- Make it possible to share these models online.

- How this was to be implemented as a game feature
- Dojo Mode
- Download ghost data (a machine learning model) from the leaderboard.
- You can play against this ghost in a match.
- During normal play (offline/online), these ghosts were to be generated and uploaded to a server.
- Ghosts of a player's most frequently used characters would be uploaded to the leaderboard.

- On the actual implementation...
- Prototype Creation
- Desire to quickly use an already developed machine learning solution
- The game is being developed with UE4
- TensorFlow uses Python, so implementing it inside UE4 would require some time and effort.
> Embedding Python on consoles is especially annoying.
- Because of this, we decided to connect to UE4 over a TCP connection.

- Implementing an external machine learning solution over TCP
- Able to make large changes without them affecting the game itself.
> Changing the neural net configuration
> Changes to pre-processing and post-processing
- The machine learning part can use TensorFlow from Python.
> The machine learning portion is written only in Python. (A sketch of the game-side exchange follows below.)

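(Translator's addition: a sketch of what the game-side half of that TCP bridge might look like, assuming a simple length-prefixed framing. The port, message layout, and reply sizes are my own assumptions rather than details from the talk; the other end would be the prototype's Python/TensorFlow process.)

    // Each frame the game process sends its game-state feature vector to the external
    // Python/TensorFlow process over TCP and reads back per-key probabilities.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdint>
    #include <vector>

    int connect_to_trainer(const char* host, uint16_t port) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, host, &addr.sin_addr);
        if (connect(fd, (sockaddr*)&addr, sizeof(addr)) != 0) { close(fd); return -1; }
        return fd;
    }

    // Send the game-state floats with a length prefix, then read back 9 + 16 key probabilities
    // (the output sizes that appear later in the slides).
    bool exchange_frame(int fd, const std::vector<float>& state, std::vector<float>& keyProbs) {
        uint32_t bytes = htonl(uint32_t(state.size() * sizeof(float)));
        if (send(fd, &bytes, sizeof(bytes), 0) != (ssize_t)sizeof(bytes)) return false;
        if (send(fd, state.data(), state.size() * sizeof(float), 0) < 0) return false;

        keyProbs.assign(9 + 16, 0.0f);
        size_t want = keyProbs.size() * sizeof(float), got = 0;
        while (got < want) {                               // keep reading until the reply is complete
            ssize_t r = recv(fd, (char*)keyProbs.data() + got, want - got, 0);
            if (r <= 0) return false;
            got += size_t(r);
        }
        return true;
    }
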
Shortcomings
- Need to rewrite it as a single process before the actual release.
> Console support

- We tried various configurations in the prototype
- After each change to the configuration, we tested it.
- The 4-layer LSTM configuration didn't learn very well.
- GRU was flexible (catches many cases?)
- Tested an activation prefilter (those more familiar with ANNs will probably understand what this is about)
- Optimization algorithms:
> RMSprop
> Adam (Note: Probably this technique: https://towardsdatascience.com/adam-latest-trends-in-deep-learning-optimization-6be9a291375c)
- With Adam we were able to achieve satisfactory performance this time. (The standard Adam update is written out below for reference.)

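(Translator's addition, for reference: the standard Adam update from Kingma and Ba, not anything specific shown on the slides. Here g_t is the gradient, \alpha the learning rate, and \beta_1, \beta_2, \epsilon the usual hyperparameters.)

    m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
    v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
    \hat{m}_t = m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t)
    \theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)
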
- Testing the prototype: input and output
- We input data that we can get from the game system.
- Character position, part of the collision boxes, animation state, gauge (Meter/HP?) state, time left, last round?, etc.
- The output was a percentage distribution of how much the player would want to press each button combination.
- How did it do? (Graph showing how closely the output matched the player's inputs depending on the initial learning rate)
- (Not too sure about this explanation of the graph, but basically they're saying that if the learning rate is too big it won't ever converge to the desired output, and if it's too small it'll settle on a local minimum.)

- Implementing the real thing
- Reimplemented the processing that TensorFlow handled in the prototype:
> Matrices
> Adam optimization
- Inference-time processing
- Learning-time processing
- Used a C++ template library called Eigen to handle matrices.
- There's quite a speed increase that comes from compiler optimization.
> But trouble with slowdowns in debug builds
- Got quite a speedup from throwing away denormalized numbers (via the MXCSR register; a sketch follows below).

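(Translator's addition: "throwing away denormalized numbers" refers to flushing subnormal floats to zero through the MXCSR register on x86. Below is the standard way to do that with intrinsics; it isn't necessarily the exact code used in the game.)

    // Subnormal floats are extremely slow on many CPUs, so the matrix-heavy learning code
    // gets a large speedup by treating them as zero.
    #include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE
    #include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE

    void enable_flush_to_zero() {
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);           // results that would be subnormal become 0
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);   // subnormal inputs are treated as 0
    }
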
- Neural Network Configuration (diagram showing the neural network's structure)
- 76 parameters (game-state inputs)
- Feeds into a Clockwork RNN
- Feeds into a dense layer
- Feeds into a softmax and outputs key codes: directional keys (9 outputs)
- Feeds into a softmax and outputs key codes: buttons (16 outputs)
> (A rough sketch of this forward pass follows below.)

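(Translator's addition: a rough C++/Eigen sketch of that forward pass. The 76 / 9 / 16 sizes come from the slide; the dense-layer size, activation, and struct layout are my assumptions.)

    // 76 game-state inputs -> recurrent layer (e.g. the Clockwork RNN sketched earlier)
    // -> dense layer -> two softmax heads: one over 9 directions, one over 16 button combos.
    #include <Eigen/Dense>

    Eigen::VectorXf softmax(const Eigen::VectorXf& z) {
        Eigen::ArrayXf e = (z.array() - z.maxCoeff()).exp();   // subtract max for numerical stability
        return (e / e.sum()).matrix();
    }

    struct PolicyHeads {
        Eigen::MatrixXf Wdense; Eigen::VectorXf bdense;   // recurrent output -> dense layer
        Eigen::MatrixXf Wdir;   Eigen::VectorXf bdir;     // dense -> 9 directional outputs
        Eigen::MatrixXf Wbtn;   Eigen::VectorXf bbtn;     // dense -> 16 button outputs

        // `h` is the recurrent layer's hidden state for the current frame.
        void forward(const Eigen::VectorXf& h,
                     Eigen::VectorXf& dirProbs, Eigen::VectorXf& btnProbs) const {
            Eigen::VectorXf d = (Wdense * h + bdense).array().max(0.0f).matrix();   // ReLU (assumed)
            dirProbs = softmax(Wdir * d + bdir);   // probability of each of the 9 directions
            btnProbs = softmax(Wbtn * d + bbtn);   // probability of each of the 16 button combos
        }
    };
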
Behavior of inference-time processing (diagram showing a screenshot and a box labeled "Learning Thread")
- Game state feeds into the learning thread each frame.
- The learning thread generates key codes and then feeds them into the game thread. (A sketch of this exchange follows below.)

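(Translator's addition: my own minimal sketch of how that per-frame exchange between the game thread and the learning thread could be structured. The mailbox design and names are invented; the talk only shows the diagram.)

    // The game thread drops in the latest game state each frame; the learning thread runs
    // inference on it and leaves the generated key code for the game thread to pick up.
    #include <Eigen/Dense>
    #include <mutex>
    #include <optional>

    struct InferenceMailbox {
        std::mutex m;
        std::optional<Eigen::VectorXf> latestState;   // written by the game thread each frame
        int latestKey = -1;                           // written by the learning thread (-1 = none yet)

        // Game thread: publish this frame's state, read back the most recent inferred key.
        int exchange(const Eigen::VectorXf& state) {
            std::lock_guard<std::mutex> lock(m);
            latestState = state;
            return latestKey;
        }

        // Learning thread: take the newest state (if any) and run inference on it.
        std::optional<Eigen::VectorXf> takeState() {
            std::lock_guard<std::mutex> lock(m);
            auto s = latestState;
            latestState.reset();
            return s;
        }

        // Learning thread: publish the key code that inference produced.
        void publishKey(int key) {
            std::lock_guard<std::mutex> lock(m);
            latestKey = key;
        }
    };
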
- Tricks for generating key code inputs
- The inferred results are the probability of each key being pressed.
- The results are output each frame.

- When a key wasn't learned correctly, the character would just swing in place no matter what key was pressed.
> Kind of like they were mashing, which was undesirable.
- So we made sure a key wasn't accepted when it didn't pass a certain probability threshold. (A sketch follows below.)

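(Translator's addition: a small sketch of the threshold trick; the 0.6 value is made up, the slides don't give a number.)

    // Each frame the net outputs a distribution over key choices. The most likely key is
    // only accepted when the net is confident enough; otherwise nothing is pressed, which
    // avoids the "mashing" behavior described above.
    #include <Eigen/Dense>

    // Returns the index of the key to press, or -1 for "press nothing this frame".
    int pick_key(const Eigen::VectorXf& probs, float threshold = 0.6f) {
        Eigen::Index best = 0;
        float p = probs.maxCoeff(&best);            // most probable key this frame
        return (p >= threshold) ? int(best) : -1;   // below the threshold, the key is not accepted
    }
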
- How the machine learning process was handled
- After each round ends, we process just that round's data.
- The battle history is sent to the machine learning thread for processing. (Note: replay data)
- Once that is complete, the neural network model is uploaded to the internet.

- Machine Learning Training Method
- We use training techniques to help it recognize the game screen (the game state).
- We accumulate a batch and then have the network learn from the combined data and results.
- Hide some of the input (hide some of the game state).
- We simply applied a low-pass filter over the game state. (A sketch of these last two tricks follows below.)

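(Translator's addition: how I read those last two tricks, sketched with made-up parameter values. Randomly hiding input features is a dropout-style augmentation; the low-pass filter is shown here as a simple exponential moving average, which may not be the exact filter they used.)

    #include <Eigen/Dense>
    #include <random>

    // Zero out each game-state feature with probability hideProb (training-time augmentation).
    Eigen::VectorXf hide_inputs(const Eigen::VectorXf& state, float hideProb, std::mt19937& rng) {
        std::bernoulli_distribution hide(hideProb);
        Eigen::VectorXf out = state;
        for (Eigen::Index i = 0; i < out.size(); ++i)
            if (hide(rng)) out[i] = 0.0f;   // this feature is hidden from the network for this sample
        return out;
    }

    // One-pole low-pass filter over the per-frame game state: filtered += alpha * (state - filtered).
    struct LowPassState {
        float alpha = 0.2f;         // smoothing factor (made-up value)
        Eigen::VectorXf filtered;   // running filtered state
        const Eigen::VectorXf& update(const Eigen::VectorXf& state) {
            if (filtered.size() == 0) filtered = state;    // initialize on the first frame
            else filtered += alpha * (state - filtered);
            return filtered;
        }
    };
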
- What we found difficult about machine learning

- Getting play data from humans is essential
- Log data from an environment that is as close as possible to an actual match.
> Human vs. Human
- We had 2 designers play about 50 rounds per set.
> Whenever we made changes to the neural network's input, we had to redo the matches.
- This isn't ideal because it takes time.

- Move input commands and properties undergo a lot of changes during development.
- When they change, the inputs the neural network produces no longer trigger those commands.
- Then we have to recapture the data from real people.

- Making small adjustments is not effective (Note: probably means that it's not easy to make direct, granular changes to behavior)
> "Make sure the CPU won't keep jumping in the corner when the other character is far away."
> "Want to make the CPU more reliably use this move. It's very important!"
> "I want it to guard more!"
- Trouble getting the neural network to fulfill these requests.
- Questions: "Can we make it work with training data?" "HOW do we make it work?"

- Things we were not able to do
- We wanted to generate output that resembled human inputs more.
- Limitations on the file size that can be shared over the network
- Limitations on the size of the neural network trained in real time
- We also wanted to be able to offload the training to a Google Edge TPU.
- Then training would occur in the cloud.

- Conclusions
- Implemented in the game a system that learns from player actions in real time using a neural network.
- Made it possible to play against a neural network model.
- With ingenuity and the right solutions, we were able to achieve this on current hardware.
- Furthermore, we hope to increase precision with access to larger models.
> Processing speed
> File size that can be shared online
> We have to consider the difficult parts of machine learning during the planning stage.