# Pokemon AI Model.

luckytyphlosion Jun 11th, 2019 (edited)
so I went and looked at that game theory (the field of study) textbook that had algorithms to solve payoff matrices
and I took the time to actually understand the estimation algorithm
turns out the estimation algorithm is very computationally feasible on a GBA
you could probably run it even on a Game Boy
so, now that actually solving a payoff matrix on a Game Boy is possible, this is how you would design a slightly more complex AI:
- firstly, we need to explain payoff matrices:
- this is an example of a payoff matrix: https://cdn.bulbagarden.net/upload/c/ca/GameTheory2.png
- player 1 is represented by the vertical strategies (rows), while player 2 is represented by the horizontal strategies (columns)
- (note: the vertical move user is Tyranitar (you), and the horizontal move user is Alakazam (opponent))
- each cell represents the payoff (in this case the effective base power of the move) that each player gets based on the strategies they both choose
- for example, if Tyranitar chooses Earthquake and Alakazam chooses HP Fire, then player 1 gets a base power of 100 and player 2 gets a base power of 35
- there are algorithms which "solve" payoff matrices by finding the Nash equilibrium, which is a state where neither player can do better by changing their own strategy while the other player keeps theirs fixed
- there are two types of strategy profiles: pure strategies (always pick this strategy) and mixed strategies (pick among several strategies, each with some probability of being picked)
- in addition, there is something called the "expected payoff", which is the payoff you would expect given both players' strategies
- if both players are playing a pure strategy, then the expected payoff is exactly the payoff you'd get
- if both players are playing a mixed strategy, then the expected payoff is the average payoff over a long (in the limit, infinite) number of games
- for this specific example, assuming that all you care about is effective base power, the Nash equilibrium would be for player 1 to always pick Pursuit and for player 2 to always pick Shock Wave
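the expected payoff of a mixed-strategy pair is just the probability-weighted average over all cells; here's a quick sketch (the matrix values are made up):

```python
# Expected payoff for the row player: weight each cell by the probability that
# that (row, column) pair actually gets played.
def expected_payoff(A, x, y):
    """A[i][j] = row player's payoff; x, y = mixed strategies (probabilities)."""
    return sum(x[i] * y[j] * A[i][j]
               for i in range(len(x)) for j in range(len(y)))

A = [[100, 40],
     [60, 80]]

# A pure strategy is just a degenerate mixed strategy:
print(expected_payoff(A, [1, 0], [1, 0]))            # 100
# A genuinely mixed pair averages over all four cells:
print(expected_payoff(A, [0.5, 0.5], [0.5, 0.5]))    # 70.0
```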
- however, effective base power is not the only measure of a good choice in Pokemon
- if we look at Pokemon hypothetically, the only reward of a battle is winning (excluding side effects like levelling up)
- so if we look at a hypothetical turn before the battle ends (assuming that battles are finite), you either win or lose
- if we score a win as 1 and a loss as 0, the expected payoff is either exact (pure strategies) or an average (mixed strategies); either way, it represents the probability of winning
- now, with the probability of winning on this turn, we can go back to the turn before, where each cell gets its own probability
- thus, we can work our way back all the way to turn 1 to get the probability of winning
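the one-turn back-up step can be sketched like this; as a simplification I use the row player's maximin value instead of a full mixed-strategy solve, and all the probabilities are made up:

```python
# Backing up win probabilities one turn: each cell holds the estimated
# probability of winning the game that results from that pair of moves.
# Simplification (a real solver would compute the mixed-strategy game value):
# take the maximin, i.e. the best win probability we can guarantee against a
# worst-case opponent.
def maximin_value(win_prob):
    """win_prob[i][j] = P(win) if we pick move i and the opponent picks j."""
    return max(min(row) for row in win_prob)

# Made-up win probabilities for one hypothetical turn:
turn_matrix = [
    [0.7, 0.4],
    [0.5, 0.6],
]
# This value becomes one cell of the previous turn's matrix.
print(maximin_value(turn_matrix))  # 0.5
```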
- note that this is somewhat of an oversimplification, but I think the general concept is correct
- of course, you wouldn't be able to search deep enough to reach all possible endgame states, as you'd run out of memory
- but this idea tells us that, hypothetically, the payoffs are probabilities of winning
- of course, the issue is that estimating the probabilities of winning from turn 1 is really hard, and the probabilities might hover around 50% for each action
- and even if there were a way to get very accurate probability estimates, just playing the equilibrium probabilities isn't the only key to a good AI: exploitation is what a good AI does
- for example, there are rock paper scissors AIs which get a higher win % than would be expected (33% win, 33% tie, 33% lose), because they exploit the fact that humans are terrible random number generators
- e.g. try playing 50 rounds of rock paper scissors (WITHOUT USING A RANDOM NUMBER GENERATOR): https://www.afiniti.com/corporate/rock-paper-scissors
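a minimal version of such an exploiter, assuming nothing fancier than counting the opponent's move frequencies (real ones model transitions and patterns, but even this beats humans who over-favor one move):

```python
import random
from collections import Counter

# Counter-exploiter sketch: play whatever beats the opponent's most common
# move so far; open randomly when there's no history yet.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def exploit(history):
    if not history:
        return random.choice(list(BEATS))
    most_common = Counter(history).most_common(1)[0][0]
    return BEATS[most_common]

# Against someone who throws rock too often:
print(exploit(["rock", "rock", "paper", "rock"]))  # paper
```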
- also, human players are not necessarily rational, are not able to evaluate a large number of options, and we don't necessarily know the most optimal estimation strategy
- therefore, an AI that exploits would recognize common patterns in how human players act and factor those into its calculations
- for example, you can assume that on turn 1, if the player has a status move (e.g. Toxic) then they are likely to use it. made-up probabilities could be 75% status move, 25% damaging move for the player
- an even better AI would profile the player, using their past actions as an indicator of what they would do
- one way to do this would be to create preset profiles of how most Pokemon players play games, and try to guess which profile the player matches the most
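one hypothetical way to do that profile matching (the profile names and all the probabilities here are invented): score each preset profile by how likely it makes the player's observed actions, and assume the best-scoring one.

```python
import math

# Preset profiles: each maps action classes to the probability that a player
# of that type picks them on a given turn. Purely illustrative numbers.
PROFILES = {
    "status_first": {"status": 0.75, "attack": 0.20, "switch": 0.05},
    "hyper_aggro":  {"status": 0.05, "attack": 0.85, "switch": 0.10},
}

def best_profile(observed):
    # Log-likelihood of the observed actions under each profile; logs avoid
    # underflow when the history gets long.
    def log_likelihood(profile):
        return sum(math.log(profile[a]) for a in observed)
    return max(PROFILES, key=lambda name: log_likelihood(PROFILES[name]))

print(best_profile(["status", "status", "attack"]))  # status_first
```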
- note that I don't have an algorithm to determine the best probabilities for picking a move given that we know how the player will choose theirs, but I am guessing that it exists somewhere and is simple to implement

- so the tl;dr is:
- make a payoff matrix given every possible action of the player and AI, and estimate the probability of winning for each combination of strategies
- find the Nash equilibrium of the payoff matrix
- out of the possible strategies, choose each strategy with the probability the equilibrium assigns it

- even more simplified tl;dr:
- do something like the image, except the numbers are "probability of winning"
- the numbers don't necessarily need to be "probability of winning"; if you find something that works better, then use it

sample code for the algorithm (no context, so it probably won't help unless you already have a basic understanding of game theory): https://gist.github.com/luckytyphlosion/e7c520d34dd7db6fa904d02df44a8205

Textbook referenced: https://www.math.ucla.edu/~tom/Game_Theory/mat.pdf
Algorithm taken from section 4.7, "Approximating the Solution: Fictitious Play"
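for reference, here's my own rough sketch of fictitious play for a zero-sum matrix game (not the gist's code; I'm assuming the row player maximizes and the column player minimizes): each round, both players best-respond to the opponent's empirical mix so far, and the play frequencies approach an equilibrium. It only needs integer counters and comparisons, which is why it's so cheap to run on weak hardware.

```python
# Fictitious play on a zero-sum matrix A (row player maximizes A[i][j],
# column player minimizes). Returns the empirical play frequencies, which
# approximate an equilibrium mixed strategy for each player.
def fictitious_play(A, rounds=10000):
    m, n = len(A), len(A[0])
    row_counts = [0] * m  # how often each row strategy has been played
    col_counts = [0] * n
    row_play, col_play = 0, 0  # arbitrary opening moves
    for _ in range(rounds):
        row_counts[row_play] += 1
        col_counts[col_play] += 1
        # Row best-responds to the column player's history so far...
        row_play = max(range(m),
                       key=lambda i: sum(A[i][j] * col_counts[j] for j in range(n)))
        # ...and column best-responds (minimizing) to the row player's history.
        col_play = min(range(n),
                       key=lambda j: sum(A[i][j] * row_counts[i] for i in range(m)))
    total = float(rounds)
    return [c / total for c in row_counts], [c / total for c in col_counts]

# Matching pennies: the equilibrium is 50/50 for both players.
x, y = fictitious_play([[1, -1], [-1, 1]])
print([round(p, 2) for p in x])  # roughly [0.5, 0.5]
```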