import numpy as np


class Player:
    # ....
    def update_strategy(self):
        """
        Set the preference (strategy) for choosing an action proportional
        to the positive regrets, e.g. a strategy that prefers PAPER could
        be [0.2, 0.6, 0.2].
        """
        self.strategy = np.copy(self.regret_sum)
        self.strategy[self.strategy < 0] = 0  # reset negative regrets to zero

        summation = sum(self.strategy)
        if summation > 0:
            # normalise so the clipped regrets form a probability distribution
            self.strategy /= summation
        else:
            # no positive regret yet: fall back to a uniform distribution
            # to reduce exploitability
            self.strategy = np.repeat(1 / RPS.n_actions, RPS.n_actions)

        self.strategy_sum += self.strategy

    def learn_avg_strategy(self):
        # the averaged strategy (not the current one) converges to a Nash equilibrium
        summation = sum(self.strategy_sum)
        if summation > 0:
            self.avg_strategy = self.strategy_sum / summation
        else:
            self.avg_strategy = np.repeat(1 / RPS.n_actions, RPS.n_actions)
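
For context, here is a minimal self-play sketch that wires these two methods into a full regret-matching loop. Everything outside update_strategy and learn_avg_strategy is an assumption filled in for illustration: the RPS payoff table, the TrainablePlayer initialiser (standing in for the state the "# ...." above elides), and the update_regrets step are one plausible way to complete the paste, not the original author's code.

# Hypothetical driver code: RPS, TrainablePlayer and update_regrets are
# illustrative assumptions, not part of the original paste.

class RPS:
    n_actions = 3  # ROCK, PAPER, SCISSORS
    # utility[a, b] = payoff of playing a against b: +1 win, 0 draw, -1 loss
    utility = np.array([[ 0., -1.,  1.],
                        [ 1.,  0., -1.],
                        [-1.,  1.,  0.]])


class TrainablePlayer(Player):
    # assumed initial state; the "# ...." in the paste elides the constructor
    def __init__(self):
        self.regret_sum = np.zeros(RPS.n_actions)
        self.strategy_sum = np.zeros(RPS.n_actions)
        self.strategy = np.repeat(1 / RPS.n_actions, RPS.n_actions)
        self.avg_strategy = np.copy(self.strategy)


def update_regrets(player, my_action, opp_action):
    # regret of each action = its payoff minus the payoff actually received
    received = RPS.utility[my_action, opp_action]
    player.regret_sum += RPS.utility[:, opp_action] - received


rng = np.random.default_rng(0)
p1, p2 = TrainablePlayer(), TrainablePlayer()
for _ in range(100_000):
    p1.update_strategy()
    p2.update_strategy()
    a1 = rng.choice(RPS.n_actions, p=p1.strategy)
    a2 = rng.choice(RPS.n_actions, p=p2.strategy)
    update_regrets(p1, a1, a2)
    update_regrets(p2, a2, a1)
p1.learn_avg_strategy()
print(p1.avg_strategy)  # should approach [1/3, 1/3, 1/3]

Because the averaged strategy under regret matching converges to a Nash equilibrium in two-player zero-sum games, both players' avg_strategy should approach the uniform [1/3, 1/3, 1/3], the unique equilibrium of rock-paper-scissors.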