  1. [04:22] *** now talking in #math
  2. [04:22] *** topic is Karlo's "Intro to Game Theory" talk, now in progress
  3. [04:22] *** set by Karlo_ on Thu Jul 07 04:16:23 2011
  4. [04:22] *** channel #math mode is +tnRl 91
  5. [04:22] *** channel created at Sun Mar 23 15:09:27 1997
  6. [04:22] <imanoffs> why do you loose logic when you mean smthg alone
  7. [04:23] <imanoffs> r u sure that your freedom likes math?
  8. [04:23] <imanoffs> i cannot figure that
  9. [04:23] <@Karlo_> imanoffs: Stop it, please
  10. [04:23] *** FullBorg2B (~Nosh@ppp121-45-171-179.lns20.syd6.internode.on.net) joined
  11. [04:24] <@FatherPi> back
  12. [04:24] <@FatherPi> sorry
  13. [04:24] <@FatherPi> poker!
  14. [04:25] <platypine> craps
  15. [04:25] <@FatherPi> backgammon
  16. [04:25] <@FatherPi> chess, go
  17. [04:25] *** kmh quit (Quit: Verlassend)
  18. [04:25] <@Karlo_> Rock-Paper-Scissors
  19. [04:25] <@Karlo_> Nim
  20. [04:25] <platypine> tactix
  21. [04:25] <@Karlo_> OK, that's enough of a collection for now
  22. [04:26] <@Karlo_> Q: How could we classify these games into different types? Pick some attribute which would distinguish between two games, and which would allow us to partition the set in some way. There will be several possible ways to do this!
  23. [04:26] <platypine> board game, card game
  24. [04:26] *** Omega_Noshpt quit (Ping timeout)
  25. [04:26] <@kororaa> ones based on strategy vs. ones based on luck
  26. [04:26] <@Karlo_> Or "Random vs Deterministic", sure
  27. [04:26] <disbeLIEf> kororaa: and poker is both!
  28. [04:27] <Fatal-> material's involved, number of players, social or skill based
  29. [04:27] <Fatal-> err, no '
  30. [04:27] <@Karlo_> "Perfect information vs Incomplete information" is important, too -- incomplete information can add an effective random element.
  31. [04:27] <@Karlo_> OK, let's call that Solitaire vs Two-player vs Multi-player
  32. [04:28] <@Karlo_> And Physical vs Mental
  33. [04:28] <disbeLIEf> What about magnitude of complexity?
  34. [04:28] <@kororaa> you could add sub categories to multiplayer
  35. [04:28] <@kororaa> teams or not
  36. [04:28] <@Karlo_> True, but let's not go that far for now.
  37. [04:29] <@FatherPi> monetary component.
  38. [04:29] <disbeLIEf> or however you want to phrase that...as in simple (tic-tac-toe) vs complex (Go)...ok :P
  39. [04:29] <@Karlo_> Sequential vs Simultaneous is a useful distinction.
  40. [04:29] *** meingtsla (retrograde@facepalm.jpe.gs) joined
  41. [04:29] <@Karlo_> Easy vs Hard, OK, though that's subjective.
  42. [04:30] <@Karlo_> Symmetric vs Asymmetric (whether there's any distinction between players)
  43. [04:31] <@Karlo_> Impartial vs Partisan -- that's not so well known, but Nim is an example of an impartial game, which means that at any point, the set of moves available depends only on the board state, not whose turn it is. (The set of legal moves for one player is the same as for the other.)
  44. [04:31] <@Karlo_> Single-turn vs Multi-turn
  45. [04:31] <@Karlo_> Zero-sum vs Nonzero-sum (including competitive vs cooperative)
  46. [04:32] <@Karlo_> Finite strategies vs Infinite strategies
  47. [04:32] <@Karlo_> Bounded duration vs Unbounded duration
  48. [04:32] <@Karlo_> That's plenty
  49. [04:33] <@FatherPi> [someone once claimed that of all major sports, only baseball was unbounded duration]
  50. [04:33] <@kororaa> quidditch
  51. [04:33] <@Karlo_> In general, a game has some number of players, some decisions, some information flow, and eventually an outcome denoted by a numerical payoff.
  52. [04:33] <@Karlo_> For *this* talk, we'll be dealing with the two-player finite-strategy zero-sum game. As the name suggests, we're making the following assumptions:
  53. [04:33] <@Karlo_> [1] There are exactly two players, P1 and P2.
  54. [04:33] <@Karlo_> [2] Each player has only a finite number of options ("pure strategies").
  55. [04:34] <@Karlo_> [3] No value flows into or out of the game; if P1 wins some amount, then P2 loses that same amount.
  56. [04:34] <@Karlo_> Violating any of #1-3 would put us into an entirely different class of game. We'll also be assuming:
  57. [04:34] <@Karlo_> [4] There's no feedback, and only one move, both players choosing simultaneously.
  58. [04:34] <@Karlo_> (Theoretically, this one isn't a real restriction. Even though Chess, for example, is normally played out over multiple moves, the players could enumerate *all* of the situations that they could possibly find themselves in, and then let a referee determine what the actual outcome would be. The total number of strategies would be finite, though astronomical!)
  59. [04:35] <@Karlo_> Q: Check the list of games from earlier -- which ones satisfy the constraints here?
  60. [04:35] <@kororaa> chess
  61. [04:35] <disbeLIEf> Hmmmmm, almost, kororaa
  62. [04:35] <disbeLIEf> Both players don't choose simultaneously in chess; they alternate.
  63. [04:36] <@Karlo_> With the astronomical collapsing into a single play, it works.
  64. [04:36] *** nsh (~nsh@02dbd7da.bb.sky.com) joined
  65. [04:36] <Fatal-> rock paper scissors
  66. [04:36] <@FatherPi> heads up poker, if no table rake...Q: even with rake, can we categorize that as zero-sum (with percentage rake)?
  67. [04:37] *** ^LoNeR^ (~omasaveu@pool-184-19-137-121.clrkwv.dsl.ncnetwork.net) joined
  68. [04:37] *** X sets channel #math mode +o ^LoNeR^
  69. [04:38] *** Tietze (~omasaveu@pool-184-19-137-121.clrkwv.dsl.ncnetwork.net) joined
  70. [04:38] <@Karlo_> You could model the table rake as a fee that all players pay in advance, perhaps, and then *after* the fee, the remainder of the game is zero-sum.
  71. [04:38] *** Tietze quit (Quit )
  72. [04:38] <@FatherPi> smart.
  73. [04:39] <@Karlo_> A single round of Rock-Paper-Scissors is definitely within the scope.
  74. [04:40] <@Karlo_> Q: Let's suppose that P1 has m pure strategies, and P2 has n. How many possible outcomes can the game have?
  75. [04:40] <@FatherPi> mn?
  76. [04:40] <@Karlo_> Yep.
  77. [04:40] <@Karlo_> There are m*n outcomes, each of which can be represented by a number indicating how much P1 gains (and P2 loses). If it's negative, then P1 is losing it and P2 is gaining it.
  78. [04:41] <@Karlo_> So, the complete rules for any game of this class can be summarized by an m by n grid of numbers, like a Bingo card. The mathematical object is a matrix, but we won't be doing any matrix operations today.
  79. [04:41] <@Karlo_> Here's an example where P1 has three strategies, A B C, and P2 has four, W X Y Z:
  80. [04:41] <@Karlo_>        W   X   Y   Z
  81. [04:41] <@Karlo_>     +----------------
  82. [04:41] <disbeLIEf> rock-paper-scissors has exactly 9 outcomes with 3 possible results (P1 wins, P2 wins, or draw.)
  83. [04:41] <@Karlo_>   A |  -4  -3   2   6
  84. [04:41] <@Karlo_>   B |   4  -2   1   2
  85. [04:41] <@Karlo_>   C |   6  -5   0  -7
  86. [04:41] <disbeLIEf> all of equal probability :P
  87. [04:42] <@Karlo_> (I'll repeat that to make sure it came through cleanly)
  88. [04:42] <@Karlo_>        W   X   Y   Z
  89. [04:42] <@Karlo_>     +----------------
  90. [04:42] <@Karlo_>   A |  -4  -3   2   6
  91. [04:42] <@Karlo_>   B |   4  -2   1   2
  92. [04:42] <@Karlo_>   C |   6  -5   0  -7
  93. [04:42] <@Karlo_> To play this, P1 will pick one of the three rows, and simultaneously P2 will pick one of three columns, and then the number in that row and column will be the payoff (from P2 to P1).
  94. [04:42] <@Karlo_> Q: Which of those 12 outcomes would each player most like to occur?
  95. [04:43] <@FatherPi> +6
  96. [04:43] <disbeLIEf> so positive numbers are benefit to P1 and negative to P2?
  97. [04:43] <@Karlo_> Correct
  98. [04:43] <disbeLIEf> hmmmmmmm
  99. [04:44] <@FatherPi> oh, FROM p2 to p1
  100. [04:44] <@FatherPi> k
  101. [04:44] <disbeLIEf> BTW did you mean P2 picks one of four columns?
  102. [04:44] <@Karlo_> Oops, yes.
  103. [04:44] <@kororaa> nice catch
  104. [04:45] *** pedro3005 (~pedro@189.25.80.204) joined
  105. [04:45] <@Karlo_> P1's favorite spot is either CW or AZ, where he would win 6 units. P2's favorite spot is CZ, where P1 would lose 7 units, and hence P2 would win that amount.
  106. [04:45] <@Karlo_> To reiterate, P1 is trying to *maximize* the payoff, while P2 is trying to *minimize* it.
  107. [04:46] <@Karlo_> If we were to actually play this game...
  108. [04:46] <disbeLIEf> if you add the values in the rows, the best row for P1 is B, and if you add the values in the columns, the best column for P2 is X. Of course, P1 picks B and P2 picks X, P2 wins :P
  109. [04:46] <@Karlo_> True.
  110. [04:47] <@Karlo_> Adding the values turns out to not be so useful a metric, though.
  111. [04:47] <disbeLIEf> (naturally; it presumes the opponent's response is random.)
  112. [04:47] <@Karlo_> Exactly.
  113. [04:47] <@Karlo_> It still might be a good first approximation!
  114. [04:48] <@Karlo_> But for now, let's try a different idea.
  115. [04:48] <@Karlo_> P1 thinks, "What's the *worst* that can happen? If I choose A, P2 might have chosen W, and I'll lose 4. Or if I choose B, then P2 might have chosen X, and I'll lose 2. Or if I choose C, then P2 might have chosen Z, and I'll lose 7."
  116. [04:48] <@Karlo_> P1 thinks, "If I assume the worst case behavior happens, then the payoff will be one of -4, -2, or -7. Of those three choices, I prefer the -2, so I'd better go with strategy B."
  117. [04:48] <@Karlo_> Meanwhile, P2 has similar thoughts. "If I pick W, then P1 might have chosen C for a payoff of 6; if I pick X, he might have chosen B for a payoff of -2; if I choose Y, he might have chosen A for a payoff of 2; if I choose Z, he might have chosen A for a payoff of 6. So if the worst case behavior happens, then the payoff will be one of 6, -2, 2, 6. Of those four choices, I prefer the -2, so I'd better go with strategy X."
  118. [04:48] <@Karlo_> P1's worst case scenario and P2's worst case scenario are both BX, with payoff of -2. This is a "saddle point" or "minimax" -- it's the smallest entry in row B, and the largest entry in column X.
  119. [04:49] <@Karlo_> This pair of strategies is stable: even if the other player *knows* what you're going to pick, he has no incentive to change his strategy. The players are going to stick with B and X, and the payoff will be -2.
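
(Aside, not part of the original talk: the "assume the worst" reasoning above is easy to mechanize. A minimal Python sketch, using the A/B/C vs W/X/Y/Z example, finds P1's maximin row and P2's minimax column and confirms they meet at the saddle point BX with payoff -2.)

    # Illustrative sketch only; the matrix is the example from the talk.
    M = [[-4, -3, 2, 6],    # row A
         [ 4, -2, 1, 2],    # row B
         [ 6, -5, 0, -7]]   # row C

    row_worst = [min(row) for row in M]                        # P1's worst case for each row
    col_worst = [max(row[j] for row in M) for j in range(4)]   # P2's worst case for each column
    i = row_worst.index(max(row_worst))    # P1 takes the row with the best worst case: B
    j = col_worst.index(min(col_worst))    # P2 takes the column with the best worst case: X
    print(i, j, M[i][j])                                       # -> 1 1 -2
    print(M[i][j] == min(M[i]) == max(row[j] for row in M))    # -> True: BX is a saddle point
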
  120. [04:49] <@Karlo_> Q: Why do you suppose P1 would play this game at all, if the payoff is negative?
  121. [04:50] <@kororaa> for fun
  122. [04:50] <@kororaa> or because P1 didn't do the math
  123. [04:50] <@kororaa> oh well i guess he did
  124. [04:50] <@kororaa> [¬є-°]¬
  125. [04:51] <@Karlo_> When a game results in a nonzero payoff, one way to interpret that is that the value is the amount that P1 should be willing to pay, or the amount that P2 would accept in payment, to play the game. In this case with a negative value, P1 might agree to play the game if P2 agrees to pay him 2. Another possibility is that after the game is finished, they're going to switch roles and play the same game again.
  126. [04:51] <@Karlo_> But usually we just take it as given that the game *is* going to happen as described, and choosing not to play is not an option, _War Games_ notwithstanding.
  127. [04:51] <@Karlo_> If we were to add 10 to each element of the matrix, then all payoffs would be positive, and then it's P2 who might be reluctant to play it without being promised some compensation. But it's not going to affect the strategy -- P1 will still like B best, and P2 will still like X best.
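
(Aside, not from the talk: the same small computation shows that adding 10 to every entry shifts the payoff at the saddle point from -2 to 8 but leaves the chosen row and column alone.)

    # Illustrative sketch only: shift every payoff by +10 and redo the worst-case picks.
    M = [[-4, -3, 2, 6], [4, -2, 1, 2], [6, -5, 0, -7]]
    S = [[x + 10 for x in row] for row in M]
    best_row = max(range(3), key=lambda i: min(S[i]))                   # still row B
    best_col = min(range(4), key=lambda j: max(row[j] for row in S))    # still column X
    print(best_row, best_col, S[best_row][best_col])                    # -> 1 1 8  (that is, -2 + 10)
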
  128. [04:52] <@Karlo_> Does that make sense?
  129. [04:52] <@FatherPi> yep
  130. [04:52] * disbeLIEf nods
  131. [04:52] <Fatal-> why would they both play conservatively
  132. [04:52] <@kororaa> yes
  133. [04:53] <@Karlo_> In this case, the conservative play simply led us quickly to the saddle point. Once they get there, there's nothing better that either of them can do.
  134. [04:53] <@Karlo_> Now let's consider a different game, with a smaller matrix but different behavior.
  135. [04:53] <@Karlo_> You're relaxing on a cruise ship, and a well-dressed stranger engages you in a game of "Match". The two of you will simultaneously throw one of two hand signals, "Up" or "Down". If they match, then he'll pay you $5. If they're opposite, then you'll pay him $5. So the matrix looks like this:
  136. [04:54] <@Karlo_>       U   D
  137. [04:54] <@Karlo_>    +--------
  138. [04:54] <@Karlo_>  U |   5  -5
  139. [04:54] <@Karlo_>  D |  -5   5
  140. [04:54] <disbeLIEf> Regarding the last game... P1 is clearly screwed in this game. I thought of another strategy: adding the best and worst-case scenarios together, and it still ended up with P1 losing (P1 picks row A because 6-4=2, and P2 picks column X because -2-5=-7). The problem is, P2 -always- wins in column X.
  141. [04:54] *** PlusTAX changed nick to Checksum
  142. [04:54] * disbeLIEf listens
  143. [04:55] <@Karlo_> It's true that P2 always wins in that column -- but the important point is that the payoff can be *limited* to a mere -2.
  144. [04:56] <@Karlo_> If we were to add 10 to each entry, as I suggested before, then the game would favor P1. But we'd still be interested in *how much* P1 would gain -- and in that case, with all the entries being positive, P1 comes out ahead no matter which row he picks.
  145. [04:56] <disbeLIEf> rightright. on to #2...which looks to be more balanced than flipping a coin.
  146. [04:56] <@Karlo_> But P1 still wants to maximize the payoff, and P2 wants to minimize it. Don't think of zero as being a significant dividing line, unless you already know that the game has a value of zero (aka "fair").
  147. [04:57] <@Karlo_> So, yes, on to the matching game.
  148. [04:57] <@Karlo_> This time, if you adopt the pessimistic train of thought, you'll find that your worst case payoff is -5; but unlike the previous example, it doesn't represent a saddle point.
  149. [04:57] <@Karlo_> In this game, if you prefer U, then your opponent will prefer D, which will make you prefer D, which will make your opponent prefer U, which will make you prefer U...
  150. [04:58] <@Karlo_> You now have a dilemma. Surely there's a logical strategy for you to apply. But if it's logical, then your opponent could figure it out as well. This suggests that to thwart your strategy being discovered, you should do something illogical. But that would defy logic!
  151. [04:58] <@Karlo_> The answer, as you probably know, is to pick your strategy *randomly* -- hence illogically and unpredictably -- but to choose the randomness in a logical way!
  152. [04:59] <@Karlo_> We can introduce a third strategy: "Flip a coin, unseen, to decide between U and D". This is called a "mixed strategy"; it incorporates elements from multiple pure strategies. If we explicitly add this to the matrix, we get:
  153. [04:59] <@Karlo_>       U   R   D
  154. [04:59] <@Karlo_>    +------------
  155. [04:59] <@Karlo_>  U |   5   0  -5
  156. [04:59] <@Karlo_>  R |   0   0   0
  157. [04:59] <@Karlo_>  D |  -5   0   5
  158. [04:59] <@Karlo_> and now the RR outcome is a saddle point.
  159. [04:59] <@FatherPi> Q: i thought one of your conditions was NO feedback...isn't observation of a prior move feedback?
  160. [05:00] <@Karlo_> This isn't feedback -- it's second-guessing.
  161. [05:00] <@FatherPi> gotcha
  162. [05:00] <@Karlo_> Note that the new matrix entries represent *expected values*. The game never has an actual payoff of 0, but with either player using the R strategy, it will *average* 0 in the long run. Again, this is a standard assumption, and it makes sense if you're going to be repeating the same game a large number of times.
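
(Aside, not part of the transcript: the expected-value claim is easy to check by simulation. The sketch below plays the $5 "Match" game many times with the coin-flip strategy R against each of the opponent's pure choices; every individual round pays +5 or -5, but both long-run averages land near 0.)

    # Illustrative sketch only, for the $5 Match game above.
    import random

    def payoff(p1, p2):
        return 5 if p1 == p2 else -5           # a match pays P1, a mismatch pays P2

    def R():
        return random.choice("UD")             # the mixed strategy: flip a coin

    for opp in "UD":
        rounds = [payoff(R(), opp) for _ in range(100_000)]
        print(opp, sum(rounds) / len(rounds))  # both averages come out near 0
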
  163. [05:01] <@Karlo_> After a few friendly rounds, the stranger offers to make the game more interesting. In case of a mismatch, you pay him $5, as before; but now a match on "Up" is worth $10, while a match on "Down" is worth only $1. Thus, the matrix looks like this:
  164. [05:01] <@Karlo_>       U   D
  165. [05:01] <@Karlo_>    +--------
  166. [05:01] <@Karlo_>  U |  10  -5
  167. [05:01] <@Karlo_>  D |  -5   1
  168. [05:02] <@Karlo_> Perhaps you think that he's mistakenly given you an advantage, since 10+1 > 5+5. Let's take a closer look.
  169. [05:02] <@FatherPi> hehe
  170. [05:03] <@Karlo_> Worst case scenario in the original matrix is still -5, as before. What happens when we add the "R" strategy based on a coin flip? We'll call that "1/2 U + 1/2 D" this time, for more precision:
  171. [05:03] <@Karlo_>                      U      D
  172. [05:03] <@Karlo_>                +--------------
  173. [05:03] <@Karlo_>              U |   10     -5
  174. [05:03] <@Karlo_>  1/2 U + 1/2 D |    2.5   -2
  175. [05:03] <@Karlo_>              D |   -5      1
  176. [05:03] <@Karlo_> OK, now the worst case scenario is down to -2, so this is an improvement. It looks like we can do even better by having less U and more D...
  177. [05:03] <@Karlo_>                      U        D
  178. [05:03] <@Karlo_>                +----------------
  179. [05:03] <@Karlo_>              U |   10       -5
  180. [05:03] <@Karlo_>  1/2 U + 1/2 D |    2.5     -2
  181. [05:03] <@Karlo_>  1/4 U + 3/4 D |   -1.25    -0.5
  182. [05:03] <@Karlo_>              D |   -5        1
  183. [05:03] <@Karlo_> Now the worst case scenario has improved to -1.25, but this time it's the U column that bites, so we seem to have gone too far in that direction.
  184. [05:03] <@Karlo_> We could try fiddling with a huge number of possible mixed values; it turns out that the best we can do is a mix "2/7 U + 5/7 D", and the payoff will be -5/7. Hey! This game favors the stranger, after all!
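
(Aside, not part of the talk: "fiddling with a huge number of possible mixed values" can literally be done with a short loop. This illustrative sketch scans mixes "p U + (1-p) D" in steps of 0.001 and keeps the one whose worst-case expected payoff is largest; it lands near p = 2/7 with a worst case near -5/7.)

    # Illustrative brute-force sketch for the 10/-5/-5/1 game above.
    best_p, best_worst = None, float("-inf")
    for k in range(1001):
        p = k / 1000
        vs_U = 10 * p - 5 * (1 - p)    # opponent throws U
        vs_D = -5 * p + 1 * (1 - p)    # opponent throws D
        worst = min(vs_U, vs_D)
        if worst > best_worst:
            best_p, best_worst = p, worst
    print(best_p, best_worst)   # -> about 0.286 and about -0.71 (the exact optimum is 2/7 and -5/7)
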
  185. [05:04] *** Karlo_ changed nick to SockPuppet_
  186. [05:04] <@SockPuppet_> But Karlo, it sounds like you had to check lots and lots of fractions to find the best mix. Isn't there a better way to solve that game??
  187. [05:04] *** SockPuppet_ changed nick to Karlo_
  188. [05:04] <@Karlo_> That's a very good question! Let's apply some algebra, instead of a brute force search.
  189. [05:05] <@Karlo_> The mixed strategy is of the form "p U + q D", where p and q are complementary probabilities -- that is, p + q = 1. The payoffs for that row will be p times the U row, plus q times the D row:
  190. [05:05] <@FatherPi> [sockpuppet...i luv it!!]
  191. [05:05] <@Karlo_>                  U         D
  192. [05:05] <@Karlo_>            +------------------
  193. [05:05] <@Karlo_>          U |   10        -5
  194. [05:05] <@Karlo_>  p U + q D |   10p-5q   -5p+1q
  195. [05:05] <@Karlo_>          D |   -5         1
  196. [05:05] <@Karlo_> If our opponent correctly guesses what (p,q) mix we're using, then what would he do about it? If he chooses U, the payoff will be 10p-5q; if he chooses D, it will be -5p+1q. He's going to choose whichever is smaller.
  197. [05:05] <@Karlo_> If p is too high, then -5p+1q will be small and he'll pick that; if p is too low, then 10p-5q will be small and he'll pick that. The minimax occurs when the two are equal: 10p-5q = -5p+1q.
  198. [05:06] <@Karlo_> Some elementary algebraic manipulation turns that into 15p = 6q, or 5p = 2q; and since we also have p + q = 1, that gives us p = 2/7, q = 5/7.
  199. [05:06] <@Karlo_> To summarize, what's happening here is that we're finding a mixed strategy that has the property that once we've chosen it, it *doesn't matter* what the other player does -- all of his strategies produce the same value.
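
(Aside, not from the talk: the equalizing condition can be checked with exact rational arithmetic. From 10p - 5q = -5p + q with q = 1 - p we get 21p = 6, so p = 2/7, and either column then yields -5/7.)

    # Illustrative sketch only; exact arithmetic with the standard library.
    from fractions import Fraction

    p = Fraction(2, 7)                        # from 21p = 6
    q = 1 - p                                 # = 5/7
    assert 10 * p - 5 * q == -5 * p + 1 * q == Fraction(-5, 7)
    print(p, q, 10 * p - 5 * q)               # -> 2/7 5/7 -5/7
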
  200. [05:06] *** Karlo_ changed nick to SockPuppet_
  201. [05:06] <@SockPuppet_> But, we're losing 5/7 each time we play the game! Maybe there's a different strategy that does better?
  202. [05:06] *** SockPuppet_ changed nick to Karlo_
  203. [05:06] <@Karlo_> We've found a mixed strategy for P1 that *guarantees* a payoff of -5/7 *no matter what* P2 does. This means that the "value of the game" -- the payoff when both players do the best possible -- is *at least* -5/7.
  204. [05:07] <@Karlo_> If we do the same analysis on the columns instead of the rows, we would find that P2 has a mixed strategy that *also* guarantees a payoff of -5/7, no matter what P1 does. This means that the value of the game is *at most* -5/7.
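
(Aside, not in the original: running the same equalizing algebra on the columns gives P2 the same 2/7 U + 5/7 D mix, and against either of P1's pure strategies it holds the expected payoff to exactly -5/7 -- the "at most -5/7" half of the claim.)

    # Illustrative check of P2's side of the analysis.
    from fractions import Fraction

    r, s = Fraction(2, 7), Fraction(5, 7)     # P2 plays U with prob r, D with prob s
    print(10 * r - 5 * s)                     # P1 plays U: -5/7
    print(-5 * r + 1 * s)                     # P1 plays D: -5/7
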
  205. [05:07] *** Karlo_ changed nick to SockPuppet_
  206. [05:07] <@SockPuppet_> Oh, so the value must be exactly -5/7, then! Does it always come out the same for both sides, like that?
  207. [05:07] *** SockPuppet_ changed nick to Karlo_
  208. [05:08] <@Karlo_> Yes! This is John von Neumann's "Minimax Theorem" -- for games of the class we're considering here, there is always some value V such that P1 can guarantee at least V, and P2 can guarantee at most V.
  209. [05:08] <@kororaa> i see
  210. [05:10] <@FatherPi> But what if I think the other player *isn't* playing that strategy? I know someone that I can almost always beat in Rock-Paper-Scissors.
  211. [05:11] <@Karlo_> Yes, it's true that if your opponent is *known* to be playing non-optimally, then it's possible to exploit that by using a non-optimal strategy yourself. But in doing so, you're also introducing the possibility for him to exploit *your* non-optimality. This might be more suited to psychology than to math.
  212. [05:11] <@Karlo_> You would be second-guessing his moves in order to make your own move, and then he might start predicting what you think he thinks, and if he adjusts his strategy to match, then you would have to predict what he thinks you think he thinks... Ultimately, it could only become stable when you each play your "optimal strategy", the one that guarantees the payoff V.
  213. [05:12] *** joo (~joo@188-220-73-95.zone11.bethere.co.uk) joined
  214. [05:12] <@Karlo_> Let's take a look at Rock-Paper-Scissors. The rules are simple: Each player throws one of the three hand signals; if they match, the game is a draw, otherwise, Paper covers Rock, Rock blunts Scissors, Scissors cuts Paper. It's just a simple win/lose/draw outcome, so we'll use +1/-1/0 for the payoffs. Here's the matrix:
  215. [05:12] <@Karlo_>       R   P   S
  216. [05:12] <@Karlo_>    +------------
  217. [05:12] <@Karlo_>  R |   0  -1   1
  218. [05:12] <@Karlo_>  P |   1   0  -1
  219. [05:12] <@Karlo_>  S |  -1   1   0
  220. [05:13] <@Karlo_> I'm sure it's no surprise to anybody that the value of the game is zero. The strategy that guarantees this outcome is (1/3 R + 1/3 P + 1/3 S). As noted above, you can do better *if* you're certain that your opponent has weaknesses to exploit; but this is the unique strategy that guarantees that *you* have no exploitable weakness.
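
(Closing aside, not part of the talk: for games too large to solve by hand, the minimax value and an optimal mix can be computed with linear programming. This sketch assumes NumPy and SciPy are available, and "solve_zero_sum" is my own name, not a library function. It recovers the uniform 1/3-1/3-1/3 mix with value 0 for Rock-Paper-Scissors, and the 2/7 U + 5/7 D mix with value -5/7 for the stranger's modified Match game.)

    # Illustrative LP-based solver for two-player zero-sum matrix games.
    import numpy as np
    from scipy.optimize import linprog

    def solve_zero_sum(A):
        """Value and an optimal mixed strategy for the row player (the maximizer)."""
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        # Variables: p_1..p_m (row probabilities) and v (the guaranteed payoff).
        c = np.zeros(m + 1)
        c[-1] = -1.0                                   # maximize v  <=>  minimize -v
        A_ub = np.hstack([-A.T, np.ones((n, 1))])      # v <= sum_i p_i * A[i][j] for every column j
        b_ub = np.zeros(n)
        A_eq = np.array([[1.0] * m + [0.0]])           # probabilities sum to 1
        b_eq = np.array([1.0])
        bounds = [(0, None)] * m + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        return res.x[-1], res.x[:m]

    print(solve_zero_sum([[0, -1, 1], [1, 0, -1], [-1, 1, 0]]))   # ~0.0 and ~[1/3, 1/3, 1/3]
    print(solve_zero_sum([[10, -5], [-5, 1]]))                    # ~-0.714 (= -5/7) and ~[2/7, 5/7]
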