
Drescher's bounded Newcomb and bargaining with superintelligences

a guest
Jun 22nd, 2010
Wei Dai wrote:
> It seems like we really need a theory of games that tells us (human beings) how to play games with superintelligences.

== Drescher's bounded Newcomb variant ==

Drescher's bounded Newcomb variant involves a relatively simple
predictor, who can only use high-level theories of the agent (player),
but can in no way simulate it step-by-step, and a sufficiently
computationally powerful agent who can simulate the predictor even
step-by-step. Extracting this thought experiment from the original
discussion of UDT, the interesting question is: how can the agent win?
What should its algorithm be (what should the agent do) for that to
happen?

The problem is that the predictor knows that the agent can infer its
prediction, and so the agent can apparently two-box in either case
(the set-up resurrects CDT-style reasoning). As a result, the
predictor supposedly predicts two-boxing, and so the agent can't win
the one-boxing prize. A positive resolution of this problem would be
an explanation of how the agent can nonetheless win, and of what's
required for that to happen.

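To make the losing dynamic concrete, here is a minimal sketch. The payoff numbers and the named policies are illustrative assumptions, not part of the original setup; the predictor knows the agent's policy only as a high-level theory and looks for a prediction consistent with it, while the agent, able to simulate the predictor, may condition its action on the inferred prediction:

```python
ONE_BOX, TWO_BOX = "one-box", "two-box"

def payoff(action, prediction):
    """Standard Newcomb payoffs: the opaque box holds $1M iff
    one-boxing was predicted; the transparent box holds $1K."""
    big = 1_000_000 if prediction == ONE_BOX else 0
    small = 1_000 if action == TWO_BOX else 0
    return big + small

def predict(policy):
    """The predictor can't simulate the agent step-by-step, but it can
    check its theory of the policy for consistency: a prediction p is
    consistent if an agent that infers p actually plays p.  With no
    consistent option, it falls back to predicting two-boxing."""
    for p in (ONE_BOX, TWO_BOX):
        if policy(p) == p:
            return p
    return TWO_BOX

# Policies map the (inferred) prediction to an action.
policies = {
    "one-box regardless": lambda p: ONE_BOX,
    "two-box regardless": lambda p: TWO_BOX,  # the CDT-style reasoning
    "do the opposite":    lambda p: ONE_BOX if p == TWO_BOX else TWO_BOX,
}

for name, policy in policies.items():
    p = predict(policy)
    a = policy(p)
    print(f"{name}: predicted {p}, plays {a}, payoff ${payoff(a, p):,}")
```

Under this toy model, only the agent that one-boxes unconditionally, ignoring what its simulation of the predictor tells it, wins the $1,000,000; "two-box in either case" is exactly what the predictor's theory predicts, and it earns only $1,000.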
== A human trading with UFAI ==

Let's consider a similar scenario: the "predictor" is a human
(programmer) who has developed a UFAI (an AGI with human-indifferent
preference). The programmer can either run the UFAI or not; this
corresponds to the effect of the predictor's prediction in the Newcomb
variant. The UFAI can either give the programmer's preference a
sizable chunk of control over the future, or not; this corresponds to
the agent's decision to one-box (cooperate) or two-box (defect). If
the programmer predicts that the UFAI will cooperate, he runs it;
otherwise he doesn't. Naturally, the UFAI is smart enough to analyze
the programmer through and through, so we have a good analogy with the
bounded Newcomb variant.

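The correspondence can be sketched as a toy game. The numeric payoffs below are assumptions chosen only to exhibit the structure described above (the future's total value normalized to 1.0, a half-half split standing in for "a sizable chunk of control"):

```python
RUN, DONT_RUN = "run", "don't run"          # the prediction acted on
COOPERATE, DEFECT = "cooperate", "defect"   # one-box / two-box analogues

def outcome(programmer_move, ufai_move):
    """(programmer's value, UFAI's value) for each joint move."""
    if programmer_move == DONT_RUN:
        return (0.0, 0.0)   # the UFAI never executes: status quo
    if ufai_move == COOPERATE:
        return (0.5, 0.5)   # a sizable chunk of control is shared
    return (0.0, 1.0)       # the UFAI runs and takes the whole future

def programmer(predicted_ufai_move):
    """The programmer runs the UFAI iff he predicts cooperation."""
    return RUN if predicted_ufai_move == COOPERATE else DONT_RUN

# Assuming the programmer's prediction is correct, the UFAI's own
# payoff rewards being a cooperator: 0.5 if it cooperates, 0.0 if not.
for ufai_move in (COOPERATE, DEFECT):
    print(ufai_move, "->", outcome(programmer(ufai_move), ufai_move))
```

The open question in the text is precisely the assumption hidden in that loop: whether the programmer can actually predict the UFAI's move correctly.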
Now, is it possible to set up this system so that the programmer can
predict what the UFAI will do, with the UFAI still choosing a course
of action to the programmer's satisfaction? That would be a recipe for
FAI, with the programmer's control over the UFAI's decision channeling
FAI's preference, and the UFAI's control over the future giving power
to that leverage. It's in the UFAI's interest to make this deal
possible, so is there a way to set it up?

== On bargaining power ==

An obvious problem is that whether the UFAI gives control to the
programmer's preference is a very tricky statement that is itself
FAI-complete, and could be substantially more difficult to formalize
than a simple decision to one-box or two-box. On the other hand, this
at least shows an interesting pattern for magnifying control using
only the ability to estimate the value of the outcome (which seems to
be the right intuition for the nature of bargaining power in
bargaining games in general). If you can predict what a UFAI (which is
under your power to terminate) will do, you can force it into doing
anything you want, as long as the UFAI receives at least a bit of its
own value. And it's in the UFAI's interest to be predictable.

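The "at least a bit of its own value" clause can be sketched as a take-it-or-leave-it demand. The normalization of the future's value to 1.0 and the epsilon threshold are illustrative assumptions:

```python
def ufai_accepts(demanded_share, epsilon=1e-6):
    """A predictable UFAI, which you can decline to run, compares what
    it keeps by accepting your demand against the 0.0 it gets if it is
    never run, so it accepts any demand that leaves it at least some
    epsilon of its own value."""
    keeps = 1.0 - demanded_share
    return keeps >= epsilon

# The predictor can push the demand arbitrarily close to everything:
print(ufai_accepts(0.9999))  # accepts: keeping 0.0001 beats never running
print(ufai_accepts(1.0))     # rejects: no bit of its own value remains
```

This is the sense in which the mere ability to estimate the outcome's value, combined with the power to terminate, magnifies control over the whole bargain.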
(See the discussion leading to the following comment for the original context:)
http://lesswrong.com/lw/2df/what_if_ai_doesnt_quite_go_foom/26nz?context=1