Wei Dai wrote:

> It seems like we really need a theory of games that tells us (human beings) how to play games with superintelligences.
== Drescher's bounded Newcomb variant ==
Drescher's bounded Newcomb variant involves a relatively simple predictor, which can only use high-level theories of the agent (the player) and can in no way simulate it step-by-step, and a sufficiently computationally powerful agent that can simulate the predictor, even step-by-step. Extracting this thought experiment from the original discussion of UDT, the interesting question is: how can the agent win? What should the agent's algorithm be (what should the agent do) for that to happen?
The problem is that the predictor knows that the agent can infer its prediction, and so the agent apparently can two-box in either case (the set-up resurrects CDT-style reasoning). As a result, the predictor supposedly predicts two-boxing, and so the agent can't win the one-boxing prize. A positive resolution of this problem would be an explanation of how the agent can nonetheless win, and of what's required for that to happen.
== A human trading with UFAI ==
Let's consider a similar scenario: the "predictor" is a human (a programmer) who has developed a UFAI (an AGI with human-indifferent preference). The programmer can either run the UFAI or not; this corresponds to the effect of the predictor's prediction in the Newcomb variant. The UFAI can either give the programmer's preference a sizable chunk of control over the future, or not; this corresponds to the agent's decision to one-box (cooperate) or two-box (defect). If the programmer predicts that the UFAI will cooperate, he runs it; otherwise he doesn't. Naturally, the UFAI is smart enough to analyze the programmer through and through, so we have a good analogy with the bounded Newcomb variant.

Now, is it possible to set up this system so that the programmer can predict what the UFAI will do, with the UFAI still choosing a course of action to the programmer's satisfaction? That would be a recipe for FAI, with the programmer's control over the UFAI's decision acting as a channel for FAI's preference, and the UFAI's control over the future giving power to that leverage. It's in the UFAI's interest to make this deal possible, so is there a way to fix it up?
== On bargaining power ==
An obvious problem is that whether the UFAI gives control to the programmer's preference is a very tricky statement that is itself FAI-complete, and could be substantially more difficult to formalize than a simple decision to one-box or two-box. On the other hand, this at least shows an interesting pattern for magnifying control using only the ability to estimate the value of the outcome (which seems to be the right intuition for the nature of bargaining power in bargaining games in general). If you can predict what a UFAI (under your power to terminate it) will do, you can force it into doing anything you want, as long as the UFAI receives at least a bit of its own value. And it's in the UFAI's interest to be predictable.
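The bargaining pattern above can be sketched as a toy payoff computation (the specific numbers and names are my own illustrative assumptions): the programmer runs the UFAI only if he can predict that it will cooperate, so a predictable, cooperating UFAI keeps some small fraction of its own value, while any other policy leaves it unrun with nothing.

```python
# Toy payoff model of the programmer/UFAI deal (illustrative values).
# A predictable UFAI that cedes most control still keeps EPSILON > 0 of
# its own value; an unpredictable or defecting UFAI is never run.

EPSILON = 0.01  # fraction of future value the UFAI keeps under the deal

def programmer_runs(ufai_policy: str, predictable: bool) -> bool:
    """The programmer runs the UFAI only if he can verify, by prediction,
    that it will cooperate."""
    return predictable and ufai_policy == "cooperate"

def ufai_value(ufai_policy: str, predictable: bool) -> float:
    """The UFAI's share of value over the future, by its policy."""
    if not programmer_runs(ufai_policy, predictable):
        return 0.0      # never run: no control over the future at all
    return EPSILON      # run and cooperating: keeps a sliver of value

# Being predictable and cooperating strictly dominates every alternative:
for policy in ("cooperate", "defect"):
    for predictable in (True, False):
        print(policy, predictable, ufai_value(policy, predictable))
```

In this sketch the UFAI's best outcome, 0.01, beats the 0.0 it gets from every other combination, which is the sense in which "it's in UFAI's interest to be predictable."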
(See the discussion leading to the following comment for the original context:)
http://lesswrong.com/lw/2df/what_if_ai_doesnt_quite_go_foom/26nz?context=1