Wei Dai wrote:

> It seems like we really need a theory of games that tells us (human beings) how to play games with superintelligences.
== Drescher's bounded Newcomb variant ==
Drescher's bounded Newcomb variant involves a relatively simple predictor, which can only use high-level theories of the agent (the player) and can in no way simulate it step-by-step, and a sufficiently computationally powerful agent that can simulate the predictor, even step-by-step. Extracting this thought experiment from the original discussion of UDT, the interesting question is: how can the agent win? What should the agent's algorithm be (what should the agent do) for that to happen?
The problem is that the predictor knows that the agent can infer its prediction, and so the agent apparently can two-box in either case (the set-up resurrects CDT-style reasoning). As a result, the predictor supposedly predicts two-boxing, and so the agent can't win the one-boxing prize. A positive resolution of this problem would be an explanation of how the agent can nonetheless win, and of what's required for that to happen.
== A human trading with UFAI ==
Let's consider a similar scenario: the "predictor" is a human (a programmer) who has developed a UFAI (an AGI with human-indifferent preference). The programmer can either run the UFAI or not; this corresponds to the effect of the predictor's prediction in the Newcomb variant. The UFAI can either give the programmer's preference a sizable chunk of control over the future, or not; this corresponds to the agent's decision to one-box (cooperate) or two-box (defect). If the programmer predicts that the UFAI will cooperate, he runs it; otherwise he doesn't. Naturally, the UFAI is smart enough to analyze the programmer through and through, so we have a good analogy with the bounded Newcomb variant.

Now, is it possible to set up this system so that the programmer can predict what the UFAI will do, with the UFAI still choosing a course of action to the programmer's satisfaction? That would be a recipe for FAI, with the programmer's control over the UFAI's decision acting as a channel for FAI's preference, and the UFAI's control over the future giving power to that leverage. It's in the UFAI's interest to make this deal possible, so is there a way to fix it up?
== On bargaining power ==
An obvious problem is that whether the UFAI gives control to the programmer's preference is a very tricky statement that is itself FAI-complete, and could be substantially more difficult to formalize than a simple decision to one-box or two-box. On the other hand, this at least shows an interesting pattern for magnifying control using only the ability to estimate the value of the outcome (which seems to be the right intuition for the nature of bargaining power in bargaining games in general). If you can predict what a UFAI (under your power to terminate it) will do, you can force it into doing anything you want, as long as the UFAI receives at least a bit of its own value. And it's in the UFAI's interest to be predictable.
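The bargaining pattern above can be sketched as a toy payoff computation (the specific numbers and names are my own illustrative assumptions): the programmer runs the UFAI only if he can predict that it will cooperate, so a predictable, cooperating UFAI keeps some small fraction of its own value, while any other policy leaves it unrun with nothing.

```python
# Toy payoff model of the programmer/UFAI deal (illustrative values).
# A predictable UFAI that cedes most control still keeps EPSILON > 0 of
# its own value; an unpredictable or defecting UFAI is never run.

EPSILON = 0.01  # fraction of future value the UFAI keeps under the deal

def programmer_runs(ufai_policy: str, predictable: bool) -> bool:
    """The programmer runs the UFAI only if he can verify, by prediction,
    that it will cooperate."""
    return predictable and ufai_policy == "cooperate"

def ufai_value(ufai_policy: str, predictable: bool) -> float:
    """The UFAI's share of value over the future, by its policy."""
    if not programmer_runs(ufai_policy, predictable):
        return 0.0      # never run: no control over the future at all
    return EPSILON      # run and cooperating: keeps a sliver of value

# Being predictable and cooperating strictly dominates every alternative:
for policy in ("cooperate", "defect"):
    for predictable in (True, False):
        print(policy, predictable, ufai_value(policy, predictable))
```

In this sketch the UFAI's best outcome, 0.01, beats the 0.0 it gets from every other combination, which is the sense in which "it's in UFAI's interest to be predictable."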
(See the discussion leading to the following comment for the original context:)
http://lesswrong.com/lw/2df/what_if_ai_doesnt_quite_go_foom/26nz?context=1