"(edited to make clear what this is all about) Hi! This is Eliezer Yudkowsky, original founder but no-longer-moderator of LessWrong.com and also by not-quite-coincidence the first AI In A Box Roleplayer Guy. I am also the author of "Harry Potter and the Methods of Rationality", a controversial fanfic which causes me to have a large, active Internet hatedom that does not abide by norms for reasoned discourse. You should be very careful about believing any statement supposedly attributed to me that you have not seen directly on an account or page I directly control. I was brought here by a debate in the comments about "Roko's Basilisk" mentioned in 1450's alt tag. Roko's Basilisk is a weird concept which a false Internet meme says is believed on LessWrong.com and used to solicit donations (this has never happened on LessWrong.com or anywhere else, ever). The meme that this is believed on LessWrong.com or used to solicit donations was spread by a man named David Gerard who made over 300 edits to the RationalWiki page on Roko's Basilisk, though the rest of RationalWiki does seem to have mostly gone along with it. The tl;dr on Roko's Basilisk is that a sufficiently powerful AI will punish you if you did not help create it, in order to give you an incentive to create it. RationalWiki basically invented Roko's Basilisk as a meme - not the original concept, but the meme that there's anyone out there who believes in Roko's Basilisk and goes around advocating that people should create AI to avoid punishment by it. So far as I know, literally nobody has ever advocated this, ever. Roko's original article basically said "And therefore you SHOULD NOT CREATE [particular type of AI that Yudkowsky described that has nothing to do with the Basilisk and would be particularly unlikely to create it even given other premises], look at what a DANGEROUS GUY Yudkowsky is for suggesting an AI that would torture people that didn't help create it" [it wouldn't]. In the hands of RationalWiki generally, and RationalWiki leader David Gerard particularly who also wrote a wiki article smearing effective altruists that must be read to be believed, this somehow metamorphosed into a Singularity cult that tried to get people to believe a Pascal's Wager argument to donate to their AI god on pain of torture. This cult that has literally never existed anywhere except in the imagination of David Gerard. I'm a bit worried that the alt text of XKCD 1450 indicates that Randall Munroe thinks that there actually are "Roko's Basilisk people" somewhere and that there's fun to be had in mocking them (another key part of the meme RationalWiki spreads), but this is an understandable mistake since Gerard et. al. have more time on their hands and have conducted a quite successful propaganda war. With tacit cooperation from a Slate reporter who took everything in the RationalWiki article at face value, didn't contact me or anyone else who could have said otherwise, and engaged in that particular bit of motivated credulity to use in a drive-by shooting attack on Peter Thiel who was heavily implied to be funding AI work because of Basilisk arguments; to the best of my knowledge Thiel has never said anything about Roko's Basilisk, ever, and I have no positive indication that Thiel has ever heard of it, and he was funding AI work long long before then, etcetera. And then of course it was something the mainstream media had reported on and that was the story. 
I mention this to explain why it's understandable that Munroe might have bought into the Internet legend that there are "Roko's Basilisk people", since RationalWiki won the propaganda war to the extent of being picked up by a Slate reporter who further propagated the story widely. But it's still, you know, disheartening.

It violates discourse norms to say things like the above without pointing out the specific factual errors being made by RationalWiki, which I will now do. Checking the current version of the Roko's Basilisk article on RationalWiki, virtually everything in the first paragraph is mistaken, as follows:

"Roko's basilisk is a proposition that says an all-powerful artificial intelligence from the future may retroactively punish those who did not assist in bringing about its existence."

Roko's basilisk was the proposition that a self-improving AI that was sufficiently powerful could do this; all-powerful is not required. Note the hyperbole.

"It resembles a futurist version of Pascal's wager; an argument used to try and suggest people should subscribe to particular singularitarian ideas, or even donate money to them, by weighing up the prospect of punishment versus reward."

This sentence is a lie, originated and honed by RationalWiki in a deliberate attempt to smear the reputation of what, I don't know, Gerard sees as an online competitor or something. Nobody ever said "Donate so the AI we build won't torture you." I mean, who the bleep would think that would work even if they believed in the Basilisk thing? Gerard made this up.

"Furthermore, the proposition says that merely knowing about it incurs the risk of punishment."

This is a bastardization of work that I and some other researchers did on Newcomblike reasoning, in which, e.g., we proved mutual cooperation on the one-shot Prisoner's Dilemma between agents that possess each other's source code and are simultaneously trying to prove theorems about each other's behavior. See http://arxiv.org/abs/1401.5577 (and the toy sketch a bit further down, for the flavor of the setup). The basic adaptation to Roko's Basilisk as an infohazard is that if you're not thinking about the AI at all, it can't see a dependency of your behavior on its behavior, because you won't have its source code. This doesn't mean that if you are thinking about it, it will get you; I mean, it's not like you could prove things about an enormous complicated AI even if you did have the source code, and it has a resource-saving incentive to do the equivalent of "defecting" by making you believe that it will torture you and then not bothering to actually carry out the threat. Cooperation on the Prisoner's Dilemma via source-code simulation isn't easy to obtain, it would be easy for either party to break if they wanted, and it's only the common benefit of cooperation that gives rational agents a motive to preserve the delicate conditions for mutual cooperation on the PD. There's no motive on your end to carefully carry out the necessary conditions for being blackmailed.

(But taking Roko's premises at face value, his idea would zap people as soon as they read it. Which - keeping in mind that at the time I had absolutely no idea this would all blow up the way it did - caused me to yell quite loudly at Roko for violating ethics given his own premises. I mean, really, WTF? You're going to get everyone who reads your article tortured so that you can argue against an AI proposal?)
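Here is the toy sketch I mentioned above, to give the flavor of the source-code setup. To be clear about what this is: it is my own illustration, not the provability-logic construction in the paper linked above. The agent names FAIR_BOT and DEFECT_BOT, the `run` helper, the MAX_DEPTH simulation budget, and the optimistic cutoff are all simplifications assumed for this example.

```python
# Toy sketch only: NOT the provability-logic construction from
# http://arxiv.org/abs/1401.5577. One-shot Prisoner's Dilemma agents
# that each receive the other's source code and decide by bounded
# mutual simulation. FAIR_BOT / DEFECT_BOT, run(), and MAX_DEPTH are
# assumptions made for this illustration.

MAX_DEPTH = 3  # crude stand-in for a bounded proof search

DEFECT_BOT = '''
def act(my_source, opponent_source, depth):
    # Unconditional defector: ignores the opponent's source entirely.
    return "D"
'''

FAIR_BOT = '''
def act(my_source, opponent_source, depth):
    # Cooperate iff simulating the opponent (handing it our own source)
    # predicts cooperation. When the budget runs out, optimistically
    # assume cooperation; this crude trick stands in for the role that
    # Lob's theorem plays in the real construction.
    if depth >= MAX_DEPTH:
        return "C"
    prediction = run(opponent_source, my_source, depth + 1)
    return "C" if prediction == "C" else "D"
'''

def run(agent_source, opponent_source, depth=0):
    """Execute an agent's source and ask for its move against opponent_source."""
    namespace = {"run": run, "MAX_DEPTH": MAX_DEPTH}
    exec(agent_source, namespace)
    return namespace["act"](agent_source, opponent_source, depth)

if __name__ == "__main__":
    print(run(FAIR_BOT, DEFECT_BOT))  # "D": no cooperation with an unconditional defector
    print(run(FAIR_BOT, FAIR_BOT))    # "C": two copies reach mutual cooperation
```

Again, the actual result works through provability logic (Löb's theorem) rather than a depth cutoff, and is robust in ways this toy is not; the only point here is that "I can reason about your source code and you can reason about mine" is a concrete, analyzable setup, not mysticism.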
In the twisted alternate reality of RationalWiki, my yelling at Roko became proof that I believed in Roko's Basilisk, since I yelled at the person who invented it without including twenty lines of disclaimers about what I didn't necessarily believe. And since I had no idea this would blow up that way at the time, I suppose you could even read the sentences I wrote that way - sentences which I did not spend hours editing first, because I had no idea this was going to haunt me for years to come. And then, since Roko's Basilisk was putatively a pure infohazard of no conceivable use or good to anyone, and since I didn't really want to deal with the argument, I deleted it from LessWrong, which seemed to me like a perfectly good general procedure for dealing with putative pure infohazards that jerkwads were waving in people's faces. Which brought out the censorship!! trolls and was certainly, in retrospect, a mistake.

"It is also mixed with the ontological argument, to suggest this is even a reasonable threat."

I have no idea what "ontological argument" is supposed to mean here. If it's the ontological argument from theology, as was linked, then this part seems to have been made up from thin air. I have never heard the ontological argument associated with anything in this sphere, except in this RationalWiki article itself.

"It is named after the member of the rationalist community LessWrong who most clearly described it (though he did not originate it)."

Roko did in fact originate it. Also, anyone can sign up for LessWrong.com; David Gerard has an account there, but that doesn't make him a "member of the rationalist community".

And that is just the opening paragraph. I'm a bit sad that Randall Munroe seems possibly to have jumped on this bandwagon - since it was started by people who were playing the role of jocks sneering at nerds, the way they also sneer at effective altruists, and having XKCD join in on that feels very much like your own mother joining the gang hitting you with baseball bats. On the other hand, RationalWiki has conducted a very successful propaganda campaign here, so it's saddening but not too surprising if Randall Munroe has never heard any version but RationalWiki's. I hope he reads this and reconsiders.

Post 2, a reply to someone else's comment:

(Xixidu runs an anti-Yudkowsky blog and has done so for years.) Today's motivated failure of reading comprehension, HT FeepingCreature below:

"...there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn't give 100% of their disposable incomes to x-risk motivation. This would act as an incentive to get people to donate more to reducing existential risk, and thereby increase the chances of a positive singularity. This seems to be what CEV (coherent extrapolated volition of humanity) [Yudkowsky's proposal that Roko was arguing against] might do if it were an acausal decision-maker. So a post-singularity world may be a world of fun and plenty for the people who are currently ignoring the problem, whilst being a living hell for a significant fraction of current existential risk reducers (say, the least generous half). You could take this possibility into account and give even more to x-risk in an effort to avoid being punished."

This does not sound like somebody saying, "Give all your money to our AI project to avoid punishment."
Reading the original material instead of the excerpt makes it even more obvious that Roko is posting this article for the purpose of arguing against a proposal of mine called CEV (which I would say is actually orthogonal to this entire issue, except insofar as CEVs are supposed to be Friendly AIs and doin' this ain't Friendly). Managing to find one sentence which, if interpreted completely out of the context of the surrounding sentences, could maybe possibly also have been written by an alternate-universe Roko who was arguing for something completely different, does not a smoking gun make. I repeat: Nobody has ever said, "Give money to our AI project because otherwise the future AI will torture you." RationalWiki made this up.

The Observer article is by a reporter whose same article describes a friend of mine as "selling drugs on the Internet" after she'd described herself to the reporter as selling legal, unregulated psychoactives on the Internet. I have no idea whether the alleged event with Mr. Mowshowitz took place in anything remotely like the alleged context, but I expect it was carefully plucked out of context and twisted, and that whatever actually happened indicated "fear of muckraking reporter" more than "fear of basilisk".

The comment I'm replying to quotes my statement above and responds:

"RationalWiki basically invented Roko's Basilisk as a meme - particularly the meme that there's anyone out there who believes in Roko's Basilisk and goes around advocating that people should create AI to avoid punishment by it."

"This is a lie: (1) I and others have talked to a bunch of people who were worried about it. (2) Many people have said that they did not read up on it because they fear it might be dangerous.[1] (3) One of your initial reasons for banning Roko's post was for it to not give people horrible nightmares.[2] (4) Roko mentioned in his original post that a person working at MIRI was severely worried by this, to the point of having terrible nightmares.[3] (5) Roko himself wishes that he had never learnt about the idea.[4]"

Reading comprehension fail: After being spread by RationalWiki and censorship!! trolls on LessWrong, some other people had nightmares about Roko's Basilisk. Those people did not go around advocating that people donate for that reason. Note the "and" rather than the "or" in my paragraph. I have in any case edited my post above to be clearer about which part RationalWiki made up. Banning something because other people are having nightmares about it does not require that I, or even they, believe the thing is true. (I wish, with the hindsight of time travel, that I could go back and say "no, this definitely doesn't work" instead of being cautious about making strong professional assertions of knowledge and saying "I doubt this works", but that is the mere hindsight of time travel.)

More importantly, the people who have nightmares about this idea - which Roko invented, which you and Gerard spread, which I certainly never tried to spread, and which no nonprofit has ever advocated in any form - do not constitute an AI cult propagating an idea to get donations, as David Gerard claimed to be the case. (The nightmares may be hard to understand and tempting to mock if you don't have any friends with the flavor of OCD tendencies that react very strongly and oddly to "Don't think about this thought... or you might end up being tortured..."
People with this particular flavor of vulnerability can end up with uncontrollable obsessive thoughts even if their System 2 entirely disbelieves the thought, because it's exactly the sort of thing their System 1 seizes on and suddenly declares plausible at 2 AM. RationalWiki et al. find non-neurotypicals of any stripe to be approved mockery-targets, and hence find it quite amusing to think that they are giving them nightmares.)

Roko's post does not say who "at" SIAI (the early organization that became MIRI) was allegedly having nightmares, or over what exactly, nor have I ever heard Roko specify this. But I don't think Roko, unlike some other people involved, is an active liar or given to violating honesty norms, so Roko probably did mean something true by what he wrote. On the other hand, Roko has never said anything like "But I didn't invent that idea," even after disowning it later, so I don't think he means he got it from someone else either.

But note that the thought of being punished by a normative superintelligence does not have to arise from Roko's Basilisk particularly. I know several people, though they are not current or past employees of SIAI, whose brains seem liable to generate punishment scenarios for reasons having nothing to do with timeless decision theory and more along the lines of their brain constantly talking to itself about how they're bad and a powerful strict figure would punish them. There were also a lot of people, not SIAI employees, in an old experimental "Visiting Fellows" program (which didn't work out too well, but did create a couple of positive goods over the course of the experiment), and some of them might have fit the description. It's possible that Roko talked to one of these people about his idea before posting it and gave them nightmares, and it's also possible that Roko ran into someone obsessed about being punished by an ultra-strict figure for other reasons. But I have no indication that anyone before Roko ever talked about Roko's Basilisk with anyone, including Roko, and while Roko later said he wished he'd never made his post (understandable considering the consequences to his name) and withdrew from all online rationality forums, again, I have never heard him disavow having invented the idea. (Except that all sorts of researchers in the field of Newcomblike problems have considered generic blackmail as a Newcomblike problem, of course.)