- Google’s Gemini issue is not really about woke/DEI, and everyone who is obsessing over it has failed to notice the much, MUCH bigger problem that it represents.
- First, to recap: Google injected special instructions into Gemini so that when it was asked to draw pictures, it would draw people with “diverse” (non-white) racial backgrounds.
- This led to lots of weird results where people would ask it to draw pictures of people who were historically white (e.g. Vikings, 1940s Germans) and it would output black people or Asians.
- Google originally did this because they didn’t want pictures of people doing universal activities (e.g. walking a dog) to always be white, reflecting whatever bias existed in their training set.
- This is not an unreasonable thing to do, given that they have a global audience. Maybe you don’t agree with it, but it’s not unreasonable.
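- Google hasn’t published exactly how this worked under the hood, but the mechanism being described is a blanket prompt-rewriting rule applied before the image model ever sees the request. Here is a minimal, hypothetical sketch (the function name and suffix text are illustrative assumptions, not Gemini’s actual pipeline) of how such a rule produces exactly this kind of unintended result:

    # Hypothetical sketch only -- Google has not published Gemini's actual
    # prompt pipeline. This shows how a blanket rewrite rule, applied to every
    # prompt that mentions people, can backfire on historically specific prompts.

    DIVERSITY_SUFFIX = "depicting people of diverse ethnicities and genders"

    def augment_prompt(user_prompt: str) -> str:
        """Naive rewrite: append the diversity instruction to any prompt that
        mentions people, with no check for historical or contextual fit."""
        lowered = user_prompt.lower()
        if "person" in lowered or "people" in lowered:
            return f"{user_prompt}, {DIVERSITY_SUFFIX}"
        return user_prompt

    # Intended case: a generic prompt, where the rule does what was wanted.
    print(augment_prompt("a person walking a dog"))

    # Unintended case: a historically specific prompt gets the same treatment,
    # which is how you end up with ahistorical output.
    print(augment_prompt("people in a Viking longship, 10th century"))

- The point of the sketch is not that this is literally what Google shipped, but that a rule which looks sensible for the intended case applies just as mechanically to cases nobody thought to test.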
- Google most likely did not anticipate or intend the historical-figures-who-should-reasonably-be-white result.
- We can argue about whether they were ok with that unexpected result, but the fact that they decided to say something about it and “do additional tuning” means they didn’t anticipate it and probably didn’t intend for that to happen.
- If you have a woke/anti-woke axe to grind, kindly set it aside now for a few minutes so that you can hear the rest of what I’m about to say, because it’s going to hit you from out of left field.
- Everyone is obsessed with woke-whatever because it is the culture war of the moment. So everyone thinks this is significant because Google is “captured by woke” or whatever.
- No.
- That’s not why this is important. This event is not significant for culture war reasons. You just think so because that’s all anyone is thinking about these days, and everyone has missed the real danger.
- It doesn’t matter that it made a bunch of woke mistakes and showed you a bunch of black Nazis. It’s not important that free speech or diversity or inclusion or whatever is under attack or taking over or whatever.
- This event is significant because it is a major demonstration of someone giving an LLM a set of instructions and getting results that were not at all what they predicted.
- It demonstrates very clearly that one of the major AI players asked an LLM to do something, the LLM went ahead and did it, and the results were BONKERS.
- Do you remember those old Asimov robot stories where the robots would do something really quite bizarre and sometimes scary, and the user would be like WTF, the robot is trying to kill me, I knew they were evil!
- And then Susan Calvin would come in, and she’d ask a couple questions, and explain, “No, the robot is doing exactly what you told it, only you didn’t realize that asking it to X would also mean it would do X2 and X3, these seemingly bizarre things.”
- And the lesson was that even with the Three Laws of Robotics, supposedly very comprehensive, robots were still going to do crazy things, sometimes harmful things, because we couldn’t anticipate how they’d follow our instructions?
- In fact, in the later novels, we even see how (spoiler) the robots develop a “Zeroth Law” where they conclude that it’s a good idea to irradiate the entire planet so that people are driven off of it to colonize the galaxy.
- And that’s the scenario where it plays out WELL…. in the end.
- There’s a few short stories in between where people are realizing the planet is radioactive and it’s not very pleasant.
- Are you getting it?
- Woke drawings of black Nazis is just today’s culture-war-fad.
- The important thing is how one of the largest and most capable AI organizations in the world tried to instruct its LLM to do something, and got a totally bonkers result they couldn’t anticipate.
- What this means is that @ESYudkowsky has a very, very strong point.
- It represents a very strong existence proof for the “instrumental convergence” argument and the “paperclip maximizer” argument in practice.
- If this had been a truly existential situation where “we only get one chance to get it right,” we’d be dead.
- Because I’m sure Google tested it internally before releasing it and it was fine per their original intentions. They probably didn’t think to ask for Vikings or Nazis.
- It demonstrates quite conclusively that, with all our current alignment work, even at the level of our current LLMs, we are absolutely terrible at predicting how a model is going to execute an intended set of instructions.
- When you see these kinds of things happen, you should not laugh.
- Every single comedic large-scale error by AI is evidence that when it is even more powerful and complex, the things it’ll do wrong will be utterly unpredictable and some of them will be very consequential.
- I work in climate change, I’m very pro-tech, and even I think the biggest danger would be someone saying to AI, “solve climate change.”
- Because there are already people who say “humans are the problem; we should have fewer humans” so it will be VERY plausible for an AI to simply conclude that it should proceed with the most expedient way to delete ~95% of humans.
- That requires no malice, only logic.
- Again, I will say this: any time you see a comedic large-scale error by AI, it is evidence that we do not know how to align and control it, that we are not even close.
- Because alignment is not only about “moral alignment” or “human values,” it is also about whether a regular user can give an AI an instruction and have it do exactly that, with no unintended results. You shouldn’t need to be Susan Calvin.
- I like robots, I like AI, but let’s not kid ourselves: we are playing with fire here.