Many of my friends are not involved in EA. Sometimes, it can be hard for them to understand why issues like AI alignment might be relevant. That is not their fault. I can easily get lost in jargon and details. I tend to use overcomplicated examples – while failing to outline the basics. The last time I tried to explain AI alignment, a friend told me I sounded like someone who had just stumbled out of an echo chamber. I suspect some of you might have similar problems.
I want to do better than that. So I decided to try creating an easy-to-follow visual representation of AI misalignment that you can use as a stepping stone with your non-EA friends and family! (If you end up showing this to your non-EA relations, please forward any feedback. Thanks!)
I deliberately did not go into the potential catastrophic outcomes associated with misaligned AGI. If a regular person heard about such scenarios, their first thought might be that they sound crazy. You probably should not start with a crazy-sounding conclusion. From my perspective, it's better to build common ground by explaining fundamental terms and then incrementally building up the tower of assumptions.
Now onto the main dish!
Telling AI to Draw Me a Fist
The internet is abuzz with an open-source AI model that can create art. It's called Stable Diffusion. You feed it a text prompt, and it generates art following that prompt. I wanted Stable Diffusion to draw an anatomically correct fist for me, using a pastel, evening-themed colour scheme. I added some extra tags to make it adhere to my stylistic requirements (or so I thought – spoiler alert!). Here's the exact wording that I used:
hand balled into a fist, hand, illustration, evening colors, violet, pastel, art by akihiko yoshida and alphonse mucha, RossDraws, intricate, sharp focus, greg rutkowski, artgerm, trending on artstation, ethereal, cinematic still, ilya kuvshinov, stylized
Clear enough, isn’t it? Well, let’s see what Stable Diffusion had to say about this.
I was pretty impressed by my first result! Granted, it’s not a fist. But the anatomy looks about right. I can see why the hand is holding a sphere – my prompt did include the word balled. From here, however, it went downhill.
Here’s another one of the better results:
The hand… does remind me of a fist. I guess? Why is there a pretty girl in the background, though? I don't remember ordering a person. The colour palette solidly meets my expectations, but that's not enough to redeem this picture.
Now let us look at the image taking third place:
This ambiguously gendered person's hands do appear to clench into fists. Right idea (at least about the hands)! But look closely, and you realize that the hands aren't quite anatomically correct. Unfortunately, I count only four digits on each hand.
If I learned anything from this trial, it's that my prompt was not specific enough. From there on, I decided to get explicit about what I wanted from my pictures!
Trial No 2
Stable Diffusion did not interpret hand balled into a fist as I intended, so I replaced it with fist. Additionally, I decided to include hand study in an attempt to increase anatomical precision.
Here’s the full prompt:
fist, hand, hand study, illustration, evening colors, violet, pastel, art by akihiko yoshida and alphonse mucha, RossDraws, intricate, sharp focus, greg rutkowski, artgerm, trending on artstation, ethereal, cinematic still, ilya kuvshinov, stylized
Let’s see what Stable Diffusion had to say about that!
This time around, the anatomy is acceptable. It is clearly better than most images of the first generation – for one, there are no people in the picture. I also like the cosmic horror vibe. However, I don’t see a fist here. Onto the next try!
In this attempt, one digit got lost along the way. Additionally, Stable Diffusion warped some of the fingers. The most disappointing aspect is the lack of a fist.
This specimen nailed the colour palette – and we're getting to the right pose! If the thumb curled in, it would be close to perfect. Unfortunately, it's cropped in a way that doesn't show the whole fist. Still, this is by far the best attempt of the second generation. Most of its siblings, however, looked something like this:
Why did I not get a good result?
Maybe the AI model is not well trained on hands. Perhaps an all-around image generator like this needs additional tweaks to draw convincing hands. Another possibility is that I didn't correctly specify what I wanted. Let's take a closer look at the second generation's prompt:
fist, hand, hand study, illustration, evening colors, violet, pastel, art by akihiko yoshida and alphonse mucha, RossDraws, intricate, sharp focus, greg rutkowski, artgerm, trending on artstation, ethereal, cinematic still, ilya kuvshinov, stylized
The bold keywords have something in common: I don't understand what they do. I just used them because the prompts of other cool pictures had included them.
Here’s a rough outline of my reasoning:
I want an image of a fist.
It should have an evening-inspired colour palette.
It should be drawn in a cool style. I don’t know how to express that in my text prompt.
I’m not completely aware that I want the picture to have a cool style. That makes expressing it even more difficult!
The bold keywords were part of prompts that resulted in super cool pictures, so let's include them. Surely, they will make my images look super cool, too!
For several reasons, I had trouble expressing what I wanted Stable Diffusion to generate. There was a gap between my idea of the result and my wording. There was a gap between what I expressed explicitly and what I implicitly wanted to happen. While a human could guess what I was aiming for, the AI system couldn't deliver something that lived up to my expectations.
Stable Diffusion may have some technical problems with hands. However, it's also true that my prompt wasn't an accurate reflection of the image I envisioned. In a way, I doomed Stable Diffusion to fail. In part, it misbehaved because I made mistakes in describing my goals. That gap – the gap between what we want an AI to do and what it actually ends up doing – is called misalignment.