(If you haven’t read my entry for April 23 yet, please do so—this entry is a follow-on, now that I’ve had a chance to do a little more research.)
AI image generators can’t draw hands worth a rat’s heiny. That’s the lesson I took away from my efforts some days ago, trying to see if any of the AI imagers could create an ebook cover image for my latest novelette, “Volare!” It wasn’t just me, and it wasn’t just the two image generators I tried. If you duckduck around the Web you’ll find a great many essays that ask “Why can’t AIs draw hands and feet?” and then fail to answer the question.
The standard answer (and it’s one I can certainly accept, with reservations) is that human hands are very complicated machines with a lot of moving parts and a great many possible positions. I would argue that an infinite variety of positions is what hands are for—and are in fact the reason that we created a high-tech civilization. Even artists have trouble drawing hands, and to a lesser extent, feet. This is a good long-form tutorial on how to draw hands and feet. Not an easy business, even for us.
In photographs and drawn/painted art, hands are almost always doing things, not just resting in someone’s lap. And in doing things, they express all those countless positions that they take in ordinary and imaginary life. So if AIs are trained by showing them pictures of people and their hands, some of those pictures will show parts of hands occluded by things like beer steins and umbrella handles, or—this must be a gnarly challenge—someone else’s hands. In some pictures, it may look like hands have four fingers, or perhaps three. Fingers can be splayed or together and clenched against their palm. AIs are pattern matchers, and with hands and especially fingers, there are a huge number of patterns.
So faced with too many patterns, the AI “guesses,” and draws something that violates one or more traits of all hands.
The most serious flaw in this reasoning comes from elsewhere in the body: feet. In the fifty-odd images the AIs created of a barefoot woman sitting in a basket, deformed feet were almost as common as deformed hands. This is a lot harder to figure, for this reason: feet have nowhere near the number of possible positions that hands have. About the most extreme position a foot can have is curled toes. Most of the time, feet are flat on the floor, and that’s all the expressive power they have. This suggests that AIs should have no particular trouble with feet.
But they do.
I’ll grant that in most photos and art, feet are in shoes, while hands generally go naked except in bad weather or messy/hazardous work. So there are fewer images of feet to train an AI. I had an AI gin up some images this morning from the following description: “A woman sitting in a wicker basket in a nightgown, wearing ballet slippers.” I did five or six, and the best one is below:
Her left leg seems smaller than her right, which is a different but related problem with AI images. And her hands this time, remarkably, are less grotesque than her arms. But add some ballet slippers, and the foot problem goes away. The explanation should be obvious: In a ballet slipper, all feet look more or less alike. The same is likely the case for feet in Doc Martens boots or high-top sneakers. (I may or may not ask an AI for an image of a woman in sandals, because I think I already know what I’d get.)
There were other issues with the images I got back from the two AIs I messed with, especially in faces. Even in the relatively good image above, her face seems a little off. This may be because we humans are very good at analyzing faces. Hands and feet, not so much. Defects there have to be more serious to be obvious.
Anyway. The real problem with AI image generators is that they are piecing together bits of images that they’ve digested as part of their training. They are not creating a wire-frame outline of a human body in a given position and then fleshing it out. At best they’re averaging thousands or millions of images of hands (or whatever) and smushing them together into an image that broadly resembles a human being.
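Here is a toy sketch, in Python, of the averaging intuition above. It is only an illustration under heavy assumptions: real image generators almost certainly do not average pixels this literally, and every name in it (`make_hand`, `average`, `count_fingers`) is invented for the example. Each “hand” is a one-dimensional strip of pixels with exactly five fingers, and the three training hands differ only in how far the fingers are splayed.

```python
def make_hand(spread, width=32, finger_width=3, offset=2):
    """Return a 1-D 'image' (list of 0/1) with five fingers `spread` pixels apart."""
    row = [0] * width
    for finger in range(5):
        start = offset + finger * spread
        for x in range(start, start + finger_width):
            row[x] = 1
    return row


def average(rows):
    """Blend several images by taking the per-pixel mean."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def count_fingers(row, threshold=0.5):
    """Count runs of adjacent pixels brighter than `threshold`."""
    count = 0
    inside = False
    for value in row:
        lit = value > threshold
        if lit and not inside:
            count += 1
        inside = lit
    return count


# Three hypothetical training hands: fingers together, relaxed, and splayed.
training_hands = [make_hand(spread) for spread in (4, 5, 6)]
blended = average(training_hands)

print([count_fingers(hand) for hand in training_hands])  # [5, 5, 5]
print(count_fingers(blended))                            # 6
```

Every training hand has five fingers, yet the blend comes out with six, because fingers from different poses only partly overlap. That is the “smushing” failure in miniature: an average of valid hands need not be a valid hand.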
Not knowing the nature of the algorithms that AI image generators use, I can’t say whether this is a solvable problem or not. My guess is that it’s not, not the way the software works today. And this is how we can spot deepfakes: Count fingers. The hands don’t lie.
It made me wonder: what if, rather than an AI, Jeff had a team of artists and models who would compose live on a stage whatever was asked? Which decisions would they make to fill the gaps in the request?
For example, given “A woman sitting in a wicker basket in a nightgown, wearing ballet slippers,” they’d get a wicker basket, a woman, etc., and have her pose. Would they choose a young woman or a grandma (both of whom are equally women)? How would the woman pose? Would it be a nightgown from the 19th century?
I’m impressed that the AI picked a decent-sized basket and matched the colour of the nightgown and shoes. But as you point out, it did not seem to know certain rules: that the two legs ought to be the same length, and that both arms should have the same girth (unless she’s a tennis player).
As a long-term Second Life user, I can tell you: *people* don’t get hands, feet, or faces much better for the most part. The image you have above would look damned strange in real life, but you could find a dozen of them on SL, and worse, without trying.
*Proportions* are hard. It took me years of actually looking at people (yay for mass transit) to finally learn to see the proportions of a human being, especially the face. By nature, our visual system accentuates the eyes and plays down the nose. It accentuates the face, and plays down the rest of the skull. To be able to create a realistically proportioned person, you have to be able to see past the abstraction your brain automatically does.
It’s probably the same with shading, but I’m not an artist, so I don’t know much about that.
Even artists get proportions wrong. In my studies, working from real human medical data, about half of the artistic ideal for human drawing is simply wrong. If you are male, your forearms are likely *longer* than your upper arms. Not so if you’re female, and in fact it’s a gender cue you don’t want to get wrong. The artistic ideal has them the same length. The artistic ideal has the distance from your hip joint to the floor and to your scalp about the same. Very few people’s legs are actually this long. And so on.
And then there’s the matter of Photoshop. Most pictures of models are photoshopped, in many cases producing fairly grotesque distortions.
So what’s the problem with AI art?
Garbage in, Garbage out, and AIs have no way of knowing which data is valid.