2023 April 28 – Jeff Duntemann's Contrapositive Diary

Feet Have No Excuse

(If you haven’t read my entry for April 23 yet, please do so—this entry is a follow-on, now that I’ve had a chance to do a little more research.)

AI image generators can’t draw hands worth a rat’s heiny. That’s the lesson I took away from my efforts some days ago, trying to see if any of the AI imagers could create an ebook cover image for my latest novelette, “Volare!” It wasn’t just me, and it wasn’t just the two image generators I tried. If you duckduck around the Web you’ll find a great many essays that ask “Why can’t AIs draw hands and feet?” and then fail to answer the question.

The standard answer (and it’s one I can certainly accept, with reservations) is that human hands are very complicated machines with a lot of moving parts and a great many possible positions. I would argue that an infinite variety of positions is what hands are for—and are in fact the reason that we created a high-tech civilization. Even artists have trouble drawing hands, and to a lesser extent, feet. This is a good long-form tutorial on how to draw hands and feet. Not an easy business, even for us.

In photographs and drawn/painted art, hands are almost always doing things, not just resting in someone’s lap. And in doing things, they express all those countless positions that they take in ordinary and imaginary life. So if AIs are trained by showing them pictures of people and their hands, some of those pictures will show parts of hands occluded by things like beer steins and umbrella handles, or—this must be a gnarly challenge—someone else’s hands. In some pictures, it may look like hands have four fingers, or perhaps three. Fingers can be splayed or together and clenched against their palm. AIs are pattern matchers, and with hands and especially fingers, there are a huge number of patterns.

So faced with too many patterns, the AI “guesses,” and draws something that violates one or more traits of all hands.

The most serious flaw in this reasoning comes from elsewhere in the body: feet. In the fifty-odd images the AIs created of a barefoot woman sitting in a basket, deformed feet were almost as common as deformed hands. This is a lot harder to figure, for this reason: feet have nowhere near the number of possible positions that hands have. About the most extreme position a foot can have is curled toes. Most of the time, feet are flat on the floor, and that’s all the expressive power they have. This suggests that AIs should have no particular trouble with feet.

But they do.

I’ll grant that in most photos and art, feet are in shoes, while hands generally go naked except in bad weather or messy/hazardous work. So there are fewer images of feet to train an AI. I had an AI gin up some images this morning from the following description: “A woman sitting in a wicker basket in a nightgown, wearing ballet slippers.” I did five or six, and the best one is below:

Her left leg seems smaller than her right, which is a different but related problem with AI images. And her hands this time, remarkably, are less grotesque than her arms. But add some ballet slippers, and the foot problem goes away. The explanation should be obvious: In a ballet slipper, all feet look more or less alike. The same is likely the case for feet in Doc Martens boots or high-top sneakers. (I may or may not ask an AI for an image of a woman in sandals, because I think I already know what I’d get.)

There were other issues with the images I got back from the two AIs I messed with, especially in faces. Even in the relatively good image above, her face seems a little off. This may be because we humans are very good at analyzing faces. Hands and feet, not so much. Defects there have to be more serious to be obvious.

Anyway. The real problem with AI image generators is that they piece together bits of images that they’ve digested as part of their training. They are not creating a wire-frame outline of a human body in a given position and then fleshing it out. At best they’re averaging thousands or millions of images of hands (or whatever) and smushing them together into an image that broadly resembles a human being.

Not knowing the nature of the algorithms that AI image generators use, I can’t say whether this is a solvable problem or not. My guess is that it’s not, not the way the software works today. And this is how we can spot deepfakes: Count fingers. The hands don’t lie.