Jeff Duntemann's Contrapositive Diary


AI Image Generators, Mon Dieu

I finished a 10,700-word novelette the other day, the first short fiction I’ve finished since 2008, when I wrote “Sympathy on the Loss of One of Your Legs,” now available in my collection, Souls in Silicon. I’ve mostly written novels and short novels since then. (I’ll have more to say about “Volare” in a future entry here.)

To be published, it needs a cover. I have no objection to paying artists for covers, which apart from an experiment or two (see “Whale Meat”) I’ve always done in the past. Given all the yabbjabber about AI content creation recently, I thought, “Hey, here’s a chance to see if it’s all BS.”

The spoiler: It’s not all BS, but parts of it are BS-ier than others.

Ok. I’ve tested two AI image generators: OpenAI’s DALL-E 2, and Microsoft’s Bing Image Generator. I found them through a solid article on ZDNet by Sabrina Ortiz. As it happens, Bing Image Generator outsources the process to DALL-E. I wanted to try Midjourney, and may eventually, but you have to have a paid subscription (about $8/month) to use it.

I’m not going to summarize the story here. One image I wanted to try as a cover would be the female lead sitting with her behind in a wicker basket, floating through the air at dawn a thousand feet or so over Baltimore. In both generators (which are basically the same generator) you feed the AI a detailed text description and turn it loose. I started simple: “A woman flying through the air in a wicker basket.” Edy Gagliano does precisely that in the story. What DALL-E gave me was this:

[Image: DALL-E output, 2023-04-23, for the prompt “a woman flying through the air in a wicker basket”]

Well, the woman is flying through the air, but we have a preposition problem here. She is over, not in, the basket. Good first shot, though. I tried various extensions of that basic description, to the tune of 48 images on DALL-E. I won’t post them all here for space reasons, but they ran the gamut: A woman flying through the air holding a basket, a woman flying through the air in a basket the size and shape of a bathtub, and on and on.

The next one here is perhaps the best I’ve gotten from DALL-E. It’s a woman in a basket over Baltimore, I guess. Here’s the description: “a barefoot woman sitting down inside a magical wicker basket that flies through the air at dawn over Baltimore.” In one sense, it’s not a bad picture:

[Image: DALL-E output, 2023-04-23, for the prompt “a barefoot woman sitting down inside a magical wicker basket that flies through the air at dawn over Baltimore”]

That said, it looks out of focus. The basket is not wicker and it’s yuge. And in the story, Edy just puts her butt in the basket and lets her legs hang over the side.

Now let us move over to Bing Image Generator. In a way, it came closer than nearly all of the DALL-E images. But now we confront a well-known weakness of AI image generators: They can’t draw realistic hands or feet or faces. Here’s my first take on the image from Bing:

[Image: Bing Image Generator output]

Look closely. Her hands and feet appear to be drawn by something that doesn’t know what a human hand or foot looks like. The face, furthermore, looks like it has one eye missing. (That’s easier to see in the full-sized image.)

I’ll give Bing credit: The images are less fuzzy and smeary. Because Bing uses DALL-E, I suspect there are DALL-E settings I don’t know about yet. I tried a few more times and got some reasonable images, all of them including some weirdness or other. The one below is a better rendering of a woman who is actually sitting in the basket with her legs hanging over the basket’s edge. But did I order a helicopter? Her face is a little lopsided, and her hands and feet, while not grotesque, aren’t quite right.

[Image: Bing Image Generator output]

Bing gave me about 24 images while I messed with it, and some of the images, while not capturing what I intended, were well-rendered and not full of weirdness. The one below is probably closest to Edy as I imagine her, and we get a SpaceX booster burning up in the atmosphere to boot. Is she over Baltimore? I don’t know Baltimore well enough to be sure, but that, at least, doesn’t matter. Stock photos of anonymous cities are everywhere.

[Image: Bing Image Generator output]

None of the others are notable enough to show here.

So where does this leave us? AIs can draw pictures. That’s real, and I’m guessing that if you tell one to draw something a little less loopy than a woman with her butt in a flying basket, it might do a better job. I remain puzzled why hands and feet and faces are so hard to do. Don’t AIs need training? And aren’t there plenty of photos of hands and feet and faces for them to generalize from?

I have no idea how these things are supposed to work, and if there were a good overview book on AI image generator internals, I’d buy it like a shot. In the meantime, I may practice some more and look at specific settings. If nothing else, I can produce some concept images to show to a cover artist. And maybe I’ll luck into something usable as-is.

Whatever I discover, you can count on seeing it here.