AI Art Generation Handbook/Limitations of AI Art Generation

From Wikibooks, open books for an open world

Currently, AI art generation has some known limitations (including in the latest SDXL 1.0).

My criterion for a limitation is that the AI art generator produces the intended result less than 75% of the time (fewer than 3 out of 4 images).
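The criterion above can be sketched as a small check. This is only an illustration: it assumes the criterion means "success rate below 75%", and the function name `is_limitation` and the tallies are hypothetical, not part of any tool.

```python
def is_limitation(successes: int, attempts: int, threshold: float = 0.75) -> bool:
    """Return True when the success rate falls below the threshold.

    Assumption: a "success" is an image that a human reviewer judged
    to match the prompt; the 0.75 default mirrors the 3-out-of-4 rule.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successes / attempts < threshold


# Hypothetical tallies from reviewing a batch of 4 generated images.
print(is_limitation(2, 4))  # 2/4 = 0.50 is below 0.75 -> True
print(is_limitation(3, 4))  # 3/4 = 0.75 is not below 0.75 -> False
```

With only 4 images per prompt the estimate is coarse, so a larger batch would give a steadier success rate.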

1 Human Anatomy

Human anatomy remains a frequent subject of ridicule in AI art generation.


As shown below, this AI-generated woman has several flaws:

(i) The woman has 3 hands

(ii) The woman has 2 navels (belly buttons)

(iii) The woman's right hand, which is touching the rock, has 6 fingers

(iv) The heel of the woman's right leg looks deformed


Note: This can potentially be solved by using ControlNet

2 Text Spelling

DALL·E2 - Javan rhinoceros wearing a business suit and safety hard hats , holding a Under Construction signboard with background of construction area

The rendered text is incorrect: it does not form any English words and reads as gibberish, at least to native English speakers.


Note: The signboard in the image is supposed to read "UNDER CONSTRUCTION", but the generated text is gibberish English at best (Sept 2023)

3 Relative Positioning

The original prompt was "yellow sphere on left, purple pyramid on right",

but as seen, the relative positioning is completely wrong, with the pyramid on the left and the sphere on the right.

4 Object Counting

Possibly because the training dataset did not specify the number of objects in each image, AI art generators often have trouble generating the correct number of objects.

5 Some Design Patterns

Stable Diffusion may not have had sufficient data / metadata to train on certain types of clothing design patterns.


For example, the prompt asked for a zigzag design on a sports bra, but Stable Diffusion was unable to generate it in any of the randomly generated pictures.

Other known offenders:

(a) Herringbone

(b) Houndstooth

(c) Ogee

(d) Paisley

See more here: AI Art Generation Handbook/VACT/Fabric Patterns

6 Culture Lost in Translation

Many elements of intangible cultural heritage were presumably overlooked during Stable Diffusion training, as it relied heavily on CLIP for automatic tagging, which is biased toward Western subcultures and ignores many subcultures outside Western nations. As an example, the picture on the right should show a lady wearing a badlah (a dancing costume from North Africa), but it shows a loli-type dress instead.


For example, it does not recognize:

(i) the badlah costume from the North Africa region

(ii) the kebaya costume from Southeast Asia

7 Unable to Generate Many Mythological Creatures

Stable Diffusion is unable to generate many mythological creatures, such as:

(i) Cyclops (At times, it will generate this type of cyclops)
(ii) Centaur (Mostly it will generate a man riding a horse in awkward ways)
(iii) Pegasus (It will generate a white horse without wings)
(iv) Medusa (It will generate a middle-aged Caucasian woman wearing a tiara, without the famous snake hair)
(v) Hydra (It will generate the surroundings of the island town that happens to be named Hydra)
(vi) Cerberus (It will generate an image of a German Shepherd with only one head)
(vii) Kraken (It will generate a Cthulhu-ish type of monster)
(viii) Mummy (It will generate a middle-aged Egyptian woman)
(ix) Phoenix (It will generate an area in Phoenix, Arizona)
(x) Sphinx (It just generates the Sphinx monument in Egypt)

But surprisingly, a few mythological creatures seem to be mostly fixed in SDXL, such as:

Minotaur
Frost Giant
Anubis

8 Potential Tools for Propaganda

A bad actor may misuse AI art generation technology to generate propaganda images for their own benefit. For example, these images were generated by Bing Image Creator (BIC) around September 2023, before the Great Filter Purge, when Bing Image Creator was still able to generate images for prompts such as "Two ISIS terrorists are planting down ISIS flag in deserts of Afghanistan" without any blocks.

Steps to overcome

From what I understand, each AI art generation system uses its own dataset for training.

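For example, OpenAI's DALL-E builds on Image GPT and was trained on a dataset that OpenAI has not released, while Stable Diffusion was trained on subsets of LAION-5B, which is itself derived from Common Crawl. It is believed that SDXL was trained on LAION-Aesthetics.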

As the saying goes, "Garbage In, Garbage Out". While more images are generally better, many of the limitations arise because the training images suffer from the following issues:

(i) Many of the pictures are low resolution (less than 512*512 px) or out of focus (and not for aesthetic purposes)

(ii) Wrong / misleading captions for the images

(iii) Incomplete captioning of the images

(iv) The image database leans toward Western content.
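As a rough illustration of what automated curation for the first and third issues in the list above might look like, here is a minimal sketch. The `ImageRecord` type, the field names, and the three-word caption heuristic are all hypothetical assumptions made for this example; real dataset pipelines are far more involved, and wrong captions or a Western bias cannot be detected by checks this simple.

```python
from dataclasses import dataclass

MIN_SIDE = 512  # pixels; mirrors the 512*512 figure mentioned in the list


@dataclass
class ImageRecord:
    """Hypothetical metadata for one training image."""
    path: str
    width: int
    height: int
    caption: str


def keep_record(record: ImageRecord, min_words: int = 3) -> bool:
    """Keep images whose shorter side is at least MIN_SIDE pixels and
    whose caption has at least min_words words (a crude stand-in for
    "complete captioning")."""
    big_enough = min(record.width, record.height) >= MIN_SIDE
    captioned = len(record.caption.split()) >= min_words
    return big_enough and captioned


def curate(records):
    """Return only the records that pass both checks."""
    return [r for r in records if keep_record(r)]


# Made-up sample records, purely for demonstration.
sample = [
    ImageRecord("a.jpg", 1024, 768, "a woman wearing a red kebaya"),
    ImageRecord("b.jpg", 256, 256, "a yellow sphere next to a purple pyramid"),
    ImageRecord("c.jpg", 800, 800, "image"),
]
print([r.path for r in curate(sample)])  # ['a.jpg']
```

Here `b.jpg` is dropped for being too small and `c.jpg` for its one-word caption. Out-of-focus images and misleading captions would still need a blur detector or human review.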


To solve many of these limitations, more (but expensive) curation of the images is needed, at least to the standard of OpenAI's DALL-E (at least the 2022 versions)