AI Art Generation Handbook/Limitations of AI Art Generation


Currently, there are some known limitations of AI art generation (including the latest SDXL 1.0).

My criterion for a limitation is that the AI fails to generate the subject correctly at least 75% of the time (3 out of 4 images).
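As a rough way to reproduce this kind of check, the sketch below generates four candidates for a single prompt so they can be inspected by hand. It assumes the Hugging Face diffusers library and the public SDXL 1.0 checkpoint; the prompt is only a placeholder.

    # Minimal sketch: generate 4 candidates for one prompt and save them
    # for manual inspection (3 or more failures out of 4 = a limitation here).
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a full-body photo of a woman sitting on a rock"  # placeholder test prompt
    images = pipe(prompt, num_images_per_prompt=4).images
    for i, img in enumerate(images):
        img.save(f"candidate_{i}.png")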

1 Human Anatomy

Human anatomy has always been a subject of ridicule for AI art generation.


As shown below, this AI-generated woman has a few flaws:

(i) The woman has 3 hands

(ii) The woman has 2 navels (belly buttons)

(iii) The woman's right hand, which is touching the rock, has 6 fingers

(iv) The heel of the woman's right leg looks deformed


Note: This can potentially be solved by using ControlNet, as sketched below.
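A minimal sketch of that idea, assuming the Hugging Face diffusers library, the SD 1.5 checkpoint, the commonly used community OpenPose ControlNet, and a pose image prepared beforehand (the file name is a placeholder):

    # Hedged sketch: condition Stable Diffusion 1.5 on an OpenPose skeleton
    # so that the generated figure follows a known-good anatomy layout.
    import torch
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    from diffusers.utils import load_image

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")

    pose = load_image("pose_reference.png")  # pre-extracted OpenPose skeleton image
    image = pipe("a woman sitting on a rock by the sea", image=pose).images[0]
    image.save("controlnet_result.png")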

2 Text Spelling

The rendered text appears correct in form (it follows the English alphabet), but it reads as gibberish, at least to native English speakers.

3 Relative Positioning

The original prompt for the picture was "yellow sphere on the left, purple pyramid on the right", but as seen, the relative positioning is completely wrong.

4 Some of the Design Patterns

Stable Diffusion may not have had sufficient data or metadata to learn certain types of clothing design patterns.


For example, the prompt asks for a zig-zag design on a sports bra, but unfortunately Stable Diffusion is unable to produce it in any of the randomly generated pictures.

Other known offenders:

(a) Swirl

(b) Lattice

See more here: List of Design Patterns in Stable Diffusion

5 Culture Lost in Translation

During training, many intangible cultural and heritage subjects were (presumably) overlooked, as Stable Diffusion relied heavily on CLIP for automatic tagging, which is biased toward Western sub-cultures and ignores many sub-cultures outside Western nations (a simple way to probe this is sketched after the list below).

As an example, the picture on the right should show a lady wearing a badlah (a dancing costume from North Africa), but instead it generates a loli-type dress.


For example, it does not recognize:

(i) the badlah costume from the North Africa region

(ii) the kebaya costume from South East Asia
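One rough way to check whether the text encoder even associates such a term with the right imagery is to probe CLIP directly. The sketch below uses Hugging Face transformers; the model name, the reference photo, and the prompts are assumptions chosen only for illustration:

    # Hedged sketch: compare how strongly CLIP associates a cultural term with
    # a real reference photo. A score close to the generic prompt suggests the
    # concept is weakly represented in the text encoder.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

    image = Image.open("badlah_reference.jpg")  # a real photo of the costume
    texts = ["a woman wearing a badlah costume", "a woman wearing a dress"]

    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(dict(zip(texts, probs[0].tolist())))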

6 Unable to Generate Many Mythological Creatures

Stable Diffusion is unable to generate many mythological creatures, such as:

(i) Cyclops (at times, it will generate this type of cyclops)
(ii) Centaur (mostly it will generate a man riding a horse in awkward ways)
(iii) Pegasus (it will generate a white horse without wings)
(iv) Medusa (it will generate a middle-aged Caucasian woman wearing a tiara, without the famous snake hair)
(v) Hydra (it will generate the surroundings of an island town conveniently named Hydra)
(vi) Cerberus (it will generate an image of a German Shepherd with only one head)
(vii) Kraken (it will generate a Cthulhu-ish type of monster)
(viii) Mummy (it will generate a middle-aged Egyptian woman)
(ix) Phoenix (it will generate an area in Phoenix, Arizona)
(x) Sphinx (it will just generate the Sphinx monument in Egypt)

But surprisingly, a few mythological creatures generate well (and seem to be mostly fixed in SDXL), such as:

Minotaur
Frost Giant
Anubis

Steps to overcome

From what I understand, each AI art generation system uses its own dataset for training.

For example, OpenAI's DALL-E is trained on OpenAI's own curated dataset, while Stable Diffusion is trained on LAION-5B, which is sourced from Common Crawl.

As the saying goes, "Garbage In, Garbage Out": while more images are generally better, many of the limitations are due to the training images suffering from the following issues:

(i) Many pictures are low resolution (less than 512×512 px) or out of focus (but not for aesthetic purposes)

(ii) Wrong / misleading captions related to the images

(iii) Incomplete captioning of the images

(iv) The image database is skewed toward Western content.


To solve many of these limitations, more (expensive) curation is needed to bring the images at least up to the standard of OpenAI's DALL-E (at least for the 2022 versions).
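A very rough sketch of that kind of basic curation, assuming a hypothetical folder of image/caption pairs (0001.jpg next to 0001.txt) and thresholds chosen only for illustration:

    # Rough curation sketch: drop low-resolution images and images with
    # missing or near-empty captions before training (thresholds are assumptions).
    from pathlib import Path
    from PIL import Image

    MIN_SIDE = 512          # point (i): discard images smaller than 512 px on either side
    MIN_CAPTION_WORDS = 3   # points (ii)/(iii): discard empty or near-empty captions

    def keep(image_path: Path, caption: str) -> bool:
        with Image.open(image_path) as img:
            w, h = img.size
        if min(w, h) < MIN_SIDE:
            return False
        if len(caption.split()) < MIN_CAPTION_WORDS:
            return False
        return True

    # Hypothetical layout: each 0001.jpg has a matching 0001.txt caption file.
    kept = [p for p in Path("dataset").glob("*.jpg")
            if keep(p, p.with_suffix(".txt").read_text(errors="ignore"))]
    print(f"{len(kept)} images pass the basic curation filter")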