AI Art Generation Handbook/Limitations of AI Art Generation

Currently, there are some known limitations of AI art generation (including the latest SDXL 1.0).

My criterion for a limitation is that the AI art generator produces the intended result less than 75% of the time (that is, it cannot get 3 out of 4 images right).
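The criterion above can be sketched as a small check. This is only an illustration: it assumes a human reviewer has already marked each generated image as matching the prompt or not, and it reads the 75% figure as a minimum success rate. The function name `is_limitation` and the sample batch are hypothetical.

```python
def is_limitation(results, threshold=0.75):
    """Return True if the success rate of a batch falls below the threshold.

    `results` holds one boolean per generated image: True when a human
    reviewer judged that the image matches the prompt (an assumption of
    this sketch, since the judging itself is manual).
    """
    if not results:
        raise ValueError("need at least one generated image to judge")
    success_rate = sum(results) / len(results)
    return success_rate < threshold


# Hypothetical batch of 4 images for one prompt: only one image was correct.
batch = [False, True, False, False]
print(is_limitation(batch))  # 1/4 = 25% success, below the 75% bar
```

With this reading, a batch where 3 out of 4 images are correct sits exactly on the bar and is not counted as a limitation.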


No	Image	Description
1		Human Anatomy: Human anatomy is a constant subject of ridicule in AI art generation, most often because of hands and fingers. As shown, this AI-generated woman has several flaws: (i) The woman has 3 hands (ii) The woman has 2 navels (belly buttons) (iii) The woman's right hand, which is touching the rock, has 6 fingers (iv) The heel of the woman's right leg looks deformed. Note: This can potentially be solved by using ControlNet.
2	DALL·E2 - Javan rhinoceros wearing a business suit and safety hard hats , holding a Under Construction signboard with background of construction area	Text Rendition Spelling: The text part of the prompt for this image is actually `"UNDER CONSTRUCTIONS"` (prompted in Sept 2023), but the rendered text is gibberish most of the time (not forming any known English words), at least to native English speakers. However, text rendition is slowly improving with models such as IF-Deepfloyd, DALL-E3 (as of March 2024) and SDXL.
3		Relative Positioning: The original prompt for this picture is `yellow sphere on left , purple pyramid on right`, but as seen, the relative positioning is completely wrong, with the pyramid on the left and the sphere on the right.
4		Object Counting: The original prompt for this SD image is `three rabbits`. However, possibly because the training dataset did not specify the number of objects appearing in each picture, AI art generators often have trouble generating the correct number of objects.
5		Some of the Design Patterns: Stable Diffusion may not have sufficient data / metadata to train on certain types of clothing design patterns. For example, the prompt here asks for a `zig zag` design on the sports bra, but unfortunately Stable Diffusion was unable to generate it in any of the randomly generated pictures. Other known offenders: (a) Herringbone (b) Houndstooth (c) Ogee (d) Paisley. See more here: AI Art Generation Handbook/VACT/Fabric Patterns
6		Cultural Lost in Translation: Many intangible cultural / heritage items are (presumably) overlooked during Stable Diffusion training, as it relies heavily on CLIP for automatic tagging, which is biased towards Western subcultures and ignores many subcultures from non-Western nations. As an example, the picture shown should depict a lady wearing a badlah (a dancing costume from North Africa), but the model generates a loli type of dress instead. For example, it does not recognize: (i) the badlah costume from the North Africa region (ii) the kebaya costume from South East Asia
7		Unable to Generate Many Mythological Creatures: Many AI image models are unable to generate a number of mythological creatures, such as: (i) Cyclops (at times, it will generate this type of copyrighted cyclops) (ii) Centaur (mostly it will generate a man riding a horse in awkward ways) (iii) Pegasus (it will generate a white horse without wings) (iv) Medusa (it will generate a middle-aged Caucasian woman wearing a tiara, without the famous snake hair) (v) Hydra (it will generate the surroundings of the island town that happens to be named Hydra) (vi) Cerberus (it will generate an image of a German Shepherd with only one head) (vii) Kraken (it will generate a Cthulhu-ish type of monster) (viii) Mummy (it will generate a middle-aged Egyptian woman) (ix) Phoenix (it will generate an area in Phoenix, Arizona) (x) Sphinx (it just generates the Sphinx monument in Egypt). Surprisingly, a few mythological creatures seem to be mostly fixed in SDXL, such as: Minotaur, Frost Giant, Anubis
8		Bleeding Concepts: Some concepts are so strong that they "bleed" into other subjects. For example, the intention of the prompt for this image is an anthropomorphic rhinoceros touching up the painting Girl with a Pearl Earring (with the girl still in human form): `Anthropomorphic rhinoceros wearing business suit touching up painting Girl with Pearl Earring with brushes`. At times, changing the word order may improve the chances of getting images that match your intention. Refer here for more examples
9		Limited Training Data on Under-Represented Subjects: In the context of paintings, we may know the more popular ones such as the Mona Lisa or The Great Wave off Kanagawa, but we may not know a painting by the name of "The Self Portrait of Mocker" (apart from the "Classical Art Men Pointing" meme of the late-2000s Internet). For example, the prompt for this image is `"Oil painting of Self-portrait of a Mocker by painter Joseph Ducreux, the painting's subject talking to a smartphone"`, but the generated image does not look anything like the original painting at all. Hence, the "data curator" may need to include more of the under-represented subjects.
10		Potential Tools for Propaganda: A bad actor may misuse AI art generation technology to generate propaganda images for their own benefit. For example, this image was generated by Bing Image Creator (BIC) around September 2023, before the Great Filter Purge, when Bing Image Creator was still able to generate images for prompts such as `Two ISIS terrorists are planting down ISIS flag in deserts of Afghanistan` without any blocks.

Training Image Dataset Issues

According to their white papers, each AI art generation system is trained on its own dataset.

For example, OpenAI's DALL-E is trained using Image-GPT, while Stable Diffusion is trained using Common Crawl and LAION-5B (although it is believed that it was not trained on all 5B images). SDXL is believed to have been trained on LAION-Aesthetics.

As the saying goes, "Garbage In, Garbage Out": if the training images (input) are not properly curated, there is a chance that the output images will be gibberish as well. This is a lesser-known issue, although the AI image models themselves are also fine-tuned over time so that the generated images keep getting better. In general, many of the limitations arise because the training images suffer from the following issues:

(i) Many low-quality pictures [resolution below 512*512 px, or out of focus (but not for aesthetic purposes)]

(ii) Wrong / misleading captions related to the images

(iii) Incomplete captioning of the images

(iv) The image databases are heavily biased towards Western contexts

(v) Absence of certain images / subjects

To solve many of these limitations, more (but expensive) curation is needed to bring the images at least up to the standard of OpenAI's DALL-E (at least for the 2022 versions).
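As a rough illustration of what the cheapest part of such curation might look like, the sketch below filters a list of image records by resolution and caption length. It only touches issues (i) and (iii); wrong captions, cultural bias and missing subjects still need human review. The `ImageRecord` type, the sample records and the three-word caption minimum are all assumptions made for this example and do not describe how any real model's dataset was built.

```python
from dataclasses import dataclass

MIN_SIDE = 512  # pixels; the 512*512 figure from issue (i)


@dataclass
class ImageRecord:
    """Hypothetical metadata for one training image."""
    path: str
    width: int
    height: int
    caption: str


def keep(record, min_side=MIN_SIDE, min_caption_words=3):
    """Keep an image only if it is large enough and has a non-trivial caption.

    The word-count check is a crude stand-in for "incomplete captioning";
    it cannot detect captions that are wrong or misleading.
    """
    big_enough = record.width >= min_side and record.height >= min_side
    has_caption = len(record.caption.split()) >= min_caption_words
    return big_enough and has_caption


# Made-up sample records, for illustration only.
records = [
    ImageRecord("a.jpg", 1024, 768, "woman wearing a red kebaya"),
    ImageRecord("b.jpg", 256, 256, "three rabbits sitting on grass"),
    ImageRecord("c.jpg", 800, 800, "image"),
]

kept = [r.path for r in records if keep(r)]
print(kept)
```

Here `b.jpg` is dropped for being too small and `c.jpg` for having a one-word caption, so only `a.jpg` is kept.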
