AI Art Generation Handbook/Limitations of AI Art Generation
Currently, there are some known limitations of AI art generation (including the latest SDXL 1.0).
My criterion for a limitation is that the AI is unable to generate the intended result at least 75% of the time (3 out of 4 images).
No | Image | Description |
---|---|---|
1 | Human Anatomy: Human anatomy is a perennial subject of ridicule in AI art generation, most often because of hands and fingers. In the example: (i) the woman has 3 hands, (ii) the woman has 2 navels (belly buttons), (iii) the woman's right hand, which is touching the rock, has 6 fingers, and (iv) the woman's right heel looks deformed. | |
2 | Text Rendition (Spelling): The text part of the prompt for this image is actually | |
3 | Relative Positioning: The original prompt for this picture is | |
4 | Object Counting: The original prompt for this SD image is | |
5 | Some Design Patterns: Stable Diffusion may not have had sufficient data or metadata to train on certain types of clothing design patterns. Other known offenders: (a) Herringbone (b) Houndstooth (c) Ogee (d) Paisley. See more here: AI Art Generation Handbook/VACT/Fabric Patterns | |
6 | Cultural Lost in Translation: Many intangible cultural and heritage elements are (presumably) overlooked during Stable Diffusion training, because it relied heavily on CLIP for automatic tagging, which is biased towards Western subcultures and ignores many non-Western ones. For example, the picture on the right should show a lady wearing a badlah (a dancing costume from North Africa), but the model generated a loli-type dress instead. Costumes it does not recognize include: (i) the badlah costume from the North Africa region (ii) the kebaya costume from South East Asia | |
7 | Unable to Generate Many Mythological Creatures: Many AI image models are unable to generate certain mythological creatures. Surprisingly, though, a few mythological creatures do work (this seems to be mostly fixed in SDXL), such as: Minotaur | |
8 | Bleeding Concepts: Some concepts are so strong that they "bleed" into other subjects. For example, the intention of the prompt for this image is anthropomorphic rhinoceroses touching up a painting of Girl with a Pearl Earring (but in human form). | |
9 | Limited Training Data on Underrepresented Subjects: In the context of paintings, we may know the more popular ones, such as the Mona Lisa or The Great Wave off Kanagawa, but we may not know a painting by a name such as "The Self Portrait of Mocker" (apart from the "Classical Art Men Pointing" meme from the late 2000s Internet). For example, the prompt of this image is | |
10 | Potential Tools for Propaganda: A bad actor may misuse AI art generation technology to generate propaganda images for their own benefit. For example, these images were generated by Bing Image Creator (BIC) around September 2023, before the Great Filter Purge, when Bing Image Creator was still able to generate images from these prompts: |
Training Image Dataset Issues
According to their white papers, each AI art generation system uses its own dataset for training.
For example, OpenAI's DALL-E is trained using Image-GPT, and Stable Diffusion using Common Crawl and LAION-5B (although it is believed not to have been trained on all 5 billion images). It is believed that SDXL was trained on LAION-Aesthetics.
As the saying goes, "Garbage In, Garbage Out": if the training images (input) are not properly curated, the output images may be gibberish as well. This is a lesser-known issue, and as time goes on, AI image models are also fine-tuned so that the generated images get better. Still, many of the limitations arise because the training images suffer from the following issues:
(i) Many low-resolution pictures (less than 512*512 px) or out-of-focus pictures (where the blur is not for aesthetic purposes)
(ii) Wrong or misleading captions for the images
(iii) Incomplete captioning of the images
(iv) The image databases are heavily biased towards Western contexts
(v) Absence of certain images / subjects
To solve many of the limitations, more curation (which is expensive) is needed to bring the images up to at least OpenAI's DALL-E standard (at least for the 2022 versions).
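As a rough illustration of the cheapest part of such curation, the Python sketch below filters a list of image records on two of the issues listed above: resolution under 512*512 px and incomplete captions. Everything here is an assumption made for illustration: the `ImageRecord` type, the function names, and the three-word caption threshold are hypothetical and are not taken from any real training pipeline. Wrong captions, out-of-focus images, and cultural bias cannot be caught by checks this simple.

```python
from dataclasses import dataclass

MIN_SIDE_PX = 512       # from the list above: images under 512*512 px are an issue
MIN_CAPTION_WORDS = 3   # assumption: a crude proxy for "incomplete captioning"


@dataclass
class ImageRecord:
    """Hypothetical metadata record for one training image."""
    path: str
    width: int
    height: int
    caption: str


def passes_basic_curation(record: ImageRecord) -> bool:
    """Return True if the record clears the resolution and caption checks."""
    big_enough = record.width >= MIN_SIDE_PX and record.height >= MIN_SIDE_PX
    has_caption = len(record.caption.split()) >= MIN_CAPTION_WORDS
    return big_enough and has_caption


def curate(records: list[ImageRecord]) -> list[ImageRecord]:
    """Keep only the records that pass the basic checks, preserving order."""
    return [r for r in records if passes_basic_curation(r)]


if __name__ == "__main__":
    sample = [
        ImageRecord("a.jpg", 1024, 768, "a woman wearing a kebaya"),
        ImageRecord("b.jpg", 256, 256, "a woman wearing a kebaya"),  # too small
        ImageRecord("c.jpg", 1024, 1024, "dress"),                   # caption too short
    ]
    for record in curate(sample):
        print(record.path)
```

Checks like these only remove the most obvious bad inputs; the expensive part of curation is the human review needed to correct captions and fill in missing subjects.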