AI Art Generation Handbook/Limitations of AI Art Generation
As of now, AI art generation models still have limitations, including the latest FLUX 1.0-DEV.
My criterion for a limitation is that the AI art model fails to generate the subject correctly at least 75% of the time (3 out of 4 images).
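The criterion above comes down to simple arithmetic over a batch of generations. Here is a minimal sketch, assuming it means "the model fails on at least 75% of attempts" (for example, 3 failed images out of a batch of 4). The function name `is_limitation` and its parameters are hypothetical and exist only for illustration.

```python
def is_limitation(failures: int, attempts: int, threshold: float = 0.75) -> bool:
    """Return True if the failure rate meets or exceeds the threshold.

    Assumption: the criterion is read as "the model fails to generate the
    subject correctly in at least 75% of attempts" (e.g. 3 out of 4 images).
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= failures <= attempts:
        raise ValueError("failures must be between 0 and attempts")
    return failures / attempts >= threshold


# A batch of 4 images, as in the criterion
print(is_limitation(3, 4))  # True  (3/4 = 0.75)
print(is_limitation(2, 4))  # False (2/4 = 0.50)
```

The threshold is a parameter, so a stricter or looser cut-off can be tested without changing the function.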
No | Limitation | Description |
---|---|---|
1 | Human Anatomy | Human anatomy is a constant subject of ridicule in AI art generation, most often because of hands and fingers. In the example image: (i) the woman has three hands; (ii) she has two navels (belly buttons); (iii) her right hand, which is touching the rock, has six fingers; (iv) the heel of her right leg looks deformed. |
2 | Text Rendition (Spelling) | The relevant part of the text prompt for the image is actually: |
3 | Relative Positioning | The image was originally prompted as: |
4 | Object Counting | Originally, the prompt for this SDXL image is: |
5 | Some Design Patterns | AI models may not have sufficient data or metadata to learn certain clothing design patterns. Other known offenders: (a) herringbone, (b) houndstooth, (c) ogee, (d) paisley. See more here: AI Art Generation Handbook/VACT/Fabric Patterns |
6 | Subject's Interaction with Other Subjects / Objects | AI models are unable to generate many everyday actions, such as: |
7 | Culture Lost in Translation | Many intangible cultural heritage elements are (presumably) overlooked during AI model training, because training relies heavily on CLIP for automatic tagging, which is biased towards Western subcultures and ignores many non-Western ones. For example, the example image should show a lady wearing a badlah (a dance costume from North Africa), but the model generates a loli-style dress instead. Models do not recognize, for example: (i) the badlah costume from the North Africa region; (ii) the kebaya costume from Southeast Asia. |
8 | Unable to Generate Many Mythological Creatures | Many AI image models are unable to generate mythological creatures such as: Surprisingly, though, a few mythological creatures, such as the Minotaur, seem to be mostly fixed in SDXL. |
9 | Bleeding Concepts | Some concepts are so strong that they "bleed" into other subjects. For example, the intended prompt for this image is anthropomorphic rhinoceroses touching up paintings of Girl with a Pearl Earring, with the girls still in human form. |
10 | Limited Training Data on Under-Represented Subjects | In the context of paintings, we may know popular ones such as the Mona Lisa or The Great Wave off Kanagawa, but we may not know the name of a painting like "The Self Portrait of Mocker" (apart from the "Classical Art Men Pointing" meme on the late-2000s Internet). For example, the prompt for this image is: |
11 | Unable to Understand Negation | Many AI image models, up to this point, are unable to understand negation (i.e., the absence of something). For example, the prompt for this image is: However, the model fails to understand the negation and still produces a woman with a moustache. |
12 | Abstract Combinations | Concepts that are rarely seen together in the real world (such as a penguin and bamboo in this example) may not be well represented in the training data, so the model may struggle to generate the combination accurately. |
13 | Diversity in Image Training Dataset | The prompt is: See this news link for more detailed insight: https://www.theverge.com/2024/4/3/24120029/instagram-meta-ai-sticker-generator-asian-people-racism |
14 | Potential Tool for Propaganda | A bad actor may misuse AI art generation to create propaganda images for their own benefit. For example, these images were generated by Bing Image Creator (BIC) around September 2023, before the Great Filter Purge, when Bing Image Creator was still able to generate images from these prompts: |
Training Image Dataset Issues
According to their white papers, each AI art generation system uses its own dataset for training.
For example, OpenAI's DALL-E is trained using Image-GPT, while Stable Diffusion is trained using Common Crawl and LAION-5B (though it is believed that it was not trained on all 5 billion images). SDXL is believed to be trained on LAION-Aesthetics. See also the Conceptual 12M dataset: https://github.com/google-research-datasets/conceptual-12m
As the saying goes, "Garbage in, garbage out": if the training images (the input) are not properly curated, the output images may be gibberish as well. This is a lesser-known issue, and over time the AI image models themselves are fine-tuned so that the generated images keep getting better. Still, many limitations arise because the training images suffer from the following issues:
(i) Low-resolution pictures (smaller than 512×512 px) or out-of-focus pictures (where the blur is not an aesthetic choice)
(ii) Wrong / misleading captions related to the images
(iii) Incomplete captioning of the images
(iv) The image database is heavily biased towards Western contexts
(v) Absence of certain images / subjects
To solve many of these limitations, more (albeit expensive) curation of the input images is needed, at least to the standard of OpenAI's DALL-E (as of its 2022 versions).
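Several of the issues listed above can be flagged automatically before expensive human curation. The following is a hypothetical sketch, not any real dataset pipeline. The `ImageRecord` fields, the `region` label, the `clip_score` value (assumed to be precomputed by some image-text similarity model), and all thresholds are assumptions for illustration. It rejects records for issues (i), (ii), and (iii), and reports skew (iv) and absent subjects (v). Out-of-focus detection is not covered.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """One image-caption pair. All fields are hypothetical illustrations."""
    url: str
    width: int
    height: int
    caption: str
    region: str        # hypothetical cultural-context label, e.g. "Western"
    clip_score: float  # hypothetical precomputed image-text similarity


MIN_SIDE = 512          # issue (i): reject pictures smaller than 512x512 px
MIN_CAPTION_WORDS = 3   # issue (iii): assumed cut-off for "incomplete" captions
MIN_CLIP_SCORE = 0.28   # issue (ii): assumed threshold for mismatched captions


def rejection_reasons(rec: ImageRecord) -> list[str]:
    """List every reason this record fails the (assumed) quality checks."""
    reasons = []
    if rec.width < MIN_SIDE or rec.height < MIN_SIDE:
        reasons.append("low resolution")
    if len(rec.caption.split()) < MIN_CAPTION_WORDS:
        reasons.append("incomplete caption")
    if rec.clip_score < MIN_CLIP_SCORE:
        reasons.append("caption may not match image")
    return reasons


def curate(records):
    """Split records into kept ones and (record, reasons) rejections."""
    kept, rejected = [], []
    for rec in records:
        reasons = rejection_reasons(rec)
        if reasons:
            rejected.append((rec, reasons))
        else:
            kept.append(rec)
    return kept, rejected


def region_share(records):
    """Issue (iv): fraction of records per region label, to expose skew."""
    counts = Counter(rec.region for rec in records)
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()} if total else {}


def missing_subjects(records, wanted):
    """Issue (v): subjects from `wanted` that no caption mentions (substring match)."""
    text = " ".join(rec.caption.lower() for rec in records)
    return [s for s in wanted if s.lower() not in text]


records = [
    ImageRecord("a.jpg", 1024, 768, "a woman wearing a kebaya in a garden", "Southeast Asia", 0.31),
    ImageRecord("b.jpg", 256, 256, "cat", "Western", 0.35),
    ImageRecord("c.jpg", 800, 800, "man in a suit on a city street", "Western", 0.12),
]
kept, rejected = curate(records)
print([r.url for r in kept])  # ['a.jpg']
for rec, reasons in rejected:
    print(rec.url, reasons)
print(region_share(records))
print(missing_subjects(records, ["kebaya", "badlah"]))  # ['badlah']
```

The substring match in `missing_subjects` is deliberately crude. A real check would need synonyms and other languages, which is exactly where the Western bias in captions shows up.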