AI Art Generation Handbook/Limitations of AI Art Generation
Currently, AI art generation has some known limitations (including in the latest SDXL 1.0).
My criterion for a limitation is that the AI generates the subject correctly less than 75% of the time (fewer than 3 out of 4 images).
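The criterion above can be sketched as a small check. This is only an illustration, assuming the criterion means "success rate below 75%"; the function name `is_limitation` and the tallies are hypothetical, and judging whether an image is "correct" is still done by eye.

```python
def is_limitation(correct_images: int, total_images: int,
                  threshold: float = 0.75) -> bool:
    """Return True when the share of correct images is below the threshold."""
    if total_images <= 0:
        raise ValueError("total_images must be positive")
    return correct_images / total_images < threshold


# Hypothetical tallies from manually reviewing 4 images per prompt.
print(is_limitation(2, 4))  # 2/4 = 0.50, below 0.75 -> True
print(is_limitation(3, 4))  # 3/4 = 0.75, not below 0.75 -> False
```

With only 4 images per prompt the estimate is rough; a larger batch per prompt would make the 75% cut-off more meaningful.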
1. Human anatomy
Human anatomy remains a frequent subject of ridicule in AI art generation. In the example image:
(i) The woman has 3 hands
(ii) The woman has 2 navels (belly buttons)
(iii) The woman's right hand, which is touching the rock, has 6 fingers
(iv) The heel of the woman's right leg looks deformed
2. Text spelling
The rendered text looks correct at the level of individual letters (it follows the English alphabet), but the words are gibberish, at least to native English speakers.
3. Relative positioning
The picture does not match its original prompt: as seen, the relative positioning is completely wrong.
4. Some of the design patterns
Stable Diffusion may not have had sufficient data or metadata to train on certain types of clothing design patterns.
Other known offenders:
See more here: List of Design Patterns in Stable Diffusion
5. Cultural lost in translation
Many intangible cultural and heritage elements appear to have been overlooked during Stable Diffusion training. Presumably this is because the training relied heavily on CLIP for automatic tagging, which is biased towards Western subcultures and ignores many subcultures from non-Western nations.
As an example, the picture on the right should show a lady wearing a badlah (a dance costume from North Africa), but the model generates a Lolita-style dress instead.
Among others, it does not recognize:
(i) the badlah costume from the North Africa region
(ii) the kebaya costume from South East Asia
6. Unable to generate many mythological creatures
Stable Diffusion is unable to generate many mythological creatures, such as:
(i) Cyclops (at times, it will generate this type of cyclops)
Surprisingly, a few mythological creatures seem to be mostly fixed in SDXL, such as:
Steps to overcome
From what I understand, each AI art generation system is trained on its own dataset.
As the saying goes, "Garbage In, Garbage Out". More images are generally better, but many of the limitations arise because the training images suffer from the following issues:
(i) Many pictures are low resolution (less than 512×512 px) or out of focus (and not for aesthetic purposes)
(ii) Wrong or misleading captions attached to the images
(iii) Incomplete captioning of the images
(iv) The image database leans towards Western content
To solve many of these limitations, more (but expensive) curation is needed to bring the images at least up to the standard of OpenAI's DALL-E (at least the 2022 versions).
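Part of such curation could be automated. The sketch below is a minimal illustration, not how any real dataset is curated: the `ImageRecord` fields, the `in_focus` flag (assumed to come from a separate check) and the 3-word caption minimum are all assumptions. It covers only issues (i) and (iii); wrong captions (ii) and Western bias (iv) still need human review.

```python
from dataclasses import dataclass
from typing import List

MIN_SIDE = 512  # minimum width and height in pixels, from issue (i)


@dataclass
class ImageRecord:
    # Hypothetical metadata record; the field names are assumptions.
    path: str
    width: int
    height: int
    caption: str
    in_focus: bool  # assumed result of a separate manual or automated check


def curation_problems(rec: ImageRecord, min_caption_words: int = 3) -> List[str]:
    """List the simple, mechanically checkable problems of one record."""
    problems = []
    if rec.width < MIN_SIDE or rec.height < MIN_SIDE:
        problems.append("low resolution")
    if not rec.in_focus:
        problems.append("out of focus")
    if len(rec.caption.split()) < min_caption_words:
        problems.append("incomplete caption")
    return problems


# Made-up records for illustration only.
records = [
    ImageRecord("kebaya.png", 1024, 1024, "a woman wearing a green kebaya", True),
    ImageRecord("thumb.png", 256, 256, "dress", True),
    ImageRecord("blurry.png", 768, 768, "a dancer wearing a badlah costume", False),
]

# Keep only the records with no detected problems.
kept = [r for r in records if not curation_problems(r)]
for r in records:
    print(r.path, curation_problems(r))
```

A word count is a very crude stand-in for caption completeness; it only catches captions that are obviously too short.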