AI Art Generation Handbook/Training/Dataset

From Wikibooks, open books for an open world

Type of training

Before you start training, first consider the concept you want to train.

Based on limited experimentation, Dreambooth appears able to perform four types of training:

(i) Introduce a totally new concept to the model

As of Stable Diffusion 1.5, the model can generate many kinds of things, but there are still quite a few that it cannot generate.

One example is a new concept such as a centaur.

(ii) Add a dataset to an existing concept, but under a separate "token"

This is the more usual route: the concept (for example "male" / "woman") already exists, but you add a dataset of yourself so that the output looks more like the dataset images.

(iii) Fine-tune an existing concept

The concept already exists in the model, but because of CLIP limitations or a limited image dataset, the model may not generate it properly.

(iv) Force an existing concept to learn a different concept

This is strongly discouraged, because many of the concepts you would need to retrain may be linked together.
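The difference between training under a separate "token" (type ii) and reusing the existing word (types iii and iv) can be illustrated with a minimal Python sketch. The helper `caption_for`, the prompt template "a photo of ..." and the placeholder token `sks` are assumptions chosen for illustration (`sks` is a placeholder often seen in Dreambooth tutorials); they are not prescribed by this handbook or by any particular training tool.

```python
from typing import Optional


def caption_for(concept_word: str, token: Optional[str] = None) -> str:
    """Build an illustrative training caption.

    With a token, the images are attached to a separate identifier placed
    in front of the existing concept word (type ii). Without a token, the
    existing concept word itself is trained on (types iii and iv).
    The "a photo of ..." template is an assumption, not a requirement.
    """
    if token:
        return f"a photo of {token} {concept_word}"
    return f"a photo of {concept_word}"


# Type (ii): your own photos, kept apart from the generic "woman" concept.
print(caption_for("woman", token="sks"))

# Type (iii): fine-tune the concept the model already knows as "rhino".
print(caption_for("rhino"))
```

With a separate token, the generic word should keep roughly its original meaning, which is one reason type (ii) tends to be a safer route than type (iv).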

Source of Images

There is a wide range of free photos to choose from when you want to train a model on images you have found.

Here is a list of free stock photo sites that you can use:

Wikimedia Commons

Unsplash

Pexels

Pixabay

Flickr Creative Commons

FreeImages

Game Art for Glitch

Josh Game Asset

Quality of Images

You may have heard many Dreambooth tutorials mention that the dataset must be of high quality. The quality of the output images produced by an AI art generative model is directly related to the quality of the input images used to train it. If the input images are of low quality, contain noise or artifacts, or are poorly composed, the output images will have similar issues.

Images should have the following attributes:

(a) Diverse but consistent

This is an example of how to build a diverse dataset for training an object or style, while making sure the subject (in this case a rhino) is always the main focus of each training image.

Note: this is for reference only; your specific use case may differ from what is stated here.




Type of medium:

Note: Whenever possible, avoid including training images that have the following characteristics:

(i) More than one subject in the same picture (even if they are the same kind of subject)

(ii) Inconsistent distinctive features (if you train with one horn, ideally every image in the dataset should show one horn)

(b) Noiseless: no compression artefacts or similar defects

(c) Enough Resolution

(d) Clear and Consistent Lighting
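Some of the checks above can be partly automated. The following is a minimal Python sketch, not a definitive tool: the `ImageRecord` structure, its hand-annotated `subject_count` and `horn_count` fields, the file names and the 512-pixel threshold are all assumptions made for illustration (512 is chosen because Stable Diffusion 1.5 is commonly trained at 512x512). Noise and lighting still need to be judged by eye.

```python
from collections import Counter
from dataclasses import dataclass

MIN_SIDE = 512  # assumption: shortest side expected for SD 1.5 training


@dataclass
class ImageRecord:
    name: str
    width: int
    height: int
    subject_count: int  # hand-annotated (hypothetical field)
    horn_count: int     # hand-annotated distinctive feature (hypothetical)


def screen(records, min_side=MIN_SIDE):
    """Split records into kept names and (name, reason) rejections."""
    kept, rejected = [], []
    for r in records:
        if min(r.width, r.height) < min_side:
            rejected.append((r.name, "resolution too low"))
        elif r.subject_count != 1:
            rejected.append((r.name, "expected exactly one subject"))
        else:
            kept.append(r.name)
    return kept, rejected


def feature_outliers(records):
    """Names of images whose horn count differs from the most common one."""
    if not records:
        return []
    majority, _ = Counter(r.horn_count for r in records).most_common(1)[0]
    return [r.name for r in records if r.horn_count != majority]


# Hypothetical dataset of rhino photos.
dataset = [
    ImageRecord("rhino_01.jpg", 1024, 768, 1, 1),
    ImageRecord("rhino_02.jpg", 400, 300, 1, 1),
    ImageRecord("rhino_03.jpg", 800, 800, 2, 1),
    ImageRecord("rhino_04.jpg", 600, 600, 1, 2),
]
kept, rejected = screen(dataset)
print("kept:", kept)
print("rejected:", rejected)
print("inconsistent feature:", feature_outliers(dataset))
```

In practice the width and height could be read with an imaging library such as Pillow (`Image.open(path).size`), while the subject and feature counts would have to be annotated by hand or by a separate detector.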