AI Art Generation Handbook/Prompting in Stable Diffusion Style/GUI Interface

From Wikibooks, open books for an open world

Assuming you have successfully installed Stable Diffusion by following the installation instructions, you should see the following screen:

| Tab | Function / Description |
|---|---|
| txt2img | The text you type in (called the prompt from here onward) is turned into images that more or less fit the description. The results may look quite different from, or close to, what you imagined. |
| img2img | An input image, usually a simple sketch, is combined with a prompt that guides how the final image will look (a precursor to ControlNet). |
| Extras | Upscale (enlarge) existing images. |
| PNG Info | Recover the settings used to generate an image (prompt, seed, CFG Scale, etc.) from its metadata. Note 1: If the image comes from Reddit or Facebook, the metadata is usually stripped and no usable info can be retrieved. Note 2: This only works for images generated with Automatic1111 or SD.Next. |
| Checkpoint Merger | Merge multiple models without doing any model training. |
| Train | Train using Textual Inversion (TI) methods. |
| Settings | All of the Stable Diffusion settings are found here. |
| Extensions | Manage extensions. See here for more details. |
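
The PNG Info tab described above relies on text metadata embedded in the image file. As a rough illustration only, the sketch below reads the `tEXt` chunks of a PNG using nothing but the Python standard library. It assumes the generation settings are stored under a `parameters` keyword, as Automatic1111 is understood to do; the sample settings text and the helper names `read_png_text_chunks` and `make_chunk` are made up for this example, and other chunk types (such as `iTXt`) are ignored.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return the keyword/text pairs stored in a PNG's tEXt chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk starts with a 4-byte length and a 4-byte type.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt layout: keyword, a null byte, then Latin-1 text.
            keyword, _, text = body.partition(b"\x00")
            texts[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 8 + length + 4  # header + data + CRC
    return texts


def make_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Build one PNG chunk (length, type, data, CRC) for the demo below."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


# Demo: a hand-built PNG skeleton (header and text only, no pixel data).
# The settings text is invented sample data, not real generator output.
sample_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"tEXt", b"parameters\x00a cat\nSteps: 20, Seed: 42")
    + make_chunk(b"IEND", b"")
)
print(read_png_text_chunks(sample_png)["parameters"])
```

For a real file, the same function can be called on the bytes read with `open(path, "rb")`. If the result has no `parameters` entry, the metadata was most likely stripped or the image came from a different tool.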


To start generating AI art in Stable Diffusion, just type anything (yes, anything) you have in mind into the first text field.

Just remember that Stable Diffusion has a maximum limit of 75 tokens.

So you may wonder: what is a token?

A token is a sequence of characters that represents a single unit of meaning in a text. It is a fundamental concept in natural language processing (NLP), as most NLP models operate at the token level, meaning they process text one token at a time.

To understand tokens, consider the following sentence: "The quick brown fox jumped over the lazy dog." In a simple word-level scheme, each word in this sentence is a separate token. Each token has its own meaning, and together the tokens convey the meaning of the entire sentence.

In the context of AI language models, tokens are created by a process called tokenization, which breaks a text down into individual tokens or words. This process can involve removing punctuation, lowercasing the text, and handling special cases such as contractions.

Once a text has been tokenized, the tokens can be further processed and analyzed by an AI language model.
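
The word-level view of tokenization described above can be sketched in a few lines of Python. This is only a simplified illustration: the function names `simple_tokenize` and `fits_limit` are made up for this example, and the tokenizer actually used by Stable Diffusion splits text into sub-word units, so real token counts can differ from the word counts shown here.

```python
import re

TOKEN_LIMIT = 75  # the limit quoted in the text above


def simple_tokenize(prompt: str) -> list:
    """Lowercase the prompt, drop punctuation, and split it into words."""
    # Apostrophes are kept so contractions such as "don't" stay in one piece.
    return re.findall(r"[a-z0-9']+", prompt.lower())


def fits_limit(prompt: str, limit: int = TOKEN_LIMIT) -> bool:
    """Check whether the word-level token count stays within the limit."""
    return len(simple_tokenize(prompt)) <= limit


sentence = "The quick brown fox jumped over the lazy dog."
tokens = simple_tokenize(sentence)
print(tokens)
# ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']
print(len(tokens), fits_limit(sentence))
# 9 True
```

A count like this is best treated as a rough estimate when judging whether a long prompt is likely to approach the limit.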

Face Restoration

When generating images of human faces, it is highly recommended to use CodeFormer (instead of GFPGAN).

Template

Here are some sample templates for generating the following kinds of images.

Replace the parameters enclosed in < > brackets with your own values.

| Target | Sample prompt | Negative prompt |
|---|---|---|
| Plausible realistic human face | A realistic photograph of <ethnicity> wearing <type of wardrobe wear>, <describe activity> in <describe place> | Cartoon, anime, drawing, sketches, CGI |
| Product photoshoot as seen on e-commerce websites | High quality professional studio product photoshoot of <products> product, ((white backgrounds isolated)), (isometric view) | cluttered, off centered, cropped, collage, montage, grid, series, human |
| Mermaid with underwater effect shots | realistic photo of beautiful <ethnicity> mermaid tail, partial underwater shot, lower body in water, lower half frame underwater, upper half frame sky, blue sky | nude, leg, upper legs, lower legs, split tails, conjoined tails |
| Images set in outer space | cinematic, dark lighting, high resolution, sharp focus | |
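
If you reuse these templates often, filling in the < > parameters can be automated. The following is a minimal sketch, assuming shortened placeholder names (`clothing`, `activity`, `place`) and a hypothetical helper called `fill_template`; neither is part of Stable Diffusion or its web interface.

```python
import re


def fill_template(template: str, values: dict) -> str:
    """Replace every <name> placeholder in the template with values[name]."""

    def substitute(match):
        name = match.group(1).strip()
        if name not in values:
            raise KeyError(f"no value given for placeholder <{name}>")
        return values[name]

    return re.sub(r"<([^<>]+)>", substitute, template)


# Shortened version of the "realistic human face" template (assumed wording).
template = "A realistic photograph of a <ethnicity> person wearing <clothing>, <activity> in <place>"
prompt = fill_template(
    template,
    {
        "ethnicity": "Japanese",
        "clothing": "a red raincoat",
        "activity": "walking",
        "place": "a rainy street",
    },
)
print(prompt)
# A realistic photograph of a Japanese person wearing a red raincoat, walking in a rainy street
```

Raising an error for a missing value is a deliberate choice here: it stops a literal `<placeholder>` from silently ending up in the final prompt.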