AI Art Generation Handbook/Stable Diffusion settings


Behind the digital canvas of Stable Diffusion lies a palette of adjustable parameters, akin to an artist's selection of brushes, paints, and techniques. These settings allow creators to fine-tune every aspect of the image generation process, from broad compositional elements to the finest details. Understanding and mastering these parameters is crucial for artists looking to harness the full potential of Stable Diffusion, enabling them to bridge the gap between imagination and the generated image.

Prompt

In this case, we will be using the following prompt in ComfyUI (other interfaces can also be used):

Anthropomorphic rhinoceros wearing business suit in NYSE trading floor, screaming with his hands on his cheek after seeing the stock market crashed

We will modify each parameter in turn to see how it changes the generated image.


These are the original settings for the images:

Seeds: 368039721048954

Steps: 80

CFG: 7

Sampler_name: DDIM

Scheduler: Normal
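
For readers who prefer scripting over a graphical interface, the same baseline can be sketched with the Hugging Face diffusers library. This is a minimal, illustrative example rather than the exact ComfyUI workflow used for the images in this chapter: the checkpoint ID is only an assumption (any Stable Diffusion 1.x model works), and the outputs will not match the ComfyUI renders pixel-for-pixel.

import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Load an SD 1.x checkpoint (example repo id) and select the DDIM sampler
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

prompt = ("Anthropomorphic rhinoceros wearing business suit in NYSE trading floor, "
          "screaming with his hands on his cheek after seeing the stock market crashed")

image = pipe(
    prompt,
    num_inference_steps=80,                                          # Steps: 80
    guidance_scale=7.0,                                              # CFG: 7
    generator=torch.Generator("cuda").manual_seed(368039721048954)   # Seed
).images[0]
image.save("baseline.png")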

Parameters

Seeds

The seed is a numerical value that serves as the starting point for the image generation process. This number is used by Stable Diffusion to generate a specific pattern of noise, which then forms the foundation for the resulting image.

The significance of seeds lies in their ability to ensure reproducibility. Every image generated by Stable Diffusion has a seed associated with it, acting as a sort of master key or fingerprint for that particular image. While the process may appear random to human observers, the algorithms used for noise generation and image production are actually deterministic given the same inputs. This means that the same seed will reliably produce the same initial noise pattern, which, when combined with other parameters such as the text prompt and settings, will result in a predictable output.

Anyone with knowledge of an image's seed (together with the other generation settings) can recreate that exact image or produce variations of it, making seeds invaluable for consistency and iteration in image generation. [1] [2] [3]

In Automatic1111, a seed value of "-1" means a randomly assigned seed number will be used for each generation.
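
As a rough sketch of how seeds behave in code (using the diffusers library; the checkpoint ID and the 32-bit seed range are assumptions), running the same seed twice with identical settings returns the same image, while the "-1" behaviour of Automatic1111 can be emulated by drawing a random seed first:

import random
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

def generate(seed: int):
    if seed == -1:                                 # emulate Auto1111's "-1" = random seed
        seed = random.randint(0, 2**32 - 1)
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe("anthropomorphic rhinoceros in a business suit",
                 generator=generator, num_inference_steps=30).images[0]
    return image, seed

image_a, used_seed = generate(368039721048954)
image_b, _ = generate(368039721048954)             # same seed + settings -> same image
image_c, random_seed = generate(-1)                # a fresh random seed each call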

Seed variance examples


CFG Scales (Classifier Free Guidance)

CFG controls how closely the generated image adheres to the given text prompt or input image. This scale acts as a balancing mechanism between strict prompt adherence and creative interpretation.

Higher CFG values (typically ranging from 8 to 30) result in images that more closely follow the input prompt, ensuring greater fidelity to the user's description. Conversely, lower values (with 1 being the minimum) grant the AI more creative freedom, potentially producing more diverse and imaginative results that may deviate from the original prompt.

The default CFG value is often set at 7, offering a good starting point that balances the system's adherence to instructions with some creative latitude. Users can adjust this value based on their specific needs, whether they prioritize prompt fidelity or desire more unexpected and creative outcomes.

In practice, finding the optimal CFG value often involves experimentation to achieve the right balance between the user's artistic vision and the AI's interpretation, ultimately producing the desired image quality and style. [4] [5] [6]

P.S: Take note that a higher CFG doesn't necessarily mean better art. It can lead to less diversity and potentially lower overall image quality, as the AI focuses intensely on prompt adherence at the expense of other aspects.
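
A simple way to see this trade-off is to sweep the CFG value while holding the seed, steps and sampler fixed. The sketch below uses the diffusers library, where CFG is exposed as guidance_scale; the checkpoint ID is an assumption.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
prompt = "Anthropomorphic rhinoceros wearing business suit in NYSE trading floor"

for cfg in (1, 4, 7, 12, 20, 30):                  # low = creative, high = literal
    image = pipe(prompt,
                 guidance_scale=cfg,               # the CFG scale
                 num_inference_steps=80,
                 generator=torch.Generator("cuda").manual_seed(368039721048954)
                 ).images[0]
    image.save(f"cfg_{cfg}.png")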

Steps

Steps refer to the number of iterations the AI goes through to transform random noise into a coherent image based on the given prompt. This process, known as "de-noising," gradually refines the image from a completely random state to the final output.

The image generation always begins with random noise and ends with a fully de-noised image, regardless of the number of steps. What the step count controls is how finely this process is divided. A higher step count means the AI makes smaller, more gradual changes in each iteration, potentially leading to more refined results, although most of the time the difference is not noticeable.

It's important to note that the effectiveness of step count can vary with different samplers. Modern, fast samplers like DDIM and DPM Solver++ often produce high-quality results with fewer steps compared to older samplers like LMS.

The relationship between steps and image quality isn't always linear. Too few steps may result in underdeveloped images, while too many steps can lead to unnecessary details or duplications, especially with simpler prompts or styles. [7] [8] [9]

P.S: Take note that the lower the step count, the faster the image generation, but image aesthetics generally suffer at low step counts. (An exception is some AI models, such as Stable Cascade, which are able to generate good images in as few as 20 steps.)
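
The effect of the step count can be checked the same way, by sweeping num_inference_steps with everything else fixed. Again this is a hedged diffusers sketch rather than the ComfyUI workflow used above, and the checkpoint ID is an assumption.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
prompt = "Anthropomorphic rhinoceros wearing business suit in NYSE trading floor"

for steps in (10, 20, 40, 80, 150):                # generation time grows with steps
    image = pipe(prompt,
                 num_inference_steps=steps,        # the Steps setting
                 guidance_scale=7.0,
                 generator=torch.Generator("cuda").manual_seed(368039721048954)
                 ).images[0]
    image.save(f"steps_{steps}.png")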


"Big 5" Sampler

Samplers in Stable Diffusion are algorithms that control how the AI denoises and refines an image during generation. This denoising process is called sampling because Stable Diffusion generates a new sample image at each step. The method used for sampling is called the sampler or sampling method.

Different samplers offer various trade-offs between speed, quality, and control.


Here are some of the types of sampler used in Stable Diffusion (as of Jan 2023):

Euler, Euler a, LMS, Heun, DPM2, DPM2 a, DPM++ 2S a, DPM++ 2M, DPM++ SDE, DPM fast, DPM adaptive, LMS Karras, DPM2 Karras, DPM2 a Karras, DPM++ 2S a Karras, DPM++ 2M Karras, DPM++ SDE Karras, DDIM, PLMS


We focus on the "Big 5" samplers that are usually used by beginners, as they produce broadly similar results.

Note: All of the sampler comparisons are generated with Seed 368039721048954, CFG 7 and Steps 80 for consistency.
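
If you want to reproduce a similar comparison in code, the sketch below swaps between scheduler classes in the diffusers library that roughly correspond to the Big 5 sampler names used by the Web UIs. The mapping and the checkpoint ID are assumptions, and the outputs will not match the ComfyUI images exactly.

import torch
from diffusers import (StableDiffusionPipeline, EulerDiscreteScheduler,
                       HeunDiscreteScheduler, KDPM2DiscreteScheduler,
                       LMSDiscreteScheduler, DDIMScheduler)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
prompt = "Anthropomorphic rhinoceros wearing business suit in NYSE trading floor"

big5 = {"euler": EulerDiscreteScheduler,           # Euler
        "heun": HeunDiscreteScheduler,             # Heun
        "dpm2": KDPM2DiscreteScheduler,            # DPM2
        "lms": LMSDiscreteScheduler,               # LMS
        "ddim": DDIMScheduler}                     # DDIM

for name, scheduler_cls in big5.items():
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)  # swap the sampler
    image = pipe(prompt, num_inference_steps=80, guidance_scale=7.0,
                 generator=torch.Generator("cuda").manual_seed(368039721048954)
                 ).images[0]
    image.save(f"sampler_{name}.png")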

Euler:

  • Euler is often the default sampler in many Stable Diffusion interfaces.
  • It offers a good balance of speed and quality, making it suitable for a wide range of applications.
  • Euler a, the ancestral variant of Euler, tends to produce significantly different images as the sampling step count increases.

Heun:

  • Heun is a second-order method, which means it is generally more accurate than first-order methods like the basic Euler method. Intuitively, the method takes a quick look ahead and then adjusts its prediction based on what it sees; this extra correction step often results in more accurate predictions, especially when conditions are changing quickly (see the sketch after this list).
  • The sampler is named after Heun's method, a numerical method used to solve ordinary differential equations.
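
The "look ahead" idea can be illustrated on a toy ordinary differential equation, outside of Stable Diffusion entirely. In the plain-Python sketch below, Heun's method first makes an Euler prediction, then averages the slope at the current point with the slope at the predicted point, which lands noticeably closer to the exact answer than Euler with the same step size.

# Toy comparison of Euler vs Heun when integrating dy/dt = -y with y(0) = 1
import math

def euler(f, y, t, h, n):
    for _ in range(n):
        y = y + h * f(t, y)            # step using only the current slope
        t += h
    return y

def heun(f, y, t, h, n):
    for _ in range(n):
        k1 = f(t, y)                   # slope now
        y_pred = y + h * k1            # Euler prediction (the "look ahead")
        k2 = f(t + h, y_pred)          # slope at the predicted point
        y = y + h * (k1 + k2) / 2      # average the two slopes
        t += h
    return y

f = lambda t, y: -y
print("exact:", math.exp(-1.0))                 # ~0.3679
print("euler:", euler(f, 1.0, 0.0, 0.25, 4))    # ~0.3164
print("heun :", heun(f, 1.0, 0.0, 0.25, 4))     # ~0.3725, much closer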

DPM2:

LMS:

  • LMS uses information from several previous steps to calculate the next step, unlike single-step methods such as Euler; as a result, it can produce noise artifacts at lower step counts but stabilizes at higher steps.
  • The name LMS comes from Linear Multistep methods, a class of numerical methods used for solving ordinary differential equations. They have been used in various fields of science and engineering before being adapted for use in diffusion models.


DDIM (Denoising Diffusion Implicit Models):

  • DDIM enables faster sampling by allowing larger step sizes without significantly compromising image quality. DDIM typically produces high-quality images even at lower step counts compared to other samplers.
  • Usually the result is quite similar to Euler.

LCM (Latent Consistency Models):

  • LCM aims to generate high-quality images in significantly fewer steps than traditional diffusion models by leveraging a novel consistency-based sampling approach.
  • Instead of gradually denoising an image, LCM uses a consistency model to directly predict the final output.
  • Based on the sample tests, only a narrow range of CFG values is "usable" for normal AI art generation (roughly CFG 3 to CFG 7), while higher CFG values push the images towards psychedelic art styles.
  • Due to this, it trades some fine detail and photorealism for fast generation speed relative to the Big 5 samplers (see the sketch after this list).
  • Each step count can produce wildly different, cartoony image results. [See sample image below]
  • The name LCM stands for Latent Consistency Models; it was introduced in the paper "Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference" by Tsinghua University researchers.
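
For reference, a common way to try LCM-style few-step sampling in code is the LCM-LoRA workflow of the diffusers library, sketched below. The repository IDs are assumptions, and note that this particular workflow usually expects an even lower guidance scale (around 1-2) than the CFG 3-7 range observed in the sampler tests above.

import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)    # LCM sampler
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")        # LCM-LoRA weights

image = pipe(
    "Anthropomorphic rhinoceros wearing business suit in NYSE trading floor",
    num_inference_steps=6,         # far fewer steps than the Big 5 samplers
    guidance_scale=1.5,            # LCM-LoRA expects a low CFG
    generator=torch.Generator("cuda").manual_seed(368039721048954)
).images[0]
image.save("lcm_test.png")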

UniPC

  • UniPC adaptively adjusts the balance between prediction and correction steps in a unified manner to accelerate the sampling process of diffusion models (within its recommended parameter range). It also employs a dynamic weighting strategy to optimize the sampling trajectory.
  • Based on tests, if the step count becomes high (>60), the image results appear to be more biased towards abstract art forms. [See sample image below]
  • UniPC stands for Unified Predictor-Corrector; it was introduced in the paper "UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models" by researchers at Tsinghua University.

DPM Variants

  • DPM++ SDE Karras uses stochastic sampling, resulting in large variations between generated images.
  • DPM++ 2M Karras is known for producing high-quality, detailed images, making it ideal for complex scenes or portraits.
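
In code, both variants can be approximated with the diffusers library, where the "Karras" part of the name corresponds to the use_karras_sigmas option. The class mapping and checkpoint ID below are my assumptions, not something fixed by the Web UIs themselves.

import torch
from diffusers import (StableDiffusionPipeline,
                       DPMSolverMultistepScheduler, DPMSolverSDEScheduler)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# "DPM++ 2M Karras": deterministic multistep solver + Karras noise schedule
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True)

# "DPM++ SDE Karras": stochastic (SDE) solver + Karras noise schedule
# pipe.scheduler = DPMSolverSDEScheduler.from_config(
#     pipe.scheduler.config, use_karras_sigmas=True)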

When choosing a sampler, consider the following factors:

  • For High Detail: DPM++ 2M or LMS Karras are excellent choices for intricate details in portraits or landscapes.
  • For Speed: Euler A or DPM Adaptive offer faster processing times.
  • For Control: DDIM provides more hands-on control over the image outcome.

It's important to note that the effectiveness of samplers can vary depending on the number of steps used. Some samplers, like Euler A and DPM2 A, show significant changes as step count increases, while others like LMS tend to stabilize after a certain point.

The choice of sampler should be based on the specific requirements of your project, balancing factors such as image quality, generation speed, and the level of control needed over the output. Experimentation with different samplers and step counts can help in finding the optimal settings for your particular use case.

Ancestral sampling

The ancestral sampling technique introduces a degree of randomness, or "ancestral noise", at each step of the denoising process. While ancestral methods add randomness, they do so in a structured way that often leads to similar overall compositions in the generated images.

Since these are early-stage sampling methods, none of the ancestral methods produce good-looking images even at high step counts (~200), but if you aim to produce "potato camera quality" photos, you can use them.
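
For completeness, switching to an ancestral sampler in the diffusers library is a one-line change, sketched below with the Euler a scheduler class (the checkpoint ID is an assumption). Because fresh noise is injected at every step, the result keeps drifting as the step count changes, even with a fixed seed.

import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
# Euler a: the ancestral variant of Euler, which re-injects noise at each step
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)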

Schedulers

The scheduler in Stable Diffusion controls how noise is added and removed during the image generation process. It is an important component that can significantly affect the quality and characteristics of the generated images.

Using the DDIM sampler as the control, we try different schedulers to see their effect on image generation.

Note 1: Normal, SGM Uniform and Simple did not show any differences between the generated images.

Note 2: Although the DDIM Uniform scheduler shares its name with the DDIM sampler, it made the background of the image worse in this sampling test.
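
A rough way to see what the scheduler choice actually changes is to print the noise levels (sigmas) assigned to each step. The sketch below uses the diffusers library, where the Karras schedule is toggled with use_karras_sigmas; the checkpoint ID is an assumption, and diffusers does not expose every ComfyUI scheduler under the same name.

from diffusers import EulerDiscreteScheduler

normal = EulerDiscreteScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler")
karras = EulerDiscreteScheduler.from_config(normal.config, use_karras_sigmas=True)

normal.set_timesteps(10)
karras.set_timesteps(10)

# The Karras schedule spends more of its steps at low noise levels
print("normal:", [round(float(s), 2) for s in normal.sigmas])
print("karras:", [round(float(s), 2) for s in karras.sigmas])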


To be removed

Weightage (Emphasis)

In Stable Diffusion, word emphasis can be applied to prompts (this is supported by certain Stable Diffusion Web UIs, most notably Automatic1111).

The word(s) simply need to be enclosed in brackets to let the AI art generation model know which words you are emphasizing.

Square brackets [ ... ] are usually used to denote negative emphasis (de-emphasize).

Round brackets ( ... ) are usually used to denote positive emphasis (emphasize).

The more brackets used, the more emphasis weight the model adds to the enclosed word(s) in the generated images, as shown in the generated images below.

However, too much weighting/emphasis will drown out the image subject or background.
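
A few illustrative prompt fragments, using the bracket syntax as it behaves in the Automatic1111 Web UI (the multipliers given are that UI's defaults, roughly a 1.1x change per bracket level, and the explicit-weight form is also Automatic1111-specific):

rhinoceros wearing (business suit) on the trading floor      -> mild positive emphasis (about 1.1x)
rhinoceros wearing ((business suit)) on the trading floor    -> stronger emphasis (about 1.21x)
rhinoceros wearing [business suit] on the trading floor      -> de-emphasis (about 0.9x)
rhinoceros wearing (business suit:1.5) on the trading floor  -> explicit weight of 1.5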


References

[1] https://novita.ai/blogs/the-ultimate-guide-to-stable-diffusion-seed.html#how-to-locate-suitable-seeds

[2] https://decentralizedcreator.com/what-is-stable-diffusion-seed-and-how-to-use-it/

[3] https://onceuponanalgorithm.org/guide-what-is-a-stable-diffusion-seed-and-how-to-use-it/

[4] https://watcher.guru/news/what-is-the-cfg-scale-in-stable-diffusion

[5] https://blockchain.news/wiki/cfg-scale-in-stable-diffusion-a-comprehensive-analysis

[6] https://getimg.ai/guides/interactive-guide-to-stable-diffusion-guidance-scale-parameter

[7] https://openart.ai/blog/post/the-most-complete-guide-to-stable-diffusion-parameters

[8] https://blog.segmind.com/the-a-z-of-stable-diffusion-essential-concepts-and-terms-demystified/

[9] https://mccormickml.com/2023/01/11/steps-and-seeds/

[10] https://www.felixsanz.dev/articles/complete-guide-to-samplers-in-stable-diffusion

[11] https://easywithai.com/guide/stable-diffusion-best-sampler-comparison/

[12] https://paulfernandez.dev/posts/a-comparison-stable-diffusion-sampling-methods/