AI Art Generation Handbook/ControlNet


What is ControlNet?


ControlNet was developed by Stanford researchers Lvmin Zhang and Maneesh Agrawala. It was first introduced in the paper "Adding Conditional Control to Text-to-Image Diffusion Models" in February 2023.

As you experiment with Stable Diffusion, you may find that at times you want to create a character in a cool, imposing pose, but achieving this with the prompt alone is a huge undertaking. You may generate hundreds of images and still not get the kind of image you wanted.

This is where ControlNet comes to the rescue.

ControlNet allows the prompt crafter to guide the image generation process using additional input images or conditions, beyond just text prompts.

ControlNet 22022023

For example:

Let's say you want to control the pose of a human in an image (as shown, a girl squatting down).

The AI model would be trained on a dataset of images of humans in different poses, and the ControlNet network would be trained on the same dataset of images.

The ControlNet network would learn to associate the pose of the human in the image with the desired output of the diffusion model.

Once the ControlNet network is trained, you can use it to control the pose of the human in an image, and use the prompt to change the subject from female to male, as shown.

How ControlNet works?

ControlNet Simplified Flow

ControlNet uses "zero convolution" layers: 1×1 convolution layers whose weights and biases both start at zero.

These layers connect ControlNet to each block of a pre-trained diffusion model (like Stable Diffusion).

Because a zero convolution outputs nothing but zeros at the start of training, ControlNet can be trained on specific tasks without altering the original model's knowledge.
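The idea can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: `frozen_block` is a hypothetical stand-in for one block of the pre-trained model, the control features are made-up numbers, and the control signal is fed straight into the zero convolution for brevity. The point it shows is that a layer with all-zero weights adds nothing to the frozen model's output until training moves the weights away from zero.

```python
from typing import List

Vector = List[float]  # channel values at a single pixel


class ZeroConv1x1:
    """A 1x1 convolution at one pixel, i.e. a linear map over channels.

    Weights and bias start at zero, so the layer initially outputs zeros.
    """

    def __init__(self, in_channels: int, out_channels: int) -> None:
        self.weight = [[0.0] * in_channels for _ in range(out_channels)]
        self.bias = [0.0] * out_channels

    def __call__(self, x: Vector) -> Vector:
        return [
            sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(self.weight, self.bias)
        ]


def frozen_block(x: Vector) -> Vector:
    """Hypothetical stand-in for one block of the locked, pre-trained model."""
    return [2.0 * v + 1.0 for v in x]


def controlled_block(x: Vector, control: Vector, zero_conv: ZeroConv1x1) -> Vector:
    """Frozen output plus the control signal routed through the zero convolution."""
    base = frozen_block(x)
    residual = zero_conv(control)
    return [b + r for b, r in zip(base, residual)]


features = [0.5, -1.0, 3.0]
pose_hint = [1.0, 0.0, 1.0]  # made-up control features for one pixel
conv = ZeroConv1x1(in_channels=3, out_channels=3)

# Before any training, the control branch contributes nothing.
untrained = controlled_block(features, pose_hint, conv)

# Simulate training having moved one weight away from zero.
conv.weight[0][0] = 0.25
trained = controlled_block(features, pose_hint, conv)
```

Here `untrained` is identical to `frozen_block(features)`, while `trained` differs only in the channel whose weight was changed.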

(1) The text prompt and the control image are sent to the ControlNet neural network.

(2) The pre-trained AI model is then used to generate an image.

(3) ControlNet adds the extra conditions to the model while the image is being generated.

(4) The image is refined step by step under ControlNet's guidance.

(5) The final image is then output.

How to use ControlNet?


First, assuming you are using the SDXL base model to generate images, it is highly recommended to download the Union ControlNet (trained especially for SDXL) here:

We assume you have the Web-UI by Automatic1111 or SD.Next (Vladmandic).


Automatic1111

(1) First, go to Extensions -> Available and click the "Load from" button to load the list of available extensions.

(2) Search for the extension sd-webui-controlnet (by author Mikubill, listed with the "manipulations" tag). Click the "Install" button on the right side.

(3) Restart Automatic1111 completely to make sure the extension is properly installed.

(4) Once installed, inside the txt2img or img2img tab, you should see a ControlNet section (usually below Seed).

(5) Click the ControlNet section to expand the menu and you should see something similar to the image on the left.

Here are some examples of the settings found in the ControlNet menu:

To use any ControlNet model, you need to click Enable.

If the graphics card does not have enough VRAM for the additional pre-processing, click Low VRAM.

Pixel Perfect is suitable if you want ControlNet to retain most of the details of the picture.

(Note: This is useful if you want to change from a photorealistic style to an anime style.)
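The ControlNet extension can also be driven through the Web-UI's API instead of the checkboxes. The sketch below only builds a request body and does not send anything. The field names (`enabled`, `low_vram`, `pixel_perfect`) and the layout under `alwayson_scripts` are assumptions chosen to mirror the UI labels above, so check them against the extension's own API documentation before relying on them.

```python
import json
from typing import Any, Dict


def build_controlnet_unit(
    enabled: bool = True,
    low_vram: bool = False,
    pixel_perfect: bool = False,
) -> Dict[str, Any]:
    """Collect the three checkboxes described above into one settings dict.

    The key names are assumptions that mirror the UI labels.
    """
    return {
        "enabled": enabled,
        "low_vram": low_vram,
        "pixel_perfect": pixel_perfect,
    }


def build_txt2img_payload(prompt: str, unit: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap one ControlNet unit in a txt2img request body (assumed layout)."""
    return {
        "prompt": prompt,
        "alwayson_scripts": {"controlnet": {"args": [unit]}},
    }


# Example: a card with little VRAM, keeping the details of the input picture.
unit = build_controlnet_unit(low_vram=True, pixel_perfect=True)
payload = build_txt2img_payload("a man squatting down", unit)
print(json.dumps(payload, indent=2))
```

Keeping the settings in a dict like this makes it easy to reuse the same combination across several generations.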


SD.Next

In SD.Next, it is easier, as ControlNet is integrated into the WebUI and needs no external extensions.

Copy your downloaded model into this location:

Your installed locations\SDNext\automatic\models\control\controlnet
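If you prefer to script the copy, a minimal sketch using only the Python standard library could look like this. The helper name `install_controlnet_model` and the file name `example_controlnet.safetensors` are hypothetical, and the demo runs inside a temporary folder so it does not touch a real installation. Pass the folder that contains `models` (the `automatic` folder in the path above) as `sdnext_root`.

```python
import shutil
import tempfile
from pathlib import Path


def install_controlnet_model(model_file: Path, sdnext_root: Path) -> Path:
    """Copy a downloaded ControlNet model into SD.Next's controlnet folder.

    sdnext_root is the folder that contains "models".
    """
    target_dir = sdnext_root / "models" / "control" / "controlnet"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    destination = target_dir / model_file.name
    shutil.copy2(model_file, destination)  # copy the file, keeping its timestamps
    return destination


# Demo in a temporary folder; replace both paths with your own.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "SDNext" / "automatic"
    download = Path(tmp) / "example_controlnet.safetensors"
    download.write_bytes(b"placeholder")  # stands in for the real model file
    copied = install_controlnet_model(download, root)
    copied_ok = copied.exists()
    relative = copied.relative_to(root).as_posix()
```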

When you open SD.Next, go directly to the "Control" tab and you will see a few options:

Initially, all of these are hidden, and you need to click the ◀ button on the right of the screen to reveal the hidden menu.

Control Input

You can upload your "base" image here, and the generated images are shown here as well.

The settings for the usual image generation are also located here; you can click on the button to reveal them.

Control Elements

You can choose ControlNet, T2I Adapter, XS, Lite and Reference, but for this chapter we are going to use ControlNet.

These are the Preprocessor settings, which give finer control.

Types of ControlNet

Canny, Scribble, Pose, Line-art, QR Code
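To give a feel for what an edge-based control image such as Canny's looks like, here is a rough sketch in plain Python. It is a simple gradient threshold, not the real Canny algorithm (which also smooths the image, thins the edges and applies hysteresis thresholding), and the function name and the threshold value are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale image, values 0-255


def simple_edge_map(image: Image, threshold: int = 100) -> Image:
    """Mark pixels whose brightness differs sharply from the right or lower neighbour.

    This is a plain gradient threshold, not the full Canny algorithm.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # At the image border, compare the pixel with itself (no edge).
            right = image[y][x + 1] if x + 1 < width else image[y][x]
            below = image[y + 1][x] if y + 1 < height else image[y][x]
            gradient = abs(right - image[y][x]) + abs(below - image[y][x])
            if gradient >= threshold:
                edges[y][x] = 255
    return edges


# A 4x4 image: dark left half, bright right half.
picture = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
edge_map = simple_edge_map(picture)
```

The resulting `edge_map` is white only along the boundary between the dark and bright halves, which is the kind of outline a Canny ControlNet uses to guide the composition.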
