Cg Programming/Unity/Computing Image Effects

From Wikibooks, open books for an open world
Post-processing effect applied to a video image.

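This tutorial covers the basic steps to create a minimal compute shader in Unity for image post-processing of camera views. If you are not familiar with image effects in Unity, you should read Section “Minimal Image Effect” first. Note that compute shaders are not available on every platform: they require a graphics API that supports them, and macOS offers them only through Metal because its OpenGL implementation stops short of OpenGL 4.3.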

Compute Shaders in Unity

Compute shaders are in some respects similar to fragment shaders and it is sometimes helpful to think of them as “improved” fragment shaders because compute shaders address some problems of fragment shaders:

  • Fragment shaders are part of the graphics pipeline, which makes it cumbersome to use them for anything else, in particular GPGPU programming (General-Purpose computing on Graphics Processing Units).
  • Fragment shaders are designed for the embarrassingly parallel problem of rasterizing the fragments of triangles (and other geometric primitives). Thus, they are not well suited for problems that are not embarrassingly parallel, e.g., when shaders have to share or communicate data between themselves or need to write to an arbitrary location in memory.
  • The graphics hardware that runs fragment shaders offers features for more advanced parallel programming but it was not considered wise to offer those features in fragment shaders; thus, a different application programming interface (API) was considered necessary.

Historically, the first approach to solving these shortcomings of fragment shaders was to introduce completely new APIs, e.g., CUDA and OpenCL. While some of these APIs are still very popular for GPGPU programming, they are less popular for graphics tasks (e.g., image processing) for several reasons. One reason is the overhead of using two APIs (compute and graphics) for the same hardware; another is the difficulty of communicating data between the compute API and the graphics API.

Due to the problems of separate compute APIs, compute shaders were introduced in graphics APIs (in particular Direct3D 11, OpenGL 4.3, and OpenGL ES 3.1) as another class of shaders. This is also what Unity supports.

In this tutorial, we look at how to use compute shaders in Unity for image processing in order to introduce the basic concepts of compute shaders as well as specific issues of using compute shaders for image processing, which is an important application area. Further tutorials discuss more advanced features of compute shaders and applications apart from image processing.

Creating a Compute Shader

Creating a compute shader in Unity is not complicated and very similar to creating any shader: In the Project Window, click on Create and choose Shader > Compute Shader. A new file named “NewComputeShader” should appear in the Project Window. Double-click it to open it (or right-click and choose Open). A text editor with the default shader in DirectX 11 HLSL should appear. (DirectX 11 HLSL is different from Cg but it shares many common syntax features.)

The following compute shader tints an image with a user-specified color. You can copy and paste it into the shader file:

#pragma kernel TintMain

float4 Color;

Texture2D<float4> Source;
RWTexture2D<float4> Destination;

[numthreads(8,8,1)]
void TintMain (uint3 groupID : SV_GroupID, 
      // ID of thread group; range depends on Dispatch call
   uint3 groupThreadID : SV_GroupThreadID, 
      // ID of thread in a thread group; range depends on numthreads
   uint groupIndex : SV_GroupIndex, 
      // flattened/linearized GroupThreadID between 0 and 
      // numthreads.x * numthreads.y * numthreads.z - 1 
   uint3 id : SV_DispatchThreadID) 
      // = GroupID * numthreads + GroupThreadID
{
   // multiply the source texel by the tint color and store the result
   Destination[id.xy] = Source[id.xy] * Color;
}

Let's go through this shader line by line: The (Unity-specific) line #pragma kernel TintMain defines the compute shader function; this is very similar to #pragma fragment ... for fragment shaders.

The line float4 Color; defines a uniform variable that is set in a script as described below. This is just like a uniform variable in a fragment shader. In this case, Color is used to tint the images.

The line Texture2D<float4> Source; defines a 2D texture with four floating-point components such that the compute shader can read it (without interpolation). In a fragment shader, you would use sampler2D Source; to sample a 2D texture (with interpolation). (Note that HLSL uses separate texture objects and sampler objects; see Unity's manual for how to define a sampler object for a given texture object, which you would need if you want to sample a 2D texture with interpolation in a compute shader using the function SampleLevel().)

RWTexture2D<float4> Destination; specifies a read/write 2D texture, which the compute shader can read from and write to. This corresponds to a render texture in Unity. A compute shader can write to any position in a RWTexture2D, while a fragment shader can usually write only to the position of its fragment. Note, however, that multiple threads of the compute shader (i.e., calls of the compute shader function) might write to the same location in an undefined order, which leads to undefined results unless special care is taken to avoid these problems. In this tutorial we avoid any of these problems by letting each thread write only to its own, unique location in the RWTexture2D.

The next line is [numthreads(8,8,1)]. This is a special line for compute shaders, which defines the dimensions of a thread group. A thread group is a group of calls to the compute shader function that are executed in parallel and, therefore, their execution can be synchronized, i.e., one can specify barriers (with functions like GroupMemoryBarrierWithGroupSync()) that all threads in the thread group have to reach before any of the threads may be executed further. Another feature of a thread group is that all threads within one thread group may share some particularly fast ("groupshared") memory, while the memory that may be shared by threads in different groups is usually slower.

The threads are organized in a 3D array of thread groups and each thread group is itself a 3D array with the three dimensions specified by the three arguments of numthreads. For image processing tasks, the third (z) dimension is usually 1 as in our example [numthreads(8,8,1)]. The dimensions (8,8,1) specify that each thread group consists of 8 × 8 × 1 = 64 threads. (For an illustration see Microsoft's documentation of numthreads.) There are certain platform-specific limitations on these numbers, e.g., for Direct3D 11 the x and y dimension must be less than or equal to 1024 and the z dimension must be less than or equal to 64, and the product of the three dimensions (i.e., the size of the thread group) must be less than or equal to 1024. On the other hand, thread groups should have a minimum size of about 32 (depending on the hardware) for best efficiency.
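The Direct3D 11 limits quoted above can be checked on the CPU before a numthreads line is chosen. The following C# sketch is only an illustration: the class ThreadGroupLimits and its members are hypothetical names, not part of Unity's API, and other platforms may impose different limits.

```csharp
// Hypothetical helper (not part of Unity): checks a numthreads triple
// against the Direct3D 11 limits quoted in the text.
public static class ThreadGroupLimits
{
   public const int MaxX = 1024;       // limit for the x dimension
   public const int MaxY = 1024;       // limit for the y dimension
   public const int MaxZ = 64;         // limit for the z dimension
   public const int MaxThreads = 1024; // limit for the product x * y * z

   // Number of threads in one thread group.
   public static int GroupSize(int x, int y, int z)
   {
      return x * y * z;
   }

   // True if the triple respects all of the limits above.
   public static bool IsValidForDirect3D11(int x, int y, int z)
   {
      if (x < 1 || y < 1 || z < 1) return false;
      if (x > MaxX || y > MaxY || z > MaxZ) return false;
      return GroupSize(x, y, z) <= MaxThreads;
   }
}
```

For the example [numthreads(8,8,1)], GroupSize(8, 8, 1) gives 64 and the check succeeds, while a triple such as (32, 32, 2) fails because its product of 2048 exceeds 1024.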

As described below, the compute shader is called in a script with the function ComputeShader.Dispatch(int kernelIndex, int threadGroupsX, int threadGroupsY, int threadGroupsZ), where kernelIndex specifies the compute shader function and the other arguments specify the dimensions of the 3D array of thread groups. For our example of [numthreads(8,8,1)], there are 64 threads in each group, thus, the total number of threads would be 64 * threadGroupsX * threadGroupsY * threadGroupsZ.
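The thread count that results from a Dispatch() call can be worked out with plain arithmetic. The C# sketch below merely restates the formula from the text; DispatchMath and TotalThreads are hypothetical names chosen for this illustration and are not part of Unity.

```csharp
// Hypothetical helper (not part of Unity): restates the thread count
// formula from the text.
public static class DispatchMath
{
   // Total number of threads started by one Dispatch call:
   // (threads per group) * (number of thread groups).
   public static long TotalThreads(
      int numThreadsX, int numThreadsY, int numThreadsZ,
      int threadGroupsX, int threadGroupsY, int threadGroupsZ)
   {
      // long avoids overflow for large dispatches
      long threadsPerGroup = (long)numThreadsX * numThreadsY * numThreadsZ;
      long groups = (long)threadGroupsX * threadGroupsY * threadGroupsZ;
      return threadsPerGroup * groups;
   }
}
```

With [numthreads(8,8,1)] and a dispatch of 240 × 135 × 1 thread groups, TotalThreads(8, 8, 1, 240, 135, 1) is 64 * 32400 = 2073600, which is exactly one thread per pixel of a 1920 × 1080 image.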

The rest of the code specifies the compute shader function void TintMain(). Usually, it is important for the compute shader function to know for which position in the 3D array of threads it was called. It might also be important to know the position of the thread group in the 3D array of thread groups, as well as the position of the thread within the thread group. HLSL offers the following semantics for this information:

  • SV_GroupID: a uint3 vector that specifies the 3D ID of the thread group; each coordinate of the ID starts at 0 and goes up to (but excluding) the dimension specified in the ComputeShader.Dispatch() call.
  • SV_GroupThreadID: a uint3 vector that specifies the 3D ID of a thread within a thread group; each coordinate of the ID starts at 0 and goes up to (but excluding) the dimension specified in the numthreads line.
  • SV_GroupIndex: a uint that specifies the flattened/linearized SV_GroupThreadID between 0 and numthreads.x * numthreads.y * numthreads.z - 1.
  • SV_DispatchThreadID: a uint3 vector that specifies the 3D ID of the thread in the whole array of all thread groups. It is equal to SV_GroupID * numthreads + SV_GroupThreadID.
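The relations between these IDs can be reproduced on the CPU. The following C# sketch is an illustration only: ThreadIds and its methods are hypothetical names, and the linearization order used for SV_GroupIndex (x fastest, then y, then z) follows Microsoft's HLSL documentation rather than anything stated in this tutorial.

```csharp
// Hypothetical CPU-side model of the HLSL thread ID semantics.
// All vectors are arrays of length 3 in the order (x, y, z).
public static class ThreadIds
{
   // SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID,
   // computed separately for each coordinate.
   public static int[] DispatchThreadId(
      int[] groupId, int[] numThreads, int[] groupThreadId)
   {
      int[] id = new int[3];
      for (int i = 0; i < 3; i++)
      {
         id[i] = groupId[i] * numThreads[i] + groupThreadId[i];
      }
      return id;
   }

   // SV_GroupIndex: flattened SV_GroupThreadID with x running fastest.
   public static int GroupIndex(int[] numThreads, int[] groupThreadId)
   {
      return groupThreadId[2] * numThreads[0] * numThreads[1]
         + groupThreadId[1] * numThreads[0]
         + groupThreadId[0];
   }
}
```

For [numthreads(8,8,1)], the thread with SV_GroupThreadID (3, 5, 0) in the group with SV_GroupID (2, 1, 0) has the SV_DispatchThreadID (2 * 8 + 3, 1 * 8 + 5, 0) = (19, 13, 0) and the SV_GroupIndex 5 * 8 + 3 = 43.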

The compute shader function can receive any of these values as in the example: void TintMain (uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID, uint groupIndex : SV_GroupIndex, uint3 id : SV_DispatchThreadID).

The particular function TintMain actually uses only the variable id with the semantic SV_DispatchThreadID. The function calls are organized in a 2D array of (at least) the dimensions of the Destination and Source texture; thus, id.x and id.y can be used to access these texels Destination[id.xy] and Source[id.xy]. The basic operation is just to multiply the color of the Source texture with Color and write it to the Destination render texture:

Destination[id.xy] = Source[id.xy] * Color;
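To make the per-thread operation concrete, here is a sequential CPU reference in C# that computes the same result with nested loops. It is a sketch under stated assumptions: TintReference is a hypothetical name, and the images are assumed to be stored as flat float arrays with four components (RGBA) per texel, which is not how Unity stores render textures.

```csharp
// Hypothetical CPU reference for the tint operation (not part of Unity).
public static class TintReference
{
   // source and destination hold width * height texels,
   // each with 4 floats (RGBA); color holds 4 floats (RGBA).
   public static void Tint(float[] source, float[] destination,
      int width, int height, float[] color)
   {
      for (int y = 0; y < height; y++)
      {
         for (int x = 0; x < width; x++) // (x, y) plays the role of id.xy
         {
            int texel = 4 * (y * width + x);
            for (int c = 0; c < 4; c++)
            {
               // component-wise product, like Source[id.xy] * Color
               destination[texel + c] = source[texel + c] * color[c];
            }
         }
      }
   }
}
```

Each iteration of the two outer loops corresponds to one thread of the compute shader; since no iteration depends on any other, the GPU is free to execute them in parallel and in any order.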

Applying the Compute Shader to the Camera View

In order to apply the compute shader to all pixels of the camera view, we have to define the function OnRenderImage(RenderTexture source, RenderTexture destination) and use these render textures in the compute shader. There are, however, some problems; specifically in newer Unity versions, we have to copy the source pixels to a temporary texture before we can use them in a compute shader. Furthermore, if Unity renders directly to the frame buffer, destination is set to null and we have no render texture to use for our compute shader. Also, we need to enable the render texture for random write access before we create it, which we cannot do with the render textures that we get in OnRenderImage(). We can handle these cases (and cases where the source and destination render textures have different dimensions) by creating a temporary render texture of the same dimensions as the source render texture and letting the compute shader write to that temporary render texture. The result can then be copied to the destination render texture, which might be null in which case the result is copied to the frame buffer.

The following C# script implements this idea with the temporary render texture tempDestination:

using System;
using UnityEngine;

public class tintComputeScript : MonoBehaviour {

   public ComputeShader shader;
   public Color color = new Color(1.0f, 1.0f, 1.0f, 1.0f);
   private RenderTexture tempSource = null;
      // we need this intermediate render texture to access the data   
   private RenderTexture tempDestination = null;  
      // we need this intermediate render texture for two reasons:
      // 1. destination of OnRenderImage might be null 
      // 2. we cannot set enableRandomWrite on destination
   private int handleTintMain;

   void Start() 
   {
      if (null == shader) 
      {
         Debug.Log("Shader missing.");
         enabled = false;
         return;
      }
      handleTintMain = shader.FindKernel("TintMain");
      if (handleTintMain < 0)
      {
         Debug.Log("Initialization failed.");
         enabled = false;
         return;
      }
   }

   void OnDestroy() 
   {
      // release the hardware resources of the temporary render textures
      if (null != tempSource) 
      {
         tempSource.Release();
         tempSource = null;
      }
      if (null != tempDestination) 
      {
         tempDestination.Release();
         tempDestination = null;
      }
   }

   void OnRenderImage(RenderTexture source, RenderTexture destination)
   {
      if (null == shader || handleTintMain < 0 || null == source) 
      {
         Graphics.Blit(source, destination); // just copy
         return;
      }

      // do we need to create a new temporary source texture?
      if (null == tempSource || source.width != tempSource.width
         || source.height != tempSource.height)
      {
         if (null != tempSource)
         {
            tempSource.Release();
         }
         // third argument: depth buffer bits (assumed to match the source)
         tempSource = new RenderTexture(source.width, source.height,
            source.depth);
      }

      // copy source pixels
      Graphics.Blit(source, tempSource);

      // do we need to create a new temporary destination render texture?
      if (null == tempDestination || source.width != tempDestination.width 
         || source.height != tempDestination.height) 
      {
         if (null != tempDestination)
         {
            tempDestination.Release();
         }
         // third argument: depth buffer bits (assumed to match the source)
         tempDestination = new RenderTexture(source.width, source.height, 
            source.depth);
         // random write access has to be enabled before Create()
         tempDestination.enableRandomWrite = true;
         tempDestination.Create();
      }

      // call the compute shader
      shader.SetTexture(handleTintMain, "Source", tempSource); 
      shader.SetTexture(handleTintMain, "Destination", tempDestination);
      shader.SetVector("Color", (Vector4)color);
      shader.Dispatch(handleTintMain, (tempDestination.width + 7) / 8, 
         (tempDestination.height + 7) / 8, 1);

      // copy the result
      Graphics.Blit(tempDestination, destination);
   }
}

The script should be saved as "tintComputeScript.cs". To use it, it has to be attached to a camera and the public variable shader has to be set to a compute shader, for example, the one we have defined above.

The Start() function of the script only does some error checking, gets the index of the compute shader function with shader.FindKernel("TintMain"), and stores it in handleTintMain for use in the OnRenderImage() function.

The OnDestroy() function releases the temporary render textures because the garbage collector does not automatically release the hardware resources that are necessary for render textures.

The OnRenderImage() function does some error checking; then, if necessary, it creates new render textures in tempSource and tempDestination and copies the source pixels to tempSource. After that, it sets the uniform variables of the compute shader with the functions SetTexture() and SetVector() before it calls the compute shader function with a call to Dispatch(). In this case we use (tempDestination.width + 7) / 8 times (tempDestination.height + 7) / 8 thread groups (both numbers implicitly rounded down by the integer division). We divide by 8 in both dimensions because we specify the number of thread groups and each thread group has the size 8 times 8 as specified by [numthreads(8,8,1)] in the compute shader. The addition of 7 is required to make sure that we are not short by one thread group if the dimensions of the render texture are not divisible by 8. After dispatching the compute shader, the result is copied from tempDestination to the actual destination of OnRenderImage() with the help of a call to Graphics.Blit().
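The rounding trick for the number of thread groups can be isolated in a small function. The C# sketch below generalizes the expression (width + 7) / 8 to an arbitrary group size; GroupCount and RoundUp are hypothetical names used only for this illustration.

```csharp
// Hypothetical helper (not part of Unity): number of thread groups
// needed to cover a given number of pixels in one dimension.
public static class GroupCount
{
   // Smallest number of groups with groupSize threads each that
   // provides at least one thread per pixel. Integer division rounds
   // down, so adding groupSize - 1 first turns it into rounding up.
   public static int RoundUp(int pixels, int groupSize)
   {
      return (pixels + groupSize - 1) / groupSize;
   }
}
```

For a width of 1920 pixels, RoundUp(1920, 8) is 240, the same as 1920 / 8. For a width of 1366 pixels, plain integer division gives 1366 / 8 = 170 groups, which cover only 1360 pixels, while RoundUp(1366, 8) gives 171 groups, which cover 1368 pixels and thus the whole width.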

Comparison with Fragment Shaders for Image Effects

This compute shader and C# script implement the same effect as the fragment shader in Section “Minimal Image Effect”. Clearly, more code is necessary for an image effect with a compute shader than for an image effect with a fragment shader. However, you should remember two things: 1) the reason for the additional code is mainly that Unity's OnRenderImage() function and Graphics.Blit() function were designed to work smoothly with fragment shaders while compute shaders were not considered when these functions were defined, and 2) the compute shader is able to do things that fragment shaders cannot do, e.g., writing to arbitrary positions in the destination render texture, sharing data between threads, synchronizing the execution of threads, etc. Some of these features are discussed in other tutorials.


Congratulations, you have learned the basics about compute shaders in Unity and how to use them for image effects. A few of the things you have seen are:

  • How to create a compute shader for an image effect.
  • How to set the uniform variables of a compute shader in a C# script.
  • How to call the compute shader function with the ComputeShader.Dispatch() function.

Further reading

If you still want to know more


Unless stated otherwise, all example source code on this page is granted to the public domain.