JPEG - Idea and Practice/The colour components

The BMP format

In the computer a colour is given by its composition of the three primary colours red, green and blue, and their shares are measured in bytes, that is, integers from 0 to 255. Therefore a colour corresponds to a triple of bytes, called a RGB triple. A picture is a rectangular matrix of RGB triples. If the picture is of width w and height h, the colour values (RGB triples) are indexed by the pairs (i, j), i = 0, ..., w-1, j = 0, ..., h-1, so that the left top corner has coordinate set (0, 0)(that is, the ordinate is measured downwards). The picture takes up 3wh bytes, and it can be stored in a memory-block by storing the h horizontal lines consisting of 3w bytes one after another. The procedure for showing the picture by transferring the memory-block (directly) to the screen is called a bitmap.

(In the bitmap procedure of Windows it is demanded that the number of bytes in the horizontal lines is divisible by 4, this means that the line segments of the memory-block possibly must be increased by 1, 2 or 3 bytes, usually filled with zeros.)

A picture can be stored permanently in a file consisting of the data bytes arranged in this way and supplied with a header specifying the type of the file and the dimensions of the picture. This is so for the BMP file format of Windows (BMP = Bit Map Picture). A BMP file begins with a header of 54 bytes. As the data in a BMP file lie precisely in the way used to draw a bitmap, the picture can be drawn directly from the reading of the file - without involving RAM-memory and without the use of other than elementary arithmetic calculations.

(The header of a BMP file is divided up in 17 blocks consisting of one, two or four bytes. Two bytes determine an integer from 0 to 256² - 1 = 65535, called a word, and four bytes determine an integer from 0 to 256⁴ - 1 = 4294967295, called a double word. The first two blocks of the BMP header are the bytes 66 and 77, identified with the characters 'B' and 'M' and specifying the type of the file. Block 8 and 9 are double words stating the width and the height, block 10 and 11 are words, usually set to 1 and 24 (= bit per colour), respectively, and block 7 is a double word usually set to 40. The other blocks, except block 4 and 5, which are words, are double words, and all these blocks can be set to 0, as they usually are not read by the program reading the file.)

Data compression

The BMP file format and a memory-block to be transferred to the screen as a bitmap are easy tasks for the computer and for the programmer, but these ways of storing a picture take up a lot of memory: a picture of 1000x750 pixels takes up 3x1000x750 = 2.2 Mb. This can be accepted provisionally in the working-up procedure of a picture or for storing of relatively few pictures where the highest possible quality is desired, but so much space is unacceptable in folders with hundreds of pictures or in films or in transmissions from the internet. One would immediately think that it is impossible to get digitalized data to take up lesser space, because the material with the bits cannot be reduced like a photographic negative. But a digitalized data set consists of sequences of bits, and these can be replaced by sequences that are shorter - and if there are repetitions, the thing that repeats itself can be replaced by a sequence which acts as a symbol for its type and the number of repetitions. If the data are copies of the elements in some fixed set (of numbers, for instance), then we can assign to the elements of the set sequences of bits such that the elements which are used most frequently are assigned to the shortest sequences. Besides, if the elements of the data set are numbers of strongly varying size, we can, instead of allocating equal space to each number, try to remove the empty spaces between the numbers. This cannot of course be done without ceremony, since (in lack of a third bit) we must have a tool with which we can separate the sequences of bits corresponding to the numbers. However, we can insert sequences of bits acting as codes.

Only a non-negative integer can immediately be digitalized, namely by writing its binary digit expression:

n = c_m2^m + ... + c₂2² + c₁2 + c₀

where c₀, ..., c_m are bits: 0 or 1 - we order the sequence so that the most significant bit comes first. If the number is rational or real, we must in some way express it as the composite of two non-negative integers. The codes to be inserted can be chosen so that they are in one-to-one correspondence with the natural numbers, and such that the natural number assigned to a code is the number of digits of the following non-negative integer. The codes must be chosen so that the most frequently used natural numbers (stating number of digits) have the shortest codes, and moreover so that we can determine when a code ends.

When the data are to be used (in order to show a picture, for instance), the compressed data set is subjected to a decoding procedure, leaving a data set that is exactly as the original. In almost all image file formats there is a possibility for compressing the data in this way. Such tricks are of course used in the JPEG procedure, but in this procedure the data are modified before the compression: by first transforming the colour values and then reducing the new values by dividing them by certain numbers and rounding off. The last procedure is called quantization and it may introduce (small) deviations.

The RGB values

The basis colours are the pure colours, these are the "strongest" colours which have maximum saturation. The pure colours make up a cyclic colour scale:

The pure colours

Therefore a pure colour is determined by an angle. Every colour different from a grey scale colour is the result of mixing a uniquely determined pure colour with a grey scale colour. The pure colours are not of the same luminance: three of them have lesser luminance than the others, and these are the primary colours: pure red, pure green and pure blue, assigned to the angles 0, 120 and -120 degrees. A pure colour that is not primary lies between two primary colours, and is the result of mixing the nearest of these with part of the other. If we mix the three primary colours, we get white - the colour of maximum luminance. From this we can see that every colour is produced by mixing the three primary colours, each made more or less darker. This is the RGB representation. We usually measure the three amounts in bytes, so that 255 corresponds to the primary colour and 0 corresponds to black.

(We can find the pure colour associated to the colour C (different from a grey scale colour) in the following way: By subtracting the RGB values of C from white, we get the colour C1 with RGB triple (255-R, 255-G, 255-B). If we assume that blue has most share in this colour, then C1 = β C2, for some β <= 1 and a colour C2 for which blue has share 255. By subtracting C2 from white, we get a colour C3, and if we assume that red has most share in this, then C3 = α C4, for some α <= 1 and a pure colour C4, for which red has share 255 and blue has share 0. This is the pure colour associated to C, and we get C by mixing this pure colour with black according to α and with white according to β.)

The YCbCr values

There is, however, a drawback to the RGB representation of the colours: the three values are of equal significance. We would prefer a triple representation where one of the values (the first) was more significant than the two others, because then, in the quantization procedure, we could allow larger deviations in the two less significant components. Such a representation is easy to imagine, as the four pictures below show: we can let the first value in the triple be the average value of the three RGB values, thus expressing the intensity of the colour (and giving the corresponding grey scale picture), and let the two other values form the "colour additions". We imagine the colours (the RGB triples) as the integral points in a cube of side length 256, having the three positive coordinate axes as sides, and its origin in the corner corresponding to black. In this cube the grey scales lie on the diagonal, and we take the diagonal as the first axis. We could let the two other coordinate axes be orthogonal to the diagonal and to each other, but in order to get a simple transform, we let them lie in the B-G-plane and the R-G-plane. Note that the new coordinate system means the two last colour values can be negative. We choose the units such that the first coordinate is measured in bytes and the two others are measured in signed bytes: integers from -128 to 127. The new coordinate triple is connected with the RGB triple by a linear transform.

We call the new representation the YCbCr values of the colour. Y stands for luminance (or luma) and C stands for chroma: Cb for chromatic blue and Cr for chromatic red. Our assumptions mean that there are parameters kb and kr, such that the linear transform and its inverse are given by:

Y = kr∙R + (1 - kr - kb)∙G + kb∙B

Cb = ½(B - Y)/(1 - kb)

Cr = ½(R - Y)/(1 - kr)

R = Y + 2(1 - kr)∙Cr

G = Y - (kb∙(B - Y) + kr∙(R - Y))/(1 - kb - kr)

B = Y + 2(1 - kb)∙Cb

We see that if a colour is a grey scale colour, that is, if R = G = B, then Y is this number and Cb and Cr are zero. Mathematically, it would be natural to set kb and kr to 1/4, because the transform then would get a simple and natural form:

Y = R/4 + G/2 + B/4

Cb = -R/6 - G/3 + B/2

Cr = R/2 - G/3 - B/6

R = Y + (3/2)Cr

G = Y - (3/2)(Cb + Cr)/2

B = Y + (3/2)Cb

However, in the JPEG implementation - which we are guided by here - the parameters kb and kr are set to 0.144 and 0.299, and with these values the formulas become:

Y = 0.299∙R + 0.587∙G + 0.114∙B

Cb = -0.168736∙R - 0.331264∙G + 0.5∙B

Cr = 0.5∙R - 0.418688∙G - 0.081312∙B

R = Y + 1.402∙Cr

G = Y - 0.3441∙Cb - 0.71414∙Cr

B = Y + 1.772∙Cb

This means that the coordinate axes are: the diagonal, the line (-0.34, 1.77) in the G-B-plane and the line (1.40, -0.71) in the R-G-plane. As the two chromatic coordinates range in the interval [-128, 127], we must add 128 to them in order to get bytes, so that we can draw "projections" of the picture on the coordinate axes. Instead of the composition of the picture in pictures in red-, green- and blue-scales, we now get pictures in grey-scale, blue-green-scale and red-green-scale:

The original picture
The grey scale component

The blue-green component
The red-green component

As we want our numbers (integers) numerically as small as possible, we subtract 128 from the Y value, so that this, like the Cb and Cr, becomes a signed byte.