Most people think that compression is mostly about coding. Well at one time it was. Before information theory, people spent years developing the perfect code to store data efficiently. To understand the limits of coding as a compression mechanism, we have to understand what coding is.
The computer really understands only one type of data, strings of 0s and 1s. You know 0 is off, 1 is on, and all that rot.these actually have to do with the state of a memory cell, where 0 is a low voltage output of the cell, and 1 is a slightly higher voltage. The actual voltage spread involved is getting smaller as the size of the circuits gets smaller and the voltage needed to jump the gap between parallel circuits gets smaller.
Hardware configuration used to define the size of a code, by limiting the number of bits as the two state memory element is called, that could be moved at a single pass. An example is the 4bit 8bit, 16bit 32bit, 64bit, and now 128bit architectures found during the development of microcomputers.
Now essentially everything in a computer is stored as sequences of bits. The storage capacity of the computer is limited by how many of those bits, are wasted. Ideally we don't want to waste bits, so we tend to use codes that minimize the number of bits that are wasted for any particular architecture of computer they will run on. The ultimate code is the code that covers the exact amount of variations that are possible in a data type, with the least amount of wasted bits.
It is interesting to find that this is not a consideration in the human brain, in fact we have had to define a term to explain the redundant information carried in the codes for the brain, and that term is degenerate coding, a pejorative term that really just means that there is more than one code for the same information.
Consider a string of numbers, that mean something. Are they a string of digits, in which case we want to code them to minimize the storage for each digit, An integer value, in which case we want to code them to minimize the storage for an arbitrarily sized integer, A floating point value, in which case we want to code them to minimize the storage for an arbitrarily sized mantissa, and include information on what power the number is raised to, and how many bits past the point we are going to store. Or an address string in which case we want to code it to minimize the number of bits it takes for a character in a specific language.
To a human all of these different types of storage look the same on the printed page. But it takes time to convert between the different types of storage, so you might not want to translate them into the tightest coding possible while you are using them. However later when you want to store them away for posterity, it looks like a good idea to compress them down to the smallest size possible. That is the essence of compression.
limits to coding density
So the limit to coding density, is determined by the type of data, you are trying to code and also by the amount of information that is available from a separate location.
As an example, the military have the practice to limit radio chatter during a campaign, this is often considered good thing. Even taking it to the point of limiting themselves to a click on their mike, to trigger the next phase of the attack. To a person who is fully briefed on it, a click on a mike will be all that is needed to send a wealth of information about the state of the operation. The event will have no meaning to the enemy waiting in the trenches, so that the first evidence they have of attack is when guns start going off.
Ideally all storage could be achieved in one bit files that contained whole encyclopedias. In actuality that is not likely to happen. Coding is still important, but it is very limited in what it can now do for compression. New algorithms for compression are increasingly hard to come by, hence the path of specialization on the source data new approaches take, true advances have recently been reduced to the hardware where the code will run.
bits, bytes, symbols, pixels, streams, files
|“||Suffice it to say that real compression algorithms are a complex technology, bordering on an art form.||”|
—Leo A. Notenboom, 
In the field of data compression, it is convenient to measure the "size" of data in terms of the fundamental unit of data: bits.
Many data compression algorithms produce a compressed data stream that is a stream of bits -- with no particular alignment to any other size.
It is sometimes convenient to consider the input data in terms of "symbols". Some algorithms compress English text in terms of the symbols from an input and process them converting them in a reversible representation on the compressed output. Symbols are usually sets of bits but they in turn can represent any type of character representation/encoding, even pixels or waveforms.
When compressing CD-quality audio, the symbols are the 2^16 possible amplitude levels -- the 2^16 possible physical positions of the speaker cone.
Some file compression utilities consider the uncompressed file in terms of 257 different symbols. At any point in the file, the "next symbol" could be any one of 257 symbols: one of the 256 possible bytes -- or we could have already reached the end of the file, the 257th symbol indicating "end-of-file".
When comparing image compression routines, sometimes the term "bpp" (bits per pixel) is used. An uncompressed full-color bitmap image is 24 bpp. An uncompressed 2 color bitmap image (containing only black pixels and white pixels) is 1 bpp. Some image compression algorithms can compress some images to much less than 0.1 bpp. The same image compression algorithm may be doing pretty good to compress some other image to 7.5 bpp.
With variable-length coding, we can make some symbols very short -- shorter than any fixed-length encoding of those symbols. This is great for compressing data.
However, variable-length coding almost always end up make other symbols slightly longer than best fixed-length encoding of those symbols.
Using more bits store a symbol sounds ridiculous at first -- Why don't we simply store a shorter fixed-length version of such symbols?
OK, yeah, *other* symbols could be shorter. So ... why can't we use the variable-length version when that would be shorter, and use the fixed-length version when that would be shorter? The best of both worlds? Perhaps use a "1" prefix to indicate "the fixed-length representation follows" ?
We discuss variable-length coding in more detail in the entropy coding section of this book.