Data Compression

From Wikibooks, the open-content textbooks collection

Jump to: navigation, search

There are many algorithms -- and even more implementations of those algorithms -- that accept unencoded information and encode it to use fewer bits. Perhaps even more important is the paired algorithm that accepts the encoded bits and extracts the information.

Each pair of algorithms -- one that creates the encoded form, and the other that accepts the encoded form and extracts the information -- is called a data compression algorithm.

Data compression algorithms are increasing abundant as are their variations, most utilize dictionary based schemes and statistical methods. They are also becoming increasingly specialized in compressing a specific kinds of data -- text, speech, music, photos, or video...

Data compression is useful in some situations because "compressed data" will save time (in reading and on transmission) and space if compared to the unencoded information it represent. Since most attempt to increase the compression ratio, consisting in compressing the data into as few bits as possible (while still being able to recover the relevant information), in other cases there is a deliberately sacrifice a few bits in order to improve latency, reduce compression time, or reduce decompression time.

Some communication systems carefully and deliberately add small amounts of "redundancy" -- the Data Coding Theory Wikibook has more details.

There are also several data compression benchmarks available for comparing data compression algorithms -- even one 50,000 euro cash prize for compressing one particular benchmark file as small as possible (and, of course, uncompressing it afterwards).

In this book, we describe the decompressor first, for several reasons: [1]

  • Decompression routines are usually much easier to understand than the paired compression routine.
  • Too many people write the compressor first, then later discover that it doesn't save enough information to reconstruct the original file.
  • Once we have a decompressor, we can often improve the compressor alone without changing the file format or the decompressor.
  • Many compression systems only document and standardize the decompressor and the file format. (Probably because of the above reasons).

[edit] Contents


 

In other languages