Data Compression

From Wikibooks, the open-content textbooks collection

Jump to: navigation, search

There are many algorithms -- and even more implementations of those algorithms -- that accept unencoded information and encode it to use fewer bits. Perhaps even more important is the paired algorithm that accepts the encoded bits and extracts the information.

Each pair of algorithms -- one that creates the encoded form, and the other that accepts the encoded form and extracts the information -- is called a data compression algorithm.

Data compression is useful because this "compressed data" saves time and space compared to unencoded information.

There are many data compression algorithms available.

Most algorithms are specialized for one particular kind of data -- text, speech, music, photos, or video.

There are also several data compression benchmarks available for comparing data compression algorithms -- even one 50,000 euro cash prize for compressing one particular benchmark file as small as possible (and, of course, uncompressing it afterwards).

Most data compression programmers try to find ways to compress the data into fewer bits (while still being able to recover the information). However, some data compression programmers deliberately sacrifice a few bits in order to improve latency, reduce compression time, or reduce decompression time.

Some communication systems carefully and deliberately add small amounts of "redundancy" -- the Data Coding Theory has more details.

In this book, we describe the decompressor first, for several reasons: [1]

  • Decompression routines are usually much easier to understand than the paired compression routine.
  • Too many people write the compressor first, then later discover that it doesn't save enough information to reconstruct the original file.
  • Once we have a decompressor, we can often improve the compressor alone without changing the file format or the decompressor.
  • Many compression systems only document and standardize the decompressor and the file format. (Probably because of the above reasons).

[edit] Contents


 

In other languages