Data Compression/References

From Wikibooks, open books for an open world

Benchmark files

To do:
Are there any benchmarks for evaluating differential compression?

open-source example code

Most creators of data compression algorithms release open-source implementations of their work. This makes it much easier to evolve the algorithms by combining clever ideas from many different sources.

To do:
Should we link to a good, open-source, well-commented implementation of, for example, LZW *here*, or in the section of the book that discusses LZW?

  • jvm-compressor-benchmark is a benchmark suite for comparing the time and space performance of open-source compression codecs on the JVM platform. It currently includes the Canterbury corpus and a few other benchmark file sets, and compares LZF, Snappy, LZO-java, gzip, bzip2, and a few other codecs. (Is the API used by the jvm-compressor-benchmark to talk to these codecs a good interface standard for compression algorithms?)
  • inikep has put together a benchmark for comparing the time and space performance of open-source compression codecs that can be compiled with C++. It currently includes 100 MB of benchmark files (bmp, dct_coeffs, english_dic, ENWIK, exe, etc.), and compares snappy, lzrw1-a, fastlz, tornado, lzo, and a few other codecs.
  • "Compression the easy way": a simple C/C++ implementation of variable-bit-length LZW in one .h file and one .c file, with no dependencies.
  • BALZ by Ilia Muraviev - the first open-source implementation of ROLZ compression[1]
  • QUAD - an open-source ROLZ-based compressor from Ilia Muraviev
  • LZ4 "the world's fastest compression library" (BSD license)
  • QuickLZ "the world's fastest compression library" (GPL and commercial licenses)
  • FastLZ "free, open-source, portable real-time compression library" (MIT license)
  • The .xz file format (one of the compressed file formats supported by 7-Zip and LZMA SDK) supports "Multiple filters (algorithms): ... Developers can use a developer-specific filter ID space for experimental filters." and "Filter chaining: Up to four filters can be chained, which is very similar to piping on the UN*X command line."
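The filter chaining described for the .xz format can be tried with the `lzma` module in the Python standard library. The following is a minimal sketch, not taken from any of the projects listed above: the sample data (a slowly rising ramp of 16-bit values) and the choice of a delta filter with `dist` set to 2 are assumptions made purely for illustration.

```python
import lzma

# Hypothetical sample data: 16-bit little-endian values forming a slow ramp.
data = b"".join((i // 3).to_bytes(2, "little") for i in range(20000))

# A two-filter chain: the delta filter's output is piped into LZMA2.
# "dist": 2 subtracts each byte from the byte two positions earlier,
# which lines up with the 2-byte sample width assumed above.
filters = [
    {"id": lzma.FILTER_DELTA, "dist": 2},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

chained = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
plain = lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)

# The .xz container records the filter chain, so the decoder
# does not need to be told which filters were used.
restored = lzma.decompress(chained)

print("original bytes:      ", len(data))
print("LZMA2 only:          ", len(plain))
print("delta + LZMA2 chain: ", len(chained))
```

Whether the chained version is smaller depends on the data; the delta filter tends to help with sampled or tabular data and may hurt on plain text.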

Further reading

non-wiki resources

  • "comp.compression" newsgroup
  • http://data-compression.info/ has some information on several compression algorithms, several "data compression corpora" (benchmark files for data compression), and the results from running a variety of data compression programs on those benchmarks (measuring compressed size, compression time, and decompression time).
  • "Data Compression Explained" by Matt Mahoney. It discusses many things neglected in most other discussions of data compression, such as the practical features of a typical archive format (the stuff in the thin wrapper around your precious compressed data) and the close relation between data compression and artificial intelligence.
  • Mark Nelson writes about data compression
  • Encode's Forum claims to be "probably the biggest forum about the data compression software and algorithms on the web".
  • "The LZW controversy" by Stuart Caie. (LZ78, LZW, GIF, PNG, Unisys, patents, etc.)
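
The three measurements that benchmark sites such as the ones above report (compressed size, compression time, and decompression time) can be collected with a very small harness. The sketch below is only an illustration using the codecs that ship with Python (`zlib`, `bz2`, `lzma`); the function names `benchmark` and `run_benchmarks` and the sample input are hypothetical, and a serious benchmark would use a standard corpus and repeat each timing several times.

```python
import bz2
import lzma
import time
import zlib

# (name, compress function, decompress function) for each codec under test.
CODECS = [
    ("zlib", zlib.compress, zlib.decompress),
    ("bz2", bz2.compress, bz2.decompress),
    ("lzma", lzma.compress, lzma.decompress),
]

def benchmark(name, compress, decompress, data):
    """Time one compress/decompress round trip and verify it is lossless."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError(f"{name}: round trip did not restore the input")
    return {
        "codec": name,
        "compressed_size": len(packed),
        "compress_seconds": t1 - t0,
        "decompress_seconds": t2 - t1,
    }

def run_benchmarks(data):
    """Run every codec in CODECS on the same input."""
    return [benchmark(n, c, d, data) for n, c, d in CODECS]

if __name__ == "__main__":
    # Hypothetical input; replace with the bytes of a real benchmark file.
    sample = b"abracadabra " * 2000
    for row in run_benchmarks(sample):
        print(row)
```

Timings from a single run are noisy, so the numbers printed here are only a rough guide to the relative speed of the codecs.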