# Data Compression/grammar-based compression

From Wikibooks, open books for an open world

## grammar-based compression[edit]

Grammar-based compression was proposed in 2000.^{[1]}

Several data compression methods can be viewed as grammar-based compression:

- The Sequitur data compression algorithm uses a context-free grammar to represent a string such as a data file. It is relatively fast (linear-time encoding and linear-time decoding).
^{[2]} - Longest Match
- Most frequent digram
- The Sequential data compression algorithm is an improvement on Sequitur.
^{[3]} - Although it pre-dates grammar-based compression, LZ78 and LZW (but not LZ77) can be interpreted as a kind of grammar.
^{[3]} - It has been shown that "a structured grammar" gives better compression than "an irreducible grammar".
^{[4]}

When the output of the grammar transform is compressed with a zero-order entropy coder (such as a zero-order arithmetic coder), grammar compression outperforms the Unix Compress and Gzip algorithms.^{[1]}

The Re-Pair algorithm is a grammar based compression algorithm. Its decoder operates as follows:^{[5]}

FIXME: fill in details.

## Further reading[edit]

- ↑
^{a}^{b}En-hui Yang and John C. Kieffer. "Efficient Universal Lossless Data Compression Algorithms Based on a Greedy Sequential Grammar Transform—Part One: Without Context Models". 2000. - ↑ Richard Ladner. "Sequitur" p. 56. 2007.
- ↑
^{a}^{b}Eric Lehman and Abhi Shelat. "Approximation Algorithms for Grammar-Based Compression". "Approximation Algorithms for Grammar-Based Compression". 2002. - ↑ John Kieffer and En-hui Yang. "Structured grammar-based codes for universal lossless data compression". 2002.
- ↑ Matej Mežik. "Compression Using Context Free Grammers". 2011.

- Re-Store is a phrase browsing system based on the Re-Pair compression algorithm.[1]