Data Compression/grammar-based compression

From Wikibooks, open books for an open world
Jump to navigation Jump to search

grammar-based compression[edit | edit source]

Clipboard

To do:
... give more details on grammar-based compression ...


Grammar-based compression was proposed in 2000.[1]

Several data compression methods can be viewed as grammar-based compression:

  • The Sequitur data compression algorithm uses a context-free grammar to represent a string such as a data file. It is relatively fast (linear-time encoding and linear-time decoding).[2]
  • Longest Match
  • Most frequent digram
  • The Sequential data compression algorithm is an improvement on Sequitur.[3]
  • Although it pre-dates grammar-based compression, LZ78 and LZW (but not LZ77) can be interpreted as a kind of grammar.[3]
  • It has been shown that "a structured grammar" gives better compression than "an irreducible grammar".[4]


When the output of the grammar transform is compressed with a zero-order entropy coder (such as a zero-order arithmetic coder), grammar compression outperforms the Unix Compress and Gzip algorithms.[1]

The Re-Pair algorithm is a grammar based compression algorithm. Its decoder operates as follows:[5]

FIXME: fill in details.

Further reading[edit | edit source]

  • Re-Store is a phrase browsing system based on the Re-Pair compression algorithm.[1]


  1. Wan, R., Re-Store Software, archived from the original on 2015-07-21