Data Compression/compressed file systems

From Wikibooks, open books for an open world

file compression vs compressed file systems

FIXME: Is there a better place in this book to discuss compressed file systems and flash memory?

Many data compression algorithms are whole-file oriented. They assume that the entire original file is available up front, and that people will want to decompress the whole thing from beginning to end.

Compressed file systems (and especially flash-memory file systems) break that assumption. The ZFS file system can compress data with the LZJB algorithm (it also supports others, such as gzip and LZ4). Many internet routers and firewalls use a compressed flash-memory file system, often cramfs. (See Reverse Engineering/File Formats#Compression, Encryption & Scrambling for an example.) Some solid-state drives (SSDs) internally compress data to reduce the amount of flash "used up" by files. This doesn't give users any more storage space, but the SSD can use the extra free flash internally in ways that extend the lifetime of the drive.

These systems need fast random access to the data stored in the file system; decompressing everything from the beginning just to reach one file (or one block) would be too slow.

One approach is to use an index that, given a file name to look up (or a "logical block number"), tells exactly where on the disk (or in the bulk flash) to start decompressing. This is combined with a compression scheme that can be restarted at those indexed points (for example, by compressing each block independently), so the decompressor can jump into the middle of the compressed data and start decompressing from there.
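The index idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular file system lays out its data: it assumes a hypothetical fixed 4096-byte logical block size, compresses each block independently with zlib (standing in for whatever algorithm a real file system uses), and records each compressed block's (offset, length) in a list that plays the role of the index. Reading a byte range then decompresses only the blocks that range touches.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical logical block size


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Compress each block independently; return (blob, index).

    index[i] is the (offset, length) of compressed block i inside blob.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        packed = zlib.compress(data[start:start + block_size])
        index.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), index


def read_range(blob: bytes, index, offset: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Return data[offset:offset + length], decompressing only the blocks it touches."""
    if length <= 0 or not index:
        return b""
    first = offset // block_size
    last = min((offset + length - 1) // block_size, len(index) - 1)
    out = bytearray()
    for i in range(first, last + 1):
        start, size = index[i]
        out += zlib.decompress(blob[start:start + size])
    skip = offset - first * block_size  # position of `offset` inside block `first`
    return bytes(out[skip:skip + length])


data = bytes(range(256)) * 100                # 25,600 bytes of sample data
blob, index = compress_blocks(data)
chunk = read_range(blob, index, 10_000, 50)   # touches only block 2
```

Compressing blocks independently costs some compression ratio, because matches cannot reach back into earlier blocks; larger blocks compress better but force more data to be decompressed for each random read.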

A much more difficult requirement of these systems is allowing files to be edited. (This is so difficult that some compressed file systems, such as cramfs, and some archive formats, such as Python executable zipfiles,[1] don't allow it: they must be created all at once from a directory of uncompressed files. After a cramfs image is created, it can only be mounted read-only.)
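A short sketch shows why in-place editing is awkward: after an edit, a block's compressed size usually changes, so the new compressed bytes may no longer fit in the old slot. The hypothetical AppendOnlyBlockStore below sidesteps this the way copy-on-write and log-structured designs do, by writing the recompressed block to fresh space and updating the index. It assumes zlib as the compressor and omits reclaiming the stale copies (garbage collection), which a real file system would need.

```python
import zlib


class AppendOnlyBlockStore:
    """Hypothetical copy-on-write block store: every write appends a freshly
    compressed block to the log and repoints the index, rather than
    overwriting the old compressed bytes in place."""

    def __init__(self):
        self.log = bytearray()  # compressed blocks, in the order they were written
        self.index = {}         # logical block number -> (offset, length) in log

    def write_block(self, blkno: int, data: bytes) -> None:
        packed = zlib.compress(data)
        self.index[blkno] = (len(self.log), len(packed))
        self.log += packed      # any older copy of blkno is now unreachable garbage

    def read_block(self, blkno: int) -> bytes:
        offset, length = self.index[blkno]
        return zlib.decompress(self.log[offset:offset + length])


store = AppendOnlyBlockStore()
store.write_block(0, b"A" * 4096)              # highly compressible
old_size = store.index[0][1]
store.write_block(0, bytes(range(256)) * 16)   # same length, less compressible
new_size = store.index[0][1]
# new_size != old_size: the edited block would not fit in its old slot
```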

Several people have built what is effectively a virtual file system that can read from and write to a standard ".tgz" or ".zip" file: TrueZIP,[2] the Java Zip File System Provider,[3] etc.
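A ".zip" archive lends itself to this because it ends with a central directory listing every member and where that member's data starts, which works much like the index described above. A ".tgz" file has no such directory: it is a single gzip stream, so reaching a member means decompressing everything before it. The sketch below uses Python's standard zipfile module (chosen purely for illustration, not TrueZIP or the Java provider) to read one member without inflating the others; the member names and contents are made up.

```python
import io
import zipfile

# Build a small archive in memory; a file on disk would work the same way.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("docs/readme.txt", "hello\n" * 1000)
    zf.writestr("data/zeros.bin", bytes(100_000))

# Reopen it. ZipFile parses the central directory at the end of the archive,
# so read() can seek directly to one member's local header and inflate only
# that member.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    readme = zf.read("docs/readme.txt")
    info = zf.getinfo("data/zeros.bin")   # sizes come from the directory entry
```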

FIXME: go into a little more detail on why unmodified file-oriented compression algorithms won't work for compressed file systems, and what techniques are used to either (a) modify those algorithms so they are suitable, or (b) other algorithms that are useful for compressed flash file systems, or (c) combinations of both.

FIXME: say a few words about SquashFS.

FIXME: say a few words about initramfs/initrd.

virtual memory compression

FIXME: say a few words about zswap and zRAM and similar ideas related to virtual memory compression.

... using LZ4 (a fast LZ77-family compressor, similar in purpose to LZO) ...
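The core idea behind zram-style virtual memory compression can be sketched as a store of compressed pages kept in RAM. This is a hypothetical, simplified model, not the kernel's implementation: zlib at its fastest level stands in for LZ4 or LZO (neither is in Python's standard library), and the 4096-byte page size, the threshold for keeping a page uncompressed, and the class and method names are all assumptions made for illustration.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size


class CompressedPageStore:
    """Hypothetical, simplified model of a zram-style compressed swap store."""

    def __init__(self, threshold: int = PAGE_SIZE * 3 // 4):
        self.threshold = threshold  # assumed cut-off for keeping a page uncompressed
        self.pages = {}             # swap slot -> (is_compressed, payload)

    def swap_out(self, slot: int, page: bytes) -> None:
        if len(page) != PAGE_SIZE:
            raise ValueError("expected exactly one page")
        packed = zlib.compress(page, 1)  # level 1 favours speed over ratio
        if len(packed) < self.threshold:
            self.pages[slot] = (True, packed)
        else:
            # Compression saved too little memory; store the page as-is so
            # swapping it back in needs no decompression.
            self.pages[slot] = (False, page)

    def swap_in(self, slot: int) -> bytes:
        is_compressed, payload = self.pages.pop(slot)
        return zlib.decompress(payload) if is_compressed else payload

    def stored_bytes(self) -> int:
        return sum(len(payload) for _, payload in self.pages.values())


store = CompressedPageStore()
zero_page = bytes(PAGE_SIZE)          # e.g. freshly zeroed memory
noise_page = os.urandom(PAGE_SIZE)    # random data does not compress
store.swap_out(1, zero_page)
store.swap_out(2, noise_page)
used = store.stored_bytes()           # compressed zero page plus the raw random page
```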

  1. Radomir Dopieralski. "Your Python Application as a Single File"
  2. TrueZIP
  3. Java Zip File System Provider