Algorithm Implementation/Checksums

From Wikibooks, open books for an open world
Jump to navigation Jump to search

Some people view checksums as a kind of hashing but collisions (different inputs, same result) are likely. Checksums are used when irrecoverable errors must be detected to prevent further data corruption in a system. If correction is needed, ECC (Error Correcting Codes such as Hamming, Reed Solomon or BCH) are required.

Collision rates are a compromise with size and speed of computation. The odds of failed detections for n bits of checksum is 2^-n. The choice of this size depends on the application's requirements and the expected error model (the types and probabilities of errors). 16-bit checksum fields are small but one false positive occurs for every 65536 random inputs (at best) so it is used when data blocks are small, the consequences are benign and/or the corruption is very unlikely. 32-bit checksums have double the size overhead but only 1 chance in 4 billions of collision so it is common for storing or transmitting large files if they may be altered.

A programmer writing a checksum algorithm could use one's complement addition if one is willing to sacrifice error detection effectiveness for very high speed (such as for EXE files or IP header packets), Fletcher-32 checksum for a balance of speed and error detection (though it can not distinguish between 0xFFFF and 0x0000), CRC-32 for better error detection but more complex implementations when performance is needed.[1] A programmer should use a MAC or digital signature—typically involving a cryptographically secure hash such as SHA-256 or SHA-3 -- if one is willing to run slower and allocate 256 bits or more in order to detect malicious alterations to data. Such hashes are covered in a later chapter, Hashing.

  • PEAC (Pisano with End-Around Carry) is as fast as Fletcher and Adler checksums, with better error detection (almost as good as CRC)[2] by using 1s complement addition (also called End Around Carry) in a Pisano (truncated Fibonacci) structure. A pair of 16-bit registers makes a checksum comparable to CRC32 with much less coding complexity. For a 16-bit only checksum, the two 16-bit registers can be added.
  • Verhoeff Algorithm
  • Damm Algorithm

Cyclic Redundancy Checks[edit | edit source]

  • CRC16 XModem
  • CRC32 / FCS32 - commonly used in Ethernet and PKZip-compatible archiving

Further reading[edit | edit source]

  1. Theresa C. Maxino. "The Effectiveness of Checksums for Embedded Networks" compares the error detection effectiveness of commonly used checksums: exclusive or (XOR), two's complement addition, ones' complement addition, Fletcher checksum, Adler checksum, and cyclic redundancy codes (CRC) ... briefly mentions that "weighted sum codes" might be just as good as CRC in error detection but with lower computational cost ... discusses common misconceptions about error check codes.
  2. Yann Guidon / YGDES. "PEAC Pisano with End-Around Carry algorithm".