Operating System Design/File Systems/Faults

In engineering in general, fault tolerance refers to the ability of a system to continue functioning (though perhaps at a reduced level) after something has gone wrong. In filesystem design specifically, it refers to the ability of a filesystem to store data reliably even in the face of hardware faults.

Many things can go wrong in a storage system, especially one with moving parts such as a hard disk drive. Some faults are minor: a bad sector makes only a small part of the disk unusable. Others are catastrophic: a head crash can permanently ruin the entire disk.

There are several ways of improving fault tolerance in filesystems:

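  • RAID, which stores data redundantly across several disks (by mirroring or parity) so that the data survives a disk failure.
  • Journaling, which logs changes before applying them so that the filesystem can be brought back to a consistent state after a crash.
  • Bad-block handling, which marks damaged sectors so that they are never used to store data.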
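
As a rough illustration of the redundancy behind RAID, the sketch below uses XOR parity, the idea used by parity-based RAID levels such as RAID 5, to rebuild a block lost with one failed disk. It is a minimal sketch, not how any particular RAID implementation is written: the function name xor_blocks and the sample blocks are hypothetical, and real arrays work on fixed-size stripes spread across physical devices.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per disk (assumed equal length).
data = [b"disk", b"file", b"sys!"]

# The parity block would be stored on a fourth disk.
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

Because XOR is its own inverse, combining the parity block with all surviving data blocks yields the missing one. The same property means this scheme tolerates only one lost block per stripe.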