Evolution of Operating Systems Designs/Storage: Mechanisms
- 1 Storage: Mechanisms
- 1.1 Hydraulic Storage
- 1.2 Mechanical Storage
- 1.3 Analog Storage
- 1.4 Digital Storage
- 1.5 Information Theory
- 1.6 Bytes, Words, and Coding
- 1.7 Ram/Rom
- 1.8 Prom/Eprom/Flash
- 1.9 Secondary Storage
- 2 Storage from the Operating System Perspective
- 3 Further reading
- 4 references
Operating system designs have often echoed the storage capabilities of their day. While the earliest storage mechanisms existed before the advent of operating systems, they are included here for completeness.
The first known artificial storage techniques were probably hydraulic in nature. Few people realize that the artistic fountains of the past can be seen as attempts to develop memory systems.
By Babbage's time, mechanical devices were becoming capable of storing information. His difference engine, although it was never fully completed in his lifetime, was designed to store numbers and perform operations such as addition and subtraction on them. The mechanical comptometer of my father's day led to the electric calculator of my childhood.
At first, electrical devices that stored data were analog in nature. However, analog components tended to vary a great deal unless they were produced quite carefully, so analog computers were never mass produced.
The invention that made digital storage possible was the convention of distinguishing a high voltage from a low voltage. Despite the variation in analog components, it was possible to standardize circuits that could reliably tell apart signals a few volts apart. From this standard it was possible to define a stable signal near 5 volts as a 1 and a signal near 0 volts as a 0. In the widely used TTL convention, an input above about 2 volts is read as a 1 and an input below about 0.8 volts is read as a 0. A range of storage circuits was developed, each with its own characteristics, but all able to operate in the voltage range required for digital storage.
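The voltage convention described above can be sketched in a few lines of Python. The two thresholds are the commonly quoted TTL input levels and are an assumption of this sketch, as is the function name `read_bit`; real logic families differ in their exact limits.

```python
# Illustrative TTL-style input thresholds in volts (assumed for this sketch).
V_LOW_MAX = 0.8   # at or below this, the input is read as logic 0
V_HIGH_MIN = 2.0  # at or above this, the input is read as logic 1


def read_bit(voltage):
    """Return 0 or 1 for a valid logic level, or None in the undefined band."""
    if voltage <= V_LOW_MAX:
        return 0
    if voltage >= V_HIGH_MIN:
        return 1
    return None  # between the thresholds the result is not guaranteed


samples = [0.2, 4.8, 3.1, 1.4]
print([read_bit(v) for v in samples])  # [0, 1, 1, None]
```

The gap between the two thresholds is what lets sloppy analog components still agree on a bit: a noisy 4.8 volt or 3.1 volt signal is read as the same 1.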
Once digital storage was invented, someone had to invent ways to encode information, and someone else had to come up with a theory of how to store things. In the end a theory called Information Theory, introduced by Claude Shannon in 1948, showed how to measure and encode just about anything that could be symbolized with numbers, letters, or pictures.
Bytes, Words, and Coding
Once there was a theory showing that just about anything that could be symbolized numerically could be stored, the next problem was choosing a standard storage size. A decimal digit can be stored in 4 bits, the whole English alphabet in 8 bits, and almost all the alphabets in the world in 32 bits.
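The three sizes mentioned above can be checked with a short Python sketch. The helper `bcd_encode` is a hypothetical name introduced here for illustration; it packs each decimal digit into 4 bits (binary-coded decimal). The 8-bit and 32-bit cases use Python's built-in ASCII and UTF-32 codecs.

```python
def bcd_encode(number):
    """Pack a non-negative integer as binary-coded decimal, 4 bits per digit."""
    packed = 0
    for digit in str(number):
        packed = (packed << 4) | int(digit)
    return packed


# Each hexadecimal digit of the result is one 4-bit decimal digit.
print(hex(bcd_encode(1971)))                   # 0x1971

# One English letter fits in a single 8-bit byte.
print(len("A".encode("ascii")) * 8)            # 8

# One Cyrillic letter (U+0416) stored as a fixed-width 32-bit code.
print(len("\u0416".encode("utf-32-be")) * 8)   # 32
```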
Many computers from the 1950s through the early 1980s used a 36-bit word.
The first microprocessors, beginning with the Intel 4004 in 1971, had 4-bit data buses. Microcomputers began to become popular only once the architecture was widened to 8 bits, as in the Altair 8800, which was built around the 8-bit Intel 8080. Today's microcomputer architectures use a word of about 32 to 64 bits in order to speed up throughput by moving more characters per fetch cycle. One problem that dates many operating systems is that they tie their word length to a datapath width, and eventually the market moves beyond that datapath width.
During the life of Windows, computers have doubled their datapath width 3 times. (While the 8088 was based on a 16-bit chip, it actually had an effective datapath width of 8 bits. Care should also be taken with chips that have separate address buses not to confuse the data bus with the address bus.) Each time the width doubles, a new version of Windows is written that takes advantage of the larger datapath width and makes the previous version obsolete, forcing people to upgrade. Part of the problem is that the languages we write operating systems in limit the size of data words to the maximum available in that CPU architecture. When a new datapath width is created for a CPU, it takes a while for the languages to catch up with the larger datapath. Luckily, a smaller word can be simulated in a larger datapath simply by padding it with zeros. However, it is just as easy to simulate a larger datapath width by using two separate words and logically combining them. The reason this isn't done very often is simply marketing: everyone gets to sell new versions of their software when the datapath width increases, making the former versions seem obsolete.
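As a rough illustration of combining two words to simulate a wider datapath, the following Python sketch adds two 64-bit values using only 32-bit pieces and an explicit carry. The names `split`, `join`, and `add_double_word` are hypothetical and chosen only for this example; a real implementation would do the same thing with the CPU's add and add-with-carry instructions.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def split(value):
    """Split a 64-bit value into a (high, low) pair of 32-bit words."""
    return (value >> WORD_BITS) & WORD_MASK, value & WORD_MASK


def join(pair):
    """Recombine a (high, low) pair of 32-bit words into one value."""
    high, low = pair
    return (high << WORD_BITS) | low


def add_double_word(a, b):
    """Add two (high, low) pairs, carrying from the low word into the high word."""
    low_sum = a[1] + b[1]
    carry = low_sum >> WORD_BITS            # 1 if the low words overflowed
    high_sum = (a[0] + b[0] + carry) & WORD_MASK
    return high_sum, low_sum & WORD_MASK


x = 0x00000001FFFFFFFF
y = 0x0000000000000001
print(hex(join(add_double_word(split(x), split(y)))))  # 0x200000000
```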
How you encode a particular element determines how much memory it will take to store it. More efficient coding techniques use less space and less efficient techniques use more. Although the original view of storage was that it was expensive, and should therefore be minimized for every element, this viewpoint has been reconsidered more and more over time as we find new uses for redundant storage techniques. On the other hand, digital storage only stores a symbol for what we want to store. Experience has shown that the size of that symbol depends partly on how much we already know about what it stands for. If we need to signal only one message, and the content of the message is known, a single bit is all that is needed. However, the more general the storage technique and the less that is known about each signal, the larger the symbol set has to be to cover all the signals, and therefore the larger the code has to be.
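The relationship between the size of the symbol set and the size of the code can be made concrete: a fixed-width code needs enough bits to give every symbol a distinct pattern, which is the base-2 logarithm of the number of symbols, rounded up. The function name `bits_needed` is hypothetical and used only for this sketch.

```python
def bits_needed(symbol_count):
    """Smallest number of bits that gives each of symbol_count symbols its own code."""
    if symbol_count < 1:
        raise ValueError("need at least one symbol")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integers only.
    return (symbol_count - 1).bit_length()


print(bits_needed(2))    # 1  (message sent or not sent)
print(bits_needed(10))   # 4  (decimal digits)
print(bits_needed(26))   # 5  (one case of the English alphabet)
print(bits_needed(256))  # 8  (one byte's worth of symbols)
```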
In some cases it really doesn't make sense to store data as efficiently as possible, if only because we cannot fetch it in its most efficient form in a CPU with a data-path width of any size. An example would be a small integer, or the logic symbols for true and false. Since we will usually want to move data in fetch cycles of a single data word, reducing the word to the size of a small integer or a logic symbol would require more processing than is necessary if we simply represent the element using a larger storage size. On the other hand, if we want to squeeze as much storage into as small an area as possible, we want the most compact storage scheme, and we might want to reduce storage redundancies using some sort of compression mechanism to make the records even smaller.
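The trade-off above shows up clearly with true/false values. The sketch below packs eight flags into the bits of a single word; the helper names `pack_flags` and `get_flag` are hypothetical. The packed form is eight times smaller than one flag per byte, but every read needs a shift and a mask instead of a plain fetch.

```python
def pack_flags(flags):
    """Pack a list of booleans into one integer, flag i stored in bit i."""
    word = 0
    for position, flag in enumerate(flags):
        if flag:
            word |= 1 << position
    return word


def get_flag(word, position):
    """Read one flag back: the extra shift-and-mask work that packing costs."""
    return bool((word >> position) & 1)


flags = [True, False, True, True, False, False, True, False]
packed = pack_flags(flags)
print(packed)               # 77
print(get_flag(packed, 3))  # True
```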
Agreeing on a symbolic format for the data is not as insignificant as it might seem, since computers differ in byte sex (byte order, or endianness) and in their representations of negative integers and floating point numbers. Once that is settled, the next problem arises because storage itself depends on the technology of the memory. Different types of memory store data in different circuits, and the characteristics of those circuits also differ.
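Byte order and the representation of negative integers can both be seen with Python's standard `struct` module. This is only an illustration of the two issues named above; the value `0x12345678` is an arbitrary example.

```python
import struct

value = 0x12345678

# The same 32-bit integer laid out in the two common byte orders.
big_endian = struct.pack(">I", value)     # most significant byte first
little_endian = struct.pack("<I", value)  # least significant byte first
print(big_endian.hex())     # 12345678
print(little_endian.hex())  # 78563412

# -1 as a 16-bit two's complement integer is all ones.
minus_one = struct.pack(">h", -1)
print(minus_one.hex())                    # ffff

# Reading those same bytes as unsigned gives a very different number.
print(struct.unpack(">H", minus_one)[0])  # 65535
```

A program that reads data written by a machine with the other byte order, or with the wrong signedness, gets valid-looking but wrong numbers, which is why the format has to be agreed first.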
The main interest in memory types lies in the difference between RAM and ROM. Essentially, RAM can be written to easily but loses its contents without power. Dynamic RAM must also be constantly refreshed while it is running, whereas static RAM holds its contents for as long as power is applied and so can be kept alive with a battery. When the computer is shut off, dynamic RAM empties out within a fraction of a second and its contents are lost. Static RAM with a battery keeps its contents, but it tends to be more complex and therefore more expensive. ROM, on the other hand, keeps its contents without any power but isn't as easily written to.
As a result, initialization of a computer usually depends on ROM, while everyday use is based in RAM.
Although the early ROM chips had to be programmed in the factory, a new type of ROM called PROM was developed, in which fuses could be burnt to program the chip after it had left the factory. Later these chips were replaced with EPROMs, chips that could be erased (typically by exposure to ultraviolet light) and then reprogrammed. Although these were for a while replaced with EEPROMs, or Electrically Erasable PROMs, the advent of Flash memory made these awkward chips largely obsolete, since Flash could be rewritten in circuit at much higher densities. Today Flash chips are so inexpensive that multi-gigabyte storage units using Flash technology can be carried on a keychain. This could eliminate the need for a small boot sector to load the operating system, since an operating system would easily fit in a small bank of Flash chips.
Operating systems can also be dated by their reliance on secondary storage. The first generation of operating systems, for instance, didn't have very large storage capabilities, and so they tended to offload storage onto some secondary medium between runs.
The earliest form of secondary storage was punch cards. Essentially, Hollerith punch cards were an adaptation of the Jacquard loom, which automated patterns in cloth by passing a card with holes in it across a special read head. The head mechanically detected the holes in the card and set up the heddles of the loom to match the pattern on the card, lifting or dropping each warp thread so that it sat either in front of or behind the weft.
Hollerith's invention was the electrification of the concept. As a card passed through the reader between pairs of electrical contacts, the card itself acted as an insulator, while each hole allowed the contacts to touch, activating a circuit.
The next type of secondary storage was magnetic tape. Often this was used to store batches of data and programs so that they could be run multiple times. One necessity of this type of storage was the ability to determine quickly, near the start of the tape, how far down the tape to go to reach a particular program. To do this, tape operating systems wrote a special set of records called a directory to the tape, and wrote timing marks along the tape to format it so that the right section of tape could be found.
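The directory idea described above can be sketched as a table, written near the start of the tape, that maps each program name to where it starts and how long it is. Everything in this sketch is a simplifying assumption made for illustration: the tape is modeled as numbered blocks, block 0 is reserved for the directory, and `build_directory` is a hypothetical name rather than any historical system's routine.

```python
DIRECTORY_BLOCKS = 1  # assumed: the directory occupies the first block of the tape


def build_directory(programs):
    """Map each program name to (start_block, length) in tape order.

    programs is a list of (name, length_in_blocks) pairs in the order
    they are written to the tape.
    """
    directory = {}
    position = DIRECTORY_BLOCKS
    for name, length in programs:
        directory[name] = (position, length)
        position += length
    return directory


directory = build_directory([("payroll", 10), ("inventory", 4), ("billing", 7)])
start, length = directory["inventory"]
print(start, length)  # 11 4
```

With such a table the tape drive can skip forward by a known number of blocks instead of reading every record until the wanted program turns up.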
An intermediate form between these two was punched paper tape, similar in appearance to ticker tape, which was adopted especially for use in teletypes. Essentially, long paper tapes could be created using punched holes similar to those punched in Hollerith cards.
One interesting variation on the tape was the drum or cylinder memory, which used multiple read/write heads to store data on what was essentially a rotating magnetic drum.
Of more interest were the hard disks, where platters of magnetic media were stacked on a spindle, each with a read/write head for its top and bottom surface. As recording densities increased, the amount of storage that could be kept in a single drive grew, reaching multiple hundreds of gigabytes in a single 3.5 inch format device at the time of writing.
A similar storage capability was found in the floppy disk, where a flexible disk coated with magnetic material was used as temporary storage and to move data between computers. Although the 3.5 inch floppy is almost obsolete, it was still supplied with some computers at the time of writing. Earlier versions were the 8 inch and 5 1/4 inch floppies. Storage densities were low for the 8 inch floppies and increased for the 5 1/4 inch floppies, but topped out at about 1.44 megabytes for the common 3.5 inch floppies. Floppy disks were cheap and disposable.
The idea of an optical disk is that data written and read by an optical device such as a laser beam can be stored in a smaller area, provided the frequency of the laser is high enough (and so its wavelength short enough) that the beam can be focused on a spot smaller than the smallest usable magnetic domain on a magnetic disk.
The 12 inch Laser Disk was invented so that people could buy movies the way they bought records. However, the timing of this technology was wrong, because people were moving away from records as a storage medium. As a result of the industry standardizing on VHS video tape for distribution of its product, the inventors of the Laser Disk were left with a novelty product.
Philips, one of the developers of the Laser Disk, decided together with Sony to make a smaller format laser disk instead, called the Compact Disk, about 4.7 inches (120 mm) across. Their idea was to promote this disk as a medium for the distribution of music.
Then, using the cost savings of volume production, they would create a computer storage device called the CD-ROM that could be used for the distribution of application programs. The computer version stored about 650 megabytes, and has become cheaper and cheaper to produce, until now the blanks of a recordable version cost less than a dollar apiece.
CD-R and CD-RW Disks
By making it inexpensive to produce CD burning decks and thus to store data of many formats on a CD, the CD manufacturers created a market for computer back-up storage and program distribution.
A different format of disk was designed to have even better storage characteristics, so that it could be used to store a movie. Called a DVD, this disk could store 4.7 gigabytes on a single layer, and came in a dual-layer configuration so that if the second layer were recorded the disk could store about 8.5 gigabytes.
Just as before, the DVD manufacturers designed a computer storage device, and used the cost savings from the volume manufacture of movie disks in the format to bring the price of the equipment down, so that it became economical to install a DVD drive as standard equipment on a computer.
Storage from the Operating System Perspective
Frankly, it shouldn't mean anything to an operating system what storage technology is implemented, except for the driver that implements the control mechanisms for the device. What is interesting from the view of an operating system is instead how it uses the storage.
One of the reasons for using read-only, non-volatile memory is that it doesn't change. No matter what some malware author tries to do, the read-only memory is usually unaffected. An attacker would have to physically replace the chip to make a difference to the operation of the computer. Of course this assurance is not quite as strong with flash memory as it was with ROM, but read-only secondary storage such as pressed optical disks makes it practical to store programs without worrying about them being corrupted by a virus.
In fact, in the days of Windows 98, recovery from most viruses, no matter how virulent, was as easy as reloading all the software from compact disks. Sure, you lost any data that was not backed up, but that is what back-ups were for.
On the other hand, volatile memory is cheap and getting cheaper. Every generation of motherboard can hold more dynamic RAM than the previous generation. Sure, you lose the contents of that memory in a power outage, but with large hard drives, the programs and data are mostly ready for recovery with very little effort.
Databases and Memory Applications
Raw storage is therefore easy. However, getting the data back out in a usable form with anything but the program that stored it can be problematic. This is why a special class of data storage and retrieval engine called a database was created. The idea was that a program could store and retrieve data in more ways than the usual storage capability would allow. Databases have been around long enough that programs now either call them for storage or embed their own proprietary database to speed access to storage.
The problem is that database systems rely on people to feed them data according to some form, or on programs to feed them pre-formatted records. People without experience in database design are often unwilling to try to understand databases when they could keep their data in a less efficient storage mechanism and use it that way. A good example is Access, the Microsoft Office database engine. People consistently try to store databases in Excel, the spreadsheet, rather than using Access.
Archiving and Backup
One problem with modern storage devices is that they are seldom set up for best operation as archives or backups. What happens when you are done with some data but don't want to throw it away? Does it get stored in a manner that will allow it to be retrieved in a year or two, or maybe 10?
When it is important to be able to recover old files, you need a special type of storage capability, something that is relatively cheap, and something that can either automatically or at least periodically be used to create images of your data, so that they can be retrieved later.
This is what an archiving and back-up utility does. It stores large amounts of data in a compressed format so that it costs less to keep, while preserving data that has not been used for years, so that it can be recovered later despite intervening power outages, malware attacks, upgrades to your system, or other changes that distort your memory trace.
In this vein, one way of achieving this sort of assurance of persistence of data is to implement Transparent Persistence as a part of your operating system. Instead of forcing the operator to write to disk every so often to store their data, the system saves it for them. Transparent Persistence is how, for instance, a word processor saves most of your work when a power outage happens while the work is in progress. A similar process might, for instance, save an article like this on a wiki without the author having to tell the wiki to save it. Recently operating systems have been coming out with journaled storage, where updates to a file are temporarily stored in a journal, from which they can be recovered despite a power outage. This means that even though the latest changes might not have been recorded, most of the work is stored in the journal and can be recovered when the program is started up again.
In Linux, journaling of the operating system's storage means that mounting a volume that has had changes made since the last superblock update does not require a lengthy recovery process, since the changes were entered into the journal and can be applied fairly quickly. Because the operator does not explicitly tell the computer to store the data, Transparent Persistence is seen as an early model of implicit memory.
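The journaling idea described above can be sketched as an append-only log that is replayed on start-up. This is a minimal illustration under stated assumptions, not how any particular file system or word processor does it: the record format (one JSON object per line) and the function names `journal_append` and `recover` are hypothetical.

```python
import json
import os
import tempfile


def journal_append(path, key, value):
    """Append one change record and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as journal:
        journal.write(json.dumps({"key": key, "value": value}) + "\n")
        journal.flush()
        os.fsync(journal.fileno())


def recover(path):
    """Rebuild the latest state by replaying the journal from the start."""
    state = {}
    if not os.path.exists(path):
        return state
    with open(path, encoding="utf-8") as journal:
        for line in journal:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                break  # a record cut short by a crash; ignore it and stop
            state[entry["key"]] = entry["value"]
    return state


with tempfile.TemporaryDirectory() as folder:
    log = os.path.join(folder, "edits.journal")
    journal_append(log, "title", "Storage")
    journal_append(log, "title", "Storage: Mechanisms")
    print(recover(log))  # {'title': 'Storage: Mechanisms'}
```

Only a change that was cut off mid-write is lost; everything recorded before it is recovered without the operator ever having asked for a save.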
If Transparent Persistence is seen as a form of implicit memory, then it should be no surprise that something called the Workspace, a name commonly used for a Blackboard system that serves as working memory, might represent a sort of intermediate short-term memory that can be used by multiple programs at once.
Douglas Hofstadter designed a program called Copycat, which showed how a simple associative memory called the slipnet could orient information, creating relationships between different pieces of information so that inductive patterns in the information could be found. Of course his work was meant to find analogies between simple text strings, but later Stan Franklin showed how such a program could be expanded to become part of a cognitive architecture called IDA. It relies on a Workspace to achieve associative memory.
The idea can be combined with Transparent Persistence by first placing whatever is going to be persisted in the Workspace and letting the slipnet build associative patterns for it before it goes to permanent storage. That way the original data and its associations are both stored in persistent memory at the same time.
- Wiki: TransparentPersistence
- How To Backup Operating Systems
- UNIX Computing Security/Data security
- Minimizing hard disk drive failure and data loss