Evolution of Operating Systems Designs/HCI: beyond WIMP
Human-computer interaction (HCI) deals with how people interact with computers to accomplish tasks. In this chapter we look at how interaction with computers has evolved.
The Hollerith punch card was based on the concept of the Jacquard loom, where patterns in the cloth were defined by cards with holes in them that triggered the heddles on the loom to change whether the warp was in front of or behind the weft. What Jacquard did with looms, Hollerith was able to do with electricity, making it possible for an operator to punch in lines of text at a punch card unit and have them converted into electrical signals at the card reader.
Computers began being connected to teletype machines in the 1950s. Teletype machines allowed users to perform tasks by typing in commands rather than using punch cards to input instructions. Soon afterwards, text-based monitors called "terminals" followed, replacing printers as the means of communicating information. Today computers continue to emulate the teletype machine, while adding new functionality and features, in what is called a command-line interface (CLI).
WIMP (Windows, Icons, Menus and Pointer) was invented for Smalltalk in the late 1970s. Since that time, it has steadily colonized modern operating systems. Unfortunately, nearly all implementations of WIMP are broken variants without the full programmable power of CLIs. Some simple CLI concepts that could be easily added to WIMP include pipes, scripts and graphical location history. If these were incorporated, CLIs could be done away with.
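As a rough illustration of what a pipe contributes, the sketch below chains small stages so that the output of one feeds the next, the way `a | b` does in a shell. The stage names and the idea of feeding it a list of selected icons are hypothetical, chosen only to suggest how a graphical shell might expose the same composition.

```python
from functools import reduce
from typing import Callable, Iterable

Stage = Callable[[Iterable[str]], Iterable[str]]

def pipe(*stages: Stage) -> Stage:
    """Chain stages left to right, like `a | b | c` in a shell."""
    def run(items: Iterable[str]) -> Iterable[str]:
        return reduce(lambda data, stage: stage(data), stages, items)
    return run

# Hypothetical stages that a GUI could expose as connectable blocks.
def only_text_files(names):
    return (n for n in names if n.endswith(".txt"))

def sort_names(names):
    return sorted(names)

# Assumption: the GUI hands over the names of the currently selected icons.
selected_icons = ["notes.txt", "photo.png", "agenda.txt"]
pipeline = pipe(only_text_files, sort_names)
result = list(pipeline(selected_icons))
print(result)
```

Each stage knows nothing about the others, which is the property that makes pipes composable and scriptable.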
Despite the decrepitude of WIMP, there has been no meaningful advance in the HCI of mainstream operating systems since 1980. That's decades of stagnation punctuated by minor innovations like virtual desktops and context menus. The latest triviality to hit the market is the ever so gradual replacement of icons by object miniatures. Fortunately, despite the total incompetence of mainstream OS designers, research in HCI has managed to sustain a snail's pace.
The hardware exists, ready to be used, but operating systems don't take advantage of it. Voice recognition is the single most under-utilized opportunity in HCI.
Voice recognition suffers from a multitude of problems. Many environments are noisy, making recognition inaccurate and thus hazardous. Constant use of voice recognition can make people lose their voice. Voice recognition has privacy and annoyance problems. Imagine overhearing your boss say "Open personal. Copy barnyard-porn to CD-ROM. Delete barnyard-porn. Close personal." while you are trying to concentrate on your work.
Some of these problems can be partially avoided. For example, the user might say "Copy this to that", where "this" and "that" are the objects pointed to by the mouse at the moment each word is spoken. The problem of voice overuse can be reduced by making voice a supplementary input mode instead of the primary one.
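A minimal sketch of how "Copy this to that" might be resolved, assuming the system records a timestamp for each spoken word and a time-ordered log of which object was under the pointer. The `PointerSample` type, the timing data and the object names are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class PointerSample:
    time: float   # seconds since the utterance started
    target: str   # object under the pointer from this time onward

def target_at(samples, t):
    """Return the object under the pointer at time t (samples sorted by time)."""
    times = [s.time for s in samples]
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("no pointer sample before the word was spoken")
    return samples[i].target

def resolve(words, samples):
    """Replace 'this'/'that' with what the pointer was over when each was spoken.

    words is a list of (word, time_spoken) pairs.
    """
    deictic = {"this", "that"}
    return [target_at(samples, t) if w.lower() in deictic else w for w, t in words]

samples = [PointerSample(0.0, "report.doc"), PointerSample(0.9, "Backup folder")]
words = [("Copy", 0.1), ("this", 0.4), ("to", 0.7), ("that", 1.1)]
command = resolve(words, samples)
print(command)
```

The speech recognizer only has to recognize a small fixed vocabulary here; the pointer supplies the object names, which also keeps them from being spoken aloud.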
Natural Language Interaction
The heritage of this idea goes back as far as the very early text adventure ADVENT, or even ELIZA, a psychotherapist simulation whose descendants include ALICE and other natural language processors. Voice recognition software would really be built on top of this technology, although additional semantic rules would be needed to handle spoken rather than written language.
This sort of interface is already implemented by websites such as "Ask Jeeves" and "WikiAnswers" and similar search tools, where you ask the computer for information using an ordinary natural language question rather than a structured query language such as SQL. Typing the question eliminates the privacy issues of voice input, but the approach requires more sophisticated AI research in order to succeed.
An interesting aspect of this HCI model is that it is usually conceived as a throwback to the CLI: it is imperative. Ironically enough, it is voice recognition that is strictly imperative, while language processing is supposed to move beyond this narrow interaction model. In any case, natural language interaction was an eventual goal of many early operating system designers and even programming language designers.
This HCI model was explored in part by the science fiction author Frederik Pohl in his Gateway series of novels, in which the protagonist is married to a software engineer who does her programming using a modified natural language interface.
The primary issue with this approach is that the raw database of semantic information needed to process most natural language queries effectively has to be quite large, in some ways an order of magnitude larger than many current GUI operating systems. A natural language software development environment will still need some sort of formalized structure to avoid ambiguous or even contradictory meanings. It is also easier to write SQRT(SIN(X^3)/(PI*7.55)) than to describe the function as "the square root of the quantity sine of ex cubed, which is then divided by the quantity pi times seven point five five". Even here, ambiguity in the spoken form can garble the equation, but the example at least gives a starting point for seeing what natural language programming languages would be like.
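To make the ambiguity concrete, the sketch below evaluates two plausible readings of the spoken description. The sample value of `x` is arbitrary, and the variable names are illustrative only.

```python
import math

x = 1.2  # arbitrary sample input

# Reading 1: the sine is divided by the whole product (pi * 7.55).
divided_by_product = math.sqrt(math.sin(x**3) / (math.pi * 7.55))

# Reading 2: the sine is divided by pi, and the result is multiplied by 7.55.
divided_then_scaled = math.sqrt(math.sin(x**3) / math.pi * 7.55)

print(divided_by_product, divided_then_scaled)
```

The two results differ by a factor of 7.55, so a listener (or parser) that groups the words differently computes a different function. Written notation settles the question with parentheses and precedence rules; spoken language has no equally compact device.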
Any interface will have a default resolution, probably linked to the screen resolution and size. However, a significant portion of the population have problems with their eyesight and need larger characters in order to read. The idea of a zoomable interface is that the interface is not limited to the default resolution: it can be zoomed in to enlarge its contents, or zoomed out to give an overview.
This concept is also useful with multi-screen implementations of an operating system. The ability to zoom out to look at the multi-screen view, and zoom back in to see a single screen, would be quite useful in complex user environments such as workstations, and in architectural programs where you might want to place different elevations on different screens and page back and forth between them to see how changes to one affect the others.
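One common way to implement zooming is to keep a scale factor and an offset, and to adjust the offset so that the point under the cursor stays put while everything else grows or shrinks around it. The `Viewport` class below is a hypothetical sketch of that idea, not a description of any particular windowing system.

```python
from dataclasses import dataclass

@dataclass
class Viewport:
    scale: float = 1.0      # 1.0 is the default resolution
    offset_x: float = 0.0   # screen position of the world origin
    offset_y: float = 0.0

    def to_screen(self, wx, wy):
        """Map interface (world) coordinates to screen coordinates."""
        return (wx * self.scale + self.offset_x, wy * self.scale + self.offset_y)

    def zoom(self, factor, focus_x, focus_y):
        """Zoom by factor, keeping the screen point (focus_x, focus_y) fixed."""
        self.offset_x = focus_x - (focus_x - self.offset_x) * factor
        self.offset_y = focus_y - (focus_y - self.offset_y) * factor
        self.scale *= factor

view = Viewport()
view.zoom(2.0, 100, 50)            # zoom in around the cursor at (100, 50)
focus = view.to_screen(100, 50)    # the point under the cursor does not move
font_px = 12 * view.scale          # a 12-pixel glyph is now drawn at 24 pixels
```

A factor below 1.0 zooms out, which is how the same mechanism could give the multi-screen overview.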
Despite its potential, 3D has never been used as anything more than a gimmick outside of, say, very specialized HCIs for biochemists. Several issues hold this interface concept back, including the lack of a powerful model unique to a 3D environment, comparable to the desktop model in a 2D environment. Most 3D environments are merely extensions of the desktop model, and have a decidedly "gimmick" quality that fails to justify the increased computing resources they require.
There are very few true 3D display devices that computers can drive in real time, which also limits this direction. While it is possible to get a true 3D hologram like those seen in the movies, such display systems can cost millions of dollars to set up and deploy, and at the moment they are very fragile and subject to frequent breakdowns. They are mainly used by the military and as gimmicks in places that can justify the cost, like Disneyland, and even there they are just demonstration devices of very limited practical use, offering only a hint of what they may eventually become. Most people who use a 3D environment do so on a normal 2D computer screen, which is sometimes called 2.5D to acknowledge that it is more than a purely two-dimensional display but has not achieved a full third dimension.
A notable model that has attracted quite a bit of interest was originally described in the book Snow Crash by Neal Stephenson. While the book covers many things, its "Metaverse" is a multi-user 3D environment in which avatars are used for social interaction in virtual reality. The book spawned the development of VRML, which borrowed heavily from even earlier 3D rendering formats.
Perhaps most remarkable was the virtual world known as Alpha World, which enjoyed a huge burst of popularity during the mid 1990s, although financial problems plagued the company hosting its servers. Alpha World acknowledged its roots in the Metaverse, including its use of avatars and some of its environments, among them buildings originally described in Snow Crash. A notable feature of Alpha World is that users could "claim" real estate and build features and landmarks within the world, in some cases denying access to users they did not like. Mapping out the various user projects shows that, from a distance, Alpha World looks very similar to the urban sprawl of modern cities throughout the world. Alpha World is still technically under development, but the team that originally created it has long since left, and ownership of the software has changed hands several times.
The most likely use of 3D technology from a user standpoint is going to be acting within virtual worlds like the Metaverse, or in multi-player role-playing games like EverQuest or World of Warcraft. While it is possible to manipulate computer data directly through these sorts of environments, there doesn't seem to be compelling interest in developing, say, a compiler that requires software to be written that way.
Direct manipulation is a term coined in 1983 by Ben Shneiderman to cover the idea that interfaces should echo normal experience, so that the interface is intuitively obvious to the user. His interest was in the design of the GUI, but the term has spread to mean any interface that is meant to be more intuitive than a WIMP GUI. There are a great number of ways in which this type of interface can be implemented, and because of that variety it includes a great number of subtopics. One is the idea that an interface should give feedback to the user like the real thing: a steering wheel, for instance, should fight the user on a fast turn, and a brake pedal should indicate whether the brakes have locked, or throb realistically like an anti-lock device. Another is that the user should swing a rod for golf or tennis instead of using a mouse or joystick; the Wii game console has introduced such a system. A further example is the accelerometer in a phone, which allows the user to control games by tilting the phone.
Direct Access to Hardware
User Choice: Manual Placement
People like to customize their computer systems. As any technician who works on other people's computers can attest, it is sometimes difficult to find your bearings on a computer that has been over-customized. Familiar programs can take on new and confusing characteristics, with the result that you might not recognize them when you first sit down at someone else's computer.
Allowing users to place graphics components wherever they want creates opportunities for them to change the placement accidentally without knowing how they did it, and thus without knowing how to change it back. An operating system designer must carefully consider how to implement manual placement. An example of how this can go wrong is the drag and drop mechanism that was developed for WIMP. If every graphics element can be moved, then we get interesting effects like dragging a tool bar from the top of the screen to the side of the screen and having it become vertically aligned instead of horizontally aligned.
Many users don't understand drag and drop, and often hold the mouse button down while they are moving the mouse, thus dragging something they didn't mean to drag. Alternatively, they have to lift the mouse off the table in order to extend its range, and accidentally drop the object in a location that cannot be interpreted as a valid placement. For instance, what configuration should a tool bar have if it is placed halfway down the page?
Further, it is often possible to move something in the foreground and have it affect the placement of objects in the background. For instance, arranging two windows vertically so that you can drag and drop between them sometimes scrambles the placement of icons on the Windows desktop. To deal with effects like one icon occluding another, it is often necessary to use a technique like snap to grid, which places each element at a grid vertex. However, long icon names have been known to occlude other icons, which makes cropping the names useful but eliminates information needed to understand what the icon does. That in turn requires some sort of hover mechanism to show the full filename without actually selecting the icon. Because of these considerations, either every graphics component should be user adjustable, or the designer must carefully consider the extremes of adjustment in order to decide which graphics elements will snap to a grid, which will accept only valid locations, and which will have complete freedom to be placed wherever the user wants.
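A minimal sketch of snap to grid with occlusion avoidance and name cropping, under the assumption of a fixed square grid. The cell size, the rule of stepping right to the next free cell, and the function names are hypothetical choices for illustration; a real desktop would also have to handle the screen edge.

```python
GRID = 80  # hypothetical cell size in pixels

def snap_to_grid(x, y, grid=GRID):
    """Move a drop position to the nearest grid vertex."""
    return (round(x / grid) * grid, round(y / grid) * grid)

def place_icon(occupied, x, y, grid=GRID):
    """Snap the icon, then step right until a free vertex is found."""
    cx, cy = snap_to_grid(x, y, grid)
    while (cx, cy) in occupied:
        cx += grid
    occupied.add((cx, cy))
    return (cx, cy)

def crop_label(name, max_chars=12):
    """Shorten a long icon name; the full name would be shown on hover."""
    return name if len(name) <= max_chars else name[:max_chars - 3] + "..."

occupied = set()
first = place_icon(occupied, 95, 170)    # snaps to the nearest vertex
second = place_icon(occupied, 70, 150)   # same vertex is taken, so it moves right
label = crop_label("quarterly-report-final.doc")
```

Because every icon ends up on its own vertex and every label fits within one cell, neither icons nor names can occlude their neighbours.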
Although this type of input is still somewhat experimental, and the standards for it have not been set, the use of Self-Organizing Maps to interface between an implant and a robotic arm suggests that it should be possible, at some future point in time, to manage a pointing device as if it were a part of the human body. While this is not yet mind-reading, it could offer an interface for people, such as those with paralysis, who do not have the use of normal pointing devices.
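For readers unfamiliar with Self-Organizing Maps, the sketch below shows a single training step of the general technique on a one-dimensional map: find the unit whose weights best match the input, then pull that unit and its neighbours toward the input. It is only an illustration of the update rule; the two-channel signal, the learning rate and the neighbourhood radius are made-up values, and nothing here reflects how an actual implant interface is built.

```python
import math

def best_matching_unit(weights, sample):
    """Index of the unit whose weight vector is closest to the sample."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], sample))

def train_step(weights, sample, rate=0.5, radius=1.0):
    """Pull the best-matching unit and its map neighbours toward the sample."""
    bmu = best_matching_unit(weights, sample)
    for i, w in enumerate(weights):
        # Gaussian neighbourhood: units near the BMU on the map move more.
        influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
        weights[i] = [wi + rate * influence * (si - wi) for wi, si in zip(w, sample)]
    return bmu

# Hypothetical 2-channel signal mapped onto a line of four units.
weights = [[0.0, 0.0], [0.3, 0.3], [0.6, 0.6], [1.0, 1.0]]
sample = [0.9, 0.8]
before = math.dist(weights[3], sample)
bmu = train_step(weights, sample)
after = math.dist(weights[3], sample)
print(bmu, before, after)
```

Repeating this step over many samples, usually while shrinking the rate and radius, lets neighbouring units come to respond to similar signals, which is what makes the map usable as a translation layer between raw input and a position.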
- Teuvo Kohonen, Self-Organizing Maps, Springer (2001), ISBN 3540679219