Choosing The Right File Format/Formats

From Wikibooks, open books for an open world

Formats for storing electronic information

At any one time there is an enormous variety of file formats in use for various purposes, so how do you choose which one is best? There are three types of file format:

  • Proprietary, closed specifications
  • Proprietary, open specifications
  • Non-proprietary, open specifications

Proprietary, closed specifications are used by some of the most common software; if you don't use them yourself, you probably get sent them. However, because these formats are not publicly documented, you are held hostage to the company making the software. If it decides not to support old versions of its own format, suddenly you can't open your old files! Your choice of software then depends heavily on how well any new software can second-guess the format used by your old software. Examples of this type of format are Microsoft Word's doc format (.doc), Microsoft Excel's xls format (.xls) and Adobe Photoshop's Document format (.psd).

Proprietary, open specifications are somewhat better: although the format is still legally owned by one company and developed purely for that company's commercial benefit, the company has undertaken to document the format openly. It can still choose to switch back to a closed specification, or it may make changes that it chooses not to document. In other words, a proprietary open specification is only open as long as the company wants to keep it that way. Examples of this type of format are Adobe's Portable Document Format (.pdf) (patented, although most of the patents are licensed on a royalty-free basis), Adobe's TIFF format (.tiff) and Macromedia Shockwave Flash (.swf) (however, the documentation is released under a non-disclosure agreement that requires readers not to contribute to any other implementation of Flash, so in practice it is still closed).

Non-proprietary, open specifications have been openly documented by some public body, or released to one by their developers. Once released, these formats have a guaranteed reference point. Examples of this type of format are Portable Network Graphics (.png), Joint Photographic Experts Group (.jpg / .jpeg), MPEG-2 (.mpeg2), eXtensible Markup Language (.xml) (the structure, more than the specific format), and Scalable Vector Graphics (.svg).
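The three categories can be turned into a quick audit of the files you already hold. The following Python code is a minimal sketch, not a definitive tool: the lookup table contains only the example extensions named above, and the function names `categorise` and `audit` are invented for this illustration. A file extension is also only a hint about the real format of a file.

```python
from pathlib import Path

# Category labels taken from the three types described in the text.
CLOSED = "proprietary, closed specification"
OPEN_PROPRIETARY = "proprietary, open specification"
OPEN = "non-proprietary, open specification"

# Lookup table built only from the example extensions named in the text.
# It is illustrative, not a complete registry of file formats.
FORMAT_CATEGORIES = {
    ".doc": CLOSED, ".xls": CLOSED, ".psd": CLOSED,
    ".pdf": OPEN_PROPRIETARY, ".tiff": OPEN_PROPRIETARY, ".swf": OPEN_PROPRIETARY,
    ".png": OPEN, ".jpg": OPEN, ".jpeg": OPEN, ".mpeg2": OPEN,
    ".xml": OPEN, ".svg": OPEN,
}


def categorise(filename):
    """Return the category for a file name, judged by its extension only."""
    suffix = Path(filename).suffix.lower()
    return FORMAT_CATEGORIES.get(suffix, "unknown")


def audit(filenames):
    """Group file names by category so that closed formats stand out."""
    report = {}
    for name in filenames:
        report.setdefault(categorise(name), []).append(name)
    return report
```

Calling `audit(["budget.xls", "logo.svg"])` groups the first file under the closed category and the second under the open one, which gives a rough list of the files most at risk.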

One special case is the Adobe Portable Document Format for archiving (PDF/A, also written pdf-archive or PDF-A), a restricted application of the proprietary open specification PDF 1.4. It is a published ISO International Standard (ISO 19005-1, published in 2005) [1], developed by the PDF-Archive Committee in close partnership with the Administrative Office of the U.S. Courts [2]. I have not found any software which supports this format, so it is possibly used only in organisations whose main concern is archiving.

One family of formats which could solve many of these issues is collectively called OpenDocument, developed by OpenOffice.org, OASIS, and many others in the industry (but not Microsoft). OpenOffice.org 2.0, recently released, uses the OpenDocument family of formats. Of the software supporting OpenDocument, OpenOffice.org, AbiWord and Google Docs are cross-platform, and KOffice will be as of KDE 4.1 (around July 2008). TextEdit in Mac OS X 10.5 can understand the format to some degree, and Microsoft states that it will add native support for OpenDocument 1.1 (rather than plug-in converters) to MS Office as of spring 2009.

Unfortunately, leading software often defaults to a format which is inherently unsuited to later retrieval. An example is Microsoft Word, which defaults to its native .doc format rather than the better documented and more widely supported Rich Text Format (.rtf), though you can change the default format. Microsoft products are also notable for their use of what Marshall Masters of the Independent Book Publishers Association calls 'upgrade blackmail', which he describes as: "Someone with a new version of your desktop application edits your file, and now your older version of the application cannot read it, which forces you to pay for an expensive upgrade if you want to continue working and playing well with others."[3] That's definitely something for anyone with a budget to avoid.

So, what's the next level of future-proofing? Read on...

Criteria in choosing future proof file formats

Formats for future proofing must:

  • Be supported by comprehensive, public documentation
  • Be stable, not under constant revision
  • Be supported by several software providers
  • Be supported on various hardware
  • Be supported by software on various operating systems (Windows/Macintosh/Unix/Linux)
  • Be free of legal restrictions on their use (for example, choose PNG rather than GIF)

Additional consideration:

  • Popular formats are more likely to remain supported
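
The criteria above can be treated as a simple checklist that a candidate format either passes or fails. The following Python code is one possible sketch: the class `FormatChecklist` and its field names are invented for this illustration, and the answers for any real format have to come from your own research.

```python
from dataclasses import dataclass, fields


@dataclass
class FormatChecklist:
    """One yes/no answer per criterion in the list above (hypothetical names)."""
    public_documentation: bool
    stable: bool
    several_software_providers: bool
    various_hardware: bool
    various_operating_systems: bool
    free_of_legal_restriction: bool


def failed_criteria(checklist):
    """Return the names of the criteria that the format does not meet."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def is_future_proof(checklist):
    """A format qualifies only if it meets every criterion."""
    return not failed_criteria(checklist)
```

For example, a made-up candidate that meets every criterion except the last one, `FormatChecklist(True, True, True, True, True, False)`, is rejected, and `failed_criteria` names `free_of_legal_restriction` as the reason. Popularity is left out on purpose, because the text treats it as an additional consideration and not as a requirement.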

Criteria in choosing suitable software

The format criteria have some implications for your choice of software. Not all software implements open specifications correctly, so some programs only appear to be using a particular format. This is most common with HyperText Markup Language (HTML) editors and the notorious 'Save as html' option in Microsoft Word. So if you are buying software, make sure to check this. Open source / free software tends to have strong support for open standards.
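
One way to check a program's output is to feed it to an independent parser. The following Python code is a minimal sketch that tests whether a document is well-formed XML, using the standard library module `xml.etree.ElementTree`. The function name `is_well_formed_xml` is invented for this illustration. The check suits XML-based formats such as SVG or XHTML; ordinary HTML is not required to be XML, and well-formedness is only a first test, not full validation against a specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if the text parses as well-formed XML, otherwise False."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser rejected the document, for example for mismatched tags.
        return False
    return True
```

To test a file exported by an application, read its contents and pass them to the function. A result of `False` suggests that the program only appears to write the format it claims.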