FOSS Open Standards/Comparison of File Formats
This section lists, compares and discusses the degree of openness of several popular file formats. It covers file formats for the following application areas:
- Office applications
- Graphics and images
- Audio
- Video (containers and compression formats)
Office Applications File Formats
Microsoft Office Formats
Currently, the most popular office application suite is Microsoft Office (MS Office). Depending on the edition purchased, the suite mainly comprises word processing (MS Word), spreadsheet (MS Excel) and presentation (MS PowerPoint) software. Up to and including version 10 (MS Office 10), the file formats used were binary (i.e. not plain text) and their specifications were not published. MS Word, MS Excel and MS PowerPoint use the binary DOC, XLS and PPT formats, respectively; these are proprietary formats, owned and controlled entirely by Microsoft.
The file formats for these applications are widely used because of the popularity of MS Office. Software from other vendors, e.g. OpenOffice.org or StarOffice, can read and write files in these proprietary formats, but the compatibility is incomplete. Competing products cannot be fully compatible with MS Office unless Microsoft provides them with the file format specifications.
Some MS Office applications like Word and Excel can save their data in the Rich Text Format (RTF). This is a non-binary file format developed by Microsoft for cross-platform document interchange. Microsoft publishes technical documentation on RTF, and because many non-Microsoft programs support RTF well, it is widely used for document exchange between MS Office and other office applications. However, RTF does not completely support the more sophisticated features found in MS Office, and complex documents may not be properly represented in it.
With MS Office 11 (MS Office 2003), the option to use a new XML-based file format for Word and Excel was made available. However, these XML-based formats have been criticized in some quarters for being incomplete and immature. They were not available for all the applications in the suite, and some major functionality was not supported in those that had them. As a result, the traditional binary MS Office file formats remained the ones mainly in use. In June 2005, Microsoft announced that MS Office 12, due in 2006, will deliver support for a new set of XML file formats called the "Microsoft Office Open XML Formats". The applications that will use these formats by default are Word, Excel and PowerPoint.
The Office Open XML formats are also being published by Microsoft on a royalty-free basis to the industry. While this could make it easier for third-party products to be compatible with MS Office, the file formats will still be owned and controlled by Microsoft and, hence, are not open.
In an attempt to allay fears over this and to allow customers, notably corporations and national governments with long-term archival needs, to access the contents of their documents created with MS Office without being dependent on Microsoft, the Office XML formats have been submitted to ECMA International for standardization.
OpenOffice.org and StarOffice Formats
OpenOffice.org (OOo) is a full-fledged Open Source office application suite, comprising a word processor, a spreadsheet, presentation software, a graphics editor and a database program (available in OOo version 2 only). The original file formats used by OOo were XML-based. As there were several files associated with a single document, all the files were compressed and stored as a single zip-compressed file. OpenOffice.org is available on multiple platforms, e.g. GNU/Linux, MS Windows and Mac OS X, and offers multilingual support. It is compatible with all other major office suites. In particular, it is able to read and write MS Office file formats. The degree of compatibility is very good though not complete.
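The idea of storing several XML files as one zip-compressed document can be sketched with Python's standard `zipfile` module. This is only an illustration of the packaging principle: the part names and the XML inside them are placeholders chosen for the example, not the actual OpenOffice.org schema, and the helper functions are hypothetical.

```python
import io
import zipfile


def build_package(parts):
    """Pack several named XML parts into one zip-compressed document (in memory)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in parts.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def list_parts(package_bytes):
    """Return the names of the files stored inside the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.namelist()


def read_part(package_bytes, name):
    """Extract one part of the package as text."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.read(name).decode("utf-8")


# Placeholder part names and contents, for illustration only.
parts = {
    "content.xml": "<document><p>Hello</p></document>",
    "styles.xml": "<styles/>",
    "meta.xml": "<meta><title>Example</title></meta>",
}
package = build_package(parts)
print(list_parts(package))
print(read_part(package, "content.xml"))
```

Because the package is an ordinary zip archive holding plain XML, any tool that understands zip and XML can inspect the document without the application that created it.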
The OpenOffice.org file format was submitted to OASIS to form the basis for a new standard for office applications and this resulted in OASIS coming up with the OpenDocument Format for Office Applications (OpenDocument) v1.0 in May 2005. The OpenDocument Format has also been accepted as an international ISO/IEC standard (ISO/IEC 26300).
New versions of OOo, as well as other office suites like KOffice and StarOffice, now support OpenDocument as their native file format. This will significantly improve the interoperability of office software and enhance document exchange. Most important, though, is that all these office applications now use a standard open file format for storing their data. The OpenDocument format is not owned or controlled by a single vendor; instead, it falls under the ambit of OASIS, an open standards body. Users can thus be assured that they will have access to their documents and data from a variety of software.
StarOffice shares the same code base as OOo but is released under a proprietary commercial license. In addition to the core functionality of OOo, it comes with some proprietary and third-party modules, e.g. the Adabas D database and some proprietary clip art galleries and templates. StarOffice uses and supports the same file formats as OpenOffice.org.
Adobe's Portable Document Format
PDF is a file format developed by Adobe Systems, Incorporated for secure and reliable electronic document distribution and exchange. The format is able to preserve the look and integrity of the original document, regardless of the application and platform used to create it, even if it contains complex combinations of text, graphics and images. As such, PDF is very useful as a format for multiplatform document exchange and distribution and for sharing information. However, one major drawback of PDF is that it is a final-form format, i.e. it is not suitable for modifying or rewriting its contents.
The PDF format is a standard set and controlled by Adobe. It is also covered by several patents owned by Adobe but licensed royalty-free for use. Older versions and subsets of PDF (e.g. version 1.4) have been adopted as ISO standards (e.g. PDF/X for printing and graphics, ISO 15930, and PDF/A for long-term preservation of electronic documents, ISO 19005). However, when implementing PDF support, the industry mainly uses the PDF specification published by Adobe rather than the ISO standards. The specification is publicly available from Adobe and can be implemented without restrictions by anyone (provided that there are no objections from Adobe). As a result, a variety of software on many different platforms can read the PDF format, and a (smaller) number of applications can write out the contents of a document in PDF.
|Office Document Formats|
|ODT (text)||OASIS, ISO/IEC||Yes||Yes||Yes|
|ODS (spreadsheet)||OASIS, ISO/IEC||Yes||Yes||Yes|
|ODP (presentation)||OASIS, ISO/IEC||Yes||Yes||Yes|
|PDF (text and presentation)||Adobe||Yes||No||Partial|
Due to its popularity and wide support, PDF can be considered a de facto standard file format for information exchange and sharing, but since it is created, owned and controlled by Adobe, it does not meet the technical definition of an open standard. The PDF specification is actively being developed by Adobe with no means of open participation by interested parties, and control of the specification always lies in the hands of Adobe. While the specification is openly available, there are specific constraints on implementing the features in it. Thus Adobe can, when it sees fit, impose specific constraints on another party attempting to make use of the specification. The recent decision by Adobe not to allow Microsoft to include a native option in MS Office 12 for saving or exporting documents in PDF format is a very clear example of this.
Graphics/Image File Formats
A picture is worth a thousand words, as the saying goes. It is not surprising then that, with the advent of powerful desktop systems that are able to display high resolution graphics, images are being utilized more and more to convey information. Modern computer systems use what is known as raster graphics to display an image on the video screen. A raster graphics image, digital image, or bitmap, is a data file or structure representing a generally rectangular grid of pixels, or points of colour, on a computer display monitor. Each point or pixel on the screen is represented by a value denoting its colour and this bitmap is stored in memory. Using this bitmap, the entire screen is repainted 30 or more times per second by the video device resulting in the human eye seeing the image being displayed on the screen. There are many ways to create and store this raster graphics image file and so if we are to be able to exchange and share useful graphical information there is a need to have a format that is supported on multiple platforms and by various graphics software.
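The notion of a bitmap as a rectangular grid of colour values can be illustrated with a few lines of Python. This is a minimal sketch, assuming 24-bit colour (one byte each for red, green and blue); the function names are made up for the example.

```python
WHITE = (255, 255, 255)
RED = (255, 0, 0)


def make_bitmap(width, height, colour):
    """Create a bitmap as a list of rows, each row a list of (r, g, b) pixels."""
    return [[colour for _ in range(width)] for _ in range(height)]


def set_pixel(bitmap, x, y, colour):
    """Change the colour of the pixel in column x of row y."""
    bitmap[y][x] = colour


def bitmap_size_bytes(width, height, bytes_per_pixel=3):
    """Memory needed to hold an uncompressed bitmap of the given dimensions."""
    return width * height * bytes_per_pixel


bitmap = make_bitmap(4, 3, WHITE)   # a tiny 4 x 3 image, all white
set_pixel(bitmap, 1, 2, RED)        # paint one pixel red
print(bitmap_size_bytes(1024, 768)) # 2359296 bytes for a 1024 x 768 screen
```

The size calculation shows why uncompressed bitmaps are bulky, and hence why most of the graphics formats discussed below apply some form of compression.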
Many graphics file formats in use today are proprietary by nature, being derived from and tied to the software used to create them. Some formats have gained wide acceptance as de facto standards, and a few of these have emerged as open graphics file formats.
GIF is a bitmap image format that is widely used on the World Wide Web, especially in its early days as this format resulted in small graphic file sizes. Images stored as GIF files are generally limited to 256 colours. The GIF format makes use of the LZW compression algorithm that was patented in the USA by Unisys. After the GIF format found widespread use on the Web, Unisys asked for royalty payments for all software that utilizes GIF (this patent has since expired in the USA, in 2003). This led to the diminished use of GIF and also to the creation of alternatives to it, notably the PNG format.
GIF is still used for simple animated images as this is not supported by PNG.
The PNG format was created as an alternative to GIF when Unisys decided to enforce its software patent on LZW data compression that was used in the then popular GIF format. The PNG format, like the ZIP format, makes use of the unpatented DEFLATE compression algorithm. PNG is an extensible file format for the lossless, portable, well-compressed storage of raster images. It offers indexed-colour, grayscale, and true colour image support, plus an optional alpha channel for transparency. It is fully streamable with a progressive display option making it useful for online graphics display in Web pages. It also boasts robust features, providing both full file integrity checking and simple detection of common transmission errors.
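The combination of DEFLATE compression and per-chunk integrity checking can be sketched using Python's `zlib` and `struct` modules. The example builds a tiny greyscale PNG by hand and then re-checks the CRC of every chunk. It is a simplified sketch that handles only 8-bit greyscale data with no filtering; the function names are hypothetical and it is not a substitute for a real PNG library.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """A PNG chunk is: length, type, data, then a CRC-32 over type and data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_grey_png(rows):
    """Build an 8-bit greyscale PNG from a list of equal-length rows of values 0-255."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, colour type 0 (greyscale),
    # compression method 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    idat = zlib.compress(raw)  # zlib stream containing DEFLATE data
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


def iter_chunks(png_bytes):
    """Yield (type, data, crc_ok) for each chunk in a PNG byte string."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    pos = 8
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", png_bytes[pos + 8 + length:pos + 12 + length])
        crc_ok = (zlib.crc32(chunk_type + data) & 0xFFFFFFFF) == stored_crc
        yield chunk_type, data, crc_ok
        pos += 12 + length


png = make_grey_png([[0, 255], [255, 0]])   # a 2 x 2 checkerboard
for chunk_type, data, crc_ok in iter_chunks(png):
    print(chunk_type, len(data), crc_ok)
```

If a byte of the image data is altered in transit, the CRC stored in the affected chunk no longer matches and the damage is detected; decompressing an intact IDAT chunk returns exactly the bytes that were compressed, which is what lossless means here.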
PNG is supported by all major graphics software and is now very widely used. It has become an open file format standard and it is a W3C recommendation as well as an ISO international standard (ISO/IEC 15948).
The XPM (XPixMap) format is a de facto standard for creating icon pixmaps for use in GUIs based on the X Window System. It consists of an ASCII image format and a C library. The XPM format defines how to store colour images (X Pixmap) in a portable way while the associated library provides a set of functions to store and retrieve images to and from XPM format data.
The Tagged Image File Format (TIFF) is a file format for digital images. The specification is now owned by Adobe Systems, Incorporated. TIFF is widely used in image applications in the publishing industry and is also supported by most image scanning and editing software. The specification for the TIFF format is publicly available from Adobe and can be implemented without restrictions by anyone. As a result, software that can read and write the TIFF format is available on many different platforms. It has become a de facto standard graphics format for high colour depth (32-bit) graphics.
TIFF/IT, which is based on TIFF, is a specification for the exchange of digital advertisements and complete pages (e.g., newspapers, magazines). This has been made an ISO standard (ISO 12639) as a media independent means for pre-press electronic data exchange.
JPEG is a standardized image compression mechanism from the Joint Photographic Experts Group (JPEG). The file format that employs this compression is JFIF (JPEG File Interchange Format), and JPEG JFIF is what people generally mean when they refer to "JPEG". The JFIF file format was created by the Independent JPEG Group (IJG) for the transport of single JPEG-compressed images.
JPEG compression uses a lossy mechanism for compressing colour or greyscale images. It works well on natural, real-world scenes like photographs, naturalistic artwork and similar material, but it does not fare well on lettering, simple cartoons or line drawings. The basic JPEG format is the most common format used for storing and displaying photographic images on the Web. One reason for this popularity is that the amount of compression can be adjusted to achieve the desired trade-off between file size and visual quality. JPEG compression is now an ISO standard, ISO/IEC 10918 Parts 1–4. There are potential patent issues with JPEG, especially with some of its optional features, namely arithmetic coding and hierarchical storage; for this reason, these optional features are seldom used on the Web.
Unlike other file formats listed above that are meant for raster graphics, the SVG (Scalable Vector Graphics) format is meant for vector graphics, i.e. the use of geometrical primitives such as points, lines, curves, and polygons to represent images in computer graphics. SVG consists of an XML-based file format and a programming API for graphical applications. It is a W3C recommendation and is starting to become a popular choice for including graphics in XML documents. As an SVG document can include raster images such as JPEG and PNG, it can be used to add raster and mixed vector/raster graphics to XML documents.
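Because SVG is plain XML, a vector image can be produced with any XML toolkit. The following sketch uses Python's standard `xml.etree.ElementTree` to describe a circle and a line as geometrical primitives rather than as pixels. The shapes, sizes and the function name are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def build_svg(width, height):
    """Return a small SVG document (as text) containing a circle and a line."""
    ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace
    root = ET.Element(
        f"{{{SVG_NS}}}svg",
        {"width": str(width), "height": str(height)},
    )
    ET.SubElement(
        root, f"{{{SVG_NS}}}circle",
        {"cx": "50", "cy": "50", "r": "40", "fill": "red"},
    )
    ET.SubElement(
        root, f"{{{SVG_NS}}}line",
        {"x1": "0", "y1": "0", "x2": str(width), "y2": str(height), "stroke": "black"},
    )
    return ET.tostring(root, encoding="unicode")


svg_text = build_svg(100, 100)
print(svg_text)
```

The output is a short text file; since it stores shapes instead of a pixel grid, a viewer can redraw it sharply at any size.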
The SVG format is important as it offers a way, based on open standards, to render graphics optimally on all types of devices. While the usage of SVG on the Web is currently somewhat limited, this should change in due course as more Web browsers support it natively. For the mobile phone industry, it has become the basis for its graphics platform with the publication of the SVG Mobile profile, targeted at resource-limited devices such as mobile handsets and PDAs.
Audio File Formats
There are two major groups of audio file formats:
- those using lossless storage or compression, e.g. WAV, FLAC
- those using lossy compression, e.g. MP3, Ogg Vorbis, WMA, AAC
In the lossless compression of a piece of data, nothing is lost during the compression and the original data is restored upon uncompressing. In lossy compression, some data is lost during compression and upon uncompressing the data is not identical to the original but possibly close to it. Lossy compression is used mainly in the compression of multimedia data like audio or video where the loss of some details is tolerable under certain conditions, e.g., the human eye is unable to discern the loss in certain details of an image or video.
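The difference between the two approaches can be shown with a small Python sketch. The lossless half uses the real `zlib` module; the lossy half is a deliberately crude stand-in (it simply discards the low-order bits of each sample) and is not how MP3, Vorbis or any real audio codec works. The sample data and the `quantize` function are invented for the example.

```python
import zlib

# Hypothetical 8-bit sample data standing in for an audio or image stream.
samples = bytes(range(0, 256, 3)) * 4

# Lossless: compressing and uncompressing restores the data exactly.
restored = zlib.decompress(zlib.compress(samples))
print(restored == samples)


def quantize(data, bits_kept=4):
    """Crude lossy step: keep only the top bits of every sample."""
    shift = 8 - bits_kept
    return bytes((value >> shift) << shift for value in data)


# Lossy: the result is close to the original but not identical.
approx = quantize(samples)
max_error = max(abs(a - b) for a, b in zip(samples, approx))
print(approx == samples, max_error)
```

With four bits kept, no sample can differ from its original by more than 15, which is the sense in which the lossy result is "possibly close" to the original.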
WAVEform audio format (WAV) is a Microsoft and IBM audio file format for storing audio on PCs. It is the main format used on Microsoft Windows systems for raw audio storage. The WAV format is most commonly used with an uncompressed, lossless storage method (pulse-code modulation), resulting in comparatively large audio files. Today, the WAV audio format is no longer popular, having been superseded by more efficient means of audio storage.
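Why uncompressed PCM leads to large files can be seen by writing a short tone with Python's standard `wave` module. This is a sketch under assumed parameters (mono, 16-bit samples, 8000 samples per second, a 440 Hz tone); the function name is made up for the example.

```python
import io
import math
import struct
import wave


def write_tone(frequency_hz, seconds, sample_rate=8000):
    """Return the bytes of a mono, 16-bit PCM WAV file containing a sine tone."""
    frame_count = int(seconds * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * frequency_hz * i / sample_rate)))
        for i in range(frame_count)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buffer.getvalue()


wav_bytes = write_tone(440, 1.0)
# Every sample is stored in full: 8000 samples x 2 bytes, plus a small header.
print(len(wav_bytes))
```

Each second of audio costs sample rate x bytes per sample x channels, so CD-quality stereo (44100 samples per second, 16 bits, two channels) needs about 176 kB for every second recorded.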
Free Lossless Audio Codec (FLAC) is a popular lossless audio format with compression designed specifically for audio data streams, achieving compression rates of 30–50 percent. The format specification is publicly available and forms part of the FLAC Open Source project. It is supported by a growing list of audio software and devices.
MPEG-1 audio layer 3 (MP3) is a popular lossy compression audio format. The MP3 specification was set by the Moving Picture Experts Group (MPEG), a working group of ISO/IEC charged with the development of video and audio encoding standards. The compression scheme and format for MP3 form part of the MPEG-1 video and audio compression standard specifications and are an ISO standard, ISO/IEC 11172-3.
MP3 is one of the most popular audio file formats in use today. Music files encoded with MP3 are particularly popular on music exchange and download sites on the Internet due, in part, to the relatively small size of such files and the wide availability of free software on PCs that allow easy creation, sharing, collecting and playing of MP3 files.
MP3 makes use of patented technology and so software and devices that support it are subject to royalty payments in those countries that recognize software patents. This has led to the creation of alternatives to MP3, e.g. Ogg Vorbis and WMA.
Windows Media Audio (WMA) is a lossy compression audio file format developed by Microsoft. It is a proprietary format but is widely used and supported due to the popularity of the MS Windows platform.
Advanced Audio Coding (AAC) from MPEG is a lossy data compression scheme intended for audio streams. It was designed to provide better quality than MP3 at the same bit rate, or the same quality at lower bit rates (and hence smaller file sizes). The compression scheme and format for AAC form part of the MPEG-2 video and audio compression standard specifications and are an ISO standard, ISO/IEC 13818-7. This MPEG-2 AAC specification makes use of patents from several companies, and a patent license is needed for products that make use of this standard.
RealAudio is a proprietary audio format developed by RealNetworks for low bandwidth usage. It was first introduced in 1995 and it became popular especially for streaming audio, i.e., the audio is being played in real time as it is downloaded. Many radio stations use RealAudio to stream their programmes over the Internet.
Ogg Vorbis is a compressed audio format that is believed to be free of patents and royalty payments. The format originated from the Xiph.Org Foundation, a non-profit organization dedicated to producing free and open protocols, formats and software for multimedia.
Ogg Vorbis uses the Vorbis lossy audio compression scheme. The audio data is wrapped in Ogg, Xiph.org's container format for audio, video and metadata; hence the name Ogg Vorbis. The Ogg Vorbis specification is in the public domain and is completely free for commercial or non-commercial use. There is growing support for the Ogg Vorbis format from software and hardware devices as well as online audio services.
Video Formats
For a multimedia experience to be enjoyed properly by all without discrimination, it is important that it be supported on multiple platforms and by multiple software products. This underlines the important role that open standards play in relation to video formats and technologies.
Storing video data involves more than just finding an efficient means to store raw data; other data like tags, menus and possibly media manipulation information need to be stored too. There may also be a need to store audio data, as video frequently has sound associated with it. Also, the data stream is usually not stored in its raw form; it is transformed into a form more suitable for storage or transmission. A type of file called a container is used to store the data and associated information, and a codec is used for encoding and decoding the data stream. It is important that the format of the container file, as well as the codecs it supports, follow open standards.
Video Containers
Almost all video containers popular today are proprietary. This is due to the popularity of Apple's QuickTime and Microsoft's Windows Media framework multimedia technologies. Some of these formats, through widespread usage, have emerged as de facto standards but remain proprietary formats all the same.
Audio Video Interleave (AVI) is a video container format from Microsoft containing both audio and video data. It is a Resource Interchange File Format (RIFF) file specification used with applications that capture, edit, and play back audio-video sequences. It enjoys widespread support and it is the most common container format for audio/video data on the PC.
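The general RIFF layout underlying AVI can be sketched as follows: a file starts with the tag `RIFF`, a little-endian 32-bit size and a four-character form type, followed by a sequence of chunks that each carry their own four-character identifier and size. The Python code below builds and walks a toy RIFF structure. The chunk identifiers `demo` and `note` are invented for the example and are not real AVI chunks, and the function names are hypothetical.

```python
import struct


def make_chunk(chunk_id, data):
    """A RIFF chunk: 4-byte id, 32-bit little-endian size, data, pad to even length."""
    padding = b"\x00" if len(data) % 2 else b""
    return chunk_id + struct.pack("<I", len(data)) + data + padding


def make_riff(form_type, chunks):
    """Wrap a form type and a list of (id, data) chunks in a RIFF header."""
    body = form_type + b"".join(make_chunk(cid, data) for cid, data in chunks)
    return b"RIFF" + struct.pack("<I", len(body)) + body


def walk_riff(blob):
    """Return the form type and the list of top-level (id, data) chunks."""
    magic, size, form_type = struct.unpack("<4sI4s", blob[:12])
    if magic != b"RIFF":
        raise ValueError("not a RIFF file")
    pos, end = 12, 8 + size
    chunks = []
    while pos + 8 <= end:
        chunk_id, length = struct.unpack("<4sI", blob[pos:pos + 8])
        chunks.append((chunk_id, blob[pos + 8:pos + 8 + length]))
        pos += 8 + length + (length % 2)  # skip the pad byte after odd-sized data
    return form_type, chunks


# Toy data: made-up chunk ids inside a container tagged with the AVI form type.
toy = make_riff(b"AVI ", [(b"demo", b"abc"), (b"note", b"hi")])
print(walk_riff(toy))
```

A reader that does not recognise a chunk identifier can use the size field to skip over it, which is what lets one container layout carry different kinds of audio and video data.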
Advanced Systems Format (ASF) is Microsoft's proprietary container designed for streaming. The codec is not specified in ASF but the most common ones are Windows Media Audio (WMA) and Windows Media Video (WMV). The ASF container structure is patented in the United States.
The MOV container is from Apple Computer's QuickTime multimedia architecture and technology. This video file format is openly documented and available for anyone to use royalty-free. As a result, several non-Apple video players are available that can play QuickTime video files. The proprietary Sorenson codec is usually used with QuickTime. The QuickTime format was used as the basis of the MPEG-4 MP4 container standard (see entry on MP4 below).
MPEG-4 Part 14 (MP4) is a container specified as part of the MPEG-4 international standard, ISO/IEC 14496-14. MP4 is designed to support streaming, editing, local playback, and interchange of content. Its design is based on the QuickTime format.
The Ogg container uses a bitstream format to encapsulate data from one or more sources. It can handle both audio and video data and while the codecs are not specified, there are several open codecs associated with the Ogg project, including Vorbis (see above) for lossy compressed audio, FLAC for lossless compressed audio, Speex for speech and Theora for video.
Video Compression formats
MPEG Compression formats
- MPEG-1 Part 2 (ISO/IEC 11172-2)
- MPEG-2 Part 2 (ISO/IEC 13818-2)
- MPEG-4 Part 2 (ISO/IEC 14496-2)
- MPEG-4 Part 10 (ISO/IEC 14496-10)
The MPEG-2 and MPEG-4 standards make use of numerous patented technologies and the vendors of commercial products and services that use them are expected to pay patent licensing royalties.
MPEG-1 Part 2
The MPEG-1 standard that specifies the MP3 audio codec also specifies a video codec for non-interlaced video signals. This codec can be used for compressing video sequences, both 625-line and 525-line, to bit rates of about 1.5 Mbit/s. It is used in the Video CD (VCD) specifications, and the picture quality is comparable to that of a VHS video cassette recorder.
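To give a feel for what a bit rate of about 1.5 Mbit/s means in storage terms, the arithmetic can be written out in Python. This is a back-of-the-envelope sketch that assumes a constant bit rate and ignores container overhead; the function name is made up for the example.

```python
def stream_size_bytes(bit_rate_bps, seconds):
    """Bytes needed to store a constant-bit-rate stream of the given duration."""
    return bit_rate_bps * seconds // 8   # 8 bits per byte


one_hour = 60 * 60
size = stream_size_bytes(1_500_000, one_hour)
print(size)              # 675000000 bytes
print(size / 1_000_000)  # 675.0 megabytes (decimal)
```

At this rate an hour of video occupies roughly 675 MB, which is in the range of what a single compact disc can hold.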
MPEG-2 Part 2
The MPEG-2 standard specifies a video codec for interlaced and non-interlaced video signals. MPEG-2 video is not optimized for low bit-rates (less than 1 Mbit/s), but outperforms MPEG-1 at 3 Mbit/s and above. The MPEG-2 video codec is backward compatible with the MPEG-1 codec. MPEG-2 is widely adopted for video broadcasting (e.g., direct broadcast satellite and cable TV), filmmaking, and DVD discs. MPEG-2 has a lot of market acceptance and a very large installed base.
MPEG-4 Part 10 (H.264/AVC)
This video coding standard is the same as the ITU-T H.264 recommendation and the technology is also known as Advanced Video Coding (AVC). It contains several innovative features that allow it to compress video more efficiently than earlier MPEG codecs. It also possesses more flexibility, which allows it to accommodate applications in a wide variety of environments.
This is a new standard, and it represents the current state of the art in the series of MPEG video compression standards. It is rapidly gaining adoption in a wide variety of applications and in digital broadcasting and TV systems. Apple Computer has integrated H.264 into Mac OS X version 10.4 (Tiger) as well as QuickTime version 7, while x264 is a FOSS library for encoding H.264/AVC video streams. H.264 decoders for Windows, GNU/Linux and Macintosh, as well as video servers and authoring tools, are available from a number of vendors.
Windows Media Video
This is a set of proprietary streaming video technologies developed by Microsoft as part of its Windows Media framework. It is the codec usually used in an AVI or ASF container and has support for digital rights management facilities. Microsoft has submitted WMV Version 9 to the Society of Motion Picture and Television Engineers (SMPTE) for approval as a standard under the name "VC-1".
Theora
This is a video codec from the Xiph.Org Foundation, developed as part of the Ogg project. It is based on patented technology, but an irrevocable royalty-free license has been granted to use the patents in the codec. The Theora codec is released under a Berkeley Software Distribution (BSD) FOSS license and is freely available for commercial or non-commercial use.
|Video Compression formats|
|Container – Commonly Used Compression Format||Usage||Open/Closed|
|AVI – WMV||Wide||Closed|
|ASF – WMV||Wide||Closed|
|MOV – Sorenson||Wide||Closed|
|MP4 – MPEG-1, 2, 4||Wide||Open|
|Ogg – Theora||Limited||Open|