Inside DVD-Video/MPEG Format
MPEG (Moving Picture Experts Group) is the name of a family of format specifications for digital video. DVD-Video is primarily based around MPEG-2, though it also allows MPEG-1 video. This book will not go into all the details of MPEG; it will cover just enough to make sense of how it is used in DVD-Video.
Information in an MPEG file is divided up into streams: the video images form one stream, while the audio soundtrack is kept in another stream. DVD-Video allows multiple audio streams, which can be used, for example, for soundtracks in different languages (only one audio stream can play at a time). Each stream is identified by a number, with different ranges of numbers allotted to different stream types. MPEG also allows for two private streams, the format of which is not further defined in MPEG; DVD-Video uses these for various purposes:
- additional audio formats not defined by MPEG (AC-3, DTS, LPCM) (private stream 1)
- subpictures (private stream 1)
- menu buttons, time-display information (Presentation Control Information (PCI), private stream 2)
- navigation information for trick-play and multi-angle modes (Data Search Information (DSI), private stream 2)
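As a rough illustration of how stream numbers map to stream types, here is a minimal Python sketch. The numeric ranges are taken from the MPEG systems specification rather than from the text above, and the function name classify_stream_id is purely illustrative.

```python
def classify_stream_id(stream_id: int) -> str:
    """Map an MPEG program-stream ID byte to a coarse stream type."""
    if stream_id == 0xBD:
        # On DVD-Video: AC-3, DTS, LPCM audio and subpictures
        return "private stream 1"
    if stream_id == 0xBE:
        return "padding"
    if stream_id == 0xBF:
        # On DVD-Video: PCI and DSI navigation data
        return "private stream 2"
    if 0xC0 <= stream_id <= 0xDF:
        return "MPEG audio"
    if 0xE0 <= stream_id <= 0xEF:
        return "MPEG video"
    return "other"
```

Because both subpictures and the non-MPEG audio formats share private stream 1, the stream ID alone is not enough to tell them apart; a further identifier inside the packet contents is needed for that, which this sketch does not attempt to decode.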
DVD-Video discs come in two main flavours, corresponding to the main formats used around the world for analog video: NTSC and PAL. NTSC was the first colour TV system, developed in the USA, and used in North America, Japan and a few other places. PAL was a later German development, used in Europe and much of the rest of the world. (There is also a French-developed broadcast format called SECAM, but the format for recorded media is identical to PAL.)
In the NTSC format, each video frame is 720*480 pixels, displayed at a rate of 29.97 frames per second. In the PAL format, each frame is 720*576 pixels, and the display rate is 25fps. Frames may be displayed at an aspect ratio of 4:3 for narrowscreen footage, or 16:9 for widescreen footage. Note that there is no difference in the numbers of pixels per frame for narrowscreen versus widescreen; the image is simply stretched out for widescreen (this is called anamorphic widescreen).
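To see how much the image is stretched, the shape of a single pixel can be estimated by dividing the display aspect ratio by the ratio of stored width to height. The sketch below assumes that the full 720-pixel width maps exactly onto the display aspect ratio, which is a simplification; the function name pixel_aspect_ratio is illustrative.

```python
from fractions import Fraction


def pixel_aspect_ratio(width: int, height: int, display_aspect: Fraction) -> Fraction:
    """Width-to-height ratio of one pixel, assuming the whole stored
    frame fills the display aspect ratio exactly."""
    return display_aspect / Fraction(width, height)


for name, (w, h) in {"NTSC": (720, 480), "PAL": (720, 576)}.items():
    for dar in (Fraction(4, 3), Fraction(16, 9)):
        par = pixel_aspect_ratio(w, h, dar)
        print(f"{name} {w}*{h} at {dar.numerator}:{dar.denominator}: "
              f"pixel aspect {par.numerator}:{par.denominator}")
```

Under this assumption no combination yields square pixels: the pixels are narrower than they are tall for NTSC at 4:3 (8:9) and wider than they are tall in the other three cases.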
The allowed resolutions in DVD-Video are
- for NTSC: 720*480, 704*480, 352*480 or 352*240
- for PAL: 720*576, 704*576, 352*576 or 352*288
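The list above can be written as a small lookup table, for instance to check an encoder setting before authoring. This is only a sketch; the names ALLOWED_RESOLUTIONS and is_allowed_resolution are made up for illustration.

```python
# Allowed frame sizes, copied from the list above.
ALLOWED_RESOLUTIONS = {
    "NTSC": {(720, 480), (704, 480), (352, 480), (352, 240)},
    "PAL": {(720, 576), (704, 576), (352, 576), (352, 288)},
}


def is_allowed_resolution(standard: str, width: int, height: int) -> bool:
    """Return True if width*height is a listed resolution for the standard."""
    return (width, height) in ALLOWED_RESOLUTIONS.get(standard.upper(), set())
```

For example, is_allowed_resolution("PAL", 720, 480) returns False, because 720*480 is an NTSC frame size.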
Unfortunately, DVD-Video has to carry over the interlacing feature from broadcast TV. That means each video frame is split into two fields, one consisting of the odd-numbered scan lines, the other containing the even-numbered ones, which are displayed one after the other.
Packets and Headers
The contents of each stream are divided into packets, which are multiplexed (a packet for one stream is followed by a packet for another stream that is to be presented at close to the same time) to allow a player to read and decode the file sequentially. You will come across the term Packetized Elementary Stream (PES) for this representation. Each packet begins with a header containing an identifying code indicating the type of packet, followed by a two-byte field indicating the length of the packet contents.
There are also additional “headers”, with different identifying codes, used to specify various additional information:
- a system header gives information about the number of streams in the movie file. There needs to be at least one of these, at the start of the file.
- a PACK header gives information about the data rate needed to decode the movie file, as well as giving a high-precision clock reference (in units of a 27MHz clock). The presence of one of these indicates the start of a “PACK”, which basically consists of the header plus all following PES packets until the next PACK header. DVD-Video requires that each pack be 2048 bytes in size.
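The packet layout described above can be sketched as a small parser that walks one 2048-byte pack and lists the packets inside it. Several details here are assumptions drawn from the MPEG-2 systems specification rather than from this text: the PACK header start code 00 00 01 BA, its 14-byte fixed length with a stuffing count in the low three bits of the last byte, and the 00 00 01 prefix before each identifying code. The name split_pack is illustrative.

```python
import struct

PACK_SIZE = 2048
PACK_START = b"\x00\x00\x01\xba"   # assumed MPEG-2 PACK header start code


def split_pack(pack: bytes):
    """Return a list of (stream_id, contents) for the packets in one pack."""
    if len(pack) != PACK_SIZE or pack[:4] != PACK_START:
        raise ValueError("not a 2048-byte pack starting with a PACK header")
    # Assumed layout: 14 fixed header bytes, then 0-7 stuffing bytes
    # whose count is stored in the low 3 bits of the 14th byte.
    pos = 14 + (pack[13] & 0x07)
    packets = []
    while pos + 6 <= len(pack):
        if pack[pos:pos + 3] != b"\x00\x00\x01":
            raise ValueError(f"missing start code at offset {pos}")
        stream_id = pack[pos + 3]                      # identifying code
        (length,) = struct.unpack(">H", pack[pos + 4:pos + 6])
        contents = pack[pos + 6:pos + 6 + length]
        if len(contents) != length:
            raise ValueError(f"packet at offset {pos} runs past end of pack")
        packets.append((stream_id, contents))
        pos += 6 + length
    return packets
```

The contents returned here still include any further per-packet header fields (such as timestamps); the sketch only separates packets from one another and does not interpret them.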
GOP, I-Frame, B-Frame, P-Frame
Most video codecs rely heavily on interframe as well as intraframe compression to reduce data sizes. An I-Frame is a frame of video compressed by itself, without looking at other frames. The encoding scheme used is similar to JPEG compression. However, subsequent frames are quite likely to look similar (think of the common case of something or someone moving against a still background); therefore, instead of compressing them on their own as additional I-Frames, it makes sense to encode them as P-Frames which are differences from the preceding reference frame (which can be an I-Frame or a P-Frame) or as B-Frames which are differences from both preceding and following I- or P-frames.
The drawback is that if you try to start playback from some arbitrary point that is not at the beginning of the file, the player has to seek backwards until it hits an I-frame before it can start sensibly decoding the video. Thus, using fewer I-frames improves compression, at the expense of quick random access into the video stream. The DVD-Video specification requires at least one I-frame every 36 fields (18 frames) for NTSC or every 30 fields (15 frames) for PAL (i.e. at least every 0.6 seconds).
The sequence of frames starting from an I-frame until the last frame before the next I-frame (in other words, containing all the frames depending in some way on the starting I-frame) is called a Group of Pictures (GOP).
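The following sketch splits a sequence of frame types into GOPs using the simplified definition above, and checks each GOP against the field limits quoted earlier. It assumes two fields per frame and treats the I-frame spacing rule as a cap on GOP length; the function names are illustrative.

```python
# I-frame spacing limits quoted above, in fields.
MAX_FIELDS_PER_GOP = {"NTSC": 36, "PAL": 30}


def split_into_gops(frame_types: str):
    """Split a string such as 'IBBPBBPIBBP' into GOPs, each starting at an I-frame."""
    gops = []
    for t in frame_types:
        if t == "I":
            gops.append("I")          # an I-frame opens a new GOP
        elif t in ("P", "B") and gops:
            gops[-1] += t             # dependent frames join the current GOP
        else:
            raise ValueError(f"unexpected frame type {t!r} before any I-frame"
                             if t in ("P", "B") else f"unknown frame type {t!r}")
    return gops


def gop_within_limit(gop: str, standard: str) -> bool:
    """Assumes every frame contributes exactly two fields."""
    return 2 * len(gop) <= MAX_FIELDS_PER_GOP[standard]
```

For instance, split_into_gops("IBBPBBPIBBP") gives two GOPs, "IBBPBBP" and "IBBP", both well inside the limit for either standard.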
VOBU, Cell, Program, PGC
As previously mentioned, DVD-Video expects to see a PACK header every 2048 bytes. The contents of the first pack must be a PCI and a DSI packet; this is called a “NAV PACK”. Following this will be other PACKs containing video, audio or subpicture stream packets as appropriate, in no particular order, but the NAV information must be first, and the video packets should contain one or more complete GOPs. The NAV PACK and the following packs (up to the next NAV PACK) make up a Video Object Unit (VOBU), which is the smallest unit that the decoder can work with. The duration of a VOBU must be from 0.4 to 1.0 seconds.
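The grouping of packs into VOBUs can be sketched as follows. Each pack is modelled simply as the list of stream IDs of the packets it contains, and a NAV PACK is recognised by the presence of private stream 2 packets; the value 0xBF for private stream 2 comes from the MPEG systems specification, and all names here are illustrative.

```python
PRIVATE_STREAM_2 = 0xBF   # carries PCI and DSI on DVD-Video


def is_nav_pack(stream_ids) -> bool:
    """A pack is treated as a NAV PACK if it holds private stream 2 packets."""
    return PRIVATE_STREAM_2 in stream_ids


def group_into_vobus(packs):
    """Group packs into VOBUs; each VOBU starts at a NAV PACK."""
    vobus = []
    for stream_ids in packs:
        if is_nav_pack(stream_ids):
            vobus.append([stream_ids])      # NAV PACK opens a new VOBU
        elif vobus:
            vobus[-1].append(stream_ids)    # other packs join the current VOBU
        else:
            raise ValueError("stream does not start with a NAV PACK")
    return vobus


def vobu_duration_ok(seconds: float) -> bool:
    """Duration range quoted above: 0.4 to 1.0 seconds."""
    return 0.4 <= seconds <= 1.0
```

A real parser would obtain the stream IDs by reading each pack from the file and would take the duration from the navigation data; neither step is shown here.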
DVD-Video defines additional levels of grouping beyond this; one or more VOBUs make up a cell; one or more cells make up a program, and one or more programs make up a program chain (PGC). Particular programs can each be identified in the file structure as a Part of Title (PTT), otherwise known to ordinary people as a chapter. An actual title on the disc can correspond to a single PGC, or sometimes to multiple PGCs.
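One way to picture this nesting is as a set of small data classes. This is purely a conceptual model, not the on-disc layout; the class and field names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    vobu_count: int                 # a cell holds one or more VOBUs

    def __post_init__(self):
        if self.vobu_count < 1:
            raise ValueError("a cell needs at least one VOBU")


@dataclass
class Program:
    cells: List[Cell]
    is_chapter: bool = False        # True if identified as a Part of Title (PTT)

    def __post_init__(self):
        if not self.cells:
            raise ValueError("a program needs at least one cell")


@dataclass
class ProgramChain:
    programs: List[Program]

    def __post_init__(self):
        if not self.programs:
            raise ValueError("a PGC needs at least one program")

    def chapters(self) -> List[Program]:
        """Programs in this PGC that are marked as chapters."""
        return [p for p in self.programs if p.is_chapter]
```

A title would then be represented by one ProgramChain, or by several when it spans multiple PGCs.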
Additional significance of the groupings is as follows:
- a cell is the smallest unit that can be targeted by a jump.
- a cell can have a single VM instruction attached to be executed when the cell finishes playing.
- skipping with the Next/Prev buttons on the player remote is done in units of programs, not necessarily chapters.
- only PTTs (chapters) can be the target of a jump from outside the containing PGC; jumps to cells and (non-chapter) programs can only happen from within the containing PGC.
- a PGC can have a sequence of VM commands to be executed prior to playing, and another to be executed at the end. The PGC also defines the 16-entry colour table from which subpictures within the PGC can choose 4 at a time to display.
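The jump restrictions in the list above can be summarised in a few lines of Python. This is only a restatement of those rules, with an illustrative function name and string labels for the three kinds of target.

```python
def jump_allowed(target_kind: str, from_same_pgc: bool) -> bool:
    """Return True if a jump to the given kind of target is permitted.

    target_kind is one of "chapter", "program" or "cell".
    """
    if target_kind == "chapter":
        # PTTs can be reached from inside or outside the containing PGC
        return True
    if target_kind in ("program", "cell"):
        # non-chapter programs and cells only from within the same PGC
        return from_same_pgc
    raise ValueError(f"unknown target kind {target_kind!r}")
```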
There is a special “first-play” PGC (FPC) in the VMG which is automatically entered when the disc is put into the player; this has no MPEG data, but it is the commands here that are responsible for jumping to the initial menu (if any), playing all the preamble sequences etc.
Menus vs Titles
Menus and titles are very similar; both are PGCs, and both can have interactive buttons. However, only titles can have chapters, and only titles can be split across multiple VOB files (which imposes a maximum length on the duration of a menu). Also, only titles can have multiple audio and subpicture tracks for different languages, because a menu is already part of a language-specific menu group.
Certain menus can be marked as entries that can be directly invoked via dedicated buttons on the player remote.
In the VMG, one menu can be marked as the title entry, which means it can be brought up by the “Top Menu” button on the remote.
In a titleset, one menu each can be marked as root, PTT (chapter), audio, subtitle and angle; the root entry is brought up by the “Menu” button, the others by correspondingly-titled buttons on the remote.