The basic structure of the compression algorithm proposed by MPEG is akin to that of ITU-T H.261. In this algorithm, blocks measuring 8 × 8 pixels from an original frame, or from the difference between a frame and a motion-compensated prediction, are transformed using the Discrete Cosine Transform (DCT). These blocks are organized into macroblocks in a manner similar to the H.261 algorithm, with motion compensation performed at the macroblock level. The transform coefficients are quantized and the resulting labels transmitted to the receiver through a buffer, which smooths the flow of bits out of the encoder and provides feedback for rate control.
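The 8 × 8 DCT applied to each block can be sketched directly from its definition. The following is a minimal, unoptimized illustration (real encoders use fast separable transforms); the function name is mine, not from any standard.

```python
import math

def dct_2d_8x8(block):
    """Naive 8x8 2-D DCT-II, as applied to a block of pixels or
    prediction errors. O(N^4) -- for illustration only."""
    N = 8
    def c(k):  # DCT normalization factor
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

# A flat block concentrates all of its energy in the DC coefficient.
flat = [[10.0] * 8 for _ in range(8)]
coeffs = dct_2d_8x8(flat)
```

For the flat block above, the DC coefficient comes out to 80 and every AC coefficient is (numerically) zero, which is why smooth image regions compress so well after quantization.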
The MPEG-1 compression scheme mirrors the ITU-T H.261 video compression scheme substantially, though significant distinctions exist in their structural details. The primary application areas for the H.261 standard include videophone and videoconferencing, while MPEG, at least initially, focused on applications related to digital storage and retrieval. Although neither algorithm is restricted to its specified applications, understanding their targeted areas can enhance comprehension of their features. In videoconferencing, a call is established, conducted, and terminated sequentially, whereas accessing video from storage might not begin from the first frame, as viewers may wish to start at any arbitrary point in the sequence. This is also true for broadcast situations, where viewers do not always tune in from the beginning. In the case of H.261, frames following the first can contain blocks coded using predictions from the previous frame, necessitating decoding from the first frame to decode any specific frame in the sequence.
A significant advancement of MPEG-1 is its random access capability, achieved by periodically requiring frames that are coded independently of past frames, termed I frames. To minimize the delay between a viewer switching on the TV and a reasonable picture appearing, I frames should occur at regular intervals, even though they compress less efficiently because they do not exploit temporal correlations. The number of frames between consecutive I frames is therefore a trade-off between compression efficiency and convenience.
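The access-delay side of this trade-off is simple to quantify. A back-of-envelope sketch, assuming one I frame per GOP; the 15-frame GOP and 30 frames/s figures below are illustrative choices, not values mandated by the standard.

```python
def worst_case_access_delay(gop_length_frames, frame_rate_hz):
    """Worst-case wait (seconds) for the next I frame when tuning in
    at an arbitrary point, assuming one I frame per GOP."""
    return gop_length_frames / frame_rate_hz

# e.g. a 15-frame GOP at 30 frames/s: at worst, half a second
delay = worst_case_access_delay(15, 30)  # 0.5
```

Doubling the GOP length improves compression (fewer expensive I frames) but doubles this worst-case wait.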
The MPEG-1 algorithm incorporates two additional frame types to enhance compression efficiency: predictive coded frames (P frames) and bidirectionally predictive coded frames (B frames). P frames are coded using motion-compensated predictions from the nearest I or P frame, showing a marked improvement in compression efficiency over I frames. I and P frames can be viewed as anchor frames. To counteract the compression loss from frequent I frame usage, MPEG introduced B frames, allowing considerable compression by using motion-compensated predictions from both the most recent and upcoming anchor frames. This dual prediction approach generally results in better compression outcomes than utilizing only past frames, especially valuable in scenarios with significant changes between consecutive frames, such as those found in television advertisements.
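The "dual prediction" used by B frames can be sketched as an average of the two motion-compensated anchor blocks. Note this shows only the interpolated mode; MPEG also lets each B macroblock choose forward-only or backward-only prediction, and the function name here is mine.

```python
def bidirectional_prediction(past_block, future_block):
    """Interpolative B-frame prediction: average the motion-compensated
    block from the past anchor with the one from the future anchor."""
    return [[(p + f) / 2 for p, f in zip(past_row, future_row)]
            for past_row, future_row in zip(past_block, future_block)]

# A region brightening from 100 to 120 across the two anchors is
# predicted at the intermediate value 110 for the B frame between them.
past = [[100, 100], [100, 100]]
future = [[120, 120], [120, 120]]
pred = bidirectional_prediction(past, future)
```

Averaging both anchors tends to halve the prediction error for content that changes smoothly between them, which is where the compression gain of B frames comes from.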
B frames can be generated only after the following anchor frame has been coded, and they are not used in the prediction of any other frame. Consequently, errors in B frames are less detrimental, as they do not propagate through the prediction process. These different frame types are organized into groups known as Groups of Pictures (GOP), which serve as the smallest random access units in a video sequence. The GOP structure strikes a balance between the high compression efficiency of motion-compensated coding and the rapid access afforded by periodic intra-only coding. Every GOP contains at least one I frame, which is either the first frame of the GOP or preceded only by B frames whose motion-compensated predictions come solely from that I frame.
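A GOP's mix of frame types in display order can be sketched with a small helper. This is a hypothetical function; the parameters follow common MPEG-1 usage (GOP length n, anchor spacing m, i.e. m − 1 B frames between anchors), but the names are mine.

```python
def gop_display_pattern(n, m):
    """Frame types of one GOP in display order, given GOP length n and
    anchor spacing m. A common MPEG-1 choice is n = 12, m = 3."""
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")        # one independently coded frame per GOP
        elif i % m == 0:
            types.append("P")        # predicted from the previous anchor
        else:
            types.append("B")        # predicted from both surrounding anchors
    return "".join(types)

pattern = gop_display_pattern(12, 3)  # 'IBBPBBPBBPBB'
```

With n = 12 at 25 or 30 frames/s, an I frame arrives roughly twice a second, which keeps channel-change delay low while most frames remain cheaply coded P and B frames.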
The display order, the sequence in which the video is presented, diverges from the processing order, the order in which frames are compressed. Typically, the first frame is an I frame compressed independently of any other frame. If the fourth frame in display order is a P frame predicted from that I frame, it must be compressed next; only then can the two intervening B frames, which rely on both the first and fourth frames, be compressed.
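The reordering described above can be sketched as follows: each anchor (I or P) must be emitted before the B frames that precede it in display order, since those B frames need both surrounding anchors. The function name is mine.

```python
def display_to_bitstream(types):
    """Map display-order frame types to bitstream (coding) order.
    Returns display-order indices in the order frames are coded."""
    order, pending_b = [], []
    for i, t in enumerate(types):
        if t == "B":
            pending_b.append(i)      # must wait for the next anchor
        else:
            order.append(i)          # code the anchor first...
            order.extend(pending_b)  # ...then the B frames it unblocks
            pending_b = []
    order.extend(pending_b)
    return order

# Display order I B B P B B P  ->  bitstream order I P B B P B B
idx = display_to_bitstream(list("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```

The decoder performs the inverse reordering before display, which is why B frames add buffering and latency at both ends.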
The MPEG documentation identifies this compression order as the bitstream order. In contrast to ITU-T H.261, where prediction always comes from the immediately preceding frame, the MPEG standard allows a variable number of frames between the coded frame and its prediction frame. Consequently, when searching for the best matching block in a neighboring frame, the size of the search region depends on the anticipated motion: greater motion necessitates a larger search area. While MPEG does not stipulate a specific motion compensation method, it recommends a search area that grows with the distance between the coded frame and the prediction frame. After motion compensation, the block of prediction errors is transformed using the DCT, quantized, and the quantization labels encoded, in a manner consistent with the JPEG standard. The quantization tables can change during encoding, and rate control can be exercised at both the sequence level and the individual frame level. At the sequence level, reductions in bit rate fall first on the B frames, because they are not used to predict any other frame; at the frame level, rate control proceeds in two steps, similar to the rate control used in the H.261 recommendation.
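The recommended scaling of the search area with frame distance can be sketched with a tiny exhaustive block-matching search. This is a minimal illustration on small blocks, not the algorithm of any particular encoder; all names and the linear base_range scaling are assumptions.

```python
def block_sad(ref, cur, bx, by, dx, dy, bs=2):
    """Sum of absolute differences between the current block at (bx, by)
    and the block displaced by (dx, dy) in the reference frame."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(bs) for j in range(bs))

def motion_search(ref, cur, bx, by, frame_distance, base_range=1, bs=2):
    """Exhaustive search whose window grows with the number of frames
    between the coded frame and its prediction frame. Returns (dx, dy)."""
    r = base_range * frame_distance   # larger distance -> larger window
    best, best_mv = None, (0, 0)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # keep the displaced block inside the reference frame
            if not (0 <= by + dy and by + dy + bs <= len(ref)
                    and 0 <= bx + dx and bx + dx + bs <= len(ref[0])):
                continue
            sad = block_sad(ref, cur, bx, by, dx, dy, bs)
            if best is None or sad < best:
                best, best_mv = sad, (dx, dy)
    return best_mv

# A 2x2 patch that sits at (1, 1) in the reference appears at (0, 0)
# in the current frame; the search recovers the displacement (1, 1).
ref = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
cur = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
mv = motion_search(ref, cur, 0, 0, frame_distance=1)
```

The cost of the exhaustive search grows quadratically with the window radius, which is why predicting across many frames (as B frames may) makes motion estimation markedly more expensive.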