19.5 ITU-T Recommendation H.261

The ITU-T H.261 standard is generally regarded as the earliest video coding standard based on the Discrete Cosine Transform (DCT). The algorithm was designed primarily for low bit-rate video communications and supports two picture formats: the Common Intermediate Format (CIF) and the Quarter Common Intermediate Format (QCIF). The architecture of the H.261 video encoder is shown in Figure 19.10.

Basic Mechanism

The core mechanism of H.261 is to segment the input image into 8 × 8 pixel blocks. Each block is differenced with a prediction taken from the corresponding motion-compensated block in the previous reconstructed frame. When no suitable prior frame exists, or when the previous frame is too different for prediction to be useful, the prediction is set to zero and the block is coded on its own. The difference block is then transformed with the DCT, which converts the spatial data into frequency components. The transform coefficients are quantized, and the quantization labels are encoded with a variable-length code for transmission.
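
To make this block-level path concrete, the sketch below shows how the data handed to the DCT might be formed for one 8 × 8 block: the prediction is subtracted when it is useful, and the pixels are taken directly otherwise. This is a hypothetical illustration in C; the function name form_residual and the activity-based intra/inter decision rule are assumptions for illustration, not part of the standard.

#include <stdint.h>
#include <stdlib.h>

#define BLOCK 8

/* Form the 8x8 block that is handed to the DCT stage: either the
 * difference between the current block and its motion-compensated
 * prediction, or the pixel values themselves when prediction is not
 * useful (intra coding).  Returns 1 if the block is coded intra.
 * The activity comparison used for the decision is an illustrative
 * rule, not the one mandated by the standard. */
int form_residual(const uint8_t cur[BLOCK][BLOCK],
                  const uint8_t pred[BLOCK][BLOCK],
                  int have_prediction,
                  int16_t residual[BLOCK][BLOCK])
{
    long mean = 0, block_activity = 0, diff_activity = 0;

    for (int r = 0; r < BLOCK; r++)
        for (int c = 0; c < BLOCK; c++)
            mean += cur[r][c];
    mean /= BLOCK * BLOCK;

    for (int r = 0; r < BLOCK; r++)
        for (int c = 0; c < BLOCK; c++) {
            block_activity += labs((long)cur[r][c] - mean);
            if (have_prediction)
                diff_activity += labs((long)cur[r][c] - (long)pred[r][c]);
        }

    /* Code intra when there is no prediction, or when differencing would
     * not reduce the amount of information to be transformed. */
    int intra = !have_prediction || diff_activity >= block_activity;

    for (int r = 0; r < BLOCK; r++)
        for (int c = 0; c < BLOCK; c++)
            residual[r][c] = intra ? (int16_t)cur[r][c]
                                   : (int16_t)(cur[r][c] - pred[r][c]);
    return intra;
}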

19.5.1 Motion Compensation

Motion compensation is crucial for reducing temporal redundancy, but it requires a considerable amount of computation. To find a match for a given 8 × 8 block, each candidate block must be compared by computing 64 pixel differences and summing their absolute values. If the closest match is assumed to lie within 20 pixels of the block's position in either the horizontal or vertical direction, the search region contains 41 × 41 = 1681 candidate positions, and hence 1681 such comparisons.
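
Written out, the exhaustive search looks like the sketch below. It assumes a ±20 pixel search window (the figure used in the count above; the standard itself restricts motion vectors to a smaller range of roughly ±15 pixels) and frames stored as row-major arrays of 8-bit samples; the names and the treatment of frame borders are illustrative.

#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

#define BLOCK  8
#define RANGE 20   /* search +/- 20 pixels, as in the count above */

/* Sum of absolute differences between an 8x8 block of the current frame
 * at (cx, cy) and a candidate block of the previous frame at (px, py). */
static long sad_8x8(const uint8_t *cur, const uint8_t *prev, int width,
                    int cx, int cy, int px, int py)
{
    long sad = 0;
    for (int r = 0; r < BLOCK; r++)
        for (int c = 0; c < BLOCK; c++)
            sad += labs((long)cur[(cy + r) * width + (cx + c)] -
                        (long)prev[(py + r) * width + (px + c)]);
    return sad;
}

/* Exhaustive (full) search: 41 x 41 = 1681 candidate positions. */
void full_search(const uint8_t *cur, const uint8_t *prev,
                 int width, int height, int cx, int cy,
                 int *best_dx, int *best_dy)
{
    long best = LONG_MAX;
    *best_dx = 0;
    *best_dy = 0;
    for (int dy = -RANGE; dy <= RANGE; dy++) {
        for (int dx = -RANGE; dx <= RANGE; dx++) {
            int px = cx + dx, py = cy + dy;
            /* Skip candidates that fall outside the previous frame. */
            if (px < 0 || py < 0 || px + BLOCK > width || py + BLOCK > height)
                continue;
            long sad = sad_8x8(cur, prev, width, cx, cy, px, py);
            if (sad < best) {
                best = sad;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
}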

One way to reduce this computational burden is to increase the block size. Larger blocks mean fewer blocks per frame, and therefore fewer motion-compensation searches, but they also raise the risk that a single block will contain objects moving in different directions, which makes a good match harder to find and hurts compression performance. This trade-off between block size and motion diversity has to be balanced for efficient encoding.

H.261 strikes this balance by grouping the 8 × 8 blocks into macroblocks, each consisting of four luminance blocks and two chrominance blocks. Motion-compensated prediction is performed at the macroblock level, and only the luminance blocks are used in the matching process; the chrominance motion vectors are obtained by halving the components of the luminance motion vector.
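
Because the chrominance planes are subsampled by two in each dimension, halving the luminance vector is all that is required. A minimal sketch follows, with truncation toward zero assumed as the rounding rule:

/* Derive the chrominance motion vector of a macroblock from its
 * luminance motion vector by halving each component.  Truncation
 * toward zero is assumed here as the rounding rule. */
typedef struct { int dx, dy; } motion_vector;

motion_vector chroma_vector(motion_vector luma)
{
    motion_vector mv;
    mv.dx = luma.dx / 2;   /* C integer division truncates toward zero */
    mv.dy = luma.dy / 2;
    return mv;
}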

19.5.2 The Loop Filter

Sharp edges in the prediction block can produce abrupt changes in the prediction error, which show up as large high-frequency coefficients and drive up the bit rate. To mitigate this, the H.261 algorithm allows the prediction to be smoothed with a two-dimensional spatial filter, the loop filter, before the difference is taken. The filter is separable, applying the coefficients 1/4, 1/2, and 1/4 along the rows and columns of the block; at block boundaries, where a neighboring sample is unavailable, the sample is passed through unfiltered.
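
A sketch of this smoothing filter is given below: the (1/4, 1/2, 1/4) kernel is applied along each row and then down each column of the 8 × 8 prediction block, and samples on the block boundary are passed through unchanged. The integer scaling and rounding details are illustrative assumptions.

#include <stdint.h>

#define BLOCK 8

/* Separable smoothing of an 8x8 prediction block with the (1/4, 1/2, 1/4)
 * kernel; boundary samples are passed through unchanged.  Rounding to
 * nearest is assumed here. */
void loop_filter(const uint8_t in[BLOCK][BLOCK], uint8_t out[BLOCK][BLOCK])
{
    int tmp[BLOCK][BLOCK];

    /* Filter along each row (values kept scaled by 4). */
    for (int r = 0; r < BLOCK; r++) {
        tmp[r][0] = 4 * in[r][0];                 /* left boundary: pass through  */
        tmp[r][BLOCK - 1] = 4 * in[r][BLOCK - 1]; /* right boundary: pass through */
        for (int c = 1; c < BLOCK - 1; c++)
            tmp[r][c] = in[r][c - 1] + 2 * in[r][c] + in[r][c + 1];
    }

    /* Filter down each column and normalize by 16 (4 x 4). */
    for (int c = 0; c < BLOCK; c++) {
        out[0][c] = (uint8_t)((4 * tmp[0][c] + 8) / 16);
        out[BLOCK - 1][c] = (uint8_t)((4 * tmp[BLOCK - 1][c] + 8) / 16);
        for (int r = 1; r < BLOCK - 1; r++)
            out[r][c] = (uint8_t)((tmp[r - 1][c] + 2 * tmp[r][c] + tmp[r + 1][c] + 8) / 16);
    }
}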

19.5.3 The Transform

The DCT operates on 8 × 8 blocks of pixels or of prediction differences, depending on how good a match motion compensation produced. If motion compensation fails to provide a good match, the block is taken directly from the image and transformed as it is, rather than as a difference. In either case the receiver undoes the quantization and the transform, and adds back the prediction where one was used, to rebuild an approximation of the frame; this is known as the reconstruction process. The encoder carries out the same reconstruction, so that its predictions are formed from the frames actually available to the decoder.
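
The transform itself can be written directly from the definition of the two-dimensional DCT. The floating-point version below is only a sketch; practical encoders and decoders use fast, fixed-point approximations.

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define BLOCK 8

/* Two-dimensional 8x8 DCT, written directly from its definition.
 * f is the block of pixels (or prediction differences); F receives
 * the transform coefficients. */
void dct_8x8(const double f[BLOCK][BLOCK], double F[BLOCK][BLOCK])
{
    for (int u = 0; u < BLOCK; u++) {
        for (int v = 0; v < BLOCK; v++) {
            double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double sum = 0.0;
            for (int x = 0; x < BLOCK; x++)
                for (int y = 0; y < BLOCK; y++)
                    sum += f[x][y] *
                           cos((2 * x + 1) * u * M_PI / 16.0) *
                           cos((2 * y + 1) * v * M_PI / 16.0);
            F[u][v] = 0.25 * cu * cv * sum;
        }
    }
}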

19.5.4 Quantization and Coding

Quantization in H.261 adapts to changes in motion and scene activity: the coder can switch among 32 different quantizers, often from one macroblock to the next. One quantizer is reserved for the intra DC coefficient, and the remaining 31 are used for all other coefficients. The intra DC quantizer is a uniform midrise quantizer with a step size of 8, while the other 31 are midtread quantizers with step sizes ranging from 2 to 62 in steps of 2, a combination that supports efficient coding over widely varying video content.
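
The two quantizer types can be sketched as follows, using the step sizes quoted above. The rounding conventions, and the omission of any widened zero bin, are simplifications for illustration.

#include <stdlib.h>

/* Floor division helper (C's / truncates toward zero). */
static int floor_div(int a, int b)
{
    int q = a / b;
    return (a % b != 0 && ((a < 0) != (b < 0))) ? q - 1 : q;
}

/* Midrise quantizer for the intra DC coefficient, step size 8:
 * decision boundaries at multiples of 8 and no output level at zero;
 * reconstruction is at the midpoint of the interval. */
int midrise_label(int coeff)       { return floor_div(coeff, 8); }
int midrise_reconstruct(int label) { return label * 8 + 4; }

/* Midtread quantizer for the other coefficients, step size 2*qp with
 * qp = 1..31: there is an output level at zero, so small coefficients
 * are quantized to zero. */
int midtread_label(int coeff, int qp)
{
    int step = 2 * qp;
    int mag = (abs(coeff) + step / 2) / step;   /* round to nearest level */
    return (coeff < 0) ? -mag : mag;
}
int midtread_reconstruct(int label, int qp) { return label * 2 * qp; }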

19.5.5 Rate Control

The binary codewords produced by the transform coder are fed into a transmission buffer, which regulates the encoder's output rate. By monitoring how full the buffer is, the rate controller adjusts the quantizer: if the buffer is filling up, it requests a coarser quantization, lowering the output rate; if the buffer is close to empty, it requests a finer quantization, raising it. This feedback loop is essential for keeping the encoder's output matched to the capacity of the channel and for maintaining smooth video transmission.
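
A rough sketch of this feedback loop follows; the buffer model, the thresholds, and the rule for nudging the quantizer parameter are all illustrative assumptions, chosen only to show how buffer occupancy can drive the quantizer.

/* Toy model of buffer-driven rate control: after each coded unit the
 * buffer occupancy is updated, and the quantizer parameter qp (1..31)
 * is nudged coarser when the buffer is filling and finer when it is
 * draining.  All thresholds are illustrative. */
typedef struct {
    long capacity_bits;   /* size of the transmission buffer            */
    long fullness_bits;   /* bits currently waiting to be sent          */
    long channel_bits;    /* bits drained per coded unit by the channel */
    int  qp;              /* current quantizer parameter, 1..31         */
} rate_controller;

void rate_control_update(rate_controller *rc, long bits_just_produced)
{
    rc->fullness_bits += bits_just_produced - rc->channel_bits;
    if (rc->fullness_bits < 0)
        rc->fullness_bits = 0;

    if (rc->fullness_bits > (3 * rc->capacity_bits) / 4 && rc->qp < 31)
        rc->qp++;   /* buffer filling up: coarser quantizer, lower rate  */
    else if (rc->fullness_bits < rc->capacity_bits / 4 && rc->qp > 1)
        rc->qp--;   /* buffer nearly empty: finer quantizer, higher rate */
}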