06-03-2013, 09:31 AM
A Zerotree Wavelet Video Coder
Abstract
This paper describes a hybrid motion-compensated
wavelet transform coder designed for encoding video at very low
bit rates. The coder and its components have been submitted to
MPEG-4 to support the functionalities of compression efficiency
and scalability. Novel features of this coder are the use of
overlapping block motion compensation in combination with a
discrete wavelet transform followed by adaptive quantization and
zerotree entropy coding, plus rate control. The coder outperforms
the VM of MPEG-4 for coding of I-frames and matches the
performance of the VM for P-frames while providing a path to
spatial scalability, object scalability, and bitstream scalability.
INTRODUCTION
Very low bit-rate video coding has received considerable
attention lately in academia and industry in terms of
both coding algorithms and standards activities. The recently
adopted ITU-T Recommendation H.263 provides a solution
for very low bit-rate video telephony [1]. Currently, MPEG-4
is working toward a standard for coding video in a way that
provides functionalities such as content-based access, content-based
manipulation, content-based editing, combined natural
and synthetic data coding, robustness, content-based scalability,
as well as improved coding efficiency [2]. In this paper,
we present a coder developed at the David Sarnoff Research
Center intended to address the needs of MPEG-4, particularly
those in the areas of improved coding efficiency at very low
bit rates and content-based scalability [3]. The key element
of Sarnoff’s MPEG-4 coder is a new, efficient method for
encoding wavelet coefficients called zerotree entropy (ZTE)
coding.
GENERAL OVERVIEW AND FEATURES OF THE ALGORITHM
This section gives an overview of the proposed encoder.
The system block diagram is shown in Fig. 1.
Frames of video are encoded either as intra or inter, where
intra is used for the first frame and inter is used for the
remaining frames of the test sequences. The first intra frame
can be coded using either the EZW algorithm [9] or the ZTE
coding algorithm [8] newly developed for this codec. EZW is
recognized as one of the best ways to encode still images at a
target bit rate. The embedded feature of the algorithm makes
it possible to encode the first frame at exactly the chosen rate.
TECHNICAL DETAILS OF THE ALGORITHM
Motion Estimation and Compensation
1) Block Motion Estimation: The proposed coder uses the
block motion estimation technique of H.263 [1]. Motion
estimation is performed on the luminance 16 × 16 and 8 × 8
blocks. The distortion measure is the sum of absolute
difference (SAD). The full pel motion estimation is done using
the previous original frame. A full search is used and the
search area is up to 15 pixels in all four directions from the
center of the macroblock. The SAD for the zero translation
vector for the 16 × 16 block is reduced by a bias, set to
100 by default, to favor the zero motion vectors. The SAD
is calculated for each 16 × 16 macroblock and its four 8 × 8
blocks.
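The search described above can be sketched in a few lines of Python. This is a minimal illustration, not the coder's implementation: the function names, the list-of-rows frame layout, and the choice to skip candidates that fall outside the reference frame are all assumptions. Only the SAD measure, the full search over up to 15 pixels, and the default zero-vector bias of 100 come from the text.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size=16, search=15, zero_bias=100):
    """Full-pel full search. Returns ((dx, dy), cost) for the best match.

    The zero vector's SAD is reduced by `zero_bias` so that it wins
    unless another vector is better by more than the bias.
    """
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, size) - zero_bias
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if (dx, dy) == (0, 0):
                continue
            # Assumption: candidates reaching outside the frame are skipped.
            inside = (0 <= bx + dx and bx + dx + size <= width and
                      0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

Calling `full_search` with `size=8` at each of the four 8 × 8 sub-block positions gives the per-block vectors; how the search windows of those sub-blocks relate to the macroblock vector is not specified here.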
The motion estimation algorithm also makes an intra/inter mode
decision. For each macroblock, the mean pixel value is computed,
along with the SAD between the macroblock and that mean
(call it MSAD). If MSAD is
smaller than the SAD found by motion estimation by a set
margin, 500 by default, the intra mode is chosen and no motion
vectors are sent. In this case, the block is predicted by its mean
and the mean is sent as overhead. Otherwise, the inter mode is
chosen and the macroblock is estimated using motion vectors
as described in the previous paragraph.
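The mode decision can be illustrated as follows. This is a sketch under assumptions: the helper names are hypothetical, the mean is kept as a float (the text does not say whether it is rounded), and "smaller by a set margin" is read as MSAD < SAD - margin.

```python
def mean_sad(frame, bx, by, size=16):
    """Return (mean, MSAD) for the block at (bx, by): the block mean and
    the sum of absolute differences between each pixel and that mean."""
    pixels = [frame[by + y][bx + x] for y in range(size) for x in range(size)]
    mean = sum(pixels) / len(pixels)
    msad = sum(abs(p - mean) for p in pixels)
    return mean, msad


def choose_mode(msad, inter_sad, margin=500):
    """Pick 'intra' when MSAD beats the motion-compensated SAD by more
    than `margin`; otherwise pick 'inter'."""
    return "intra" if msad < inter_sad - margin else "inter"
```

A flat block has an MSAD of zero, so it is coded intra whenever the best motion-compensated SAD exceeds the margin.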
Discrete Wavelet Transform
A two-dimensional DWT is at the core of the proposed
coder. The wavelet transform performs decomposition of video
frames or motion-compensated residuals into a multiresolution
subband representation. The DWT has been made extremely
flexible by allowing explicit specification of parameters such
as the number of decomposition levels, the filter coefficients to
use, what filters to use at each level of the decomposition, and
the filter-bank/wavelet-packet structure for the decomposition.
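As one concrete instance of a single decomposition level, the sketch below applies the 2-tap Haar filter (the filter reported later in this post for the chrominance levels) along rows and then columns. The average/difference normalization and the function names are assumptions chosen to keep the arithmetic exact; the coder's actual filter scaling is not given here.

```python
def haar_1d(signal):
    """One Haar analysis step on an even-length sequence.
    Returns the low-pass half followed by the high-pass half."""
    half = len(signal) // 2
    low = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    high = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def haar_2d_level(image):
    """One level of a separable 2-D Haar transform on a list of rows.
    The top-left quadrant of the result is the LL (coarse) subband."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Applying `haar_2d_level` again to the LL quadrant alone would give the next level of a dyadic decomposition.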
Quantization and Zerotree Coding
1) Embedded Zerotree Wavelet Coding: EZW coding is a
proven technique for coding wavelet transform coefficients.
Besides superior compression performance, the advantages of
EZW coding include simplicity, an embedded bitstream, scalability,
and precise bit-rate control. These features enable EZW
coding to address the MPEG-4 functionalities of improved
coding efficiency, scalability, and error robustness, as well as
providing other useful functionalities.
EZW coding is based on three key ideas: 1) exploiting the
self-similarity inherent in the wavelet transform to predict the
location of significant information across scales; 2) successive
approximation quantization of the wavelet coefficients; and 3)
universal lossless data compression using adaptive arithmetic
coding. We give here a brief description of the EZW coding
algorithm. Reference [9] describes the algorithm in further
detail.
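The second idea, successive approximation quantization, can be sketched on its own. The code below only tracks which coefficients become significant as the threshold is halved; it omits the zerotree symbols and the arithmetic coder, and it assumes integer coefficients whose largest magnitude is at least 1. The function names are hypothetical.

```python
def initial_threshold(coeffs):
    """Largest power of two not exceeding the peak coefficient magnitude."""
    peak = max(abs(c) for c in coeffs)
    threshold = 1
    while threshold * 2 <= peak:
        threshold *= 2
    return threshold


def significance_passes(coeffs, num_passes):
    """Return a list of (threshold, newly_significant_indices) per pass.
    The threshold is halved after every pass."""
    threshold = initial_threshold(coeffs)
    significant = set()
    passes = []
    for _ in range(num_passes):
        newly = [i for i, c in enumerate(coeffs)
                 if i not in significant and abs(c) >= threshold]
        significant.update(newly)
        passes.append((threshold, newly))
        threshold //= 2
        if threshold == 0:
            break
    return passes
```

Because each pass refines the previous one, truncating the output after any pass still yields a decodable, coarser approximation, which is what makes the bitstream embedded.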
Rate Control
The proposed coder includes an optional advanced rate
control scheme. The rate control scheme is implemented at
both the picture and wavelet-tree levels, where a second-order
rate-distortion model is used for bit allocation. Based on a
linear regression (LR) analysis, a new formula is derived to
yield a smoother bit rate for each individual frame.
The rate control mechanism is performed in two stages:
determination of the number of bits to be spent for each
frame followed by the specific allocation of those bits within
the frame. The target bit rate for each frame is calculated
before the encoding of each frame. Assuming that each frame
of the same prediction type has strong correlation in coding
complexity, a target bit rate for the current frame is set as a
weighted average of the bits used in the previous frame and the available
bits per frame. The weighting factor is the coefficient of the
first-order autoregressive (AR) model.
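The frame-level target can be written as a one-line blend. Which of the two terms the AR coefficient multiplies is not stated above, so the assignment below (coefficient on the previous frame's bits) is an assumption, as are the function and parameter names.

```python
def target_bits(prev_frame_bits, available_bits_per_frame, ar_coeff):
    """Weighted average of the bits spent on the previous frame of the
    same type and the bits currently available per frame.

    Assumption: `ar_coeff` in [0, 1] weights the previous frame's bits.
    """
    return (ar_coeff * prev_frame_bits
            + (1.0 - ar_coeff) * available_bits_per_frame)
```

With `ar_coeff` near 1 the target follows recent spending closely; near 0 it falls back to an even split of the remaining budget.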
EXPERIMENTAL RESULTS
The proposed coder has been run to encode I and P frames
using block motion estimation of sizes 16 × 16 or 8 × 8,
overlapping block motion compensation, the discrete wavelet
transform implementing the Daubechies’ 9-3 tap filter bank
at the first level of the decomposition of luminance followed
by the 2-tap Haar filter for the remaining three levels of the
decomposition of luminance, the discrete wavelet transform
implementing the 2-tap Haar filter for all three levels of the
decomposition of chrominance, rate control to set the quant
for each frame, either EZW or ZTE coding for the first frame
coded intra, and ZTE coding for all P predicted frames.
CONCLUSION
This paper has presented a coder that represents a promising
solution to several requirements of the MPEG-4 standardization
effort. The major components of the coder are block
motion estimation, overlapping block motion compensation, an
adaptive discrete wavelet transform, the use of zerotrees and
an adaptive arithmetic coder for encoding quantized wavelet
coefficients, plus rate control.
The coder performs well, particularly on I-frames, and is
directly extensible to provide the scalability functionalities
sought by MPEG-4. Its components were submitted as tools
to MPEG-4 in November 1995. The complete coder was
submitted as an algorithm in January 1996 and was very well
received. The components were incorporated into the core
experiments as part of the MPEG-4 testing and standardization
process.