24-08-2012, 03:52 PM
Embedded Zerotree Wavelet Encoding
Embedded Zerotree.pdf (Size: 89.14 KB / Downloads: 30)
Introduction
In this tutorial I will try to explain the implementation of an Embedded Zerotree Wavelet encoder or EZW encoder,
which was presented in a paper by J. Shapiro [Sha93]. The reason for this tutorial is that I have never come across a
good explanation of this technique, yet many researchers claim that they have implemented it. Of course there is
Shapiro’s original paper, but when reading it carefully many details are not immediately clear or even missing. Since
I think that the approach of EZW encoding is a fruitful one (see for instance the work described in [Cre97]) I have
decided to present the details here. This might save others some work in the future.
EZW encoding
When searching through wavelet literature for image compression schemes it is almost impossible not to note
Shapiro’s Embedded Zerotree Wavelet encoder or EZW encoder for short [Sha93]. An EZW encoder is an encoder
specially designed to use with wavelet transforms, which explains why it has the word wavelet in its name. The EZW
encoder was originally designed to operate on images (2D-signals) but it can also be used on other dimensional
signals.
The EZW encoder is based on progressive encoding to compress an image into a bit stream with increasing accuracy.
This means that when more bits are added to the stream, the decoded image will contain more detail, a property
similar to JPEG encoded images. It is also similar to the representation of a number like p. Every digit we add
increases the accuracy of the number, but we can stop at any accuracy we like. Progressive encoding is also known
as embedded encoding, which explains the E in EZW.
How does it work?
Now that we have all the terms defined we can start compressing. Lets begin with the encoding of the coefficients in
decreasing order.
A very direct approach is to simply transmit the values of the coefficients in decreasing order, but this is not very
efficient. This way a lot of bits are spend on the coefficient values and we do not use the fact that we know that the
coefficients are in decreasing order. A better approach is to use a threshold and only signal to the decoder if the
values are larger or smaller than the threshold. If we also transmit the threshold to the decoder, it can reconstruct
already quite a lot. To arrive at a perfect reconstruction we repeat the process after lowering the threshold, until the
threshold has become smaller than the smallest coefficient we wanted to transmit. We can make this process much
more efficient by subtracting the threshold from the values that were larger than the threshold. This results in a bit
stream with increasing accuracy and which can be perfectly reconstructed by the decoder. If we use a predetermined
sequence of thresholds then we do not have to transmit them to the decoder and thus save us some bandwidth. If the
predetermined sequence is a sequence of powers of two it is called bitplane coding since the thresholds in this case
correspond to the bits in the binary representation of the coefficients. EZW encoding as described in [Sha93] uses
this type of coefficient value encoding.
One important thing is however still missing: the transmission of the coefficient positions. Indeed, without this
information the decoder will not be able to reconstruct the encoded signal (although it can perfectly reconstruct the
transmitted bit stream). It is in the encoding of the positions where the efficient encoders are separated from the
inefficient ones. As mentioned before, EZW encoding uses a predefined scan order to encode the position of the
wavelet coefficients (see figure 2). Through the use of zerotrees many positions are encoded implicitly. Several scan
orders are possible (see figure 3), as long as the lower subbands are completely scanned before going on to the higher
subbands. In [Sha93] a raster scan order is used, while in [Alg95] some other scan orders are mentioned. The scan
order seems to be of some influence of the final compression result.
The algorithm
The EZW output stream will have to start with some information to synchronize the decoder. The minimum
information required by the decoder is the number of wavelet transform levels used and the initial threshold, if we
assume that always the same wavelet transform will be used. Additionally we can send the image dimensions and the
image mean. Sending the image mean is useful if we remove it from the image before coding. After imperfect
reconstruction the decoder can then replace the imperfect mean by the original mean. This can increases the PSNR
significantly.