17-09-2012, 10:54 AM
SPEECH COMPRESSION USING WAVELETS
ABSTRACT
With the growth of multimedia and cellular technology
over the past decades, the demand for digital information
has increased dramatically. This enormous demand is
difficult for current technology to handle. One approach
to overcome this problem is to compress the information by
removing the redundancies present in it. Lossy compression
schemes of this kind are often used to compress information
such as speech signals.
This paper presents a new lossy algorithm to compress
speech signals using Discrete Wavelet Transform (DWT)
techniques to solve the limited bandwidth problem facing
the Palestinian cellular company, Jawwal. The
performance of the DWT for speech compression is very
good compared with other techniques such as the μ-law
speech coder. Moreover, the compression ratio using
wavelets can be varied easily, while other techniques have
a fixed one.
INTRODUCTION
Speech is a very basic way for humans to convey
information to one another. With a bandwidth of only
4 kHz, speech can convey information with the emotion of
a human voice. People want to be able to hear someone’s
voice from anywhere in the world, as if the person were
in the same room. Speech can be defined as the
response of the vocal tract to one or more excitation
signals.
Compression of signals is based on removing the
redundancy between neighboring samples and/or between
adjacent cycles. In data compression, the aim is to
represent the data with as few coefficients as possible
within an acceptable loss of perceived quality.
Compression techniques can be classified into one of two
main categories: lossless and lossy.
FILTER BANK
A filter bank is a set of filters that splits the
signal’s frequency components into different signals, each
with a subset of the frequencies. The combined pass bands of
the filters cover the entire frequency range, so the filters
are complementary. A simple filter bank consists of one
low-pass filter and one high-pass filter, both with a cutoff
frequency at half the frequency bandwidth. Applying this
filter bank to a signal results in two new signals, one with
the lower half of the frequencies and one with the upper
half. A block diagram of this filter bank is
illustrated in Figure (1).
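As a minimal sketch of the two-channel filter bank described above, the following Python code splits a signal into a low band and a high band and then recombines them. The Haar filter taps, the downsampling by two, and the function names analysis and synthesis are assumptions made for illustration; the text does not specify which filters are used.

```python
import math

# Assumption: Haar filters with downsampling by two. The text only asks for
# one low-pass and one high-pass filter; these taps are illustrative.
SCALE = 1.0 / math.sqrt(2.0)


def analysis(signal):
    """Split an even-length signal into a low band and a high band."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    low = [SCALE * (a + b) for a, b in pairs]   # scaled pairwise sums
    high = [SCALE * (a - b) for a, b in pairs]  # scaled pairwise differences
    return low, high


def synthesis(low, high):
    """Recombine the two bands into a full-rate signal."""
    out = []
    for a, d in zip(low, high):
        out.append(SCALE * (a + d))
        out.append(SCALE * (a - d))
    return out


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
low, high = analysis(x)
y = synthesis(low, high)  # matches x up to floating-point rounding
```

Each output band has half as many samples as the input, so the two bands together hold the same number of values as the original signal.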
WAVELET FAMILIES
It is important to briefly introduce the wavelet families,
because they are the main tools. Anyone who wants to
study wavelet theory must be familiar with these families
in order to get the best performance from their work.
We cannot say that one type of wavelet is better than
another, because every type has its own applications. A
wavelet family may be good for one application but not
for another; this depends on the nature of the application.
We will introduce, for each wavelet family, the scaling
function Φ(t) in Eq. (3), the wavelet function Ψ(t) in Eq.
(4) and its filter values (for a two-channel filter bank) in
both time-domain and frequency-domain
representations.
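To illustrate what the filter values of a wavelet family look like in the time and frequency domains, the sketch below builds the four-tap Daubechies low-pass filter (called D4 here, following the tap-count naming used for D8 to D20 in this paper) and derives the matching high-pass filter. The choice of D4 and all function names are assumptions for illustration; they are not taken from the paper.

```python
import cmath
import math


def d4_lowpass():
    """Time-domain taps of the four-tap Daubechies scaling (low-pass) filter."""
    r2, r3 = math.sqrt(2.0), math.sqrt(3.0)
    return [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2),
            (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]


def highpass_from_lowpass(h):
    """Wavelet (high-pass) taps: reverse the low-pass taps, alternate the signs."""
    n = len(h)
    return [((-1) ** k) * h[n - 1 - k] for k in range(n)]


def magnitude_response(taps, omega):
    """Magnitude of the frequency response at omega (radians per sample)."""
    return abs(sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(taps)))


h = d4_lowpass()
g = highpass_from_lowpass(h)

# Frequency-domain view: the low-pass filter passes omega = 0 and blocks
# omega = pi, while the high-pass filter does the opposite.
low_at_dc = magnitude_response(h, 0.0)
low_at_nyquist = magnitude_response(h, math.pi)
high_at_dc = magnitude_response(g, 0.0)
high_at_nyquist = magnitude_response(g, math.pi)
```

The low-pass taps sum to the square root of two and have unit energy, while the high-pass taps sum to zero, which is what makes the pair behave as complementary halves of a two-channel filter bank.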
CHOICE OF WAVELET
The choice of the mother-wavelet function used in
designing high quality speech coders is of prime
importance. Choosing a wavelet that has compact support
in both time and frequency in addition to a significant
number of vanishing moments is essential for an optimum
wavelet speech compressor.
Several different criteria can be used in selecting an
optimal wavelet function. The objective is to minimize
the variance of the reconstruction error and maximize the
signal-to-noise ratio (SNR). In general, optimum wavelets
can be selected based on their energy conservation
properties in the approximation part of the wavelet
coefficients.
In [8] it was shown that the Battle-Lemarie wavelet
concentrates more than 97.5% of the signal energy in the
approximation part of the coefficients. This is followed
very closely by the Daubechies D20, D12, D10 and D8
wavelets, all concentrating more than 96% of the signal
energy in the Level 1 approximation coefficients.
Wavelets with more vanishing moments provide better
reconstruction quality, as they introduce less distortion
into the processed speech and concentrate more signal
energy in a few neighboring coefficients. However, the
computational complexity of the DWT increases with the
number of vanishing moments and hence, for real-time
applications, it is not practical to use wavelets with an
arbitrarily high number of vanishing moments [6].
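The energy-concentration criterion described above can be sketched as follows: decompose a signal once and measure what fraction of its energy lands in the Level 1 approximation coefficients. The Haar wavelet, the synthetic two-tone test signal and the 8 kHz sampling rate are assumptions made to keep the example short; the percentages reported in [8] are not reproduced here.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def haar_step(x):
    """One DWT level with the Haar wavelet (an illustrative assumption)."""
    approx = [SCALE * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [SCALE * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def energy(values):
    """Sum of squared coefficient values."""
    return sum(v * v for v in values)


def approx_energy_fraction(x):
    """Fraction of the signal energy held by the Level 1 approximation."""
    approx, detail = haar_step(x)
    return energy(approx) / (energy(approx) + energy(detail))


# Hypothetical test signal: two low-frequency tones sampled at 8 kHz.
fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs)
     + 0.5 * math.sin(2 * math.pi * 450 * n / fs) for n in range(512)]

approx, detail = haar_step(x)
fraction = approx_energy_fraction(x)
```

Running the same measurement with several candidate wavelets on real speech frames would give a ranking of the kind cited from [8]; a higher fraction means that more of the signal survives when detail coefficients are discarded.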
WAVELET DECOMPOSITION
Wavelets work by decomposing a signal into different
resolutions or frequency bands. This task is carried out
by choosing a wavelet function and computing the Discrete
Wavelet Transform (DWT) [9]. Signal compression is
based on the concept that a small number of
approximation coefficients (at a suitably chosen level) and
some of the detail coefficients can accurately represent
regular signal components.
Choosing a decomposition level for the DWT usually
depends on the type of signal being analyzed or on some
other suitable criterion such as entropy. For the processing
of speech signals, decomposition up to scale 5 is adequate
[8], with no further advantage gained in processing beyond
scale 5.
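A decomposition up to scale 5 can be sketched by applying a one-level transform repeatedly to the approximation coefficients. The Haar wavelet and the helper names decompose and reconstruct are assumptions for illustration; they are local functions defined here, not calls into any wavelet library.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def haar_step(x):
    """One analysis level: approximation and detail coefficients."""
    approx = [SCALE * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [SCALE * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """One synthesis level: rebuild the finer-scale signal."""
    out = []
    for a, d in zip(approx, detail):
        out.append(SCALE * (a + d))
        out.append(SCALE * (a - d))
    return out


def decompose(x, levels):
    """Return [approx_L, detail_L, ..., detail_1] for L = levels."""
    approx, details = list(x), []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("length must be divisible by 2 ** levels")
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def reconstruct(coeffs):
    """Invert decompose(), working from the coarsest level to the finest."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_inverse(approx, detail)
    return approx


x = [math.sin(2 * math.pi * n / 64) for n in range(256)]
coeffs = decompose(x, 5)
sizes = [len(c) for c in coeffs]  # [8, 8, 16, 32, 64, 128]
y = reconstruct(coeffs)
```

Each level halves the band covered by the approximation, so with the 4 kHz speech bandwidth mentioned in the introduction, the scale 5 approximation would cover roughly 0 to 125 Hz.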
ENCODING ZERO-VALUED COEFFICIENTS
After zeroing wavelet coefficients with negligible
values, based either on calculated threshold values or
on a chosen truncation percentage, the transform
vector needs to be compressed. In this implementation,
each run of consecutive zero-valued coefficients is encoded
with two bytes. The first byte marks the start of a run of
zeros and the second byte holds the number of
successive zeros.
Due to the sparsity of the wavelet representation of the
speech signal, this encoding method leads to a higher
compression ratio than storing the non-zero coefficients
along with their respective positions in the wavelet
transform vector, as suggested in the “Wavelet Speech
Compression Techniques (Section 8.4)”. This encoding
scheme is the primary means of achieving signal
compression.
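The two-byte encoding of zero runs described above can be sketched as follows. The marker value 0, the run limit of 255 (the largest count that fits in one byte) and the assumption that the coefficients are already quantized to integers are illustrative choices; the paper does not give these details.

```python
# Assumptions: coefficients are already quantized to integers, the marker
# symbol is 0, and a run count must fit in one byte (1 to 255).
ZERO_MARKER = 0
MAX_RUN = 255


def encode_zero_runs(coeffs):
    """Replace each run of zeros with the pair (ZERO_MARKER, run length)."""
    out, i, n = [], 0, len(coeffs)
    while i < n:
        if coeffs[i] == 0:
            run = 0
            while i < n and coeffs[i] == 0 and run < MAX_RUN:
                run += 1
                i += 1
            out.extend([ZERO_MARKER, run])  # longer runs are split into pairs
        else:
            out.append(coeffs[i])           # non-zero values pass through
            i += 1
    return out


def decode_zero_runs(stream):
    """Expand every (ZERO_MARKER, run length) pair back into zeros."""
    out, i = [], 0
    while i < len(stream):
        if stream[i] == ZERO_MARKER:
            out.extend([0] * stream[i + 1])
            i += 2
        else:
            out.append(stream[i])
            i += 1
    return out


coeffs = [12, -7, 0, 0, 0, 0, 0, 3, 0, 0, 5]
encoded = encode_zero_runs(coeffs)   # [12, -7, 0, 5, 3, 0, 2, 5]
decoded = decode_zero_runs(encoded)  # equal to coeffs
```

Because a non-zero coefficient can never equal the marker, the decoder can tell runs and values apart without extra flags. An isolated zero grows from one symbol to two, so the scheme only pays off when zeros cluster into long runs, as they do in a sparse transform vector.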
CONCLUSIONS
Data compression is the technology of representing
information with the lowest number of bits (minimum size).
This technology is needed in the field of speech to satisfy
the requirements of transferring huge speech signals via
communication companies and the Internet; reducing storage
requirements is another need. The limited number of
channels available to the Palestinian cellular company,
Jawwal, and the high demand for mobile telephone
services put a lot of pressure on the company to find a
way to solve this problem. This paper proposed a method
of using wavelet compression to make room for more
users to access the Jawwal network.
A simple lossy compression algorithm for one-dimensional
signals (such as speech signals) based on wavelet
transform coding is developed. It compacts as much of
the signal energy into as few coefficients as possible.
These coefficients are preserved and the other
coefficients are discarded with little loss in signal
quality.
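The idea of preserving the significant coefficients and discarding the rest can be sketched as follows: transform the signal, zero a chosen percentage of the smallest coefficients, reconstruct, and measure the SNR. The one-level Haar transform, the 50% truncation percentage, the synthetic test signal and all function names are assumptions for illustration, not the algorithm as implemented in the paper.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def forward(x):
    """One-level Haar transform (an illustrative stand-in for the DWT)."""
    approx = [SCALE * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [SCALE * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def inverse(approx, detail):
    """Rebuild the signal from approximation and detail coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.append(SCALE * (a + d))
        out.append(SCALE * (a - d))
    return out


def truncate_smallest(coeffs, percent):
    """Zero the given percentage of coefficients with the smallest magnitude."""
    n_zero = int(len(coeffs) * percent / 100.0)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    kept = list(coeffs)
    for i in order[:n_zero]:
        kept[i] = 0.0
    return kept


def snr_db(original, reconstructed):
    """Signal-to-noise ratio in decibels."""
    signal = sum(v * v for v in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return float("inf") if noise == 0 else 10.0 * math.log10(signal / noise)


# Hypothetical test signal: two low-frequency tones sampled at 8 kHz.
fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs)
     + 0.5 * math.sin(2 * math.pi * 450 * n / fs) for n in range(512)]

approx, detail = forward(x)
kept = truncate_smallest(approx + detail, 50)  # discard half the coefficients
half = len(approx)
y = inverse(kept[:half], kept[half:])
quality = snr_db(x, y)
```

Raising the truncation percentage increases the number of zeros available for run-length encoding and lowers the SNR, which is the trade-off behind the adjustable compression ratio mentioned in the abstract.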