AN AUDITORY-BASED TRANSFORM FOR AUDIO SIGNAL PROCESSING





ABSTRACT


An auditory-based transform is presented in this paper. Through an
analysis process, the transform converts time-domain signals into a
set of filter-bank outputs. The frequency responses and distributions
of the filter bank are similar to those of the basilar membrane of the
cochlea. Signal processing can be conducted in the decomposed
signal domain. Through a synthesis process, the decomposed signals
can be synthesized back to the original signal through a simple
computation. Fast algorithms for discrete-time signals are also
presented for both the forward and inverse transforms. The transform
has been proved in theory and validated in experiments.
An example noise-reduction application is presented. The proposed
transform is robust to background and computational noise
and is free from pitch harmonics. The derived fast algorithm can
also be used to compute the continuous wavelet transform.


INTRODUCTION



The Fourier transform (FT) is the most widely used transform
for converting signals from the time domain to the frequency domain;
however, it has a fixed time-frequency resolution, and its frequency
distribution is restricted to be linear. These limitations cause
problems in audio and speech processing, such as pitch harmonics,
computational noise, and sensitivity to background noise.
On the other hand, the wavelet transform (WT) provides flexible
time-frequency resolution, but also has notable problems. First,
no existing wavelet is capable of mimicking the impulse responses
of the basilar membrane closely, so it cannot be directly used to
model the cochlea or carry out related computation. Additionally,
even though forward and inverse continuous wavelet transforms
are defined for continuous variables, to the best of our knowledge,
there is no numerical computational formula for real inverse continuous
wavelet transforms (ICWT). No such function exists even
in a commercial wavelet package. The discrete wavelet transform has
been applied in speech processing, but its frequency distribution
is limited to the dyadic scale, which differs from the scale in
the cochlea.
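
The contrast between linear, dyadic, and cochlea-like frequency distributions can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the Hz-to-Bark conversion uses Traunmüller's approximation (the excerpt does not say which formula the paper uses), the 80 Hz to 5 kHz range and the band count of 18 are example values, and all function names are hypothetical.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Traunmueller's approximation of the Bark scale (an assumed choice)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark."""
    z = np.asarray(z, dtype=float)
    return 1960.0 * (z + 0.53) / (26.28 - z)


def linear_centers(f_lo, f_hi, n_bands):
    """Equally spaced in Hz, as with the bins of a Fourier transform."""
    return np.linspace(f_lo, f_hi, n_bands)


def dyadic_centers(f_lo, f_hi):
    """One band per octave starting at f_lo, as with a dyadic scale."""
    n_bands = int(np.floor(np.log2(f_hi / f_lo))) + 1
    return f_lo * 2.0 ** np.arange(n_bands)


def bark_centers(f_lo, f_hi, n_bands):
    """Equally spaced on the Bark axis, then mapped back to Hz."""
    z = np.linspace(hz_to_bark(f_lo), hz_to_bark(f_hi), n_bands)
    return bark_to_hz(z)


if __name__ == "__main__":
    np.set_printoptions(precision=0, suppress=True)
    print("linear:", linear_centers(80.0, 5000.0, 18))
    print("dyadic:", dyadic_centers(80.0, 5000.0))
    print("bark:  ", bark_centers(80.0, 5000.0, 18))
```

With these example values the dyadic grid has only six centers (80, 160, 320, 640, 1280, 2560 Hz), while the Bark-spaced grid places bands densely at low frequencies and progressively farther apart at high frequencies.
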
Motivated by the fact that the human auditory system outperforms
current machine-based systems for acoustic signal processing,
we developed an auditory-based transform to facilitate our future
research in developing high performance systems. The traveling
waves of the basilar membrane in the cochlea and its impulse
response have been measured and reported in the literature, such as


DEFINITION OF THE PROPOSED TRANSFORM

When sound enters the human ear, acoustic energy from the outer
ear is converted to mechanical energy via the middle ear, which
consists of three small bones. When the last bone in the middle
ear, the stapes, moves, it sets the fluid inside the cochlea in motion,
creating traveling waves on the basilar membrane. The impulse
response of the basilar membrane (BM) in the cochlea can be represented
by a function ψ(t) ∈ L²(R). The function satisfies the
following conditions


THE INVERSE TRANSFORM


2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics October 18-21, 2009, New Paltz, NY

Just as the Fourier transform requires an inverse transform, a similar
inverse transform is needed for the proposed transform. The need
arises for two reasons: first, the processed frequency-decomposed signals
must be converted back to real signals in applications such as speech and
music synthesis and noise reduction; and second, it must be shown that no
information is lost in the proposed forward transform. This property
is necessary to a transform.
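
The requirement that decomposition followed by synthesis loses no information can be sketched with a generic filter bank. The example below is not the proposed transform or its inverse: it swaps in a simple FFT-domain filter bank whose band gains are Gaussian on an assumed Bark axis (Traunmüller's approximation) and are normalized to sum to one at every frequency bin, so that summing the band signals restores the input. The function names, band count, and bandwidth are hypothetical.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Traunmueller's approximation of the Bark scale (an assumed choice)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def band_gains(n_samples, fs, n_bands=18, f_lo=80.0, f_hi=5000.0, width_bark=1.0):
    """Real, positive gains of shape (n_bands, n_bins) that sum to 1 per bin."""
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    z = hz_to_bark(freqs)
    centers = np.linspace(hz_to_bark(f_lo), hz_to_bark(f_hi), n_bands)
    # Gaussian bumps on the Bark axis, one row per band.
    w = np.exp(-0.5 * ((z[None, :] - centers[:, None]) / width_bark) ** 2)
    # Normalize across bands so the gains form a partition of unity.
    return w / w.sum(axis=0, keepdims=True)


def analyze(x, fs, **kwargs):
    """Decompose x into band signals of shape (n_bands, len(x))."""
    x = np.asarray(x, dtype=float)
    gains = band_gains(len(x), fs, **kwargs)
    spectrum = np.fft.rfft(x)
    return np.fft.irfft(gains * spectrum[None, :], n=len(x), axis=1)


def synthesize(bands):
    """Recombine band signals; a plain sum suffices because gains sum to 1."""
    return np.sum(bands, axis=0)


if __name__ == "__main__":
    fs = 16000.0
    t = np.arange(4000) / fs
    x = np.sin(2 * np.pi * 200.0 * t) + 0.5 * np.sin(2 * np.pi * 1500.0 * t)
    bands = analyze(x, fs)
    x_hat = synthesize(bands)
    print("max abs error:", np.max(np.abs(x - x_hat)))
    print("correlation:  ", np.corrcoef(x, x_hat)[0, 1])
```

Any processing, such as attenuating noisy bands, would be applied to the rows of `bands` before calling `synthesize`; with the bands left unmodified, the reconstruction error stays at the level of floating-point rounding.
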


EXPERIMENTS AND DISCUSSIONS

Transform Validation: In addition to the theoretical proof, we also
validated the proposed transform on real audio data. In one of our
experiments, we used the speech waveform of a male voice saying
the words "two, zero, five," shown in Fig. 3 (A), as the original
data. In the forward transform using (17), we used frequencies
from 80 Hz to 5 kHz to decompose the
original data into multiple frequency bands on the Bark scale. In
the inverse transform using (18), we synthesized the multiple-band
output back to speech. When we plotted both waveforms, before and
after the transform, we could not see any difference. We then
used correlation coefficients as the measure to compute


CONCLUSIONS


The concept of the proposed transform is to mimic the impulse responses
of the basilar membrane and its nonlinear frequency distribution
characteristics. As the results show, the proposed transform
has significant advantages in its noise robustness and its freedom
from harmonic distortion and computational noise. These advantages
can lead to many new applications, such as robust features
for speech and speaker recognition, new algorithms for noise reduction
and denoising, speech and music synthesis, audio coding,
hearing aids, and audio signal processing. In summary, the proposed
transform is a new time-frequency transform for audio and
speech processing and many other applications.