Design and Implementation of Acoustic Noise Suppression for Speech Enhancement

CHAPTER 1
INTRODUCTION
The speech processing systems used to communicate or store speech are usually
designed for noise-free environments, but in the real world the presence of
background interference in the form of acoustic noise drastically degrades
their performance. The study of the effect of noise on the spectrum of speech
is therefore essential in speech processing.
Restoring the desired speech signal from a mixture of speech and background
noise is amongst the oldest, still elusive goals in speech processing and
communication system research. Speech enhancement algorithms attempt to
improve the performance of communication systems when their input or output
signals are corrupted by noise. It is usually difficult to reduce noise without
distorting speech, and thus the performance of speech enhancement systems is
limited by the tradeoff between speech distortion and noise reduction.
In general, the situation wherein the noise and speech are in the same channel
(single channel systems) is the most common scenario and is one of the most
difficult situations to deal with. The complexity and ease of implementation
of any proposed scheme are further important criteria, especially since the
majority of speech enhancement and noise reduction algorithms find application
in real-time portable systems such as cellular phones and hearing aids.
Noise reduction is a very challenging and complex problem due to several
reasons. First of all, the nature and the characteristics of the noise signal change
significantly from application to application, and moreover vary in time. It is

ER&DCI-IT, Thiruvananthapuram 2
therefore very difficult to develop a versatile algorithm that works in diversified
environments. Secondly, the objective of a noise reduction system is heavily
dependent on the specific context and application. In some scenarios, for
example, the objective is to increase intelligibility or to improve the overall
perceptual quality of the speech, while in others it may be to improve the
accuracy of an automatic speaker recognition system, or simply to reduce
listener fatigue. It is very hard to satisfy all of these objectives at the
same time.
Among the noise reduction algorithms, the spectral subtraction algorithm is best
suited for acoustic noise reduction [1]. The spectral subtraction technique
estimates the power spectrum of clean speech by explicitly subtracting the noise
power spectrum from the noisy speech power spectrum. This method has minimal
complexity and is relatively easy to implement, but it generally produces a
residual distortion commonly called musical noise. Spectral subtraction can be
modified to achieve greater suppression of the noise.
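As an illustrative sketch (the function name and framing conventions are assumptions, not from the report), power spectral subtraction on a single frame can be written as:

```python
import numpy as np

def spectral_subtract_frame(noisy_frame, noise_psd):
    """Basic power spectral subtraction on one frame.

    noise_psd is an estimate of the noise power spectrum, with the same
    number of bins as np.fft.rfft of the frame.
    """
    spectrum = np.fft.rfft(noisy_frame)
    noisy_psd = np.abs(spectrum) ** 2
    # Subtract the noise power estimate; negative results are clamped to
    # zero (half-wave rectification). Isolated bins that survive this
    # clamp are the source of the residual "musical noise".
    clean_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    # Recombine the enhanced magnitude with the noisy phase.
    clean_mag = np.sqrt(clean_psd)
    phase = np.angle(spectrum)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))
```

Note that only the magnitude is modified; the noisy phase is reused, which is standard practice since the ear is relatively insensitive to phase distortion.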
The modified spectral subtraction method involves finding an over-estimate of
each frequency component of the noise by multiplying each component by an
over-estimation factor (or coefficient) [2]-[3]. These over-estimates are then
subtracted from the noisy speech components. This minimizes the residual noise,
since all the noise frequency components will now be significantly attenuated.
After subtraction, some of the signal magnitudes will have values less than
zero. To avoid further problems due to this, a noise threshold level or
spectral floor is introduced: any spectral components with magnitudes below
this level are brought up to it. The over-estimation technique has its own
drawbacks, in that it is very much a trial-and-error method, with coefficients
and threshold levels selected on an ad hoc basis. Classical algorithms are
usually limited to fixed, optimized parameters, which are difficult to choose
for all speech and noise conditions.
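A minimal sketch of the over-subtraction idea with a spectral floor, in the spirit of [2]-[3]; the parameter names `alpha` and `beta` and their default values are illustrative, and in practice are exactly the trial-and-error coefficients the text warns about:

```python
import numpy as np

def oversubtract_psd(noisy_psd, noise_psd, alpha=2.0, beta=0.01):
    """Modified spectral subtraction on power spectra.

    alpha > 1 over-estimates the noise so that residual peaks are
    attenuated; beta sets a spectral floor so that no bin falls below
    beta * noise_psd, instead of being clamped to zero.
    """
    clean_psd = noisy_psd - alpha * noise_psd
    floor = beta * noise_psd
    # Any bin pushed below the floor is raised back up to it.
    return np.where(clean_psd < floor, floor, clean_psd)
```

Raising weak bins to a small nonzero floor, rather than to zero, fills the spectral "holes" responsible for musical noise at the cost of a slightly higher residual noise level.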

CHAPTER 2
LITERATURE REVIEW
2.1. Fundamentals of noise reduction techniques
The problem of noise reduction has attracted a considerable amount of research
attention over the past several decades. Since we are living in a natural
environment where noise is inevitable and ubiquitous, speech signals are generally
immersed in acoustic ambient noise and can seldom be recorded in pure form.
Therefore, it is essential for speech processing and communication systems to
apply effective noise reduction or speech enhancement techniques in order to
extract the desired speech signal from its corrupted observations.
In some situations, such as mobile communications, only one microphone is
available. In this case, noise reduction techniques need to rely on
assumptions about the speech and noise signals. A common assumption is that the
noise is additive and slowly varying, so that the noise characteristics estimated in
the absence of speech can be used subsequently in the presence of speech. If in
reality this premise does not hold, or only partially holds, the system will either
have less noise reduction, or introduce more speech distortion. Even with the
limitations outlined, single-channel noise reduction has attracted a tremendous
amount of research attention because of its wide range of applications and
relatively low cost.
This chapter presents an overview of different speech enhancement methods, with
greater emphasis on single channel subtractive type algorithms and provides a
review of some of the major aspects and approaches in this category. An
overview of speech characteristics and noise characteristics is presented as
the opening subject.

2.2. Speech Characteristics
The information contained in the spoken word is conveyed by the speech signal.
Speech can be classified into voiced and unvoiced segments based on the energy
of the sequence. This is accomplished by dividing the speech signal into short
frames and computing the average power of each frame. The speech in a
particular frame is declared voiced if its average power exceeds a threshold
level chosen by the user; otherwise it is declared unvoiced. The voiced
waveform is quasi-periodic (any one period is virtually identical to its
adjacent periods but not necessarily similar to periods much farther away in
time), and its period corresponds to the fundamental frequency or pitch
frequency. The unvoiced waveform is more noise-like. In normal speech, the
vocal tract changes shape relatively slowly as the tongue and lips perform the
gestures of speech, so it can be modeled as a slowly time-varying filter that
imposes its frequency-response properties on the spectrum of the excitation.
The speech waveform as a whole is therefore highly nonstationary [4].
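The energy-based voiced/unvoiced decision described above can be sketched as follows; the function name, frame length and threshold are illustrative, with the threshold user-chosen as the text states:

```python
import numpy as np

def classify_frames(signal, frame_len, threshold):
    """Label each complete frame 'voiced' or 'unvoiced' by average power."""
    n_frames = len(signal) // frame_len
    labels = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        # Average power of the frame: mean of the squared samples.
        avg_power = np.mean(frame ** 2)
        labels.append('voiced' if avg_power > threshold else 'unvoiced')
    return labels
```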
2.3. Noise Characteristics
Noise can be defined as the contamination of the desired signal or the unwanted
signal. The nature of the noise is an important factor in deciding on a speech
enhancement method to be adopted. The acoustic noise emanates from moving,
vibrating, or colliding sources and is the most familiar type of noise present in
various degrees in everyday environments. Acoustic noise is generated by such
sources as moving cars, air-conditioners, computer fans, traffic, people talking in
the background, wind, rain, etc.
A good model of the noise is therefore important for the performance of a
speech enhancement system; conversely, it is important to analyze how well a
speech enhancement algorithm works with different types of noise. Noise can be
characterized by various statistical, spectral or spatial properties.

Table 2.3. Noise properties in comparison with speech
2.3.1 Signal Structure
Like the speech signal, noise is continuous in the time domain, so speech
together with noise can be observed as a continuous signal. The noise present
in environments such as cars, fans, machines, factories and aircraft cockpits
is continuous in nature.
Impulsive noise, by contrast, is present only for short durations and is not
continuous. Examples of impulsive noise are a door slamming, keyboard typing
and footsteps.
2.3.2 Type of interaction with speech
There are many sources of acoustic distortion that can degrade speech. The
most important is additive noise; the background noise is acoustically added
to the speech.
Examples of non-additive noise include noise due to non-linearities of
microphones and loudspeakers, and channel distortion (speech on transmission
lines).
Convolutive noise distorts the signal through a convolution in the time
domain; examples are changes in the speech signal due to changes in room
acoustics or in microphones.
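Convolutive distortion can be illustrated with a toy example: the observed signal is the speech convolved with a room impulse response. The signal values and the impulse response below (a direct path plus one delayed, attenuated reflection) are made up for illustration:

```python
import numpy as np

# Hypothetical clean speech samples and room impulse response.
s = np.array([1.0, 0.5, -0.25, 0.0])
h = np.array([1.0, 0.0, 0.3])   # direct path + reflection after 2 samples

# Convolutive distortion: the room acts as a filter on the speech.
y = np.convolve(s, h)           # observed reverberant signal
```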
Property              Types
Structure             Continuous / Impulsive
Type of interaction   Additive / Multiplicative / Convolutive
Temporal behavior     Stationary / Non-stationary
Frequency range       Broadband / Narrowband
Signal dependency     Correlated / Uncorrelated

Multiplicative noise is created by signal distortion due to fading in cellular
channels. Such noises are usually harder to deal with than additive noise.
2.3.3 Temporal behavior of noise
Although the amplitude of a signal fluctuates with time, the parameters of the
signal may be time-invariant (stationary) or time-varying (non-stationary). A
signal is stationary if the statistical parameters of the signal are time-invariant;
otherwise it is nonstationary. Background noise typically remains stationary
over longer intervals than speech: for example, while speech may be considered
stationary over about 20 ms, the noise may remain stationary over 100 ms.
In general, it is more difficult to deal with non-stationary noise, for which
no a priori knowledge of the noise characteristics is available. Since
non-stationary noise is time-varying, the conventional method of estimating the
noise from initial, speech-free intervals is not suitable.
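The conventional estimate mentioned above, averaging the power spectra of a few initial frames assumed to contain no speech, can be sketched as follows (the function name and framing are illustrative); it is exactly this estimate that breaks down when the noise is non-stationary:

```python
import numpy as np

def estimate_noise_psd(noisy, frame_len, n_init_frames):
    """Average the power spectra of the first few frames of `noisy`,
    assumed to be speech-free, to estimate the noise power spectrum.
    Valid only while the noise remains (locally) stationary.
    """
    psds = []
    for i in range(n_init_frames):
        frame = noisy[i * frame_len:(i + 1) * frame_len]
        psds.append(np.abs(np.fft.rfft(frame)) ** 2)
    # Element-wise mean across the initial frames, one value per bin.
    return np.mean(psds, axis=0)
```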
2.3.4 Frequency range of noise
Broadband noise is present over the whole frequency range of the speech,
whereas narrowband noise affects only some frequencies. Acoustic noise can
often be approximated as white noise, which is a broadband signal.
2.3.5 Noise signal dependency with speech signal
The acoustic noise is usually uncorrelated with the signal and is present in various
environment scenarios like cars, fans, machines, factory environment, cockpits
etc. Examples of noise correlated with the signal are reverberation and
echoes. In practice there are various degrees of dependency; one set of the
statistics of a signal may depend on time while another does not. For example,
a random signal may have a time-invariant mean but a time-varying variance.
If the speech signal s[n] is affected by uncorrelated noise n[n], then the observed
signal in time domain can be expressed as
x[n] = s[n] + n[n]
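For testing an enhancement algorithm, a noisy observation x[n] of this form can be synthesized by scaling a noise signal to a chosen signal-to-noise ratio before adding it to the clean speech. This helper is a sketch, not part of the report:

```python
import numpy as np

def add_noise_at_snr(s, noise, snr_db):
    """Return x[n] = s[n] + scaled noise, where the noise is scaled so
    that the mixture has the requested SNR in dB.
    """
    p_s = np.mean(s ** 2)        # average power of the clean speech
    p_n = np.mean(noise ** 2)    # average power of the raw noise
    # SNR(dB) = 10*log10(p_s / p_noise_scaled); solve for the scale.
    scale = np.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return s + scale * noise
```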