29-10-2012, 02:29 PM
MMSE BASED NOISE PSD TRACKING WITH LOW COMPLEXITY
ABSTRACT
Most speech enhancement algorithms heavily depend on the noise
power spectral density (PSD). Because this quantity is unknown in
practice, estimation from the noisy data is necessary.
We present a low complexity method for noise PSD estimation.
The algorithm is based on a minimum mean-squared error estimator
of the noise magnitude-squared DFT coefficients. Compared to
minimum-statistics-based noise tracking, segmental SNR and PESQ
are improved for non-stationary noise sources by 1 dB and 0.25
MOS points, respectively. Compared to recently published algorithms,
similarly good noise tracking performance is obtained, but at a
computational complexity that is roughly a factor of 40 lower.
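As a rough illustration of what an MMSE estimate of a noise magnitude-squared DFT coefficient looks like, the sketch below uses the textbook closed form for the case where the speech and noise DFT coefficients are both modelled as zero-mean complex Gaussian. That modelling choice, the function name `mmse_noise_periodogram`, and its parameters are assumptions made for this example; the full algorithm of the paper, including the steps of its Sec. 4 and the safety-net, is not reproduced in this excerpt and is not shown here.

```python
def mmse_noise_periodogram(noisy_power, noise_psd, prior_snr):
    """Return E[|N|^2 | Y] for one DFT bin.

    Assumes zero-mean complex-Gaussian speech and noise coefficients
    (an assumption of this sketch, not a statement about the paper).

    noisy_power : |Y|^2, the noisy periodogram value of the bin
    noise_psd   : current noise PSD estimate for the bin
    prior_snr   : a priori SNR (speech PSD divided by noise PSD)
    """
    if noisy_power < 0 or noise_psd < 0 or prior_snr < 0:
        raise ValueError("powers and SNR must be non-negative")
    weight = 1.0 / (1.0 + prior_snr)
    # Weighted mix of the observed power and the prior noise PSD.
    return weight * weight * noisy_power + (1.0 - weight) * noise_psd


if __name__ == "__main__":
    # Speech absent: the estimate equals the noisy periodogram.
    print(mmse_noise_periodogram(4.0, 1.0, 0.0))
    # Speech as strong as the noise: observation and prior are blended.
    print(mmse_noise_periodogram(4.0, 1.0, 1.0))
```

When the a priori SNR is zero the estimate equals the noisy periodogram, and as the a priori SNR grows the estimate falls back on the existing noise PSD. The cost per bin is a handful of multiplications, which is consistent with the low-complexity aim stated in the abstract.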
INTRODUCTION
An often used strategy to increase listening comfort, pleasantness
and robustness of speech communication systems is to apply
single-channel noise reduction. Often, these algorithms estimate the
clean signal by applying a discrete Fourier transformation (DFT)
to the noisy signal on a frame-by-frame basis and then estimate the
noise-free DFT coefficients by applying Bayesian estimators, e.g.,
[1][2][3]. These algorithms depend on the noise power spectral
density (PSD), which is in general unknown and must be estimated.
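To make the dependence on the noise PSD concrete, the following minimal sketch computes a per-bin Wiener gain. The Wiener gain stands in here for the Bayesian estimators of [1][2][3], which are not specified in this excerpt, and the maximum-likelihood a priori SNR estimate is likewise an assumption of the example. The function name `wiener_gain` is hypothetical.

```python
def wiener_gain(noisy_power, noise_psd):
    """Spectral gain applied to one noisy DFT coefficient.

    noisy_power : |Y|^2 of the bin in the current frame
    noise_psd   : noise PSD estimate for the same bin
    """
    if noise_psd <= 0:
        raise ValueError("noise_psd must be positive")
    # Maximum-likelihood a priori SNR, floored at zero (assumption).
    prior_snr = max(noisy_power / noise_psd - 1.0, 0.0)
    return prior_snr / (1.0 + prior_snr)


if __name__ == "__main__":
    noisy_power = 4.0
    for label, psd in (("true", 1.0), ("underestimated", 0.5), ("overestimated", 2.0)):
        print(label, wiener_gain(noisy_power, psd))
```

With the same noisy power, an underestimated noise PSD gives a gain closer to one, so more noise passes through, while an overestimated noise PSD gives a smaller gain, so speech is attenuated as well.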
For rather stationary noise sources, the noise PSD can be estimated
using explicit voice activity detection [4], or through more
advanced methods based on minimum statistics [5] (MS). However,
these methods are less suitable when the noise is fast varying and
speech is continuously present at a certain frequency.
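The limitation with fast-varying noise can be seen in a deliberately simplified minimum tracker. The sketch below only takes the minimum of a recursively smoothed periodogram over a sliding window of frames; it omits the bias compensation and other refinements of the MS method in [5], and the names `track_minimum`, `window` and `smoothing` are assumptions of this example.

```python
from collections import deque


def track_minimum(noisy_powers, window=8, smoothing=0.9):
    """Track the noise level of one bin as a sliding-window minimum.

    noisy_powers : iterable of |Y|^2 values, one per frame
    window       : number of frames searched for the minimum
    smoothing    : recursive smoothing constant in [0, 1)
    """
    smoothed = None
    recent = deque(maxlen=window)  # the oldest frame drops out automatically
    track = []
    for power in noisy_powers:
        if smoothed is None:
            smoothed = power
        else:
            smoothed = smoothing * smoothed + (1.0 - smoothing) * power
        recent.append(smoothed)
        track.append(min(recent))
    return track


if __name__ == "__main__":
    # The noise power jumps from 1 to 4 at frame 10.
    powers = [1.0] * 10 + [4.0] * 10
    print(track_minimum(powers, window=5, smoothing=0.0))
```

After the jump in noise power, the tracked value stays at the old level until the last low-power frame has left the window, so a rise in noise level is followed with a delay of about one window length.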
To track the PSD of non-stationary noise sources, more advanced
methods could be used, e.g., the classified codebook [6]
(CC) or the DFT-subspace (DFT-SS) approach [7]. However, the
CC approach [6] works best for noise-types for which the algorithm
is trained, and both methods might be too complex for applications
with very low-complexity constraints like mobile phones, hearing
aids, etc. Therefore, in [8], a high-resolution DFT (HR-DFT) approach
was presented with performance similar to that of the method in [7],
but with reduced computational complexity.
EVALUATION
The proposed algorithm is compared to five reference methods,
namely, MS [5], DFT-SS [7], HR-DFT [8], WNE [9] and the method
by Yu [10]. Evaluations are performed using a database of more
than 7 minutes of Danish speech spoken by 9 female and 8 male
speakers. The signals were degraded by noise sources at input SNRs
of 0, 5, 10, and 15 dB. The noise sources are circular saw noise, passing
train noise, passing car noise, and white noise modulated by the
function.
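Degrading a speech signal at a prescribed input SNR can be sketched as follows. The example assumes the SNR is defined over the whole signal as the ratio of average speech power to average noise power, and that both signals have the same length; the excerpt does not state how the mixing was actually done, and the function names are hypothetical.

```python
import math


def scale_noise_to_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power equals snr_db."""
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise signal has zero power")
    gain = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [gain * n for n in noise]


def mix_at_snr(speech, noise, snr_db):
    """Return the noisy signal speech + scaled noise (equal lengths assumed)."""
    scaled = scale_noise_to_snr(speech, noise, snr_db)
    return [s + n for s, n in zip(speech, scaled)]


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    noise = [0.5, 0.5, -0.5, -0.5]
    for snr_db in (0, 5, 10, 15):
        noisy = mix_at_snr(speech, noise, snr_db)
        print(snr_db, [round(x, 3) for x in noisy])
```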
Performance Evaluation
Fig. 3 shows an example of noise PSD tracking for a frequency bin
centered at 900 Hz. In this example we compare noise PSD estimation
using MS, HR-DFT and WNE, for a female speech signal
degraded by modulated white noise at an overall SNR of 5 dB. Together
with the estimated noise PSDs we also show the ideal noise
PSD $\sigma_W^2(k, i)$ obtained using Eq. (10). Fig. 3(a) shows the noisy
signal. Fig. 3(b) shows the noise PSD estimated by the proposed
method and by MS, together with the true noise PSD, and Fig. 3(c) shows
the noise PSD estimated by the HR-DFT approach and by WNE, together with
the true noise PSD. It is clear that WNE underestimates the noise PSD,
independent of how slowly the true noise PSD changes. For a rather
slowly changing noise PSD, e.g., in the time span from 0 to 7 seconds,
the proposed method, MS and the HR-DFT approach lead to similar
estimates of the noise PSD. When the noise level shows faster variations,
MS is not able to follow them. The proposed and the HR-DFT
method are still able to track these fast changes rather accurately.
Complexity
The computational complexity, in terms of the processing time of MATLAB
implementations of all six algorithms, is given in Table 1. Notice that
the numbers given in Table 1 are rough estimates that are meant as an
indication. In general they depend on implementation details. For
the proposed method, the numbers in Table 1 reflect all processing
steps outlined in Sec. 4 including the safety-net adopted from [13].
The evaluation in the preceding section reveals that the performance
of the proposed approach is similar to that of the previously presented
DFT-SS and HR-DFT methods. However, as we see from Table
1, the computational complexity of the proposed approach is much
lower than that of the DFT-SS approach and also somewhat lower
than that of the HR-DFT approach. The lower computational complexity
is mainly due to the fact that no additional spectral
transforms are needed, as is the case for the HR-DFT and DFT-SS
approaches. From Table 1 we see that compared to MS, the proposed
approach has a complexity that is about a factor of 7 lower. Compared
to WNE, the computational complexity is in the same order
of magnitude, while both MS and WNE have a worse noise tracking
performance than the proposed approach. Compared to the method
from [10], performance is improved, while computational complexity
is slightly higher. This slightly higher computational complexity
is mainly due to the safety-net that is proposed in Sec. 4.
CONCLUSIONS
We proposed a low-complexity MMSE estimator of the noise power
spectral density. In comparison to reference methods like minimum
statistics and weighted noise estimation, both noise tracking and
speech enhancement performance are improved. Compared to previously
presented DFT-subspace and high-resolution DFT-based noise
PSD estimation, the proposed method has similar performance, but
achieves this at a much lower computational complexity.