08-05-2013, 03:23 PM
Estimators of the Magnitude-Squared Spectrum and Methods for Incorporating SNR Uncertainty
Abstract
Statistical estimators of the magnitude-squared
spectrum are derived based on the assumption that the magnitude-
squared spectrum of the noisy speech signal can be computed
as the sum of the (clean) signal and noise magnitude-squared
spectra. Maximum a posteriori (MAP) and minimum mean square
error (MMSE) estimators are derived based on a Gaussian statistical
model. The gain function of the MAP estimator was found
to be identical to the gain function used in the ideal binary mask
(IdBM) that is widely used in computational auditory scene analysis
(CASA). As such, it was binary and assumed the value of 1 if
the local signal-to-noise ratio (SNR) exceeded 0 dB, and assumed
the value of 0 otherwise. By modeling the local instantaneous SNR
as an F-distributed random variable, soft masking methods were
derived incorporating SNR uncertainty. In particular, the soft
masking method that weights the noisy magnitude-squared spectrum
by the a priori probability that the local SNR exceeds 0 dB
was shown to be identical to the Wiener gain function. Results
indicated that the proposed estimators yielded significantly better
speech quality than the conventional minimum mean square error
spectral power estimators, in terms of yielding lower residual
noise and lower speech distortion.
INTRODUCTION
A number of estimators of the signal magnitude spectrum
have been proposed for speech enhancement (see
review in [1, Ch. 7]). The minimum mean square error (MMSE)
estimators [2], [3] of the magnitude spectrum, in particular, have
been found to perform consistently well, in terms of speech
quality, in a number of noisy conditions [4]. Several MMSE
estimators of the power spectrum [5]–[7] or, more generally, of an
arbitrary power of the magnitude spectrum [8] have also been proposed. In
some applications such as speech coding [6], where the autocorrelation
coefficients might be needed, the optimal power-spectrum
estimator might be more useful than the magnitude estimator.
Maximum a Posteriori (MAP) Estimator
The a posteriori probability density function (14) is monotonic,
and when the SNR (expressed in dB) changes its sign, the density
changes its direction (increasing versus decreasing). This
simplifies the maximization a great deal.
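Because a monotonic density attains its maximum at one end of its support, the MAP estimate is either zero or the full noisy magnitude-squared value, which is the binary gain described in the abstract. A minimal Python sketch of that rule follows. The function names and the use of NumPy are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def map_binary_gain(snr_db):
    """Binary gain: 1 where the local SNR exceeds 0 dB, 0 otherwise.

    `snr_db` is a hypothetical array of per-bin SNR values in dB.
    """
    snr_db = np.asarray(snr_db, dtype=float)
    return (snr_db > 0.0).astype(float)

def map_estimate(noisy_power, snr_db):
    """Apply the binary gain to the noisy magnitude-squared spectrum."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    return map_binary_gain(snr_db) * noisy_power

if __name__ == "__main__":
    # Two bins: one speech-dominated (+6 dB), one noise-dominated (-6 dB).
    print(map_estimate([2.0, 5.0], [6.0, -6.0]))
```

Bins at exactly 0 dB are set to zero here, following the "exceeded 0 dB" wording of the abstract.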
Soft Masking by Incorporating a Priori SNR Uncertainty
Assuming independence between the clean speech and noise
magnitude-squared spectra, we can easily use (12) and (13) to
model the hypothesis probability given the a priori SNR. As
we do not use any other constraint or assumption, we refer to
this hypothesis probability as the a priori SNR uncertainty.
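The abstract states that weighting by the a priori probability that the local SNR exceeds 0 dB gives the Wiener gain. The Monte Carlo sketch below checks this numerically. Equations (12) and (13) are not reproduced in this excerpt, so the sketch assumes that the clean and noise magnitude-squared spectra are independent and exponentially distributed, as would follow from a complex Gaussian model of the DFT coefficients. All names are hypothetical.

```python
import numpy as np

def wiener_gain(xi):
    """Wiener gain xi / (1 + xi) for a priori SNR xi (linear scale)."""
    xi = np.asarray(xi, dtype=float)
    return xi / (1.0 + xi)

def prob_snr_exceeds_0db(xi, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(X^2 > D^2) for a priori SNR xi.

    Assumption: X^2 and D^2 are independent exponential variables with
    means sigma_x2 and sigma_d2, and xi = sigma_x2 / sigma_d2.
    """
    rng = np.random.default_rng(seed)
    sigma_d2 = 1.0
    sigma_x2 = xi * sigma_d2
    x2 = rng.exponential(scale=sigma_x2, size=n_samples)
    d2 = rng.exponential(scale=sigma_d2, size=n_samples)
    return float(np.mean(x2 > d2))

if __name__ == "__main__":
    for xi in (0.25, 1.0, 4.0):
        print(xi, prob_snr_exceeds_0db(xi), float(wiener_gain(xi)))
```

Under these assumptions the ratio of the two normalized variables follows an F distribution with (2, 2) degrees of freedom, which is consistent with the F-distributed SNR model mentioned in the abstract.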
Soft Masking Based on Posteriori SNR Uncertainty
Clearly the above SMPR estimator did not incorporate information
about the noisy observations, as it relied solely on
a priori information about the instantaneous SNR. It is reasonable
to expect that a better estimator could be developed by
incorporating a posteriori information about the SNR at each frequency
bin. In this case, we incorporate the assumption given in
(11) to compute the hypothesis probability, which is referred to
as a posteriori SNR uncertainty.
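To illustrate how the noisy observation can enter the hypothesis probability, the sketch below computes the probability that the clean component exceeds the noise component given the observed noisy magnitude-squared value. It assumes the additive relation stated in the abstract (noisy = clean + noise in the magnitude-squared domain) and independent exponential distributions for the two components. The closed form is derived here under those assumptions and is not necessarily the expression used in the paper. All names are hypothetical.

```python
import numpy as np

def posterior_soft_mask(y2, sigma_x2, sigma_d2):
    """P(X^2 > D^2 | Y^2 = y2) under the assumptions stated above.

    With Y^2 = X^2 + D^2, the posterior of X^2 on [0, y2] is proportional
    to exp(-nu * x), where nu = 1/sigma_x2 - 1/sigma_d2. Integrating it
    over [y2/2, y2] and normalizing gives 1 / (1 + exp(nu * y2 / 2)).
    """
    y2 = np.asarray(y2, dtype=float)
    nu = 1.0 / sigma_x2 - 1.0 / sigma_d2
    # 1 / (1 + exp(z)) written with tanh to avoid overflow for large z.
    z = nu * y2 / 2.0
    return 0.5 * (1.0 - np.tanh(z / 2.0))

if __name__ == "__main__":
    # Equal variances give 0.5; a stronger speech variance pushes the
    # mask towards 1 as the observed value grows.
    print(posterior_soft_mask(2.0, 1.0, 1.0))
    print(posterior_soft_mask(np.array([0.5, 2.0, 8.0]), 4.0, 1.0))
```

Unlike the a priori weighting, this mask depends on the observed value in each bin: when the speech variance is larger than the noise variance, larger observations receive a gain closer to 1.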
CONCLUSION
Statistical estimators of the magnitude-squared spectrum
were derived based on the assumption that the magnitude-
squared spectrum of the noisy speech signal can be
computed as the sum of the clean signal and noise magnitude-
squared spectra. Aside from the two traditional
estimators, based on MAP and MMSE principles, two additional
soft masking methods were derived incorporating
SNR uncertainty. Overall, when compared to the conventional
MMSE spectral power estimators [6], [7]