HIERARCHICAL GAUSSIAN MIXTURE MODEL FOR SPEAKER VERIFICATION

ABSTRACT

A novel type of Gaussian mixture model for text-independent
speaker verification, the Hierarchical Gaussian Mixture Model
(HGMM), is proposed in this paper. HGMM aims to maximize the
efficiency of maximum a posteriori (MAP) training on the
Universal Background Model (UBM). Because of the hierarchical
structure, the parameters of one Gaussian component can also
be adapted by the observation vectors of neighboring Gaussian
components. HGMM can also be considered a generalized GMM in
which each Gaussian component is replaced with a local GMM.
A hierarchical Gaussian mixture describes the local
observation space better than a single Gaussian distribution.
Experiments on the NIST 99 Evaluation corpus show that the
HGMM achieves an 18% relative reduction in equal error rate
(EER) compared with the conventional GMM.
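
For readers unfamiliar with the metric, the EER is the operating
point at which the false-reject rate and the false-accept rate are
equal, and a relative reduction is (baseline - new) / baseline. The
sketch below is a minimal illustration in Python. The function names
and all scores are made up for the example and are not taken from
the paper or its experiments.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) at one threshold."""
    # A trial is accepted when its score is >= threshold.
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return frr, far


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by trying every observed score as a threshold."""
    best_gap, best_eer = None, None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        frr, far = error_rates(target_scores, impostor_scores, t)
        gap = abs(frr - far)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, best_eer = gap, (frr + far) / 2.0
    return best_eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction of a new system over a baseline."""
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical verification scores (higher means "more likely the target").
targets = [2.0, 1.5, 1.0, 0.2]
impostors = [0.5, -0.3, -1.0, -2.0]
eer = equal_error_rate(targets, impostors)
```

With a hypothetical baseline EER of 10%, an 18% relative reduction
would correspond to an EER of 8.2%; these two numbers only show the
arithmetic and are not the paper's results.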

INTRODUCTION

Security and authentication in speech-driven telephony
applications can be achieved effectively through speaker
verification (SV). Among the most successful approaches to
robust text-independent speaker verification is the Gaussian
Mixture Model (GMM), employed in many state-of-the-art
systems [1, 2, 3]. For example, the system employing Bayesian
adaptation of speaker models from a Universal Background
Model (UBM) and handset-based score normalization
(HNORM) has been the basis of the top performing systems in
the past several NIST Speaker Recognition Evaluations [3].
A current UBM has a large number of Gaussian components (for
example, 2048 mixtures), and using a large number of mixtures
requires more training and adaptation data. In a speaker
adaptation scenario, however, the amount of adaptation data
is always limited, so increasing the adaptation efficiency is
key to the performance of a UBM+MAP [3] system. In speaker
adaptation for speech recognition systems, an augmented MAP
approach that tackles the adaptation speed problem is
structural MAP (SMAP) [5], in which the Gaussians in the
system are organized in a tree structure.
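
The data-scarcity problem described above can be made concrete with
a small sketch of conventional relevance-factor MAP adaptation of
the UBM means, the baseline that HGMM is compared against. This is
an assumption-laden illustration rather than the paper's
implementation: it assumes diagonal covariances, adapts the means
only, and uses a relevance factor of 16. The source does not state
these choices, and all function names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given x."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    peak = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(data, weights, means, variances, relevance=16.0):
    """Return (adapted_means, soft_counts) after one MAP pass over data."""
    k, dim = len(means), len(means[0])
    counts = [0.0] * k                      # soft count n_i per component
    sums = [[0.0] * dim for _ in range(k)]  # weighted sum of observations
    for x in data:
        for i, p in enumerate(responsibilities(x, weights, means, variances)):
            counts[i] += p
            for d in range(dim):
                sums[i][d] += p * x[d]
    adapted = []
    for i in range(k):
        # alpha_i = n_i / (n_i + r): little data keeps the mean near the UBM.
        alpha = counts[i] / (counts[i] + relevance)
        if counts[i] > 0.0:
            expected = [s / counts[i] for s in sums[i]]
        else:
            expected = list(means[i])
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(expected, means[i])])
    return adapted, counts
```

A component that receives almost no adaptation data has a
coefficient close to zero and therefore stays at its UBM value.
With thousands of components and limited data, many components are
in that situation, which is the inefficiency the introduction
points to.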



CONCLUSION AND FUTURE WORK

This paper introduced the Hierarchical Gaussian Mixture Model
(HGMM) and compared it with the GMM on a text-independent
speaker verification task. The experiments were conducted on
the 1999 NIST Speaker Recognition Evaluation corpus and
showed that HGMM performed better than GMM. The hierarchical
structure clusters a large number of mixtures into fewer,
more general clusters. In our two-layer implementation, the
number of clusters equals the number of nodes in the first
layer of the hierarchy. The adaptation process is then
conducted within each cluster, so training is more efficient
than with the conventional GMM.
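
One way to see why working at the cluster level can help is to
compare how much data a single component sees with how much its
cluster sees. The sketch below pools per-component soft counts into
per-cluster totals and evaluates the coefficient n / (n + r) used in
conventional relevance-factor MAP. It is only an illustration of
the data-pooling argument: the coefficient formula is borrowed from
conventional MAP and is not the HGMM update rule, which this excerpt
does not give. The counts, the cluster assignment, and the function
names are all hypothetical.

```python
def adaptation_coefficient(soft_count, relevance=16.0):
    """Weight given to the new data in relevance-factor MAP, n / (n + r)."""
    return soft_count / (soft_count + relevance)


def pool_counts(component_counts, cluster_of):
    """Sum per-component soft counts into per-cluster totals."""
    totals = {}
    for count, cluster in zip(component_counts, cluster_of):
        totals[cluster] = totals.get(cluster, 0.0) + count
    return totals


# Hypothetical soft counts for four components and the first-layer
# node (cluster) that each component is assumed to belong to.
component_counts = [2.0, 3.0, 1.0, 10.0]
cluster_of = [0, 0, 0, 1]
cluster_counts = pool_counts(component_counts, cluster_of)
```

In this made-up example, each of the first three components sees
only a few frames on its own, while their shared cluster sees all
six, so a statistic estimated at the cluster level rests on more
data than any of its member components provides alone.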