
Converted Lattice-based Chinese Spontaneous Speech Retrieval
Abstract—Rapid and accurate semantics-based speech retrieval techniques are needed to cope with today's overwhelming amounts of speech data. In this paper we study a converted lattice-based approach to Chinese spontaneous speech retrieval. We propose a new confidence measure method based on context mutual information; to the best of our knowledge, this is the first time such a measure has been used in lattice construction for speech indexing. The method takes full advantage of the mutual information between words to describe the language model more precisely. Our experimental results show that the proposed method outperforms both the posterior-probability-based method and the N-best-based method, and our best system achieves a figure of merit (FOM) of 81.2% on a spontaneous Chinese speech retrieval task.
Keywords- converted lattice; mutual information; speech indexing; confidence measure; language model; ASR
I. INTRODUCTION
With the rapid development of computer and multimedia technology, more and more spoken-document data are being recorded and stored. Rapid and accurate semantics-based speech retrieval approaches are therefore required to manage and utilize these speech resources efficiently. The task of spoken document retrieval (SDR) can be described as follows: given a user's query, find and list, from a large collection of multimedia documents, all the files or segments containing relevant speech content [1]. Speech retrieval techniques are mainly applied to broadcast recordings, conference recordings, telephone recordings, voice e-mail, audio books, and other speech data stored in audio files.
At present, lattice-based indexing is undoubtedly the mainstream strategy in speech retrieval research [2, 3]. A lattice is a directed acyclic graph in which each node represents a recognition candidate boundary, while an arc represents both a candidate and its relationship to neighboring candidates. Thanks to this structure, a lattice can preserve rich information such as the unit candidates, acoustic model scores, language model scores, candidate time boundaries, and ASR results. It can compensate for ASR errors caused by environmental noise and model mismatch, and is therefore better suited to the task of speech indexing. In addition, because the lattice structure is well defined, we can gather sufficient recognition candidates and find the top-scoring path by dynamic programming, which cannot be obtained by simply keeping the 1-best or N-best result.
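The dynamic-programming search described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: we assume a lattice whose arcs carry a word candidate plus acoustic and language-model log-scores, and whose integer node IDs are already sorted by time boundary.

```python
from collections import defaultdict

def best_path(arcs, start, end):
    """Find the top-scoring path through a lattice DAG by dynamic programming.

    arcs: list of (src, dst, word, am_score, lm_score) with log-domain scores.
    Returns (total score, word sequence) for the best path, or None.
    """
    # Group arcs by source node; each arc's weight is its combined score.
    out = defaultdict(list)
    for src, dst, word, am, lm in arcs:
        out[src].append((dst, word, am + lm))
    # Process nodes in topological order (assumed: sorted integer node IDs).
    nodes = sorted({a[0] for a in arcs} | {a[1] for a in arcs})
    best = {start: (0.0, [])}  # node -> (best score so far, word sequence)
    for n in nodes:
        if n not in best:
            continue
        score, words = best[n]
        for dst, word, s in out[n]:
            cand = (score + s, words + [word])
            if dst not in best or cand[0] > best[dst][0]:
                best[dst] = cand
    return best.get(end)

# Toy lattice: two competing candidates ("bei" vs "pei") between nodes 0 and 1.
arcs = [(0, 1, "bei", -1.0, -0.5),
        (0, 1, "pei", -2.0, -0.5),
        (1, 2, "jing", -1.0, -0.5)]
print(best_path(arcs, 0, 2))  # → (-3.0, ['bei', 'jing'])
```

Because the graph is acyclic and visited in topological order, each node is settled once, so the search is linear in the number of arcs — which is why the lattice can hold many more candidates than an N-best list at little extra cost.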
Most research on lattice-based speech indexing has focused on the choice of indexing unit or on improving lattice construction. Regarding the choice of unit, [4] found that the best system used toneless-syllable lattices converted from word lattices, and [5] showed that using multiple units in ASR also performed well. Both, however, neglected the algorithms operating on the lattice itself. In our opinion, more effort should be devoted to optimizing algorithms for the characteristics of the language itself; here we take Chinese as our object of study.
A speech retrieval system can be divided into four stages: off-line index construction, on-line query search, confidence measure calculation, and relevance calculation. This paper aims to improve the confidence measure stage. We propose a new method based on mutual information to calculate the confidence measure, one that makes full use of the mutual information between words; for Chinese in particular, this information should be taken into account. Our experimental results show that the proposed approach outperforms both the posterior probability method and the N-best method, achieving a higher FOM on the same speech retrieval task.
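To make the idea concrete, here is an illustrative sketch of a mutual-information-based confidence score. The helper names and the blending weight are our own assumptions, not the paper's formulas: a word's lattice posterior is combined with the pointwise mutual information (PMI) between the word and a context word, estimated from corpus co-occurrence counts.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log[ P(x, y) / (P(x) * P(y)) ], from raw counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))

def mi_confidence(posterior, count_xy, count_x, count_y, total, weight=0.5):
    """Blend the lattice posterior with a logistic-squashed PMI in [0, 1].

    `weight` is a tunable interpolation assumption, not a value from the paper.
    """
    pmi_score = pmi(count_xy, count_x, count_y, total)
    squashed = 1.0 / (1.0 + math.exp(-pmi_score))  # map PMI into (0, 1)
    return (1.0 - weight) * posterior + weight * squashed

# A word pair that co-occurs far more often than chance gets a PMI boost:
print(round(pmi(50, 100, 200, 10_000), 3))            # → 3.219
print(round(mi_confidence(0.6, 50, 100, 200, 10_000), 3))  # → 0.781
```

A word whose context supports it strongly (high PMI) thus receives a confidence boost even when its lattice posterior alone is modest, which is the intuition behind using context mutual information to sharpen the language-model side of the confidence measure.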