Seminar Topics & Project Ideas On Computer Science Electronics Electrical Mechanical Engineering Civil MBA Medicine Nursing Science Physics Mathematics Chemistry ppt pdf doc presentation downloads and Abstract

Full Version: APPLICATIONS OF SPEECH RECOGNITION IN THE AREA OF TELECOMMUNICATI0:NS
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
APPLICATIONS OF SPEECH RECOGNITION IN THE AREA OF TELECOMMUNICATIONS

INTRODUCTION

Speech recognition technology has evolved for more than 40 years, spurred on by
advances in signal processing, algorithms, architectures, and hardware. During
that time it has gone from a laboratory curiosity, to an art, and eventually to a
full-fledged technology that is practiced and understood by a wide range of
engineers, scientists, linguists, psychologists, and systems designers. Over those
four decades the technology of speech recognition has evolved, leading to a steady
stream of increasingly more difficult tasks which have been tackled and solved.
The hierarchy of speech recognition problems which have been attacked, and the
resulting application tasks which became viable as a result, includes the following:

• isolated word recognition, both speaker-trained and speaker-independent.
This technology opened up a class of applications called ‘command-and-control’
applications, in which the system was capable of recognizing a single-word
command (from a small vocabulary of single-word commands) and appropriately
responding to the recognized command. One key problem with this technology was
its sensitivity to background noises (which were often recognized as spurious
spoken words) and to extraneous speech inadvertently spoken along with the
command word. Various types of ‘keyword spotting’ algorithms evolved to solve
these types of problems.

• connected word recognition, both speaker-trained and speaker-independent.
This technology was built on top of word recognition technology, choosing to
exploit the word models that were successful in isolated word recognition, and
extending the modeling to recognize a concatenated sequence (a string) of such
word models as a word string. This technology opened up a class of applications
based on recognizing digit strings and alphanumeric strings, and led to a variety
of systems for voice dialing, credit card authorization, directory assistance
lookups, and catalog ordering.

• continuous or fluent speech recognition, both speaker-trained and
speaker-independent. This technology led to the first large vocabulary
recognition systems, which were used to access databases (the DARPA Resource
Management Task), to do constrained dialogue access to information (the DARPA
ATIS Task), to handle very large vocabulary read speech for dictation (the DARPA
NAB Task), and eventually to build desktop dictation systems for PC
environments [2].

• speech understanding systems (so-called unconstrained dialogue systems),
which are capable of determining the underlying message embedded within the
speech, rather than just recognizing the spoken words [3]. Such systems, which
are only beginning to appear, enable services like customer care (the AT&T How
May I Help You system) and intelligent agent systems which provide access to
information sources by voice dialogues (the AT&T Maxwell Task).

• spontaneous conversation systems, which are able both to recognize the spoken
material accurately and to understand the meaning of the spoken material. Such
systems, which are currently beyond the limits of the existing technology, will
enable new services such as ‘Conversation Summarization’, ‘Business Meeting
Notes’, ‘Topic Spotting’ in fluent speech (e.g., from radio or TV broadcasts),
and ultimately even language translation services between any pair of existing
languages.

Generic Speech Recognition System [4]
Figure 1 shows a block diagram of a typical integrated continuous speech
recognition system. Interestingly enough, this generic block diagram can be made
to work on virtually any speech recognition task that has been devised in the past
40 years, i.e., isolated word recognition, connected word recognition, continuous
speech recognition, etc.
The feature analysis module provides the acoustic feature vectors used to
characterize the spectral properties of the time-varying speech signal. The
word-level acoustic match module evaluates the similarity between the input
feature vector sequence (corresponding to a portion of the input speech) and a
set of acoustic word models for all words in the recognition task vocabulary, to
determine which words were most likely spoken. The sentence-level match module
uses a language model (i.e., a model of syntax and semantics) to determine the
most likely sequence of words. Syntactic and semantic rules can be specified
either manually, based on task constraints, or with statistical models such as
word and class N-gram probabilities. Search and recognition decisions are made by
considering all likely word sequences and choosing the one with the best matching
score as the recognized sentence; a minimal sketch of this final search step is
given after the figure.
[Figure 1: Block diagram of a typical integrated continuous speech recognizer.]
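
To make the interaction of these modules concrete, here is a toy Python sketch
of the final search decision: it combines hypothetical per-segment acoustic
scores with a hypothetical bigram language model and picks the word sequence
with the best total score. Every word and number below is invented for
illustration; a real recognizer would use trained models and a far more
efficient search.

    import itertools
    import math

    # Hypothetical acoustic scores: log P(segment | word) for three
    # successive speech segments, as a word-level acoustic match
    # module might produce them.
    acoustic_scores = [
        {"call": -1.0, "dial": -1.2},   # segment 1
        {"home": -0.8, "phone": -1.5},  # segment 2
        {"now": -0.9, "no": -1.1},      # segment 3
    ]

    # Hypothetical bigram language model: log P(word | previous word).
    bigram = {
        ("<s>", "call"): -0.5, ("<s>", "dial"): -1.0,
        ("call", "home"): -0.3, ("call", "phone"): -1.5,
        ("dial", "home"): -0.9, ("dial", "phone"): -0.4,
        ("home", "now"): -0.6, ("home", "no"): -1.8,
        ("phone", "now"): -0.7, ("phone", "no"): -1.6,
    }

    def sentence_score(words):
        """Total log score = acoustic evidence + language model prior."""
        score, prev = 0.0, "<s>"  # "<s>" marks the sentence start
        for segment, word in zip(acoustic_scores, words):
            score += segment[word]                        # acoustic match
            score += bigram.get((prev, word), -math.inf)  # sentence-level match
            prev = word
        return score

    # Consider all likely word sequences and choose the best-scoring one.
    candidates = itertools.product(*(seg.keys() for seg in acoustic_scores))
    best = max(candidates, key=sentence_score)
    print("Recognized sentence:", " ".join(best))  # -> call home now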
Almost every aspect of the continuous speech recognizer of Figure 1 has been
studied and optimized over the years. As a result, we have obtained a great deal
of knowledge about how to design the feature analysis module, how to choose
appropriate recognition units, how to populate the word lexicon, how to build
acoustic word models, how to model language syntax and semantics, how to decode
word matches against word models, how to efficiently determine a sentence match,
and finally how to choose the best recognized sentence. Among the things we have
learned are the following:
• the best spectral features to use are LPC-based cepstral coefficients (either
on a linear or a mel frequency scale) and their first- and second-order
derivatives, along with log energies and their derivatives (a feature-extraction
sketch follows this list).
• the continuous density hidden Markov model (HMM) with state mixture densities
is the best model for the statistical properties of the spectral features over
time (an HMM sketch follows this list).
• the most robust set of speech units is a set of context-dependent triphone
units for modeling both intraword and interword linguistic phenomena.
• although maximum likelihood training of unit models is effective for many
speech vocabularies, the use of discriminative training methods (e.g., MMI
training or Global Probabilistic Descent (GPD) methods) is more effective for
most tasks.
• the most effective technique for making the unit models robust to varying
speakers, microphones, backgrounds, and transmission environments is the use of
signal conditioning methods such as Cepstral Mean Subtraction (CMS) or some type
of Signal Bias Removal (SBR); a CMS sketch follows this list.
• the use of adaptive learning increases performance for new talkers, new
backgrounds, and new transmission systems.
• the use of utterance verification provides improved rejection of improper
speech or background sounds.
• HMMs can be made very efficient in terms of computation speed, memory size,
and performance through the use of subspace and parameter tying methods.
• efficient word and sentence matches can be obtained through the use of
efficient beam searches, tree-trellis coding methods, and proper determinization
of the Finite State Network (FSN) that is being searched and decoded. Such
procedures also lead to efficient methods for obtaining the N-best sentence
matches to the spoken input (a beam search sketch follows this list).
• the ideas of concept spotting can be used to implement semantic constraints of
a task in an automatically trainable manner.
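
The feature set named in the first item above can be sketched in a few lines.
The snippet assumes the librosa library purely as a convenient front end (the
paper prescribes no particular toolkit), and utterance.wav is a hypothetical
input file.

    import numpy as np
    import librosa  # assumed available; any cepstral front end would do

    # Load a hypothetical utterance at telephone bandwidth (8 kHz).
    signal, sr = librosa.load("utterance.wav", sr=8000)

    # 13 mel-frequency cepstral coefficients per frame.
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

    # First- and second-order time derivatives (delta and delta-delta).
    delta = librosa.feature.delta(mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)

    # Stack into one feature vector per frame, as conventionally fed
    # to an HMM-based recognizer.
    features = np.vstack([mfcc, delta, delta2])
    print(features.shape)  # (39, number_of_frames)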
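
For the continuous-density HMM item, the sketch below uses the hmmlearn
package; this is an assumption of mine (the paper names no toolkit), and the
random arrays merely stand in for real 39-dimensional feature sequences of one
word unit.

    import numpy as np
    from hmmlearn.hmm import GMMHMM  # assumed available

    # A 3-state HMM with 2 Gaussian mixture components per state,
    # modeling 39-dimensional feature vectors for a single word unit.
    model = GMMHMM(n_components=3, n_mix=2, covariance_type="diag", n_iter=20)

    # Stand-in training data: 200 frames of 39-dimensional features.
    rng = np.random.default_rng(0)
    model.fit(rng.normal(size=(200, 39)))

    # Acoustic match: log-likelihood of a (stand-in) test utterance
    # under this word model.
    print(model.score(rng.normal(size=(50, 39))))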
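
Cepstral Mean Subtraction itself is simple enough to show exactly: a stationary
channel (microphone, telephone line) adds a roughly constant offset to every
frame's cepstrum, so subtracting the per-utterance mean of each coefficient
cancels it. A minimal NumPy sketch:

    import numpy as np

    def cepstral_mean_subtraction(cepstra):
        # cepstra: array of shape (n_frames, n_coefficients).
        # Subtracting the utterance-long mean of each coefficient
        # removes a constant channel offset from every frame.
        return cepstra - cepstra.mean(axis=0, keepdims=True)

    # Stand-in data: 100 frames of 13 coefficients plus a constant
    # "channel" offset of 5.0, which CMS removes.
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(100, 13)) + 5.0
    print(cepstral_mean_subtraction(frames).mean(axis=0).round(6))  # ~0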
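
Finally, a toy illustration of the beam search idea from the efficiency item:
rather than scoring every possible word sequence, only the best few partial
hypotheses survive each step, and the sorted survivors directly provide the
N-best sentence matches. All words and scores below are invented.

    # Hypothetical combined (acoustic + language model) log scores for
    # the candidate words of three successive segments.
    segment_scores = [
        {"call": -1.0, "dial": -1.2, "tall": -2.5},
        {"home": -0.8, "phone": -1.5, "foam": -2.2},
        {"now": -0.9, "no": -1.1, "new": -2.0},
    ]

    BEAM_WIDTH = 2  # keep only the 2 best partial hypotheses per step

    # Each hypothesis is (total_score, word_list); start empty.
    beam = [(0.0, [])]
    for scores in segment_scores:
        # Extend every surviving hypothesis by every candidate word...
        extended = [
            (total + word_score, words + [word])
            for total, words in beam
            for word, word_score in scores.items()
        ]
        # ...then prune back to the best BEAM_WIDTH hypotheses.
        beam = sorted(extended, key=lambda h: h[0], reverse=True)[:BEAM_WIDTH]

    # The survivors, best first, are the N-best sentence matches.
    for total, words in beam:
        print(round(total, 2), " ".join(words))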

Building Good Speech-Based Applications [5]

In addition to having good speech recognition technology, effective speech-based
applications heavily depend on several factors, including:
• good user interfaces which make the application easy to use and robust to the
kinds of confusion that arise in human-machine communications by voice.
• good models of dialogue that keep the conversation moving forward, even in
periods of great uncertainty on the part of either the user or the machine.
• matching the task to the technology.
We now expand somewhat on each of these factors.
User Interface Design. In order to make a speech interface as simple and as
effective as Graphical User Interfaces (GUI), three key design principles should
be followed as closely as possible, namely:
• provide a continuous representation of the objects and actions of interest.
• provide a mechanism for rapid, incremental, and reversible operations whose
impact on the object of interest is immediately visible.
• use physical actions or labeled button presses instead of text commands,
whenever possible.
For Speech Interfaces (SI), these GUI principles are preserved in the following
user design principles:
• remind/teach users what can be said at any point in the interaction.
• maintain consistency across features using a vocabulary that is ‘almost always
available’.
• design for error.
• provide the ability to barge in over prompts.
• use implicit confirmation of voice input.
• rely on ‘earcons’ to orient users as to where they are in an interaction with
the machine.
• avoid information overload by aggregation or pre-selection of a subset of the
material to be presented.
These user interface design principles are applied in different ways in the
applications described later in this paper.

Dialogue Design Principles. For many interactions between a person and a
machine, a dialogue is needed to establish a complete interaction with the
machine. The ‘ideal’ dialogue allows either the user or the machine to initiate
queries, or to choose to respond to queries initiated by the other side. (Such
systems are called ‘mixed initiative’ systems.) A complete set of design principles
for dialogue systems has not yet evolved (it is far too early yet). However, much
as we have learned good speech interface design principles, many of the same or
similar principles are evolving for dialogue management. The key principles that
have evolved are the following: