15-06-2013, 12:49 PM
APPLICATIONS OF SPEECH RECOGNITION IN THE AREA OF TELECOMMUNICATION
Abstract
Advances in speech recognition technology, over the past 4 decades, have
enabled a wide range of telecommunications and desktop services to become ‘voice-enabled’.
Early applications were driven by the need to automate and thereby reduce
the cost of attendant services, or by the need to create revenue-generating new services
which were previously unavailable because of cost, or the inability to adequately
provide such a service with the available work force. As we move towards the future
we see a new generation of voice-enabled service offerings emerging including
intelligent agents, customer care wizards, call center automated attendants, voice
access to universal directories and registries, unconstrained dictation capability, and
finally unconstrained language translation capability. In this paper we review the
current capabilities of speech recognition systems, show how they have been exploited
in today’s services and applications, and show how they will evolve over time to the
next generation of voice-enabled services.
INTRODUCTION
Speech recognition technology has evolved for more than 40 years, spurred on by
advances in signal processing, algorithms, architectures, and hardware. During
that time it has gone from a laboratory curiosity, to an art, and eventually to a
full-fledged technology that is practiced and understood by a wide range of engineers,
scientists, linguists, psychologists, and systems designers. Over those 4 decades
the technology of speech recognition has evolved, leading to a steady stream of
increasingly difficult tasks which have been tackled and solved. These tasks
form a hierarchy of speech recognition problems that have been attacked, together
with the application tasks that became viable as a result.
Generic Speech Recognition System [4]
Figure 1 shows a block diagram of a typical integrated continuous speech
recognition system. Interestingly enough, this generic block diagram can be made
to work on virtually any speech recognition task that has been devised in the past
40 years, i.e., isolated word recognition, connected word recognition, continuous
speech recognition, etc.
The feature analysis module provides the acoustic feature vectors used to
characterize the spectral properties of the time-varying speech signal. The
word-level acoustic match module evaluates the similarity between the input feature
vector sequence (corresponding to a portion of the input speech) and a set of
acoustic word models for all words in the recognition task vocabulary to determine
which words were most likely spoken. The sentence-level match module uses a
language model (i.e., a model of syntax and semantics) to determine the most
likely sequence of words. Syntactic and semantic rules can be specified, either
manually, based on task constraints, or with statistical models such as word and
class N-gram probabilities.
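The sentence-level match with a word N-gram model can be illustrated with a minimal sketch. It assumes a bigram (N = 2) model with add-one smoothing trained on a toy corpus, and it assumes the word-level acoustic match has already produced a log score for each candidate sentence. The function names, the toy sentences, and the `lm_weight` parameter are illustrative only and do not come from the paper.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigram contexts and bigrams over sentences padded with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words[1:])
        contexts.update(words[:-1])                  # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))   # (previous word, current word)
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(words[:-1], words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


def best_hypothesis(hypotheses, model, lm_weight=1.0):
    """Pick the sentence maximizing acoustic log score + weighted language model score.

    hypotheses is a list of (sentence, acoustic_log_score) pairs (assumed input).
    """
    return max(hypotheses, key=lambda h: h[1] + lm_weight * log_prob(h[0], model))[0]


# Toy training text and two acoustically similar candidates (made-up scores).
model = train_bigram([
    "show flights to boston",
    "show flights from denver",
    "list flights to denver",
])
candidates = [
    ("show flights to boston", -10.0),
    ("show fights to boston", -9.5),
]
print(best_hypothesis(candidates, model))
```

In this toy case the second candidate has the slightly better acoustic score, but the language model penalizes the unseen word pair "show fights", so the first candidate is selected. A real recognizer would search over a lattice of word hypotheses rather than rescoring a fixed list.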
Current Capabilities of Speech Recognizers
Table 1 provides a summary of the performance of modern speech recognition
and natural language understanding systems. Shown in the table are the Task or
Corpus, the Type of speech input, the Vocabulary Size and the resulting Word
Error Rate. It can be seen that the technology is more than suitable for connected
digit recognition tasks, for simple data retrieval tasks (like the Airline Travel
Information System), and, with a well-designed user interface, can even be used
for dictation like the Wall Street Journal Task. However, the word error rates
rapidly become prohibitive for tasks like recognizing speech from a radio
broadcast (with all of the cross-announcer banter, commercials, etc.), from
conversational telephone calls overheard off a switchboard, or even from familiar
conversations between family members calling each other over a switched telephone line.
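For readers unfamiliar with the Word Error Rate figure, it is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below follows that conventional definition using a word-level edit distance. The excerpt does not say how the figures in Table 1 were scored, so this is an assumption about the metric, and the function name and example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # deletion of a reference word
                cur[j - 1] + 1,                       # insertion of a hypothesis word
                prev[j - 1] + (ref_word != hyp_word)  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substitution and one deletion against a three-word reference.
print(word_error_rate("call home now", "call phone"))
```

Because insertions count as errors, the rate can exceed 100% when the recognizer outputs many more words than were spoken.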
Summary
The worlds of telecommunications and speech recognition are both rapidly
changing and evolving. Early applications of the
technology have achieved varying degrees of success. The promise for the future
is significantly higher performance for almost every speech recognition
technology area, with more robustness to speakers, background noises, etc. This
will ultimately lead to reliable, robust voice interfaces to every
telecommunications service that is offered, thereby making them universally
available.