Speech Recognition using Neural Networks
Abstract
This thesis examines how artificial neural networks can benefit a large vocabulary, speaker
independent, continuous speech recognition system. Currently, most speech recognition
systems are based on hidden Markov models (HMMs), a statistical framework that supports
both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs
make a number of suboptimal modeling assumptions that limit their potential effectiveness.
Neural networks avoid many of these assumptions; they can also learn complex functions,
generalize effectively, tolerate noise, and support parallelism. While neural networks
can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal
modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which
neural networks perform acoustic modeling, and HMMs perform temporal modeling. We
argue that an NN-HMM hybrid has several theoretical advantages over a pure HMM system,
including better acoustic modeling accuracy, better context sensitivity, more natural discrimination,
and a more economical use of parameters. These advantages are confirmed
experimentally by an NN-HMM hybrid we developed, based on context-independent
phoneme models, which achieved 90.5% word accuracy on the Resource Management database,
in contrast to only 86.0% accuracy achieved by a pure HMM under similar conditions.
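
To make the hybrid's division of labor concrete: in a typical NN-HMM scheme, the network estimates per-frame phoneme posteriors, and these are converted to scaled likelihoods by dividing out the class priors (Bayes' rule) before the HMM's decoder consumes them in place of emission probabilities. The following is a minimal sketch of that conversion, not necessarily the thesis's exact recipe; all shapes and values are illustrative.

```python
import numpy as np

def posteriors_to_scaled_likelihoods(posteriors, priors, eps=1e-10):
    """Turn per-frame network posteriors P(q|x) into scaled likelihoods
    P(q|x) / P(q), which are proportional to P(x|q) by Bayes' rule and
    can replace the HMM's emission probabilities during decoding."""
    return posteriors / np.maximum(priors, eps)

# Toy example: 3 frames, 2 phoneme classes; priors would normally be
# estimated from the training set (all values here are illustrative).
posteriors = np.array([[0.9, 0.1],
                       [0.6, 0.4],
                       [0.2, 0.8]])
priors = np.array([0.7, 0.3])
print(posteriors_to_scaled_likelihoods(posteriors, priors))
```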
Introduction
Speech is a natural mode of communication for people. We learn all the relevant skills
during early childhood, without instruction, and we continue to rely on speech communication
throughout our lives. It comes so naturally to us that we don’t realize how complex a
phenomenon speech is. The human vocal tract and articulators are biological organs with
nonlinear properties, whose operation is not just under conscious control but also affected
by factors ranging from gender to upbringing to emotional state. As a result, vocalizations
can vary widely in terms of their accent, pronunciation, articulation, roughness, nasality,
pitch, volume, and speed; moreover, during transmission, our irregular speech patterns can
be further distorted by background noise and echoes, as well as electrical characteristics (if
telephones or other electronic equipment are used). All these sources of variability make
speech recognition, even more than speech generation, a very complex problem.
Yet people are so comfortable with speech that we would also like to interact with our
computers via speech, rather than having to resort to primitive interfaces such as keyboards
and pointing devices. A speech interface would support many valuable applications — for
example, telephone directory assistance, spoken database querying for novice users, “hands-busy”
applications in medicine or fieldwork, office dictation devices, or even automatic
voice translation into foreign languages. Such tantalizing applications have motivated
research in automatic speech recognition since the 1950’s. Great progress has been made so
far, especially since the 1970’s, using a series of engineered approaches that include template
matching, knowledge engineering, and statistical modeling. Yet computers are still
nowhere near the level of human performance at speech recognition, and it appears that further
significant advances will require some new insights.
Speech Recognition
What is the current state of the art in speech recognition? This is a complex question,
because a system’s accuracy depends on the conditions under which it is evaluated: under
sufficiently narrow conditions almost any system can attain human-like accuracy, but it’s
much harder to achieve good accuracy under general conditions. The conditions of evaluation
— and hence the accuracy of any system — can vary along the following dimensions:
• Vocabulary size and confusability. As a general rule, it is easy to discriminate
among a small set of words, but error rates naturally increase as the vocabulary
size grows. For example, the 10 digits “zero” to “nine” can be recognized essentially
perfectly (Doddington 1989), but vocabulary sizes of 200, 5000, or 100000
may have error rates of 3%, 7%, or 45% (Itakura 1975, Miyatake 1990, Kimura
1990). On the other hand, even a small vocabulary can be hard to recognize if it
contains confusable words. For example, the 26 letters of the English alphabet
(treated as 26 “words”) are very difficult to discriminate because they contain so
many confusable words (most notoriously, the E-set: “B, C, D, E, G, P, T, V, Z”);
an 8% error rate is considered good for this vocabulary (Hild & Waibel 1993).
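
Error rates like those above are conventionally computed by aligning the recognizer's output against a reference transcript with a minimum-edit-distance alignment, counting substitutions, insertions, and deletions. Below is a minimal sketch, assuming whitespace-tokenized transcripts; word accuracy, as quoted in the abstract, is then roughly one minus this rate.

```python
def word_error_rate(reference, hypothesis):
    """Minimum edit distance (substitutions + insertions + deletions)
    between reference and hypothesis word sequences, divided by the
    reference length -- the standard word error rate."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i-1][j-1] + (ref[i-1] != hyp[j-1])
            d[i][j] = min(sub, d[i-1][j] + 1, d[i][j-1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("zero one two three", "zero one too three"))  # 0.25
```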
Neural Networks
Connectionism, or the study of artificial neural networks, was initially inspired by neurobiology,
but it has since become a very interdisciplinary field, spanning computer science,
electrical engineering, mathematics, physics, psychology, and linguistics as well. Some
researchers are still studying the neurophysiology of the human brain, but much attention is now being focused on the general properties of neural computation, using simplified neural
models. These properties include:
• Trainability. Networks can be taught to form associations between any input and
output patterns. This can be used, for example, to teach the network to classify
speech patterns into phoneme categories (a minimal sketch of such a classifier
appears after this list).
• Generalization. Networks don’t just memorize the training data; rather, they
learn the underlying patterns, so they can generalize from the training data to new
examples. This is essential in speech recognition, because acoustical patterns are
never exactly the same.
• Nonlinearity. Networks can compute nonlinear, nonparametric functions of their
input, enabling them to perform arbitrarily complex transformations of data. This
is useful since speech is a highly nonlinear process.
• Robustness. Networks are tolerant of both physical damage and noisy data; in
fact noisy data can help the networks to form better generalizations. This is a valuable
feature, because speech patterns are notoriously noisy.
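
As a concrete illustration of the trainability and generalization properties above, the following sketch trains a small one-hidden-layer network by gradient descent to classify two-dimensional "frames" into two phoneme-like categories, then tests it on points it never saw. The data, architecture, and hyperparameters are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for acoustic frames: two Gaussian clusters in 2-D,
# representing two phoneme-like categories (data invented for illustration).
X = np.vstack([rng.normal([0.0, 0.0], 0.5, (100, 2)),
               rng.normal([2.0, 2.0], 0.5, (100, 2))])
y = np.repeat([0, 1], 100)
onehot = np.eye(2)[y]

# One hidden layer of sigmoid units, softmax output, cross-entropy loss.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 2)); b2 = np.zeros(2)

def forward(X):
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))            # hidden activations
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, e / e.sum(axis=1, keepdims=True)          # class posteriors

lr = 0.5
for epoch in range(500):
    h, p = forward(X)
    g_logits = (p - onehot) / len(X)                    # dLoss/dlogits
    g_h = g_logits @ W2.T * h * (1 - h)                 # backprop through sigmoid
    W2 -= lr * h.T @ g_logits; b2 -= lr * g_logits.sum(0)
    W1 -= lr * X.T @ g_h;      b1 -= lr * g_h.sum(0)

# Generalization: classify points the network never saw during training.
X_test = np.array([[0.1, -0.2], [1.9, 2.1]])
print(forward(X_test)[1].argmax(axis=1))                # expected: [0 1]
```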
Fundamentals of Speech Recognition
Speech recognition is a multileveled pattern recognition task, in which acoustical signals
are examined and structured into a hierarchy of subword units (e.g., phonemes), words,
phrases, and sentences. Each level may provide additional temporal constraints, e.g., known
word pronunciations or legal word sequences, which can compensate for errors or uncertainties
at lower levels. This hierarchy of constraints can best be exploited by combining
decisions probabilistically at all lower levels, and making discrete decisions only at the
highest level.
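
This principle of deferring hard decisions can be illustrated with log-domain Viterbi scoring: per-frame phoneme scores stay probabilistic and are combined with each word's pronunciation constraint over the whole utterance, and only the final comparison between candidate words is a discrete decision. Below is a toy sketch with a two-word lexicon and made-up frame scores.

```python
import numpy as np

# Per-frame phoneme scores (rows: frames; columns: /k/ /ae/ /t/ /b/).
# All numbers are made up purely to illustrate the computation.
PHONES = ["k", "ae", "t", "b"]
frame_scores = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.7, 0.1],
])
LEXICON = {"cat": ["k", "ae", "t"], "bat": ["b", "ae", "t"]}

def word_log_score(word):
    """Log-domain Viterbi over a left-to-right chain of phoneme states:
    at each frame the path either stays in its current phoneme or
    advances to the next, and must end in the word's final phoneme."""
    states = [PHONES.index(p) for p in LEXICON[word]]
    T, S = len(frame_scores), len(states)
    v = np.full((T, S), -np.inf)
    v[0, 0] = np.log(frame_scores[0, states[0]])
    for t in range(1, T):
        for s in range(S):
            best_prev = max(v[t-1, s], v[t-1, s-1] if s > 0 else -np.inf)
            v[t, s] = best_prev + np.log(frame_scores[t, states[s]])
    return v[T-1, S-1]

# Scores stay probabilistic across the whole utterance; the only discrete
# decision is the final comparison between candidate words.
print(max(LEXICON, key=word_log_score))   # expected: 'cat'
```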