IMPLEMENTATION OF SPEECH RECOGNITION IN RESOURCE CONSTRAINED ENVIRONMENTS



A BRIEF INTRODUCTION TO SPEECH RECOGNITION

Real-time continuous speech recognition is a computationally demanding task, and one
that tends to benefit from additional computing resources.
A typical speech recognition system starts with a preprocessing stage, which takes a
speech waveform as its input and extracts from it feature vectors, or observations, that
represent the information required to perform recognition. This stage can be performed
efficiently in software. The second stage is recognition, or decoding, which is performed
using a set of phone-level statistical models called hidden Markov models (HMMs).
Word-level acoustic models are formed by concatenating phone-level models according
to a pronunciation dictionary. These word models are then combined with a language
model, which constrains the recognizer to valid word sequences. The decoding stage is
the computationally expensive part of the system.
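The concatenation step can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the two-word dictionary, the phone labels, the three-state left-to-right topology per phone and the self-loop probabilities. Only the transition structure is modelled; output distributions and the language model are omitted.

```python
import numpy as np

# Hypothetical pronunciation dictionary: word -> phone sequence.
PRONUNCIATIONS = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}

# Hypothetical phone models: each phone is a 3-state left-to-right HMM,
# described here only by the self-loop probability of each state.
PHONE_SELF_LOOPS = {
    "y": [0.6, 0.5, 0.4],
    "eh": [0.7, 0.6, 0.5],
    "s": [0.5, 0.5, 0.6],
    "n": [0.6, 0.6, 0.5],
    "ow": [0.7, 0.7, 0.6],
}

def word_transition_matrix(word, lexicon=PRONUNCIATIONS, phones=PHONE_SELF_LOOPS):
    """Chain the phone models of `word` into one left-to-right word HMM.

    Returns an (S+1) x (S+1) transition matrix: S emitting states followed by
    an absorbing exit state, so every row sums to 1.
    """
    loops = [p for phone in lexicon[word] for p in phones[phone]]
    n = len(loops)
    a = np.zeros((n + 1, n + 1))
    for i, stay in enumerate(loops):
        a[i, i] = stay            # remain in the current state
        a[i, i + 1] = 1.0 - stay  # advance, possibly into the next phone's first state
    a[n, n] = 1.0                 # exit state is absorbing
    return a

a_yes = word_transition_matrix("yes")
print(a_yes.shape)  # (10, 10): 3 phones x 3 states + 1 exit state
```

The boundary between phones is just an ordinary "advance" transition, which is why word models can be built mechanically from the dictionary without retraining.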



Unit Matching System:

First, a choice of speech recognition unit must be made.
Possibilities include linguistically based sub-word units such as phones (or phone-like
units), diphones, demisyllables, and syllables, as well as derivative units such as
fenemes, fenones, and acoustic units. Other possibilities include whole-word units,
and even units that correspond to a group of two or more words (e.g., "and an", "in the",
"of a"). Generally, the less complex the unit (e.g., phones), the fewer of them there are in the
language, and the more complicated (variable) their structure in continuous speech. For
large-vocabulary speech recognition (involving 1000 or more words), the use of sub-word
speech units is almost mandatory, as it would be quite difficult to record an adequate
training set for designing HMMs for units the size of words or larger. However, for
specialized applications (e.g., a small vocabulary or a constrained task), it is both reasonable
and practical to use the word as the basic speech unit. Independent of the unit chosen
for recognition, an inventory of such units must be obtained via training.
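The size trade-off can be made concrete with a toy Python sketch. The lexicon and phone labels below are hypothetical; the point is only that the inventory of sub-word units that must be trained is smaller than the word inventory, and that a new word built from known phones needs no new unit models.

```python
# Hypothetical toy lexicon; phone labels are illustrative only.
LEXICON = {
    "one": ["w", "ah", "n"],
    "nine": ["n", "ay", "n"],
    "none": ["n", "ah", "n"],
    "win": ["w", "ih", "n"],
    "wine": ["w", "ay", "n"],
    "in": ["ih", "n"],
    "eye": ["ay"],
    "nigh": ["n", "ay"],
}

def unit_inventory(lexicon, unit="phone"):
    """Return the set of distinct units that would need trained HMMs."""
    if unit == "word":
        return set(lexicon)
    if unit == "phone":
        return {p for phones in lexicon.values() for p in phones}
    raise ValueError(f"unknown unit type: {unit}")

def covered(word_phones, inventory):
    """True if a (possibly unseen) word can be built from trained units."""
    return all(p in inventory for p in word_phones)

words = unit_inventory(LEXICON, "word")
phones = unit_inventory(LEXICON, "phone")
print(len(words), "word units vs", len(phones), "phone units")  # 8 word units vs 5 phone units
print(covered(["n", "ah", "n"], phones))  # "nun": buildable without new training -> True
```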



DISCRETE MARKOV PROCESSES

Consider a system which may be described at any time as being in one of a set of $N$
distinct states, $S_1, S_2, \ldots, S_N$, as illustrated in Fig. 1 (where $N = 5$ for simplicity). At
regularly spaced discrete times, the system undergoes a change of state (possibly back to
the same state) according to a set of probabilities associated with the state. We denote the
time instants associated with state changes as $t = 1, 2, \ldots$, and we denote the actual state
at time $t$ as $q_t$.
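The excerpt stops before the transition probabilities are introduced. Assuming the standard first-order notation $a_{ij} = P[q_t = S_j \mid q_{t-1} = S_i]$ and initial probabilities $\pi_i = P[q_1 = S_i]$ (symbols not defined in this excerpt), the probability of a particular state sequence is $\pi_{q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t}$. The Python sketch below evaluates this for a hypothetical three-state chain rather than the five-state chain of Fig. 1.

```python
import numpy as np

def state_sequence_probability(pi, a, states):
    """P(q_1, ..., q_T | model) for a first-order discrete Markov chain.

    pi[i]   : probability of starting in state S_{i+1}
    a[i, j] : P(q_t = S_{j+1} | q_{t-1} = S_{i+1})
    states  : 0-based state indices q_1 .. q_T
    """
    prob = pi[states[0]]
    for prev, cur in zip(states[:-1], states[1:]):
        prob *= a[prev, cur]
    return float(prob)

# Hypothetical 3-state chain; each row of A sums to 1.
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
PI = np.array([0.0, 0.0, 1.0])  # start in S_3 with certainty

seq = [2, 2, 2, 0, 0, 2, 1, 2]  # S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3
print(f"{state_sequence_probability(PI, A, seq):.4e}")  # 1.5360e-04
```

Because the chain is first-order, the probability factorises into one term per transition, which is what makes the later HMM recursions tractable.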



CONCLUSIONS
1. A speech signal, whose characteristics change over time, can be modeled
stochastically in the form of HMM parameters and a codebook.
2. Using an FPGA instead of MATLAB made overall recognition six times
faster.
3. A choice of N = 6 states is satisfactory, and there is no simple relationship
between recognition accuracy, the number of sounds in the word, and the number
of states needed in an HMM.
4. A single model per word should be adequate, and the effect of different random starts
(initial models) is insignificant.
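As an illustration of conclusions 1, 3 and 4, the Python sketch below scores an utterance against one discrete HMM per word and picks the most likely word. It assumes a left-to-right topology, a hypothetical 16-symbol codebook, and observations that are already vector-quantised codebook indices; none of these details is given in this excerpt. It uses the standard scaled forward algorithm. Training (for example Baum-Welch re-estimation) is not shown, so the random initialisation only stands in for a "random start".

```python
import numpy as np

def random_left_right_hmm(n_states, n_symbols, rng):
    """Randomly initialised left-to-right discrete HMM (a 'random start')."""
    a = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        stay = rng.uniform(0.5, 0.9)
        a[i, i], a[i, i + 1] = stay, 1.0 - stay
    a[-1, -1] = 1.0
    b = rng.random((n_states, n_symbols))
    b /= b.sum(axis=1, keepdims=True)  # each state's symbol distribution sums to 1
    pi = np.zeros(n_states)
    pi[0] = 1.0                        # always start in the first state
    return pi, a, b

def log_likelihood(obs, pi, a, b):
    """Scaled forward algorithm: log P(O | model) for codebook indices obs."""
    alpha = pi * b[:, obs[0]]
    scale = alpha.sum()
    alpha /= scale
    log_p = np.log(scale)
    for o in obs[1:]:
        alpha = (alpha @ a) * b[:, o]  # propagate one step, then weight by emission
        scale = alpha.sum()
        alpha /= scale                 # rescale to avoid numerical underflow
        log_p += np.log(scale)
    return log_p

def recognise(obs, models):
    """Return the word whose model gives the highest log-likelihood."""
    return max(models, key=lambda w: log_likelihood(obs, *models[w]))

rng = np.random.default_rng(0)
# Hypothetical vocabulary, N = 6 states per word, 16 codebook symbols.
models = {w: random_left_right_hmm(6, 16, rng) for w in ("yes", "no")}
print(recognise([3, 3, 7, 1, 1, 12, 5], models))
```

With untrained models the printed word is arbitrary; the sketch only shows the scoring and decision structure that trained word models would plug into.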
These conclusions indicate that the FPGA approach is very attractive for speech
recognition and is inexpensive in terms of storage and speed.
The following are suggestions for future work in this field.