08-05-2012, 12:42 PM
An Advanced Method for Speech Recognition
v49-177.pdf (Size: 685.13 KB / Downloads: 33)
INTRODUCTION
SPEECH is one of the most important tools for
communication between a human and his environment.
Therefore, building an Automatic Speech Recognition (ASR)
system has long been a goal [1]. In a speech
recognition system, many parameters affect
recognition accuracy: speaker dependence or
independence, discrete or continuous word
recognition, vocabulary size, language constraints,
colloquial speech and the conditions of the recognition environment.
Problems such as a noisy environment, mismatch between
training and test conditions, different pronunciations of one word by
two different speakers, and different pronunciations of one word
by the same person at different times prevent a system from
achieving complete recognition, so resolving each of these problems is a
good step toward that aim. A speech recognition algorithm
consists of several stages, the most significant of which
are feature extraction and pattern recognition. In the feature
extraction category, the best-known algorithms are zero
crossing rate, permanent frequency, cepstrum coefficients and
linear prediction coefficients [2].
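As an illustration of the simplest feature named above, the zero crossing rate of one frame can be sketched as follows. This is a minimal sketch, not the paper's implementation: the function name and the convention of treating a sample of exactly zero as non-negative are assumptions made here.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumption: a sample equal to zero counts as non-negative.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(frame, frame[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


# Hypothetical frames: a slowly varying one and a rapidly alternating one.
slow = [1, 2, 1, -1, -2, -1, 1, 2]
fast = [1, -1, 1, -1, 1, -1, 1, -1]
print(zero_crossing_rate(slow), zero_crossing_rate(fast))
```

A rapidly alternating (noise-like) frame gives a rate near 1, while a slowly varying frame gives a much lower value, which is why this feature is often used to separate noisy or unvoiced segments from voiced ones.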
ARCHITECTURE OF SYSTEM
The overall architecture of our speech recognition system
is shown in the figure below. Our speech recognition
process contains four main stages:
1- Acoustic processing: the main task of this unit is filtering
white noise out of the speech signal. It consists of three parts:
Fast Fourier Transform, Mel-scale band-pass filtering and
cepstral analysis.
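The three parts of the acoustic processing stage can be sketched roughly as below. This is only an illustrative sketch under several assumptions that are not taken from the paper: the mel formula 2595 * log10(1 + f / 700) (a common convention), triangular filters, a DCT-II as the cepstral step, and all function names and parameter values. A naive DFT stands in for the FFT to keep the example dependency-free; it produces the same spectrum, only more slowly.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum via a naive O(n^2) DFT (an FFT yields the same values)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2 / n)
    return spec


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular band-pass filters with centers evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [i * top / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def dct2(values, num_coeffs):
    """Unnormalized DCT-II, used here as the cepstral analysis step."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def mel_cepstrum(frame, sample_rate, num_filters=8, num_coeffs=5):
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # epsilon avoids log(0)
    return dct2(log_energies, num_coeffs)


# Hypothetical input: one 64-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(64)]
print(mel_cepstrum(frame, 8000))
```

The resulting short coefficient vector per frame is what would be passed on to the later stages; the frame length, filter count and coefficient count here are placeholders, not the paper's settings.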
PREPROCESSING
The digitized sound signal contains both relevant information (the data) and
irrelevant information, such as white noise; therefore it
requires a lot of storage space [11]. Most frequency
components of a speech signal lie below 5 kHz, and the upper ranges
contain almost only white noise, which, because of its chromatic
nature, directly impacts system performance and training speed.
So speech data must be preprocessed.
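One way to discard the content above 5 kHz is a low-pass filter. The sketch below uses a Hamming-windowed sinc FIR filter; this particular filter design, the 16 kHz sample rate, the tap count and the function names are all assumptions for illustration, since the excerpt does not say how the paper's preprocessing is implemented.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate, num_taps=51):
    """Hamming-windowed sinc low-pass filter coefficients (odd num_taps)."""
    fc = cutoff_hz / sample_rate          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]      # normalize to unity gain at 0 Hz


def apply_fir(signal, taps):
    """Causal convolution; output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * signal[i - j]
        out.append(acc)
    return out


# Hypothetical check: a 1 kHz tone should pass, a 7 kHz tone should be attenuated.
rate = 16000
taps = lowpass_fir(5000, rate)
low = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(400)]
high = [math.sin(2 * math.pi * 7000 * n / rate) for n in range(400)]
print(max(abs(v) for v in apply_fir(low, taps)[60:]))
print(max(abs(v) for v in apply_fir(high, taps)[60:]))
```

The first 60 output samples are skipped in the check because the filter needs roughly one filter length of input before its output settles.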
MLP NEURAL NETWORK
A neural network (NN) is a massive processing system that
consists of many processing entities connected through links
that represent the relationship between them [15]. A
Multilayer Perceptron (MLP) network consists of an input
layer, one or more hidden layers, and an output layer. Each
layer consists of multiple neurons. An artificial neuron is the
smallest unit that constitutes the artificial neural network. The
actual computation and processing of the neural network
happens inside the neuron [16].
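The layer structure described above can be sketched as a forward pass in which each neuron computes a weighted sum of its inputs and applies an activation function. The sigmoid activation, the layer sizes, the random initialization and every name below are assumptions for illustration; the excerpt does not give the network's actual configuration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all see the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def mlp_forward(inputs, layers):
    """layers: list of (weight_rows, biases), one pair per hidden/output layer."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


def random_layer(n_in, n_out, rng):
    """Hypothetical initializer: small uniform random weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


# Hypothetical network: 3 inputs, one hidden layer of 4 neurons, 2 outputs.
rng = random.Random(0)
network = [random_layer(3, 4, rng), random_layer(4, 2, rng)]
print(mlp_forward([0.1, 0.2, 0.3], network))
```

Training (adjusting the weights and biases from labeled examples) is deliberately left out; the sketch only shows how data flows from the input layer through the hidden layer to the output layer.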
CONCLUSION
Speech recognition is now widely used
in the industrial software market. In this paper, we presented a
new method that improves the performance of an automatic Persian speech
recognition system. Using the UTA algorithm
reduced the system learning time from 18000 to
6500 epochs and raised the average system accuracy to 98%.
Notable features of this system are excellent
performance with minimal training samples, fast learning,
a wide range of recognition, and online classification of the
received signals.