20-09-2014, 02:10 PM
SPEECH RECOGNITION USING WAVELET TRANSFORM
INTRODUCTION
Automatic speech recognition (ASR) aims to convert spoken language into text. Researchers around the world have worked on speech recognition for many decades, and it remains an intensive area of research. Recent advances in soft computing techniques have given automatic speech recognition greater importance. Large variations in speech signals, together with factors such as native accent and varying pronunciation, make the task very difficult. ASR is therefore a complex task that requires considerable intelligence to achieve good recognition results.
Speech recognition is currently used in many real-time applications, such as cellular telephones, computers and security systems. However, these systems are far from perfect at correctly classifying human speech into words. A speech recognizer consists of a feature extraction stage and a classification stage. The parameters produced by the feature extraction stage are compared, in some form, with parameters extracted from signals stored in a database or template; alternatively, they can be fed to a neural network.
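The comparison against stored templates can be illustrated with a nearest-template classifier. This is a minimal sketch, assuming every utterance has already been reduced to a fixed-length feature vector and that Euclidean distance is an acceptable similarity measure; the source does not say which comparison it uses, and the names `classify` and `templates` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features, templates):
    """Return the word label whose stored reference vector is closest.

    `templates` (hypothetical structure) maps each word label to a list of
    reference feature vectors, e.g. one per recorded speaker.
    """
    best_label, best_dist = None, math.inf
    for label, refs in templates.items():
        for ref in refs:
            d = euclidean(features, ref)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```

For example, with `templates = {"one": [[1.0, 0.2], [0.9, 0.3]], "two": [[0.1, 1.0]]}`, the vector `[0.95, 0.25]` is labelled `"one"`. A neural network replaces this fixed distance rule with a decision boundary learned from the training samples.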
Speech word recognition systems commonly perform classification based on speech features, which are usually obtained via Fourier Transforms (FTs), Short Time Fourier Transforms (STFTs) or Linear Predictive Coding (LPC) techniques. However, these methods have some disadvantages. They assume that the signal is stationary within a given time frame and may therefore fail to analyze localized events correctly. The wavelet transform copes with some of these problems. Another factor favoring Wavelet Transforms (WT) over conventional methods is their ability to capture localized features. In this work, the Discrete Wavelet Transform (DWT) is used for speech processing.
Database collection
Database collection is the most important step in speech recognition; only an efficient database can yield a good speech recognition system. Different people say the same word differently because of differences in pitch, slang and pronunciation. In this step, the same word is therefore recorded by different speakers, and all words are recorded at the same sampling frequency of 16 kHz. Collecting too many samples does not necessarily benefit speech recognition and can sometimes affect it adversely, so the right number of samples should be chosen. The same step is repeated for every other word.
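Because every recording must share the same 16 kHz sampling frequency, it helps to check the database before feature extraction. This is a minimal sketch using Python's standard `wave` module; it assumes the recordings are WAV files stored in a hypothetical `root/<word>/<speaker>.wav` layout, and the function names are illustrative only.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 16_000  # all words are recorded at 16 kHz


def check_database(root):
    """Return (path, rate) pairs for WAV files not sampled at 16 kHz.

    Assumes the hypothetical layout root/<word>/<speaker>.wav.
    """
    mismatched = []
    for path in sorted(Path(root).glob("*/*.wav")):
        # wave.open expects a str path or a file object, not a Path
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate != EXPECTED_RATE:
            mismatched.append((path, rate))
    return mismatched


def count_samples(root):
    """Return {word: number_of_recordings} to help balance samples per word."""
    counts = {}
    for path in Path(root).glob("*/*.wav"):
        word = path.parent.name
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Counting recordings per word makes it easy to keep the number of samples balanced across words rather than simply collecting as many as possible.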
Discrete Wavelet Transform
The transform of a signal is just another way of representing the signal; it does not change the information content of the signal. For many signals, the low-frequency part is the most important, because it gives the signal its identity. Consider the human voice: if we remove the high-frequency components, the voice sounds different, but we can still tell what's being said. In wavelet analysis, we often speak of approximations and details. The approximations are the high-scale, low-frequency components of the signal; the details are the low-scale, high-frequency components. The DWT is defined by the following equation:

$$ W(j,k) = \sum_{n} x(n)\, 2^{-j/2}\, \psi\!\left(2^{-j} n - k\right) $$

where x(n) is the sampled speech signal, ψ is the mother wavelet, j is the scale (dilation) index and k is the translation (shift) index.
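To make the split into approximations and details concrete, here is a minimal sketch of a multi-level DWT. It swaps in the Haar wavelet because its filters have only two taps; this is not the Daubechies 4 wavelet reported in the conclusion. A library such as PyWavelets offers other wavelet families through `pywt.wavedec`. The function names below are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One DWT level with the Haar wavelet.

    Returns (approximation, detail): the low-pass (high-scale) and
    high-pass (low-scale) halves, each downsampled by 2.
    """
    if len(x) % 2:
        raise ValueError("signal length must be even at every level")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def wavedec_haar(x, levels):
    """Multi-level DWT: only the approximation is split again (dyadic tree).

    Returns [A_L, D_L, D_(L-1), ..., D_1], the same ordering that
    pywt.wavedec uses for its coefficient list.
    """
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]
```

For example, a two-level decomposition of eight samples yields two approximation coefficients, two level-2 details and four level-1 details. Because the Haar filters are orthonormal, the total energy of the coefficients equals the energy of the signal, which illustrates that the transform changes the representation but not the information content.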
CONCLUSION
Speech recognition is an advanced area of research, and much work is under way to develop new and improved approaches. During the experiment, the Daubechies 4 mother wavelet proved effective for feature extraction. Only a limited number of samples was used in this experiment; increasing the number of samples may yield better features and better recognition results for Malayalam word utterances. The performance of the neural network combined with wavelet features is appreciable. The software we used has some limitations; increasing both the number of samples and the number of training iterations could produce better recognition results.
We also observed that a neural network is an effective classifier that can be combined successfully with wavelet-based features. Wavelet-based feature extraction could also be combined with other classification methods, such as neuro-fuzzy and genetic algorithm techniques, to perform the same task.
This study demonstrates the effectiveness of the discrete wavelet transform for feature extraction. Our recognition results under different kinds of noise and noise conditions show that choosing dyadic bandwidths performs better than choosing equal bandwidths in sub-band recombination. This result is consistent with the way the human ear recognizes speech and shows a useful benefit of the dyadic nature of the multi-level wavelet transform for sub-band speech recognition.
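The difference between dyadic and equal bandwidths can be seen by listing the nominal sub-bands for speech sampled at 16 kHz. This is a minimal sketch assuming ideal band edges; real wavelet filters have overlapping transition bands, and the helper names are hypothetical.

```python
def dyadic_bands(fs, levels):
    """Nominal frequency bands (Hz) of an L-level dyadic DWT, lowest first.

    Detail D_j covers [fs/2^(j+1), fs/2^j]; the final approximation A_L
    covers [0, fs/2^(L+1)].
    """
    bands = [("A%d" % levels, 0.0, fs / 2 ** (levels + 1))]
    for j in range(levels, 0, -1):
        bands.append(("D%d" % j, fs / 2 ** (j + 1), fs / 2 ** j))
    return bands


def equal_bands(fs, count):
    """Split [0, fs/2] into `count` equal-width bands for comparison."""
    width = fs / 2 / count
    return [("B%d" % (i + 1), i * width, (i + 1) * width) for i in range(count)]


for name, lo, hi in dyadic_bands(16_000, 3):
    print(f"{name}: {lo:.0f}-{hi:.0f} Hz")
# A3: 0-1000 Hz
# D3: 1000-2000 Hz
# D2: 2000-4000 Hz
# D1: 4000-8000 Hz
```

With four bands each, the dyadic split gives two narrow bands below 2 kHz, where much of the speech energy lies, while the equal split spends half its bands above 4 kHz. This finer low-frequency resolution is what the text compares to the behaviour of the human ear.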
The wavelet transform is a more powerful technique for speech processing than the earlier techniques, and the artificial neural network (ANN) proved to be a more successful classifier than the hidden Markov model (HMM).