25-08-2017, 09:32 PM
Voice Verification System Using Wavelets
Abstract
This paper presents a novel voice verification system using
wavelet transforms. Conventional signal processing techniques assume
the signal to be stationary and are ineffective at recognizing non-stationary
signals such as voice signals. Voice signals, which are more dynamic,
can be analyzed with far better accuracy using the wavelet transform. The
developed system is a word-dependent voice verification system that
combines RASTA and LPC. The voice signal is first filtered with a
special-purpose voice signal filter based on the Relative Spectral Algorithm
(RASTA). The signals are then denoised and decomposed to derive the wavelet
coefficients, on which a statistical computation is carried out. Further, the
formants (resonances) of the voice signal are detected using Linear
Predictive Coding (LPC). With the statistical computation on the
coefficients alone, the accuracy of verifying an individual's voice sample
against his or her own voice is fairly high (around 75% to 80%). The
reliability of the verification is strengthened by combining evidence from
these two completely different aspects of the individual's voice. In the voice
comparison tests, four out of five individuals are verified, and the results
show a higher percentage of accuracy. The accuracy of the system could be
further improved by incorporating advanced pattern recognition techniques
such as the Hidden Markov Model (HMM).
INTRODUCTION
Speech is a very basic way for humans to convey information to
one another. With a bandwidth of only 4 kHz, speech can
convey information along with the emotion of a human voice. People want to be
able to hear someone's voice from anywhere in the world, as if the person
were in the same room. As a result, greater emphasis is being placed on the
design of new and efficient speech coders for voice
communication and transmission.
Today, applications of speech coding and compression have
become very numerous. Many applications involve the real-time coding of
speech signals for use in mobile satellite communications, cellular
telephony, and audio for videophones or video teleconferencing systems.
Other applications include the storage of speech for speech synthesis
and playback, or for the transmission of voice at a later time. Some
examples include voice mail systems, voice memo wristwatches,
voice logging recorders and interactive PC software. Traditionally, speech
coders are classified into two categories: waveform coders and
analysis/synthesis vocoders (from "voice coders"). Waveform coders attempt
to copy the actual shape of the signal produced by the microphone and its
associated analogue circuits. A popular waveform coding technique is
pulse code modulation (PCM), which is used in telephony today. Vocoders use
an entirely different approach to speech coding, known as parameter coding
or analysis/synthesis coding, where no attempt is made at reproducing the
exact
WAVELETS
The fundamental idea behind wavelets is to analyse according to
scale. The wavelet analysis procedure adopts a wavelet prototype
function, called an analysing wavelet or mother wavelet. Any signal can then
be represented by translated and scaled versions of the mother
wavelet. Wavelet analysis is capable of revealing aspects of data that
other signal analysis techniques such as Fourier analysis miss, such as
trends, breakdown points, discontinuities in higher derivatives, and
self-similarity. Furthermore, because it affords a different view of data from
that presented by traditional techniques, it can often compress or de-noise a
signal without appreciable degradation.
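The scale-based view and the de-noising claim above can be illustrated with the simplest mother wavelet, the Haar wavelet. The following is a minimal sketch, not the authors' implementation: one level of the Haar transform, soft thresholding of the detail (fine-scale) coefficients, and reconstruction. The helper names (`haar_dwt`, `soft_threshold`, `denoise`), the test signal and the threshold choice (the "universal" threshold, assuming the noise level is known) are illustrative assumptions.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def haar_dwt(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds each even/odd sample pair exactly."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; anything with |c| <= t becomes 0."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(x, t):
    """Threshold only the detail coefficients, then reconstruct the signal."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, soft_threshold(detail, t))


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Usage: a slow sine (2 cycles in 256 samples) plus Gaussian noise, sigma = 0.1.
rng = random.Random(1)
n = 256
clean = [math.sin(2 * math.pi * 2 * k / n) for k in range(n)]
noisy = [c + rng.gauss(0.0, 0.1) for c in clean]
# Universal threshold sigma * sqrt(2 ln N), assuming sigma is known.
denoised = denoise(noisy, 0.1 * math.sqrt(2 * math.log(n)))
```

Because the slow sine lives almost entirely in the approximation coefficients while white noise spreads evenly over both halves, zeroing small detail coefficients removes roughly half of the noise energy with little damage to the signal. A practical system would iterate over several levels and use a smoother wavelet.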
Examples of Wavelets
The different families make trade-offs between how compactly the
basis functions are localized in space and how smooth they are. Within each
family of wavelets (such as the Daubechies family) are wavelet subclasses
distinguished by the number of filter coefficients and the level of iteration.
Wavelets are most often classified within a family by the number of
vanishing moments: an extra set of mathematical conditions that the
filter coefficients must satisfy. How compactly a signal can be represented
depends on the number of vanishing moments of the wavelet
function used. A more detailed discussion is provided in the next section.
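The vanishing-moment conditions can be checked numerically. A wavelet with p vanishing moments has a high-pass filter g whose moments sum_k k^m g[k] are zero for m = 0, ..., p-1. This sketch derives g from the low-pass filter by the quadrature-mirror rule g[k] = (-1)^k h[L-1-k] (one common convention, assumed here) and counts the vanishing moments of the Haar and the 4-coefficient Daubechies (db2) filters; the function names are hypothetical.

```python
import math

# Orthonormal low-pass (scaling) filters with their standard textbook values.
HAAR = [1 / math.sqrt(2), 1 / math.sqrt(2)]
S3 = math.sqrt(3)
DB2 = [(1 + S3) / (4 * math.sqrt(2)), (3 + S3) / (4 * math.sqrt(2)),
       (3 - S3) / (4 * math.sqrt(2)), (1 - S3) / (4 * math.sqrt(2))]


def highpass_from_lowpass(h):
    """Quadrature-mirror wavelet filter g[k] = (-1)^k * h[L-1-k]."""
    n = len(h)
    return [(-1) ** k * h[n - 1 - k] for k in range(n)]


def vanishing_moments(h, tol=1e-9):
    """Count consecutive moments sum_k k^m g[k], m = 0, 1, ..., that are zero."""
    g = highpass_from_lowpass(h)
    m = 0
    while m < len(g):
        moment = sum(k ** m * gk for k, gk in enumerate(g))
        if abs(moment) > tol:
            break
        m += 1
    return m


print(vanishing_moments(HAAR), vanishing_moments(DB2))  # 1 2
```

The longer db2 filter buys one extra vanishing moment, so it represents locally linear stretches of a signal with near-zero detail coefficients, whereas Haar only annihilates constants. This is the compactness versus filter-length trade-off described above.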
Multilevel Decomposition
The decomposition process can be iterated, with
successive approximations being decomposed in turn, so that one
signal is broken down into many lower-resolution components. This is
called the wavelet decomposition tree.
The wavelet decomposition of the signal s analysed at level j has
the structure [cAj, cDj, ..., cD1], where cAj holds the approximation
coefficients at level j and cDi the detail coefficients at level i. Looking
at a signal's wavelet decomposition tree can reveal valuable information. The
diagram below shows the wavelet decomposition to level 3 of a sample
signal S.
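The iteration that produces the [cAj, cDj, ..., cD1] structure can be sketched with the Haar wavelet: at each level only the current approximation is split again, and the detail coefficients are collected from finest (cD1) to coarsest. This is a minimal illustration, assuming the signal length is divisible by 2^level; `haar_step` and `wavedec` are hypothetical helper names (the ordering mirrors the structure given in the text).

```python
import math


def haar_step(x):
    """Split x into Haar approximation and detail halves."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def wavedec(signal, level):
    """Return [cA_level, cD_level, ..., cD_1], re-splitting the approximation each time."""
    approx = list(signal)
    details = []  # details[0] will be cD1 (finest scale)
    for _ in range(level):
        if len(approx) % 2 or len(approx) < 2:
            raise ValueError("length must be divisible by 2**level")
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


# Usage: level-3 tree of a 16-sample ramp.
S = [float(k) for k in range(16)]
coeffs = wavedec(S, 3)
print([len(c) for c in coeffs])  # [2, 2, 4, 8]
```

Each level halves the number of coefficients, so a level-3 tree of 16 samples keeps 2 + 2 + 4 + 8 = 16 coefficients in total; because the Haar transform is orthonormal, the signal energy is also preserved across the bands.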
Formant Estimation
Formants are among the major components of speech. The
frequencies at which the resonant peaks of the vocal tract occur are called
the formant frequencies, or simply formants. The formants of a signal can be
obtained by analyzing the vocal tract frequency response.
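The idea that formants are the resonant peaks of the vocal tract frequency response can be illustrated by building an assumed all-pole vocal tract and locating the local maxima of its magnitude response. In this sketch the resonances (500, 1500 and 2500 Hz, each 80 Hz wide) and the sampling rate of 8 kHz are hypothetical values chosen for illustration; the pole radius follows the usual bandwidth mapping r = exp(-pi B / fs).

```python
import cmath
import math


def resonator(freq_hz, bandwidth_hz, fs):
    """Second-order all-pole section 1 - 2r cos(theta) z^-1 + r^2 z^-2."""
    r = math.exp(-math.pi * bandwidth_hz / fs)  # pole radius from bandwidth
    theta = 2 * math.pi * freq_hz / fs          # pole angle from centre frequency
    return [1.0, -2 * r * math.cos(theta), r * r]


def poly_mul(p, q):
    """Multiply two polynomials in z^-1 (cascades two filter sections)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def magnitude_response(a, fs, n_points=4000):
    """|1 / A(e^jw)| sampled from 0 Hz up to (not including) fs/2."""
    freqs, mags = [], []
    for i in range(n_points):
        f = 0.5 * fs * i / n_points
        z_inv = cmath.exp(-2j * math.pi * f / fs)
        freqs.append(f)
        mags.append(1.0 / abs(sum(c * z_inv ** k for k, c in enumerate(a))))
    return freqs, mags


def pick_formants(freqs, mags):
    """Formant candidates are the local maxima (resonant peaks) of the response."""
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i - 1] < mags[i] >= mags[i + 1]]


# Usage with an assumed vocal tract: resonances at 500, 1500, 2500 Hz, 80 Hz wide.
fs = 8000
a = [1.0]
for centre in (500, 1500, 2500):
    a = poly_mul(a, resonator(centre, 80, fs))
freqs, mags = magnitude_response(a, fs)
formants = pick_formants(freqs, mags)
```

For a real utterance the denominator A(z) would come from LPC analysis of the speech frame rather than from known resonances; the peak-picking step is the same.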
Figure 2 shows the vocal tract frequency response. The x-axis
represents frequency and the y-axis represents the magnitude of the
response. As can be seen, the formants of the signal are labelled F1, F2,
F3 and F4. A voice signal typically contains three to five formants, but
in most voice signals up to four formants can be detected. To obtain
the formants of the voice signals, the LPC (Linear Predictive Coding)
method is used. The name LPC derives from linear prediction. Linear
prediction, as the term implies, is a mathematical operation on
discrete-time signals that estimates future sample values as a linear
function of previous samples [8].
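The linear prediction step can be made concrete with the autocorrelation method: the predictor coefficients a[1..p] of x_hat[n] = sum_k a[k] x[n-k] solve the normal equations, which the Levinson-Durbin recursion solves in O(p^2) operations. This is a minimal sketch, assuming a single frame with no windowing; the synthetic AR(2) test signal and the helper names are hypothetical, and the paper does not state which solver it uses.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] x[n-k], k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations recursively.

    Returns coefficients a[1..p] for the predictor x_hat[n] = sum_k a[k] x[n-k],
    together with the final prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def predict_next(history, coeffs):
    """Predict the next sample as a linear function of the previous samples."""
    return sum(c * history[-1 - k] for k, c in enumerate(coeffs))


# Usage: a synthetic AR(2) signal x[n] = 1.3 x[n-1] - 0.8 x[n-2] + noise.
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(20000):
    x.append(1.3 * x[-1] - 0.8 * x[-2] + rng.gauss(0.0, 1.0))
coeffs, err = levinson_durbin(autocorrelation(x, 2), 2)
```

On this signal the recovered coefficients come out close to 1.3 and -0.8. For speech, the polynomial 1 - sum_k a[k] z^-k is the all-pole vocal tract model whose response peaks give the formants.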
VOICE SIGNAL ANALYSIS
RASTA, or the Relative Spectral Algorithm as it is known, is a
technique developed as the initial stage for voice recognition. This
method works by applying a band-pass filter to the energy in each frequency
sub-band in order to smooth over short-term noise variations and to remove
any constant offset. Stationary noise is often present in voice signals.
Stationary noise is present for the full duration of a signal and does not
diminish; its properties do not change over time. The underlying assumption
is that the noise varies slowly with respect to speech. This makes RASTA
well suited to the initial stage of voice signal filtering, where it removes
stationary noise. The stationary noise identified here lies in the
frequency range of 1 Hz to 100 Hz.
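The band-pass filtering of sub-band energies can be sketched as an IIR filter applied along time to each sub-band's log-energy trajectory. The coefficients below are an assumption: they follow the commonly quoted RASTA filter H(z) = 0.1 (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1), implemented causally with a 4-frame delay (some implementations use a pole of 0.94 instead). The frame rate of 100 frames/s and the test trajectories are hypothetical.

```python
import math

# Assumed RASTA band-pass filter (pole at 0.98):
#   H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1)
# implemented causally, i.e. with a 4-frame delay.
NUMERATOR = [0.2, 0.1, 0.0, -0.1, -0.2]
POLE = 0.98


def rasta_filter(trajectory):
    """Filter one sub-band's log-energy trajectory (one value per frame)."""
    out = []
    for n in range(len(trajectory)):
        fir = sum(b * trajectory[n - k]
                  for k, b in enumerate(NUMERATOR) if n - k >= 0)
        prev = out[-1] if out else 0.0
        out.append(fir + POLE * prev)
    return out


def rasta_bands(log_energies):
    """Apply the same filter independently to every sub-band trajectory."""
    return [rasta_filter(band) for band in log_energies]


# Usage: 100 frames/s; band 0 is a constant offset, band 1 adds a 4 Hz modulation.
frames = range(2000)
bands = [[3.0 for _ in frames],
         [5.0 + math.sin(2 * math.pi * 4 * n / 100) for n in frames]]
filtered = rasta_bands(bands)
```

Because the numerator coefficients sum to zero, a constant offset (for example a fixed channel or stationary noise level in the log domain) is driven to zero, while energy fluctuations at syllable-like rates of a few hertz pass with close to unit gain.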
SYSTEM IMPLEMENTATION
To implement the system, the voice signal is decomposed into its
approximation and detail coefficients. The recognition is then carried
out on the approximation and detail coefficients that are extracted.
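One way to turn the extracted coefficients into a verification decision is to summarize each band with a few statistics and compare feature vectors. This is a minimal sketch of that idea, not the paper's method: the chosen statistics (mean, standard deviation, mean energy per band), the Euclidean distance, the threshold of 0.5 and the synthetic signals are all hypothetical.

```python
import math


def haar_wavedec(x, level):
    """Multilevel Haar decomposition returning [cA_level, cD_level, ..., cD_1]."""
    s = math.sqrt(2.0)
    approx, details = list(x), []
    for _ in range(level):
        pairs = range(0, len(approx) - 1, 2)
        details.append([(approx[i] - approx[i + 1]) / s for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / s for i in pairs]
    return [approx] + details[::-1]


def band_statistics(band):
    """Hypothetical per-band features: mean, standard deviation, mean energy."""
    n = len(band)
    mean = sum(band) / n
    std = math.sqrt(sum((c - mean) ** 2 for c in band) / n)
    energy = sum(c * c for c in band) / n
    return [mean, std, energy]


def feature_vector(signal, level=3):
    """Concatenate the statistics of every approximation/detail band."""
    return [v for band in haar_wavedec(signal, level) for v in band_statistics(band)]


def verify(reference, candidate, threshold):
    """Accept the claim if the feature distance falls below a (hypothetical) threshold."""
    distance = math.dist(feature_vector(reference), feature_vector(candidate))
    return distance < threshold, distance


# Usage with synthetic stand-ins for recorded utterances (64 samples each).
n = 64
enrolled = [math.sin(2 * math.pi * 3 * k / n) for k in range(n)]
same_speaker = list(enrolled)
other_speaker = [math.sin(2 * math.pi * 20 * k / n) for k in range(n)]
print(verify(enrolled, same_speaker, 0.5))      # (True, 0.0)
print(verify(enrolled, other_speaker, 0.5)[0])  # False
```

The low-frequency signal concentrates its energy in the level-3 approximation band while the high-frequency one does not, so their feature vectors differ clearly. Real recordings of the same speaker would never match exactly, which is why the threshold has to be tuned on enrollment data.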
CONCLUSION
The voice recognition system using wavelet feature extraction employs
wavelets to study the dynamic properties and
characteristics of the voice signal. This is carried out by estimating
the formants and detecting the pitch of the voice signal using LPC
(Linear Predictive Coding). The developed voice recognition system
is a word-dependent voice verification system used to verify the
identity of an individual.