29-06-2012, 01:43 PM
Project on Speaker Recognition
Introduction:
Speaker recognition is the method of automatically identifying who is speaking on the
basis of individual information carried in speech waves. It is basically divided into
two classes: speaker identification and speaker verification. Speaker recognition makes
it possible to use a speaker’s voice to verify their identity and control access to
services such as banking by telephone, database access services, voice dialling,
telephone shopping, information services, voice mail, security control for secret
information areas, and remote access to computers. AT&T and TI, together with Sprint,
have started field tests and actual application of speaker recognition technology; many
customers already use Sprint’s Voice Phone Card. Speaker recognition technology has
great potential to create new services that will make our everyday lives more secure.
Another important application of speaker recognition technology is forensics. Speaker
recognition has been an appealing research field for the last decades and still
presents a number of unsolved problems.
Speech Feature Extraction:
In this project the most important step is extracting features from the speech signal.
In a classification problem, feature extraction reduces the dimensionality of the
input vector while maintaining the discriminating power of the signal. From the basic
structure of speaker identification and verification systems, we know that the number
of training and test vectors needed for the classification problem grows exponentially
with the dimension of the input vector, so feature extraction is needed.
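As a rough illustration of the exponential growth mentioned above, the sketch below counts the cells of a grid laid over the feature space. The function name cells_to_cover, the choice of 10 bins per axis, and the example dimensions are assumptions made only for this illustration; they are not taken from the project.

```python
def cells_to_cover(bins_per_axis: int, n_dims: int) -> int:
    """Number of grid cells when every axis is split into the same number of bins.

    If at least one training vector per cell is wanted, this is also a lower
    bound on the number of training vectors (an illustrative assumption).
    """
    if bins_per_axis < 1 or n_dims < 1:
        raise ValueError("bins_per_axis and n_dims must be positive")
    return bins_per_axis ** n_dims


if __name__ == "__main__":
    # Hypothetical dimensions: the count multiplies by 10 with every added axis.
    for n_dims in (1, 2, 3, 12):
        print(n_dims, cells_to_cover(10, n_dims))
```

Each extra dimension multiplies the count by the number of bins, which is why a compact feature vector is preferred over the raw signal.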
The extracted features should also meet some criteria when dealing with the speech
signal. They should:
Be easy to measure.
Distinguish between speakers while tolerating intra-speaker variability.
Not be susceptible to mimicry.
Show little fluctuation from one speaking environment to another.
Be stable over time.
Occur frequently and naturally in speech.
Hamming Window:
The Hamming window is a raised-cosine-type window. For a frame of N samples it is given by
w(n) = 0.54 - 0.46 cos(2*pi*n / (N - 1)), 0 <= n <= N - 1. A window function is zero-valued
outside of some chosen interval. For example, a function that is constant inside the interval
and zero elsewhere is called a rectangular window, which describes the shape of its graph.
When a signal or any other function is multiplied by a window function, the product is also
zero-valued outside the interval. Windowing is done to reduce problems caused by truncating
the signal, such as spectral leakage at the frame edges. Window functions are also used in
spectral analysis, filter design, and audio data compression such as Vorbis.
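A minimal sketch of the windowing step, assuming NumPy is available and using the commonly quoted Hamming coefficients 0.54 and 0.46. The function names hamming_window and window_frame are hypothetical helpers written for this example, not part of the project.

```python
import numpy as np


def hamming_window(n_samples: int) -> np.ndarray:
    """Return w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)) for n = 0 .. N - 1."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    if n_samples == 1:
        # The formula divides by N - 1, so a single sample is handled separately.
        return np.ones(1)
    n = np.arange(n_samples)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (n_samples - 1))


def window_frame(frame) -> np.ndarray:
    """Multiply one speech frame, sample by sample, by a Hamming window."""
    frame = np.asarray(frame, dtype=float)
    return frame * hamming_window(frame.shape[0])


if __name__ == "__main__":
    # Hypothetical frame: 256 samples of a sine tone instead of real speech.
    frame = np.sin(2.0 * np.pi * 5.0 * np.arange(256) / 256.0)
    tapered = window_frame(frame)
    print(tapered[:3], tapered[-3:])
```

The window tapers the frame towards its edges, so the samples near the frame boundaries contribute less than those in the middle. NumPy also ships np.hamming, which can be used instead of the hand-written formula.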
Vector Quantization:
A speaker recognition system must be able to estimate the probability distributions of
the computed feature vectors. Storing every single vector generated in the training
mode is impractical, since these distributions are defined over a high-dimensional
space. It is often easier to start by quantizing each feature vector to one of a
relatively small number of template vectors, a process called vector quantization (VQ).
VQ takes a large set of feature vectors and produces a smaller set of representative
vectors, the centroids of the distribution.
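The post does not say which algorithm is used to build the set of centroids (the codebook). The sketch below assumes plain k-means iterations with Euclidean distance and a deterministic, evenly spaced initialisation; train_codebook, quantize, and all parameter names are hypothetical and chosen only for this example.

```python
import numpy as np


def train_codebook(features, n_codewords: int, n_iters: int = 20) -> np.ndarray:
    """Reduce a large set of feature vectors to n_codewords centroids."""
    features = np.asarray(features, dtype=float)
    if not 1 <= n_codewords <= len(features):
        raise ValueError("n_codewords must be between 1 and the number of vectors")
    # Assumption: start from evenly spaced training vectors (deterministic).
    start = np.linspace(0, len(features) - 1, n_codewords).round().astype(int)
    codebook = features[start].copy()
    for _ in range(n_iters):
        nearest = quantize(features, codebook)
        for k in range(n_codewords):
            members = features[nearest == k]
            if len(members) > 0:  # keep the old centroid if a cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook


def quantize(features, codebook) -> np.ndarray:
    """Index of the nearest codeword (Euclidean distance) for each vector."""
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)


if __name__ == "__main__":
    # Hypothetical 2-D "features" forming two clusters.
    data = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    codebook = train_codebook(data, n_codewords=2)
    print(codebook)
    print(quantize(data, codebook))
```

Only the codebook has to be stored for each speaker, not the full set of training vectors.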
Conclusion:
The goal of this project was to create a speaker recognition system and apply it to the
speech of an unknown speaker. The system extracts features from the unknown speech and
compares them with the stored features of each known speaker in order to identify the
unknown speaker.
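The comparison step can be sketched as follows, assuming each known speaker is stored as a VQ codebook and that the match is decided by the lowest average Euclidean distance to the nearest codeword. That decision rule, the function names, and the speaker names are assumptions for illustration; the post does not state the measure that was used.

```python
import numpy as np


def average_distortion(features, codebook) -> float:
    """Mean distance from each feature vector to its nearest codeword."""
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


def identify_speaker(features, codebooks: dict):
    """Return the best-matching speaker name and the score of every speaker."""
    scores = {name: average_distortion(features, cb) for name, cb in codebooks.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical stored codebooks and unknown features (2-D for readability).
    stored = {
        "speaker_a": [[0.0, 0.0], [1.0, 1.0]],
        "speaker_b": [[10.0, 10.0], [11.0, 11.0]],
    }
    unknown = [[0.1, 0.0], [0.9, 1.2]]
    print(identify_speaker(unknown, stored))
```

A verification system could reuse average_distortion and accept a claimed identity only when the score falls below a chosen threshold.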