07-06-2013, 04:17 PM
Speaker Recognition Using MFCC and VQ Technique
Speaker Recognition.zip (Size: 1.49 MB / Downloads: 38)
General Introduction:-
Human voice is produced by the vocal cords, and its frequency ranges from about 60 Hz to 7000 Hz.
Voice may differ while talking, singing, laughing, crying, screaming, etc.
Females usually have higher pitched voices than males.
The fundamental frequency of a typical male voice ranges between 100 and 150 Hz, while that of a female voice ranges between 170 and 220 Hz.
Men tend to speak in a more monotonous tone, while women tend to use a wider range of tones when speaking.
Speech Characteristics:-
Articulation:-Formation of clear and distinct sounds in speech.
Intonation:-Variation of pitch when speaking.
Pronunciation:-The way a word or a language is usually spoken.
Speech pitch:-Pitch is the fundamental frequency of vocal cord vibration.
Speech Rate:-Rate at which speech is produced.
Formant:-A concentration of acoustic energy around a particular frequency in the speech signal, corresponding to a resonance of the vocal tract.
Scope of our project:-
The scope of the project is not only to develop an efficient speaker recognition algorithm, but also to make it work under adverse environmental conditions and within real-time constraints.
The project begins with achieving speaker recognition by processing speaker samples in MATLAB.
It can then be extended to run on a standalone system such as the TMS320C3x processor.
Principles of Recognition:-
Speaker recognition can be classified into
1. Identification
2. Verification
Identification is the process of determining which registered speaker provides a given utterance.
Verification is the process of accepting or rejecting identity claim of a speaker.
Feature Extraction:-
The purpose of this module is to convert the speech waveform, using digital signal processing (DSP) tools, to a set of features for further analysis. This is often referred to as the signal-processing front end.
A wide range of possibilities exists for representing the speech signal for the speaker recognition task, such as Linear Prediction Coding (LPC), Mel-Frequency Cepstrum Coefficients (MFCC), and others.
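The MFCC front end can be sketched as follows. The original project uses MATLAB; this is a minimal NumPy version, and the frame length, hop size, filter count, and coefficient count are illustrative assumptions, not values taken from the post.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=8000, frame_len=256, hop=128,
         n_filters=20, n_coeffs=12):
    """Frame the signal, window, FFT, mel-warp, take logs, then DCT."""
    frames = [signal[s:s + frame_len]
              for s in range(0, len(signal) - frame_len + 1, hop)]
    frames = np.array(frames) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, frame_len)) ** 2 / frame_len
    fbank = mel_filterbank(n_filters, frame_len, sample_rate)
    log_energies = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies; keep low-order terms.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs),
                                  (2 * n + 1) / (2.0 * n_filters)))
    return log_energies @ dct.T  # one row of n_coeffs MFCCs per frame
```

For a 1-second signal at 8 kHz with these settings, `mfcc(x)` yields a matrix of 61 frames by 12 coefficients; each row is one acoustic vector used later for codebook training.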
Clustering the training Vectors:-
After the enrolment session, the acoustic vectors extracted from input speech of each speaker provide a set of training vectors for that speaker.
The next important step is to build a speaker-specific VQ codebook for each speaker using those training vectors.
In our project, the K-means algorithm is used since it is the most popular and the simplest to implement.
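The clustering step can be sketched as below: K-means groups a speaker's training vectors, and the resulting centroids form that speaker's VQ codebook. At recognition time, identification picks the registered speaker whose codebook gives the lowest average distortion on the test vectors. The codebook size and iteration count here are illustrative assumptions.

```python
import numpy as np

def kmeans_codebook(vectors, codebook_size=16, n_iter=20, seed=0):
    """Cluster training vectors with K-means; centroids = VQ codebook."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with randomly chosen training vectors.
    idx = rng.choice(len(vectors), codebook_size, replace=False)
    centroids = vectors[idx].astype(float).copy()
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :],
                               axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned vectors.
        for k in range(codebook_size):
            members = vectors[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids

def avg_distortion(vectors, codebook):
    """Mean distance from each test vector to its nearest codeword;
    the claimed/identified speaker is the one minimising this value."""
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()
```

A matching test utterance should produce a markedly lower average distortion against its own speaker's codebook than against another speaker's, which is the decision rule for both identification and verification.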