24-04-2014, 12:25 PM
Design of a Speaker Recognition Code using MATLAB
ABSTRACT
This project entails the design of a speaker recognition code using MATLAB. Signal
processing in the time and frequency domain yields a powerful method for analysis.
MATLAB’s built-in functions for frequency domain analysis and its
straightforward programming interface make it an ideal tool for speech analysis projects.
For the current project, experience was gained in general MATLAB programming and
the manipulation of time domain and frequency domain signals. Speech editing was
performed, as was degradation of signals by the addition of Gaussian noise.
Background noise was successfully removed from a signal by applying a 3rd
order Butterworth filter. Code was then written to compare the pitch and formants
of a known speech file against 83 unknown speech files and choose the top twelve matches.
INTRODUCTION
Development of speaker identification systems began as early as the 1960s with
exploration into voiceprint analysis, where characteristics of an individual’s voice were
thought to be able to characterize the uniqueness of an individual much like a fingerprint.
The early systems had many flaws and research ensued to derive a more reliable method
of predicting the correlation between two sets of speech utterances. Speaker
identification research continues today within the field of digital signal
processing, where many advances have taken place in recent years.
PITCH ANALYSIS
The file recorded with my slower speech (a17.wav) was located in the ordered
list of speakers. Pitch analysis was conducted and the relevant parameters were extracted.
The average pitch over the entire wav file was computed as 154.8595 Hz. A graph of
pitch contour versus time frame, Figure (3), was also created to show how the pitch
varies over the wav file. The results of pitch analysis can be used in
speaker recognition, where the differences in average pitch can be used to characterize a
speech file. The code for this process can be found in Appendix D.
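As a rough illustration of the kind of analysis described above, the following sketch estimates a frame-by-frame pitch contour and its average using short-time autocorrelation. It is not the code from Appendix D. The autocorrelation method, the 30 ms frame length, the 60 to 320 Hz search range, and the energy threshold are all assumptions chosen for illustration, and a synthetic 150 Hz tone stands in for a17.wav.

```matlab
% Sketch: average pitch and pitch contour via short-time autocorrelation.
% Assumptions (not from the report): 30 ms frames, 60-320 Hz search range,
% a simple energy threshold, and a synthetic test tone instead of a17.wav.
fs = 8000;                          % sampling rate in Hz (assumed)
t  = (0:fs-1)' / fs;                % one second of samples
x  = sin(2*pi*150*t);               % stand-in signal with a 150 Hz fundamental
% To use a real recording instead: [x, fs] = audioread('a17.wav'); x = x(:,1);

frameLen = round(0.030 * fs);       % 30 ms analysis frame
minLag   = floor(fs / 320);         % shortest pitch period searched
maxLag   = ceil(fs / 60);           % longest pitch period searched
nFrames  = floor(length(x) / frameLen);
pitch    = zeros(nFrames, 1);       % 0 marks a frame with no pitch estimate

for k = 1:nFrames
    frame = x((k-1)*frameLen + (1:frameLen));
    frame = frame - mean(frame);
    if sum(frame.^2) < 1e-6         % skip (near-)silent frames
        continue
    end
    % Autocorrelation at each candidate lag, computed directly.
    r = zeros(maxLag, 1);
    for lag = minLag:maxLag
        r(lag) = sum(frame(1:end-lag) .* frame(1+lag:end));
    end
    % Search only the candidate lags for the strongest periodicity.
    [~, idx] = max(r(minLag:maxLag));
    bestLag  = idx + minLag - 1;
    pitch(k) = fs / bestLag;
end

avgPitch = mean(pitch(pitch > 0));  % average over frames with an estimate
fprintf('Average pitch: %.1f Hz\n', avgPitch);
plot(pitch); xlabel('Frame'); ylabel('Pitch (Hz)');   % pitch contour
```

Because the lag is an integer number of samples, the estimate is quantized; at this sampling rate the synthetic tone comes out within a few hertz of 150 Hz rather than exactly on it. Real speech would also need a more careful voiced/unvoiced decision than the energy threshold used here.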
RESULTS
Results of speech editing are shown in Figure (5). As can be seen, the phrase “ECE-
310,” the second half of the first plot, has clearly been moved to the front of the
waveform in the second plot.
Speech degradation by the application of Gaussian noise can be seen in Figure (6).
The upper plot shows the signal from wav file a18.wav in the time domain. The middle
plot shows a frequency domain view of the same wav file. The bottom plot shows the
signal with Gaussian noise added, for comparison with the clean signal in the middle
plot. Results of the speech enhancement routine can be seen in Figure (7). The upper plot
shows the file a71.wav with natural background noise.
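The degradation and enhancement steps can be sketched as follows. This is an illustrative example rather than the project's own routine: the report states only that a 3rd order Butterworth filter was used, so the low-pass type, the 1 kHz cutoff, the noise level, and the synthetic test signal are all assumptions. The butter function requires the Signal Processing Toolbox.

```matlab
% Sketch: degrade a signal with Gaussian noise, then filter it with a
% 3rd-order Butterworth filter. Requires the Signal Processing Toolbox
% for butter(). Assumptions (not from the report): the noise level, the
% low-pass filter type, the 1 kHz cutoff, and the synthetic test signal.
fs = 8000;                              % sampling rate in Hz (assumed)
t  = (0:fs-1)' / fs;
clean = sin(2*pi*200*t);                % stand-in for a speech recording

rng(0);                                 % fixed seed for repeatability
sigma = 0.5;                            % noise standard deviation (assumed)
noisy = clean + sigma * randn(size(clean));

cutoffHz = 1000;                        % assumed cutoff frequency
[b, a] = butter(3, cutoffHz / (fs/2));  % 3rd-order low-pass design
filtered = filter(b, a, noisy);

% Compare noise power before and after filtering. The clean signal is run
% through the same filter so the filter's delay does not count as error.
errBefore = mean((noisy - clean).^2);
errAfter  = mean((filtered - filter(b, a, clean)).^2);
fprintf('Noise power before: %.3f, after: %.3f\n', errBefore, errAfter);
```

A low-pass filter only removes the part of the noise that lies above the cutoff, so the residual noise power is reduced rather than eliminated. For natural background noise such as that in a71.wav, the appropriate filter type and cutoff would depend on where the noise sits in the spectrum.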
CONCLUSION
A crude speaker recognition code has been written using the MATLAB
programming language. This code compares recorded wav files by their average pitch
and by the vector differences between formant peaks in the power spectral density (PSD)
of each file. Comparison based on pitch was found to be the most accurate,
while comparison based on formant peak location did produce results but could likely be
improved. Experience was also gained in speech editing and in basic filtering
techniques. While the methods used in the design of the code for this project are a
good foundation for a speaker recognition system, more advanced techniques would be
needed to produce a successful one.
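The comparison step summarized above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the features have already been extracted, uses three formant peaks per file and a Euclidean distance between formant vectors, and fills the feature arrays with random placeholder values rather than real measurements. All variable names are hypothetical.

```matlab
% Sketch: rank unknown files by similarity to a reference file.
% Assumptions (not from the report): features are already extracted,
% three formant peaks per file, Euclidean distance between formant
% vectors, and random placeholder values in place of real measurements.
rng(1);                                   % fixed seed for repeatability
nFiles = 83;                              % number of unknown files
nTop   = 12;                              % number of matches to keep

% Placeholder features: average pitch (Hz) and formant peaks (Hz).
unknownPitch    = 100 + 150 * rand(nFiles, 1);
unknownFormants = repmat([500 1500 2500], nFiles, 1) + 200 * randn(nFiles, 3);

% Make the reference a near copy of file 40 so a match is expected.
refPitch    = unknownPitch(40) + 0.01;
refFormants = unknownFormants(40, :) + [5 -5 5];

% Pitch comparison: absolute difference in average pitch.
pitchDiff = abs(unknownPitch - refPitch);
[~, pitchOrder] = sort(pitchDiff);
topByPitch = pitchOrder(1:nTop);

% Formant comparison: length of the difference vector between peaks.
delta = unknownFormants - repmat(refFormants, nFiles, 1);
formantDist = sqrt(sum(delta.^2, 2));
[~, formantOrder] = sort(formantDist);
topByFormant = formantOrder(1:nTop);

disp('Top matches by pitch:');   disp(topByPitch');
disp('Top matches by formant:'); disp(topByFormant');
```

Keeping the two rankings separate makes it possible to judge each feature on its own, in the way the pitch and formant comparisons are assessed separately above. Combining them into a single score would require choosing weights for features with different units.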