08-08-2012, 05:03 PM
Gender Classification by Speech Analysis
Abstract
This project presents a comparative investigation of speech signals aimed at devising a gender classifier. Gender classification by speech analysis aims to predict the gender of the speaker by analyzing different parameters of the voice sample. The investigation concentrates mainly on short-time analysis of the speech signals, comparing the short-time average magnitude, short-time energy, short-time zero-crossing rate and short-time autocorrelation values of male and female voice samples. This quantitative comparison is implemented in MATLAB. A database of voice samples collected from many students of our college, both male and female, was created. The short-time analysis was performed on all the collected voice samples, and the parameters were compared to establish a working principle for classifying gender from speech. In addition to the above parameters, pitch analysis methods based on autocorrelation and cepstral analysis are compared, with the aim of increasing the accuracy of the gender classifier.
Introduction
HUMAN SPEECH
Speech is the vocalized form of human communication. It is based upon the syntactic combination of lexicals and names that are drawn from very large vocabularies (usually about 10,000 different words). Each spoken word is created out of the phonetic combination of a limited set of vowel and consonant speech sound units. These vocabularies, the syntax which structures them and their sets of speech sound units differ, giving rise to many thousands of mutually unintelligible human languages. Many human speakers (polyglots) are able to communicate in two or more of them. The vocal abilities that enable humans to produce speech also provide humans with the ability to sing.
At a linguistic level, speech can be viewed as a sequence of basic sound units called phonemes. A phoneme is a sound or group of different sounds perceived to have the same function by the speakers of a language. An example of a phoneme is the /k/ sound in the words kit and skill. The same phoneme may give rise to many different sounds, or allophones, at the acoustic level, depending on the phonemes which surround it. Different speakers producing the same string of phonemes convey the same information yet sound different as a result of differences in dialect and in vocal tract length and shape.
TYPES OF HUMAN SOUND
Speech sounds are commonly divided into three classes according to their mode of excitation: voiced, unvoiced and plosive. Some sounds cannot be considered to fall into any one of these three classes, but are a mixture. For example, voiced fricatives result when both vocal cord vibration and a constriction in the vocal tract are present.
Although there are many possible speech sounds which can be produced, the shape of the vocal tract and its mode of excitation change relatively slowly. Speech can therefore be considered quasi-stationary over short periods of time (of the order of 20 milliseconds). Speech signals show a high degree of predictability, due sometimes to the quasi-periodic vibrations of the vocal cords and also due to the resonances of the vocal tract. Speech coders attempt to exploit this predictability in order to reduce the data rate necessary for good quality voice transmission.
Voiced Sound
Voiced sounds are produced when the vocal cords vibrate open and closed, thus interrupting the flow of air from the lungs to the vocal tract and producing quasi-periodic pulses of air as the excitation. The rate of the opening and closing gives the pitch of the sound. This can be adjusted by varying the shape of, and the tension in, the vocal cords, and the pressure of the air behind them. Voiced sounds show a high degree of periodicity at the pitch period, which is typically between 2 and 20 milliseconds.
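The periodicity described above is what an autocorrelation-based pitch estimator relies on: the autocorrelation of a voiced frame peaks at a lag equal to the pitch period. The following is only a minimal sketch in Python (the project itself used MATLAB, whose code is not shown here). The function name, the 8 kHz sampling rate and the synthetic two-harmonic test frame are illustrative assumptions; only the 2 to 20 millisecond search range is taken from the text.

```python
import math

def estimate_pitch_hz(frame, sample_rate, min_period_ms=2.0, max_period_ms=20.0):
    """Return the pitch (Hz) at the autocorrelation peak, or None if no peak is found.

    Only lags inside the assumed pitch-period range (2 to 20 ms) are searched.
    """
    min_lag = int(sample_rate * min_period_ms / 1000)
    max_lag = min(int(sample_rate * max_period_ms / 1000), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Short-time autocorrelation of the frame at this lag.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None:
        return None  # e.g. a silent frame with no positive correlation
    return sample_rate / best_lag

# Assumed test input: a 60 ms synthetic "voiced" frame with a 200 Hz fundamental
# and one weaker harmonic, sampled at 8 kHz.
sample_rate = 8000
true_pitch = 200.0
frame = [
    math.sin(2 * math.pi * true_pitch * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 2 * true_pitch * n / sample_rate)
    for n in range(480)
]
pitch = estimate_pitch_hz(frame, sample_rate)
print(pitch)  # expected to be close to 200 Hz
```

Real speech is noisier than this synthetic frame, so a practical estimator would usually add a voicing decision and some smoothing across frames; those steps are left out of the sketch.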
Speech Analysis
The techniques used to process speech signals can be broadly classified as either time-domain or frequency-domain analysis. In time-domain analysis, the measurements are performed directly on the speech signal to extract information. In frequency-domain analysis, the information is extracted after the frequency content of the speech signal has been computed to form the spectrum.
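To make the distinction concrete, the sketch below estimates the frequency of one tone in both ways: from zero crossings counted directly on the samples (time domain) and from the peak of a discrete Fourier transform (frequency domain). It is an illustration in Python, not the project's MATLAB code. The function names, the 250 Hz test tone, the 8 kHz sampling rate and the naive DFT loop are all assumptions chosen for brevity.

```python
import cmath
import math

def zero_crossing_frequency(signal, sample_rate):
    """Time-domain estimate: a sinusoid crosses zero twice per cycle."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    duration = (len(signal) - 1) / sample_rate
    return crossings / (2 * duration)

def dft_peak_frequency(signal, sample_rate):
    """Frequency-domain estimate: frequency of the largest DFT magnitude bin."""
    n = len(signal)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        # Naive O(n^2) DFT coefficient for bin k; an FFT would be used in practice.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n

# Assumed test input: 20 ms of a 250 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 250 * n / sample_rate + 0.3) for n in range(160)]
time_estimate = zero_crossing_frequency(tone, sample_rate)
freq_estimate = dft_peak_frequency(tone, sample_rate)
print(time_estimate, freq_estimate)
```

Both estimates should land near 250 Hz for this clean tone. On real speech the zero-crossing count is cheap but sensitive to noise, while the spectrum costs more to compute and separates the individual frequency components.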
SHORT-TIME ANALYSIS
The properties of a speech signal change relatively slowly with time. This allows a short segment of speech to be examined in order to extract parameters that are assumed to remain constant over that segment, which forms the basis of short-time analysis. The speech signal is divided into many sub-signals of short duration by means of a “windowing” technique. After the long signal has been split into many analysis frames using appropriate “windows”, each frame is analyzed and the per-frame results are combined into a cumulative result.
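The framing procedure and three of the short-time parameters named in the abstract (energy, average magnitude and zero-crossing rate) can be sketched as follows. This is a hedged illustration in Python rather than the project's MATLAB implementation. The Hamming window, the 20 ms frame length, the 50% overlap, the 8 kHz sampling rate and every function name are assumptions; the source does not state which window or frame size was used.

```python
import math

def hamming_window(length):
    """Hamming window coefficients (one common choice of analysis window)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

def split_into_frames(signal, frame_length, hop_length):
    """Cut the signal into equal-length frames; they overlap when hop < length."""
    last_start = len(signal) - frame_length
    return [signal[s:s + frame_length] for s in range(0, last_start + 1, hop_length)]

def short_time_energy(frame, window):
    """Sum of squared windowed samples."""
    return sum((x * w) ** 2 for x, w in zip(frame, window))

def short_time_average_magnitude(frame, window):
    """Mean of the windowed absolute sample values."""
    return sum(abs(x) * w for x, w in zip(frame, window)) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Assumed test input: 100 ms of a 150 Hz tone at 8 kHz, 20 ms frames, 10 ms hop.
sample_rate = 8000
frame_length = 160
hop_length = 80
signal = [math.sin(2 * math.pi * 150 * n / sample_rate + 0.3) for n in range(800)]

window = hamming_window(frame_length)
frames = split_into_frames(signal, frame_length, hop_length)
energies = [short_time_energy(f, window) for f in frames]
magnitudes = [short_time_average_magnitude(f, window) for f in frames]
crossing_rates = [zero_crossing_rate(f) for f in frames]

# One simple way to form a cumulative result: average each parameter over frames.
mean_energy = sum(energies) / len(energies)
mean_magnitude = sum(magnitudes) / len(magnitudes)
mean_crossing_rate = sum(crossing_rates) / len(crossing_rates)
print(len(frames), mean_energy, mean_magnitude, mean_crossing_rate)
```

Averaging over frames is only one possible way to combine the per-frame values. A classifier might instead use the per-frame values from voiced frames only, or statistics other than the mean.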