22-02-2009, 02:03 AM
1. Introduction
Voice morphing is the transition of one speech signal into another. Speech morphing is analogous to image morphing: in an image morph, the in-between images show one face smoothly changing its shape and texture until it turns into the target face. A speech morph should behave the same way. One speech signal should change smoothly into another, preserving the characteristics shared by the starting and ending signals while smoothly changing the other properties. The properties of most concern in a speech signal are its pitch and its envelope information. These two are convolved together in the speech signal, so an efficient method for separating them is needed. We have adopted a simple approach, cepstral analysis, to extract the pitch and formant (envelope) information from each signal. The processing needed to obtain the morphed speech signal includes cross-fading of the envelope information, Dynamic Time Warping to match the major signal features (pitch), and signal re-estimation to convert the morphed speech signal back into an acoustic waveform.
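The cepstral separation rests on a standard identity: a voiced frame is roughly an excitation (the pitch pulses) convolved with a vocal-tract filter (the envelope), so its log-magnitude spectrum is the sum of the two parts. The inverse transform of that log spectrum, the real cepstrum, places the slowly varying envelope at low quefrencies and the pitch period as a peak at a higher quefrency. The Python sketch below illustrates this under stated assumptions and is not the report's implementation: the function names, the 80 to 400 Hz pitch search range, the lifter cutoff of 30 samples and the `eps` guard are illustrative choices, and a toy recursive FFT keeps the example standard-library only (frame length must be a power of two).

```python
import cmath
import math


def fft(x):
    """Toy recursive radix-2 FFT (standard library only); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def ifft(spectrum):
    """Inverse FFT via the conjugation trick: ifft(X) = conj(fft(conj(X))) / N."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([complex(v).conjugate() for v in spectrum])]


def real_cepstrum(frame, eps=1e-6):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum (eps guards against log(0))."""
    log_mag = [math.log(abs(v) + eps) for v in fft(frame)]
    return [v.real for v in ifft(log_mag)]


def split_pitch_and_envelope(frame, fs, fmin=80.0, fmax=400.0, lifter=30):
    """Hypothetical helper: estimate f0 from the cepstral peak and a smoothed log envelope.

    Assumes fs / fmin < len(frame) / 2 so the pitch search stays in the first half.
    """
    c = real_cepstrum(frame)
    n = len(c)
    # Pitch: the excitation shows up as a peak at quefrency = pitch period (in samples).
    q_lo, q_hi = int(fs / fmax), int(fs / fmin)
    q_peak = max(range(q_lo, q_hi + 1), key=lambda q: c[q])
    f0 = fs / q_peak
    # Envelope: keep only low quefrencies (and their mirror image), then transform back.
    liftered = [c[q] if q <= lifter or q >= n - lifter else 0.0 for q in range(n)]
    log_envelope = [v.real for v in fft(liftered)]
    return f0, q_peak, log_envelope


# Usage: one Hann-windowed frame of a synthetic 125 Hz voiced sound (31 harmonics, 1/k amplitudes).
fs, frame_len = 8000, 512
window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
frame = [window[i] * sum(math.cos(2 * math.pi * 125 * k * i / fs) / k for k in range(1, 32))
         for i in range(frame_len)]
f0, q_peak, log_envelope = split_pitch_and_envelope(frame, fs)
```

For this synthetic frame the cepstral peak should fall near quefrency 64 (8000 / 125 samples). The liftered `log_envelope` is a smoothed log-magnitude spectrum, the kind of envelope a morph would later cross-fade.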
This report is divided into seven chapters. The second chapter gives a concise overview of the processes involved in this project. The third chapter presents a thorough but accessible analysis of the morphing procedure and the theory behind it; processes such as pre-processing, cepstral analysis, dynamic time warping and signal re-estimation are described there with the necessary diagrams. The fourth chapter examines the actual morphing process in depth. The fifth chapter deals in detail with the conversion of the morphed signal into an acoustic waveform. Chapter six summarizes the whole morphing process with the help of a block diagram, and chapter seven lists the conclusions drawn from this project.
2. An Introspection of the Morphing Process
We undertook this work because it seemed challenging and interesting, and we were eager to find out whether speech morphing would be feasible using the cepstral approach. Processes such as cepstral analysis and the re-estimation of the morphed speech signal as an acoustic waveform involve considerable intricacy. The project also digs deep into the basics of digital signal processing, and of speech processing in particular, so it covers a lot of ground in that field.
Speech morphing can be achieved by transforming the signal's representation from the familiar acoustic waveform, obtained by sampling the analog signal, into another representation. To prepare the signal for the transformation, it is split into a number of 'frames', which are sections of the waveform, and the transformation is applied to each frame. This provides another way of viewing the signal information: the new representation (said to be in the frequency domain) describes the average energy present in each frequency band.
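The framing and frequency-domain step can be sketched as follows. This is a minimal illustration, assuming a Hamming window, a power-of-two frame length and a caller-chosen hop size; the names `frame_signal` and `power_spectrum` are hypothetical, and the report does not state which window or frame sizes it used. Each entry of the returned spectrum is the energy in one frequency bin, and bin k is centred at k * fs / N Hz.

```python
import cmath
import math


def fft(x):
    """Toy recursive radix-2 FFT (standard library only); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(x, frame_len, hop):
    """Split x into overlapping, Hamming-windowed frames; an incomplete tail is dropped."""
    w = hamming(frame_len)
    return [[x[start + i] * w[i] for i in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Energy per frequency bin, |X[k]|^2 / N, for k = 0 .. N/2 (a real frame's spectrum is symmetric)."""
    spectrum = fft(frame)
    n = len(frame)
    return [abs(spectrum[k]) ** 2 / n for k in range(n // 2 + 1)]


# Usage: 1024 samples of a 1000 Hz tone at 8 kHz, 256-sample frames with 50% overlap.
fs = 8000
x = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(1024)]
frames = frame_signal(x, 256, 128)
spectra = [power_spectrum(f) for f in frames]
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
```

With fs = 8000 Hz and N = 256 the bins are 31.25 Hz apart, so the 1000 Hz tone peaks in bin 32 of every frame.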
Further analysis yields two pieces of information: the pitch and the overall envelope of the sound. A key element of the morph is the manipulation of the pitch information. If two signals with different pitches were simply cross-faded, two separate sounds would most likely be heard, because the mixture would contain two distinct pitches and the auditory system would perceive two different objects. A successful morph must therefore exhibit a smoothly changing pitch throughout. The pitch information of the two sounds is compared to find the best match between them; to achieve this match, the signals are stretched and compressed in time so that important sections of each signal line up. The two sounds can then be interpolated to create the intermediate sounds of the morph. The final stage is to convert the frames back into a normal waveform.
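The stretching and compressing used to match the two pitch contours is what Dynamic Time Warping provides. The sketch below is a textbook DTW on per-frame feature sequences with an absolute-difference cost, followed by linear interpolation along the warping path with a morph factor `alpha`; it illustrates the idea and is not taken from the report. All function names are hypothetical, and centring each pitch contour on its mean before alignment is an assumption made here so that a difference in average pitch between the two speakers does not dominate the match.

```python
def centered(seq):
    """Subtract the mean (assumption: align contour shapes, not absolute pitch levels)."""
    mean = sum(seq) / len(seq)
    return [v - mean for v in seq]


def dtw_path(a, b):
    """Classic DTW with |a[i] - b[j]| as the local cost.

    Returns the optimal warping path as (i, j) pairs from (0, 0) to (len(a) - 1, len(b) - 1).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of the best path aligning a[:i] with b[:j]; row and column 0 are sentinels.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor (diagonal wins ties).
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda ij: D[ij[0]][ij[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def morph_along_path(seq_a, seq_b, path, alpha):
    """Interpolate aligned scalar values: alpha = 0 gives seq_a, alpha = 1 gives seq_b."""
    return [(1.0 - alpha) * seq_a[i] + alpha * seq_b[j] for i, j in path]


def crossfade_envelopes(env_a, env_b, path, alpha):
    """Cross-fade aligned envelope frames element by element, reusing the same path."""
    return [[(1.0 - alpha) * x + alpha * y for x, y in zip(env_a[i], env_b[j])]
            for i, j in path]


# Usage: hypothetical per-frame pitch contours (Hz) and 2-bin log envelopes of two utterances.
pitch_a = [100.0, 100.0, 120.0, 140.0, 140.0]
pitch_b = [200.0, 240.0, 280.0]
env_a = [[1.0, 0.0]] * 5
env_b = [[0.0, 1.0]] * 3
path = dtw_path(centered(pitch_a), centered(pitch_b))
halfway_pitch = morph_along_path(pitch_a, pitch_b, path, 0.5)
halfway_env = crossfade_envelopes(env_a, env_b, path, 0.5)
```

Setting `alpha` to 0 reproduces the time-warped first sound and setting it to 1 the second; stepping `alpha` from 0 to 1, for instance, would give a gradual transition. Because the envelopes are cross-faded along the same path as the pitch, the two stay aligned.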
However, once the morphing has been performed, the legacy of the earlier analysis becomes apparent. Converting the sound to a representation in which the pitch and spectral envelope can be separated loses some information, so this information has to be re-estimated for the morphed sound. This process yields an acoustic waveform, which can then be stored or listened to.
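The report does not spell out its re-estimation algorithm at this point. One well-known way to recover a waveform from magnitude-only frames is Griffin and Lim's iterative method, sketched below as a swapped-in technique that is not necessarily the one the authors used. It assumes the information lost by the analysis is the phase of each frame: starting from zero phase, it repeatedly takes the short-time Fourier transform (STFT) of the current estimate, keeps that phase, imposes the target magnitudes, and resynthesises by least-squares overlap-add. The function names, the periodic Hann window and the hop of a quarter frame are illustrative assumptions, and a toy FFT again keeps the example standard-library only.

```python
import cmath
import math


def fft(x):
    """Toy recursive radix-2 FFT (standard library only); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def ifft(spectrum):
    """Inverse FFT via the conjugation trick: ifft(X) = conj(fft(conj(X))) / N."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([complex(v).conjugate() for v in spectrum])]


def hann(n):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft(x, n_fft, hop, w):
    """Frames start every `hop` samples; frames that would run past the end are dropped."""
    return [fft([x[s + i] * w[i] for i in range(n_fft)])
            for s in range(0, len(x) - n_fft + 1, hop)]


def istft(frames, n_fft, hop, w, length):
    """Least-squares overlap-add: sum of w * ifft(frame), divided by the summed w**2."""
    out = [0.0] * length
    norm = [0.0] * length
    for m, spectrum in enumerate(frames):
        start = m * hop
        y = ifft(spectrum)
        for i in range(n_fft):
            out[start + i] += w[i] * y[i].real
            norm[start + i] += w[i] ** 2
    # Samples that no window covers (norm == 0) are left at zero.
    return [o / d if d > 1e-12 else 0.0 for o, d in zip(out, norm)]


def griffin_lim(target_mag, n_fft, hop, length, iterations=30):
    """Hypothetical helper: re-estimate a waveform whose STFT magnitudes approach target_mag."""
    w = hann(n_fft)
    # Initial estimate: the target magnitudes with zero phase.
    x = istft([[complex(v) for v in frame] for frame in target_mag], n_fft, hop, w, length)
    errors = []
    for _ in range(iterations):
        current = stft(x, n_fft, hop, w)
        errors.append(sum((abs(v) - t) ** 2
                          for cf, tf in zip(current, target_mag) for v, t in zip(cf, tf)))
        # Keep the current phase estimate, impose the target magnitude, resynthesise.
        imposed = [[t * cmath.exp(1j * cmath.phase(v)) for v, t in zip(cf, tf)]
                   for cf, tf in zip(current, target_mag)]
        x = istft(imposed, n_fft, hop, w, length)
    return x, errors


# Usage: keep only the magnitudes of a known signal and rebuild a waveform from them.
fs, n_fft, hop, length = 8000, 128, 32, 512
original = [math.sin(2 * math.pi * 440 * i / fs) + 0.5 * math.sin(2 * math.pi * 1320 * i / fs)
            for i in range(length)]
target_mag = [[abs(v) for v in frame] for frame in stft(original, n_fft, hop, hann(n_fft))]
rebuilt, errors = griffin_lim(target_mag, n_fft, hop, length, iterations=20)
```

Griffin and Lim showed that the squared error between the estimate's STFT magnitudes and the target never increases from one iteration to the next, so the returned `errors` list can be used to decide when to stop iterating.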