06-04-2016, 03:24 PM
voice morphing (Download Full Report And Abstract)
Voice Morphing which is also referred to as voice transformation and voice conversion is a technique to modify a source speaker's speech utterance to sound as if it was spoken by a target speaker. There are many applications which may benefit from this sort of technology. For example, a TTS system with voice morphing technology integrated can produce many different voices. In cases where the speaker identity plays a key role, such as dubbing movies and TV-shows, the availability of high quality voice morphing technology will be very valuable allowing the appropriate voice to be generated (maybe in different languages) without the original actors being present.
There are basically three inter-dependent issues that must be solved before building a voice morphing system. Firstly, it is important to develop a mathematical model to represent the speech signal so that the synthetic speech can be regenerated and prosody can be manipulated without artifacts. Secondly, the various acoustic cues which enable humans to identify speakers must be identified and extracted. Thirdly, the type of conversion function and the method of training and applying the conversion function must be decided.
Voice Morphing which is also referred to as voice transformation and voice conversion is a technique to modify a source speaker's speech utterance to sound as if it was spoken by a target speaker. There are many applications which may benefit from this sort of technology. For example, a TTS system with voice morphing technology integrated can produce many different voices. In cases where the speaker identity plays a key role, such as dubbing movies and TV-shows, the availability of high quality voice morphing technology will be very valuable allowing the appropriate voice to be generated (maybe in different languages) without the original actors being present.
There are basically three inter-dependent issues that must be solved before building a voice morphing system. Firstly, it is important to develop a mathematical model to represent the speech signal so that the synthetic speech can be regenerated and prosody can be manipulated without artifacts. Secondly, the various acoustic cues which enable humans to identify speakers must be identified and extracted. Thirdly, the type of conversion function and the method of training and applying the conversion function must be decided.