20-04-2011, 09:39 AM
Presented By
SNEHA G GAJBHIYE
ABSTRACT
A supply chain is a network of facilities and distribution options that performs the functions of procurement of materials, transformation of these materials into intermediate and finished products, and the distribution of these finished products to customers. Supply chains exist in both service and manufacturing organizations, although the complexity of the chain may vary greatly from industry to industry and firm to firm. Supply chain management (SCM) is the term used to describe the management of the flow of materials, information, and funds across the entire supply chain, from suppliers to component producers to final assemblers to distribution (warehouses and retailers), and ultimately to the consumer. SCM basically consists of three types of flows, as given below:
Product Flow: It includes the movements of goods from manufacturer to customer and raw material from supplier to manufacturer.
Information Flow: The information flow involves transmitting orders and updating the status of delivery.
Financial Flow: The financial flow consists of credit terms, payments, schedules, and consignment and title ownership arrangements.
In fact, the supply chain often includes after-sales service and returns or recycling. SCM typically involves coordination of information and materials among multiple firms. Firms are increasingly thinking in terms of competing as part of a supply chain against other supply chains, rather than as a single firm against other individual firms. Also, as firms successfully streamline their own operations, the next opportunity for improvement is through better coordination with their suppliers and customers. The costs of poor coordination can be extremely high. With the recent explosion of inexpensive information technology, it seems only natural that business would become more supply chain focused. However, while technology is clearly an enabler of integration, it alone cannot explain the radical organizational changes in both individual firms and whole industries. Changes in both technology and management theory set the
INTRODUCTION
Voice morphing, which is also referred to as voice transformation and voice conversion, is a technique for modifying a source speaker’s speech to sound as if it were spoken by some designated target speaker. There are many applications of voice morphing, including customising voices for TTS systems, transforming voice-overs in adverts and films to sound like that of a well-known celebrity, and enhancing the speech of impaired speakers such as laryngectomees. Two key requirements of many of these applications are, firstly, that they should not rely on large amounts of parallel training data where both speakers recite identical texts, and secondly, that the high audio quality of the source should be preserved in the transformed speech.
The core process in a voice morphing system is the transformation of the spectral envelope of the source speaker to match that of the target speaker, and various approaches have been proposed for doing this, such as codebook mapping [1], [2], formant mapping [3] and linear transformations [4], [5], [6]. Codebook mapping, however, typically leads to discontinuities in the transformed speech. Although some discontinuities can be resolved by some form of interpolation technique [2], the conversion approach can still suffer from a lack of robustness as well as degraded quality. On the other hand, formant mapping is prone to formant tracking errors. Hence, transformation-based approaches are now the most popular.
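The discontinuity problem of codebook mapping can be illustrated with a minimal sketch. It assumes the simplest hard-assignment scheme, in which each source frame is replaced by the target codeword paired with its nearest source codeword. The function name, the two-entry codebooks and the toy frames are illustrative assumptions, not taken from the cited systems.

```python
import numpy as np

def codebook_convert(frames, src_codebook, tgt_codebook):
    """Replace each frame by the target codeword paired with its nearest source codeword."""
    # Squared Euclidean distance from every frame to every source codeword.
    dist = ((frames[:, None, :] - src_codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = dist.argmin(axis=1)
    return tgt_codebook[nearest], nearest

# Hypothetical paired codebooks (entry i of one corresponds to entry i of the other).
src_codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
tgt_codebook = np.array([[10.0, 10.0], [20.0, 20.0]])

# A slowly varying source trajectory that crosses the boundary between the two cells.
frames = np.array([[0.40, 0.40], [0.49, 0.49], [0.51, 0.51], [0.60, 0.60]])
converted, idx = codebook_convert(frames, src_codebook, tgt_codebook)
print(idx)        # index of the selected codeword for each frame
print(converted)  # converted frames
```

Between the second and third frames the source moves by only 0.02 per dimension, yet the converted output jumps from one target codeword to the other. This is the kind of frame-to-frame discontinuity that motivates the smoothly weighted transformations discussed next.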
In particular, the continuous probabilistic transformation approach introduced by Stylianou et al. [4] provides the baseline for modern systems. In this approach, a Gaussian mixture model (GMM) is used to classify each incoming speech frame, and a set of linear transformations weighted by the continuous GMM probabilities is applied to give a smoothly varying target output. The linear transformations are typically estimated from time-aligned parallel training data using least mean squares. More recently, Kain has proposed a variant of this method in which the GMM classification is based on a joint density model [5]. However, like the original Stylianou approach, it still relies on parallel training data. Although the requirement for parallel training data is often acceptable, there are applications which require voice transformation for nonparallel training data. Examples can be found in the entertainment and media industries, where recordings of unknown speakers need to be transformed to sound like well-known personalities. Further uses are envisaged in applications where the provision of parallel data is impossible, such as when the source and target speaker speak different languages.
This paper begins by expressing the continuous probabilistic transform of Stylianou as a simple interpolated linear transform. Expressed in a compact form, this representation then leads straightforwardly to the realisation of the conventional training and conversion algorithms. In analogy to the transform-based adaptation methods used in recognition [7], [8], the estimation of the interpolated transform is then extended to a maximum likelihood formulation which does not require that the source and training data be parallel.
Although interpolated linear transforms are effective in transforming speaker identity, the direct transformation of successive source speech frames to yield the required target speech will result in a number of artifacts. The reasons for this are as follows. Firstly, the reduced dimensionality of the spectral vector used to represent the spectral envelope and the averaging effect of the linear transformation result in formant broadening and a loss of spectral detail. Secondly, unnatural phase dispersion in the target speech can lead to audible artifacts, and this effect is aggravated when pitch and duration are modified. Thirdly, unvoiced sounds have very high variance and are typically not transformed. However, in that case, residual voicing from the source is carried over to the target speech, resulting in a disconcerting background whispering effect.
To achieve high-quality voice conversion, all these issues have to be taken into account, and in this paper we identify and present solutions for each of them. These include a spectral refinement approach to compensate for the spectral distortion, a phase prediction method for natural phase coupling and an unvoiced sounds transformation scheme. Each of these techniques is assessed individually, and the overall performance of the complete solution is evaluated using listening tests. Overall, it is found that the enhancements significantly improve speaker identification scores and perceived audio quality.
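The GMM-weighted linear transformation described in this introduction, and its least-squares estimation from parallel frames, can be sketched as follows. This is only an illustration under stated assumptions: a diagonal-covariance GMM whose parameters are already known, one affine transform per mixture component applied to the extended vector [x; 1], and a one-dimensional toy feature. All function names, the parameter values and the toy data are hypothetical and are not taken from the cited papers.

```python
import numpy as np

def gmm_posteriors(x, weights, means, variances):
    """Posterior probability of each diagonal-covariance Gaussian given frame x."""
    diff = x[None, :] - means                                   # (M, d)
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=1)
                      + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_pdf
    log_post -= log_post.max()                                  # stabilise before exp
    post = np.exp(log_post)
    return post / post.sum()

def convert_frame(x, gmm, transforms):
    """Apply the posterior-weighted sum of affine transforms to one frame."""
    post = gmm_posteriors(x, *gmm)
    x_ext = np.append(x, 1.0)                                   # extended vector [x; 1]
    W = np.tensordot(post, transforms, axes=1)                  # interpolated (d, d+1) matrix
    return W @ x_ext

def estimate_transforms(X, Y, gmm):
    """Least-squares estimate of the per-component transforms from aligned frames."""
    M, d = len(gmm[0]), X.shape[1]
    # Each row stacks posterior-weighted copies of [x; 1], one copy per component.
    A = np.array([np.outer(gmm_posteriors(x, *gmm), np.append(x, 1.0)).ravel()
                  for x in X])
    sol, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return sol.reshape(M, d + 1, Y.shape[1]).transpose(0, 2, 1)

# Toy setup (assumed values): two components in a one-dimensional feature space.
gmm = (np.array([0.5, 0.5]),            # mixture weights
       np.array([[0.0], [10.0]]),       # component means
       np.array([[4.0], [4.0]]))        # component variances
true_W = np.array([[[1.0, 1.0]],        # component 0: y = x + 1
                   [[2.0, 0.0]]])       # component 1: y = 2x

# Synthetic "parallel" data: time-aligned source frames X and target frames Y.
X = np.linspace(-2.0, 12.0, 50)[:, None]
Y = np.array([convert_frame(x, gmm, true_W) for x in X])

est_W = estimate_transforms(X, Y, gmm)
y_mid = convert_frame(np.array([5.0]), gmm, true_W)
print(est_W)
print(y_mid)
```

Because the posteriors vary continuously with the input frame, the interpolated matrix also varies continuously, which is what gives the smoothly varying output. The estimation step here needs time-aligned frame pairs, so it corresponds to the conventional parallel-data training and not to the maximum likelihood extension for nonparallel data.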
LITERATURE SURVEY
2.1. WHAT IS MORPHING
We hear the word morphing in day-to-day life. The word morphing stands for alteration or change, that is, changing a source into a desired target. We have heard of video morphing, which means altering video frames to suit our requirements. Audio morphing, or voice morphing, is a technique for modifying a source speaker's speech to sound as if it were spoken by some designated target speaker. Most of the recent approaches to voice morphing apply a linear transformation to the spectral envelope and pitch scaling to modify the prosody. These techniques have revolutionized the entertainment world, the business world, security systems and, sorry to say, the criminal world.
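As a rough illustration of pitch scaling, the sketch below multiplies the voiced part of a fundamental frequency (F0) contour by the ratio of the target speaker's mean pitch to the source speaker's mean pitch. The scaling rule, the convention that unvoiced frames are marked with 0 Hz, the function name and the numeric values are all assumptions made for this example and are not a description of any particular system.

```python
import numpy as np

def scale_pitch(f0_source, source_mean_f0, target_mean_f0):
    """Scale voiced F0 values by the ratio of mean pitches; keep unvoiced frames at 0 Hz."""
    f0 = np.asarray(f0_source, dtype=float)
    ratio = target_mean_f0 / source_mean_f0
    return np.where(f0 > 0, f0 * ratio, 0.0)

# Hypothetical F0 contour in Hz; zeros mark unvoiced frames.
f0 = np.array([0.0, 100.0, 120.0, 110.0, 0.0])
scaled = scale_pitch(f0, source_mean_f0=110.0, target_mean_f0=220.0)
print(scaled)
```

The shape of the contour, and therefore the intonation pattern of the source, is preserved. Only the overall pitch level moves towards that of the target speaker.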
2.2. WHAT IS VOICE MORPHING
Voice morphing, which is also referred to as voice transformation and voice conversion, is a technique to modify a source speaker's speech utterance to sound as if it were spoken by a target speaker. There are many applications which may benefit from this sort of technology. For example, a TTS system with voice morphing technology integrated can produce many different voices. In cases where the speaker identity plays a key role, such as dubbing movies and TV shows, the availability of high-quality voice morphing technology will be very valuable, allowing the appropriate voice to be generated (possibly in different languages) without the original actors being present.
There are basically three inter-dependent issues that must be solved before building a voice morphing system. Firstly, it is important to develop a mathematical model to represent the speech signal so that the synthetic speech can be regenerated and prosody can be manipulated without artifacts. Secondly, the various acoustic cues which enable humans to identify speakers must be identified and extracted. Thirdly, the type of conversion function and the method of training and applying the conversion function must be decided.
2.3. A DEMONSTRATION TABLE
The table below shows some examples of voice morphing technology. The "Source Speech" column contains the utterances of the source speaker, and the "Target Speech" column contains the target speaker's utterances. The utterances in these two columns are not included in the training data used to estimate the conversion function. The next two columns, "Converted Speech 1" and "Converted Speech 2", are the results regenerated using the voice morphing technology. The difference between these two columns is that "Converted Speech 1" applies the target prosody extracted from the target utterance, whereas "Converted Speech 2" retains the original prosody of the source utterances. Converting with both prosody settings makes it possible to evaluate the influence of prosody on speaker identification.