30-08-2017, 11:19 AM
Linear Predictive Coding (LPC) is a tool used mainly in audio signal and speech processing to represent the spectral envelope of a digital speech signal in compressed form, using the parameters of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate, providing accurate estimates of speech parameters.
LPC begins with the assumption that a speech signal is produced by a buzzer at the end of a tube (voiced sounds), with occasional added hisses and pops (sibilants and plosives). Although apparently crude, this model is actually a close approximation of the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (throat and mouth) forms the tube, which is characterized by its resonances; these give rise to formants, the enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.
LPC analyzes the speech signal by estimating the formants, removing their effect from the speech signal, and estimating the intensity and frequency of the remaining buzz. The formant-removal process is called inverse filtering, and the signal remaining after subtracting the modeled filtered signal is called the residue.
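The analysis step described above can be sketched in Python. This is a minimal illustration, not a production codec: it estimates the predictor coefficients for one frame with the standard autocorrelation method and Levinson-Durbin recursion, then inverse-filters the frame to obtain the residue. Function names and the choice of SciPy's lfilter are my own for illustration.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_coefficients(frame, order):
    """Estimate LPC coefficients via the autocorrelation method and the
    Levinson-Durbin recursion. The returned a[k] predict each sample as
    sum_k a[k] * s[n-k]; the all-pole filter 1/A(z), with
    A(z) = 1 - sum_k a[k] z^-k, models the formant resonances."""
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)   # a[0] is a placeholder, never used
    e = r[0]                  # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / e
        a[1:i], a[i] = a[1:i] - k * a[i - 1:0:-1], k
        e *= 1.0 - k * k
    return a[1:]

def residual(frame, a):
    """Inverse-filter the frame with A(z) to remove the formants,
    leaving the residue (an estimate of the glottal excitation)."""
    return lfilter(np.concatenate(([1.0], -a)), [1.0], frame)
```

On a frame that really is the output of an all-pole filter driven by noise, the recovered coefficients match the filter and the residue has much less energy than the frame itself, which is exactly what makes the representation compact.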
The numbers describing the intensity and frequency of the buzz, the formants and the residue signal can be stored or transmitted elsewhere. LPC synthesizes the speech signal by reversing the process: it uses the buzz and residue parameters to create a source signal, uses the formants to create a filter (representing the tube), and runs the source through the filter, resulting in speech.
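The synthesis side can be sketched the same way, assuming LPC coefficients `a` are already available for the frame. This toy decoder models the buzz as an impulse train at the pitch period (or white noise for unvoiced frames) and runs it through the all-pole vocal-tract filter; the function and parameter names are illustrative, not from any standard API.

```python
import numpy as np
from scipy.signal import lfilter

def synthesize(a, pitch_period, gain, n_samples, voiced=True):
    """Toy LPC synthesizer: build a source signal (buzz or hiss) and
    run it through the all-pole filter 1/A(z) that represents the tube."""
    if voiced:
        source = np.zeros(n_samples)
        source[::pitch_period] = 1.0        # impulse train at the pitch period
    else:
        source = np.random.randn(n_samples) # noise source for sibilants etc.
    # A(z) = 1 - sum_k a[k] z^-k; the synthesis filter is its inverse 1/A(z)
    return gain * lfilter([1.0], np.concatenate(([1.0], -np.asarray(a, dtype=float))), source)
```

Feeding a single impulse through the filter simply produces its impulse response, which is why storing only the coefficients, pitch and gain is enough to regenerate a recognizable frame of speech.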
Because speech signals vary over time, this process is performed on short chunks of the speech signal, called frames; generally 30 to 50 frames per second give intelligible speech with good compression.