24-07-2012, 03:56 PM
COMPRESSION TECHNOLOGY
INTRODUCTION
Normally all audio program material is limited in its quality by the capacity of the channel it has to pass through. In the case of analog signals, the channel is limited by its bandwidth and signal-to-noise ratio. In the case of digital signals the limitation is the sampling rate and the sample word length, which when multiplied together give the bit rate. Compression is a technique that tries to produce a signal better than the channel it has passed through would normally allow.
What is compression ?
Compression can be defined as a means of reducing the size of blocks of signal by removing unused and redundant material. An example of unused material is a silence period in a telephone call. Fig.1.1 shows that in all compression schemes a compressor, or coder, is required at the transmitting end and an expander, or decoder, is required at the receiving end of the channel. The combination of a coder and a decoder is called a codec. There are two ways in which compression can be used. Firstly, we can improve the quality of an existing channel. An example is the Dolby system: codecs which improve the quality of analog audio tape recorders. Secondly, we can maintain the same quality as usual but use an inferior channel, which will be cheaper.
Bear in mind that the word compression has a double meaning. In audio, compression can also mean the deliberate reduction of the dynamic range of a signal, often for radio broadcast purposes. Such compression is single-ended; there is no intention of a subsequent decoding stage and consequently the results are audible. Here we will be dealing with digital codecs which accept and output digital audio and video signals at the source bit rate and pass them through a channel having a lower bit rate. The ratio between the source and channel bit rates is called the compression factor.
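As a quick illustration, the compression factor can be computed directly from the two bit rates. The figures below (16-bit stereo at 48 kHz squeezed into a 192 kbit/s channel) are illustrative only, not taken from any particular standard.

```python
# Compression factor = source bit rate / channel bit rate.

def compression_factor(source_bit_rate: float, channel_bit_rate: float) -> float:
    """Ratio between the source and channel bit rates."""
    return source_bit_rate / channel_bit_rate

# Example: 48 kHz x 16-bit stereo audio into a 192 kbit/s channel.
source = 48_000 * 16 * 2          # 1,536,000 bit/s at the source
channel = 192_000                 # bit/s available in the channel
print(compression_factor(source, channel))  # -> 8.0
```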
How does compression work ?
In all conventional digital audio and video systems the sampling rate, the word length and the bit rate are all fixed. Whilst this bit rate puts an upper limit on the information rate, most real program material does not reach that limit. As Shannon observed, any signal which is predictable contains no information. Take the case of a sine wave: one cycle looks the same as the next, so a sine wave contains no information. This is consistent with the fact that it has no bandwidth. The goal of a compressor is to identify and send on the useful part of the input signal, which is known as the entropy. The remaining part of the input signal is called the redundancy. It is redundant because it can be predicted from what the decoder has already been sent.
Some caution is required when using compression because redundancy can be useful to reconstruct parts of the signal which are lost due to transmission errors. Clearly, if redundancy has been removed in a compressor, the resulting signal will be less resistant to errors unless a suitable protection scheme is applied. Fig.1.2 a) shows that if a codec sends all of the entropy in the input signal and it is received without error, the result will be indistinguishable from the original. However, if some of the entropy is lost, the decoded signal will be impaired in comparison with the original. One important consequence is that you can't just keep turning up the compression factor. Once the redundancy has been eliminated, any further increase in compression damages the information, as Fig.1.2 b) shows. So it is not possible to say whether compression is a good or a bad thing in itself. The question has to be qualified: how much compression, on what kind of material, and for what audience?
As the entropy is a function of the input signal, the bit rate out of an ideal compressor will vary. It is not always possible or convenient to have a variable bit rate channel; so many compressors have a buffer memory at each end of a fixed bit rate channel. This averages out the data flow, but causes more delay. For applications such as video-conferencing the delay is unacceptable and so fixed bit rate compression is used to avoid the need for a buffer.
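The smoothing role of the buffer can be sketched with a toy occupancy model; the frame sizes below are made up for illustration. Occupancy grows whenever the coder produces more bits in a frame period than the fixed-rate channel drains, and the extra delay grows with the buffer depth.

```python
# Toy model of the buffer between a variable-rate coder and a
# fixed-rate channel. All numbers are illustrative.

def buffer_occupancy(bits_per_frame, channel_bits_per_frame):
    """Track how many bits are waiting in the buffer after each frame."""
    occupancy, trace = 0, []
    for produced in bits_per_frame:
        occupancy += produced - channel_bits_per_frame
        occupancy = max(occupancy, 0)   # the buffer cannot go negative
        trace.append(occupancy)
    return trace

# Coder output fluctuates around the 100-bit/frame channel capacity.
print(buffer_occupancy([120, 80, 150, 50], 100))  # -> [20, 0, 50, 0]
```

The waiting bits are exactly the source of the added delay the text mentions, which is why delay-critical applications such as video-conferencing prefer a fixed coder output instead.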
So far we have only considered an ideal compressor, which can perfectly sort the entropy from the redundancy. Unfortunately such a compressor would have infinite complexity and an infinite processing delay. In practice we have to use real, affordable compressors, which must fall short of the ideal by some margin. As a result the compression factors we can use have to be reduced, because if the compressor cannot decide whether a signal is entropy or not, it has to be sent just in case. As Fig.1.2 c) shows, the entropy is surrounded by a "gray area" which may or may not be entropy. The simpler and cheaper the compressor, and the shorter its encoding delay, the larger this gray area becomes. However, the decoder must be able to handle all of these cases equally well. Consequently compression schemes are designed so that all of the decisions are taken at the coder. The decoder then makes the best of whatever it receives. Thus the actual bit rate sent is determined at the coder and the decoder needs no adjustment.
CHAPTER 2
DIGITAL AUDIO
Digital Basics
Digital is just another way of representing an existing audio or video waveform. Fig.2.1 shows that in digital audio the analog waveform is represented by evenly spaced samples whose height is described by a whole number, expressed in binary. Digital audio requires a sampling rate between 32 and 48 kHz and samples containing between 14 and 20 bits, depending on the quality. Consequently the source data rate may be anywhere from roughly half a million to one million bits per second per audio channel.
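The two endpoints of that range follow directly from multiplying sampling rate by word length, as this small sketch shows:

```python
# Source bit rate per audio channel = sampling rate x word length.

def source_bit_rate(sampling_rate_hz: int, word_length_bits: int) -> int:
    """Bits per second for one uncompressed audio channel."""
    return sampling_rate_hz * word_length_bits

print(source_bit_rate(32_000, 14))  # -> 448000 (lower end of the range)
print(source_bit_rate(48_000, 20))  # -> 960000 (upper end of the range)
```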
In professional applications, digital audio is transmitted over the AES/EBU interface, which can send two audio channels as a multiplex down one cable. Standards exist for balanced working with screened twisted-pair cables and for unbalanced working using coaxial cable. A variety of sampling rates and word lengths can be accommodated. The master bit clock is 64 times the sampling rate in use. In video installations, a video-synchronous 48 kHz sampling rate will be used. Different word lengths are handled by zero-filling the word. Two's complement samples are used, with the MSB sent in the last bit position. Fig.2.5 shows the AES/EBU frame structure. Following the sync pattern, needed for deserializing and demultiplexing, there are four auxiliary bits. The main audio sample of up to 20 bits can be seen in the centre of the sub-frame.
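The 64x figure follows from the frame structure: each sample period carries one frame, made of two 32-bit sub-frames, one per audio channel. A minimal sketch of the arithmetic:

```python
# AES/EBU bit clock: each sample period carries two 32-bit sub-frames
# (sync preamble + 4 aux bits + up to 20 audio bits + 4 status bits).

SUBFRAME_BITS = 32
CHANNELS = 2

def bit_clock_hz(sampling_rate_hz: int) -> int:
    """Serial bit rate on the cable for a given sampling rate."""
    return sampling_rate_hz * SUBFRAME_BITS * CHANNELS

print(bit_clock_hz(48_000))  # -> 3072000, i.e. 64 x 48 kHz
```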
CHAPTER 3
AUDIO COMPRESSION
When to compress audio?
The audio component of uncompressed television only requires about one percent of the overall bit rate. In addition, human hearing is very sensitive to audio distortion, including that caused by clumsy compression. Consequently, for many television applications the audio need not be compressed at all. For example, compressing video by a factor of two means that uncompressed audio now represents about two percent of the total bit rate. Compressing the audio is simply not worthwhile in this case. However, if the video has been compressed by a factor of fifty, then the audio and video bit rates will be comparable and compression of the audio will then be worthwhile.
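The arithmetic behind these percentages can be checked in a few lines. The one-percent starting share is the figure quoted above; the function name is ours.

```python
# Share of the total bit rate taken by uncompressed audio as the
# video is compressed. Starting assumption (from the text): audio is
# about 1% of the uncompressed television bit rate.

def audio_share(video_compression_factor: float,
                audio_fraction: float = 0.01) -> float:
    """Fraction of the compressed stream occupied by uncompressed audio."""
    video = (1.0 - audio_fraction) / video_compression_factor
    return audio_fraction / (audio_fraction + video)

print(round(audio_share(2), 3))   # -> 0.02  (about two percent: negligible)
print(round(audio_share(50), 3))  # -> 0.336 (comparable to the video)
```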
Audio compression principles
Audio compression relies on an understanding of the hearing mechanism and so is a form of perceptual coding. The ear is only able to extract a certain proportion of the information in a given sound. This could be called the perceptual entropy, and all additional sound is redundant. The basilar membrane in the ear behaves as a kind of spectrum analyzer; the part of the basilar membrane which resonates as a result of an applied sound is a function of frequency. The high frequencies are detected at the end of the membrane nearest to the eardrum and the low frequencies are detected at the opposite end. The ear analyses sound in frequency bands, known as critical bands, about 100 Hz wide below 500 Hz and from one-sixth to one-third of an octave wide, proportional to frequency, above this. The ear fails to register energy in some bands when there is more energy in a nearby band. The vibration of the membrane in sympathy with a single frequency cannot be localized to an infinitely small area, and nearby areas are forced to vibrate at the same frequency with an amplitude that decreases with distance. Other frequencies are excluded unless their amplitude is high enough to dominate the local vibration of the membrane. Thus the membrane has an effective Q factor, which is responsible for the phenomenon of auditory masking, in other words the decreased audibility of one sound in the presence of another. The threshold of hearing is raised in the vicinity of the input frequency. As shown in Fig.4.1, above the masking frequency, masking is more pronounced, and its extent increases with acoustic level. Below the masking frequency, the extent of masking drops sharply.
Because of the resonant nature of the membrane, it cannot start or stop vibrating rapidly; masking can take place even when the masking tone begins after or ceases before the masked sound. This is referred to as backward and forward masking respectively. Audio compressors work by raising the noise floor at frequencies where the noise will be masked. A detailed model of the masking properties of the ear is essential to their design. The greater the compression factor required, the more precise the model must be. If the masking model is inaccurate, or not properly implemented, equipment may produce audible artifacts. There are many different techniques used in audio compression, and these will often be combined in a particular system. Predictive coding uses circuitry which uses knowledge of previous samples to predict the value of the next. It is then only necessary to send the difference between the prediction and the actual value. The receiver contains an identical predictor to which the transmitted difference is added to give the original value. Predictive coders have the advantage that they work on the signal waveform in the time domain and need a relatively short signal history to operate. They cause a relatively short delay in the coding and decoding stages. Sub-band coding splits the audio spectrum up into many different frequency bands to exploit the fact that most bands will contain lower-level signals than the loudest one.
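The predictive coding idea can be sketched as a toy first-order (DPCM-style) codec. The predictor here is the simplest possible one, "the next sample equals the previous one"; real predictive coders use more elaborate predictors, but the coder/decoder symmetry is the same.

```python
# Toy predictive codec: only the prediction error is transmitted,
# and the receiver runs an identical predictor to rebuild the signal.

def encode(samples):
    prediction = 0
    out = []
    for s in samples:
        out.append(s - prediction)  # send the difference only
        prediction = s              # predictor: "same as last sample"
    return out

def decode(diffs):
    prediction = 0
    out = []
    for d in diffs:
        prediction += d             # identical predictor + difference
        out.append(prediction)
    return out

samples = [10, 12, 13, 13, 11, 8]
diffs = encode(samples)
print(diffs)                        # -> [10, 2, 1, 0, -2, -3]
assert decode(diffs) == samples     # lossless round trip
```

Note how the differences are mostly small numbers: for slowly varying material they need fewer bits than the raw samples, which is where the compression comes from.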
In spectral coding, a transform of the waveform is computed periodically. Since the transform of an audio signal changes slowly, it need be sent much less often than audio samples. The receiver performs an inverse transform. Most practical audio coders use some combination of sub-band or spectral coding. Re-quantizing of sub-band samples or transform coefficients causes increased noise, which the coder places at frequencies where it will be masked. If an excessive compression factor is used, the coding noise will exceed the masking threshold and become audible. If a higher bit rate is impossible, better results will be obtained by restricting the audio bandwidth prior to the encoder using a pre-filter. Reducing the bandwidth at a given bit rate allows a better signal-to-noise ratio in the remaining frequency range. Many commercially available audio coders incorporate such a pre-filter.
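Re-quantization can be sketched as simply discarding low-order bits of a sample; the error that appears is the extra noise the coder must keep below the masking threshold. The word lengths below are illustrative only.

```python
# Sketch of re-quantization: zeroing low-order bits of a sub-band
# sample raises the noise floor in that band by the discarded amount.

def requantize(sample: int, keep_bits: int, word_length: int = 16) -> int:
    """Keep only the top keep_bits of a word_length-bit sample."""
    shift = word_length - keep_bits
    return (sample >> shift) << shift   # zero the discarded bits

sample = 13735                          # a 16-bit sample value
coarse = requantize(sample, 8)          # keep 8 of 16 bits
print(coarse, sample - coarse)          # -> 13568 167 (error = added noise)
```

The coder's job, per the masking model above, is to choose `keep_bits` per band so that this error stays inaudible.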