28-09-2013, 02:30 PM
Digital Media Processing
Digitising speech:
Traditional telephone channels normally restrict speech to a bandwidth of 300 to 3400 Hz. This band-pass filtering is considered not to cause serious loss of intelligibility or quality, although it is easily demonstrated that the loss of signal power below 300 Hz and above 3400 Hz has a significant effect on the naturalness of the sound. Once band-limited in this way the speech may be sampled at 8 kHz, in theory without incurring aliasing distortion, since the highest remaining frequency (3400 Hz) lies below half the sampling rate (4000 Hz). The main “CCITT standard” for digital speech channels in traditional non-mobile telephony (i.e. in the “plain old telephone service”, POTS) is an allocation of 64000 bits/sec to accommodate an 8 kHz sampling rate with each sample quantised to 8 bits. This standard is now officially known as the “ITU-T G.711” speech coding standard. Since the bits are transmitted by suitably shaped voltage pulses, this is called "pulse-code modulation" (PCM).
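The 64000 bits/sec figure and the sampling condition can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the quantiser shown is a plain uniform one, whereas G.711 itself uses logarithmic (A-law or µ-law) companding, which is not reproduced here.

```python
import math

SAMPLE_RATE_HZ = 8000      # sampling rate quoted in the text
BITS_PER_SAMPLE = 8        # bits per sample quoted in the text
BAND_UPPER_HZ = 3400       # upper edge of the telephone band


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit-rate of uncompressed PCM in bits per second."""
    return sample_rate_hz * bits_per_sample


def satisfies_nyquist(max_freq_hz, sample_rate_hz):
    """True when the highest signal frequency is below half the sampling rate."""
    return max_freq_hz < sample_rate_hz / 2


def quantise_uniform(x, bits):
    """Map x in [-1.0, 1.0] to an integer code in [0, 2**bits - 1].

    Assumption: a plain uniform quantiser, used only for illustration.
    """
    levels = 2 ** bits
    x = max(-1.0, min(1.0, x))            # clip out-of-range inputs
    code = int((x + 1.0) / 2.0 * levels)  # scale to 0..levels
    return min(code, levels - 1)          # x == 1.0 maps to the top code


if __name__ == "__main__":
    print(pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))     # 64000
    print(satisfies_nyquist(BAND_UPPER_HZ, SAMPLE_RATE_HZ))  # True
    # Quantise eight samples (one millisecond) of a 1 kHz tone.
    tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE_HZ) for n in range(8)]
    print([quantise_uniform(s, BITS_PER_SAMPLE) for s in tone])
```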
International standards for speech coding:
The CCITT, which stands for “Comité Consultatif International Téléphonique et Télégraphique”, was, until 1993, an international committee responsible for setting global telecommunication standards. It existed as part of the “International Telecommunication Union” (ITU), which was, and still is, a specialised agency of the United Nations. Since 1993, the CCITT has become what is now referred to as the “ITU Telecommunication Standardization Sector (ITU-T)”. Within ITU-T are various “study groups”, which include a study group responsible for speech digitisation and coding standards.
With the advent of digital cellular radio telephony, a number of national and international standardisation organisations have emerged for the definition of all aspects of particular cellular mobile telephone systems including the method of digitising speech. Among the organisations defining standards for telecommunications and telephony the three main ones are the following:
“TCH-HS”: part of the “European Telecommunications Standards Institute (ETSI)”. This body originated as the “Groupe Spécial Mobile (GSM)” and is responsible for standards used by the European “GSM” digital cellular mobile telephone system.
“TIA”: the “Telecommunications Industry Association”, the USA equivalent of ETSI.
“RCR”: the “Research and Development Centre for Radio Systems”, the Japanese equivalent of ETSI.
Further reducing the bit-rate for digitised speech
A conventional PCM system encodes each sample of the input waveform independently and is capable of encoding any wave-shape so long as the maximum frequency component is less than one half of the sampling frequency. Analysis of speech waveforms, however, shows that they have a degree of predictability and general trends may be identified allowing one to make estimates as to which sample value is likely to follow a given set of samples. The existence of this predictability means that part of the information transmitted by conventional PCM is redundant and that savings in the required bit-rate can be achieved.
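The redundancy described above can be illustrated with the crudest possible predictor: guess that each sample equals the one before it and transmit only the error. The sketch below is not any particular standard's method; the function names and the 200 Hz test tone standing in for voiced speech are assumptions made for illustration.

```python
import math


def prediction_residual(samples):
    """Differences between each sample and the previous one.

    Assumption: the predictor is simply "next sample equals the last one";
    the first sample is predicted as 0.
    """
    residual = []
    previous = 0.0
    for s in samples:
        residual.append(s - previous)
        previous = s
    return residual


def reconstruct(residual):
    """Invert prediction_residual by accumulating the differences."""
    samples = []
    previous = 0.0
    for r in residual:
        previous = previous + r
        samples.append(previous)
    return samples


def energy(values):
    """Sum of squared values."""
    return sum(v * v for v in values)


if __name__ == "__main__":
    rate = 8000
    # Hypothetical test signal: a 200 Hz tone sampled at 8 kHz.
    signal = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
    res = prediction_residual(signal)
    # A ratio well below 1 means the residual needs fewer bits than the signal.
    print(energy(res) / energy(signal))
```

Because the residual has a much smaller range than the signal for slowly varying waveforms, it can be quantised with fewer bits for the same accuracy, and the receiver recovers the samples by running `reconstruct`.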
Speech has 'voiced' and 'unvoiced' parts corresponding roughly to spoken 'vowels' and 'consonants' respectively. The predictability lies mostly in the voiced speech portions, and these are the loudest (in telephone speech) and the most critical. Voiced speech is periodic, which means that a 'characteristic waveform', looking like a decaying sine-wave, is repeated periodically (or approximately so) when a vowel is being spoken.
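One common way to expose this periodicity is to look for the lag at which a segment best matches a shifted copy of itself. The sketch below builds a synthetic "voiced" segment from a repeated decaying sine-wave and estimates its period by autocorrelation. All names and parameter values (decay rate, oscillation frequency, lag range) are hypothetical and chosen only to mimic the shape described above; real speech is far less regular.

```python
import math


def synthetic_voiced(period, cycles, decay=0.05, freq=0.25):
    """Repeat a decaying sine-wave every `period` samples, `cycles` times.

    `decay` is per sample and `freq` is in cycles per sample; both are
    hypothetical values, not measurements of real speech.
    """
    one_cycle = [math.exp(-decay * n) * math.sin(2 * math.pi * freq * n)
                 for n in range(period)]
    return one_cycle * cycles


def autocorrelation(x, lag):
    """Sum of products of the signal with itself shifted by `lag` samples."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def estimate_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(x, lag))


if __name__ == "__main__":
    # 80 samples per repetition corresponds to 100 Hz at an 8 kHz sampling rate.
    segment = synthetic_voiced(period=80, cycles=5)
    # Search lags of 20 to 160 samples, i.e. 400 Hz down to 50 Hz at 8 kHz.
    print(estimate_period(segment, min_lag=20, max_lag=160))
```

The lower lag limit matters: without it the search would lock on to the fast oscillation inside each characteristic waveform rather than the repetition period.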
Digitisation of Video
According to the ITU CCIR-601 standard, digital television comparable to American analogue NTSC television with 486 lines would require 720 pixels per line, each pixel requiring 5 bits per colour, i.e. about 2 bytes per pixel. Scanned at 30 frames per second, this would require a bit-rate of about 168 Mb/s or 21 Mbytes per second. A normal CD-ROM would hold only about 30 seconds of TV video at this bit-rate.
A similar calculation for high definition TV (HDTV) gives a requirement of about 933 Mb/s and for film quality the required bit-rate has been estimated at 2300 Mb/s. An SVGA computer screen with 800 by 600 pixels requires 3 x 8 = 24 bits per pixel and therefore 288 Mbits/second if refreshed at 25 Hz with interlacing. The need for video digitisation standards achieving compression is clear.
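The television and SVGA figures above follow from multiplying pixels per frame, bits per pixel and frames per second. The short Python check below reproduces them; the function name is made up for illustration, and the CD-ROM capacity of 650 Mbytes is an assumption, since the text does not state one.

```python
def video_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit-rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Figures quoted in the text: 720 x 486 pixels, about 2 bytes per pixel, 30 frames/s.
tv_rate = video_bit_rate(720, 486, 16, 30)

# SVGA figures quoted in the text: 800 x 600 pixels, 24 bits per pixel, 25 Hz.
svga_rate = video_bit_rate(800, 600, 24, 25)

# Assumption: a CD-ROM holds 650 Mbytes (not stated in the text).
CD_ROM_BYTES = 650 * 10**6
cd_seconds = CD_ROM_BYTES / (tv_rate / 8)

print(tv_rate)               # 167961600 bits/s, about 168 Mb/s
print(tv_rate / 8)           # 20995200.0 bytes/s, about 21 Mbytes/s
print(svga_rate)             # 288000000 bits/s
print(round(cd_seconds, 1))  # about 31 seconds
```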
The MPEG-1 and MPEG-2 standards for moving video and the FCC standard for HDTV all use a close relative of the 2-D Fourier transform known as the “2-D discrete cosine transform (DCT)”, applied to 8 by 8 pixel “tiles” extracted from the image. The red, green and blue colour measurements for each pixel are first transformed to a “luminance” (brightness) and two “chrominance” (colour) measurements so that advantage can be taken of the fact that the eye is more sensitive to differences in luminance than to variations in chrominance. The three measurements for each pixel produce three separate images, one for luminance and two for chrominance, which are then dealt with separately. The number of bits required for the two chrominance images can be reduced by averaging the chrominance measurements for sets of four adjacent pixels to produce images with fewer pixels.
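The three steps just described (colour transform, chrominance averaging, DCT of a tile) can be sketched in plain Python. This is an illustration under stated assumptions, not the procedure of any particular standard: the colour weights are the ITU-R BT.601 ones, which the text does not specify, the function names are made up, and the DCT is computed directly from its definition rather than by a fast algorithm.

```python
import math


def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to luminance Y and chrominance Cb, Cr.

    Assumption: ITU-R BT.601 weights; the text does not say which are used.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    return y, cb, cr


def subsample_2x2(plane):
    """Average each 2x2 group of pixels, halving width and height.

    Assumes the plane has an even number of rows and columns.
    """
    rows, cols = len(plane), len(plane[0])
    return [[(plane[i][j] + plane[i][j + 1]
              + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
             for j in range(0, cols, 2)]
            for i in range(0, rows, 2)]


def dct_2d(tile):
    """Orthonormal 2-D DCT (type II) of a square tile, computed directly."""
    n = len(tile)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (tile[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


if __name__ == "__main__":
    print(rgb_to_ycbcr(128, 128, 128))      # grey: chrominance close to zero
    print(subsample_2x2([[1, 3], [5, 7]]))  # [[4.0]]
    flat = [[100.0] * 8 for _ in range(8)]
    print(round(dct_2d(flat)[0][0], 6))     # 800.0, all the energy in one term
```

A flat tile puts all of its energy into the single top-left coefficient, which is why smooth image regions compress well after the transform.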