The Adaptive Multi-Rate Wideband Speech Codec (AMR-WB)

Abstract

This paper describes the Adaptive Multi-Rate Wideband (AMR-WB) speech codec recently
selected by the Third Generation Partnership Project (3GPP) and the European Telecommunications
Standards Institute (ETSI) for GSM and the third generation mobile communication WCDMA
system for providing wideband speech services. The paper describes AMR-WB standardization
history, algorithmic description including novel techniques for efficient ACELP wideband speech
coding, and subjective quality performance. The AMR-WB speech codec algorithm was selected in
December 2000, and the corresponding specifications were approved in March 2001. In July 2001,
the AMR-WB codec was also selected by ITU-T in the standardization activity for wideband speech
coding around 16 kbit/s. The adoption of AMR-WB by ITU-T is significant because it is the
first time the same codec has been adopted for both wireless and wireline services. AMR-WB
extends the audio bandwidth from 3.4 kHz to 7 kHz and gives superior speech quality and
voice naturalness compared to existing 2nd and 3rd generation mobile communication systems. The
wideband speech service provided by the AMR-WB codec will give mobile communication speech
quality that also substantially exceeds (narrowband) wireline quality.

Introduction

The current second and third generation (3G) mobile communication systems operate with
a narrow audio bandwidth limited to 200-3400 Hz. As wireless systems evolve from
voice-telephony-dominated services to multimedia and high-speed data services, the introduction of a
wider audio bandwidth of 50-7000 Hz provides substantially improved speech quality and
naturalness. Compared to narrowband telephone speech, the low-frequency enhancement from 50 to
200 Hz contributes to increased naturalness, presence and comfort. The high-frequency extension
from 3400 to 7000 Hz provides better fricative differentiation and therefore higher intelligibility. A
bandwidth of 50 to 7000 Hz not only improves the intelligibility and naturalness of speech, but also
adds a feeling of transparent communication and eases speaker recognition.
Recent advances in speech coding have made wideband coding feasible in the bit-rates
applicable to mobile communication. Since 1999, 3GPP and ETSI have carried out the development
and standardisation of a wideband speech codec for the WCDMA 3G and GSM systems. A
feasibility study phase of wideband coding preceded the launch of standardisation.

Evolution of GSM and WCDMA 3G speech coding

The 13 kbit/s Full-Rate (FR) codec was the first voice codec defined for GSM. The codec
was standardised in 1989 and it is used in the GSM full-rate channel whose gross bit-rate is 22.8
kbit/s. FR is the default codec to provide speech service in GSM. The 5.6 kbit/s Half-Rate (HR)
codec was standardised in 1995 to provide channel capacity savings through operation in the
half-rate channel at a gross bit-rate of 11.4 kbit/s. The HR codec provides the same level of speech
quality as the FR codec, except in background noise and in tandem (two encodings in MS-to-MS
calls) where the performance is somewhat lower.
The 12.2 kbit/s Enhanced Full-Rate (EFR) codec was the first GSM codec to provide voice
quality equivalent to that of wireline telephony [1]. The EFR codec brought a substantial quality
improvement over the two previous GSM codecs. EFR provides wireline speech quality across all
typical radio conditions down to carrier-to-interference ratio (C/I) of approximately 10 dB [1].
Wireline quality was required since GSM had become increasingly used in communication
environments where it started to compete directly with fixed line or cordless systems. To be
competitive also with respect to speech quality, GSM needed to provide wireline speech quality
which is robust to typical usage conditions such as background noise and transmission errors. The
EFR was standardised first for the GSM-based PCS 1900 system in North America during 1995 and
was adopted for GSM in 1996 through a competitive selection process. In addition to voice quality
performance, the advantage of using the same voice codec in PCS 1900 and in GSM was one factor
in favour of this particular codec. The EFR codec was jointly developed by Nokia and the
University of Sherbrooke.

Algorithmic description of the speech encoder

The AMR-WB codec is based on the algebraic code excited linear prediction (ACELP)
technology [4]. The ACELP technology has been very successful in encoding telephone-band
speech signals, and several ACELP-based standards are being deployed in a wide range of
applications including digital cellular applications and VoIP (e.g. AMR-NB 3GPP (TS 26.090) [2],
ETSI EFR codec (TS 06.60) [1], NA-TDMA IS-641, NA-CDMA IS-127, and ITU-T G.729 and
G.723.1 codecs).
Although ACELP technology gives good performance on narrow-band signals, some
difficulties arise when applying the telephone-band optimized ACELP model to wideband speech,
so additional features must be added to the model to obtain high quality on wideband
signals. The ACELP model will often spend most of its encoding bits on the low-frequency region,
which usually has higher energy content, resulting in a low-pass output signal. To overcome this
problem, the perceptual weighting filter has to be modified to suit wideband signals.
Further, the pitch content in the spectrum of voiced segments in wideband signals does not extend
over the whole spectrum range, and the amount of voicing shows more variation than in
narrow-band signals. Thus, it is important to improve the closed-loop pitch analysis to better
accommodate the variations in the voicing level. For the same reasons, it is also important to
improve periodicity enhancement techniques at the decoder. Another important issue that arises in
coding wideband signals is the need to use very large excitation codebooks. Therefore, efficient
codebook structures that require minimal storage and can be rapidly searched become essential.

Codec overview

The AMR-WB speech codec consists of nine speech coding modes with bit-rates of 23.85,
23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85 and 6.6 kbit/s [5]. AMR-WB also includes a
background noise mode which is designed to be used in discontinuous transmission (DTX)
operation in GSM and as a low bit-rate source-dependent mode for coding background noise in
other systems. In GSM the bit-rate of this mode is 1.75 kbit/s.
The 12.65 kbit/s mode and the modes above it offer high quality wideband speech. The two
lowest modes 8.85 and 6.6 kbit/s are intended to be used only temporarily during severe radio
channel conditions or during network congestion.
The block diagrams of the encoding and decoding algorithms are shown in Figure 1 and
Figure 2, respectively. The bit allocation of the codec at different bit rates is shown in Table 1. The
AMR-WB codec operates at a 16 kHz sampling rate. Coding is performed in blocks of 20 ms. Two
frequency bands, 50 – 6400 Hz and 6400 – 7000 Hz, are coded separately in order to decrease
complexity and to focus the bit allocation into the subjectively most important frequency range.
Note that the lower frequency band alone (50-6400 Hz) already goes far beyond narrowband telephony.
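
The frame arithmetic implied by these figures can be checked with a short Python sketch. The
mode bit-rates, the 16 kHz sampling rate and the 20 ms frame length are taken from the text; the
names MODE_RATES_BPS, samples_per_frame and bits_per_frame are illustrative only and do not come
from the AMR-WB specification.

```python
# Illustrative arithmetic only. Values come from the text; names are hypothetical.
SAMPLE_RATE_HZ = 16_000   # codec input sampling rate
FRAME_MS = 20             # coding block length in milliseconds

# The nine speech coding modes, in bit/s (kept as integers to avoid rounding issues).
MODE_RATES_BPS = [23_850, 23_050, 19_850, 18_250, 15_850,
                  14_250, 12_650, 8_850, 6_600]


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of input samples in one coding block."""
    return sample_rate_hz * frame_ms // 1000


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of coded bits produced per block at a given mode bit-rate."""
    return rate_bps * frame_ms // 1000


# Map each mode bit-rate to its per-frame bit budget.
FRAME_BITS = {rate: bits_per_frame(rate) for rate in MODE_RATES_BPS}
```

Under these assumptions a 20 ms block holds 320 input samples, and the per-frame bit budget
ranges from 132 bits at 6.6 kbit/s to 477 bits at 23.85 kbit/s.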
The input signal is down-sampled to 12.8 kHz and pre-processed using a high-pass filter and
a pre-emphasis filter of the form P(z) = 1 - μz^-1 with μ = 0.68. The ACELP algorithm is then applied to
the down-sampled and pre-processed signal. Linear Prediction (LP) analysis is performed once per
20 ms frame. The set of LP parameters is converted to immittance spectrum pairs (ISP) [6] and
vector quantized using split-multistage vector quantization with 46 bits. The speech frame is divided
into four subframes of 5 ms each (64 samples). The adaptive and fixed codebook parameters are
transmitted every subframe. The pitch lag is encoded with 9 bits in odd subframes and encoded
relative to the preceding subframe with 6 bits in even subframes.
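
As a minimal sketch of the pre-processing and framing described above, the Python fragment below
applies the pre-emphasis filter P(z) = 1 - μz^-1 as the difference equation y[n] = x[n] - μx[n-1],
splits a 20 ms frame at 12.8 kHz (256 samples) into four 64-sample subframes, and adds up the
pitch-lag bits per frame. It assumes a zero initial filter state and that "odd" means the first and
third subframes; the function names are hypothetical, and the high-pass filter and down-sampling
steps are not modelled.

```python
# Sketch under stated assumptions; not the reference implementation.
MU = 0.68            # pre-emphasis factor from the text
FRAME_LEN = 256      # 20 ms at 12.8 kHz
SUBFRAME_LEN = 64    # 5 ms at 12.8 kHz


def pre_emphasis(x, mu=MU, prev=0.0):
    """Apply y[n] = x[n] - mu * x[n-1]; prev is the sample before x[0] (assumed 0)."""
    out = []
    for sample in x:
        out.append(sample - mu * prev)
        prev = sample
    return out


def split_subframes(frame, sub_len=SUBFRAME_LEN):
    """Split one frame into consecutive equal-length subframes."""
    if len(frame) % sub_len != 0:
        raise ValueError("frame length must be a multiple of the subframe length")
    return [frame[i:i + sub_len] for i in range(0, len(frame), sub_len)]


def pitch_lag_bits_per_frame(n_subframes=4, odd_bits=9, even_bits=6):
    """Total pitch-lag bits per frame, with subframes numbered from 1."""
    return sum(odd_bits if k % 2 == 1 else even_bits
               for k in range(1, n_subframes + 1))


# Example: pre-emphasise a constant frame, then split it into subframes.
frame = pre_emphasis([1.0] * FRAME_LEN)
subframes = split_subframes(frame)
```

With the bit counts quoted in the text, the pitch lag takes 9 + 6 + 9 + 6 = 30 bits per 20 ms frame.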