09-08-2012, 04:03 PM
Design and Implementation of a Parallel Turbo-Decoder ASIC for 3GPP-LTE
Abstract
Turbo-decoding for the 3GPP-LTE (Long Term
Evolution) wireless communication standard is among the most
challenging tasks in terms of computational complexity and
power consumption of corresponding cellular devices. This paper
addresses design and implementation aspects of parallel turbo-decoders
that reach the 326.4 Mb/s LTE peak data-rate using
multiple soft-input soft-output decoders that operate in parallel.
To highlight the effectiveness of our design-approach, we realized
a 3.57 mm² radix-4-based 8× parallel turbo-decoder ASIC in
0.13 µm CMOS technology achieving 390 Mb/s. At the more
realistic 100 Mb/s LTE milestone targeted by industry today, the
turbo-decoder consumes only 69 mW.
Index Terms—3G mobile communication, LTE, parallel turbo-decoder,
ASIC implementation, low-power, radix-4
I. INTRODUCTION
During the last few years, 3G wireless communication
standards, such as HSDPA [2], firmly established themselves
as an enabling technology for data-centric communication.
The advent of smart-phones, netbooks, and other mobile
broadband devices finally ushered in an era of throughput-intensive
wireless applications. The rapid increase in wireless
data traffic now begins to strain the network capacity, and
operators are looking for novel technologies enabling even
higher data-rates than those achieved by HSDPA. Recently,
the new air interface standard LTE (Long Term Evolution) [3]
has been defined by the standards body 3GPP and aims at
improving the data-rates by more than 30× (compared to those
of HSDPA) in the next few years. Theoretically, LTE supports
up to 326.4 Mb/s [4], whereas the industry plans to realize the
first milestone at about 100 Mb/s in one or two years.
LTE specifies the use of turbo-codes to ensure reliable communication.
Parallel turbo-decoding, which deploys multiple
soft-input soft-output (SISO) decoders operating concurrently,
will be the key for achieving the high data-rates offered by
LTE. However, implementing such decoders will be among
the main challenges in terms of computational intensity and
power consumption. The fact that none of the recently reported
parallel turbo-decoders [5]–[7] achieves the LTE peak data-rate
or provides the desirable power consumption of less than 100 mW
for battery-powered devices at the 100 Mb/s milestone
indicates that the architecture design for such decoders is a
challenging task.
This paper was presented in part at the IEEE International Solid-State
Circuits Conference (ISSCC), San Francisco, CA, USA, Feb. 2009 [1].
C. Studer is with the Communication Technology Laboratory (CTL),
ETH Zurich, 8092 Zurich, Switzerland (e-mail: studerc[at]nari.ee.ethz.ch).
C. Benkeser and Q. Huang are with the Integrated Systems Laboratory
(IIS), ETH Zurich, 8092 Zurich, Switzerland (e-mail: benkeser[at]iis.ee.ethz.ch;
huang[at]iis.ee.ethz.ch).
The authors would like to thank S. Schläpfer and F. Gürkaynak for
their assistance during the ASIC design. Furthermore, the authors gratefully
acknowledge the support of H. Bölcskei, A. Burg, N. Felber, W. Fichter, and
H. Kaeslin.
1) Contributions: In this work, we discuss concepts and
architectures which allow for the power-efficient implementation
of high-throughput parallel turbo-decoding for LTE.
To this end, we investigate the associated throughput/area
tradeoffs for the identification of the key design parameters
and optimize the most crucial design blocks. To alleviate the
design-inherent interleaver bottleneck, we describe a memory
architecture that supports the bandwidth required by LTE and
present a general architecture solution—referred to as Master-
Slave Batcher network—suitable for maximally-vectorizable
contention-free interleavers. We furthermore detail a radix-4-
based SISO decoder architecture that enables high-throughput
turbo-decoding. As a proof-of-concept, we show an 8× parallel
ASIC prototype achieving the LTE peak data-rate and
the 100 Mb/s milestone at low power, and finally compare the
key characteristics to those of other measured turbo-decoder
ASICs [5]–[7].
2) Outline: The remainder of the paper is organized as
follows. Section II reviews the principles of turbo-decoding
and details the algorithm used for SISO decoding. The parallel
turbo-decoder architecture is presented in Section III and
the corresponding throughput/area tradeoffs are studied. The
interleaver architecture is detailed in Section IV and Section V
describes the architecture of the SISO decoder. Section VI
provides ASIC-implementation results and a comparison with
existing turbo-decoders. We conclude in Section VII.
II. TURBO-DECODING FOR LTE
Turbo codes [8], capable of achieving close-to-Shannon
capacity and amenable to hardware-efficient implementation,
have been adopted by many wireless communication standards,
including HSDPA [2] and LTE [3]. The turbo encoder
specified in the LTE standard is illustrated in the left-hand side
(LHS) of Fig. 1 and consists of a feed-through, two 8-state
recursive convolutional encoders (CEs), and an interleaver.
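The encoder structure just described (feed-through, two 8-state recursive CEs, one interleaver) can be illustrated with a minimal Python sketch. Several details below are not given in this text and are taken from 3GPP TS 36.212 as assumptions: the feedback polynomial 1 + D^2 + D^3, the parity polynomial 1 + D + D^3, and the quadratic interleaver pi(i) = (f1*i + f2*i^2) mod K with f1 = 3, f2 = 10 for K = 40. Trellis termination is omitted, and the function names are hypothetical.

```python
def rsc_parity(bits):
    """Parity stream of one 8-state recursive convolutional encoder.

    Assumed polynomials (from 3GPP TS 36.212, not from this text):
    feedback g0 = 1 + D^2 + D^3, parity g1 = 1 + D + D^3.
    The three-bit register starts in the all-zero state.
    """
    s1 = s2 = s3 = 0
    parity = []
    for x in bits:
        a = x ^ s2 ^ s3              # recursive (feedback) bit
        parity.append(a ^ s1 ^ s3)   # parity output for this step
        s1, s2, s3 = a, s1, s2       # shift the register
    return parity


def qpp_interleave(bits, f1, f2):
    """Reorder bits so that output i is input (f1*i + f2*i*i) mod K."""
    K = len(bits)
    return [bits[(f1 * i + f2 * i * i) % K] for i in range(K)]


def turbo_encode(bits, f1=3, f2=10):
    """Return (systematic, parity1, parity2) without tail bits.

    The defaults f1=3, f2=10 are the assumed parameters for K = 40.
    """
    systematic = list(bits)                                   # feed-through
    parity1 = rsc_parity(systematic)                          # first CE
    parity2 = rsc_parity(qpp_interleave(systematic, f1, f2))  # second CE
    return systematic, parity1, parity2
```

Calling `turbo_encode` on a block of 40 bits returns three streams of 40 bits each, which corresponds to the rate-1/3 output before termination bits are appended.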
The feed-through passes one block of K information bits x_k,
k = 0, ..., K