19-02-2013, 02:41 PM
FPGA Implementation of a LDPC Decoder using a Reduced Complexity Message Passing Algorithm
FPGA Implementation.pdf (Size: 636.71 KB / Downloads: 166)
Abstract
In this paper, a simplified message passing
algorithm for decoding Low-Density Parity-Check (LDPC)
codes is proposed with a view to reduce the implementation
complexity. The algorithm is based on simple hard-decision
decoding techniques while utilizing the advantages of soft
channel information to improve decoder performance. It
has been validated through simulation using LDPC code
compliant with Wireless Local Area Network (WLAN –
IEEE 802.11n) standard. The results show that the
proposed algorithm can achieve significant improvement in
bit error rate (BER) performance and average decoding
iterations compared to fully hard-decision based decoding
algorithms. The proposed algorithm has been implemented
and tested on Xilinx Virtex 5 FPGA. With significantly
reduced hardware resources, the implemented decoder can
achieve an average throughput of ~16.2 Gbps with a BER
performance of 10-5 at an Eb/No of 6.25 dB.
INTRODUCTION
The Low-Density Parity-Check (LDPC) codes were
first proposed by Gallager in 1962 [1]. They were not
popular for a few decades since its introduction due to
high implementation complexity. However, it gained
popularity after it was formally re-introduced by MacKay
and Neal in 1997 [2]. It has been shown that LDPC
codes, when optimally designed, have the capability to
perform very close to the Shannon Limit [3]. LDPC
codes have several advantages over turbo codes,
including reduced implementation complexity, better bit
error rate (BER) performance at low signal to noise ratio
(SNR) and the inherent code structure that supports high
degree of parallelism [4]. Hence LDPC codes have
become increasingly popular and have been adopted in
latest generation high data rate applications such as
WLAN [5], WiMax [6] and Digital Video Broadcasting -
Second Generation (DVB-S2) [7, 8].
OVERVIEW OF LDPC DECODING
LDPC codes belong to a class of block codes [23]. As
their name suggests, its parity-check matrix (H) consists
of very small number of non-zero elements. The
sparseness of H determines the decoding complexity and
the minimum distance of the code. Apart from the
requirement that the LDPC matrix be sparse, there is no
other difference between the LDPC code and any other
block code [24]. An LDPC matrix is described by various
parameters, which are briefly described here. An encoded
message also known as codeword consists of the useful
information bits and the redundant bits. The code rate of
a decoder is the ratio of the length of useful information
bits to the length of codeword bits. The number of nonzero
entries in each of the rows and columns of H matrix
is collectively known as degree distribution. An H matrix
is said to be regular if the degree distribution of rows and
columns are uniform, otherwise it is Irregular. The H
matrix can be represented as a graph called Tanner
graph. A cycle in the graph is a sequence of connected
nodes, which start and end at the same node. The girth or
the smallest cycle in the parity-check graph, significantly
contributes to the performance of the iterative decoding
algorithms [25]. A regular (3, 6) parity-check matrix with
10-bit code length is shown in Fig. 1 (a) and the
corresponding Tanner graph representation of the paritycheck
matrix is shown in Fig. 1 (b).
LDPC DECODING ALGORITHMS
In this section, a highly complex Sum-Product
algorithm that can achieve very good performance
compared to a low complexity Bit-Flip algorithm that
suffers from poor performance is presented.
Sum-Product Algorithm
The Sum-Product algorithm for LDPC decoding is a
soft-decision message-passing algorithm that requires
LLR (intrinsic message) for variable node operations to
make decoding decisions. To begin with the decoding
process, the LLRs are passed over to the variable nodes.
The variable nodes (V) perform the ‘sum’ operations on
the input LLRs, as in (1) and the computed (extrinsic)
messages are passed along the connected edges to the
check nodes © [14].
SIMPLIFIED MESSAGE PASSING ALGORITHM
It is well known that SPA can achieve good decoding
performance [14] but with high implementation
complexity, and BFA has low implementation complexity
but poor performance [18]. The main aim of the proposed
Simplified Message Passing Algorithm (SMPA) is to
achieve a trade-off between the decoding performance
and implementation complexity of the above two
algorithms. The check node and variable node operations
of the proposed SMPA are presented next.
Check-Node operation
The complexity of a message passing algorithm
significantly depends on the quantization length of
extrinsic messages and the check node operation. These
aspects are particularly critical in case of hardware
implementation of large LDPC codes. In order to reduce
the complexity of SMPA, the check node consists of a
simple parity check operation (Eq. 5) requiring XOR
logic only, similar to BFA. However, the performance
improvement of SMPA over BFA is achieved from a
distinct variable node (V) operation.
PERFORMANCE ANALYSIS
To evaluate the performance of the proposed
algorithm, simulations were carried out for two different
code lengths, 1000-bit and 648-bit. The latter is
compliant with WLAN (IEEE 802.11n) standard [5]. A
simulation model has been developed using the C
programming language in MatLab environment. The
LDPC codes were generated using Progressive Edge
Growth (PEG) based algorithm [34] and the simulations
were carried out assuming that the codewords were
Binary Phase Shift Keying (BPSK) modulated and passed
over an Additive White Gaussian Noise (AWGN)
channel [35].
FPGA IMPLEMENTATION
Fully parallel implementation of decoders for large
LDPC codes has been problematic due to very large
amount of resources required. We seek to implement a
fully parallel LDPC decoder on FPGA based on the
proposed SMPA to judge the savings in hardware
resources. Implementing a fully parallel architecture will
also be useful to determine the feasibility of various
partially parallel architectures based on SMPA. From the
simulation results (Fig. 4 and Fig. 5), it is clear that using
4-bit SMPA over 3-bit provides significant improvement
in decoding performance. Whereas, using 5-bit SMPA
over 4-bit achieves little or negligible gain in
performance. Hence 4-bit LLR quantization is used for
the FPGA implementation. A ½-rate (3, 6) regular 648-bit
LDPC code that is compliant with WLAN (IEEE
802.11n) standard was chosen to implement the decoder.
Implementation Results
The performance of the LDPC decoder implemented
on FPGA was analyzed and compared against the
software simulation results. The BER and FER
performance of the hardware decoder are shown in Fig. 9
and Fig. 10 respectively. The loss in BER performance
due to implementation is negligible. The loss in FER
performance is less than 0.1 dB (Eb/No). The average
number of iterations required by the hardware decoder
closely follows the average iterations predicted by the
software simulation model, as shown in Fig. 11.
CONCLUSIONS
A simplified message passing algorithm for LDPC
decoding has been presented in this paper. The proposed
algorithm uses higher precision soft LLR-inputs for
variable node operations while passing only harddecision
messages between the processing nodes. This
approach has led to improved BER and FER
performances compared to fully hard-decision based
solutions such as those based on the Bit Flip Algorithm
(BFA). The proposed algorithm also reduces the average
number of decoding iterations compared to BFA. The
algorithm has been verified through FPGA
implementation of a LDPC decoder that complies with
the Wireless LAN standard. The results show that the
decoder requires significantly reduced hardware
resources compared to the sum-product algorithm (SPA).