High-Throughput QR Decomposition for MIMO Detection in OFDM Systems

ABSTRACT

In this paper, we aim to design and implement a
high-throughput QR decomposition architecture for 4 × 4
MIMO signal detection problems. A real-value decomposed
MIMO system model is handled and thus the channel matrix
to be processed is extended to size 8×8. Instead of direct
factorization, we propose a QR decomposition scheme that
cascades one complex-value and one real-value Givens
rotation block, which saves 44% of the hardware complexity.
The systolic array is adopted for hardware implementation to
facilitate pipeline design. The requirement for skewed
inputs to the conventional complex-value QR-decomposition
systolic array is relaxed, which removes 37% of the delay
elements. The real-value Givens rotation stage is implemented
by a stacked triangular systolic array to match the
throughput of the complex-value one. We have implemented
the proposed design in 0.18 μm CMOS technology with 152K
gates. In post-layout simulations, the maximum operating
frequency reaches 90.09 MHz. The proposed scheme not
only reduces the hardware complexity, but also supports high
throughput for MIMO-OFDM signal detection up to 2.16
Gbps under stationary channels.
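
The real-value decomposition mentioned in the abstract can be illustrated with a short sketch. It assumes the common block ordering [[Re H, -Im H], [Im H, Re H]]; the excerpt does not state which ordering the paper uses, and the channel and symbol values below are hypothetical.

```python
def real_decompose(H):
    """Map an n x m complex matrix to the 2n x 2m real matrix [[Re, -Im], [Im, Re]].

    Assumption: block ordering; the paper may order the entries differently.
    """
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    return top + bottom


def real_stack(v):
    """Stack the real parts of a complex vector above its imaginary parts."""
    return [z.real for z in v] + [z.imag for z in v]


def matvec(A, x):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical 4x4 complex channel and transmitted symbol vector.
H = [[complex(i + 1, j - 2) for j in range(4)] for i in range(4)]
x = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]

H_real = real_decompose(H)               # 8 x 8 real matrix
y = matvec(H, x)                         # complex model
y_real = matvec(H_real, real_stack(x))   # equivalent real model
```

Under this ordering, `y_real` equals the stacked real and imaginary parts of `y`, which is why the 4×4 complex problem becomes an 8×8 real one.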


INTRODUCTION

Multiple input multiple output (MIMO) techniques
have been widely adopted to increase the data transmission
rate or to improve the quality of services (QoS) in recent
wireless communication systems, like IEEE 802.11n,
802.16e/m and 3GPP-LTE. Thus, MIMO detection plays an
important role in their transceiver design. To detect MIMO
signals, channel matrix inversion or triangularization is
often required by methods such as zero-forcing (ZF)
detection, minimum mean squared error (MMSE) detection
and sphere decoding. Hence, to meet the demands of a high
transmission rate, a high-throughput QR decomposition
module is necessary.
Three known algorithms are widely used to achieve QR
decomposition. The Gram-Schmidt algorithm obtains the
orthogonal basis spanning the column space of the matrix to
be decomposed. Meanwhile, the orthogonality principle is
utilized to derive the upper triangular matrix [1]. The
Householder transformation (HT) zeros out most
elements of each column vector at once by reflection
operations to obtain the upper triangular matrix [2]. In
contrast, the Givens rotation (GR) zeros one element of the
matrix at a time by a two-dimensional rotation [3]. In [4], it
has been shown that the Givens rotation is advantageous for
QR decomposition because the CORDIC algorithm can perform
it without norm and division operations.
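
As a rough illustration of how Givens rotations zero one element at a time, the following floating-point sketch triangularizes a small real matrix. It is not the paper's hardware formulation: it uses `math.hypot` and division for brevity, whereas a CORDIC-based design avoids both. The function name and the test matrix are hypothetical.

```python
import math


def givens_qr(A):
    """Return (Q, R) with A = Q R and R upper triangular, via Givens rotations."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    # Qt accumulates the product of all rotations, i.e. Q transposed.
    Qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Work upward from the bottom row, zeroing R[i][j] against R[i-1][j].
        for i in range(m - 1, j, -1):
            a, b = R[i - 1][j], R[i][j]
            r = math.hypot(a, b)
            if r == 0.0:
                continue  # both entries already zero
            c, s = a / r, b / r
            # Apply the same 2-D rotation to rows i-1 and i of R and Qt.
            for M in (R, Qt):
                for k in range(len(M[0])):
                    u, v = M[i - 1][k], M[i][k]
                    M[i - 1][k] = c * u + s * v
                    M[i][k] = -s * u + c * v
    Q = [[Qt[j][i] for j in range(m)] for i in range(m)]
    return Q, R


# Hypothetical 3x3 example matrix.
A = [[4.0, 1.0, 2.0], [2.0, 3.0, 1.0], [1.0, 2.0, 5.0]]
Q, R = givens_qr(A)
```

Each inner-loop step touches only two rows, which is the property that lets a systolic array assign one rotation to one processing element.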



ARCHITECTURE DESIGN

The entire block diagram of the proposed QR
decomposition architecture is shown in Fig. 3. The delay
buffer is inserted between the complex stage and the real
stage for scheduling the data streams. The systolic array has
been employed to implement the hardware of the complex
QR decomposition in [3]. However, in the conventional
design, the hardware is not fully utilized. We first
remove the hardware redundancy in the conventional
complex Givens rotation stage and then design a real Givens
rotation systolic array whose processing rate matches that of
the first stage.
The proposed complex Givens rotation hardware is
shown in Fig. 4. Four input complex signals are fed into the
systolic array. The complex processing element (PE)
implements the rotation matrix in (1), and the delay unit
(DU) is inserted to align the timing of the two paired
sequences. When the signal marked by a rectangle enters
the PE from the left, the PE operates in the
generation mode, which generates $\theta_x$, $\theta_y$, and
$\theta_z$. Otherwise,
the PE is configured in the rotation mode, in which the
input signal pair is simply rotated according to the previously
generated angles.
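
The two PE modes can be sketched in floating point as follows. Equation (1) is not reproduced in this excerpt, so the decomposition used here is an assumption: two phase rotations (`theta_x`, `theta_y`) that make the leading pair real, followed by one real rotation (`theta_z`) that zeros the lower element. The function names and sample values are hypothetical.

```python
import cmath
import math


def generate_angles(x, y):
    """Generation mode: derive three angles from the leading pair (x, y).

    Assumed mapping: theta_x and theta_y are the phases of x and y, and
    theta_z is the real rotation angle that annihilates the lower element.
    """
    theta_x = cmath.phase(x)
    theta_y = cmath.phase(y)
    theta_z = math.atan2(abs(y), abs(x))
    return theta_x, theta_y, theta_z


def rotate(u, v, angles):
    """Rotation mode: apply previously generated angles to the pair (u, v)."""
    theta_x, theta_y, theta_z = angles
    u = u * cmath.exp(-1j * theta_x)   # phase rotation of the upper input
    v = v * cmath.exp(-1j * theta_y)   # phase rotation of the lower input
    c, s = math.cos(theta_z), math.sin(theta_z)
    return c * u + s * v, -s * u + c * v   # (right output, bottom output)


# Hypothetical leading pair and one following pair.
angles = generate_angles(3 + 4j, 1 - 2j)
right, bottom = rotate(3 + 4j, 1 - 2j, angles)   # leading pair itself
r2, b2 = rotate(2 - 1j, 0.5 + 3j, angles)        # a later pair, rotation mode
```

Applied to the leading pair, this sketch yields a real right output and a zero bottom output, while later pairs are rotated with their total energy preserved.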
Note that in the generation mode, the output signal
produced from the bottom of the PE is zero and
simultaneously the output leaving from the right of the PE
becomes a real number without the imaginary part as the
upper left entry indicated in (2). Consequently, the CORDIC
module handling the rotation of $\theta_y$ need not be
activated in the next PE into which the real signal enters.
We can eliminate these idle CORDIC modules to save
hardware resources. As indicated in Fig. 4, three types of
complex PEs are employed, containing four, three, and
two CORDIC modules, respectively.
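
For readers unfamiliar with what a single CORDIC module computes, here is a minimal floating-point sketch of the standard vectoring and rotation modes. The iteration count, the names, and the use of floats are assumptions for illustration; the excerpt gives no word lengths or iteration counts for the actual design, and hardware would replace the multiplications by powers of two with shifts.

```python
import math

N_ITER = 24  # assumed iteration count, not taken from the paper
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(N_ITER)]

# CORDIC gain accumulated over all micro-rotations; compensated by a constant.
GAIN = 1.0
for i in range(N_ITER):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))
INV_GAIN = 1.0 / GAIN


def cordic_vectoring(x, y):
    """Rotate (x, y) onto the x-axis; return (magnitude, angle). Assumes x > 0."""
    angle = 0.0
    for i in range(N_ITER):
        d = -1.0 if y > 0 else 1.0          # always rotate toward y = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        angle -= d * ATAN_TABLE[i]
    return x * INV_GAIN, angle


def cordic_rotation(x, y, angle):
    """Rotate (x, y) counterclockwise by angle (|angle| below about 1.74 rad)."""
    z = angle
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0         # drive the residual angle to zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ATAN_TABLE[i]
    return x * INV_GAIN, y * INV_GAIN


mag, ang = cordic_vectoring(3.0, 4.0)
cx, cy = cordic_rotation(1.0, 0.0, math.pi / 6)
```

Vectoring mode corresponds to generating an angle from an input pair, and rotation mode to applying a known angle, so a PE whose input is already real has one fewer such rotation to perform.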



CONCLUSION
In this paper, we have implemented a high-throughput
QR decomposition module for 4×4 MIMO-OFDM systems.
We combine the complex Givens rotation algorithm
and the real Givens rotation algorithm to reduce the
required CORDIC operations. Moreover, we eliminate the
delay buffers for skewed input sequences in the
conventional complex Givens rotation block and improve
hardware utilization in the real Givens rotation by the
time-sharing technique. We have shown that the proposed
architecture achieves good hardware efficiency for the
more complicated QR decomposition of the 8×8 real-value
decomposed channel matrix. According to the
implementation results, the operating frequency
reaches 90.09 MHz in 0.18 μm CMOS technology. The design
completes a QR decomposition every 44 ns and a
$\tilde{Q}_k^{H}\tilde{y}_k$ calculation every 11 ns, and supports
MIMO detection up to 2.16 Gbps.
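
The timing and throughput figures above can be cross-checked with simple arithmetic. The sketch below assumes one received vector is processed per clock cycle, that each vector carries 4 streams of 6 bits (64-QAM), and that one QR decomposition takes 4 cycles. None of these assumptions is stated in this excerpt; they are chosen only because they are consistent with the quoted numbers.

```python
f_clk_hz = 90.09e6                 # operating frequency quoted in the text

# One clock period, in nanoseconds (about 11.1 ns).
period_ns = 1e9 / f_clk_hz

# Assumption: one QR decomposition spans 4 clock cycles (about 44.4 ns).
qr_cycles = 4
qr_interval_ns = qr_cycles * period_ns

# Assumption: 4 spatial streams x 6 bits per symbol (64-QAM) per vector,
# with one vector processed per clock cycle.
bits_per_vector = 4 * 6
throughput_gbps = bits_per_vector * f_clk_hz / 1e9

print(f"period   = {period_ns:.2f} ns")
print(f"QR every = {qr_interval_ns:.2f} ns")
print(f"rate     = {throughput_gbps:.3f} Gbps")
```

Under these assumptions the computed rate is about 2.16 Gbps, and the clock period and QR interval round to the 11 ns and 44 ns quoted above.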