VLSI implementation of a low-complexity LLL lattice reduction algorithm for MIMO detection

Bruderer, Lukas; Studer, Christoph; Wenk, M.; Seethaler, D.; Burg, Andreas

doi:10.1109/iscas.2010.5537742

Cited by 23 publications

(16 citation statements)

References 17 publications


“…It offers the highest throughput (832.5 Mb/s @ 333 MHz) and the lowest number of average cycles per LR-reduced matrix. However, the tradeoff is that the design relies on limiting of the number of column swaps in the LLL reduction process to only 4, which leads to a significant deviation from ML diversity as can be observed in the BER performance results in [15]. This problem has been alleviated by the proposed fixed-throughput ML diversity design in this paper.…”

Section: B Design Comparison (mentioning)

confidence: 98%

“…2) The proposed LR design in this paper has a significant advantage of being the only design with fixed throughput, independent of the correlation of the channel matrix. For the other reported LR implementations, the throughput and processing latency results represent an average, since their exact number of required cycles depends on the correlation of the input matrix R. 3) One important LR implementation is the design presented in [15]. It offers the highest throughput (832.5 Mb/s @ 333 MHz) and the lowest number of average cycles per LR-reduced matrix.…”

Section: B Design Comparison (mentioning)

confidence: 99%


High-Throughput 0.13-µm CMOS Lattice Reduction Core Supporting 880 Mb/s Detection

Shabany, Youssef, Gulak. 2013. IEEE Trans. VLSI Syst.


This paper presents the first silicon-proven implementation of a lattice reduction (LR) algorithm, which achieves maximum likelihood diversity. The implementation is based on a novel hardware-optimized Lenstra, Lenstra, and Lovász (LLL) algorithm, which significantly reduces its complexity by replacing all the computationally intensive LLL operations (multiplication, division, and square root) with low-complexity additions and comparisons. The proposed VLSI design utilizes a pipelined architecture that produces an LR-reduced matrix set every 40 cycles, which is a 60% reduction compared to current state-of-the-art LR field-programmable gate array implementations. The 0.13-µm CMOS LR core presented in this paper achieves a clock rate of 352 MHz, and thus is capable of sustaining a throughput of 880 Mb/s for 64-QAM multiple-input-multiple-output detection with superior performance while dissipating 59.4 mW at 1.32 V supply.

Index Terms: Application-specific integrated circuit (ASIC) design, Lenstra, Lenstra, and Lovász (LLL) algorithm, lattice reduction, multiple-input-multiple-output (MIMO) detection, Seysen's algorithm.
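The figures quoted in this abstract (one LR-reduced matrix set every 40 cycles, a 352 MHz clock, 880 Mb/s throughput) can be cross-checked with a little arithmetic. The sketch below is only a consistency check under the assumption that the quoted throughput is limited by the rate at which reduced matrix sets are produced; the function names are illustrative and do not come from the paper.

```python
def matrix_sets_per_second(clock_hz: float, cycles_per_set: int) -> float:
    """Rate of LR-reduced matrix sets for a pipeline that emits one set
    every `cycles_per_set` clock cycles."""
    return clock_hz / cycles_per_set


def bits_per_matrix_set(throughput_bps: float, clock_hz: float,
                        cycles_per_set: int) -> float:
    """Detected bits served by each reduced matrix set, assuming the
    detection throughput is bounded by the matrix-set rate (an assumption,
    not a statement from the abstract)."""
    return throughput_bps * cycles_per_set / clock_hz


if __name__ == "__main__":
    clock_hz = 352e6          # clock rate quoted in the abstract
    cycles_per_set = 40       # cycles per LR-reduced matrix set
    throughput_bps = 880e6    # quoted detection throughput

    print(matrix_sets_per_second(clock_hz, cycles_per_set))
    print(bits_per_matrix_set(throughput_bps, clock_hz, cycles_per_set))
```

With the quoted numbers this works out to 8.8 million reduced matrix sets per second and 100 detected bits per set. How those bits are distributed over antennas and channel uses is not stated in the abstract.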


“…However this implementation only considers slower off-the-shelf FPGA components, including the use of square root and division operations that have not been optimized. The FPGA and application-specific integrated circuit (ASIC) implementation [34] claims to achieve a "fivefold improvement in terms of throughput at the cost of only slightly more FPGA resources" over [26] and [32]. This work uses CORDIC units along with a modification of the LLL algorithm by replacing the size-reduction criterion with the reverse Siegel condition.…”

Section: Existing Work (mentioning)

confidence: 99%

“…For this reason we are unable to provide a direct comparison of our architecture with previously published work. Nevertheless, it is still possible to compare our implementation with three state-of-the-art VLSI implementations of hard-output LRAD-based MIMO detectors [32], [26], [34].…”

Section: Comparisons With Previously Published Work (mentioning)

confidence: 99%

“…Bruderer et al [34]. We caution that the results in Table 3 need to be interpreted carefully, however, since it is well known that hard-output MIMO detectors such as [32], [26] and [34] do not facilitate high-performance iterative receivers involving joint detection and decoding when error-control codes such as turbo codes and LDPC codes are employed [37], [22]. The proposed approach therefore trades off increased latency for improved BER performance and the ability to readily deal with dense constellations, e.g.…”

Section: Component (mentioning)

confidence: 99%


A Digital Signal Processing Architecture for Soft-Output MIMO Lattice Reduction Aided Detection

Murray, Weller. 2013. Design and Architectures for Digital Signal Processing


High performance lattice reduction on heterogeneous computing platform

Jozsa et al. 2014. Journal of Supercomputing


Jozsa, CM.; Domene Oltra, F.; Vidal Maciá, AM.; Piñero Sipán, MG.; González Salvador, A. (2014). High performance lattice reduction on heterogeneous computing platform. Journal of Supercomputing. 70(2):772-785. doi:10.1007/s11227-014-1201-2.

Abstract: The lattice reduction (LR) technique has become very important in many engineering fields. However, its high complexity makes it difficult to use in real-time applications, especially in applications that deal with large matrices. As a solution, the Modified Block LLL (MB-LLL) algorithm was introduced in [10], where several levels of parallelism were exploited: (i) coarse-grained parallelism was achieved by applying the block-reduction concept presented in [15] and (ii) fine-grained parallelism was achieved through the Cost Reduced All-Swap LLL (CR-AS-LLL) algorithm introduced in [10]. In this paper, we present the Cost Reduced MB-LLL (CR-MB-LLL) algorithm, which significantly reduces the computational complexity of the MB-LLL in two ways: by relaxing the first LLL condition while executing the LR of submatrices, which delays the update of the GS coefficients, and by using less costly procedures during the boundary checks. The effects of complexity reduction and implementation details are analyzed and discussed for several architectures. A mapping of the CR-MB-LLL on a heterogeneous platform is proposed and compared with implementations running on a dynamic-parallelism-enabled GPU and a multi-core CPU. The mapping on the proposed architecture allows a dynamic scheduling of kernels in which the overhead introduced is hidden by the use of several CUDA streams. Results show that the CR-MB-LLL algorithm on the heterogeneous platform executes faster than on the multi-core CPU and is more efficient than the CR-AS-LLL algorithm for large matrices.
