Jozsa, CM.; Domene Oltra, F.; Vidal Maciá, AM.; Piñero Sipán, MG.; González Salvador, A. (2014). High performance lattice reduction on heterogeneous computing platform. Journal of Supercomputing. 70(2):772-785. doi:10.1007/s11227-014-1201-2.

Abstract

The lattice reduction (LR) technique has become very important in many engineering fields. However, its high complexity makes it difficult to use in real-time applications, especially those that deal with large matrices. As a solution, the Modified Block LLL (MB-LLL) algorithm was introduced in [10], where several levels of parallelism were exploited: (i) coarse-grained parallelism was achieved by applying the block-reduction concept presented in [15] and (ii) fine-grained parallelism was achieved through the Cost Reduced All-Swap LLL (CR-AS-LLL) algorithm introduced in [10]. In this paper, we present the Cost Reduced MB-LLL (CR-MB-LLL) algorithm, which significantly reduces the computational complexity of MB-LLL in two ways: by relaxing the first LLL condition while executing the LR of submatrices, which delays the update of the Gram-Schmidt (GS) coefficients, and by using less costly procedures during the boundary checks. The effects of complexity reduction and implementation details are analyzed and discussed for several architectures. A mapping of the CR-MB-LLL on a heterogeneous platform is proposed and compared with implementations running on a dynamic parallelism enabled GPU and a multi-core CPU. The mapping on the proposed architecture allows dynamic scheduling of kernels, where the overhead introduced is hidden by the use of several CUDA streams. Results show that the execution time of the CR-MB-LLL algorithm on the heterogeneous platform outperforms the multi-core CPU and that it is more efficient than the CR-AS-LLL algorithm for large matrices.
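As background for the LLL conditions the abstract refers to, here is a minimal sketch of the textbook sequential LLL algorithm in Python. It is not the MB-LLL, CR-AS-LLL or CR-MB-LLL algorithm of the paper. It assumes that the "first LLL condition" means the size-reduction condition |mu[i][j]| <= 1/2, with the Lovász condition as the second one. The function names, the row-vector basis layout and the parameter delta = 3/4 are choices made for this illustration only.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return the orthogonalised vectors and the GS coefficients mu[i][j]."""
    ortho = []
    mu = [[Fraction(0)] * len(basis) for _ in basis]
    for i, b in enumerate(basis):
        v = list(b)
        for j in range(i):
            mu[i][j] = dot(b, ortho[j]) / dot(ortho[j], ortho[j])
            v = [x - mu[i][j] * y for x, y in zip(v, ortho[j])]
        ortho.append(v)
    return ortho, mu


def lovasz_holds(ortho, mu, k, delta):
    """Second LLL condition for the pair of rows (k - 1, k)."""
    lhs = dot(ortho[k], ortho[k])
    rhs = (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
    return lhs >= rhs


def is_lll_reduced(basis, delta=Fraction(3, 4)):
    """Check both LLL conditions for an integer row basis."""
    basis = [[Fraction(x) for x in row] for row in basis]
    ortho, mu = gram_schmidt(basis)
    n = len(basis)
    size_ok = all(abs(mu[i][j]) <= Fraction(1, 2)
                  for i in range(n) for j in range(i))
    return size_ok and all(lovasz_holds(ortho, mu, k, delta)
                           for k in range(1, n))


def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL on a linearly independent integer row basis."""
    basis = [[Fraction(x) for x in row] for row in basis]
    n = len(basis)
    ortho, mu = gram_schmidt(basis)
    k = 1
    while k < n:
        # First condition: size-reduce row k against rows k-1, ..., 0.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q:
                basis[k] = [a - q * b for a, b in zip(basis[k], basis[j])]
                # Recomputed from scratch for clarity, not for speed.
                ortho, mu = gram_schmidt(basis)
        # Second condition: advance, or swap the rows and step back.
        if lovasz_holds(ortho, mu, k, delta):
            k += 1
        else:
            basis[k], basis[k - 1] = basis[k - 1], basis[k]
            ortho, mu = gram_schmidt(basis)
            k = max(k - 1, 1)
    return [[int(x) for x in row] for row in basis]


example = [[1, 1, 1], [-1, 0, 2], [3, 5, 6]]
print(lll(example))
```

The sketch updates the GS coefficients after every size-reduction step. Deferring that update is, according to the abstract, one source of the savings in CR-MB-LLL, and it is not attempted here.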
The energy transmitted per bit limits the radio coverage. In impulse radio, the UWB pulses used carry very little energy because they are extremely short. As a consequence, the radio coverage is unacceptably short. One way to increase the energy per bit is to lengthen the UWB carrier pulse; however, this solution cannot be used directly because the correlation of the received pulse envelope with a reference pulse defined in IEEE Std. 802.15.4a has to exceed a prescribed value. This problem can be overcome if the pulse compression approach is used. This technique, in which the duration of the radiated UWB carrier pulse is enlarged considerably to get enough energy per bit and the duration of the received UWB pulse is compressed by a matched filter at the receiver, is introduced in this contribution. The increased bit energy increases the radio coverage, and the envelope of the compressed UWB pulse satisfies the requirements of IEEE Std. 802.15.4a. The gains in energy per bit are about 18 dB and 22 dB when the UWB pulse durations are set to 100 ns and 300 ns, respectively.
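The pulse compression idea can be illustrated with a small NumPy sketch. The abstract does not say which waveform is used, so the linear chirp below is an assumption, as are the sample rate, the 500 MHz sweep width and all function names. The sketch only shows the two effects described above: at fixed amplitude a longer pulse carries proportionally more energy, and a matched filter compresses the received pulse to a main lobe far shorter than the transmitted one. It is not meant to reproduce the 18 dB and 22 dB figures.

```python
import numpy as np

FS = 10e9          # sample rate in Hz (assumed for this sketch)
BANDWIDTH = 500e6  # chirp sweep width in Hz (assumed for this sketch)


def chirp(duration, fs=FS, bandwidth=BANDWIDTH):
    """Unit-amplitude complex baseband linear chirp sweeping -B/2 .. +B/2."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = bandwidth / duration
    return np.exp(1j * np.pi * rate * (t - duration / 2) ** 2)


def energy(x, fs=FS):
    """Pulse energy as the sum of squared magnitudes times the sample period."""
    return np.sum(np.abs(x) ** 2) / fs


def energy_gain_db(long_pulse, short_pulse):
    """Energy of long_pulse relative to short_pulse, in dB."""
    return 10 * np.log10(energy(long_pulse) / energy(short_pulse))


def matched_filter_output(pulse):
    """Filter the pulse with h[n] = conj(pulse[-n]), i.e. its autocorrelation."""
    return np.convolve(pulse, np.conj(pulse[::-1]))


def width_above_half_power(y, fs=FS):
    """Time span over which |y| stays within 3 dB of its peak."""
    mag = np.abs(y)
    above = np.nonzero(mag >= mag.max() / np.sqrt(2))[0]
    return (above[-1] - above[0] + 1) / fs


pulse_100 = chirp(100e-9)
pulse_300 = chirp(300e-9)
compressed = matched_filter_output(pulse_100)

print("energy gain, 300 ns vs 100 ns [dB]:", energy_gain_db(pulse_300, pulse_100))
print("transmitted duration [ns]:", 100.0)
print("compressed -3 dB width [ns]:", width_above_half_power(compressed) * 1e9)
```

The width of the compressed pulse is set by the sweep width rather than by the transmitted duration, which is why the pulse can be lengthened to gain energy without widening the envelope seen after the matched filter.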
Multiple-input multiple-output (MIMO) systems have attracted considerable attention in wireless communications because they offer a significant increase in data throughput and link coverage without additional bandwidth requirement or increased transmit power. The price that has to be paid is the increased complexity of hardware components and algorithms. The sphere detector (SD) algorithm solves the problem of maximum likelihood (ML) detection for MIMO channels by significantly reducing the search space of possible solutions. The main drawback of the SD algorithm is its sequential nature; consequently, running it on massively parallel architectures (MPAs) is very inefficient. In order to overcome the drawbacks of the SD algorithm, a new parallel sphere detector (PSD) algorithm is proposed. It implements a novel hybrid tree search method, where the algorithm parallelism is assured by the efficient combination of depth-first search and breadth-first search algorithms. A path metric-based parallel sorting is employed at each intermediate stage. The PSD algorithm is able to adjust its memory requirements and extent of parallelism to fit a wide range of parallel architectures. Mapping details for MPAs are proposed by giving the details of thread-dependent, highly parallel building blocks of the algorithm. Based on the building blocks proposed, a mapping to a general-purpose graphics processing unit is provided, and its performance is evaluated. In order to achieve high throughput, several levels of parallelism are introduced, and different scheduling strategies are considered.

In the first approach the robustness of MIMO is maximized, that is, the probability of error is minimized with the use of space-time codes (STCs).
STCs rely on transmitting different representations of the same data stream on different parallel transmit branches, that is, they introduce controlled redundancy in both space and time.

Spatial Multiplexing (SM), the second approach, focuses on maximizing the capacity of a radio link by transmitting independent data streams on different transmit branches simultaneously and within the same frequency band. The price that has to be paid is the increased complexity of detection hardware components and algorithms. The complexity of detection algorithms depends on many factors, such as antenna configuration, modulation order, channel, and coding.

With regard to the bit error rate (BER), the maximum likelihood (ML) detector offers the best performance; however, its exponential complexity is not suitable for real-time applications. The SD algorithm has been proposed in the literature to significantly reduce the search space of possible solutions while still providing the ML solution. For a few good examples, refer to [2-4] and [5].

In non-optimal detectors, the complexity of the sphere detector (SD) algorithm is reduced by introducing some approximations such as (i) early termination of the search, (ii) introducing constraints on the maximum number of nodes that the detector algorithm is allowed ...
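To make the relation between exhaustive ML detection and the sphere detector concrete, the following sketch compares the two on a small real-valued model y = H s + noise. It is a plain sequential depth-first sphere detector with an initially infinite radius, not the PSD algorithm described above. The real-valued 4-PAM alphabet, the 4x4 channel, the noise level and all function names are assumptions made for the illustration.

```python
import itertools

import numpy as np


def ml_brute_force(H, y, alphabet):
    """Exhaustive ML detection: test every symbol vector (exponential cost)."""
    best, best_metric = None, np.inf
    for cand in itertools.product(alphabet, repeat=H.shape[1]):
        s = np.array(cand, dtype=float)
        metric = np.sum((y - H @ s) ** 2)
        if metric < best_metric:
            best, best_metric = s, metric
    return best


def sphere_detect(H, y, alphabet):
    """Depth-first sphere detection on the QR-triangularised system."""
    n = H.shape[1]
    Q, R = np.linalg.qr(H)  # H = Q R with R upper triangular
    z = Q.T @ y
    best = {"s": None, "metric": np.inf}  # the metric acts as squared radius
    s = np.zeros(n)

    def search(level, partial):
        if level < 0:
            # A full symbol vector inside the sphere: shrink the radius.
            best["s"], best["metric"] = s.copy(), partial
            return
        # Contribution of the symbols already fixed at the levels above.
        interference = R[level, level + 1:] @ s[level + 1:]

        def increment(a):
            return (z[level] - interference - R[level, level] * a) ** 2

        # Visit the candidates in order of increasing metric increment.
        for a in sorted(alphabet, key=increment):
            if partial + increment(a) >= best["metric"]:
                break  # this branch and all later ones leave the sphere
            s[level] = a
            search(level - 1, partial + increment(a))

    search(n - 1, 0.0)
    return best["s"]


rng = np.random.default_rng(0)
alphabet = (-3.0, -1.0, 1.0, 3.0)          # real-valued 4-PAM (assumed)
H = rng.standard_normal((4, 4))            # 4x4 channel (assumed)
sent = rng.choice(alphabet, size=4)
y = H @ sent + 0.1 * rng.standard_normal(4)

print("sphere detector:", sphere_detect(H, y, alphabet))
print("brute-force ML :", ml_brute_force(H, y, alphabet))
```

Both functions minimise the same distance, so they return the same symbol vector, but the sphere detector prunes every subtree whose partial metric already exceeds the best full-length metric found so far. The recursion also shows the sequential dependence noted in the text: each pruning decision depends on the radius updated by earlier branches.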