The use of many-core processors such as general-purpose Graphics Processing Units (GPUs) has recently become attractive for the efficient implementation of signal processing algorithms for communication systems. This is due to the cost-effectiveness of GPUs together with their potential for parallel processing. This paper presents an implementation of the widely employed fixed-complexity sphere decoder on GPUs, which considerably decreases the computational time required for the data detection stage in multiple-input multiple-output systems. Both the hard- and soft-output versions of the method have been implemented. Speedup results show that the proposed GPU implementation runs faster than a parallel execution of the methods on a high-performance multi-core CPU. In addition, the throughput of the algorithm is evaluated and is shown to outperform other recent implementations and to fulfill the real-time requirements of several LTE configurations.
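As a rough illustration of the hard-output fixed-complexity sphere decoder named in the abstract above, the following pure-Python sketch enumerates every constellation symbol on the first detected layer(s) and then completes each candidate by successive interference cancellation, keeping the candidate with the smallest Euclidean distance. It is a minimal sketch under stated assumptions, not the paper's GPU implementation: the function names, the QPSK alphabet, the `n_full` parameter and the example channel are all hypothetical, and the stream ordering usually applied before detection, as well as the soft-output variant, are omitted. The candidates in the outer loop do not depend on one another, which is the property a parallel implementation would presumably exploit.

```python
from itertools import product

# Assumed alphabet for illustration: unnormalized QPSK.
QPSK = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]

def qr_gram_schmidt(H):
    """Thin QR of a complex matrix H (list of rows) via modified Gram-Schmidt.
    Returns Q as a list of orthonormal columns and R as an upper-triangular matrix."""
    n_rx, n_tx = len(H), len(H[0])
    cols = [[H[i][j] for i in range(n_rx)] for j in range(n_tx)]
    Q, R = [], [[0j] * n_tx for _ in range(n_tx)]
    for j in range(n_tx):
        v = cols[j][:]
        for k in range(j):
            R[k][j] = sum(q.conjugate() * x for q, x in zip(Q[k], v))
            v = [x - R[k][j] * q for x, q in zip(v, Q[k])]
        norm = sum(abs(x) ** 2 for x in v) ** 0.5
        R[j][j] = complex(norm)
        Q.append([x / norm for x in v])
    return Q, R

def fsd_detect(H, y, constellation=QPSK, n_full=1):
    """Hard-output detection: full expansion on the n_full layers detected first,
    single-path interference cancellation on the remaining layers."""
    Q, R = qr_gram_schmidt(H)
    n_tx = len(R)
    # Rotate the received vector: z = Q^H y, so that z = R s + rotated noise.
    z = [sum(q.conjugate() * v for q, v in zip(Q[k], y)) for k in range(n_tx)]
    best, best_dist = None, float("inf")
    # Each candidate below is independent of the others.
    for head in product(constellation, repeat=n_full):
        s = [0j] * n_tx
        s[n_tx - n_full:] = head          # fully expanded layers
        dist = 0.0
        for k in range(n_tx - 1, -1, -1):  # back-substitution order
            interference = sum(R[k][j] * s[j] for j in range(k + 1, n_tx))
            residual = z[k] - interference
            if k < n_tx - n_full:
                # Single-expansion layer: keep only the closest symbol.
                s[k] = min(constellation,
                           key=lambda c: abs(residual - R[k][k] * c))
            dist += abs(residual - R[k][k] * s[k]) ** 2
        if dist < best_dist:
            best, best_dist = s, dist
    return best

# Noise-free toy example with a made-up 3x3 channel.
H = [[1 + 1j, 0.5, 0.2j],
     [0.2 - 0.3j, 1 - 0.5j, 0.1],
     [0.3, -0.4j, 0.8 + 0.6j]]
s_true = [1 + 1j, -1 + 1j, 1 - 1j]
y = [sum(H[i][j] * s_true[j] for j in range(3)) for i in range(3)]
s_hat = fsd_detect(H, y)
```

With `n_full=1` and QPSK the detector evaluates exactly four candidates regardless of the noise level, which is what makes the complexity fixed. In the noise-free example above, `s_hat` should match `s_true`.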
Nowadays, several communication standards are emerging and evolving in pursuit of higher transmission rates, reliability and coverage. This expansion is primarily driven by the continued increase in the consumption of mobile multimedia services due to the emergence of new handheld devices such as smartphones and tablets. One of the most significant techniques employed to meet these demands is the use of multiple transmit and receive antennas, known as MIMO (Multiple Input Multiple Output) systems. This technology increases both the transmission rate and the quality of the transmission. MIMO technologies have become essential in several wireless and broadband standards such as Wireless Local Area Network (WLAN), Worldwide Interoperability for Microwave Access (WiMAX), Long Term Evolution (LTE) and Digital Video Broadcasting - Next Generation Handheld (DVB-NGH), the latter for the reception of Digital Terrestrial Television (DTT) in handheld devices. These technologies will also be incorporated in future standards, so a great deal of research in this field is expected in the coming years. Clearly, the study of MIMO systems is critical in current research; however, the problems that arise from this technology are very complex. High Performance Computing (HPC) systems, and specifically modern hardware architectures such as multi-core and many-core processors (e.g., Graphics Processing Units (GPUs)), are playing a key role in the development of efficient and low-complexity algorithms for MIMO transmissions. Proof of this is that the number of scientific contributions and research projects related to their use has increased in recent years. Also, some high-performance libraries have been implemented as tools for researchers or companies involved in the development of future communication standards. 
Two of the most popular libraries are IT++, which is based on optimized libraries for multi-core processors, and the Communications System Toolbox, designed for use with MATLAB and Simulink, which uses GPU computing. However, no library is able to run on a heterogeneous platform using all the available resources. In view of the high computational requirements of MIMO application research and the shortage of tools able to satisfy them, we have made a special effort to develop a library that eases the development of parallel applications adaptable to the different architectures of the executing platform. The library, called MIMOPack, aims to implement efficiently, using parallel computing, a set of functions that perform some of the critical stages of MIMO communication system simulation. The main contribution of the thesis is the implementation of efficient hard- and soft-output detectors, since the detection stage is considered the most complex part of the communication process. These detectors are highly configurable and many of them include preprocessing techniques that reduce the computational cost and increase the perf...
Multi-core systems allow the efficient implementation of signal processing algorithms for communication systems due to their high parallel processing capabilities. In this paper, we present a high-throughput multi-core implementation of a fixed-complexity tree-search-based detector of interest for MIMO wireless communication systems. Experimental results confirm that this implementation accelerates the data detection stage for different constellation sizes and numbers of subcarriers.
The number of transmit and receive antennas is an important factor that affects the performance and complexity of a MIMO system. MIMO systems with a very large number of antennas are a promising candidate technology for the next generations of wireless systems. However, the vast majority of the methods proposed for conventional MIMO systems are not suitable for large dimensions. In this context, the use of High Performance Computing (HPC) systems, such as multicore CPUs and Graphics Processing Units (GPUs), has become attractive for the efficient implementation of parallel signal processing algorithms with high computational requirements. In the present work, two practical parallel approaches to the Subspace Marginalization with Interference Suppression (SUMIS) detector for large MIMO systems are proposed. Both approaches have been evaluated and compared with other detectors in terms of performance and complexity for different system parameters.
The fast parallel processing capability of general-purpose Graphics Processing Units (GPUs) can be exploited to accelerate the precoding calculation needed in spatially multiplexed wireless communication systems. In this paper, a GPU-based implementation of the well-known multiuser Tomlinson-Harashima precoding (THP) scheme combined with a lattice-reduction (LR) stage is presented. The proposed approach allows the LR stage to be switched off when user requirements are achieved by using only THP. Moreover, our GPU implementation provides scalability in the number of subcarriers per symbol, which is a key factor in LTE and 4G wireless standards. Simulation results show that the GPU-based THP implementation performs up to 7 times faster than its CPU equivalent, whereas the LR stage implementation only achieves a speedup of 3. Although the LR stage cannot be parallelized as efficiently as THP, a speedup of nearly 6 is achieved when both are combined.
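The three speedup figures quoted above can be related with a simple two-stage timing model. The sketch below is illustrative only: it assumes that the CPU runtime is just the sum of the THP and LR stages, and that the quoted figures (7, 3 and roughly 6) all refer to the same configuration. The abstract does not state either assumption, and the function names are hypothetical.

```python
def combined_speedup(frac_thp, s_thp, s_lr):
    """Overall speedup when a fraction frac_thp of the CPU runtime is
    accelerated by s_thp and the remaining fraction by s_lr."""
    return 1.0 / (frac_thp / s_thp + (1.0 - frac_thp) / s_lr)

def implied_thp_fraction(s_total, s_thp, s_lr):
    """Invert the model: share of CPU runtime spent in THP that would
    yield an overall speedup of s_total."""
    return (1.0 / s_lr - 1.0 / s_total) / (1.0 / s_lr - 1.0 / s_thp)

# Figures taken from the abstract; treating them as simultaneous is an assumption.
frac = implied_thp_fraction(s_total=6.0, s_thp=7.0, s_lr=3.0)
check = combined_speedup(frac, s_thp=7.0, s_lr=3.0)
```

Under this model, a combined speedup close to 6 is only reachable if THP accounts for most of the CPU runtime (`frac` evaluates to 0.875 here), so the less parallelizable LR stage has a limited effect on the total.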