We propose a novel data detection algorithm and a corresponding very-large-scale integration (VLSI) design for massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems. Our algorithm, called ADMIN, performs infinity-norm-constrained equalization using the alternating direction method of multipliers (ADMM). ADMIN is an iterative algorithm that outperforms linear detectors by a large margin when the ratio between the numbers of base-station (BS) and user antennas is small. In the first iteration, ADMIN computes the linear minimum mean-square error (MMSE) solution, which is sufficient when that ratio is large. We develop time-shared and iterative VLSI architectures for LDL-decomposition-based soft-output ADMIN supporting 16- and 32-user systems. We present application-specific integrated circuit (ASIC) designs for base stations with 16 to 64 antennas in 28 nm complementary metal-oxide-semiconductor (CMOS) technology that support up to 64-QAM (quadrature amplitude modulation). The 16-user ADMIN ASIC achieves 303 Mbps while dissipating 85 mW. The 32-user ADMIN ASIC achieves 287 Mbps and 241 Mbps while dissipating 121 mW and 135 mW for 32 and 64 BS antennas, respectively. ADMIN has also been implemented on a Xilinx Virtex-7 field-programmable gate array (FPGA) and is compared with state-of-the-art massive MIMO data detectors.
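To make the idea of infinity-norm-constrained equalization concrete, the following is a minimal sketch of a generic ADMM splitting for box-constrained least squares. It is not the paper's exact update rule or its VLSI dataflow: the real-valued model, the function names `solve` and `box_admm`, and the parameters `alpha`, `beta`, and `iters` are all assumptions made for illustration. With zero initialization, the first x-update reduces to a regularized least-squares solution, which has the same form as a linear MMSE equalizer when `beta` is set to the noise-to-signal ratio.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def box_admm(H, y, alpha=1.0, beta=1.0, iters=50):
    """ADMM for min 0.5*||y - Hx||^2 subject to max|x_i| <= alpha.

    Hypothetical, real-valued illustration; returns the box-feasible
    estimate z and the x computed in the first iteration.
    """
    m, n = len(H), len(H[0])
    # Regularized Gram matrix H^T H + beta*I and matched-filter output H^T y.
    A = [[sum(H[k][i] * H[k][j] for k in range(m)) + (beta if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    Hty = [sum(H[k][i] * y[k] for k in range(m)) for i in range(n)]
    z = [0.0] * n  # copy of x that is kept inside the box
    u = [0.0] * n  # scaled dual variable
    first_x = None
    for _ in range(iters):
        rhs = [Hty[i] + beta * (z[i] - u[i]) for i in range(n)]
        x = solve(A, rhs)  # regularized least-squares step
        if first_x is None:
            first_x = x  # equals (H^T H + beta*I)^-1 H^T y
        # Project x + u onto the box [-alpha, alpha]^n.
        z = [max(-alpha, min(alpha, x[i] + u[i])) for i in range(n)]
        u = [u[i] + x[i] - z[i] for i in range(n)]  # dual update
    return z, first_x


z, first = box_admm([[1.0, 0.0], [0.0, 1.0]], [2.0, -0.5])
print(z, first)
```

In this toy case the first entry is clipped to the box edge while the second converges to its unconstrained value, which is the behavior that separates the constrained estimate from the purely linear one.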
Heterogeneous computing platforms consisting of general-purpose processors (GPPs) and graphics processing units (GPUs) have become commonplace in personal mobile devices and embedded systems. For years, programming these platforms was very tedious, and simultaneous use of all available GPP and GPU resources required low-level programming to ensure efficient synchronization and data transfer between processors. In the last few years, however, several high-level programming frameworks have emerged that enable programmers to describe applications by means of abstractions such as dataflow or Kahn process networks and leave parallel execution, data transfer, and synchronization to be handled by the framework. Unfortunately, even the most advanced high-level programming frameworks have had shortcomings that limit their applicability to certain classes of applications. This paper presents a new, dataflow-flavored programming framework targeting heterogeneous platforms, which differs from previous approaches by allowing GPU-mapped actors to have data-dependent consumption of inputs and production of outputs. Such flexibility is essential for configurable and adaptive applications, which are becoming increasingly common in signal processing. Our experiments show that this feature allows an up to 5× increase in application throughput. The proposed framework is validated by application examples from the video processing and wireless communications domains. In the experiments, the framework is compared to a well-known reference framework, and it is shown that the proposed framework enables both a higher degree of flexibility and better throughput.
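As a rough illustration of what data-dependent token rates mean for a dataflow actor, the sketch below runs a single actor on the CPU whose consumption per firing is set by a control token it reads at run time. It says nothing about how the proposed framework maps actors to GPUs; the names `variable_rate_actor` and `run`, and the token format (a count followed by that many values), are hypothetical.

```python
from collections import deque


def variable_rate_actor(fifo_in, fifo_out):
    """Fire once if enough tokens are available.

    The first token n is a count; the actor then consumes n data tokens
    and produces a single token holding their sum. The number of tokens
    consumed per firing therefore depends on the data itself.
    """
    if not fifo_in:
        return False
    n = fifo_in[0]  # peek at the control token without consuming it
    if len(fifo_in) < n + 1:
        return False  # not enough data yet, so the actor cannot fire
    fifo_in.popleft()
    total = sum(fifo_in.popleft() for _ in range(n))
    fifo_out.append(total)
    return True


def run(tokens):
    """Fire the actor repeatedly until it can no longer fire."""
    fifo_in, fifo_out = deque(tokens), deque()
    while variable_rate_actor(fifo_in, fifo_out):
        pass
    return list(fifo_out)


print(run([2, 10, 20, 3, 1, 1, 1, 0]))  # [30, 3, 0]
```

A static-rate model would have to fix the number of tokens per firing at design time; here the three firings consume three, four, and one token respectively.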
The Adaptive Loop Filter (ALF) is a filter in the High Efficiency Video Coding (HEVC) standard that improves both subjective and objective image quality. The ALF has been shown to be computationally complex, and its complexity has been reduced during the HEVC development process. In the HEVC Test Model HM-7.0, the ALF is a 9×7 cross + 3×3 square shaped filter. This paper presents a programmable application-specific instruction processor for the ALF. The proposed processor processes 1920×1080 (1080p) luminance frames at 30 frames per second when operated at a clock frequency of 311 MHz. Low power consumption and a low gate count make the proposed processor suitable for embedded devices. The processor program code is written in pure C, which allows versatile use of the circuit and updates to the filter functionality without modifying the processor design. To the best of the authors' knowledge, this is the first programmable solution for ALF on embedded devices.
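The stated operating point implies a per-pixel cycle budget, which the short calculation below works out. It assumes only the figures quoted above (311 MHz, 1920×1080 luminance samples, 30 frames per second); the function name `cycles_per_pixel` is introduced here for illustration.

```python
def cycles_per_pixel(clock_hz, width, height, fps):
    """Average number of clock cycles available per pixel at a frame rate."""
    return clock_hz / (width * height * fps)


budget = cycles_per_pixel(311e6, 1920, 1080, 30)
print(f"{budget:.2f} cycles per luminance pixel")  # 5.00
```

In other words, the processor has roughly five clock cycles on average to filter each luminance sample.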
An application-specific programmable processor is designed based on the analysis of a set of greedy recovery Compressive Sensing (CS) algorithms. The solution is flexible and customizable for a wide range of problem dimensions, as well as algorithms. The versatility of the approach is demonstrated by implementing the Orthogonal Matching Pursuit, Approximate Message Passing, and Normalized Iterative Hard Thresholding algorithms, all using a high-level language. The Transport Triggered Architecture (TTA) framework is employed for the efficient implementation of macro operations shared by the algorithms. The performance of the CS algorithms on ARM Cortex-A15 and NIOS II processors has also been investigated, and empirical comparisons are presented. The flexible hardware design implemented on an FPGA achieves up to 7.80 Ksample/s recovery at an energy dissipation of 42 μJ/sample and beats both ARM and NIOS in total power consumption.
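As an illustration of the kind of kernel these recovery algorithms share (matrix-vector products followed by a sparsifying step), here is a simplified sketch of iterative hard thresholding with the normalized step size. It omits the step-size acceptance check used by the full Normalized Iterative Hard Thresholding algorithm when the support changes, and it is not the paper's TTA implementation; the names `top_k_support` and `niht_sketch` and the parameter `iters` are assumptions.

```python
def top_k_support(v, k):
    """Indices of the k largest-magnitude entries of v."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    return set(order[:k])


def niht_sketch(A, y, k, iters=50):
    """Recover a k-sparse x from y = A x (simplified, hypothetical sketch)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = y - A x and gradient direction g = A^T r.
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Use the current support, or the top-k entries of g when x is zero.
        support = {j for j in range(n) if x[j] != 0.0} or top_k_support(g, k)
        g_s = [g[j] if j in support else 0.0 for j in range(n)]
        Ag = [sum(A[i][j] * g_s[j] for j in range(n)) for i in range(m)]
        denom = sum(v * v for v in Ag)
        if denom == 0.0:
            break  # gradient vanishes on the support, nothing left to do
        mu = sum(v * v for v in g_s) / denom  # normalized step size
        cand = [x[j] + mu * g[j] for j in range(n)]
        keep = top_k_support(cand, k)  # hard thresholding to k entries
        x = [cand[j] if j in keep else 0.0 for j in range(n)]
    return x


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(niht_sketch(identity, [0.0, 3.0, 0.0, -2.0], k=2))
```

With an identity measurement matrix the sparse vector is recovered in a single step, which makes the example easy to check by hand.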
Low-level sensory data processing in many Internet-of-Things (IoT) devices pursues energy efficiency by utilizing sleep modes or slowing the clock to the minimum. To curb the share of stand-by power dissipation in those designs, near-threshold or sub-threshold operating points or ultra-low-leakage fabrication processes are employed. These choices limit the clock rates significantly, reducing the computing throughput of individual processing cores. In this contribution, we explore compensating for the performance loss of operating in the near-threshold region (Vdd = 0.6 V) through massive parallelization. The benefits of near-threshold operation and massive parallelism are optimum energy consumption per instruction and minimized memory round trips, respectively. The Processing Elements (PEs) of the design are based on the Transport Triggered Architecture. The fine-grained programmable parallel solution allows for fast and efficient computation of learnable low-level features (e.g., local binary descriptors and convolutions). Other operations, including max-pooling, have also been implemented. The programmable design achieves excellent energy efficiency for Local Binary Patterns computations.
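For readers unfamiliar with Local Binary Patterns, the following minimal sketch computes the basic 3×3 descriptor in plain Python. It only illustrates the operation itself, not the parallel design; the neighbor ordering (clockwise from the top-left, least significant bit first), the `>=` comparison, and the names `OFFSETS` and `lbp_codes` are assumptions, since several conventions exist.

```python
# Neighbor offsets (row, column), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """Return the 8-bit LBP code of every interior pixel of a 2-D list."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set the bit when the neighbor is at least as bright.
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        out.append(row)
    return out


print(lbp_codes([[5, 9, 1],
                 [4, 6, 7],
                 [2, 8, 3]]))  # [[42]]
```

Each output pixel depends only on a small local window and uses nothing but comparisons and bit operations, which is why the computation parallelizes well across many simple processing elements.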