Ilkka Hautala scite author profile

We propose a novel data detection algorithm and a corresponding very large scale integration (VLSI) design for massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems. Our algorithm uses alternating direction method of multipliers (ADMM)-based infinity-norm-constrained equalization and is called ADMIN. ADMIN is an iterative algorithm that outperforms linear detectors by a large margin when the ratio between the numbers of base-station (BS) and user antennas is small. In the first iteration, ADMIN computes the linear minimum mean-square error (MMSE) solution, which is sufficient when the ratio between the numbers of BS and user antennas is large. We develop time-shared and iterative VLSI architectures for LDL-decomposition-based soft-output ADMIN supporting 16- and 32-user systems. We present application-specific integrated circuit (ASIC) designs for 16 to 64 antenna base stations in 28 nm complementary metal-oxide-semiconductor (CMOS) that support up to 64 quadrature amplitude modulation (QAM). The 16-user ADMIN ASIC achieves 303 Mbps while dissipating 85 mW. The 32-user ADMIN ASIC achieves 287 Mbps and 241 Mbps while dissipating 121 mW and 135 mW for 32 and 64 BS antennas, respectively. ADMIN has also been implemented on a Xilinx Virtex-7 field-programmable gate array (FPGA) and is compared with state-of-the-art massive MIMO data detectors.
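The iteration structure described in this abstract can be sketched as a generic ADMM loop for box-constrained least squares. This is a minimal, real-valued illustration and not the authors' implementation: the function names `solve` and `admm_box_detect`, the parameters `alpha` (box radius) and `beta` (penalty and regularization weight), and the use of plain Gaussian elimination in place of the LDL decomposition are all assumptions made here. With zero initialization, the first iterate is the regularized least-squares solution, which coincides with the linear MMSE estimate when `beta` equals the noise-to-signal ratio.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Stand-in for the LDL decomposition mentioned in the abstract (assumption).
    """
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def admm_box_detect(H, y, alpha, beta, iters=10):
    """ADMM for: minimize 0.5*||y - H x||^2 subject to |x_i| <= alpha.

    Returns the final box-feasible estimate z and the list of x iterates.
    """
    rows, cols = len(H), len(H[0])
    # Regularized Gram matrix G = H^T H + beta*I and matched filter H^T y.
    G = [[sum(H[k][i] * H[k][j] for k in range(rows)) + (beta if i == j else 0.0)
          for j in range(cols)] for i in range(cols)]
    ymf = [sum(H[k][i] * y[k] for k in range(rows)) for i in range(cols)]
    z = [0.0] * cols    # box-constrained copy of x
    lam = [0.0] * cols  # scaled dual variable
    history = []
    for _ in range(iters):
        # x-update: regularized least squares pulled toward z - lam.
        rhs = [ymf[i] + beta * (z[i] - lam[i]) for i in range(cols)]
        x = solve(G, rhs)
        # z-update: projection onto the infinity-norm ball of radius alpha.
        z = [min(alpha, max(-alpha, x[i] + lam[i])) for i in range(cols)]
        # Dual update: accumulate the constraint violation x - z.
        lam = [lam[i] + x[i] - z[i] for i in range(cols)]
        history.append(x)
    return z, history
```

In this sketch the projection step is a per-element clip, which is the only part of the loop that differs from a purely linear detector.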

Programmable Low-Power Multicore Coprocessor Architecture for HEVC/H.265 In-Loop Filtering

Hautala

Boutellier

Hannuksela

et al. 2015

IEEE Trans. Circuits Syst. Video Technol.

Programmable 28nm coprocessor for HEVC/H.265 in-loop filters

Hautala

Boutellier

Silvén

2016

Executing Dynamic Data Rate Actor Networks on OpenCL Platforms

Boutellier

Hautala

2016

Heterogeneous computing platforms consisting of general purpose processors (GPPs) and graphics processing units (GPUs) have become commonplace in personal mobile devices and embedded systems. For years, programming these platforms was very tedious, and simultaneous use of all available GPP and GPU resources required low-level programming to ensure efficient synchronization and data transfer between processors. However, in the last few years several high-level programming frameworks have emerged, which enable programmers to describe applications by means of abstractions such as dataflow or Kahn process networks and leave parallel execution, data transfer and synchronization to be handled by the framework. Unfortunately, even the most advanced high-level programming frameworks have had shortcomings that limit their applicability to certain classes of applications. This paper presents a new, dataflow-flavored programming framework targeting heterogeneous platforms, which differs from previous approaches by allowing GPU-mapped actors to have data-dependent consumption of inputs and production of outputs. Such flexibility is essential for configurable and adaptive applications that are becoming increasingly common in signal processing. In our experiments it is shown that this feature allows up to a 5× increase in application throughput. The proposed framework is validated by application examples from the video processing and wireless communications domains. In the experiments the framework is compared to a well-known reference framework and it is shown that the proposed framework enables both a higher degree of flexibility and better throughput.
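The difference between static and data-dependent token rates mentioned in this abstract can be illustrated with a small CPU-only sketch. It is a conceptual example under assumptions made here, not the framework's API: the names `scale_actor`, `threshold_actor`, and `run_network` are hypothetical, and plain Python queues stand in for the FIFO channels between actors.

```python
from collections import deque


def scale_actor(inp, out):
    """Static-rate actor: each firing consumes one token and produces one."""
    out.append(2 * inp.popleft())


def threshold_actor(inp, out, threshold):
    """Dynamic-rate actor: each firing consumes one token but produces
    zero or one token, depending on the value of the consumed token."""
    token = inp.popleft()
    if token > threshold:
        out.append(token)


def run_network(samples, threshold):
    """Run source -> scale -> threshold -> sink until all tokens are consumed."""
    src_to_scale = deque(samples)
    scale_to_thr = deque()
    sink = deque()
    # Fire each actor whenever its input channel holds a token.
    while src_to_scale or scale_to_thr:
        if src_to_scale:
            scale_actor(src_to_scale, scale_to_thr)
        if scale_to_thr:
            threshold_actor(scale_to_thr, sink, threshold)
    return list(sink)
```

For the static-rate actor the number of output tokens is known before execution, while for the dynamic-rate actor it depends on the data, which is why a runtime has to handle buffering and scheduling for such actors differently.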

Programmable low-power implementation of the HEVC Adaptive Loop Filter

Hautala

Boutellier

Hannuksela

2013

The Adaptive Loop Filter (ALF) is a filter that improves subjective and objective image quality in the High Efficiency Video Coding (HEVC) standard. The ALF has been shown to be computationally complex, and its complexity has been reduced during the HEVC development process. In the HEVC Test Model HM-7.0, the ALF is a 9×7 cross + 3×3 square shaped filter. This paper presents a programmable application specific instruction processor for the ALF. The proposed processor processes 1920×1080p luminance frames at 30 frames per second when operated at a clock frequency of 311 MHz. Low power consumption and a low gate count make the proposed processor suitable for embedded devices. The processor program code is written in pure C, which allows versatile use of the circuit and updates to the filter functionality without modifying the processor design. To the authors' best knowledge, this is the first programmable solution for ALF on embedded devices.
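The figures quoted in this abstract imply a per-pixel cycle budget and a filter footprint size, which can be worked out with a short calculation. This is a rough sketch under assumptions made here: the cross is taken to be 9 samples wide and 7 samples tall with the 3×3 square centered on it, and the budget ignores chroma, control overhead, and memory stalls. The helper names `cycles_per_pixel` and `alf_footprint` are hypothetical.

```python
def cycles_per_pixel(clock_hz, width, height, fps):
    """Average clock cycles available per luminance pixel."""
    return clock_hz / (width * height * fps)


def alf_footprint(cross_w=9, cross_h=7, square=3):
    """Boolean mask of a cross plus a centered square filter footprint."""
    cy, cx = cross_h // 2, cross_w // 2
    half = square // 2
    return [[(r == cy)                      # horizontal arm of the cross
             or (c == cx)                   # vertical arm of the cross
             or (abs(r - cy) <= half and abs(c - cx) <= half)  # square
             for c in range(cross_w)]
            for r in range(cross_h)]


budget = cycles_per_pixel(311e6, 1920, 1080, 30)
taps = sum(cell for row in alf_footprint() for cell in row)
print(f"cycle budget per luma pixel: {budget:.2f}")
print(f"filter taps in footprint: {taps}")
```

Under these assumptions the budget comes out just under 5 clock cycles per luminance pixel for a footprint of 19 taps, which gives a sense of how much parallelism each cycle has to deliver.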

An Embedded Programmable Processor for Compressive Sensing Applications

Safarpour

Hautala

Silvén

2018

An application specific programmable processor is designed based on the analysis of a set of greedy recovery Compressive Sensing (CS) algorithms. The solution is flexible and customizable for a wide range of problem dimensions, as well as algorithms. The versatility of the approach is demonstrated by implementing the Orthogonal Matching Pursuit, Approximate Message Passing and Normalized Iterative Hard Thresholding algorithms, all using a high-level language. The Transport Triggered Architecture (TTA) framework is employed for the efficient implementation of macro operations shared by the algorithms. The performance of the CS algorithms on ARM Cortex-A15 and NIOS II processors has also been investigated, and empirical comparisons are presented. The flexible hardware design implemented on an FPGA achieves up to 7.80 Ksample/s recovery at an energy dissipation of 42 μJ/sample and beats both ARM and NIOS in total power consumption.
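One of the operations such recovery algorithms rely on is hard thresholding, which keeps only the largest-magnitude entries of an estimate. The sketch below shows a single iteration of plain iterative hard thresholding with a fixed step size `mu`. It is an illustration under assumptions made here: the normalized variant named in the abstract chooses the step adaptively, which is not reproduced, and the function names `hard_threshold` and `iht_step` are hypothetical rather than the processor's macro operations.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def iht_step(A, y, x, k, mu=1.0):
    """One iteration: x <- H_k(x + mu * A^T (y - A x))."""
    rows, cols = len(A), len(A[0])
    # Residual of the current estimate in the measurement domain.
    residual = [y[r] - sum(A[r][c] * x[c] for c in range(cols))
                for r in range(rows)]
    # Back-projection of the residual (gradient step direction).
    grad = [sum(A[r][c] * residual[r] for r in range(rows))
            for c in range(cols)]
    return hard_threshold([x[c] + mu * grad[c] for c in range(cols)], k)
```

The matrix-vector products and the selection of the `k` largest entries dominate the work per iteration, which is the kind of shared kernel a custom processor can accelerate.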

Programmable data parallel accelerator for mobile computer vision

Nylanden

Kultala

Hautala

et al. 2015

Transport Triggered Array Processor for Vision Applications

Safarpour

Hautala

López

et al. 2019

Low-level sensory data processing in many Internet-of-Things (IoT) devices pursues energy efficiency by utilizing sleep modes or slowing the clocking to the minimum. To curb the share of stand-by power dissipation in those designs, near-threshold or sub-threshold operating points or ultra-low-leakage fabrication processes are employed. Those limit the clocking rates significantly, reducing the computing throughput of individual processing cores. In this contribution we explore compensating for the performance loss of operating in the near-threshold region (Vdd = 0.6 V) through massive parallelization. The benefits of near-threshold operation and massive parallelism are optimum energy consumption per instruction and minimized memory roundtrips, respectively. The Processing Elements (PEs) of the design are based on the Transport Triggered Architecture. The fine-grained programmable parallel solution allows for fast and efficient computation of learnable low-level features (e.g. local binary descriptors and convolutions). Other operations, including max-pooling, have also been implemented. The programmable design achieves excellent energy efficiency for Local Binary Patterns computations.
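For readers unfamiliar with Local Binary Patterns, the basic 3×3 operator can be sketched as follows. This is a plain sequential illustration under assumptions made here, not the processor's parallel implementation: the neighbor ordering (clockwise from the top-left, first neighbor as the least significant bit) is one common convention, and the names `OFFSETS`, `lbp_code`, and `lbp_image` are hypothetical.

```python
# Neighbor offsets (row, col), clockwise from the top-left corner.
# The ordering is a convention chosen for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code of the pixel at (r, c); requires a full 3x3 neighborhood."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # Set the bit when the neighbor is at least as bright as the center.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(img):
    """LBP codes for all interior pixels (border pixels are skipped)."""
    h, w = len(img), len(img[0])
    return [[lbp_code(img, r, c) for c in range(1, w - 1)]
            for r in range(1, h - 1)]
```

Each output pixel depends only on its own 3×3 neighborhood, so the computation parallelizes naturally across many processing elements.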

ADMM-Based Infinity-Norm Detection for Massive MIMO: Algorithm and VLSI Architecture
