Executing Dynamic Data Rate Actor Networks on OpenCL Platforms

Boutellier, Jani; Hautala, Ilkka

doi:10.1109/sips.2016.25

Cited by 6 publications (7 citation statements)

References 23 publications

“…The frame size used was 1920×1080, which resulted in the token size becoming 1.98 megabytes. Due to the large token size, the token rate was kept at 1 (in our previous publication [6] that uses a preliminary version of PRUNE, resolution was 320x240 with a token rate of 4). GPU acceleration was applied to Motion Detection by mapping the Gauss, Thres and Med actors to the GPU.…”

Section: Results (mentioning)

confidence: 99%
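The 1.98-megabyte figure in the excerpt matches one byte per pixel (for example, 8-bit grayscale) counted in binary megabytes (2^20 bytes). The excerpt does not state the pixel format, so that is an assumption here. A minimal sketch of the arithmetic, with hypothetical helper names:

```python
# Sketch: token size of one video frame, ASSUMING 1 byte per pixel
# (e.g. 8-bit grayscale) and binary megabytes (1 MiB = 2**20 bytes).
# The pixel format is not stated in the quoted excerpt.

BYTES_PER_PIXEL = 1  # assumption, not from the source
MIB = 2 ** 20        # bytes per binary megabyte


def frame_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Return the size in bytes of one width x height frame token."""
    return width * height * bytes_per_pixel


def bytes_per_firing(width: int, height: int, token_rate: int) -> int:
    """Bytes moved through a FIFO per actor firing at the given token rate."""
    return token_rate * frame_bytes(width, height)


if __name__ == "__main__":
    full_hd = frame_bytes(1920, 1080)
    print(f"1920x1080 token: {full_hd} bytes = {full_hd / MIB:.2f} MiB")
    # Earlier setup mentioned in the excerpt: 320x240 frames, token rate 4.
    small = bytes_per_firing(320, 240, token_rate=4)
    print(f"320x240 at rate 4: {small} bytes = {small / MIB:.2f} MiB per firing")
```

Under this assumption, one Full HD token at rate 1 moves 6.75 times as many bytes per firing (2,073,600 vs. 307,200) as the earlier 320×240 setup at rate 4.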

“…Finally, compilation with the target-specific C compiler requires the PRUNE run-time library, which contains application independent actor wrappers, FIFO implementations and OpenCL support. The following detailed description of the PRUNE runtime framework contains some extension compared to our preliminary work [6], where it was first presented. For example, Equation 2 has been generalized to support token delays > 1.…”

Section: The PRUNE Framework (mentioning)

confidence: 99%
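The excerpt lists FIFO implementations among the contents of the run-time library. The actual PRUNE FIFOs are C code that is not shown here. As a language-neutral illustration only, the sketch below implements a bounded ring-buffer FIFO and the usual dataflow firing rule: an actor fires only if its input FIFO holds enough tokens and its output FIFO has enough free space. `BoundedFifo`, `can_fire` and `fire_doubler` are hypothetical names.

```python
from typing import Iterable, List


class BoundedFifo:
    """Hypothetical fixed-capacity FIFO (ring buffer) between two actors."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf: List[object] = [None] * capacity
        self._capacity = capacity
        self._head = 0   # index of the oldest token
        self._count = 0  # number of tokens currently stored

    def tokens(self) -> int:
        return self._count

    def space(self) -> int:
        return self._capacity - self._count

    def write(self, items: Iterable[object]) -> None:
        items = list(items)
        if len(items) > self.space():
            raise OverflowError("not enough free space in FIFO")
        for item in items:
            tail = (self._head + self._count) % self._capacity
            self._buf[tail] = item
            self._count += 1

    def read(self, n: int) -> List[object]:
        if n > self._count:
            raise IndexError("not enough tokens in FIFO")
        out = []
        for _ in range(n):
            out.append(self._buf[self._head])
            self._buf[self._head] = None  # drop the reference
            self._head = (self._head + 1) % self._capacity
            self._count -= 1
        return out


def can_fire(inp: BoundedFifo, consume: int, out: BoundedFifo, produce: int) -> bool:
    """Firing rule: enough input tokens and enough output space."""
    return inp.tokens() >= consume and out.space() >= produce


def fire_doubler(inp: BoundedFifo, out: BoundedFifo, rate: int) -> bool:
    """Hypothetical actor: consumes `rate` tokens and emits each one doubled."""
    if not can_fire(inp, rate, out, rate):
        return False
    tokens = inp.read(rate)
    out.write([2 * x for x in tokens])
    return True


if __name__ == "__main__":
    a, b = BoundedFifo(4), BoundedFifo(4)
    a.write([1, 2, 3])
    print(fire_doubler(a, b, 2))  # prints True; b now holds [2, 4]
    print(fire_doubler(a, b, 2))  # prints False; only one token left in a
```

Because the firing check runs before any token is read, a failed firing leaves both FIFOs untouched, so a scheduler can simply retry the actor later.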

“…The MoC presented in this article has been published [6] recently, and in this article the MoC is complemented with design rules that enable decidability analysis. Based on the MoC, the article describes a novel Linux-…” (arXiv:1802.06625v1 [cs.DC], 19 Feb 2018)

Section: Introduction (mentioning)

confidence: 99%

“…PRUNE 1) implements application consistency analysis, 2) provides an efficient runtime memory- and concurrency management framework for heterogeneous platforms, 3) presents a compile-time translator that allows importing programs from previous similar run-time frameworks. Out of these, items 1) and 3) are novel compared to [6].…”

Section: Introduction (mentioning)

confidence: 99%
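Item 1) in the excerpt above refers to application consistency analysis. For the static (SDF) special case, consistency is classically checked by solving the balance equations q[src] · prod = q[dst] · cons for every edge. The graph is consistent if a positive integer repetition vector q exists. The sketch below implements that textbook check only. It is not PRUNE's analysis, which also has to cover dynamic rates under its design rules, and the `(src, prod, dst, cons)` edge format is a hypothetical choice.

```python
from fractions import Fraction
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    actors: list of actor names (the graph is assumed to be connected)
    edges:  list of (src, prod, dst, cons) tuples with positive integer rates
    Returns the smallest positive integer repetition vector as a dict,
    or None if the rates are inconsistent.
    """
    rate = {actors[0]: Fraction(1)}  # firing rates relative to the first actor
    changed = True
    while changed:  # propagate relative rates along edges until a fixed point
        changed = False
        for src, prod, dst, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Every edge, including those that close a cycle, must balance.
    for src, prod, dst, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            return None
    # Scale the rational solution to the smallest integer vector.
    scale = lcm(*(r.denominator for r in rate.values()))
    ints = {a: int(r * scale) for a, r in rate.items()}
    common = gcd(*ints.values())
    return {a: v // common for a, v in ints.items()}


if __name__ == "__main__":
    # A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
    print(repetition_vector(["A", "B", "C"],
                            [("A", 2, "B", 3), ("B", 1, "C", 2)]))
    # prints {'A': 3, 'B': 2, 'C': 1}
    # A two-actor cycle whose rates cannot balance.
    print(repetition_vector(["A", "B"], [("A", 1, "B", 1), ("B", 1, "A", 2)]))
    # prints None
```

A consistent repetition vector is what makes a bounded-memory periodic schedule possible. It does not by itself prove deadlock freedom, which also requires checking that the initial tokens on cycles let a complete schedule run.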

PRUNE: Dynamic and Decidable Dataflow for Signal Processing on Heterogeneous Platforms

Boutellier, Huttunen, et al. (2018), IEEE Trans. Signal Process.

Abstract: The majority of contemporary mobile devices and personal computers are based on heterogeneous computing platforms that consist of a number of CPU cores and one or more Graphics Processing Units (GPUs). Despite the high volume of these devices, there are few existing programming frameworks that target full and simultaneous utilization of all CPU and GPU devices of the platform. This article presents a dataflow-flavored Model of Computation (MoC) that has been developed for deploying signal processing applications to heterogeneous platforms. The presented MoC is dynamic and allows describing applications with data-dependent run-time behavior. On top of the MoC, formal design rules are presented that enable application descriptions to be simultaneously dynamic and decidable. Decidability guarantees compile-time application analyzability for deadlock freedom and bounded memory. The presented MoC and the design rules are realized in a novel Open Source programming environment "PRUNE" and demonstrated with representative application examples from the domains of image processing, computer vision and wireless communications. Experimental results show that the proposed approach outperforms the state-of-the-art in analyzability, flexibility and performance.
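The abstract describes a MoC that is dynamic, with token rates that may depend on run-time data, yet decidable for bounded memory. As a purely illustrative sketch, assume a hypothetical actor whose per-firing consumption rate arrives on a control input and must lie in a range [0, max_rate] fixed before execution. A bound of that kind is one way to keep buffer sizes analyzable. This is not a transcription of PRUNE's design rules, and `DynamicRateActor`, `can_fire` and `fire` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DynamicRateActor:
    """Hypothetical actor whose consumption rate is chosen per firing.

    The rate arrives as a control token and must lie in [0, max_rate].
    max_rate is declared before execution, so in this sketch a single
    firing never needs more than max_rate data tokens.
    """
    max_rate: int
    results: List[int] = field(default_factory=list)

    def can_fire(self, control: List[int], data: List[int]) -> bool:
        # Firing rule: a control token is present and enough data tokens
        # are available for the rate that control token announces.
        return bool(control) and len(data) >= control[0]

    def fire(self, control: List[int], data: List[int]) -> int:
        rate = control[0]
        if not 0 <= rate <= self.max_rate:
            raise ValueError(f"rate {rate} outside [0, {self.max_rate}]")
        if len(data) < rate:
            raise RuntimeError("not enough data tokens to fire")
        control.pop(0)                  # consume the control token
        taken = data[:rate]
        del data[:rate]                 # consume `rate` data tokens
        self.results.append(sum(taken))  # stand-in for real processing
        return rate


if __name__ == "__main__":
    actor = DynamicRateActor(max_rate=4)
    control = [2, 0, 3]  # data-dependent rates, e.g. from an upstream actor
    data = [1, 2, 3, 4, 5]
    while actor.can_fire(control, data):
        actor.fire(control, data)
    print(actor.results)  # prints [3, 0, 12]
```

Validation happens before any token is consumed, so an out-of-range control token raises an error without corrupting the queues.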

“…The parallel processing offered by the multicore CPU allows executing DPD coefficient learning concurrently with filtering. The software implementation is based on a dataflow programming environment [6] that takes care of data transfer and synchronization between the CPU cores and the GPU. The GPU code is written in OpenCL for cross-platform portability, whereas the DPD functionalities executed on the CPU cores are written in C.…”

Section: Introduction (mentioning)

confidence: 99%

Digital Predistortion for 5G Small Cell: GPU Implementation and RF Measurements

Campo, Lampu, Meirhaeghe, et al. (2019), J Sign Process Syst

In this paper, we present a high data rate implementation of a digital predistortion (DPD) algorithm on a modern mobile multicore CPU containing an on-chip GPU. The proposed implementation is capable of running in real time, thanks to the execution of the predistortion stage inside the GPU and the execution of the learning stage on a separate CPU core. This configuration, combined with the low-complexity DPD design, allows sample rates of more than 400 Msamples/s. This is sufficient for satisfying 5G new radio (NR) base station radio transmission specifications in the sub-6 GHz bands, where signal bandwidths up to 100 MHz are specified. The linearization performance is validated with RF measurements on two base station power amplifiers at 3.7 GHz, showing that the 5G NR downlink emission requirements are satisfied.
