2019
DOI: 10.1016/j.jpdc.2018.11.001

Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems

Abstract: The emergence of heterogeneous systems has been very notable recently, yet programming them remains a complex task. Co-executing a single OpenCL kernel on several devices is a challenging endeavour, as it requires accounting for the devices' different computing capabilities and the application's behaviour. OmpSs is a framework for task-based parallel applications that does not support co-execution across several devices. This paper presents an extension of OmpSs that solves two main issues. First, the automatic …
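
As context for the abstract's claim about co-executing one OpenCL kernel on several devices, here is a minimal, hypothetical host-side sketch in C. It is not the paper's OmpSs extension: it statically splits a vector-addition NDRange between a CPU and a GPU device using global work offsets, and the 30/70 split, kernel, problem size and platform assumptions are illustrative only.

```c
/*
 * Illustrative sketch (not the paper's OmpSs extension): co-executing one
 * OpenCL kernel on a CPU and a GPU by splitting the NDRange with global
 * work offsets. Assumes a single platform exposing both device types.
 * Build (Linux): cc coexec.c -lOpenCL
 */
#define CL_TARGET_OPENCL_VERSION 120
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

#define N (1 << 20)

static const char *src =
    "__kernel void vadd(__global const float *a, __global const float *b,\n"
    "                   __global float *c) {\n"
    "    size_t i = get_global_id(0);  /* offset already applied by runtime */\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void) {
    cl_int err;
    cl_platform_id plat;
    cl_device_id dev[2];                      /* dev[0] = CPU, dev[1] = GPU */
    clGetPlatformIDs(1, &plat, NULL);
    err  = clGetDeviceIDs(plat, CL_DEVICE_TYPE_CPU, 1, &dev[0], NULL);
    err |= clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev[1], NULL);
    if (err != CL_SUCCESS) { fprintf(stderr, "need a CPU and a GPU device\n"); return 1; }

    cl_context ctx = clCreateContext(NULL, 2, dev, NULL, NULL, &err);
    cl_command_queue q[2];
    q[0] = clCreateCommandQueue(ctx, dev[0], 0, &err);
    q[1] = clCreateCommandQueue(ctx, dev[1], 0, &err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, 2, dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", &err);

    float *a = malloc(N * sizeof(float)), *b = malloc(N * sizeof(float)),
          *c = malloc(N * sizeof(float));
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    cl_mem dA = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               N * sizeof(float), a, &err);
    cl_mem dB = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               N * sizeof(float), b, &err);
    cl_mem dC = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, N * sizeof(float), NULL, &err);
    clSetKernelArg(k, 0, sizeof(cl_mem), &dA);
    clSetKernelArg(k, 1, sizeof(cl_mem), &dB);
    clSetKernelArg(k, 2, sizeof(cl_mem), &dC);

    /* Static split: 30% of the work items to the CPU, 70% to the GPU.
     * Each device writes a disjoint region of dC; a production runtime
     * (such as the one the paper proposes) must manage the coherence of
     * such partially written buffers. */
    size_t split = (size_t)(0.3 * N);
    size_t off[2]  = { 0,     split     };
    size_t size[2] = { split, N - split };
    for (int d = 0; d < 2; d++)
        clEnqueueNDRangeKernel(q[d], k, 1, &off[d], &size[d], NULL, 0, NULL, NULL);

    /* Read back from each device only the region it produced. */
    for (int d = 0; d < 2; d++)
        clEnqueueReadBuffer(q[d], dC, CL_FALSE, off[d] * sizeof(float),
                            size[d] * sizeof(float), c + off[d], 0, NULL, NULL);
    clFinish(q[0]); clFinish(q[1]);

    printf("c[0]=%.1f  c[N-1]=%.1f\n", c[0], c[N - 1]); /* expect 0.0 and 3*(N-1) */
    return 0;
}
```

The point the paper addresses is precisely that such a fixed split is rarely optimal: the per-device offset/size pair is what an auto-tuned runtime would compute and adjust at run time.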

Cited by 5 publications (3 citation statements) | References 28 publications

Citation statements:
“…Other works [14,18,21] proposed novel adaptive scheduling mechanisms to partition the workload at the data level aiming at achieving load-balancing or power-saving. Many frameworks built based on these designs [8,16,19] can operate directly on OpenCL kernels. They typically start with a small portion of workload assigned to CPUs and the remainder to the accelerators (or vice versa).…”
Section: Related Work (citation type: mentioning, confidence: 99%)
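The strategy quoted above (start with a small portion of the workload on the CPU and the remainder on the accelerator, then rebalance) can be pictured as a feedback loop over the split ratio. The following C fragment is a hypothetical illustration, not code from the cited frameworks; the 10% initial share, the 0.5 damping factor and the sample timings are invented for the example.

```c
/* Hedged sketch of an adaptive split heuristic: start with a small CPU
 * share, time one co-executed chunk on each device, and move the split
 * towards the ratio of the measured throughputs.  All numbers below are
 * illustrative, not values from the cited works. */
#include <stdio.h>

/* Returns the new fraction of work items to assign to the CPU. */
static double update_split(double cpu_share, double cpu_time_s,
                           double gpu_time_s, size_t total_items) {
    double cpu_items = cpu_share * total_items;
    double gpu_items = (1.0 - cpu_share) * total_items;
    double cpu_rate = cpu_items / cpu_time_s;   /* work items per second */
    double gpu_rate = gpu_items / gpu_time_s;
    double balanced = cpu_rate / (cpu_rate + gpu_rate);
    return 0.5 * cpu_share + 0.5 * balanced;    /* damped update */
}

int main(void) {
    double share = 0.10;                        /* small initial CPU share */
    /* Fake per-chunk timings for illustration only. */
    double cpu_t[] = { 0.080, 0.066, 0.061 };
    double gpu_t[] = { 0.050, 0.058, 0.059 };
    for (int step = 0; step < 3; step++) {
        share = update_split(share, cpu_t[step], gpu_t[step], 1u << 20);
        printf("step %d: CPU share = %.3f\n", step, share);
    }
    return 0;
}
```

Damping the update is one common way to keep the split from oscillating when per-chunk timings are noisy; it is a design choice of this sketch, not necessarily of the cited works.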
“…To partition generic programs dynamically at runtime, several authors have shown that partitioning on a data level is a viable option for both regular and irregular problems [26-29]. Some works [30-32] tackle the problem of accelerating numerical algorithms such as matrix multiplication and the fast Fourier transform on heterogeneous systems using data partitioning approaches. In ABSs, however, we observe a strong locality of dependencies, as agents primarily interact with nearby agents, that is, their neighbors.…”
Section: Related Work (citation type: mentioning, confidence: 99%)
“…The runtime behavior of ABS not only potentially varies based on input parameters but can also substantially change over the course of a simulation run, requiring regular retraining of machine-learning models to achieve good performance. To partition generic programs dynamically at runtime, several authors have shown that partitioning on a data level is a viable option for both regular and irregular problems [26-29]. Some works [30-32] tackle the problem of accelerating numerical algorithms such as matrix multiplication and the fast Fourier transform on heterogeneous systems using data partitioning approaches.…”
Section: Related Work and Background (citation type: mentioning, confidence: 99%)