2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) 2018
DOI: 10.1109/cahpc.2018.8645867

Adaptive Partitioning for Iterated Sequences of Irregular OpenCL Kernels

Abstract: OpenCL defines a common parallel programming language for all devices, but writing tasks adapted to each device, managing communication, and handling load balancing are left to the programmer. We propose in this paper a static/dynamic approach for the execution of an iterated sequence of data-dependent kernels on a multi-device heterogeneous architecture. The method automatically distributes irregular kernels onto multiple devices and tackles, without training, both load balancing and data transfers …

Cited by 2 publications (3 citation statements)
References 10 publications
“…To distribute the work to each processing unit, we rely on an adaptive partitioning algorithm [12]. The algorithm implemented in OPENCARP consists in the following steps:…”
Section: B Load Balancing (mentioning)
confidence: 99%
“…Furthermore, due to the lightweight nature of OPENCARP ionic models kernels and the wave-oriented layout of their execution, instead of a pure runtime approach balancing tasks among devices such as STARPU, we propose an hybrid compiler/runtime approach. Thus, we adapted the load balancing algorithm proposed by Huchant et al [12], [13] to match OPENCARP needs. At each computation iteration, this algorithm adapts the workload chunk size given to each device (if needed) according to the execution time of the previous iteration.…”
Section: Introduction (mentioning)
confidence: 99%
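
The per-iteration adaptation described in the statement above can be illustrated with a minimal sketch. It is not the algorithm of the cited paper or the OPENCARP implementation: the function name `rebalance`, the `threshold` parameter, and the proportional-to-throughput rule are assumptions chosen only to show how chunk sizes might follow the previous iteration's execution times, with no change when the devices are already close to balanced.

```python
def rebalance(chunks, times, threshold=0.05):
    """Return new per-device chunk sizes from the previous iteration.

    chunks: work items given to each device in the previous iteration
    times: measured execution time of each device, in seconds
    threshold: hypothetical relative imbalance below which nothing changes
    """
    if len(chunks) != len(times) or not chunks:
        raise ValueError("chunks and times must be non-empty and equal length")
    if any(t <= 0 for t in times) or any(c <= 0 for c in chunks):
        raise ValueError("chunks and times must be strictly positive")

    slowest, fastest = max(times), min(times)
    # "If needed": keep the current partition when devices finish together.
    if (slowest - fastest) <= threshold * slowest:
        return list(chunks)

    total = sum(chunks)
    # Observed throughput of each device (work items per second).
    speeds = [c / t for c, t in zip(chunks, times)]
    total_speed = sum(speeds)
    # Give each device a share proportional to its observed throughput.
    new_chunks = [int(total * s / total_speed) for s in speeds]
    # Rounding may drop a few items; hand them to the fastest device.
    new_chunks[speeds.index(max(speeds))] += total - sum(new_chunks)
    return new_chunks
```

With two devices that each received 500 items and took 1.0 s and 3.0 s, calling `rebalance([500, 500], [1.0, 3.0])` shifts roughly three quarters of the 1000 items to the faster device, while the total amount of work is preserved.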
“…33 Therefore, we propose a generalizable approach to partition the workload on the function level. Other works that partition on the function level either require periodic re-evaluation of the current assignment to adjust to the irregularly evolving workload, 34 or they need to profile the hardware and program offline. 35 To reduce overhead, we propose a heuristic to dynamically trigger re-evaluation of the hardware assignment.…”
Section: Related Work (mentioning)
confidence: 99%