Proceedings of the 29th Annual ACM Symposium on Applied Computing 2014
DOI: 10.1145/2554850.2555018

On the support of task-parallel algorithmic skeletons for multi-GPU computing

Abstract: An emerging trend in the field of Graphics Processing Unit (GPU) computing is the harnessing of multiple devices to cope with scalability and performance requirements. However, multi-GPU execution adds new challenges to the already complex world of General Purpose computing on GPUs (GPGPU), such as the efficient problem decomposition, and dealing with device heterogeneity. To this extent, we propose the use of the Marrow algorithmic skeleton framework (ASkF) to abstract most of the details intrinsic to the pro…

Citations: cited by 7 publications (5 citation statements)
References: 11 publications

“…The experiment is repeated for different input data sizes. The same experiments are conducted with the existing Marrow framework [6] [18], and the obtained results are compared with the proposed workload division strategy. The Marrow framework uses OpenCL computations on both CPU and GPUs on the heterogeneous cluster and static workload division strategy to distribute the workload manually based on the CPU and GPUs' hardware configuration.…”
Section: Linpack Benchmark Application
Citation type: mentioning
confidence: 99%
“…We establish this order relation for both integer and floating-point arithmetic by running the SHOC benchmark suite [20] at the framework's installation-time. This static approach, although simple, delivers good performance results for GPU-accelerated executions [9], due to the specialized nature of the underlying execution model: one kernel execution at a time, with no preemption and no input/output operations. These premises are not valid for CPU executions.…”
Section: Workload Distribution
Citation type: mentioning
confidence: 99%
“…We claim that these characteristics can be used to: (a) hide the heterogeneity of the underlying hardware and, (b) provide tools to cope with such heterogeneity, enabling device-specific problem decompositions and optimizations. To that extent, we have been developing the Marrow algorithmic skeleton framework [8,9,10] for the orchestration of OpenCL computations. Marrow offers both data and task-parallel skeletons and is the first framework on the GPU computing field to support skeleton composition, through nesting.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In [9] we addressed this issue for heterogeneous multi-GPU environments. The workload is statically distributed among the devices, according to their relative performance.…”
Section: Work-load Distribution
Citation type: mentioning
confidence: 99%
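
The static distribution described in the statement above can be illustrated with a minimal sketch. It is not Marrow's implementation (Marrow orchestrates OpenCL computations): the function names `static_partition` and `to_ranges` are hypothetical, and the relative performance scores are assumed inputs standing in for whatever benchmark results a framework records ahead of time. Each device receives a share of the workload proportional to its score, and any remainder goes to the devices with the largest fractional shares.

```python
from fractions import Fraction


def static_partition(total_items, relative_perf):
    """Split total_items into one chunk size per device, proportional to
    each device's relative performance score (hypothetical inputs)."""
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not relative_perf or any(p <= 0 for p in relative_perf):
        raise ValueError("need at least one device, all scores positive")

    # Exact rational arithmetic avoids float rounding when taking floors.
    weights = [Fraction(p) for p in relative_perf]
    total_weight = sum(weights)
    ideal = [total_items * w / total_weight for w in weights]

    # Give every device the floor of its ideal share...
    sizes = [int(share) for share in ideal]
    # ...then hand out the leftover items, one each, to the devices whose
    # ideal share had the largest fractional part.
    leftover = total_items - sum(sizes)
    by_fraction = sorted(
        range(len(ideal)), key=lambda i: ideal[i] - sizes[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    return sizes


def to_ranges(sizes):
    """Turn chunk sizes into half-open (start, end) index ranges."""
    ranges, start = [], 0
    for size in sizes:
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Two devices, the second assumed to be three times faster.
    sizes = static_partition(1000, [1.0, 3.0])
    print(sizes, to_ranges(sizes))
```

Because the split is computed once from fixed scores, it cannot react to load that changes at run time, which is consistent with the remark in the statements above that the premises behind the static approach do not hold for CPU executions.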
“…We claim that these characteristics can be used to, on one hand, hide the heterogeneity of the underlying hardware and, on the other, provide tools to cope with such heterogeneity, enabling device-specific parallel decompositions and optimizations. To that extent, we have been developing an algorithmic skeleton framework, entitled Marrow [8,9], for the orchestration of OpenCL kernels. Marrow offers both data and task-parallel skeletons, and is the first on the GPU computing field to support skeleton composition, through nesting.…”
Section: Introductionmentioning
confidence: 99%