2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT) 2019
DOI: 10.1109/pdcat46702.2019.00034
Tasking in Accelerators: Performance Evaluation

Abstract: In this work, we analyze the implications and results of implementing dynamic parallelism, concurrent kernels, and CUDA Graphs to solve task-oriented problems. As a benchmark we propose three different methods for solving the DGEMM operation on tiled matrices, which is arguably the most popular benchmark for performance analysis. For the algorithms that we study, we present significant differences in terms of data dependencies, synchronization, and granularity. The main contribution of this work is determining which of…
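The tiled DGEMM benchmark named in the abstract can be sketched as follows. This is a minimal CPU stand-in, not the authors' CUDA implementation: the matrix size `N`, tile size `T`, and function names are illustrative assumptions. The point is the task structure — each `(ti, tj, tk)` tile-triple is one unit of work, and units sharing the same output tile `C(ti, tj)` must serialize on the accumulation.

```cpp
#include <cassert>
#include <vector>

// Illustrative sizes, not from the paper.
constexpr int N = 8;  // matrix dimension
constexpr int T = 4;  // tile dimension (must divide N)

using Matrix = std::vector<double>;  // row-major N x N

// One "task": C(ti, tj) += A(ti, tk) * B(tk, tj) over a T x T tile.
void dgemm_tile(const Matrix& A, const Matrix& B, Matrix& C,
                int ti, int tj, int tk) {
    for (int i = ti * T; i < (ti + 1) * T; ++i)
        for (int j = tj * T; j < (tj + 1) * T; ++j)
            for (int k = tk * T; k < (tk + 1) * T; ++k)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}

// Sequential driver: tasks with the same (ti, tj) accumulate into the
// same output tile, so a parallel schedule must not run them concurrently.
void tiled_dgemm(const Matrix& A, const Matrix& B, Matrix& C) {
    const int nt = N / T;
    for (int ti = 0; ti < nt; ++ti)
        for (int tj = 0; tj < nt; ++tj)
            for (int tk = 0; tk < nt; ++tk)
                dgemm_tile(A, B, C, ti, tj, tk);
}
```

The three methods the paper compares (dynamic parallelism, concurrent kernels, CUDA Graphs) differ in how these per-tile tasks are dispatched and synchronized on the GPU.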

Cited by 8 publications (5 citation statements). References 14 publications.
“…Using Dynamic Parallelism, programmers can invoke kernels from inside the device without switching context back to the CPU. However, launching kernels from other kernels carries a large associated computational cost [1].…”
Section: Background, 2.1 CUDA (mentioning)
confidence: 99%
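The overhead argument in this quotation can be made concrete with a toy cost model. This is a plain C++ analogy with made-up numbers, not a GPU measurement: a fixed per-launch cost stands in for the kernel-launch overhead, which a single batched launch pays once but nested per-tile child launches (dynamic parallelism) pay once per tile.

```cpp
#include <cassert>

// Toy cost model (illustrative units, not measured on a GPU).

// One batched launch covering all tiles pays the launch cost once.
long batched(long tiles, long launch_cost, long per_tile_work) {
    return launch_cost + tiles * per_tile_work;
}

// A parent kernel launching one child kernel per tile pays the
// launch cost per tile, as the citation statement points out.
long nested(long tiles, long launch_cost, long per_tile_work) {
    return tiles * (launch_cost + per_tile_work);
}
```

The gap between the two grows linearly with the number of tasks, which is why launch overhead matters for fine-grained tasking.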
“…Although GPU capacity has increased significantly, the scalability of algorithms and applications still faces important challenges [1]. One important problem regarding scalability is the hardware resource assignment.…”
Section: Related Work (mentioning)
confidence: 99%
“…It is undeniable that GPU capabilities have been increasing significantly in terms of performance and memory capacity. However, some applications are facing problems in terms of scalability, and some algorithms seem to limit the amount of work that one GPU can perform at a single time [1]. This is mainly due to the assignment of hardware resources and the occupancy of the device, which makes it difficult to benefit from the whole GPU capacity.…”
Section: Introduction (mentioning)
confidence: 99%
“…For these reasons, these models have gained broad acceptance. Some representative task-based models are Intel Threading Building Blocks [27], CUDA Graphs [23], OpenMP [13] and OmpSs [7]. The former two are hardware-centric models that expose architectural features in the language, requiring a considerable level of expertise from programmers to achieve productivity, while also hindering portability.…”
Section: Introduction (mentioning)
confidence: 99%
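The task-graph idea common to the models listed in this quotation — record a DAG of work items with explicit dependencies once, then launch the whole graph — can be sketched with a host-side stand-in. This is a conceptual sketch in plain C++, not the CUDA Graphs API: real CUDA Graphs record GPU kernels and memory operations via `cudaGraph*` calls, and the node names below are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Minimal task graph: nodes are recorded with their dependencies,
// then launch() runs every node after all of its dependencies.
struct TaskGraph {
    struct Node {
        std::function<void()> work;
        std::vector<int> deps;  // indices of nodes that must run first
    };
    std::vector<Node> nodes;

    int add(std::function<void()> work, std::vector<int> deps = {}) {
        nodes.push_back({std::move(work), std::move(deps)});
        return static_cast<int>(nodes.size()) - 1;
    }

    // Execute in dependency order via a memoized depth-first walk.
    void launch() {
        std::vector<bool> done(nodes.size(), false);
        std::function<void(int)> run = [&](int i) {
            if (done[i]) return;
            done[i] = true;
            for (int d : nodes[i].deps) run(d);
            nodes[i].work();
        };
        for (int i = 0; i < static_cast<int>(nodes.size()); ++i) run(i);
    }
};
```

The record-once, launch-many pattern is what lets graph-based runtimes amortize per-task launch overhead, at the price of the lower-level programming model the quotation describes.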