cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs
2020
DOI: 10.1109/tpds.2019.2944602



Cited by 17 publications (4 citation statements)
References 35 publications
“…For an application with ample scope for concurrency, we have observed that, rather than relying on traditional coarse-grained scheduling decisions, implementing fine-grained scheduling policies with PySchedCL, where the user specifies an intuitive task component partitioning T after examining the structure of a DAG application, results in significantly better execution times. Future work entails investigating sophisticated low-level scheduling approaches such as sub-kernel partitioning [9], [25] at the work-item level for effective interleaving of concurrent kernels. Such approaches, coupled with Machine Learning-assisted control-theoretic scheduling solutions [26], will be used to develop an auto-tuning framework on top of PySchedCL that automatically determines, for a given application-architecture pair, the optimal allocation of command queues across devices in the platform.…”
Section: Discussion
confidence: 99%
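As an aside on what such fine-grained dispatch looks like in practice: the quoted work targets OpenCL command queues through PySchedCL, but the same idea can be sketched with CUDA streams and events. A minimal sketch, in which the kernels taskA/taskB/taskC and the two-stream partitioning are hypothetical, not taken from the paper:

```cuda
#include <cuda_runtime.h>

// Hypothetical DAG: taskA and taskB are independent; taskC depends on both.
__global__ void taskA(float *x) { x[threadIdx.x] += 1.0f; }
__global__ void taskB(float *x) { x[threadIdx.x] *= 2.0f; }
__global__ void taskC(float *x) { x[threadIdx.x] -= 3.0f; }

int main() {
    float *a, *b;
    cudaMalloc(&a, 256 * sizeof(float));
    cudaMalloc(&b, 256 * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Independent DAG components go to different streams (the CUDA analogue
    // of command queues) so the hardware may interleave them.
    taskA<<<1, 256, 0, s0>>>(a);
    taskB<<<1, 256, 0, s1>>>(b);

    // Express the B->C dependency with an event instead of a full device
    // synchronization; the A->C dependency is implied by stream order in s0.
    cudaEvent_t bDone;
    cudaEventCreate(&bDone);
    cudaEventRecord(bDone, s1);
    cudaStreamWaitEvent(s0, bDone, 0);
    taskC<<<1, 256, 0, s0>>>(a);

    cudaDeviceSynchronize();
    cudaFree(a); cudaFree(b);
    return 0;
}
```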
“…Another interesting observation is that the individual execution time of each kernel increases slightly as a result of interleaving. This is because work groups of the concurrently dispatched kernels are scheduled in a round-robin fashion to the compute units of the device, causing resource contention [9]. However, the total time to finish the kernels concurrently is less than when they are dispatched in sequence.…”
Section: Motivation
confidence: 99%
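The effect described above can be reproduced with a small timing experiment: launch the same two kernels first in one stream (sequential) and then in two streams (concurrent) and compare elapsed times. A minimal CUDA sketch; the busy-loop kernel and its sizes are hypothetical:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that keeps several blocks busy so two instances can
// share the device's compute units.
__global__ void busy(float *x, int iters) {
    float v = x[threadIdx.x];
    for (int i = 0; i < iters; ++i) v = v * 1.000001f + 0.5f;
    x[threadIdx.x] = v;
}

// Time two launches issued to the given streams (same stream = sequential,
// different streams = potentially concurrent).
static float timeLaunches(cudaStream_t s0, cudaStream_t s1, float *a, float *b) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    busy<<<8, 256, 0, s0>>>(a, 1 << 20);
    busy<<<8, 256, 0, s1>>>(b, 1 << 20);
    cudaEventRecord(stop);          // legacy default stream: waits on both
    cudaEventSynchronize(stop);
    float ms;
    cudaEventElapsedTime(&ms, start, stop);
    return ms;
}

int main() {
    float *a, *b;
    cudaMalloc(&a, 256 * sizeof(float));
    cudaMalloc(&b, 256 * sizeof(float));
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);
    // Same stream: the two kernels run one after the other.
    printf("sequential: %.2f ms\n", timeLaunches(s0, s0, a, b));
    // Different streams: work groups of both kernels are interleaved across
    // the compute units, so the total time shrinks even though each kernel
    // individually runs a bit longer under contention.
    printf("concurrent: %.2f ms\n", timeLaunches(s0, s1, a, b));
    return 0;
}
```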
“…Wen et al. [20] propose a graph-based algorithm to schedule kernels in pairs. The recent work of Shekofteh et al. [6] proposes co-scheduling pairs of kernels with different execution behaviors. They use kernel slicing to improve the choice of kernel pairs for co-scheduling.…”
Section: Related Work
confidence: 99%
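Kernel slicing, as referenced here, splits one kernel's grid into several smaller launches so that slices of different kernels can be interleaved on the device. A minimal sketch of the idea, assuming a hypothetical element-wise kernel with an explicit block offset (not the actual implementation of [6]):

```cuda
#include <cuda_runtime.h>

// Hypothetical element-wise kernel with an explicit block offset, so one
// large grid can be issued as several smaller "slices".
__global__ void scale(float *x, int n, int blockOffset) {
    int i = (blockIdx.x + blockOffset) * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

// Issue a kernel's grid as slices of `sliceBlocks` blocks on stream `s`.
// With two such sliced kernels on two streams, the scheduler can interleave
// their slices instead of letting one kernel monopolize the device.
void launchSliced(float *x, int n, int totalBlocks, int sliceBlocks,
                  cudaStream_t s) {
    for (int off = 0; off < totalBlocks; off += sliceBlocks) {
        int blocks = (totalBlocks - off < sliceBlocks) ? (totalBlocks - off)
                                                       : sliceBlocks;
        scale<<<blocks, 256, 0, s>>>(x, n, off);
    }
}
```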
“…However, GPU resource allocation is performed by the hardware, which assigns as many resources as possible to one task and then assigns the remaining resources to the next task, if there are sufficient resources left over [5]. This allocation policy has been shown to lead to an unreasonable use of resources [2] and to be influenced by the order in which the kernels are launched [6], [7]. Therefore, kernel launches to the GPU have to be performed wisely in order to avoid an unbalanced occupation of the GPU resources and the consequent negative effect on system performance.…”
Section: Introduction
confidence: 99%
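The leftover allocation policy described above can be made concrete with the CUDA occupancy API, which reports how many blocks of a kernel fit on one streaming multiprocessor given its resource demands; whatever a first kernel claims is unavailable to the next. A minimal sketch with two hypothetical kernels of different shared-memory footprints:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Two hypothetical kernels with different resource footprints: `heavy`
// reserves 32 KiB of shared memory per block, `light` reserves none.
__global__ void heavy(float *x) {
    __shared__ float tile[8192];   // 32 KiB of shared memory per block
    tile[threadIdx.x] = x[threadIdx.x];
    __syncthreads();
    x[threadIdx.x] = tile[threadIdx.x] + 1.0f;
}

__global__ void light(float *x) {
    x[threadIdx.x] += 1.0f;
}

int main() {
    int heavyBlocks, lightBlocks;
    // How many blocks of each kernel fit on one SM, given their
    // register and shared-memory demands.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&heavyBlocks, heavy, 256, 0);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&lightBlocks, light, 256, 0);
    printf("blocks per SM: heavy=%d light=%d\n", heavyBlocks, lightBlocks);
    // If `heavy` is launched first with a grid large enough to claim these
    // slots, `light` only runs on whatever resources are left over, which
    // is why launch order affects the resulting resource partitioning.
    return 0;
}
```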