2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2014)
DOI: 10.1109/sbac-pad.2014.43
Runtime Support for Adaptive Spatial Partitioning and Inter-Kernel Communication on GPUs

Cited by 17 publications (3 citation statements) | References 13 publications
“…Adriaens et al [1] proposed the use of spatial multitasking to group SMs into different sets that can run different kernels (up to four) in order to maximize application speedup. Ukidave et al [23] studied the real-time support for adaptive spatial partitioning on GPUs and highlighted the importance of L2 in this process. Aguilera et al [2] demonstrated the unfairness of spatial multitasking and proposed a fair resource allocation strategy for both performance and fairness.…”
Section: Related Work
confidence: 99%
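The works quoted above all build on the GPU's ability to execute kernels from different applications concurrently on the same device. As context, a minimal CUDA sketch of that baseline mechanism, two independent kernels launched into separate streams, is given below; the kernel names, sizes, and data are illustrative placeholders and are not taken from any of the cited papers, which add hardware or runtime policies on top of this mechanism to decide how the SMs are divided.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Two independent toy kernels standing in for two co-scheduled applications.
__global__ void scaleA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

__global__ void shiftB(float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    // Separate streams let the hardware run both kernels concurrently when
    // resources allow; which SMs each kernel ends up on is decided by the
    // GPU scheduler, and it is exactly this allocation that the cited
    // partitioning and fairness schemes seek to control.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    scaleA<<<(n + 255) / 256, 256, 0, s1>>>(a, n);
    shiftB<<<(n + 255) / 256, 256, 0, s2>>>(b, n);
    cudaDeviceSynchronize();

    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Note that streams only request concurrency; they neither guarantee overlap nor control SM placement, which is the gap the spatial-partitioning proposals address.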
“…Aguilera et al [2] improve the fairness of Spatial Multitasking by balancing the individual performance and the overall performance. Ukidave et al [49] extend the OpenCL run-time environment to explore several dynamic spatial multiprogramming approaches. Compared to the architectural approaches, the software approaches require source code modification.…”
Section: Related Work: GPU Concurrent Kernel Execution
confidence: 99%
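As the passage notes, software approaches to spatial multiprogramming require modifying application source. The sketch below is a minimal illustration of why, assuming a generic grid-stride rewrite rather than the specific mechanism of the cited OpenCL runtime extension: capping the number of blocks a kernel launches leaves part of the machine free for a co-runner, but only because both the kernel and its launch code were edited.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride kernel: correctness no longer depends on the grid size, so the
// launch can be capped at a chosen number of blocks to leave other SMs free
// for a co-running kernel. This rewrite touches both the kernel and its
// launch code, which is the source modification the passage above refers to.
__global__ void scaleStrided(float *x, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += gridDim.x * blockDim.x)
        x[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMalloc(&x, n * sizeof(float));

    int smCount = 1;
    cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, 0);

    // Cap the grid at roughly half the SM count; the rest of the machine is
    // implicitly left for another kernel.
    int blocks = smCount > 1 ? smCount / 2 : 1;
    scaleStrided<<<blocks, 256>>>(x, n);
    cudaDeviceSynchronize();

    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(x);
    return 0;
}
```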
“…Round-robin CTA scheduling may lead to imbalanced execution in two specific scenarios. First, a small kernel, due to algorithmic limitations or due to a small input data set [26], [27], [28], may occupy only a subset of the SMs and may lead to imbalance across the SMs, i.e., some SMs are assigned more CTAs than others. Second, when co-executing multiple kernels through spatial multitasking, some local crossbars may be over-utilized while others are under-utilized.…”
Section: Topology-Aware CTA Scheduling
confidence: 99%
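To make the imbalance described above concrete, the sketch below has each CTA record the SM it ran on via the PTX %smid special register and prints per-SM CTA counts for a deliberately small launch; the instrumentation is a generic diagnostic of CTA-to-SM placement, not the topology-aware scheduler proposed in the cited work.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the id of the SM the calling thread is running on (PTX %smid).
__device__ unsigned int smOf() {
    unsigned int smid;
    asm("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// One vote per CTA: count how many CTAs each SM received.
__global__ void countCtasPerSm(unsigned int *ctasPerSm) {
    if (threadIdx.x == 0)
        atomicAdd(&ctasPerSm[smOf()], 1u);
}

int main() {
    int smCount = 1;
    cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, 0);

    unsigned int *d_counts;
    cudaMalloc(&d_counts, smCount * sizeof(unsigned int));
    cudaMemset(d_counts, 0, smCount * sizeof(unsigned int));

    // A deliberately "small" kernel: fewer CTAs than SMs, so some SMs are
    // necessarily left idle, which is the imbalance discussed above.
    int ctas = smCount > 1 ? smCount / 2 : 1;
    countCtasPerSm<<<ctas, 128>>>(d_counts);
    cudaDeviceSynchronize();

    unsigned int h_counts[256] = {0};   // assumes smCount <= 256
    cudaMemcpy(h_counts, d_counts, smCount * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    for (int s = 0; s < smCount; ++s)
        printf("SM %2d: %u CTAs\n", s, h_counts[s]);

    cudaFree(d_counts);
    return 0;
}
```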