2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) 2016
DOI: 10.1109/isca.2016.29

Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming

Cited by 59 publications
(66 citation statements)
References 41 publications
“…Compared to the optimized hardware-based concurrent kernel execution, whose kernel launching order brings fast execution time, the results of corunning kernel pairs show 11%, 18%, and 12% speedup on AMD R9 290X, RX 480, and Vega 64, respectively, on average. Compared to the Warped-Slicer [31], the results show 29%, 18%, and 51% speedup on AMD R9 290X, RX 480, and Vega 64, respectively, on average. Our contributions are:…”
Section: Introduction (mentioning)
confidence: 94%
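The percentages in this statement are average speedups over a baseline (the hardware scheduler with an optimized launch order, or Warped-Slicer). Below is a minimal sketch of how such a figure can be computed. It assumes speedup is defined as baseline time divided by co-run time and then averaged over kernel pairs. The quoted text does not say whether an arithmetic or geometric mean is used, so both are shown. The timing values are hypothetical placeholders, not measured data.

```python
from statistics import geometric_mean, mean
from typing import Sequence, Tuple


def speedup(baseline_time: float, new_time: float) -> float:
    """Speedup of a scheme over a baseline; values above 1.0 mean the scheme is faster."""
    return baseline_time / new_time


def average_speedup_percent(pairs: Sequence[Tuple[float, float]], geometric: bool = False) -> float:
    """Average per-pair speedup, reported as a percentage gain over the baseline."""
    ratios = [speedup(base, new) for base, new in pairs]
    avg = geometric_mean(ratios) if geometric else mean(ratios)
    return (avg - 1.0) * 100.0


# Hypothetical (baseline_time, co-run_time) measurements per kernel pair, in milliseconds.
times = [(10.0, 8.0), (12.0, 12.0), (9.0, 7.5)]
print(f"{average_speedup_percent(times):.1f}%")  # 15.0%
```

The choice of mean matters. The geometric mean of the same ratios gives about 14.5%, so the two figures are not interchangeable when comparing results across papers.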
“…We evaluate our SF for all 153 pairs of benchmarks in Table 1. We compare our selected scheme with the original scheme of the hardware scheduler and the scheme proposed by the Warped-Slicer [31]. Since the kernel launching order can affect the execution time, we choose the execution time of optimized order for the hardware scheduler, and the speedup is shown as ORI points in Figure 13.…”
Section: SMK Scheduling Evaluation for Kernel Pairs (mentioning)
confidence: 99%
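The figure of 153 benchmark pairs is consistent with taking every unordered pair of distinct benchmarks, because C(18, 2) = 18 × 17 / 2 = 153. This suggests Table 1 lists 18 benchmarks. That is an inference, since the table is not reproduced here. The sketch below enumerates such pairs and recovers n from a pair count. The benchmark names are hypothetical placeholders.

```python
from itertools import combinations
from math import comb
from typing import List, Optional, Sequence, Tuple


def benchmark_pairs(benchmarks: Sequence[str]) -> List[Tuple[str, str]]:
    """All unordered pairs of distinct benchmarks (no self-pairs, no (b, a) repeats)."""
    return list(combinations(benchmarks, 2))


def benchmarks_for_pair_count(pairs: int) -> Optional[int]:
    """Smallest n with C(n, 2) == pairs, or None if no such n exists."""
    n = 2
    while comb(n, 2) < pairs:
        n += 1
    return n if comb(n, 2) == pairs else None


# Hypothetical placeholder names; Table 1's actual benchmark list is not shown here.
names = [f"bench{i}" for i in range(18)]
print(len(benchmark_pairs(names)))     # 153
print(benchmarks_for_pair_count(153))  # 18
```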
“…Preemption mechanism [12] and dynamic scheduling [2] are orthogonal to our research, and therefore can be applied to our system. Recently, simultaneous multi-kernel, (SMK) which executes multiple kernels within a same SM, is proposed to improve the utilization of resources inside SMs [15,16]. Though these hardware algorithms can optimize thread block allocation for high resource utilization inside SMs, the control logic is too complex.…”
Section: Case Study: PF+BP and HS+SM (mentioning)
confidence: 99%
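Simultaneous multi-kernel (SMK) execution, as described in this statement, places thread blocks from several kernels on the same SM. It therefore has to decide how many blocks of each kernel fit within that SM's thread, register, shared-memory, and block-slot limits. The brute-force sketch below illustrates that decision. It assumes hypothetical per-SM limits and per-kernel footprints, plus a made-up saturating performance curve (the `saturation_blocks` field and `normalized_perf` are hypothetical). The max-min objective is an illustrative stand-in and is not the partitioning algorithm of Warped-Slicer or of [15,16].

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class SMLimits:
    threads: int     # max resident threads per SM (hypothetical value below)
    registers: int   # 32-bit registers per SM
    shared_mem: int  # shared memory per SM, in bytes
    blocks: int      # max resident thread blocks per SM


@dataclass(frozen=True)
class Kernel:
    name: str
    threads_per_block: int
    regs_per_thread: int
    smem_per_block: int     # bytes
    saturation_blocks: int  # hypothetical: blocks at which per-SM throughput stops improving


def fits(limits: SMLimits, kernels: Sequence[Kernel], counts: Sequence[int]) -> bool:
    """True if the given per-kernel block counts can be co-resident on one SM."""
    pairs = list(zip(kernels, counts))
    threads = sum(k.threads_per_block * c for k, c in pairs)
    regs = sum(k.threads_per_block * k.regs_per_thread * c for k, c in pairs)
    smem = sum(k.smem_per_block * c for k, c in pairs)
    return (threads <= limits.threads and regs <= limits.registers
            and smem <= limits.shared_mem and sum(counts) <= limits.blocks)


def normalized_perf(k: Kernel, blocks: int) -> float:
    """Hypothetical saturating curve: fraction of the kernel's solo per-SM throughput."""
    return min(1.0, blocks / k.saturation_blocks)


def best_partition(limits: SMLimits,
                   kernels: Sequence[Kernel]) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Try every split that gives each kernel at least one block and keep the
    feasible one that maximizes the worst kernel's normalized performance."""
    best, best_score = None, -1.0
    for counts in product(range(1, limits.blocks + 1), repeat=len(kernels)):
        if not fits(limits, kernels, counts):
            continue
        score = min(normalized_perf(k, c) for k, c in zip(kernels, counts))
        if score > best_score:
            best, best_score = counts, score
    return best, best_score


limits = SMLimits(threads=2048, registers=65536, shared_mem=98304, blocks=32)
a = Kernel("A", threads_per_block=256, regs_per_thread=32, smem_per_block=0, saturation_blocks=6)
b = Kernel("B", threads_per_block=128, regs_per_thread=64, smem_per_block=16384, saturation_blocks=6)
# Each block of A or B needs 8192 registers, so registers cap A + B at 8 blocks.
print(best_partition(limits, [a, b]))  # ((4, 4), 0.6666666666666666)
```

Exhaustive search is only practical for a few kernels and small block limits. It does show why per-SM partitioning is a constrained optimization. Here, register pressure rather than thread count decides the split.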