2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca.2016.59
Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit

Cited by 39 publications (27 citation statements)
References 50 publications
“…F_smem of Table II in Section V-A). This agrees with prior work's analysis [21], [22]. Exploiting such unused shared memory space, we propose to redirect memory requests of severely interfering warps to the unused shared memory space.…”
Section: B. CIAO On-chip Memory Architecture (supporting)
confidence: 88%
“…In GPUs, the number of blocks an SM can serve at a time is limited due to capacity and scheduling limits. The authors in [11] suggest that the number of blocks is limited mostly by scheduling limits rather than by resource constraints. So, they…”
Section: Thread-level Parallelism (mentioning)
confidence: 99%
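The distinction the excerpt draws between scheduling limits and resource constraints can be sketched as a small occupancy calculation. This is a minimal sketch under assumed, illustrative SM parameters (loosely Fermi-class); the function name and the numbers are not taken from the cited papers.

```python
# Sketch: why resident-block count can be bound by the scheduler, not resources.
# All parameters are illustrative assumptions (loosely Fermi-class).

def max_blocks_per_sm(threads_per_block, regs_per_thread, smem_per_block,
                      sched_limit=8,          # hardware scheduling limit (blocks/SM)
                      max_threads=1536,       # thread capacity per SM
                      reg_file=32768,         # 32-bit registers per SM
                      smem_bytes=48 * 1024):  # shared memory per SM (bytes)
    """Resident blocks per SM = min over the scheduling and resource limits."""
    limits = {
        "scheduling": sched_limit,
        "threads": max_threads // threads_per_block,
        "registers": reg_file // (threads_per_block * regs_per_thread),
        "shared_mem": smem_bytes // smem_per_block if smem_per_block else sched_limit,
    }
    bound = min(limits, key=limits.get)
    return limits[bound], bound

# A modest block (128 threads, 20 regs/thread, 4 KiB smem): every resource
# limit would allow 12 blocks, yet the scheduling limit caps residency at 8.
blocks, bound = max_blocks_per_sm(128, 20, 4 * 1024)
print(blocks, bound)  # → 8 scheduling
```

With these numbers the SM has resources to spare, which matches the excerpt's claim that the block count is often scheduling-limited rather than resource-limited.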
“…Both of these studies adjust the number of active warps to improve performance. The work in [42] analyzes the usage of computing and memory resources for different applications, simulates the maximum TLP, and exploits underutilized resources as much as possible. The work in [43] focuses on ILP, proposing to build SCs for GPU-like many-core processors to achieve both high performance and high energy efficiency.…”
Section: Two-level Parallelism Optimization Model (mentioning)
confidence: 99%
“…If a warp is stalled by a data dependency or a long-latency memory access, the warp schedulers issue another ready warp from the warp pool so that the execution of warps is interleaved [42]. The effectiveness of stall hiding relies on the number of eligible warps in the warp pool, which is the primary reason why GPUs require a large number of concurrent threads [45].…”
Section: The Impact of Higher TLP (mentioning)
confidence: 99%
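The stall-hiding behavior this excerpt describes can be sketched with a toy scheduler model. This is a minimal sketch; the `simulate` function, the fixed memory latency, and the instruction mix are illustrative assumptions, not details from [42] or [45].

```python
# Toy model of stall hiding via warp interleaving: a round-robin scheduler
# issues one instruction per cycle from any warp not waiting on a
# fixed-latency memory access. All parameters are illustrative.

def simulate(num_warps, instrs_per_warp=8, mem_latency=20, mem_every=4):
    remaining = [instrs_per_warp] * num_warps  # instructions left per warp
    ready_at = [0] * num_warps                 # cycle at which each warp is eligible
    issued = [0] * num_warps
    cycle, rr = 0, 0
    while any(remaining):
        for i in range(num_warps):             # round-robin over eligible warps
            w = (rr + i) % num_warps
            if remaining[w] and ready_at[w] <= cycle:
                remaining[w] -= 1
                issued[w] += 1
                if issued[w] % mem_every == 0:  # every 4th instr is a memory op
                    ready_at[w] = cycle + mem_latency  # warp stalls
                rr = (w + 1) % num_warps
                break
        cycle += 1                             # idle cycle if no warp was eligible
    return cycle

print(simulate(1))   # → 27 (one warp: memory stalls are fully exposed)
print(simulate(4))   # → 48 (four warps: far less than 4 x 27 cycles)
```

With a single warp every memory stall idles the pipeline, while four interleaved warps finish 4x the work in well under 4x the cycles, illustrating why GPUs want many eligible warps in the pool.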