Proceedings of the 29th ACM on International Conference on Supercomputing 2015
DOI: 10.1145/2751205.2751234

A Stall-Aware Warp Scheduling for Dynamically Optimizing Thread-level Parallelism in GPGPUs

Abstract: General-Purpose Graphics Processing Units (GPGPUs) have been widely used in high-performance computing as application accelerators due to their massive parallelism and high throughput. A GPGPU generally contains two layers of schedulers, a cooperative-thread-array (CTA) scheduler and a warp scheduler, which administer the thread-level parallelism (TLP). Previous research shows that maximized TLP does not always deliver optimal performance. Unfortunately, existing warp scheduling schemes do not optimize TLP at…

Cited by 16 publications (4 citation statements). References 35 publications.
“…EXPARS can improve TLP by enabling more CTAs per SM through expanding the register file into scratchpad memory. However, previous works [5,13,19,43] have shown that higher TLP does not always mean higher performance, due to resource contention. To alleviate the contention, we propose a Lazy Two-Level Warp Scheduler (LTLWS), inspired by Reference [19], to control the maximum number of schedulable warps (active warps) at runtime.…”
Section: A Lazy Two-Level Warp Scheduler
confidence: 87%
“…Jing et al. [12] introduce an integrated architecture that enables the register file to also serve as a cache, which shares the weaknesses noted above. GPU warp scheduling has been a hot research topic in recent years [19,20,31,36,43]. Lee et al. [20] first propose a profiling algorithm to identify critical warps and then schedule those critical warps more frequently than others.…”
Section: Evaluation for Advanced Architecture
confidence: 99%
“…In addition, Kayiran et al. [11] proposed a dynamic CTA scheduling technique that attempts to allocate the optimal number of CTAs per core based on application demands, demonstrating that executing the maximum number of CTAs per core is not always the best way to boost performance, due to high cache and memory contention. Yu et al. [33] presented a Stall-Aware Warp Scheduling (SAWS) policy, which dynamically optimizes TLP according to pipeline stalls. SAWS effectively improves pipeline efficiency by reducing structural hazards without introducing new data hazards.…”
Section: Related Work Using Hints in Microprocessors
confidence: 99%
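The idea of throttling TLP in response to pipeline stalls, as described in the statement above, can be illustrated with a minimal sketch. This is not the paper's actual SAWS algorithm; the class name, thresholds, and halving/increment policy are all hypothetical, chosen only to show the general shape of a stall-aware warp cap:

```python
# Hypothetical sketch of stall-aware warp throttling: each sampling epoch,
# the scheduler observes the fraction of stalled pipeline cycles and raises
# or lowers the cap on schedulable ("active") warps per SM accordingly.

class StallAwareThrottle:
    """Adjusts the active-warp cap based on observed stall rates."""

    def __init__(self, max_warps=48, min_warps=4,
                 high_stall=0.30, low_stall=0.10):
        self.max_warps = max_warps    # hardware warp limit per SM
        self.min_warps = min_warps    # never throttle below this
        self.high_stall = high_stall  # hypothetical upper threshold
        self.low_stall = low_stall    # hypothetical lower threshold
        self.active_cap = max_warps   # start with maximal TLP

    def update(self, stall_rate):
        """Called once per epoch with the fraction of stalled cycles."""
        if stall_rate > self.high_stall:
            # Heavy contention: halve the cap to relieve pressure.
            self.active_cap = max(self.min_warps, self.active_cap // 2)
        elif stall_rate < self.low_stall:
            # Pipeline mostly busy: cautiously admit one more warp.
            self.active_cap = min(self.max_warps, self.active_cap + 1)
        # In between: hold the current cap steady.
        return self.active_cap
```

The asymmetric policy (multiplicative decrease, additive increase) is one common way to back off quickly under contention while probing gently for headroom; the real schedulers cited here use their own, more refined criteria.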
“…Recent research on GPGPUs has focused on optimizing thread-level parallelism and maximizing the execution of cooperative thread arrays [1][2][3]. This has made GPGPUs more viable for high-performance computation.…”
Section: Introduction
confidence: 99%