2019
DOI: 10.1109/tc.2018.2878671
Adaptive Cooperation of Prefetching and Warp Scheduling on GPUs

Cited by 9 publications (3 citation statements)
References 23 publications
“…However, for GPGPU applications with irregular memory references, such as BFS and DG in the ISPASS 2009 benchmark suite, the accesses of the threads in a warp can hardly be coalesced. Such uncoalesced memory requests often lead to memory divergence: some threads in a warp experience low latency due to cache hits, while others must endure much longer latency due to cache misses [14,16,22], as shown in Fig. 1.…”
Section: Motivation (mentioning)
confidence: 99%
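The memory-divergence behavior quoted above can be sketched with a toy model. All constants (line size, hit/miss latencies) and the single-set cache abstraction are illustrative assumptions, not measured GPU parameters:

```python
# Toy model of memory divergence in a 32-thread warp (illustrative only;
# real GPU coalescing and cache behavior are far more complex).
CACHE_LINE = 128              # assumed bytes per cache line
WORD = 4                      # bytes per element
HIT_LAT, MISS_LAT = 20, 400   # assumed latencies in cycles

def warp_latency(addresses, cached_lines):
    """A warp finishes only when its slowest thread finishes."""
    lats = [HIT_LAT if a // CACHE_LINE in cached_lines else MISS_LAT
            for a in addresses]
    return max(lats), lats

# Coalesced: 32 consecutive words all fall into one cache line.
coalesced = [i * WORD for i in range(32)]
# Irregular (BFS-like): scattered addresses touch many distinct lines.
irregular = [i * 1024 for i in range(32)]

cached = {0}  # assume only cache line 0 is resident
print(warp_latency(coalesced, cached)[0])  # 20: every thread hits
print(warp_latency(irregular, cached)[0])  # 400: one miss stalls the warp
```

Even though 31 threads of the irregular warp could in principle proceed, the lockstep warp stalls at the latency of its slowest thread, which is exactly the divergence the citing papers describe.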
“…For example, the computing efficiency of GPUs may plummet. Such access behaviors do not match the design of the GPU on-chip storage hierarchy, which negates the advantages of GPU architectures and greatly degrades performance [14][15][16].…”
Section: Introduction (mentioning)
confidence: 99%
“…In the literature [8], the authors dynamically choose an LRR or GTO scheduling policy suited to a task based on the locality of the task's load. Oh et al. [9] proposed the adaptive prefetching and scheduling policy ARPES (Adaptive Prefetching and Scheduling), which groups the warps executing the same memory instruction and prioritizes that group's execution. Rogers et al. [10] prioritized warps by the degree of data locality within each warp and proposed CCWS (Cache-Conscious Wavefront Scheduling), a cache-aware warp-scheduling algorithm that tracks evictions from the L1 data cache, adjusts the number of active warps in time, and reduces cache contention to preserve access locality.…”
Section: Introduction (mentioning)
confidence: 99%
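The two baseline scheduling policies surveyed above can be sketched as warp-selection functions. This is a minimal illustrative model under assumed conventions (warps identified by integer ids, "oldest" approximated as lowest id), not the cited papers' implementations:

```python
# Minimal sketch of two warp-scheduling policies: loose round-robin (LRR)
# and greedy-then-oldest (GTO). Illustrative only.

def lrr_pick(ready_warps, last_issued):
    """LRR: rotate to the next ready warp after the last issuer."""
    if last_issued in ready_warps:
        start = (ready_warps.index(last_issued) + 1) % len(ready_warps)
        return ready_warps[start]
    return ready_warps[0]

def gto_pick(ready_warps, last_issued):
    """GTO: keep issuing the same warp while it stays ready; otherwise
    fall back to the oldest ready warp (lowest id in this sketch)."""
    if last_issued in ready_warps:
        return last_issued
    return min(ready_warps)

ready = [2, 0, 3]
print(lrr_pick(ready, 2))  # 0: next warp after warp 2
print(gto_pick(ready, 2))  # 2: greedy keeps issuing the current warp
print(gto_pick(ready, 1))  # 0: warp 1 stalled, so pick the oldest
```

GTO tends to preserve intra-warp cache locality by sticking with one warp, while LRR spreads issue slots evenly, which is why schemes like [8] switch between them based on a task's measured locality.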