2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
DOI: 10.1109/micro.2014.11
Adaptive Cache Management for Energy-Efficient GPU Computing

Abstract: With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. To support applications with irregular memory access patterns, cache hierarchies have been introduced to GPU architectures to capture temporal and spatial locality and mitigate the effect of irregular accesses. However, GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design, w…


Citing publications: 2015–2024
Cited by 146 publications (79 citation statements)
References 36 publications
“…Chen et al. [20] designed a hardware sampling-based method on GPUs for L1 data cache bypassing and used warp throttling to reduce contention. Tian et al. [21] implemented a PC-based dynamic GPU cache bypassing predictor.…”
Section: Related Work
mentioning confidence: 99%
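A PC-based bypassing predictor like the one attributed to Tian et al. [21] can be understood as a table of saturating counters indexed by the load instruction's program counter. The sketch below is purely illustrative (the table size, hash, threshold, and training rule are assumptions, not the published design): each counter tracks whether cache lines brought in by that PC tend to be reused before eviction, and loads whose counter saturates toward "no reuse" are predicted to bypass the L1.

```python
# Illustrative sketch of a PC-indexed bypass predictor (NOT the exact
# design of Tian et al. [21]). Table size, hash function, and threshold
# are assumptions made for this example.

TABLE_SIZE = 256          # number of predictor entries (assumed)
BYPASS_THRESHOLD = 2      # counter >= threshold => predict bypass

# 2-bit saturating counters, one per entry, range 0..3
table = [0] * TABLE_SIZE

def index(pc):
    # Simple hash: drop instruction-alignment bits, mod table size.
    return (pc >> 2) % TABLE_SIZE

def predict_bypass(pc):
    # High counter value means lines fetched by this PC rarely see reuse.
    return table[index(pc)] >= BYPASS_THRESHOLD

def train(pc, line_was_reused):
    # Called when a line fetched by `pc` is evicted: decrement on reuse
    # (favor caching), increment on a dead line (favor bypassing).
    i = index(pc)
    if line_was_reused:
        table[i] = max(0, table[i] - 1)
    else:
        table[i] = min(3, table[i] + 1)
```

After a few evictions without reuse from the same static load, the predictor flips that PC to bypassing, while PCs with good reuse keep caching; the saturating counters provide hysteresis against one-off outcomes.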
“…The CPU-based approaches are usually designed for last-level caches (LLCs), where data locality has already been filtered by earlier cache levels. The poor locality of GPU workloads and resource congestion make it difficult for these approaches to produce robust predictions, and they often increase L2- and DRAM-level traffic [11] (Section 6.1(a)). GPU-based bypassing schemes are generally conditional/reactive (e.g., bypassing upon unavailable resources [15], or coarse-grained bypassing at warp or thread-block granularity [30,31,27]), which can incorrectly bypass accesses with good reuse and cause memory pipeline stalls (Section 6.1(a)).…”
Section: Introduction
mentioning confidence: 99%
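Conditional/reactive bypassing of the kind described above can be sketched as a simple rule at the L1 miss path: a miss is cached normally only if a required resource (here, an MSHR entry) is available; otherwise the request is sent past the L1. The controller below is a minimal illustration under assumed names and sizes, not any cited paper's mechanism.

```python
# Minimal sketch of conditional/reactive L1 bypassing: a miss bypasses
# the cache only when no MSHR entry is free. The class name, MSHR count,
# and return values are illustrative assumptions.

class L1Controller:
    def __init__(self, num_mshrs=4):
        self.num_mshrs = num_mshrs
        self.inflight = set()   # miss addresses holding an MSHR entry

    def handle_miss(self, addr):
        """Return 'allocate' if the miss gets an MSHR, else 'bypass'."""
        if addr in self.inflight:
            # Secondary miss to the same line: merge into the existing MSHR.
            return "allocate"
        if len(self.inflight) < self.num_mshrs:
            self.inflight.add(addr)
            return "allocate"
        # No MSHR free: react by sending the request straight to L2.
        return "bypass"

    def fill(self, addr):
        # Fill returned from L2/DRAM: release the MSHR entry.
        self.inflight.discard(addr)
```

Note the weakness the quoted passage points out: the decision depends only on momentary resource availability, not on the access's reuse behavior, so a load with good locality that arrives while MSHRs are full is bypassed anyway.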
“…Moreover, a fully-adaptive bypassing scheme is required to maintain the efficiency of workloads with good caching behavior, which is often neglected by previous approaches [24,15,12,11] (Section 6.1(b)).…”
Section: Introduction
mentioning confidence: 99%
“…There is limited communication between different workgroups. Since GPU applications generally exhibit little L1 temporal locality [9], the communication between the L1 and L2 caches becomes the main source of traffic on the GPU's on-chip interconnection network. As the number of CUs increases in each future generation of GPU systems, latency in the on-chip interconnection network becomes a major performance bottleneck on the GPU [6].…”
Section: Introduction
mentioning confidence: 99%