Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture 2013
DOI: 10.1145/2540708.2540747

Heterogeneous system coherence for integrated CPU-GPU systems

Abstract: Many future heterogeneous systems will integrate CPUs and GPUs physically on a single chip and logically connect them via shared memory to avoid explicit data copying. Making this shared memory coherent facilitates programming and fine-grained sharing, but throughput-oriented GPUs can overwhelm CPUs with coherence requests not well-filtered by caches. Meanwhile, region coherence has been proposed for CPU-only systems to reduce snoop bandwidth by obtaining coherence permissions for large regions. This paper deve…

Cited by 94 publications (45 citation statements)
References 28 publications
“…To address the high bandwidth and MSHR demands of future GPUs, HSC [Power et al 2013] uses a region directory and region buffers in both CPU and GPU L2 caches to filter directory probes. If the permission can be acquired by the region buffer, the requestor directly accesses the memory through a dedicated interconnect; otherwise it has to access the directory before accessing the memory.…”
Section: Qualitative Comparison (mentioning)
confidence: 99%
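
The filtering described in the statement above can be sketched as a toy model. This is an illustration only, not the protocol from the paper: the class names, the 1 KiB region size, and the exclusive-only permission model are all assumptions made for the example. It shows the one idea the statement names: an access that hits in the requestor's region buffer goes straight to memory, while a miss costs a directory probe first.

```python
from dataclasses import dataclass, field

REGION_BYTES = 1024  # assumed region size, chosen only for illustration


@dataclass
class RegionDirectory:
    """Hypothetical directory tracking the exclusive holder of each region."""
    owners: dict = field(default_factory=dict)
    probes: int = 0  # number of directory probes that were not filtered

    def acquire(self, region: int, buffer: "RegionBuffer") -> None:
        self.probes += 1
        previous = self.owners.get(region)
        if previous is not None and previous is not buffer:
            # Simplification: permission is exclusive, so revoke the old holder.
            previous.held.discard(region)
        self.owners[region] = buffer
        buffer.held.add(region)


@dataclass
class RegionBuffer:
    """Hypothetical per-requestor buffer of regions it may access directly."""
    name: str
    directory: RegionDirectory
    held: set = field(default_factory=set)

    def access(self, address: int) -> str:
        region = address // REGION_BYTES
        if region in self.held:
            return "direct"  # permission already held: bypass the directory
        self.directory.acquire(region, self)
        return "directory"  # had to consult the directory before memory


if __name__ == "__main__":
    directory = RegionDirectory()
    gpu = RegionBuffer("gpu", directory)
    # Sixteen cache-line-sized accesses inside one region: one probe in total.
    paths = [gpu.access(64 * i) for i in range(16)]
    print(paths.count("direct"), "direct accesses,", directory.probes, "probe(s)")
```

In this sketch a streaming requestor that touches many lines in the same region pays for a single probe, which is the bandwidth saving the statement attributes to the region buffer.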
“…The first step to make those systems more programming friendly is to provide a cache-coherent unified address space. Research proposals either employ sequential consistency (SC) protocols to maintain coherence between devices (CPU-GPU), such as [Power et al 2013; Power et al 2015], or exchange some programming ease for simplicity and adopt release consistency (RC), such as [Singh et al 2013;. To maintain SC across devices, every write performed by a GPU core is eagerly made visible to the device and to the entire system.…”
Section: Introduction (mentioning)
confidence: 99%
“…Even high-quality production compilers are out-of-date [31] and cannot extract the full performance potential of the hardware. The industry-wide trend towards heterogeneity only serves to make the optimization decision space even more complex, making effective heuristics near impossible to construct [44].…”
Section: Introduction (mentioning)
confidence: 99%
“…Creating analytical models on which optimization heuristics can be based has become harder as processor complexity has increased, and this trend is bound to continue as processor designs move further towards heterogeneous parallelism [1]. Compiler developers often have to spend months if not years to get a heuristic right for a targeted architecture, and these days compilers often support a wide range of disparate processors.…”
Section: Introduction (mentioning)
confidence: 99%