2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
DOI: 10.1109/ispass.2013.6557152

Evaluating cache coherent shared virtual memory for heterogeneous multicore chips

Abstract: The trend in industry is towards heterogeneous multicore processors (HMCs), including chips with CPUs and massively-threaded throughput-oriented processors (MTTOPs) such as GPUs. Although current homogeneous chips tightly couple the cores with cache-coherent shared virtual memory (CCSVM), this is not the communication paradigm used by any current HMC. In this paper, we present a CCSVM design for a CPU/MTTOP chip, as well as an extension of the pthreads programming model, called xthreads, for programming this H…
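
The abstract stops short of showing the xthreads interface, but the shift it describes, CPU and MTTOP threads operating on one coherent virtual address space instead of exchanging explicit copies, can be approximated with today's CUDA unified memory. The sketch below is only an analogue of that idea, not the paper's xthreads API: cudaMallocManaged and the kernel-launch syntax are standard CUDA, while the scale kernel and the buffer size are made up for illustration.

// Sketch: CPU and GPU touching the same virtual addresses, in the spirit of
// CCSVM, using CUDA unified (managed) memory as a present-day analogue.
// This is NOT the paper's xthreads API.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;                 // GPU writes through the shared pointer
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));  // one allocation, visible to CPU and GPU
    for (int i = 0; i < n; ++i) data[i] = 1.0f;   // CPU initializes in place, no staging buffer
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();                      // wait for the GPU ...
    printf("data[0] = %f\n", data[0]);            // ... then the CPU reads the result directly
    cudaFree(data);
    return 0;
}

The property to notice is that no cudaMemcpy appears anywhere: the same pointer is produced on one side and consumed on the other, which is the kind of interaction a CCSVM design is meant to make cheap in hardware.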

Cited by 15 publications (4 citation statements)
References 34 publications

“…In case of embedded platforms with shared system DRAM, using the CE basically means duplicating the same buffer twice on the same memory device. Both CUDA and OpenCL programming models specify alternatives to the CE approach to avoid explicit memory transfers and unnecessary buffer replications, such as CUDA UVM (Unified Virtual Memory [14]) and OpenCL 2.0 SVM (Shared Virtual Memory [15]). However, these approaches introduce CPU-iGPU memory coherency problems when accessing the same shared memory buffer, so that avoiding copy engines does not necessarily lead to performance improvements. For this reason, we will characterize the contention originated in both CE- and non-CE-based models.…”
Section: SoCs Specifications and Contention Points
Confidence: 99%
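
For contrast with the UVM/SVM alternatives the quote mentions, here is a minimal sketch of the copy-engine (CE) style path it refers to: a separate device allocation plus explicit transfers. The kernel body and buffer size are placeholders, not taken from the cited work; on an integrated-GPU SoC with a single shared DRAM, h_buf and d_buf end up as two copies of the same data in the same physical memory.

// Sketch of the copy-engine (CE) path: a distinct device allocation plus
// explicit transfers. On an embedded SoC where CPU and iGPU share one DRAM,
// h_buf and d_buf hold the same data twice in the same memory device.
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void add_one(float *buf, int n) {      // placeholder GPU work
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1.0f;
}

void run_ce_style(int n) {
    size_t bytes = n * sizeof(float);
    float *h_buf = (float *)calloc(n, sizeof(float));  // host copy
    float *d_buf = nullptr;
    cudaMalloc(&d_buf, bytes);                         // device copy of the same buffer

    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);   // CE transfer in
    add_one<<<(n + 255) / 256, 256>>>(d_buf, n);
    cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);   // CE transfer out

    cudaFree(d_buf);
    free(h_buf);
}

Replacing cudaMalloc and the two cudaMemcpy calls with cudaMallocManaged (or an OpenCL 2.0 SVM buffer) removes the duplicate allocation and the transfers, but, as the quote notes, it shifts the cost to keeping the CPU and iGPU views of that single buffer coherent.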
“…This is especially problematic for pointer-based data structures (e.g., linked lists, trees). Recent work tries to address this using various smarter memory management schemes [20,21,25,26]. Furthermore, latest CUDA releases permit limited CPU/GPU virtual address sharing [57].…”
Section: Address Translation on CPU/GPUs
Confidence: 99%
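
The difficulty this quote raises with pointer-based structures is that embedded pointers are only meaningful in the address space where they were created. As an illustration of that general problem (not code from the cited paper), the sketch below builds a small linked list in CUDA managed memory so the very same node pointers remain valid when a GPU kernel walks the list; without shared virtual addressing, each node would have to be deep-copied and its next pointers rewritten.

// A CPU-built linked list walked by a GPU kernel. With managed memory the
// 'next' pointers the CPU stored are valid on the GPU as well; with separate
// address spaces every node would need a deep copy and pointer fix-up.
#include <cstdio>
#include <cuda_runtime.h>

struct Node { int value; Node *next; };

__global__ void sum_list(const Node *head, int *out) {
    int s = 0;
    for (const Node *n = head; n != nullptr; n = n->next)   // same pointers the CPU wrote
        s += n->value;
    *out = s;
}

int main() {
    Node *head = nullptr;
    for (int v = 1; v <= 4; ++v) {                // CPU builds the list node by node
        Node *n = nullptr;
        cudaMallocManaged(&n, sizeof(Node));
        n->value = v;
        n->next = head;
        head = n;
    }
    int *sum = nullptr;
    cudaMallocManaged(&sum, sizeof(int));
    sum_list<<<1, 1>>>(head, sum);
    cudaDeviceSynchronize();
    printf("sum = %d\n", *sum);                   // prints 10
    return 0;
}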
“…There have been many coherence extensions proposed over the years (discussed further in Section II), but these generally build upon conventional hardware protocols originally designed for CPUs such as MESI. Such protocols are effective for a wide range of CPU workloads, but these complex coherence strategies often incur unacceptable overheads for accelerators such as GPUs [34], [33], [75]. In addition, the complexity of MESI-based protocols makes validating protocol changes expensive, requiring that the cost of any coherence extension be amortized over a broad range of general-purpose applications.…”
Section: Introduction
Confidence: 99%
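
As a rough reference for readers unfamiliar with the MESI family this quote refers to, the following is a textbook-level sketch of the stable per-cache-line states such a protocol tracks and how they change on local and remote accesses. It is host-side C++ (it compiles with the same CUDA toolchain as the earlier sketches) and deliberately omits the transient states, races, and snoop/directory machinery that make real MESI-based protocols as hard to extend as the quote argues.

// Textbook simplification of the stable per-cache-line states in a MESI
// protocol. A real protocol also handles transient states, write-backs,
// and snoop/directory traffic, which is where the complexity comes from.
enum class Mesi { Modified, Exclusive, Shared, Invalid };

// State after this core reads the line; 'othersHaveCopy' stands in for the
// snoop or directory response.
Mesi onLocalRead(Mesi s, bool othersHaveCopy) {
    if (s == Mesi::Invalid)
        return othersHaveCopy ? Mesi::Shared : Mesi::Exclusive;
    return s;                                   // M, E, S already satisfy the read
}

// State after this core writes the line (other copies are invalidated).
Mesi onLocalWrite(Mesi /*s*/) { return Mesi::Modified; }

// State after another core reads the line (a Modified copy is written back,
// then shared).
Mesi onRemoteRead(Mesi s) {
    return (s == Mesi::Invalid) ? Mesi::Invalid : Mesi::Shared;
}

// State after another core writes the line: our copy is no longer valid.
Mesi onRemoteWrite(Mesi /*s*/) { return Mesi::Invalid; }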