2014 IEEE High Performance Extreme Computing Conference (HPEC) 2014
DOI: 10.1109/hpec.2014.7040988

An investigation of Unified Memory Access performance in CUDA

Abstract: Managing memory between the CPU and GPU is a major challenge in GPU computing. A programming model, Unified Memory Access (UMA), has been recently introduced by Nvidia to simplify the complexities of memory management while claiming good overall performance. In this paper, we investigate this programming model and evaluate its performance and programming model simplifications based on our experimental results. We find that beyond on-demand data transfers to the CPU, the GPU is also able to request subsets of d…

Cited by 83 publications (48 citation statements)
References 7 publications
“…Gelado et al [5] presented a new programming model for heterogeneous computing, called Asymmetric Distributed Shared Memory (ADSM), that maintains a shared logical memory space for CPUs to access objects in the accelerator physical memory. Nickolls et al [9] investigated the Unified Memory programming model and evaluate the performance. However, he only tested one benchmark suite and did not analyze the reason for the performance loss.…”
Section: Discussion (mentioning)
confidence: 99%
“…Typically, solutions that increase flexibility and ease of programming impose a certain performance overhead. The authors of [14] thoroughly tested the UM mechanism. They incorporated several benchmarks, both those written by the authors but also the Rodinia benchmark set.…”
Section: Unified Memory (mentioning)
confidence: 99%
“…This work was targeted at optimizing small message transfers and was further extended by Shi et al in [14] where the authors showed how some of the new techniques such as NIC loopback and Fastcopy could enable faster transfer of eager messages with higher performance. In a recent work done by Landaverde et al in [8], the authors have done a performance evaluation of the CUDA managed memory from an applications perspective. The authors state that even though the programming productivity is high due to the on-demand fetching of data, the performance of managed memory is poor which severely restricts its flexibility and adding future optimizations.…”
Section: Related Work (mentioning)
confidence: 99%