Proceedings of the 27th ACM International Conference on Supercomputing (ICS 2013)
DOI: 10.1145/2464996.2465023

Scaling large-data computations on multi-GPU accelerators

Abstract: Modern supercomputers rely on accelerators to speed up highly parallel workloads. Intricate programming models, limited device memory sizes, and the overhead of data transfers between CPU and accelerator memories are among the open challenges that restrict the widespread use of accelerators. First, this paper proposes a mechanism and an implementation to automatically pipeline the CPU-GPU memory channel so as to overlap the GPU computation with the memory copies, alleviating the data transfer overhead. Second, in …
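The first contribution, overlapping GPU computation with CPU-GPU copies, is the classic CUDA-streams pipelining pattern. Below is a minimal sketch of that pattern, not the paper's implementation (which generates such code automatically); the kernel, the function names, and the two-stream double buffering are illustrative assumptions, and the host buffer is assumed to be pinned (cudaMallocHost) so the copies are truly asynchronous.

// Minimal CUDA-streams pipelining sketch (illustrative, not the paper's
// generated code): the input is processed in fixed-size chunks, and the
// host-to-device copy of one chunk overlaps the kernel on the previous
// chunk because consecutive chunks are issued into alternating streams.
#include <cuda_runtime.h>

__global__ void process(float *buf, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 2.0f;               // placeholder computation
}

// host: pinned host buffer (cudaMallocHost); dev: device buffer of size total
void pipelined_run(const float *host, float *dev, size_t total, size_t chunk) {
    cudaStream_t streams[2];
    for (int s = 0; s < 2; ++s) cudaStreamCreate(&streams[s]);

    for (size_t off = 0, c = 0; off < total; off += chunk, ++c) {
        size_t n = (total - off < chunk) ? total - off : chunk;
        cudaStream_t s = streams[c % 2];     // alternate streams per chunk
        cudaMemcpyAsync(dev + off, host + off, n * sizeof(float),
                        cudaMemcpyHostToDevice, s);
        process<<<(unsigned)((n + 255) / 256), 256, 0, s>>>(dev + off, n);
    }
    cudaDeviceSynchronize();                 // wait for all chunks
    for (int s = 0; s < 2; ++s) cudaStreamDestroy(streams[s]);
}

With pinned host memory, the copy engine and the compute engine work on different chunks at the same time, hiding much of the transfer time behind computation.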

Cited by 18 publications (10 citation statements); references 32 publications. Citing publications span 2014 to 2024.
“…UM simplifies both out-of-core processing between GPUs and CPUs as well as multi-GPU processing, and combinations of both. Previously, the applications focusing on large data processing on GPUs required algorithm-specific techniques for memory handling [Al-Saber and Kulkarni 2015; Gelado et al. 2010b; Huynh et al. 2012; Jablin et al. 2012b; Krizhevsky et al. 2012; Sabne et al. 2013; Seo et al. 2015; Shamoto et al. 2015].…”
Section: CUDA Unified Memory for Multi-GPU Systems
Citation type: mentioning; confidence: 99%
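As a point of contrast with the algorithm-specific techniques listed above, a hedged sketch of the Unified Memory style this snippet describes follows; the kernel, the sizes, and the even split across GPUs are illustrative assumptions, not code from any of the cited works.

// Minimal CUDA Unified Memory sketch (illustrative): one managed
// allocation is visible to the host and to every GPU, so neither
// out-of-core staging nor explicit cudaMemcpy calls are needed.
// On Pascal-class and newer GPUs the driver pages data on demand,
// which also allows oversubscribing a single GPU's memory.
#include <cuda_runtime.h>

__global__ void scale(float *data, size_t n, float f) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= f;
}

int main() {
    size_t n = (size_t)1 << 28;              // assumed problem size
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;   // host writes directly

    int ngpus;
    cudaGetDeviceCount(&ngpus);
    size_t per = n / ngpus;                  // assume n divides evenly
    for (int d = 0; d < ngpus; ++d) {        // each GPU scales its slice
        cudaSetDevice(d);
        scale<<<(unsigned)((per + 255) / 256), 256>>>(data + d * per, per, 2.0f);
    }
    for (int d = 0; d < ngpus; ++d) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();
    }
    cudaFree(data);
    return 0;
}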
“…GPU researchers have exploited pipelining [29] to overlap data transfers with kernel computations. The distinguishing factor in the Pagoda pipelined task processing is that it overlaps spawning, which comprises the CPU finding a free task entry and performing a data copy, with GPU scheduling, which is only a sub-part of the overall task processing.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
“…The generated pipelined code can automatically support computations with out-of-GPU datasets. SuperMatrix is another runtime system that supports shared-memory systems with multiple GPUs. It uses several software cache schemes to maintain coherence between the host RAM and the GPU memories and to minimize communication.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
“…StarPU relies on a virtual shared memory to handle data transfers and reduce communications. Eigenmann et al. [25] proposed a new technique called computation splitting and used pipelining to translate OpenMP programs to run on a host system with multiple attached GPUs. The generated pipelined code can automatically support computations with out-of-GPU datasets.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
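To make the two ideas in this last snippet concrete, here is a hedged sketch of what computation splitting combined with chunked transfers could look like at runtime: the iteration space is split across the attached GPUs, and each slice is streamed through a bounded device buffer so the dataset may exceed GPU memory. All names are assumptions; this illustrates the general idea rather than the translator's actual generated code.

// Illustrative sketch of computation splitting with out-of-GPU data:
// the iteration space is split across GPUs, and each GPU streams its
// slice through a bounded device buffer, chunk by chunk.
#include <cuda_runtime.h>

__global__ void work(float *buf, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) buf[i] = buf[i] * buf[i];     // placeholder loop body
}

void split_and_stream(float *host, size_t total, size_t chunk) {
    int ngpus;
    cudaGetDeviceCount(&ngpus);
    size_t share = total / ngpus;            // assume total divides evenly
    float **dev = new float*[ngpus];
    cudaStream_t *str = new cudaStream_t[ngpus];

    // Issue every GPU's chunked slice asynchronously so the GPUs overlap.
    for (int d = 0; d < ngpus; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&dev[d], chunk * sizeof(float));  // bounded device buffer
        cudaStreamCreate(&str[d]);
        for (size_t off = 0; off < share; off += chunk) {
            size_t n = (share - off < chunk) ? share - off : chunk;
            float *h = host + (size_t)d * share + off;
            cudaMemcpyAsync(dev[d], h, n * sizeof(float),
                            cudaMemcpyHostToDevice, str[d]);
            work<<<(unsigned)((n + 255) / 256), 256, 0, str[d]>>>(dev[d], n);
            cudaMemcpyAsync(h, dev[d], n * sizeof(float),
                            cudaMemcpyDeviceToHost, str[d]);
        }
    }
    for (int d = 0; d < ngpus; ++d) {        // drain and clean up
        cudaSetDevice(d);
        cudaStreamSynchronize(str[d]);
        cudaStreamDestroy(str[d]);
        cudaFree(dev[d]);
    }
    delete[] dev;
    delete[] str;
}

A single buffer and stream per GPU keeps the chunks correct but serial within each GPU; restoring copy/compute overlap inside a GPU would take double buffering with two streams, as in the first sketch above.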