Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2017
DOI: 10.1145/3126908.3126950

GPU triggered networking for intra-kernel communications

Cited by 16 publications (8 citation statements)
References 15 publications
“…The accelerator can employ all the features of the CPU network stack, including NIC hardware offloads, while occupying a relatively small area (see Table 1). However, the CPU is involved in every network transaction, limiting scalability, hurting performance, and wasting CPU cycles [22,60,91].…”
Section: Accelerator Networking Architectures
confidence: 99%
“…Our work builds upon previous GPU networking work to use NICs directly from GPUs, eliminating the CPU from the critical path [2,22,60,84,91]. These works implement communication tasks in GPGPU cores or CUDA stream MemOps [2].…”
Section: Related Work
confidence: 99%
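The excerpt above mentions implementing communication tasks via CUDA stream MemOps, i.e. stream-ordered writes and waits on 32-bit flags in memory. The following is a minimal Python model of that synchronization pattern, assuming only its semantics; the function names are illustrative analogues, not the CUDA driver API (`cuStreamWriteValue32` / `cuStreamWaitValue32`).

```python
import threading

# Shared 32-bit flag location, as a NIC or another stream would see it.
flag = {"value": 0}
cond = threading.Condition()

def wait_value_eq(target):
    # Analogue of cuStreamWaitValue32 with an equality condition:
    # the "stream" (here, a thread) stalls until flag == target.
    with cond:
        cond.wait_for(lambda: flag["value"] == target)

def write_value(value):
    # Analogue of cuStreamWriteValue32: store the value and wake waiters.
    with cond:
        flag["value"] = value
        cond.notify_all()

events = []

def consumer():
    wait_value_eq(42)                  # blocks until the write lands
    events.append("kernel-after-wait")

t = threading.Thread(target=consumer)
t.start()
events.append("write")                 # recorded before the flag is set
write_value(42)
t.join()
print(events)  # ['write', 'kernel-after-wait']
```

The point of the pattern is that the ordering is enforced by the memory flag rather than by CPU involvement in each transaction.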
“…The advent of heterogeneous systems, especially with the use of hardware accelerators, brings back to the forefront the modeling question of these complex systems. Moving data between accelerator memories has been a significant bottleneck in distributed computing environments [20,21]. Unlike earlier systems that rely mainly on CPU-initiated mechanisms [20], moving data residing on accelerator memories has recently involved novel mechanisms, including device-initiated [3,12,[22][23][24] and hardware transparent migration using unified memory models [25,26].…”
Section: Related Work
confidence: 99%
“…A number of works implement intra-kernel networking while avoiding CPU helper threads. GPU-TN [19] provides an intra-kernel networking scheme by using a mechanism based on Portals 4 triggered operations [35]. GPU Global Address Space (GGAS) [27] implements intra-kernel networking by adding explicit hardware in the GPU to support a clusterwide global address space.…”
Section: Related Work
confidence: 99%
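The triggered operations that GPU-TN builds on follow a simple deferral rule: a network operation is registered against a counter with a threshold and fires automatically once the counter reaches it. Below is a minimal Python sketch of that rule; the class and method names are illustrative stand-ins, not the Portals 4 C API (which exposes this via calls such as `PtlTriggeredPut` and counting events).

```python
class TriggeredQueue:
    """Toy model of a NIC queue supporting counter-triggered operations."""

    def __init__(self):
        self.counter = 0
        self.pending = []   # list of (threshold, payload) pairs
        self.fired = []     # payloads of operations that have fired

    def triggered_put(self, threshold, payload):
        # Register a put that fires once counter >= threshold.
        self.pending.append((threshold, payload))
        self._poll()

    def ct_inc(self, amount=1):
        # Increment the counting event (e.g. a GPU kernel signalling
        # that a block of data is ready), then re-check pending ops.
        self.counter += amount
        self._poll()

    def _poll(self):
        ready = [p for p in self.pending if self.counter >= p[0]]
        self.pending = [p for p in self.pending if self.counter < p[0]]
        self.fired.extend(payload for _, payload in ready)

q = TriggeredQueue()
q.triggered_put(threshold=2, payload="send block 0")
q.ct_inc()              # counter = 1: the put stays deferred
assert q.fired == []
q.ct_inc()              # counter = 2: the deferred put fires
print(q.fired)          # ['send block 0']
```

This is why the approach lets a running kernel drive communication without CPU helper threads: the CPU pre-arms the operation, and the GPU merely bumps a counter.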
“…Inter-kernel networking can also impose performance challenges when networking is frequent relative to computation, and it limits the class of algorithms that can be offloaded to a GPU. To put this into perspective, waiting for kernel tear-down/startup has been shown to take upwards of 10µs [19]. This is an order of magnitude greater than modern network latencies, which hover around 0.7µs at the time of this writing [22].…”
Section: Introduction
confidence: 99%
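A quick back-of-the-envelope check of the gap quoted in this excerpt, using the two figures it cites (~10µs kernel tear-down/startup vs. ~0.7µs network latency):

```python
# Figures quoted in the excerpt above; both in microseconds.
kernel_overhead_us = 10.0
network_latency_us = 0.7

ratio = kernel_overhead_us / network_latency_us
print(f"{ratio:.1f}x")  # roughly 14x: kernel overhead dominates the wire time
```

That ratio is the core of the argument for intra-kernel networking: paying kernel boundaries per message costs about an order of magnitude more than the network itself.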