GPUnet

2016. DOI: 10.1145/2963098
Abstract: Despite the popularity of GPUs in high-performance and scientific computing, and despite increasingly general-purpose hardware capabilities, the use of GPUs in network servers or distributed systems poses significant challenges. GPUnet is a native GPU networking layer that provides a socket abstraction and high-level networking APIs for GPU programs. We use GPUnet to streamline the development of high-performance, distributed applications like in-GPU-memory MapReduce and a new class of low-latency, h…
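To make the socket abstraction concrete, here is a minimal sketch of what a device-side sockets interface can look like. The names (gconnect, gsend, grecv, gclose) and signatures are illustrative assumptions modeled on BSD sockets, not GPUnet's published API, and the bodies are stubs so the sketch compiles:

```cuda
#include <cstddef>

// Hypothetical device-side socket calls, modeled on BSD sockets.
// All names, signatures, and stub bodies are assumptions for
// illustration; they are not the actual GPUnet API.
struct gsockaddr { unsigned ip; unsigned short port; };

__device__ int  gconnect(const gsockaddr *addr)            { return 0; }
__device__ int  gsend(int sock, const void *buf, size_t n) { return 0; }
__device__ int  grecv(int sock, void *buf, size_t n)       { return 0; }
__device__ void gclose(int sock)                           { }

// A GPU-native client: each thread block opens, uses, and closes its
// own connection without ever returning control to the CPU.
__global__ void echo_client(gsockaddr server, const char *msg, size_t len) {
    __shared__ int sock;
    if (threadIdx.x == 0) sock = gconnect(&server);
    __syncthreads();
    if (threadIdx.x == 0) {
        gsend(sock, msg, len);              // would block until queued
        char reply[256];
        grecv(sock, reply, sizeof(reply));  // would block until data arrives
        gclose(sock);
    }
}
```

A per-block socket is one plausible granularity for such an API, since GPU code naturally organizes work around cooperating thread blocks rather than single threads.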

Cited by 27 publications (6 citation statements); references 27 publications. The citation statements below are ordered by relevance.
“…The CPU and GPU communicate through a non-coherent PCIe bus model. This is representative of most previous works that have attempted intra-kernel networking using helper threads on the host [12,16,37]. dGPU also serves as the baseline for all results that report normalized energy consumption or speedups.…”
Section: Methods
confidence: 99%
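The helper-thread pattern mentioned above can be sketched in a few lines: the kernel posts a request in pinned, host-visible memory and spins on a completion flag, while a CPU thread polls, performs the network operation on the GPU's behalf, and signals back. This is a minimal sketch with invented structure and field names, standing in for a real send path:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>

// Request slot shared between GPU and CPU over the (non-coherent) PCIe
// bus. Layout and field names are invented for illustration.
struct NetRequest {
    volatile int ready;   // set by the GPU when a request is posted
    volatile int done;    // set by the CPU when the operation completed
    int          bytes;   // payload size the GPU wants sent
};

// GPU side: post a request, then spin until the host helper signals done.
__global__ void post_send(NetRequest *req, int bytes) {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        req->bytes = bytes;
        __threadfence_system();          // make the write visible to the CPU
        req->ready = 1;
        while (req->done == 0) { }       // busy-wait on the completion flag
    }
}

int main() {
    NetRequest *req;
    // Pinned, mapped host memory: the same pointer is usable on both
    // the CPU and (under unified virtual addressing) the GPU.
    cudaHostAlloc(&req, sizeof(NetRequest), cudaHostAllocMapped);
    req->ready = req->done = 0;

    // CPU helper thread: poll for a request, perform the network send on
    // the GPU's behalf (a real system would call send() or post an RDMA
    // work request here), then signal completion.
    std::thread helper([req] {
        while (req->ready == 0) { }
        printf("helper: sending %d bytes for the GPU\n", req->bytes);
        req->done = 1;
    });

    post_send<<<1, 32>>>(req, 4096);
    cudaDeviceSynchronize();
    helper.join();
    cudaFreeHost(req);
    return 0;
}
```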
“…There are a number of works that support GPU networking through helper threads on the host CPU. GPUNet [16] provides a socket-based abstraction for GPUs. Both Distributed Computing for GPU Networks (DCGN) [37] and dCUDA [12] implement a device-side MPI library for GPU kernels that attempts to hide long-latency GPU network events across the cluster.…”
Section: Related Work
confidence: 99%
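In the device-side MPI style of DCGN and dCUDA, communication calls are issued from inside the kernel, and latency is hidden by oversubscribing the GPU with more blocks than it can run at once, so the scheduler swaps in ready blocks while others wait. The interface below is invented for illustration (stub bodies, not either library's real API):

```cuda
// Invented device-side, MPI-like calls in the spirit of DCGN/dCUDA;
// stub bodies so the sketch compiles. Not either library's real API.
__device__ void dev_mpi_put(int peer, double *remote, const double *local,
                            int count) { /* device-initiated put */ }
__device__ void dev_mpi_wait_notify(int tag) { /* block until notified */ }

// One halo-exchange step: compute, push the boundary to a neighbor from
// inside the kernel, then wait for the neighbor's boundary to arrive.
// Launching many more blocks than the GPU has SMs lets other blocks run
// while this one waits, which is how dCUDA-style runtimes hide latency.
__global__ void stencil_step(double *grid, double *halo, int n, int peer) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) grid[i] = 0.5 * (grid[i] + halo[i]);     // local compute
    if (threadIdx.x == 0) {
        dev_mpi_put(peer, halo, grid, 1);               // send our boundary
        dev_mpi_wait_notify(blockIdx.x);                // await neighbor's
    }
}
```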
“…Efforts to improve server-based DPI performance have focused on reducing overheads [34] or offloading work to GPUs [27,63,64,67] and accelerators [42]. In contrast, DeepMatch guarantees data-independent throughput and runs on more energy-efficient NPs [51].…”
Section: Related Work
confidence: 99%
“…Moving data between accelerator memories has been a significant bottleneck in distributed computing environments [20,21]. Unlike earlier systems that rely mainly on CPU-initiated mechanisms [20], moving data residing on accelerator memories has recently involved novel mechanisms, including device-initiated transfers [3,12,22-24] and hardware-transparent migration using unified memory models [25,26].…”
Section: Related Work
confidence: 99%
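The hardware-transparent, unified-memory end of that design space is easy to demonstrate with CUDA managed memory: a single allocation is valid on both host and device, and pages migrate on demand instead of being copied explicitly. This is a generic CUDA illustration, not code from the cited systems:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

int main() {
    const int n = 1 << 20;
    float *data;
    // One allocation, one pointer: pages migrate between host and device
    // memory on demand instead of being moved with explicit cudaMemcpy.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;     // touched on the host

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f); // migrates to the GPU
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);              // migrates back on access
    cudaFree(data);
    return 0;
}
```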
“…NVSHMEM [12] leverages the relaxed memory semantics of the SHMEM [30] programming abstraction to provide efficient inter-GPU communication. Numerous studies [3,12,22-24] have examined the performance issues associated with scaling NVSHMEM without developing a performance model to guide algorithm development. In contrast, our study aims not only to guide algorithm development but also to identify the hardware bottlenecks with the greatest impact on performance.…”
Section: Related Work
confidence: 99%
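For contrast with CPU-initiated schemes, here is a minimal device-initiated NVSHMEM example in which each PE writes directly into its neighbor's memory from inside a kernel. It assumes the standard NVSHMEM host/device API with one GPU per PE; launcher setup (e.g., running under nvshmemrun or mpirun) is omitted:

```cuda
#include <nvshmem.h>
#include <nvshmemx.h>
#include <cuda_runtime.h>
#include <cstdio>

// Each PE writes its rank into its right-hand neighbor's symmetric
// buffer directly from GPU code: a device-initiated put.
__global__ void ring_put(float *dest, int mype, int npes) {
    if (threadIdx.x == 0 && blockIdx.x == 0)
        nvshmem_float_p(dest, (float)mype, (mype + 1) % npes);
}

int main() {
    nvshmem_init();
    // Bind each PE to its own GPU on the node (as in the NVSHMEM samples).
    cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();

    // Symmetric allocation: the same address is valid on every PE.
    float *dest = (float *)nvshmem_malloc(sizeof(float));

    ring_put<<<1, 32>>>(dest, mype, npes);
    cudaDeviceSynchronize();           // local kernel has issued its put
    nvshmem_barrier_all();             // all PEs' puts are now delivered

    float val;
    cudaMemcpy(&val, dest, sizeof(float), cudaMemcpyDeviceToHost);
    printf("PE %d received %.0f\n", mype, val);

    nvshmem_free(dest);
    nvshmem_finalize();
    return 0;
}
```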