GPUnet

2016. DOI: 10.1145/2963098
Abstract: Despite the popularity of GPUs in high-performance and scientific computing, and despite increasingly general-purpose hardware capabilities, the use of GPUs in network servers or distributed systems poses significant challenges. GPUnet is a native GPU networking layer that provides a socket abstraction and high-level networking APIs for GPU programs. We use GPUnet to streamline the development of high-performance, distributed applications like in-GPU-memory MapReduce and a new class of low-latency, h…
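To make the socket abstraction concrete, here is a minimal sketch of what a device-side sockets interface can look like. The names (gconnect, gsend, grecv, gclose) and signatures are illustrative assumptions modeled on BSD sockets, not GPUnet's published API, and the bodies are stubs so the sketch compiles:

```cuda
#include <cstddef>

// Hypothetical device-side socket calls, modeled on BSD sockets.
// All names, signatures, and stub bodies are assumptions for
// illustration; they are not the actual GPUnet API.
struct gsockaddr { unsigned ip; unsigned short port; };

__device__ int  gconnect(const gsockaddr *addr)            { return 0; }
__device__ int  gsend(int sock, const void *buf, size_t n) { return 0; }
__device__ int  grecv(int sock, void *buf, size_t n)       { return 0; }
__device__ void gclose(int sock)                           { }

// A GPU-native client: each thread block opens, uses, and closes its
// own connection without ever returning control to the CPU.
__global__ void echo_client(gsockaddr server, const char *msg, size_t len) {
    __shared__ int sock;
    if (threadIdx.x == 0) sock = gconnect(&server);
    __syncthreads();
    if (threadIdx.x == 0) {
        gsend(sock, msg, len);              // would block until queued
        char reply[256];
        grecv(sock, reply, sizeof(reply));  // would block until data arrives
        gclose(sock);
    }
}
```

A per-block socket is one plausible granularity for such an API, since GPU code naturally organizes work around cooperating thread blocks rather than single threads.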

Cited by 27 publications (6 citation statements); references 27 publications. The citation statements below are ordered by relevance.
“…The CPU and GPU communicate through a non-coherent PCIe bus model. This is representative of most previous works that have attempted intra-kernel networking using helper threads on the host [12,16,37]. dGPU also serves as the baseline for all results that report normalized energy consumption or speedups.…”
Section: Methods
confidence: 99%
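The helper-thread pattern mentioned above can be sketched in a few lines: the kernel posts a request in pinned, host-visible memory and spins on a completion flag, while a CPU thread polls, performs the network operation on the GPU's behalf, and signals back. This is a minimal sketch with invented structure and field names, standing in for a real send path:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>

// Request slot shared between GPU and CPU over the (non-coherent) PCIe
// bus. Layout and field names are invented for illustration.
struct NetRequest {
    volatile int ready;   // set by the GPU when a request is posted
    volatile int done;    // set by the CPU when the operation completed
    int          bytes;   // payload size the GPU wants sent
};

// GPU side: post a request, then spin until the host helper signals done.
__global__ void post_send(NetRequest *req, int bytes) {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        req->bytes = bytes;
        __threadfence_system();          // make the write visible to the CPU
        req->ready = 1;
        while (req->done == 0) { }       // busy-wait on the completion flag
    }
}

int main() {
    NetRequest *req;
    // Pinned, mapped host memory: the same pointer is usable on both
    // the CPU and (under unified virtual addressing) the GPU.
    cudaHostAlloc(&req, sizeof(NetRequest), cudaHostAllocMapped);
    req->ready = req->done = 0;

    // CPU helper thread: poll for a request, perform the network send on
    // the GPU's behalf (a real system would call send() or post an RDMA
    // work request here), then signal completion.
    std::thread helper([req] {
        while (req->ready == 0) { }
        printf("helper: sending %d bytes for the GPU\n", req->bytes);
        req->done = 1;
    });

    post_send<<<1, 32>>>(req, 4096);
    cudaDeviceSynchronize();
    helper.join();
    cudaFreeHost(req);
    return 0;
}
```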
“…There are a number of works that support GPU networking through helper threads on the host CPU. GPUNet [16] provides a socket-based abstraction for GPUs. Both Distributed Computing for GPU Networks (DCGN) [37] and dCUDA [12] implement a device-side MPI library for GPU kernels that attempts to hide long-latency GPU network events across the cluster.…”
Section: Related Work
confidence: 99%
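In the device-side MPI style of DCGN and dCUDA, communication calls are issued from inside the kernel, and latency is hidden by oversubscribing the GPU with more blocks than it can run at once, so the scheduler swaps in ready blocks while others wait. The interface below is invented for illustration (stub bodies, not either library's real API):

```cuda
// Invented device-side, MPI-like calls in the spirit of DCGN/dCUDA;
// stub bodies so the sketch compiles. Not either library's real API.
__device__ void dev_mpi_put(int peer, double *remote, const double *local,
                            int count) { /* device-initiated put */ }
__device__ void dev_mpi_wait_notify(int tag) { /* block until notified */ }

// One halo-exchange step: compute, push the boundary to a neighbor from
// inside the kernel, then wait for the neighbor's boundary to arrive.
// Launching many more blocks than the GPU has SMs lets other blocks run
// while this one waits, which is how dCUDA-style runtimes hide latency.
__global__ void stencil_step(double *grid, double *halo, int n, int peer) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) grid[i] = 0.5 * (grid[i] + halo[i]);     // local compute
    if (threadIdx.x == 0) {
        dev_mpi_put(peer, halo, grid, 1);               // send our boundary
        dev_mpi_wait_notify(blockIdx.x);                // await neighbor's
    }
}
```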
“…Efforts to improve server-based DPI performance have focused on reducing overheads [34] or offloading work to GPUs [27,63,64,67] and accelerators [42]. In contrast, DeepMatch guarantees data-independent throughput and runs on more energy-efficient NPs [51].…”
Section: Related Work
confidence: 99%
“…Moving data between accelerator memories has been a significant bottleneck in distributed computing environments [20,21]. Unlike earlier systems that rely mainly on CPU-initiated mechanisms [20], moving data residing on accelerator memories has recently involved novel mechanisms, including device-initiated transfers [3,12,22-24] and hardware-transparent migration using unified memory models [25,26].…”
Section: Related Work
confidence: 99%
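The hardware-transparent, unified-memory end of that design space is easy to demonstrate with CUDA managed memory: a single allocation is valid on both host and device, and pages migrate on demand instead of being copied explicitly. This is a generic CUDA illustration, not code from the cited systems:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

int main() {
    const int n = 1 << 20;
    float *data;
    // One allocation, one pointer: pages migrate between host and device
    // memory on demand instead of being moved with explicit cudaMemcpy.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;     // touched on the host

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f); // migrates to the GPU
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);              // migrates back on access
    cudaFree(data);
    return 0;
}
```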
“…NVSHMEM [12] leverages the relaxed memory semantics of the SHMEM [30] programming abstraction to provide efficient inter-GPU communication. Numerous studies [3,12,22-24] have examined the performance issues associated with scaling NVSHMEM without developing a performance model to guide algorithm development. In contrast, our study aims not only to guide algorithm development but also to identify the hardware bottlenecks with the greatest impact on performance.…”
Section: Related Work
confidence: 99%
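For contrast with CPU-initiated schemes, here is a minimal device-initiated NVSHMEM example in which each PE writes directly into its neighbor's memory from inside a kernel. It assumes the standard NVSHMEM host/device API with one GPU per PE; launcher setup (e.g., running under nvshmemrun or mpirun) is omitted:

```cuda
#include <nvshmem.h>
#include <nvshmemx.h>
#include <cuda_runtime.h>
#include <cstdio>

// Each PE writes its rank into its right-hand neighbor's symmetric
// buffer directly from GPU code: a device-initiated put.
__global__ void ring_put(float *dest, int mype, int npes) {
    if (threadIdx.x == 0 && blockIdx.x == 0)
        nvshmem_float_p(dest, (float)mype, (mype + 1) % npes);
}

int main() {
    nvshmem_init();
    // Bind each PE to its own GPU on the node (as in the NVSHMEM samples).
    cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();

    // Symmetric allocation: the same address is valid on every PE.
    float *dest = (float *)nvshmem_malloc(sizeof(float));

    ring_put<<<1, 32>>>(dest, mype, npes);
    cudaDeviceSynchronize();           // local kernel has issued its put
    nvshmem_barrier_all();             // all PEs' puts are now delivered

    float val;
    cudaMemcpy(&val, dest, sizeof(float), cudaMemcpyDeviceToHost);
    printf("PE %d received %.0f\n", mype, val);

    nvshmem_free(dest);
    nvshmem_finalize();
    return 0;
}
```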