Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2019
DOI: 10.1145/3295500.3356189

Network-accelerated non-contiguous memory transfers

Abstract: Applications often communicate data that is non-contiguous in the send or the receive buffer, e.g., when exchanging a column of a matrix stored in row-major order. While non-contiguous transfers are well supported in HPC (e.g., MPI derived datatypes), they can still be up to 5x slower than contiguous transfers of the same size. As we enter the era of network acceleration, we need to investigate which tasks to offload to the NIC: In this work we argue that non-contiguous memory transfers can be transparently ne…
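The matrix-column case mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `column_offsets` and `pack_column` and the 3 x 4 matrix are hypothetical, chosen only to show why a column of a row-major matrix is non-contiguous and what a host-side "pack" step does before a contiguous send. In MPI, the same count/stride pattern is what a derived datatype such as one built with `MPI_Type_vector` describes.

```python
# Hypothetical illustration; these names do not come from the paper.

def column_offsets(n_rows, n_cols, col):
    """Element offsets of one column in a row-major n_rows x n_cols matrix.

    Consecutive column elements are n_cols elements apart, so the
    column is strided (non-contiguous) in the flat buffer.
    """
    return [row * n_cols + col for row in range(n_rows)]


def pack_column(buffer, n_rows, n_cols, col):
    """Gather the strided column elements into a contiguous list."""
    return [buffer[i] for i in column_offsets(n_rows, n_cols, col)]


# A 3 x 4 matrix stored row by row in a flat buffer: values 0..11.
matrix = list(range(12))
offsets = column_offsets(3, 4, 1)    # stride of 4 between elements
packed = pack_column(matrix, 3, 4, 1)
```

Here every element of the column sits `n_cols` positions after the previous one, so the sender either copies the elements into a contiguous buffer first (as `pack_column` does) or hands the strided layout to a lower layer to gather.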

Cited by 17 publications (9 citation statements)
References 45 publications
“…However, the majority of them focus on training, leaving much room for developing efficient distributed-memory frameworks and techniques for GNN inference. We also note high potential in incorporating high-performance interconnect related mechanisms such as Remote Direct Memory Access (RDMA) [87], SmartNICs [28], [74], [106], or novel network topologies and routing [26], [34] into the GNN domain.…”
Section: Multi-machine Parallelism (mentioning)
confidence: 99%
“…• Incorporating high-performance distributed-memory capabilities CAGNET [199] illustrated how to scalably execute GNN training across many compute nodes. It would be interesting to push this direction and use high-performance distributed-memory developments and interconnects, and the associated mechanisms for more performance of distributed-memory GNN computations, using, for example, RDMA and RMA programming [87], [179], SmartNICs [28], [74], serverless computing [67], or high-performance networking architectures [20], [24], [26], [34]. Such techniques have been successfully used to accelerate the related graph processing field [191].…”
Section: Parallelization of GNN Models Beyond Simple C-GNNs (mentioning)
confidence: 99%
“…It defines a flexible and programmable network instruction set architecture (NISA) that not only lowers the barrier of entry but also supports a large set of use-cases [28]. For example, Di Girolamo et al. demonstrate up to 10x speedups for serialization and deserialization (marshalling) of non-consecutive data [20].…”
Section: Motivation (mentioning)
confidence: 99%
“…The FPGA community has recently gained interest in processing graphs [18-20, 23, 25, 27-31, 63, 125] and other forms of general irregular computations [21, 22, 24, 53, 61, 82, 119, 120, 129]. First, some established CPU-related schemes were ported to the FPGA setting, for example vertex-centric [57, 58], GAS [145], edge-centric [149], BSP [78], and MapReduce [141].…”
Section: Graph Processing on FPGAs (mentioning)
confidence: 99%