2024
DOI: 10.21203/rs.3.rs-4174332/v1
Preprint

An Optimized RDMA QP Communication Mechanism for Hyperscale AI Infrastructure

Junliang Wang, Baohong Lin, Jiao Zhang, et al.

Abstract: Current artificial intelligence (AI) infrastructure widely employs the remote direct memory access (RDMA) protocol for high-performance network communication, using Reliable Connection (RC)-based Queue Pairs (QPs) to ensure correct, in-order end-to-end data transmission. However, as the scale of AI infrastructure continues to expand, this RC-based QP communication mechanism scales poorly and is prone to congestion, which degrades network transfer performance. In this paper, …

References 25 publications