2021
DOI: 10.1109/access.2021.3110840

Efficient User-Level Multi-Path Utilization in RDMA Networks

Abstract: RDMA has become one of the most prominent networking technologies in DCNs by providing high bandwidth and ultra-low latency, especially for data-intensive applications. An important challenge with RDMA is to exploit multi-path for high throughput and reliability. Several studies have been proposed to utilize multi-path in RDMA networks, but they commonly require modification of RDMA NICs, which makes it hard to deploy them in practice. In this paper, we propose a user-level multi-path RDMA (UL-MPRDMA) scheme, …

Citations: cited by 4 publications (1 citation statement)
References: 27 publications
“…While these modifications enhance QP scalability, the high cost and potential impact on NIC reliability make them impractical for deployment in hyper scale AI infrastructure. Additionally, congestion control algorithms like DCQCN and TIMELY, used in these studies, cannot completely prevent collisions of multiple large flows at specific nodes (usually refers to servers or switches) in extremely large-scale AI training networks [6,9,11,18], leading to congestion in network traffic at those points.…”
Section: Introduction (mentioning)
confidence: 99%