Federated learning (FL) has emerged as an elegant privacy-preserving distributed machine learning (ML) paradigm. In particular, vertical FL (VFL) has promising application prospects: collaborating organizations that own data of the same set of users but with disjoint features can jointly train models without leaking their private data to each other. As the volume of training data and the model size increase rapidly, each organization may deploy a cluster of many servers to participate in the federation. As such, the intra-party communication cost (i.e., network transfers within each organization's cluster) can significantly impact the entire VFL job's performance. Despite this, existing FL frameworks use the inefficient gRPC for intra-party communication, leading to high latency and high CPU cost. In this paper, we propose a design that transmits data with RDMA for intra-party communication, with no modifications to applications. To improve network efficiency, we further propose an RDMA usage arbiter that dynamically adjusts the RDMA bandwidth used by a non-straggler party, and a query data size optimizer that automatically finds the optimal query data size that each response carries. Our preliminary results show that RDMA-based intra-party communication is 10x faster than gRPC-based communication, reducing the completion time of a VFL job by 9%. Moreover, the RDMA usage arbiter saves over 90% of bandwidth, and the query data size optimizer improves transmission speed by 18%.

CCS CONCEPTS
• Computing methodologies → Distributed artificial intelligence; • Networks → Data center networks.