Proceedings of the 11th ACM Symposium on Cloud Computing 2020
DOI: 10.1145/3419111.3421296

Network-accelerated distributed machine learning for multi-tenant settings

Abstract: Many distributed machine learning (DML) workloads are increasingly being run in shared clusters. Training in such clusters can be impeded by unexpected compute and network contention, resulting in stragglers. We present MLfabric, a contention-aware DML system that manages the performance of a DML job running in a shared cluster. The DML application hands all network communication (gradient and model transfers) to the MLfabric communication library. MLfabric then carefully orders transfers to improve convergence…
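
The abstract only outlines the design at a high level. As a rough illustration of the core idea, handing gradient transfers to a communication library that releases them under a contention-aware ordering policy, a minimal sketch follows; all names (CommLibrary, submit, drain) and the smallest-transfer-first policy are hypothetical placeholders, not MLfabric's actual API or scheduling algorithm.

```python
import heapq


class CommLibrary:
    """Hypothetical stand-in for a contention-aware communication layer.

    Workers enqueue gradient transfers instead of sending them directly;
    the library then releases them in an order chosen by a pluggable
    policy (here: fewest-bytes-first, a crude proxy for finishing cheap
    transfers before contention hits them).
    """

    def __init__(self):
        self._queue = []   # entries: (priority, seq, worker_id, payload)
        self._seq = 0      # tie-breaker so heap comparisons never reach payloads

    def submit(self, worker_id, gradient_bytes):
        # Lower priority value means released earlier.
        priority = len(gradient_bytes)
        heapq.heappush(self._queue, (priority, self._seq, worker_id, gradient_bytes))
        self._seq += 1

    def drain(self):
        """Yield transfers in the order the policy chose."""
        while self._queue:
            _, _, worker_id, payload = heapq.heappop(self._queue)
            yield worker_id, payload


if __name__ == "__main__":
    lib = CommLibrary()
    lib.submit("worker-0", b"\x00" * 4096)   # large gradient
    lib.submit("worker-1", b"\x00" * 512)    # small gradient
    for wid, grad in lib.drain():
        print(wid, len(grad), "bytes released")
```

A real contention-aware scheduler would of course use richer signals than payload size; the point here is only the interposition pattern of routing all transfers through one library that decides their order.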

Cited by 14 publications (4 citation statements). References 19 publications.
“…Specifically for such ML tasks, network performance has been noted as a major bottleneck hindering the efficient usage of such frameworks [3], [35]. Various approaches have been suggested to modify ML methodologies in order to improve upon the network induced performance of distributed ML [4], [36], [37].…”
Section: Related Work (mentioning; confidence: 99%)

“…As online applications and services increase in popularity, distributed data processing capabilities and datacenter networks have become a major part of the infrastructure of modern society. Moreover, due to the vast growth in the amount of data processed by such applications, recent work shows that the bottleneck for efficient distributed computation is now the underlying communication network and not the computational capabilities at the servers [1]-[3], as was traditionally the case.…”
Section: Introduction (mentioning; confidence: 99%)

“…Efficiently performing distributed machine learning, and specifically the task of training deep neural networks, has been a fundamental concern in the past decade. In particular, network bottlenecks are arguably one of the major concerns when executing such tasks [29,53]. Various methods for improving network performance and footprint in such systems have been proposed and implemented, including sparsification, quantization, and scheduling [17,54,58].…”
Section: Related Work (mentioning; confidence: 99%)

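As a generic illustration of one technique named in the excerpt above, top-k gradient sparsification (a textbook sketch, not the scheme of any particular cited system), consider:

```python
import numpy as np


def topk_sparsify(gradient, k):
    """Keep only the k largest-magnitude entries of a gradient vector.

    Returns (indices, values); only these need to be sent over the
    network, which is the point of sparsification."""
    idx = np.argpartition(np.abs(gradient), -k)[-k:]
    return idx, gradient[idx]


if __name__ == "__main__":
    g = np.random.randn(1_000_000).astype(np.float32)
    idx, vals = topk_sparsify(g, k=1000)
    # Roughly 0.1% of the original entries are transmitted.
    print(idx.shape, vals.shape)
```

Only the selected indices and values cross the network, which is what shrinks the communication footprint; quantization instead reduces the number of bits used to represent each transmitted value.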
“…Datacenter networks and their distributed data processing capabilities are the driving force behind leading applications and services, including search engines, content distribution, social networks and eCommerce. Recent work has shown that for many of the tasks performed by such applications, the network (and not server computation) is the actual bottleneck hindering the ability to optimize computation efficiency and performance [13,35,53]. Such bottlenecks occur, e.g., in distributed and federated machine learning (e.g., AllReduce), and in solutions employing the MapReduce methodology for big data tasks, and more generally in scenarios giving rise to the incast problem [5,56].…”
Section: Introduction (mentioning; confidence: 99%)
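
The AllReduce pattern mentioned in this excerpt can be made concrete with a small in-memory simulation of a ring all-reduce; this is purely illustrative and assumes nothing about how the cited systems implement it (production stacks use NCCL, MPI, or framework-native collectives).

```python
import numpy as np


def ring_allreduce(tensors):
    """Simulated ring all-reduce over n in-memory 'workers'.

    Each worker starts with its own tensor and ends with the element-wise
    sum of all tensors, exchanging only one 1/n-sized segment per step
    (2*(n-1) steps in total), which is what makes the pattern
    bandwidth-efficient."""
    n = len(tensors)
    # Each worker's buffer, split into n segments.
    segs = [np.array_split(t.astype(np.float64), n) for t in tensors]

    # Phase 1: reduce-scatter. After n-1 steps, worker i owns the fully
    # summed segment (i + 1) % n.
    for step in range(n - 1):
        outgoing = [(i, (i - step) % n, segs[i][(i - step) % n].copy())
                    for i in range(n)]
        for i, seg_id, payload in outgoing:
            segs[(i + 1) % n][seg_id] += payload

    # Phase 2: all-gather. Each worker forwards the segment it just
    # completed until everyone holds every summed segment.
    for step in range(n - 1):
        outgoing = [(i, (i + 1 - step) % n, segs[i][(i + 1 - step) % n].copy())
                    for i in range(n)]
        for i, seg_id, payload in outgoing:
            segs[(i + 1) % n][seg_id] = payload

    return [np.concatenate(s) for s in segs]


if __name__ == "__main__":
    grads = [np.full(8, fill_value=w, dtype=np.float32) for w in range(4)]
    out = ring_allreduce(grads)
    # Every worker should now hold the sum 0 + 1 + 2 + 3 = 6 everywhere.
    assert all(np.allclose(o, 6.0) for o in out)
    print(out[0])
```

Each worker sends only 2*(n-1) segment-sized messages regardless of model size, but because many such flows traverse the same links at the same time, this traffic is exactly the kind that exposes the network bottlenecks and incast effects the excerpt describes.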