Proceedings of the 46th International Symposium on Computer Architecture 2019
DOI: 10.1145/3307650.3322259
Accelerating distributed reinforcement learning with in-switch computing

Cited by 92 publications (60 citation statements)
References 19 publications
“…Indeed, the fact that communication is a major performance bottleneck in DDL is well-known [32], and many works [10,35,39,44,58,66] proposed various optimizations to achieve high-bandwidth collective communication specialized for DDL. Besides, a recent body of work, primarily within the ML community, developed gradient compression methods [1,2,42,63,67] to reduce communication time by sending a smaller amount of data, albeit at the cost of reduced training quality due to the lossy nature of compression.…”
Section: Model
confidence: 99%
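As an illustration of the lossy gradient compression the excerpt refers to, here is a minimal top-k sparsification sketch in Python; the function names and the 1% compression ratio are illustrative, not taken from any of the cited methods.

```python
import numpy as np

def topk_compress(grad: np.ndarray, k: int):
    """Keep only the k largest-magnitude entries of a flattened gradient.

    Returns (indices, values); the receiver treats everything else as zero,
    which is what makes the compression lossy.
    """
    flat = grad.ravel()
    # argpartition finds the k largest-|g| positions without a full sort.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def topk_decompress(idx: np.ndarray, vals: np.ndarray, shape):
    """Rebuild a dense gradient with zeros in the dropped positions."""
    flat = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
    flat[idx] = vals
    return flat.reshape(shape)

# Example: compress a 1M-element gradient down to 1% of its entries.
g = np.random.randn(1_000_000).astype(np.float32)
idx, vals = topk_compress(g, k=10_000)
g_hat = topk_decompress(idx, vals, g.shape)
```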
“…Efficient communication in DDL. Several efforts optimize DDL communication, ranging from designing high-performance PS software [43] and transfer schedulers [20,25,50], to improving collective communication in heterogeneous network fabrics [10,28] and within multi-GPU servers [66], to developing in-network reduction systems [35,39,44,57,58], to customizing network congestion protocols and architecture [18]. OmniReduce leverages data sparsity to optimize communication and is complementary to these efforts.…”
Section: Other Related Work
confidence: 99%
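A minimal sketch of the data-sparsity idea the excerpt attributes to OmniReduce, assuming a simple fixed-block layout: only blocks that contain non-zero gradient data are shipped and summed. The block size and helper names are hypothetical.

```python
import numpy as np

BLOCK = 256  # illustrative block size; real systems tune this

def nonzero_blocks(grad: np.ndarray):
    """Yield (block_index, block) only for blocks that carry data.

    Skipping all-zero blocks is the core of sparsity-aware aggregation:
    idle regions of the gradient never hit the wire.
    """
    for i in range(0, grad.size, BLOCK):
        blk = grad[i:i + BLOCK]
        if np.any(blk):
            yield i // BLOCK, blk

def aggregate(worker_grads):
    """Sum the per-block contributions of all workers into a dense result."""
    out = np.zeros_like(worker_grads[0])
    for g in worker_grads:
        for b, blk in nonzero_blocks(g):
            out[b * BLOCK : b * BLOCK + blk.size] += blk
    return out

# Two workers whose gradients are mostly zero: only two blocks are summed.
g1 = np.zeros(1024, dtype=np.float32); g1[0:4] = 1.0
g2 = np.zeros(1024, dtype=np.float32); g2[512:516] = 2.0
total = aggregate([g1, g2])
```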
“…Table I summarizes the related works using hardware accelerators for In-Network Computing: [43], NetDebug [44], Lake [45], and iSwitch [46].…”
Section: State of the Art
confidence: 99%
“…iSwitch [46] proposes a distributed solution that uses in-network computing to move gradient aggregation from server nodes to FPGA-based switches, reducing the number of network hops each aggregation requires. Gradient aggregation is an operation used in Reinforcement Learning (RL) to train Artificial Intelligence (AI) applications.…”
Section: B. FPGA-based Hardware Accelerators
confidence: 99%
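The excerpt describes iSwitch's approach of aggregating gradients inside the switch rather than at an end-host parameter server. The sketch below models only the aggregation logic in Python; the real system performs this per packet on FPGA hardware, and the class and method names here are illustrative.

```python
import numpy as np

class Switch:
    """Toy model of an aggregating switch: it sums one gradient
    contribution per worker, then "multicasts" the result back,
    so each value traverses only worker -> switch -> worker."""

    def __init__(self, num_workers: int):
        self.num_workers = num_workers
        self.buffer = None      # running sum of gradients seen so far
        self.received = 0

    def on_gradient(self, grad: np.ndarray):
        """Accumulate one worker's gradient; return the sum once all arrive."""
        self.buffer = grad.copy() if self.buffer is None else self.buffer + grad
        self.received += 1
        if self.received == self.num_workers:
            result, self.buffer, self.received = self.buffer, None, 0
            return result       # aggregated gradient, broadcast to every worker
        return None             # still waiting for the other workers

# Three workers push gradients; the switch returns the sum on the last one.
sw = Switch(num_workers=3)
grads = [np.full(4, w, dtype=np.float32) for w in (1.0, 2.0, 3.0)]
agg = None
for g in grads:
    result = sw.on_gradient(g)
    if result is not None:
        agg = result
print(agg)  # [6. 6. 6. 6.]
```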
“…In many networks, the data and packet rate reduction offered by the former is required to make this possible. Indeed, in-switch aggregation has seen great success in aiding ML, both for training [20] and for direct execution [21]. We make use of the following standard classification algorithms on a fixed-size representation to attempt to single out the CCA (congestion control algorithm) in use:…”
Section: II. TCP Congestion Control Classification
confidence: 99%
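For the classification step the excerpt mentions, a minimal sketch using one standard classifier on a fixed-size feature vector per flow; the feature dimensions, labels, and synthetic data below are placeholders, not the representation used in the cited work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical fixed-size representation: each flow becomes a vector of
# summary statistics (e.g., binned cwnd growth, RTT variance, loss rate).
# The synthetic data only illustrates the classification step itself.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 16))       # 600 flows, 16 features each
y = rng.integers(0, 3, size=600)     # 3 CCA labels, e.g. cubic / bbr / reno

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")  # near chance on random data
```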