2021
DOI: 10.48550/arxiv.2106.10207
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Distributed Deep Learning in Open Collaborations

Abstract: Modern deep learning applications require increasingly more compute to train state-of-the-art models. To address this demand, large corporations and institutions use dedicated High-Performance Computing clusters, whose construction and maintenance are both environmentally costly and well beyond the budget of most organizations. As a result, some research directions become the exclusive domain of a few large industrial and even fewer academic actors. To alleviate this disparity, smaller groups may pool their co… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1

Citation Types

0
1
0

Year Published

2021
2021
2021
2021

Publication Types

Select...
1

Relationship

0
1

Authors

Journals

citations
Cited by 1 publication
(1 citation statement)
references
References 49 publications
0
1
0
Order By: Relevance
“…Building and maintaining a cluster of thousands of accelerators also requires tremendous effort. New training paradigms like Learning@Home [Ryabinin and Gusev 2020;Diskin et al 2021] explore leveraging volunteer compute over the internet to train foundation models collaboratively. Such fundamentally new execution models can decrease the cost of training for any one entity, but require collaboration across a number of different areas like security (to ensure that a malicious volunteer cannot significantly alter the training process), distributed systems (to deal with fault tolerance issues as volunteers drop), and crowdsourcing.…”
Section: Execution and Programming Modelsmentioning
confidence: 99%
“…Building and maintaining a cluster of thousands of accelerators also requires tremendous effort. New training paradigms like Learning@Home [Ryabinin and Gusev 2020;Diskin et al 2021] explore leveraging volunteer compute over the internet to train foundation models collaboratively. Such fundamentally new execution models can decrease the cost of training for any one entity, but require collaboration across a number of different areas like security (to ensure that a malicious volunteer cannot significantly alter the training process), distributed systems (to deal with fault tolerance issues as volunteers drop), and crowdsourcing.…”
Section: Execution and Programming Modelsmentioning
confidence: 99%