2018
DOI: 10.1007/s11227-018-2375-9

A hybrid GPU cluster and volunteer computing platform for scalable deep learning

Cited by 14 publications (9 citation statements) · References 32 publications

“…In this section, we address Byzantine-tolerant training in a setup where new participants can join or leave the collaboration midway through training. This requirement arises naturally if a given training run relies on volunteers or an open pool of paid participants [13,14,15]. In addition to all existing concerns from Section 3, this new setup allows Byzantine attackers to assume a new identity each time they are blocked.…”
Section: G Reputation System For Public Collaborations
Mentioning (confidence: 99%)
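
The passage above argues that open participation defeats a naive ban list: a blocked attacker simply rejoins under a fresh identity. A minimal Python sketch of the mitigation the quote motivates, where new identities must earn reputation before their updates are trusted; all names (`PeerReputation`, `report_byzantine`, `is_trusted`) are hypothetical illustrations, not the cited paper's actual design:

```python
# Hypothetical sketch: reputation tracking for an open training pool.
from dataclasses import dataclass, field

@dataclass
class PeerReputation:
    scores: dict = field(default_factory=dict)   # peer_id -> reputation score
    banned: set = field(default_factory=set)     # peers blocked for Byzantine behavior

    def report_valid_update(self, peer_id: str) -> None:
        # Reward a gradient update that passed verification.
        self.scores[peer_id] = self.scores.get(peer_id, 0) + 1

    def report_byzantine(self, peer_id: str) -> None:
        # Block a peer caught sending a corrupted update.
        self.banned.add(peer_id)

    def is_trusted(self, peer_id: str, threshold: int = 3) -> bool:
        # A banned attacker who rejoins under a fresh peer_id starts at
        # score 0, so it must re-earn trust before its updates carry weight.
        return peer_id not in self.banned and self.scores.get(peer_id, 0) >= threshold
```
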
“…The first challenge is the sheer computational complexity of many machine learning tasks, such as pretraining transformers for NLP [7,8,9] or learning on huge datasets in vision [10,11,12]. Recent works propose several systems [13,14,15] that can share the computation across many volunteers who donate the idle time of their computers. Another challenge arises in Federated Learning, where participants train a shared model over decentralized data that cannot be shared for privacy reasons [16,17,18].…”
Section: Introduction
Mentioning (confidence: 99%)
“…By contrast, distributed training of a single model requires significantly more communication and does not allow a natural way to "restart" failed jobs. When it comes to distributed training of neural networks, most volunteer computing projects rely on parameter server architectures [71,72,73]. As a result, these systems are bounded by the throughput of parameter servers and the memory available on the weakest GPU.…”
Section: Volunteer Computing
Mentioning (confidence: 99%)
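
As a rough illustration of the bottleneck this quote describes, here is a toy parameter-server loop in Python; the class names and the trivial gradient are illustrative assumptions, not the architecture of any cited system:

```python
# Minimal sketch of the parameter-server pattern the quote describes.
import numpy as np

class ParameterServer:
    """Holds the full model; every worker pushes gradients to and pulls
    weights from this one node, so aggregate training bandwidth is bounded
    by its throughput."""
    def __init__(self, dim: int):
        self.weights = np.zeros(dim)

    def push(self, grad: np.ndarray, lr: float = 0.01) -> None:
        self.weights -= lr * grad   # apply one worker's gradient

    def pull(self) -> np.ndarray:
        return self.weights.copy()  # worker fetches the latest weights

def worker_step(server: ParameterServer, batch: np.ndarray) -> None:
    w = server.pull()                 # download current weights
    grad = 2 * (w - batch.mean(0))    # toy gradient for illustration only
    server.push(grad)                 # upload gradient to the server

# Each of N workers exchanges O(model size) data with the same server every
# step, which is exactly the throughput bound the quoted passage points out.
server = ParameterServer(dim=4)
for _ in range(3):
    worker_step(server, np.random.randn(8, 4))
```
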
“…For the NVIDIA Tesla V100 Volta GPU used in this study, peak single-precision and double-precision floating-point throughput reaches 14 and 7 TFLOP/s, respectively, far greater than the computing ability of a CPU. The GPU is widely used in general-purpose computing areas such as molecular dynamics (MD) [12], direct simulation Monte Carlo (DSMC) [13], CFD, artificial intelligence (AI) [14], and deep learning (DL) [15]. Performing large-scale numerical calculations of CFD on GPUs is a research focus in the field of general-purpose computing, and a series of important results have been achieved.…”
Section: Introduction
Mentioning (confidence: 99%)
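
For context on the quoted figures, the peak numbers follow directly from the V100's published specifications (5120 FP32 CUDA cores, roughly a 1.38 GHz boost clock on the SXM2 part, FP64 at half the FP32 rate). A quick back-of-envelope check in Python:

```python
# Peak FLOPS = cores x FLOPs-per-core-per-cycle x clock.
# A fused multiply-add (FMA) counts as 2 floating-point operations.
cuda_cores = 5120
boost_clock_hz = 1.38e9
flops_per_fma = 2

fp32_peak = cuda_cores * flops_per_fma * boost_clock_hz   # ~14.1e12
fp64_peak = fp32_peak / 2                                 # ~7.1e12

print(f"FP32 peak: {fp32_peak / 1e12:.1f} TFLOP/s")  # FP32 peak: 14.1 TFLOP/s
print(f"FP64 peak: {fp64_peak / 1e12:.1f} TFLOP/s")  # FP64 peak: 7.1 TFLOP/s
```

This reproduces the 14 and 7 TFLOP/s cited in the statement above.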