2022
DOI: 10.1109/mm.2021.3139027
Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs

Cited by 5 publications (4 citation statements) | References 8 publications
“…However, they did not find an offloading scheme that could optimally accelerate the training of all models in their study. In an extension of this work [11], the same authors show a consistent improvement for Convolutional Neural Networks (CNNs) and Transformer models with weak and strong scaling on multiple nodes.…”
Section: Related Work
confidence: 61%
“…As in [10,11], our proposal makes use of the DPU in a Deep Learning environment, but unlike those works, where it is used in the training phase of the model, in our case the card will be used to perform filtering tasks that help an already trained model reduce its inference workload. We were motivated to try using a DPU as a filter for a video stream because of its ability to alleviate the load on the system.…”
Section: Related Work
confidence: 99%
“…HPC applications could benefit from DPU devices by offloading part of their load to them. For example, when training deep neural networks, the data augmentation or validation stages could be offloaded to less powerful accelerators such as DPUs [3]. In turn, large distributed multiphysics simulations could offload the halo exchange operation, making DPUs responsible for communicating and computing the halo among neighbors.…”
Section: Discussion
confidence: 99%
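To make the offload pattern above concrete: a minimal sketch that emulates the host/DPU split with a separate process, assuming the DPU's Arm cores run an ordinary Linux worker. The multiprocessing queues stand in for the host-to-DPU transport (RDMA or sockets in practice), and the augmentations shown are arbitrary examples, not the pipeline from [3].

```python
import multiprocessing as mp
import numpy as np

def augment_worker(in_q: mp.Queue, out_q: mp.Queue):
    """Runs on the offload device (here: a separate process standing in
    for the DPU): applies cheap data augmentation to raw batches so the
    host stays free for the forward/backward passes."""
    while True:
        batch = in_q.get()
        if batch is None:  # sentinel: no more work
            break
        flipped = batch[:, :, ::-1]  # horizontal flip
        noisy = flipped + np.random.normal(0.0, 0.01, batch.shape)
        out_q.put(noisy.astype(np.float32))

if __name__ == "__main__":
    in_q, out_q = mp.Queue(), mp.Queue()
    worker = mp.Process(target=augment_worker, args=(in_q, out_q))
    worker.start()
    for _ in range(3):  # stand-in for a data loader feeding raw batches
        in_q.put(np.random.rand(8, 32, 32).astype(np.float32))
    for _ in range(3):
        augmented = out_q.get()  # host would train on these batches
        print("got augmented batch", augmented.shape)
    in_q.put(None)
    worker.join()
```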
“…Leveraging DPUs for these tasks, Jain et al. [27] achieved up to a 15% increase in training performance. Their subsequent work [28] demonstrated consistent performance improvements for CNNs and Transformer models, both in weak and strong scaling scenarios across multiple nodes.…”
Section: Previous Work
confidence: 92%