2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID)
DOI: 10.1109/ccgrid49817.2020.00-40
Standard Deviation Based Adaptive Gradient Compression For Distributed Deep Learning

Cited by 4 publications (2 citation statements) · References 13 publications
“…Gradient sparsification can achieve a higher compression rate than gradient quantization, but it can seriously affect the convergence and accuracy of the model. The standard deviation-based adaptive gradient compression (SDAGC) method is proposed in [125], which can achieve higher model performance in simultaneous training.…”
Section: A. Communication Cost (mentioning) · confidence: 99%
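As a rough illustration of the idea behind SDAGC-style compression, the sketch below shows a generic standard-deviation-based gradient sparsifier: each worker keeps only the gradient entries whose magnitude exceeds a threshold derived from the gradient's mean and standard deviation, and transmits just the surviving indices and values. The threshold rule (mean + k·std), the function names, and the parameter k are illustrative assumptions, not the exact SDAGC algorithm from the cited paper.

```python
# Illustrative sketch only: a generic standard-deviation-based gradient
# sparsifier. The threshold rule (mean + k * std) and every name below are
# assumptions for exposition, not the paper's exact SDAGC algorithm.
import numpy as np

def sparsify_by_std(grad: np.ndarray, k: float = 2.0):
    """Keep entries whose magnitude exceeds mean + k * std of |grad|.

    Returns the indices and values of the retained entries, i.e. the sparse
    message a worker would send instead of the dense gradient.
    """
    mag = np.abs(grad)
    threshold = mag.mean() + k * mag.std()
    kept = np.nonzero(mag > threshold)[0]
    return kept, grad[kept]

def densify(indices: np.ndarray, values: np.ndarray, size: int) -> np.ndarray:
    """Rebuild a dense gradient from a received sparse message."""
    dense = np.zeros(size, dtype=values.dtype)
    dense[indices] = values
    return dense

# Example: compress a synthetic gradient and report the compression rate.
g = np.random.randn(1_000_000).astype(np.float32)
idx, vals = sparsify_by_std(g, k=2.0)
print(f"kept {idx.size} of {g.size} entries ({100.0 * idx.size / g.size:.2f}%)")
```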
“…To have a reasonable training time, researchers have proposed various techniques. For example, there are many approaches for speeding up the training procedure by improving scalability, such as large-batch training [6], [7], exploiting different forms of parallelism [8], asynchronous training [9], reducing communication during training [10], [11], and so on. Other approaches focus on the statistical efficiency of optimization algorithms to reduce the number of training iterations, such as AdaGrad [12], Adam [13], AdamW [14], and variance-reduced SGD [15], [16].…”
Section: Introduction (mentioning) · confidence: 99%
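Of the optimizer-side approaches listed above, Adam is the most widely used, so the snippet below gives a minimal, self-contained sketch of a single Adam update step (exponential moving averages of the gradient and its square, with bias correction), included purely to illustrate what "statistical efficiency" refers to here. The function name and hyperparameter defaults are the commonly used ones, not values taken from the cited works' experiments.

```python
# Minimal sketch of one Adam update step, shown only to illustrate the
# optimizer-side ("statistical efficiency") line of work cited above.
# Hyperparameter defaults are the commonly used ones, not values from [13].
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply a single Adam update; returns updated parameters and moments.

    t is the 1-based iteration counter used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad**2     # second moment (mean of grad^2)
    m_hat = m / (1 - beta1**t)                # bias-corrected first moment
    v_hat = v / (1 - beta2**t)                # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```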