2021
DOI: 10.3390/s21155124

DisSAGD: A Distributed Parameter Update Scheme Based on Variance Reduction

Abstract: Machine learning models often converge slowly and are unstable due to the significant variance of random data when using a sample estimate gradient in SGD. To increase the speed of convergence and improve stability, a distributed SGD algorithm based on variance reduction, named DisSAGD, is proposed in this study. DisSAGD corrects the gradient estimate for each iteration by using the gradient variance of historical iterations without full gradient computation or additional storage, i.e., it reduces the mean var…
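To make the variance-reduction idea concrete, the sketch below shows one recursive correction of this kind: the stochastic gradient at the current iterate is adjusted by the gradient at the previous iterate on the same mini-batch, so no full-gradient pass and no per-sample gradient table are needed. This is an illustrative stand-in on a least-squares objective, not the actual DisSAGD update rule or its distributed parameter-update scheme; the function names and hyperparameters are assumptions.

```python
import numpy as np

def stochastic_grad(w, x_batch, y_batch):
    """Mini-batch gradient of a least-squares loss, used as a stand-in objective."""
    return x_batch.T @ (x_batch @ w - y_batch) / len(y_batch)

def variance_reduced_sgd(X, y, lr=0.1, beta=0.9, epochs=5, batch=16, seed=0):
    """Recursive variance-reduced SGD sketch (not the exact DisSAGD rule)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    w_prev = w.copy()
    d = np.zeros_like(w)                 # running variance-reduced gradient estimate
    for _ in range(epochs):
        for _ in range(len(y) // batch):
            idx = rng.choice(len(y), size=batch, replace=False)
            g_new = stochastic_grad(w, X[idx], y[idx])
            g_old = stochastic_grad(w_prev, X[idx], y[idx])
            # Reuse the same samples at the previous iterate: the difference
            # (g_new - g_old) has low variance once consecutive iterates are close.
            d = g_new + (1.0 - beta) * (d - g_old)
            w_prev, w = w, w - lr * d
    return w

# Hypothetical usage on synthetic data:
# X = np.random.randn(1024, 10); y = X @ np.ones(10) + 0.01 * np.random.randn(1024)
# w_hat = variance_reduced_sgd(X, y)
```

The design point is that g_new and g_old are evaluated on the same samples, so their difference shrinks as consecutive iterates approach each other, which is what damps the variance of the update direction compared with plain mini-batch SGD.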

Cited by 3 publications (3 citation statements) | References 31 publications (36 reference statements)
“…For training, the time-frequency image is first scaled to a 640 × 640 input to the network, the maximum number of training cycles (epochs) is 100, the batch size of each update is 16, and the SGD (Stochastic Gradient Descent) optimization algorithm [32] with default parameters is used for training. In the experiments, the algorithm is trained and tested on an NVIDIA Titan V 12G, and the network model is implemented in Python on the PyTorch 1.11.0 framework.…”
Section: Methods
confidence: 99%
“…Mini-Batch SGD (MBGD) followed as an improvement: at each iteration, MBGD randomly selects m data samples from the original data, computes the gradient over them, and performs the weight update. SGD has the advantage that each step relies only on a simple random sample gradient, so its computational cost is only a fraction of that of standard GD [4]. However, it has the disadvantage that a constant step size leads to slow convergence because of the variance introduced by random sampling.…”
Section: Introduction
confidence: 99%
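Written out, the mini-batch update described above is the standard MBGD rule (generic notation, not copied verbatim from the citing paper): at iteration $t$, a batch $B_t$ of $m$ samples is drawn at random and

```latex
w_{t+1} \;=\; w_t \;-\; \frac{\eta}{m} \sum_{i \in B_t} \nabla \ell_i(w_t),
```

where $\ell_i$ is the loss on sample $i$ and $\eta$ is the constant step size; the variance of this batch gradient around the full gradient is what makes a fixed $\eta$ converge slowly.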
“…If there is insufficient data for training, the model may overfit over a large number of repeated training passes, i.e., "the learned model is so well suited to a specific set of data that it does not reliably fit other data or future observations". To reduce the impact of overfitting [4,27] to some extent, a regularization term is added to the empirical risk to limit the complexity of the model, which constitutes a structural risk minimization [28] problem of the form "loss function + regularization term":…”
Section: Introduction
confidence: 99%
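The truncated quote leads into the structural risk minimization objective; in its standard textbook form (generic symbols, assumed here rather than copied from the citing paper) it reads

```latex
\min_{f \in \mathcal{F}} \; \frac{1}{N} \sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr) \;+\; \lambda\, J(f),
```

where the first term is the empirical risk (the loss function), $J(f)$ measures model complexity (the regularization term), and $\lambda \ge 0$ controls the trade-off between the two.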