Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019
DOI: 10.1145/3292500.3330936

ADMM for Efficient Deep Learning with Global Convergence

Abstract: The Alternating Direction Method of Multipliers (ADMM) has been used successfully in many conventional machine learning applications and is considered a useful alternative to Stochastic Gradient Descent (SGD) as a deep learning optimizer. However, as an emerging domain, several challenges remain, including 1) the lack of a global convergence guarantee, 2) slow convergence towards solutions, and 3) cubic time complexity with regard to feature dimensions. In this paper, we propose a novel optimization framework…
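For context, ADMM-style deep learning optimizers of the kind discussed in the abstract typically start from a constrained reformulation of the training problem in which each layer's output is introduced as a separate variable block. The sketch below uses generic notation ($W_l$, $a_l$, $f_l$) assumed for illustration; it is background, not the specific framework proposed in the paper.

\[
\min_{\{W_l\},\{a_l\}} \; R(a_L, y)
\quad \text{s.t.} \quad a_l = f_l(W_l a_{l-1}), \;\; l = 1, \dots, L, \;\; a_0 = x,
\]

where $R$ is the training loss (e.g., the squared loss), $f_l$ is the layer-$l$ activation (e.g., ReLU), and $x$, $y$ are the input and label. ADMM attaches dual variables and a quadratic penalty to the equality constraints and then alternates cheap (often closed-form) updates over the blocks $W_l$ and $a_l$; the cubic time complexity mentioned in the abstract typically stems from the $d \times d$ linear systems (matrix inversions) that these least-squares-type subproblems require.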

Cited by 48 publications (42 citation statements). References 18 publications.

Citation statements, ordered by relevance:
“…In this section, we present the numerical results of our algorithm. We follow the experimental setup introduced by [7]. Specifically, we consider the DNN training model (1) with ReLU activation, the squared loss, and an MLP architecture with hidden layers, on two datasets, MNIST [25] and Fashion MNIST [26].…”
Section: Numerical Experiments
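To make the quoted setup concrete, here is a minimal PyTorch-style sketch of an MLP with ReLU activations trained with the squared loss against one-hot MNIST-style labels. The layer widths, batch size, and random stand-in data are illustrative assumptions; the cited papers use ADMM/alternating-minimization updates rather than backpropagation-based training.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Illustrative MLP: 784 -> 1000 -> 10 with a ReLU hidden activation
    # (layer widths are assumptions, not taken from the cited papers).
    model = nn.Sequential(
        nn.Linear(784, 1000),
        nn.ReLU(),
        nn.Linear(1000, 10),
    )

    # Squared loss against one-hot labels, as in the quoted setup.
    x = torch.randn(64, 784)                                  # stand-in for a batch of MNIST images
    y = F.one_hot(torch.randint(0, 10, (64,)), 10).float()    # stand-in one-hot labels
    loss = F.mse_loss(model(x), y)
    print(loss.item())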
“…On the other hand, the convergence of stochastic training methods relies on a Lipschitz continuity assumption on the gradient, which fails to hold in various applications. To overcome these drawbacks, papers [4,5,6,7] propose gradient-free methods based on the Alternating Direction Method of Multipliers (ADMM) or alternating minimization. The core idea of these methods is the decomposition of the training task into a sequence of substeps, each involving only one layer's activations.…”
Section: Introduction
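As an illustration of the layer-wise decomposition described in this excerpt, the NumPy sketch below alternates least-squares-style updates for a one-hidden-layer ReLU network with squared loss under a quadratic-penalty formulation. The penalty weight, dimensions, and the linear surrogate used for the first-layer update are simplifying assumptions, not the exact updates of the cited papers; the point is that each substep touches only one block of variables.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d_in, d_hid, d_out = 200, 20, 32, 5
    X = rng.standard_normal((d_in, n))            # inputs, one column per sample
    Y = rng.standard_normal((d_out, n))           # targets

    relu = lambda z: np.maximum(z, 0.0)
    rho = 1.0                                     # penalty weight (assumed)

    # Penalty formulation with the hidden activation a1 kept as a free variable:
    #   min_{W1, W2, a1}  ||W2 a1 - Y||^2  +  rho * ||a1 - relu(W1 X)||^2
    W1 = 0.1 * rng.standard_normal((d_hid, d_in))
    W2 = 0.1 * rng.standard_normal((d_out, d_hid))
    a1 = relu(W1 @ X)

    for it in range(50):
        # W2-step: ordinary least squares in W2 only.
        W2 = np.linalg.lstsq(a1.T, Y.T, rcond=None)[0].T
        # a1-step: a ridge-like linear system coupling the two layers.
        lhs = W2.T @ W2 + rho * np.eye(d_hid)
        rhs = W2.T @ Y + rho * relu(W1 @ X)
        a1 = np.linalg.solve(lhs, rhs)
        # W1-step: linear least-squares surrogate of ||relu(W1 X) - a1||^2
        # (a deliberate simplification; the cited papers handle the ReLU exactly).
        W1 = np.linalg.lstsq(X.T, a1.T, rcond=None)[0].T

    print("relative fit error:", np.linalg.norm(W2 @ a1 - Y) / np.linalg.norm(Y))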
“…Problem 1 has been addressed by the deep learning Alternating Direction Method of Multipliers (dlADMM) [23]. However, the parameters in one layer depend on those of its neighboring layers, and hence cannot be updated in parallel.…”
Section: A Problem Formulation
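The neighboring-layer dependence noted in this excerpt can be seen from the augmented Lagrangian terms that involve a given layer. The generic form below (notation assumed for illustration, not taken from [23]) shows that the subproblem for $a_l$ couples layers $l$ and $l+1$:

\[
\min_{a_l} \;\; u_l^{\top}\!\big(a_l - f_l(W_l a_{l-1})\big) + \tfrac{\rho}{2}\big\|a_l - f_l(W_l a_{l-1})\big\|^2
+ u_{l+1}^{\top}\!\big(a_{l+1} - f_{l+1}(W_{l+1} a_l)\big) + \tfrac{\rho}{2}\big\|a_{l+1} - f_{l+1}(W_{l+1} a_l)\big\|^2,
\]

so updating $a_l$ requires the current values of $a_{l-1}$, $W_l$, $W_{l+1}$, $u_l$, $u_{l+1}$, and $a_{l+1}$, which is why the layers are visited sequentially rather than in parallel.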
“…However, ALM ignores the structure of the problem, so in practice its performance is not competitive with specialized algorithms. For example, the Alternating Direction Method of Multipliers (ADMM) [18,19], which is based on ALM, is considered the most popular such technique, capable of handling multiple blocks of variables in parallel [18,20,21], and thereby exhibits superior implementation efficiency for both convex and nonconvex problems [22][23][24], even for deep neural network training [25,26]. However, whether ADMM can be applied to nonconvex problems with nonlinearly coupled blocks of variables is still unknown.…”
Section: Introduction
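For reference, the multi-block setting alluded to in this excerpt is usually written as a linearly constrained problem over separate variable blocks; the notation below is a generic sketch, not a formulation taken from the cited works:

\[
\min_{x_1,\dots,x_N} \; \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i x_i = b,
\]

with the augmented Lagrangian

\[
\mathcal{L}_\rho(x_1,\dots,x_N,u) = \sum_{i} f_i(x_i) + u^{\top}\!\Big(\sum_{i} A_i x_i - b\Big) + \frac{\rho}{2}\Big\|\sum_{i} A_i x_i - b\Big\|^2 .
\]

ADMM cycles through the blocks, minimizing $\mathcal{L}_\rho$ over one $x_i$ at a time (or over all blocks simultaneously in Jacobi-style parallel variants), followed by the dual ascent step $u \leftarrow u + \rho\,(\sum_i A_i x_i - b)$.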