Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019
DOI: 10.1145/3292500.3330936

ADMM for Efficient Deep Learning with Global Convergence

Abstract: The Alternating Direction Method of Multipliers (ADMM) has been used successfully in many conventional machine learning applications and is considered a useful alternative to Stochastic Gradient Descent (SGD) as a deep learning optimizer. However, as an emerging domain, several challenges remain, including 1) the lack of a global convergence guarantee, 2) slow convergence towards solutions, and 3) cubic time complexity with regard to feature dimensions. In this paper, we propose a novel optimization framework…
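For context, ADMM-style deep learning optimizers of the kind discussed in the abstract typically start from a constrained reformulation of the training problem in which each layer's output is introduced as a separate variable block. The sketch below uses generic notation ($W_l$, $a_l$, $f_l$) assumed for illustration; it is background, not the specific framework proposed in the paper.

\[
\min_{\{W_l\},\{a_l\}} \; R(a_L, y)
\quad \text{s.t.} \quad a_l = f_l(W_l a_{l-1}), \;\; l = 1, \dots, L, \;\; a_0 = x,
\]

where $R$ is the training loss (e.g., the squared loss), $f_l$ is the layer-$l$ activation (e.g., ReLU), and $x$, $y$ are the input and label. ADMM attaches dual variables and a quadratic penalty to the equality constraints and then alternates cheap (often closed-form) updates over the blocks $W_l$ and $a_l$; the cubic time complexity mentioned in the abstract typically stems from the $d \times d$ linear systems (matrix inversions) that these least-squares-type subproblems require.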

Cited by 48 publications (42 citation statements). References 18 publications.

Citation statements, ordered by relevance:
“…In this section, we present the numerical results of our algorithm. We follow the experimental setup introduced by [7]. Specifically, we consider the DNN training model (1) with ReLU activation, the squared loss, and an MLP architecture with hidden layers, on two datasets, MNIST [25] and Fashion MNIST [26].…”
Section: Numerical Experiments
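To make the quoted setup concrete, here is a minimal PyTorch-style sketch of an MLP with ReLU activations trained with the squared loss against one-hot MNIST-style labels. The layer widths, batch size, and random stand-in data are illustrative assumptions; the cited papers use ADMM/alternating-minimization updates rather than backpropagation-based training.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Illustrative MLP: 784 -> 1000 -> 10 with a ReLU hidden activation
    # (layer widths are assumptions, not taken from the cited papers).
    model = nn.Sequential(
        nn.Linear(784, 1000),
        nn.ReLU(),
        nn.Linear(1000, 10),
    )

    # Squared loss against one-hot labels, as in the quoted setup.
    x = torch.randn(64, 784)                                  # stand-in for a batch of MNIST images
    y = F.one_hot(torch.randint(0, 10, (64,)), 10).float()    # stand-in one-hot labels
    loss = F.mse_loss(model(x), y)
    print(loss.item())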
“…On the other hand, the convergence of stochastic training methods relies on a Lipschitz continuity assumption on the gradient, which fails to hold in various applications. To overcome these drawbacks, papers [4,5,6,7] propose gradient-free methods based on the Alternating Direction Method of Multipliers (ADMM) or alternating minimization. The core idea of these methods is the decomposition of the training task into a sequence of substeps, each involving only one layer's activations.…”
Section: Introduction
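As an illustration of the layer-wise decomposition described in this excerpt, the NumPy sketch below alternates least-squares-style updates for a one-hidden-layer ReLU network with squared loss under a quadratic-penalty formulation. The penalty weight, dimensions, and the linear surrogate used for the first-layer update are simplifying assumptions, not the exact updates of the cited papers; the point is that each substep touches only one block of variables.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d_in, d_hid, d_out = 200, 20, 32, 5
    X = rng.standard_normal((d_in, n))            # inputs, one column per sample
    Y = rng.standard_normal((d_out, n))           # targets

    relu = lambda z: np.maximum(z, 0.0)
    rho = 1.0                                     # penalty weight (assumed)

    # Penalty formulation with the hidden activation a1 kept as a free variable:
    #   min_{W1, W2, a1}  ||W2 a1 - Y||^2  +  rho * ||a1 - relu(W1 X)||^2
    W1 = 0.1 * rng.standard_normal((d_hid, d_in))
    W2 = 0.1 * rng.standard_normal((d_out, d_hid))
    a1 = relu(W1 @ X)

    for it in range(50):
        # W2-step: ordinary least squares in W2 only.
        W2 = np.linalg.lstsq(a1.T, Y.T, rcond=None)[0].T
        # a1-step: a ridge-like linear system coupling the two layers.
        lhs = W2.T @ W2 + rho * np.eye(d_hid)
        rhs = W2.T @ Y + rho * relu(W1 @ X)
        a1 = np.linalg.solve(lhs, rhs)
        # W1-step: linear least-squares surrogate of ||relu(W1 X) - a1||^2
        # (a deliberate simplification; the cited papers handle the ReLU exactly).
        W1 = np.linalg.lstsq(X.T, a1.T, rcond=None)[0].T

    print("relative fit error:", np.linalg.norm(W2 @ a1 - Y) / np.linalg.norm(Y))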
“…Problem 1 has been addressed by the deep learning Alternating Direction Method of Multipliers (dlADMM) [23]. However, the parameters in one layer depend on those of its neighboring layers, and hence cannot be updated in parallel.…”
Section: A Problem Formulation
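The neighboring-layer dependence noted in this excerpt can be seen from the augmented Lagrangian terms that involve a given layer. The generic form below (notation assumed for illustration, not taken from [23]) shows that the subproblem for $a_l$ couples layers $l$ and $l+1$:

\[
\min_{a_l} \;\; u_l^{\top}\!\big(a_l - f_l(W_l a_{l-1})\big) + \tfrac{\rho}{2}\big\|a_l - f_l(W_l a_{l-1})\big\|^2
+ u_{l+1}^{\top}\!\big(a_{l+1} - f_{l+1}(W_{l+1} a_l)\big) + \tfrac{\rho}{2}\big\|a_{l+1} - f_{l+1}(W_{l+1} a_l)\big\|^2,
\]

so updating $a_l$ requires the current values of $a_{l-1}$, $W_l$, $W_{l+1}$, $u_l$, $u_{l+1}$, and $a_{l+1}$, which is why the layers are visited sequentially rather than in parallel.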
“…However, ALM ignores the structure of the problem, so in practice its performance is not competitive with specialized algorithms. For example, the Alternating Direction Method of Multipliers (ADMM) [18,19], which is based on ALM, is considered the most popular such technique, capable of handling multiple blocks of variables in parallel [18,20,21], and thereby exhibits superior implementation efficiency for both convex and nonconvex problems [22][23][24], even for deep neural network training [25,26]. However, whether ADMM can be applied to nonconvex problems with nonlinearly coupled blocks of variables is still unknown.…”
Section: Introduction
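For reference, the multi-block setting alluded to in this excerpt is usually written as a linearly constrained problem over separate variable blocks; the notation below is a generic sketch, not a formulation taken from the cited works:

\[
\min_{x_1,\dots,x_N} \; \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i x_i = b,
\]

with the augmented Lagrangian

\[
\mathcal{L}_\rho(x_1,\dots,x_N,u) = \sum_{i} f_i(x_i) + u^{\top}\!\Big(\sum_{i} A_i x_i - b\Big) + \frac{\rho}{2}\Big\|\sum_{i} A_i x_i - b\Big\|^2 .
\]

ADMM cycles through the blocks, minimizing $\mathcal{L}_\rho$ over one $x_i$ at a time (or over all blocks simultaneously in Jacobi-style parallel variants), followed by the dual ascent step $u \leftarrow u + \rho\,(\sum_i A_i x_i - b)$.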