Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/354

Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization

Abstract: The alternating direction method of multipliers (ADMM) is a popular optimization tool for composite and constrained problems in machine learning. However, in many machine learning settings, such as black-box learning and bandit feedback, ADMM can fail because explicit gradients are difficult or even infeasible to obtain. Zeroth-order (gradient-free) methods can solve these problems effectively because they require only objective function values during optimization. Recently, thou…
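As a rough illustration of the gradient-free idea in the abstract, the sketch below (not taken from the paper; the helper name zo_gradient, the smoothing parameter mu, and the number of sampled directions are illustrative choices) estimates a gradient from objective values alone:

import numpy as np

def zo_gradient(f, x, mu=1e-4, num_dirs=20, rng=None):
    # Two-point zeroth-order estimate: only function values of f are queried,
    # so it applies to black-box / bandit-feedback settings where explicit
    # gradients are unavailable.
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.size)             # random direction
        grad += (f(x + mu * u) - f(x)) / mu * u     # directional difference
    return grad / num_dirs

# Example on a "black-box" quadratic: the true gradient is x itself.
f = lambda x: 0.5 * np.sum(x ** 2)
print(zo_gradient(f, np.ones(5)))   # noisy estimate of the all-ones vector

Estimates of this kind can then stand in for explicit gradients inside first-order updates, including the ADMM subproblems the paper considers.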

Cited by 30 publications (31 citation statements)
References 22 publications

“…where each g_i is evaluated only on the subset y_i ∈ R^{m_i} of the full data y. PnP-ADMM is often impractical when b is very large due to the complexity of computing prox_{γg}. As shown in Algorithm 1, the proposed IPA algorithm extends stochastic variants of traditional ADMM [31]–[35] by integrating denoisers D_σ that are not associated with any h. Its per-iteration complexity is independent of the number of data blocks b, since it processes only a single component function g_i at every iteration.…”
Section: Incremental PnP-ADMM (mentioning)
confidence: 99%
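As a schematic of the incremental update described in this excerpt (a sketch under generic scaled-form ADMM notation, not the authors' Algorithm 1; the step size eta, penalty rho, and the grad_gi/denoiser interfaces are assumptions):

import numpy as np

def ipa_style_step(x, z, u, grad_gi, denoiser, eta, rho, b, rng):
    # One incremental plug-and-play ADMM-style iteration (scaled form):
    # only a single randomly chosen block i is touched, so the per-iteration
    # cost does not grow with the number of data blocks b, and the z-update
    # calls a denoiser in place of a proximal operator for h.
    i = int(rng.integers(b))                           # pick one component g_i
    x = x - eta * (grad_gi(i, x) + rho * (x - z + u))  # stochastic x-step
    z = denoiser(x + u)                                # denoiser replaces prox
    u = u + x - z                                      # scaled dual update
    return x, z, u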
“…Scalable optimization algorithms have become increasingly important in the context of large-scale problems arising in machine learning and data science [30]. Stochastic and online optimization techniques have been investigated for traditional ADMM [31]–[35], where prox_{γg} is approximated using a subset of observations (with or without subsequent linearization). Our work contributes to this area by investigating the scalability of PnP-ADMM that is not minimizing any explicit objective function.…”
Section: Introduction (mentioning)
confidence: 99%
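The subsampled-prox approximation mentioned in this excerpt can be sketched as follows (illustrative helper names; the closed-form step comes from minimizing the linearized prox objective <grad g(v), x> + ||x - v||^2 / (2*gamma)):

import numpy as np

def prox_exact(grad_g_full, v, gamma, inner_steps=100, lr=0.05):
    # Exact prox_{gamma * g}(v) via inner gradient descent: every inner step
    # touches all observations, which is what becomes expensive at scale.
    x = v.copy()
    for _ in range(inner_steps):
        x -= lr * (grad_g_full(x) + (x - v) / gamma)
    return x

def prox_subsampled_linearized(grad_g_batch, v, gamma, batch_idx):
    # Cheap surrogate: linearize g around v using only one minibatch, then
    # the resulting quadratic subproblem is solved in closed form.
    return v - gamma * grad_g_batch(v, batch_idx)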
“…In particular, stochastic gradient descent (SGD) [37] and its variants have drawn a lot of research interest. Researchers have also applied variance-reduction techniques to ADMM, such as [38], [39]. However, they all require large storage to keep past gradients, which can be problematic in large multitask learning.…”
Section: B. Related Work (mentioning)
confidence: 99%
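The storage concern in this excerpt is easy to quantify: SAG/SAGA-style methods keep one stored (sub)gradient per training example. A back-of-the-envelope calculation with hypothetical sizes:

n, d = 1_000_000, 500        # hypothetical number of examples and parameters
table_bytes = n * d * 8      # one float64 gradient entry per example and dim
print(f"stored-gradient table: {table_bytes / 1e9:.1f} GB")   # 4.0 GB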
“…[44] have investigated the well-known SVRG strategy applied to stochastic ADMM. [45] has extensively studied stochastic ADMM combined with various variance-reduction strategies for nonconvex problems. Specifically, SVRG, stochastic average gradient (SAG), and the extension of SAG (SAGA) have been incorporated into the ADMM method, resulting in SVRG-ADMM, SAG-ADMM, and SAGA-ADMM, respectively.…”
mentioning
confidence: 99%
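For context, the SVRG control variate that these ADMM variants build on fits in a few lines (a generic sketch; grad_i, x_snap, and full_grad_snap are placeholder names rather than the cited papers' notation):

def svrg_gradient(grad_i, i, x, x_snap, full_grad_snap):
    # Unbiased estimate of the full gradient at x whose variance shrinks as
    # x approaches the snapshot point x_snap, at which full_grad_snap was
    # computed once over all data.
    return grad_i(i, x) - grad_i(i, x_snap) + full_grad_snap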
“…In [44], SVRG-ADMM has been further investigated for solving convex optimization problems with composite objective functions, achieving convergence rates of O(log S / S) for strongly convex and Lipschitz-smooth objectives and O(1/√S) for convex objectives without Lipschitz smoothness. In [46], a novel stochastic ADMM has been proposed; its main ingredient is to combine classical stochastic ADMM with gradient-free and variance-reduction strategies.…”
mentioning
confidence: 99%
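A schematic of how a gradient-free estimator and an SVRG-style control variate can be combined, in the spirit of the approach this excerpt attributes to [46] (an illustrative sketch only, not necessarily the exact estimator of [46]; f_i, mu, and the snapshot arguments are assumptions):

import numpy as np

def zo_svrg_gradient(f_i, i, x, x_snap, zo_full_snap, mu=1e-4, rng=None):
    # Two-point zeroth-order estimate of grad f_i along a shared random
    # direction, corrected by a variance-reduction term built from the same
    # kind of function-value-only estimates at the snapshot x_snap.
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.size)

    def zo(y):
        return (f_i(i, y + mu * u) - f_i(i, y)) / mu * u

    return zo(x) - zo(x_snap) + zo_full_snap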