2017
DOI: 10.48550/arxiv.1711.00501
Preprint

Learning One-hidden-layer Neural Networks with Landscape Design

Rong Ge,
Jason D. Lee,
Tengyu Ma

Abstract: We consider the problem of learning a one-hidden-layer neural network: we assume the input x ∈ R^d is drawn from a Gaussian distribution and the label y = a^⊤σ(Bx) + ξ, where a is a nonnegative vector in R^m with m ≤ d, B ∈ R^{m×d} is a full-rank weight matrix, and ξ is a noise vector. We first give an analytic formula for the population risk of the standard squared loss and demonstrate that it implicitly attempts to decompose a sequence of low-rank tensors simultaneously. Inspired by the formula, we design a non-convex…
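To make the model concrete, here is a minimal sketch in Python of the data-generating process and the squared loss described above. The ReLU activation, the dimensions, and the noise scale are illustrative assumptions; the abstract only constrains σ, a, and B as stated.

```python
# Minimal sketch of the generative model from the abstract:
# y = a^T sigma(B x) + xi, with Gaussian input x.
# The choice of ReLU for sigma, the dimensions, and the noise
# scale are illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 4                               # input dim d, hidden width m (m <= d)

a = rng.uniform(0.1, 1.0, size=m)          # nonnegative output weights
B = rng.standard_normal((m, d))            # full-rank weight matrix (a.s.)

def sample(n, noise=0.01):
    """Draw n (x, y) pairs from the one-hidden-layer model."""
    x = rng.standard_normal((n, d))        # Gaussian input
    y = np.maximum(x @ B.T, 0.0) @ a       # a^T ReLU(B x)
    return x, y + noise * rng.standard_normal(n)

def squared_loss(a_hat, B_hat, x, y):
    """Empirical squared loss; approaches the population risk as n grows."""
    pred = np.maximum(x @ B_hat.T, 0.0) @ a_hat
    return np.mean((y - pred) ** 2)

x, y = sample(100_000)
print(squared_loss(a, B, x, y))            # ~ noise variance (0.01**2) at the truth
```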

Cited by 57 publications (117 citation statements). References 7 publications.
“…Nevertheless, for many modern ML models such as CNNs, (P-DALE) remains a non-convex program in θ. And while there is overwhelming theoretical and empirical evidence that stochastic gradient-based algorithms yield good local minimizers for such overparametrized problems [35][36][37][38][39], the fact remains that solving (P-DALE) requires us to evaluate an expectation with respect to λ, which is challenging due to the fact that µ_n and γ_n are not known a priori. In the remainder of this section, we propose a practical algorithm to solve (P-DALE) based on the approximation discussed in Section 3.…”
Section: Dual Robust Learning Algorithm
confidence: 99%
“…What is more, maximizing over δ in the definition of adv is a severely underparametrized problem as opposed to the minimization over θ in (P-RO). It therefore does not enjoy the same benign optimization landscape [35][36][37][38][39]. Additionally, note that there is no guarantee that this alternating optimization technique converges.…”
Section: A2 Sampling Vs Optimizing Perturbations
confidence: 99%
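For intuition, the alternating scheme the quote refers to can be sketched generically as follows. The toy least-squares model, the shared perturbation δ, the budget ε, and all step sizes are illustrative assumptions, not details from either paper; as the quote notes, this alternation carries no convergence guarantee.

```python
# Generic sketch of alternating min-max: an inner (approximate)
# maximization over a low-dimensional perturbation delta, nested inside
# an outer minimization over theta. Everything here is an illustrative
# assumption, and the alternation is not guaranteed to converge.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)

theta = np.zeros(d)
eps, eta_in, eta_out = 0.1, 0.5, 0.05

for step in range(300):
    # Inner loop: projected gradient ascent over a shared perturbation
    # delta in R^d -- far fewer parameters than a typical theta.
    delta = np.zeros(d)
    for _ in range(5):
        r = (X + delta) @ theta - y          # residuals at perturbed inputs
        delta += eta_in * r.mean() * theta   # ascent step on the loss
        delta = np.clip(delta, -eps, eps)    # project onto the eps-box
    # Outer step: gradient descent on theta at the perturbed inputs.
    r = (X + delta) @ theta - y
    theta -= eta_out * (X + delta).T @ r / n

print(np.round(theta, 3))                    # approximate robust solution
```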
“…Instead, many theoretical works focus on finding a local minimum instead of a global one, because recent works (both empirical and theoretical) have suggested that local minima are nearly as good as global minima for a significant number of well-studied machine learning problems; see e.g. [4,11,13,14,16,17]. On the other hand, saddle points are major obstacles for solving these problems, not only because they are ubiquitous in high-dimensional settings where the directions for escaping may be few (see e.g.…
Section: Introduction
confidence: 99%
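The obstacle that saddle points pose for first-order methods can be seen on a toy quadratic: gradient descent initialized on the stable manifold converges to the saddle, while an arbitrarily small perturbation escapes it. The example below is purely illustrative and is not taken from the cited works.

```python
# Toy illustration of why saddle points obstruct gradient descent:
# f(u, v) = u^2 - v^2 has a saddle at the origin. Started exactly on
# the u-axis, gradient descent converges to the saddle; any tiny
# perturbation in v escapes along the negative-curvature direction.
import numpy as np

def grad(p):
    u, v = p
    return np.array([2 * u, -2 * v])

def run(p0, lr=0.1, steps=100):
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p -= lr * grad(p)
    return p

print(run([1.0, 0.0]))    # -> ~[0, 0]: stuck at the saddle
print(run([1.0, 1e-8]))   # -> v grows geometrically: escapes the saddle
```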
“…Towards mitigating the degradation, we identify a critical issue in CQL: solely regularizing the critic is insufficient for multiple agents to learn good policies for coordination in the offline setting. The primary cause is that first-order policy gradient methods are prone to local optima [14,36,46], saddle points [52,54], or noisy gradient estimates [51]. As a result, this can lead to uncoordinated suboptimal learning behavior because the actor cannot leverage the global information in the critic well.…”
Section: Introduction
confidence: 99%