2019
DOI: 10.48550/arxiv.1909.12830
Preprint

The Differentiable Cross-Entropy Method

Brandon Amos, Denis Yarats

Abstract: We study the Cross-Entropy Method (CEM) for the non-convex optimization of a continuous and parameterized objective function and introduce a differentiable variant (DCEM) that enables us to differentiate the output of CEM with respect to the objective function's parameters. In the machine learning setting this brings CEM inside of the end-to-end learning pipeline where this has otherwise been impossible. We show applications in a synthetic energy-based structured prediction task and in non-convex continuous co…
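To make the abstract concrete, here is a minimal sketch of the standard (non-differentiable) Cross-Entropy Method that DCEM builds on: sample candidates from a Gaussian, keep the top-k "elite" samples under the objective, and refit the sampler to the elites. The objective, dimensions, and hyper-parameters below are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of vanilla CEM for continuous minimization (illustrative only).
import numpy as np

def cem_minimize(f, dim, iters=10, n_samples=100, n_elite=10, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        # Sample candidates from the current Gaussian sampling distribution.
        x = rng.normal(mu, sigma, size=(n_samples, dim))
        values = np.array([f(xi) for xi in x])
        # Hard eliteness threshold: keep the top-k samples (non-differentiable).
        elite = x[np.argsort(values)[:n_elite]]
        # Refit the sampling distribution to the elites.
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Example: minimize a simple non-convex objective.
x_star = cem_minimize(lambda x: np.sum(x**2) + np.sin(3 * x).sum(), dim=2)
```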

Cited by 3 publications (12 citation statements). References 48 publications.
“…Recently, [2,6] showed how to efficiently differentiate through convex cone programs by applying the implicit function theorem to a residual map introduced in [27], and [1] showed how to differentiate through convex optimization problems by an automatable reduction to convex cone programs; our method for learning convex optimization models builds on this recent work. Optimization layers have been used in many applications, including control [7,11,15,3], game-playing [46,45], computer graphics [37], combinatorial tasks [58,52,53,21], automatic repair of optimization problems [14], and data fitting more generally [9,17,16,10]. Differentiable optimization for nonconvex problems is often performed numerically by differentiating each individual step of a numerical solver [33,48,32,36], although sometimes it is done implicitly; see, e.g., [7,47,4].…”
Section: Related Work
confidence: 99%
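The last sentence of this excerpt contrasts implicit differentiation with numerically differentiating each individual step of a solver. A minimal sketch of that unrolling approach, assuming a simple parameterized quadratic inner objective and a fixed number of gradient steps, is:

```python
# Sketch of differentiating through an unrolled inner solver (illustrative only).
import torch

def unrolled_argmin(theta, x0, steps=20, lr=0.1):
    x = x0
    for _ in range(steps):
        # Inner objective g(x; theta); here an assumed parameterized quadratic.
        g = ((x - theta) ** 2).sum()
        (grad,) = torch.autograd.grad(g, x, create_graph=True)
        x = x - lr * grad  # each update stays on the autograd tape
    return x

theta = torch.tensor([1.0, -2.0], requires_grad=True)
x0 = torch.zeros(2, requires_grad=True)
x_hat = unrolled_argmin(theta, x0)
# The outer loss depends on the solver's output; gradients flow back to theta.
loss = (x_hat ** 2).sum()
loss.backward()
print(theta.grad)
```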
“…Relation to Differentiable Cross-Entropy: Particular importance should be given to a recent paper [9] since, to the best of our knowledge, it is the first to suggest sampling-based optimization instead of gradient descent, and it features some similarities with our approach. The authors in [9] propose a differentiable approximation of the cross-entropy method (CEM) [21,22], called differentiable cross-entropy (DCEM). To obtain this approximation, they need to approximate CEM's eliteness threshold operation, which is non-differentiable.…”
Section: Further Background and Related Work
confidence: 99%
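The hard eliteness threshold (selecting the top-k lowest-cost samples) is the non-differentiable step this excerpt refers to. As a hedged illustration of the general relaxation idea, and not the paper's exact construction, one can replace the hard top-k with a temperature-controlled soft weighting so that gradients flow from the refit distribution back to the objective's parameters:

```python
# Illustrative soft relaxation of CEM's eliteness threshold (not the paper's exact method).
import torch

def soft_elite_update(samples, values, temperature=1.0):
    # samples: (n, dim) candidates; values: (n,) objective values.
    # Lower values get larger weights; temperature -> 0 approaches a hard selection.
    weights = torch.softmax(-values / temperature, dim=0)
    mu = (weights[:, None] * samples).sum(dim=0)
    var = (weights[:, None] * (samples - mu) ** 2).sum(dim=0)
    return mu, var.sqrt() + 1e-6

# Example: one reweighted refit of the sampling distribution.
samples = torch.randn(100, 2)
values = (samples ** 2).sum(dim=1)
mu, sigma = soft_elite_update(samples, values, temperature=0.5)
```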
“…With respect to the latter, a significant amount of attention has been devoted to incorporating optimization blocks or modules operating at some part of the network. This has been motivated by a large number of applications, including meta-learning [1][2][3], differentiable physics simulators [4], classification [5], GANs [6], reinforcement learning with constraints, latent spaces, or safety [7][8][9][10], model predictive control [11,12], as well as tasks relying on the use of energy networks [13,3], among many others. Local optimization modules lead to nested optimization operations, as they interact with the global, end-to-end training of the network that contains them.…”
Section: Introduction
confidence: 99%