2021
DOI: 10.1017/s0956792521000139

Structure-preserving deep learning

Abstract: Over the past few years, deep learning has risen to the foreground as a topic of massive interest, mainly as a result of successes obtained in solving large-scale image processing tasks. There are multiple challenging mathematical problems involved in applying deep learning: most deep learning methods require the solution of hard optimisation problems, and a good understanding of the trade-off between computational effort, amount of data and model complexity is required to successfully design a deep learning a…

Cited by 28 publications (20 citation statements); references 72 publications (101 reference statements).
“…Differential geometry plays a fundamental role in applied mathematics, statistics, and computer science, including numerical integration [1][2][3][4][5], optimisation [6][7][8][9][10][11], sampling [12][13][14][15][16], statistics on spaces with deep learning [17,18], medical imaging and shape methods [19,20], interpolation [21], and the study of random maps [22], to name a few. Of particular relevance to this chapter is information geometry, i.e., the differential geometric treatment of smooth statistical manifolds, whose origin stems from a seminal article by Rao [23] who introduced the Fisher metric tensor on parametrised statistical models, and thus a natural Riemannian geometry that was later observed to correspond to an infinitesimal distance with respect to the Kullback-Leibler (KL) divergence [24].…”
Section: Introduction (mentioning; confidence: 99%)
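For context on the quoted passage, the Fisher metric and its infinitesimal relation to the KL divergence can be summarised as follows; this is the standard statement, not an excerpt from the cited paper:

    % Fisher information metric on a parametrised statistical model p_theta
    g_{ij}(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[ \partial_{\theta_i} \log p_\theta(x)\, \partial_{\theta_j} \log p_\theta(x) \right],
    % KL divergence between infinitesimally close models
    \mathrm{KL}\!\left(p_\theta \,\|\, p_{\theta + \mathrm{d}\theta}\right) = \tfrac{1}{2}\, g_{ij}(\theta)\, \mathrm{d}\theta^i\, \mathrm{d}\theta^j + O\!\left(\|\mathrm{d}\theta\|^3\right).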
“…• First of all, we ought to suppose that x^{(i)} ≠ x^{(j)} for i ≠ j. Now, due to the uniqueness of Lipschitz-nonlinear ODEs (in both directions of time), trajectories corresponding to different initial data cannot cross [34]. Hence, in the context of binary classification tasks for instance (namely, where f is the characteristic function of some set), if the original dataset is not linearly separable, one cannot separate the dataset by a controlled neural ODE flow in a way that the underlying topology of the data (namely, the unknown function f) is captured and generalized.…”
Section: Remark 105 (Time-irreversible Equations) (mentioning; confidence: 99%)
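The non-crossing property invoked in this quotation is the standard consequence of uniqueness for Lipschitz ODEs; in our notation (not taken from the cited paper), the step being used is

    \dot{x}(t) = f(x(t), t), \quad f \ \text{Lipschitz in } x
    \;\Longrightarrow\; \text{the flow map } \Phi_t \text{ is injective, so } x^{(i)} \neq x^{(j)} \;\Rightarrow\; \Phi_t\big(x^{(i)}\big) \neq \Phi_t\big(x^{(j)}\big) \ \text{for all } t.

In particular, each Φ_t is a homeomorphism of the ambient space, which is why the flow alone cannot change the topology of the regions on which f is constant.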
“…The neural ODE formalism of deep learning has been used to great effect in several machine learning contexts. To name a few, these include the use of adaptive ODE solvers ([35,49,107]) and symplectic schemes ([34]) for efficient training, the use of indirect training algorithms based on the Pontryagin Maximum Principle ([119,15]), image super-resolution ([92]), as well as unsupervised learning and generative modeling ([72,137]). The origins of continuous-time supervised learning date back at least to [117], in which the backpropagation method is connected to the adjoint method.…”
Section: Remark 105 (Time-irreversible Equations) (mentioning; confidence: 99%)
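To make the connection between backpropagation and the adjoint method concrete, here is a minimal sketch in plain NumPy. It assumes a forward-Euler discretisation of dz/dt = tanh(Wz), a single shared weight matrix W and a quadratic terminal loss; all names are illustrative and none of this is taken from the cited papers.

import numpy as np

def f(z, W):
    # Vector field of the neural ODE: dz/dt = tanh(W @ z).
    return np.tanh(W @ z)

def df_dz(z, W):
    # Jacobian of f with respect to the state z.
    s = 1.0 - np.tanh(W @ z) ** 2
    return s[:, None] * W

def df_dW(z, W, a):
    # Contraction a^T (df/dW), returned with the shape of W.
    s = 1.0 - np.tanh(W @ z) ** 2
    return np.outer(a * s, z)

def neural_ode_gradient(z0, target, W, T=1.0, n_steps=50):
    # Forward Euler pass followed by a backward (discrete adjoint) pass.
    # Returns the loss L = 0.5 * ||z(T) - target||^2 and dL/dW; the backward
    # recursion is exactly backpropagation through the unrolled Euler steps.
    h = T / n_steps
    zs = [z0]
    for _ in range(n_steps):
        zs.append(zs[-1] + h * f(zs[-1], W))
    zT = zs[-1]
    loss = 0.5 * np.sum((zT - target) ** 2)

    a = zT - target                # adjoint at final time: dL/dz(T)
    grad_W = np.zeros_like(W)
    for k in reversed(range(n_steps)):
        grad_W += h * df_dW(zs[k], W, a)      # accumulate dL/dW
        a = a + h * df_dz(zs[k], W).T @ a     # adjoint step backwards in time
    return loss, grad_W

A finite-difference check on a small random W and z0 reproduces grad_W up to discretisation error; in the continuous-time setting, adaptive solvers and the continuous adjoint ODE take the place of the fixed Euler loops above.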
“…In particular, they take as a point of departure this variational approach, which captures acceleration in continuous time by considering a particular type of time-dependent Lagrangian functions, called Bregman Lagrangians (see Section 2). In a recent paper [3], the authors introduce symplectic integrators (and also presymplectic integrators) for the integration of the differential equations associated with accelerated optimization methods (see references [27,12,5] for an introduction to symplectic integration). In [3] the authors use the Hamiltonian formalism, since it is possible to extend the phase space to turn the system into a time-independent Hamiltonian system and apply the standard symplectic techniques there (see [20,9]). See recent improvements of this approach using adaptive Hamiltonian variational integrators [11].…”
Section: Introduction (mentioning; confidence: 99%)
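As background for the quoted passage, the Bregman Lagrangian family (introduced by Wibisono, Wilson and Jordan) has the general form below; the notation is ours and is only a sketch, not a statement of the cited paper's conventions:

    \mathcal{L}(x, v, t) = e^{\alpha_t + \gamma_t} \Big( D_h\big(x + e^{-\alpha_t} v,\; x\big) - e^{\beta_t} f(x) \Big),
    \qquad D_h(y, x) = h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle,

where f is the objective, h is a convex distance-generating function, and the scaling functions \alpha_t, \beta_t, \gamma_t satisfy the ideal-scaling conditions \dot{\beta}_t \le e^{\alpha_t} and \dot{\gamma}_t = e^{\alpha_t}. For appropriate choices of these scalings, the Euler-Lagrange equations of this Lagrangian recover continuous-time limits of Nesterov-type accelerated methods, which is the system the symplectic and presymplectic integrators mentioned above are designed to discretise.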