2018
DOI: 10.1007/s40687-018-0148-y

Deep relaxation: partial differential equations for optimizing deep neural networks

Abstract: In this paper we establish a connection between non-convex optimization methods for training deep neural networks and nonlinear partial differential equations (PDEs). Relaxation techniques arising in statistical physics, which have already been used successfully in this context, are reinterpreted as solutions of a viscous Hamilton-Jacobi PDE. Using a stochastic control interpretation, we prove that the modified algorithm performs better in expectation than stochastic gradient descent. Well-known PDE regula…
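The viscous Hamilton-Jacobi PDE mentioned in the abstract can be sketched as follows. This is a hedged reconstruction rather than a verbatim statement from the paper: f denotes the original non-convex training loss, u its relaxation, and β⁻¹ an assumed viscosity (smoothing) parameter.

```latex
% Hedged sketch (illustrative notation): viscous Hamilton-Jacobi relaxation
% of a non-convex loss f; the Laplacian term smooths the loss landscape.
\[
  u_t \;=\; -\tfrac{1}{2}\,\lVert \nabla u \rVert^{2}
            \;+\; \tfrac{\beta^{-1}}{2}\,\Delta u,
  \qquad u(x,0) \;=\; f(x).
\]
```

Roughly, gradient descent on the relaxed landscape u(·, T), for some small horizon T > 0, then takes the place of gradient descent on f itself.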

Cited by 94 publications (104 citation statements); references 47 publications.
“…We set Y−1 = Y0 to denote the initial condition. Similar to the symplectic integration in (10), this scheme is reversible. We show that the second-order network is stable in the sense of (6) when we assume stationary weights.…”
Section: Hyperbolic CNNs (mentioning)
confidence: 99%
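The reversible second-order scheme referred to in this snippet can be illustrated with a minimal sketch. The update rule Y_{j+1} = 2Y_j − Y_{j−1} + h²f_j(Y_j) with Y_{−1} = Y_0, the layer maps f_j, the step size h, and the function names below are assumptions for illustration, not code from the cited work.

```python
import numpy as np

def leapfrog_forward(Y0, layers, h):
    """Sketch of a second-order ("hyperbolic") network:
    Y_{j+1} = 2*Y_j - Y_{j-1} + h**2 * f_j(Y_j), with Y_{-1} = Y_0."""
    Y_prev, Y = Y0.copy(), Y0.copy()        # initial condition Y_{-1} = Y_0
    states = [Y]
    for f in layers:
        Y_next = 2.0 * Y - Y_prev + h**2 * f(Y)
        Y_prev, Y = Y, Y_next
        states.append(Y)
    return states

def leapfrog_backward(Y_last, Y_second_last, layers, h):
    """Reverse the dynamics exactly: Y_{j-1} = 2*Y_j - Y_{j+1} + h**2 * f_j(Y_j)."""
    Y, Y_next = Y_second_last, Y_last
    for f in reversed(layers):
        Y_prev = 2.0 * Y - Y_next + h**2 * f(Y)
        Y_next, Y = Y, Y_prev
    return Y

# Example (illustrative): three identical tanh "layers" on a 2-vector.
if __name__ == "__main__":
    layers = [lambda Y: np.tanh(Y)] * 3
    states = leapfrog_forward(np.array([1.0, -0.5]), layers, h=0.1)
    Y0_rec = leapfrog_backward(states[-1], states[-2], layers, h=0.1)
    print(np.allclose(Y0_rec, states[0]))   # True: the scheme is reversible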
“…In [33], the authors proposed a Lipschitz regularization term to the optimization problem and showed (theoretically) that the output of the regularized network converges to the correct classifier when the data satisfies certain conditions. In addition, there are several recent works that have made connections between optimization in deep learning and numerical methods for partial differential equations, in particular, the entropy-based stochastic gradient descent [6] and a Hamilton-Jacobi relaxation [7]. For a review of some other recent mathematical approaches to DNN, see [45] and the citations within.…”
Section: Introduction (mentioning)
confidence: 99%
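As a rough illustration of the entropy-based stochastic gradient descent cited as [6], the following sketch runs an inner Langevin loop to estimate a local average around the current iterate and then takes an outer step toward it. The hyperparameters, function names, and averaging scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def entropy_sgd_step(x, grad_f, eta=0.1, gamma=1.0, L=20,
                     sgld_eta=0.01, noise=1e-3, alpha=0.75, rng=None):
    """Hedged sketch of one Entropy-SGD-style update: an inner stochastic
    gradient Langevin loop estimates the mean of a local Gibbs measure
    centered at x, and the outer step moves x toward that mean."""
    rng = np.random.default_rng() if rng is None else rng
    x_prime = x.copy()
    mu = x.copy()
    for _ in range(L):
        # Langevin dynamics on the locally modified objective around x
        g = grad_f(x_prime) + gamma * (x_prime - x)
        x_prime = (x_prime - sgld_eta * g
                   + np.sqrt(sgld_eta) * noise * rng.standard_normal(x.shape))
        mu = alpha * mu + (1.0 - alpha) * x_prime   # running average of iterates
    # Outer update: the local-entropy gradient is proportional to (x - mu)
    return x - eta * gamma * (x - mu)

# Usage (illustrative): x_new = entropy_sgd_step(x, grad_f=lambda z: 2 * z)
```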
“…We will now focus on the third term in (4.6). Using Hölder's inequality together with Doob's L^p inequality, see for instance [14, Theorem 1, §3, p. 20], and A3, we get…”
Section: 1 (mentioning)
confidence: 99%
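For context, Doob's L^p maximal inequality invoked in this estimate states that, for a right-continuous martingale (M_s) and p > 1 (the symbols M and T here are generic),

```latex
\[
  \mathbb{E}\Big[\,\sup_{0 \le s \le T} \lvert M_s \rvert^{p}\Big]
  \;\le\; \Big(\tfrac{p}{p-1}\Big)^{p}\,
          \mathbb{E}\big[\lvert M_T \rvert^{p}\big].
\]
```

Combined with Hölder's inequality, this is the standard route to bounding the supremum of a stochastic-integral term like the one being estimated.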
“…where Σ is a covariance matrix and dW_t is a standard m-dimensional Wiener process defined on a probability space. The idea of approximating stochastic gradient descent with a continuous time process has been noted by several authors, see [3,4,6,13,30,31]. A special case of what we prove in this paper, see Theorem 2.7 below, is that the stochastic gradient descent (1.7) used to minimize the risk for the ResNet model in (1.5) converges to the stochastic gradient descent used to minimize the risk for the Neural ODE model in (1.4).…”
Section: Introduction (mentioning)
confidence: 99%
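The continuous-time surrogate of SGD described in this snippet (drift given by the negative risk gradient, diffusion Σ^{1/2} dW_t) can be simulated with a basic Euler-Maruyama scheme. The sketch below is a hedged illustration; the function name, step size, and the choice of the matrix square root Σ^{1/2} are assumptions, not taken from the cited works.

```python
import numpy as np

def euler_maruyama_sgd_sde(grad_f, sigma_sqrt, x0, dt=1e-3, n_steps=1000, rng=None):
    """Simulate dX_t = -grad f(X_t) dt + Sigma^{1/2} dW_t by Euler-Maruyama.
    `sigma_sqrt` is an (m, m) square root of the covariance matrix Sigma."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    path = [x.copy()]
    for _ in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)   # Wiener increment
        x = x - grad_f(x) * dt + sigma_sqrt @ dw
        path.append(x.copy())
    return np.array(path)

# Usage (illustrative): quadratic risk in 2D with isotropic noise
#   path = euler_maruyama_sgd_sde(lambda z: z, 0.1 * np.eye(2), x0=np.ones(2))
```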