It is by now well known that practical deep supervised learning can roughly be cast as an optimal control problem for a specific discrete-time, nonlinear dynamical system called an artificial neural network. In this work, we consider the continuous-time formulation of the deep supervised learning problem and study its behavior as the final time horizon increases, which in the neural network setting can be interpreted as increasing the number of layers.

For the classical regularized empirical risk minimization problem, we show that, in long time, the optimal states approach the zero training error regime, while the optimal control parameters approach, on an appropriate scale, minimal-norm parameters whose corresponding states lie precisely in the zero training error regime. Seen from the large-layer perspective, this result provides an alternative theoretical underpinning to the notion that neural networks learn best in the overparametrized regime.

We also propose a learning problem consisting of minimizing a cost with a state tracking term, and establish the well-known turnpike property: the solutions of the learning problem over long time intervals consist of three pieces, the first and last of which are transient short-time arcs, while the middle piece is a long-time arc staying exponentially close to the optimal solution of an associated static learning problem. This property in fact yields a quantitative estimate for the number of layers required to reach the zero training error regime.

Both of the aforementioned asymptotic regimes are addressed in the context of continuous-time and continuous space-time neural networks, the latter taking the form of nonlinear integro-differential equations, hence covering residual neural networks with both fixed and possibly variable depths.

Contents
1. Introduction
2. A roadmap to continuous-time supervised learning
3. Asymptotics without tracking
4. Asymptotics with tracking
5. The zero training error regime

Date: August 7, 2020.
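To fix ideas, here is a minimal sketch of the continuous-time (neural ODE) formulation alluded to above; the notation for the data, loss, and readout map is illustrative and not the paper's exact setup. Given training pairs $\{(\vec{x}_i, \vec{y}_i)\}_{i=1}^{n}$, the states evolve according to

$$\dot{x}_i(t) = \sigma\big(w(t)\, x_i(t) + b(t)\big), \qquad t \in (0, T), \qquad x_i(0) = \vec{x}_i,$$

and the parameters $(w, b)$ are chosen to minimize a regularized empirical risk of the form

$$\inf_{(w, b)} \; \frac{1}{n} \sum_{i=1}^{n} \mathrm{loss}\big(P\, x_i(T), \vec{y}_i\big) \;+\; \int_0^T \big\|(w(t), b(t))\big\|^2 \, \mathrm{d}t,$$

where $\sigma$ is a Lipschitz nonlinearity and $P$ a fixed readout map. A forward Euler discretization of the ODE with step $T/N$ recovers an $N$-layer residual network, which is why letting $T \to \infty$ plays the role of letting the number of layers grow.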
In this paper, we study the evolution problem

$$\begin{cases} u_t(x,t) - \lambda_j\big(D^2 u(x,t)\big) = 0, & \text{in } \Omega \times (0,+\infty), \\ u(x,t) = g(x,t), & \text{on } \partial\Omega \times (0,+\infty), \\ u(x,0) = u_0(x), & \text{in } \Omega, \end{cases}$$

where $\Omega$ is a bounded domain in $\mathbb{R}^N$ (which verifies a suitable geometric condition on its boundary) and $\lambda_j(D^2 u)$ stands for the $j$th eigenvalue of the Hessian matrix $D^2 u$. We assume that $u_0$ and $g$ are continuous functions satisfying the compatibility condition $u_0(x) = g(x,0)$ for $x \in \partial\Omega$. We show that the (unique) solution to this problem exists in the viscosity sense and can be approximated by the value function of a two-player zero-sum game as the parameter measuring the size of the step taken in each round of the game goes to zero. In addition, when the boundary datum is independent of time, $g(x,t) = g(x)$, we show that viscosity solutions to this evolution problem stabilize and converge exponentially fast to the unique stationary solution as $t \to \infty$. For $j = 1$, the limit profile is just the convex envelope inside $\Omega$ of the boundary datum $g$, while for $j = N$, it is the concave envelope. We obtain this result with two different techniques: with partial differential equation (PDE) tools and with game-theoretical arguments. Moreover, in some special cases (for affine boundary data), we can show that solutions coincide with the stationary solution in finite time (which depends only on $\Omega$ and not on the initial condition $u_0$).
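As a point of reference (added here, not taken from the abstract): with the eigenvalues ordered $\lambda_1 \le \cdots \le \lambda_N$, the Courant–Fischer min-max theorem characterizes the operator as

$$\lambda_j\big(D^2 u(x)\big) = \min_{\substack{S \subset \mathbb{R}^N \\ \dim S = j}} \; \max_{\substack{v \in S \\ |v| = 1}} \big\langle D^2 u(x)\, v, v \big\rangle,$$

and the second-order Taylor identity

$$\big\langle D^2 u(x)\, v, v \big\rangle = \frac{u(x + \varepsilon v) + u(x - \varepsilon v) - 2\,u(x)}{\varepsilon^2} + o(1) \qquad (\varepsilon \to 0)$$

suggests the shape of the approximating game: in each round with step size $\varepsilon$, one player picks a $j$-dimensional subspace $S$, the other picks a unit direction $v \in S$, and the token moves to $x \pm \varepsilon v$ with equal probability. The precise payoff and timing conventions are those of the paper and are not reproduced here.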