We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations. In particular, without any knowledge of its concrete shape, we use the inherent low dimensionality of the solution manifold to obtain approximation rates which are significantly superior to those provided by classical neural network approximation results. Concretely, we use the existence of a small reduced basis to construct, for a large variety of parametric partial differential equations, neural networks that yield approximations of the parametric solution maps in such a way that the sizes of these networks essentially only depend on the size of the reduced basis.
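As an informal illustration of the approach sketched in this abstract, the following NumPy snippet shows how a small ReLU network that maps a parameter vector to reduced-basis coefficients yields a high-fidelity approximation whose learned part scales with the reduced-basis size rather than the high-fidelity dimension. All names, layer sizes, and (untrained) weights below are illustrative stand-ins, not the paper's construction.

```python
import numpy as np

# Minimal sketch, assuming a precomputed reduced basis V (columns = basis
# vectors): a ReLU network maps the parameter vector mu to reduced-basis
# coefficients, and the approximate solution is their linear combination.

rng = np.random.default_rng(0)
n_h, n_rb, p_dim = 1000, 10, 5                 # high-fidelity dim, reduced-basis size, parameter dim
V = np.linalg.qr(rng.standard_normal((n_h, n_rb)))[0]   # stand-in orthonormal reduced basis

# Untrained stand-in weights; in practice these would be fitted to snapshot data.
W1, b1 = rng.standard_normal((64, p_dim)), np.zeros(64)
W2, b2 = rng.standard_normal((n_rb, 64)), np.zeros(n_rb)

def coeff_net(mu):
    """ReLU network mapping a parameter vector to reduced-basis coefficients."""
    h = np.maximum(W1 @ mu + b1, 0.0)          # one hidden ReLU layer
    return W2 @ h + b2                          # n_rb coefficients

def approx_solution(mu):
    """u(mu) ~ V @ c(mu): the learned part scales with n_rb, not with n_h."""
    return V @ coeff_net(mu)

u = approx_solution(rng.standard_normal(p_dim))
print(u.shape)                                  # (1000,): high-fidelity vector from 10 coefficients
```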
We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties. It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to $L^p$-norms, $0 < p < \infty$, for all practically used activation functions, and also not closed with respect to the $L^\infty$-norm for all practically used activation functions except for the ReLU and the parametric ReLU. Finally, the function that maps a family of weights to the function computed by the associated network is not inverse stable for every practically used activation function. In other words, if $f_1, f_2$ are two functions realized by neural networks and if $f_1, f_2$ are close in the sense that $\|f_1 - f_2\|_{L^\infty} \le \varepsilon$ for $\varepsilon > 0$, it is, regardless of the size of $\varepsilon$, usually not possible to find weights $w_1, w_2$ close together such that each $f_i$ is realized by a neural network with weights $w_i$. Overall, our findings identify potential causes for issues in the training procedure of deep learning such as no guaranteed convergence, explosion of parameters, and slow convergence.
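The following minimal numerical sketch (a standard construction, not taken verbatim from the paper) illustrates the $L^p$ non-closedness and the associated explosion of parameters: a sequence of fixed-size ReLU networks converges in $L^2([-1,1])$ to a discontinuous step function, which no ReLU network can realize exactly, while the weights diverge.

```python
import numpy as np

# f_n(x) = n*ReLU(x + 1/n) - n*ReLU(x) is realized by a ReLU network of fixed
# size, converges in L^2([-1,1]) to the discontinuous step 1_{x >= 0}, which no
# (always continuous, piecewise linear) ReLU network can realize exactly,
# while the weight n grows without bound.

x = np.linspace(-1.0, 1.0, 200001)
step = (x >= 0).astype(float)

def relu(t):
    return np.maximum(t, 0.0)

def f(n, x):
    return n * relu(x + 1.0 / n) - n * relu(x)

for n in [1, 10, 100, 1000]:
    l2_err = np.sqrt(np.mean((f(n, x) - step) ** 2) * 2.0)   # approx. L^2([-1,1]) distance
    print(f"n = {n:5d}   largest weight = {n:5d}   L2 error = {l2_err:.4f}")
# The error tends to 0 while the weights diverge: the limit lies outside the set.
```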
In this review paper, we give a comprehensive overview of the large variety of approximation results for neural networks. Approximation rates for classical function spaces as well as benefits of deep neural networks over shallow ones for specifically structured function classes are discussed. While the main body of existing results is for general feedforward architectures, we also review approximation results for convolutional, residual and recurrent neural networks.
We perform a comprehensive numerical study of the effect of approximation-theoretical results for neural networks on practical learning problems in the context of numerical analysis. As the underlying model, we study the machine-learning-based solution of parametric partial differential equations. Here, approximation theory for fully connected neural networks predicts that the performance of the model should depend only very mildly on the dimension of the parameter space and is determined by the intrinsic dimension of the solution manifold of the parametric partial differential equation. We use various methods to make the test cases comparable by minimizing the effect of their choice on the optimization and sampling aspects of the learning problem. We find strong support for the hypothesis that approximation-theoretical effects heavily influence the practical behavior of learning problems in numerical analysis. Turning to more modern and practically successful architectures, we conclude the study by deriving improved error bounds for convolutional neural networks.
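A hedged, synthetic stand-in for this type of experiment is sketched below (the study's actual test cases are parametric partial differential equations; the target function, architecture, and sizes are illustrative only): the target depends on the parameter only through a one-dimensional projection, so its intrinsic dimension is 1 and the test error should stay roughly flat as the ambient parameter dimension grows.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in, not the paper's benchmark: a fully connected network is
# trained on samples of a target whose intrinsic dimension is 1, for several
# ambient parameter dimensions, and the test error is compared.

rng = np.random.default_rng(0)

def run(p_dim, n_train=2000, n_test=500):
    w = rng.standard_normal(p_dim) / np.sqrt(p_dim)   # hidden low-dimensional structure
    target = lambda mu: np.sin(3.0 * (mu @ w))        # intrinsic dimension 1
    X_tr = rng.uniform(-1.0, 1.0, (n_train, p_dim))
    X_te = rng.uniform(-1.0, 1.0, (n_test, p_dim))
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
    model.fit(X_tr, target(X_tr))
    return np.mean((model.predict(X_te) - target(X_te)) ** 2)

for p in [2, 8, 32]:
    print(f"ambient parameter dimension {p:3d}: test MSE {run(p):.4f}")
```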