2019
DOI: 10.3390/math7100992

Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations

Abstract: This article concerns the expressive power of depth in neural nets with ReLU activations and bounded width. We are particularly interested in the following questions: what is the minimal width w_min(d) so that ReLU nets of width w_min(d) (and arbitrary depth) can approximate any continuous function on the unit cube [0,1]^d arbitrarily well? For ReLU nets near this minimal width, what can one say about the depth necessary to approximate a given function? We obtain an essentially complete answer to these questions…
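To make the abstract's width/depth trade-off concrete, here is a minimal sketch (our own illustration, not code from the paper) of a deep ReLU net of fixed width 2 for input dimension d = 1. Each hidden layer applies the tent map T(x) = 2*relu(x) - 4*relu(x - 1/2) once more, so the number of linear pieces in the output doubles with every added layer; the helper name tent_net and the piece-counting heuristic are our own.

import numpy as np

def tent_net(x, depth):
    # Width-2 deep ReLU net computing the `depth`-fold composition of the
    # tent map T(x) = 2*relu(x) - 4*relu(x - 0.5) on inputs in [0, 1].
    z = np.maximum(0.0, np.stack([x, x - 0.5]))  # first hidden layer, shape (2, n)
    W = np.array([[2.0, -4.0], [2.0, -4.0]])     # readout of T, fed to both units
    b = np.array([[0.0], [-0.5]])
    for _ in range(depth - 1):
        z = np.maximum(0.0, W @ z + b)           # next width-2 hidden layer
    return np.array([2.0, -4.0]) @ z             # linear output layer

x = np.linspace(0.0, 1.0, 2001)
for depth in (1, 2, 4, 8):
    y = tent_net(x, depth)
    pieces = int(np.sum(np.diff(np.sign(np.diff(y))) != 0)) + 1
    print(depth, pieces)   # pieces = 2**depth: exponential growth in depth at width 2

A one-hidden-layer ReLU net would need on the order of 2**depth hidden units to reproduce this function exactly, which is the kind of depth separation the abstract's second question quantifies.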

Citations: Cited by 246 publications (182 citation statements)
References: 18 publications
“…Great progress in deep learning is built on deepening neural networks with structures. Deep nets with different structures have been proved to be universal, e.g., [53], [54] for deep convolutional nets, [14] for deep nets with tree structures, and [10] for deep fully-connected neural networks.…”
Section: A. Deep Nets With Fixed Structures (mentioning)
confidence: 99%
“…Different lines of research try to understand the mechanism of deep neural networks from different aspects. For example, a series of works tries to understand how the expressive power of deep neural networks is related to their architecture, including the width of each layer and the depth of the network (Telgarsky, 2015, 2016; Lu et al., 2017; Liang and Srikant, 2016; Yarotsky, 2017, 2018; Hanin, 2017; Hanin and Sellke, 2017). These works show that multi-layer networks with wide layers can approximate arbitrary continuous functions.…”
Section: Introduction (mentioning)
confidence: 99%
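The depth separations cited above can be seen in miniature. The 2**k-piece sawtooth computed by the width-2, depth-k net sketched earlier needs roughly 2**k hidden units in any exact one-hidden-layer ReLU representation. Below is a self-contained sketch of that shallow representation (our own construction; pwl_to_shallow is a hypothetical helper name):

import numpy as np

def pwl_to_shallow(knots, values):
    # One-hidden-layer ReLU net reproducing the piecewise-linear function
    # that interpolates (knots, values) on [0, 1]:
    #   f(x) = f(0) + sum_i c_i * relu(x - knot_i), one hidden unit per knot.
    slopes = np.diff(values) / np.diff(knots)
    c = np.concatenate([[slopes[0]], np.diff(slopes)])  # slope increments
    return lambda x: values[0] + np.maximum(0.0, x[:, None] - knots[:-1]) @ c

k = 4
knots = np.linspace(0.0, 1.0, 2**k + 1)            # breakpoints of the k-fold sawtooth
values = (np.arange(2**k + 1) % 2).astype(float)   # 0,1,0,1,... = its values at the knots
sawtooth = pwl_to_shallow(knots, values)           # uses 2**k = 16 hidden units

print(sawtooth(np.array([0.0, 1.0 / 2**(k + 1)])))  # [0.0, 0.5]: halfway up the first tooth

Sixteen hidden units here versus 2k = 8 ReLU units in the width-2 deep net: as k grows, the shallow cost is exponential in k while the deep cost is linear.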
“…That approximation is built from samples of the input space and minimizes the error function between the ANN's output on the training inputs and the training outputs. This is stated mathematically by the universal approximation theorem, which implies that any functional mapping between input vectors and output vectors can be approximated with arbitrary accuracy by an ANN, provided that it has a sufficient number of neurons in a sufficient number of layers with a suitable activation function [10], [11], [12], [13].…”
Section: A. What Is an Artificial Neural Network? (mentioning)
confidence: 99%
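As a hedged one-dimensional illustration of that universal approximation statement, the same interpolation construction as in the sketch above can target an arbitrary continuous function: a one-hidden-layer ReLU net with n hidden units realizes the piecewise-linear interpolant of the target, and its sup-norm error vanishes as n grows. The target sin(2*pi*x) and the helper name relu_interpolant are our choices, not from the cited works.

import numpy as np

def relu_interpolant(f, n):
    # One-hidden-layer ReLU net with n hidden units realizing the
    # piecewise-linear interpolant of f at n + 1 uniform knots on [0, 1].
    knots = np.linspace(0.0, 1.0, n + 1)
    vals = f(knots)
    slopes = np.diff(vals) / np.diff(knots)
    c = np.concatenate([[slopes[0]], np.diff(slopes)])
    return lambda x: vals[0] + np.maximum(0.0, x[:, None] - knots[:-1]) @ c

f = lambda x: np.sin(2.0 * np.pi * x)
x = np.linspace(0.0, 1.0, 10001)
for n in (8, 32, 128):
    err = np.max(np.abs(relu_interpolant(f, n)(x) - f(x)))
    print(n, err)   # sup error shrinks roughly like 1/n**2 for smooth targets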