Deep neural networks with rectified linear units (ReLU) have become increasingly popular in recent years. However, the derivatives of the function represented by a ReLU network are not continuous, which limits the use of ReLU networks to situations where smoothness is not required. In this paper, we construct deep neural networks with rectified power units (RePU), which can give better approximations of smooth functions. Optimal algorithms are proposed to explicitly build neural networks with sparsely connected RePUs, which we call PowerNets, to represent polynomials with no approximation error. For general smooth functions, we first project the function onto its polynomial approximation and then use the proposed algorithms to construct the corresponding PowerNet. Thus, the error of the best polynomial approximation provides an upper bound on the best RePU network approximation error. For smooth functions in higher-dimensional Sobolev spaces, we use fast spectral transforms for tensor-product grid and sparse grid discretizations to obtain polynomial approximations. Our constructive algorithms show clearly a close connection between spectral methods and deep neural networks: a PowerNet with $n$ layers can exactly represent polynomials up to degree $s^n$, where $s$ is the power of the RePUs. The proposed PowerNets have potential applications in situations where high accuracy is desired or smoothness is required.

It has been shown that deep networks can lessen the curse of dimensionality for an important class of problems corresponding to compositional functions. Regarding general function approximation, Yarotsky [12] proved that DNNs using rectified linear units (abbr. ReLU, a non-smooth activation function defined as $\sigma_1(x) := \max\{0, x\}$) need at most $\mathcal{O}\big(\varepsilon^{-d/k}(\log(1/\varepsilon) + 1)\big)$ units and nonzero weights to approximate functions in the Sobolev space $W^{k,\infty}([-1,1]^d)$ to within error $\varepsilon$. This is similar to the results for shallow networks with one hidden layer of $C^{\infty}$ activation units, but it is optimal only up to a factor of $\mathcal{O}(\log(1/\varepsilon))$. Similar results for approximating functions in $W^{k,p}([-1,1]^d)$ with $p < \infty$ using ReLU DNNs are given by Petersen and Voigtlaender [13]. The significance of the works by Yarotsky [12] and Petersen and Voigtlaender [13] is that, by using a very simple rectified nonlinearity, DNNs can attain high-order approximation properties, which shallow networks do not possess. Other works showing that ReLU DNNs have high-order approximation properties include the work by E and Wang [14] and the recent work by Opschoor et al. [15]; the latter relates ReLU DNNs to high-order finite element methods.

A basic fact used in the error estimates of [12] and [13] is that $x^2$ and $xy$ can be approximated by ReLU networks with $\mathcal{O}(\log(1/\varepsilon))$ layers. To remove this approximation error and the extra $\mathcal{O}(\log(1/\varepsilon))$ factor in the size of the neural networks, we proposed to use rectified power units (RePU) to construct exact neural network representations of polynomials [16]. The RePU function is defined as
\[
\sigma_s(x) :=
\begin{cases}
x^s, & x \ge 0,\\
0, & x < 0,
\end{cases}
\]
where $s$ is a non-negative integer. When $s = 0$ and $s = 1$, $\sigma_s$ reduces to the Heaviside step function and the ReLU function, respectively.
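To make the exactness claim concrete, the following identities follow directly from the definition of $\sigma_2$; they are our own illustration rather than a quotation of the construction in [16]. A single layer of $s = 2$ RePU units reproduces $x^2$ and $xy$ exactly for all real $x, y$:
\[
x^2 = \sigma_2(x) + \sigma_2(-x), \qquad
xy = \tfrac{1}{4}\bigl[\sigma_2(x+y) + \sigma_2(-x-y) - \sigma_2(x-y) - \sigma_2(y-x)\bigr].
\]
Composing layers raises the exactly representable degree: since $x^2 \ge 0$, one more $\sigma_2$ unit gives $x^4 = \sigma_2\bigl(\sigma_2(x) + \sigma_2(-x)\bigr)$, consistent with the degree-$s^n$ statement above.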
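For readers who prefer a numerical check, here is a minimal NumPy sketch of these identities. It is an illustration under our own naming (the helper `repu` is not from the paper) and is not the paper's construction algorithm for general polynomials.

```python
import numpy as np

def repu(x, s=2):
    """Rectified power unit: sigma_s(x) = max(0, x)**s (elementwise)."""
    return np.maximum(0.0, x) ** s

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=1000)
y = rng.uniform(-2.0, 2.0, size=1000)

# One hidden layer of sigma_2 units reproduces x^2 exactly.
assert np.allclose(repu(x) + repu(-x), x**2)

# Four sigma_2 units reproduce the product xy via the polarization identity
# xy = ((x + y)^2 - (x - y)^2) / 4.
xy = (repu(x + y) + repu(-(x + y)) - repu(x - y) - repu(-(x - y))) / 4.0
assert np.allclose(xy, x * y)

# Composing two sigma_2 layers yields degree s^n = 2^2 = 4 exactly,
# because x^2 >= 0 and sigma_2(t) = t^2 for t >= 0.
assert np.allclose(repu(repu(x) + repu(-x)), x**4)

print("exact-representation checks passed")
```

The checks pass up to floating-point round-off; the representations themselves carry no approximation error, which is precisely what distinguishes RePU networks from the $\mathcal{O}(\log(1/\varepsilon))$-deep ReLU approximations of $x^2$ and $xy$.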