2020
DOI: 10.1214/19-aos1875

Nonparametric regression using deep neural networks with ReLU activation function


Cited by 334 publications (424 citation statements)
References 34 publications
“…This is a possible first step in theoretically understanding why deep learning is so successful empirically. Our work differs substantially from Schmidt‐Hieber (2019). First, our goal is not to demonstrate adaptation, and we do not study this property of deep nets, but focus on the common nonparametric case.…”
Section: Introduction (contrasting)
confidence: 70%
“…An important exception is the recent work of Schmidt‐Hieber (2019), who showed that a particular deep ReLU network with uniformly bounded weights attains the optimal rate in expected risk for squared loss. Further, Schmidt‐Hieber (2019) formally shows that deep neural networks can strictly improve on classical methods: if the unknown target function is itself a composition of simpler functions, then the composition‐based deep net estimator is provably superior to estimators that do not use compositions. This is a possible first step in theoretically understanding why deep learning is so successful empirically.…”
Section: Introduction (mentioning)
confidence: 99%
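The excerpt above concerns squared-loss regression with deep ReLU networks when the target is a composition of simpler functions. As a purely illustrative sketch (not the estimator analyzed by Schmidt-Hieber, and with all function names and hyperparameters chosen only for this example), the following snippet fits a one-hidden-layer ReLU network to a compositional target f(x) = g(h(x)) by gradient descent on the squared loss.

import numpy as np

rng = np.random.default_rng(0)

# Compositional target: f(x1, x2) = g(h(x1, x2)) built from simple inner/outer parts.
def f_true(X):
    h = np.sin(np.pi * X[:, 0]) * X[:, 1]   # inner function h
    return np.sqrt(np.abs(h))               # outer function g

n, d, width = 500, 2, 32
X = rng.uniform(0.0, 1.0, size=(n, d))
y = f_true(X) + 0.1 * rng.standard_normal(n)

# One-hidden-layer ReLU network: yhat = ReLU(X W1 + b1) W2 + b2.
W1 = 0.5 * rng.standard_normal((d, width))
b1 = np.zeros(width)
W2 = 0.5 * rng.standard_normal(width)
b2 = 0.0

lr = 0.05
for step in range(2000):
    Z = X @ W1 + b1              # pre-activations
    A = np.maximum(Z, 0.0)       # ReLU activation
    yhat = A @ W2 + b2
    gy = (yhat - y) / n          # gradient of 0.5 * mean squared error w.r.t. yhat
    gW2 = A.T @ gy
    gb2 = gy.sum()
    gA = np.outer(gy, W2)
    gZ = gA * (Z > 0.0)          # ReLU derivative is 1 on positive pre-activations
    gW1 = X.T @ gZ
    gb1 = gZ.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

print("final mean squared error:", float(np.mean((yhat - y) ** 2)))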
“…where fc_{t+l} is the output of the feature data after the fully connected operation in the l-th branch, and σ stands for the standard ReLU function [25]. Then, a softmax layer is used to normalize the output value fc_{t+l}, whose formula is given as,…”
Section: Users Trajectory Prediction (mentioning)
confidence: 99%
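This excerpt describes passing the output of a fully connected branch through a ReLU and then normalizing it with a softmax layer. A minimal sketch of that step, with illustrative variable names that are not taken from the cited paper:

import numpy as np

def relu(x):
    # standard ReLU: elementwise max(0, x)
    return np.maximum(x, 0.0)

def softmax(x):
    # subtract the max before exponentiating for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

fc_out = np.array([1.2, -0.7, 0.3, 2.1])   # hypothetical fully connected branch output
activated = relu(fc_out)                    # sigma(fc_out) in the excerpt's notation
probs = softmax(activated)                  # normalized so the entries sum to 1
print(probs, probs.sum())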
“…The output layer has one node that is assigned the Vth in this work. The activation function used in the algorithm is the Rectified Linear Unit (ReLU), and is defined by the following equation [23,24,25]:…”
Section: Introduction (mentioning)
confidence: 99%
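The equation itself is cut off in this excerpt; the Rectified Linear Unit it refers to is conventionally defined as

ReLU(x) = max(0, x),

i.e. the identity for nonnegative inputs and zero for negative inputs.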