2018
DOI: 10.48550/arxiv.1806.07808
Preprint

Learning One-hidden-layer ReLU Networks via Gradient Descent

Abstract: We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher network. We analyze the performance of gradient descent for training such kind of neural networks based on empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent can converge to th…
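
As a rough illustration of the setting described in the abstract, the sketch below generates data from a noisy one-hidden-layer ReLU teacher network with standard Gaussian inputs and runs plain gradient descent on the empirical squared risk. All dimensions, the noise level, the learning rate, and the random initialization are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sizes only; not taken from the paper.
d, k, n = 10, 5, 5000          # input dimension, hidden width, sample size
rng = np.random.default_rng(0)

# Ground-truth (teacher) network parameters.
W_star = rng.normal(size=(k, d))
v_star = rng.normal(size=k)

# Inputs drawn from the standard Gaussian; labels from the noisy teacher.
X = rng.normal(size=(n, d))
y = np.maximum(X @ W_star.T, 0.0) @ v_star + 0.1 * rng.normal(size=n)

# Student network of the same shape. A small random initialization is used
# here as a stand-in for the tensor initialization analyzed in the paper,
# and the output layer is fixed to the teacher's for simplicity.
W = 0.1 * rng.normal(size=(k, d))
v = v_star.copy()

lr = 1e-3
for _ in range(2000):
    H = X @ W.T                          # pre-activations, shape (n, k)
    resid = np.maximum(H, 0.0) @ v - y   # residuals of the empirical risk
    # Gradient of (1/2n) * sum_i (v^T ReLU(W x_i) - y_i)^2 with respect to W.
    grad_W = ((resid[:, None] * (H > 0)) * v).T @ X / n
    W -= lr * grad_W

risk = 0.5 * np.mean((np.maximum(X @ W.T, 0.0) @ v - y) ** 2)
print(f"final empirical risk: {risk:.4f}")
```

The paper's guarantee concerns gradient descent started from a tensor-based initialization; the random initialization above is only a placeholder to keep the sketch self-contained.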

Cited by 31 publications (20 citation statements)
References 36 publications (80 reference statements)

“…A series of papers made strong assumptions on input distribution as well as realizability of labels, and showed global convergence of (stochastic) gradient descent for some shallow neural networks (Tian, 2017; Soltanolkotabi, 2017; Brutzkus & Globerson, 2017; Du et al., 2017a,b; Li & Yuan, 2017). Some local convergence results have also been proved (Zhong et al., 2017; Zhang et al., 2018). However, these assumptions are not satisfied in practice.…”
Section: Related Work
confidence: 99%
“…However, cost functions that are optimized by neural networks might not meet this condition. So, in theory, those neural networks don't guarantee globally optimal solutions, but in practice, neural networks converge to a local minimum point as proven by Zhong et al. [27], [28] and Zhang et al. [29]. Just like the cases of Naive Bayes and Gradient Descent, we also make assumptions in our paper that help explain our method, but do not have to hold in order to achieve good results in practice.…”
Section: Assumptions
confidence: 98%
“…This assumption is also made in the practical work [58]. Moreover, there is a large body of works that directly use GANs or deconvolution networks for super-resolution [31, 61, 72, 90, 94].…”
Section: Forward Super-resolution: A Special Property of Images
confidence: 99%