2018
DOI: 10.48550/arxiv.1811.01885
Preprint

Learning Two Layer Rectified Neural Networks in Polynomial Time

Abstract: We consider the following fundamental problem in the study of neural networks: given input examples x ∈ R^d and their vector-valued labels, as defined by an underlying generative neural network, recover the weight matrices of this network. We consider two-layer networks, mapping R^d to R^m, with a single hidden layer and k non-linear activation units f(·), where f(x) = max{x, 0} is the ReLU activation function. Such a network is specified by two weight matrices, U* ∈ R^{m×k}, V* ∈ R^{k×d}, such that the lab…
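For concreteness, here is a minimal sketch of the generative model the abstract describes, assuming the vector-valued labels are produced as y = U* f(V* x) (the abstract is cut off, so this form is inferred from the stated setup). The dimensions, sample count, and Gaussian input distribution below are illustrative choices, not values taken from the paper.

```python
import numpy as np

# Minimal sketch of the generative two-layer ReLU network from the abstract.
# The dimensions, sample size, and Gaussian inputs are illustrative assumptions.
rng = np.random.default_rng(0)
d, k, m, n = 10, 4, 6, 1000           # input dim, hidden units, output dim, samples

V_star = rng.standard_normal((k, d))  # hidden-layer weights V* in R^{k x d}
U_star = rng.standard_normal((m, k))  # output-layer weights U* in R^{m x k}
relu = lambda z: np.maximum(z, 0)     # f(x) = max{x, 0}

X = rng.standard_normal((n, d))       # input examples x in R^d
Y = relu(X @ V_star.T) @ U_star.T     # labels y = U* f(V* x); the learning task
                                      # is to recover U*, V* from (X, Y)
```

Note that recovery is only ever possible up to a permutation and positive rescaling of the hidden units, since scaling a row of V* by c > 0 and the corresponding column of U* by 1/c leaves the labels unchanged.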

Cited by 6 publications (12 citation statements)
References 24 publications (68 reference statements)
“…Learning Two-Layer Network [9,10,12,22,23,36,41,42,46,50,51,53,54,56,58,61]. There is a rich history of works considering the learnability of neural networks trained by SGD.…”
Section: More On Related Work
“…Define functions S_0(x) = G_0(x), S_1(x) = G_1(x), as well as (it is convenient to think of those S(x) as the "features" used by learner network F(x))…”
Section: Learner Network
“…Most existing works analyzing the learnability of neural networks [9,12,13,21,22,29,34,35,43,44,48,50,51,57] make unrealistic assumptions about the data distribution (such as being random Gaussian), and/or make strong assumptions about the network (such as using linear activations). Li and Liang [33] show that two-layer ReLU networks can learn classification tasks when the data come from mixtures of arbitrary but well-separated distributions.…”
Section: What Can Neural Network Provably Learn?
“…It is necessary that the negative result for kernel methods is distribution dependent, since for trivial distributions where x is non-zero only on the first constantly many coordinates, both neural networks and kernel methods can learn it with constantly many samples. If R(w) is the ℓ2 regularizer, then this becomes a kernel method again since the minimizer can be written in the form (3.1).…”