2021
DOI: 10.48550/arxiv.2110.03187
Preprint

On the Optimal Memorization Power of ReLU Neural Networks

Abstract: We study the memorization power of feedforward ReLU neural networks. We show that such networks can memorize any N points that satisfy a mild separability assumption using Õ(√N) parameters. Known VC-dimension upper bounds imply that memorizing N samples requires Ω(√N) parameters, and hence our construction is optimal up to logarithmic factors. We also give a generalized construction for networks with depth bounded by 1 ≤ L ≤ √N, for memorizing N samples using Õ(N/L) parameters. This bound is also optimal …
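To make the notion of memorization concrete, below is a minimal NumPy sketch of the classic one-hidden-layer baseline that fits N labeled points exactly with O(N) parameters: project the points to a line, then interpolate the labels with a ReLU basis. The helper name memorize_relu and the random-projection step are illustrative choices of ours; this is not the paper's Õ(√N) construction, which relies on bit-extraction techniques and is substantially more involved.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def memorize_relu(X, y, seed=0):
    """One-hidden-layer ReLU net fitting (X, y) exactly with O(N) parameters.

    Illustrative sketch only: this is the classic baseline (in the spirit of
    Yun et al., 2019 / Zhang et al., 2021), not the paper's bit-extraction
    construction. It assumes a generic random projection assigns distinct
    scalar codes to the N points, which a separability assumption makes easy
    to satisfy.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    u = rng.standard_normal(d)                 # random projection direction
    t = X @ u                                  # scalar code for each point
    order = np.argsort(t)
    t, y = t[order], np.asarray(y, dtype=float)[order]
    assert np.all(np.diff(t) > 0), "projection did not separate the points"

    # Piecewise-linear interpolation of (t_i, y_i) with a ReLU basis:
    #   f(s) = y_1 + sum_i (slope_i - slope_{i-1}) * relu(s - t_i)
    slopes = np.diff(y) / np.diff(t)
    coeffs = np.diff(np.concatenate(([0.0], slopes)))   # slope change at each knot
    knots = t[:-1]

    def net(Xq):
        s = Xq @ u
        return y[0] + relu(s[:, None] - knots[None, :]) @ coeffs

    return net

# Usage: exact fit on random data, up to floating-point error.
X = np.random.randn(50, 10)
labels = np.random.randn(50)
net = memorize_relu(X, labels)
print(np.max(np.abs(net(X) - labels)))   # close to 0 (floating-point error)

The hidden layer here has N − 1 units, so the network uses O(N + d) parameters; the paper shows that, under the separability assumption, Õ(√N) parameters already suffice and that this is optimal up to logarithmic factors.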

Cited by 3 publications (5 citation statements) · References 29 publications (53 reference statements)
“…Then, Memorization power of neural networks. Our work is related to another line of work (e.g., Baum, 1988; Yun et al., 2019; Bubeck et al., 2020; Zhang et al., 2021; Rajput et al., 2021; Vardi et al., 2021) on the memorization power of neural networks. Among these works, Yun et al. (2019) show that a neural network with O(N) parameters can memorize the data set with zero error, where N is the size of the data set.…”
Section: Related Work (mentioning)
confidence: 72%
“…Among these works, Yun et al. (2019) show that a neural network with O(N) parameters can memorize the data set with zero error, where N is the size of the data set. Under an additional separability assumption, Vardi et al. (2021) derive an improved upper bound of O(√N), which is shown to be optimal. In this work, we show that O(Nd) parameters are sufficient for achieving low robust training error.…”
Section: Related Work (mentioning)
confidence: 99%
“…That natural signals/images with low intrinsic dimension can be approximately represented by neural networks has been empirically verified in [16,27,43]. Next, we verify assumption (33) by constructing a generator G with properly chosen depth and width, based on recent approximation results for deep neural networks [45,51,47,21] that utilize the bit-extraction technique [3,2]. To this end, we recall the definition of the Minkowski dimension, which is used to measure the intrinsic dimension of target signals living in a high ambient dimension.…”
Section: Analysis of the Least Square Decoder (mentioning)
confidence: 89%
“…However, deeper networks require far fewer neurons to reach the same expressive power, yielding a potential theoretical explanation for the dominance of deep networks in practice [7,29,42,44,53,62,65,68,79,80,83]. Other related work includes counting and bounding the number of linear regions [43,59,60,64,65,74], classifying the set of functions exactly representable by different architectures [7,23,46,47,61,86], and analyzing the memorization capacity of ReLU networks [82,84,85].”
Section: Neural Network (mentioning)
confidence: 99%