2021
DOI: 10.48550/arxiv.2102.01621
Preprint

Depth separation beyond radial functions

Abstract: High-dimensional depth separation results for neural networks show that certain functions can be efficiently approximated by two-hidden-layer networks but not by one-hidden-layer networks in high dimension d. Existing results of this type mainly focus on functions with an underlying radial or one-dimensional structure, which are usually not encountered in practice. The first contribution of this paper is to extend such results to a more general class of functions, namely functions with piecewise oscillatory stru…
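The separation discussed in the abstract compares one-hidden-layer (depth-2) and two-hidden-layer (depth-3) feedforward networks. For reference only, below is a minimal sketch of the two architectures with ReLU activations; the widths, initialization, and function names are illustrative assumptions, not the construction from the paper.

```python
import numpy as np

def relu(z):
    # elementwise ReLU activation
    return np.maximum(z, 0.0)

def one_hidden_layer(x, W1, b1, v):
    # one-hidden-layer ("depth-2") network: linear map, ReLU, linear readout
    return v @ relu(W1 @ x + b1)

def two_hidden_layer(x, W1, b1, W2, b2, v):
    # two-hidden-layer ("depth-3") network: one extra nonlinear layer in between
    h = relu(W1 @ x + b1)
    return v @ relu(W2 @ h + b2)

# illustrative dimensions (assumed): input dimension d, hidden widths m1, m2
d, m1, m2 = 16, 64, 64
rng = np.random.default_rng(0)
x = rng.standard_normal(d)

params_shallow = (rng.standard_normal((m1, d)), rng.standard_normal(m1),
                  rng.standard_normal(m1))
params_deep = (rng.standard_normal((m1, d)), rng.standard_normal(m1),
               rng.standard_normal((m2, m1)), rng.standard_normal(m2),
               rng.standard_normal(m2))

print(one_hidden_layer(x, *params_shallow))
print(two_hidden_layer(x, *params_deep))
```

Depth-separation results assert that, for a suitable target function and input distribution, no parameter choice for the first architecture of moderate width can match what the second achieves.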

Cited by 4 publications (5 citation statements)
References 6 publications
“…That is, Eldan and Shamir [2016] show that if g is expressed as a two-layer network of width at most $ce^{cd}$ for some universal constant $c > 0$, then $\mathbb{E}_{x \sim \mathcal{D}}[(f(x) - g(x))^2] > c$. Daniely [2017] shows a simpler setting where the exponential dependency is improved to $d \log(d)$ and the non-approximation results extend to networks with polynomial weight magnitude. Safran and Shamir [2017] provide other examples where similar behavior holds, Telgarsky [2016] gives separation results beyond depth 3, and Venturi et al. [2021] generalize the work of Eldan and Shamir [2016]. Note that all the results in these works concern function approximation in the $L_2(\mathcal{D})$ norm.…”
Section: Introduction
confidence: 75%
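The lower bounds quoted above are all stated for the squared error under the input distribution D, i.e. $\mathbb{E}_{x \sim \mathcal{D}}[(f(x) - g(x))^2]$. A minimal sketch of how that quantity would be estimated empirically by Monte-Carlo sampling follows; the target f, the surrogate g, and the Gaussian choice of D are placeholder assumptions, not the constructions from the cited works.

```python
import numpy as np

def l2_sq_error(f, g, sample_x, n=100_000, seed=0):
    # Monte-Carlo estimate of E_{x ~ D}[(f(x) - g(x))^2],
    # the quantity the depth-separation lower bounds control.
    rng = np.random.default_rng(seed)
    xs = sample_x(rng, n)          # n samples drawn from the input distribution D
    diffs = f(xs) - g(xs)
    return np.mean(diffs ** 2)

# placeholder choices (not from the cited papers): D = standard Gaussian on R^d,
# f a radial oscillation, g a trivial constant surrogate
d = 16
sample_x = lambda rng, n: rng.standard_normal((n, d))
f = lambda xs: np.cos(np.linalg.norm(xs, axis=1))
g = lambda xs: np.zeros(xs.shape[0])

print(l2_sq_error(f, g, sample_x))
```

A separation result then says that this error stays bounded below by a constant for every g of the shallower architecture, unless its width grows very rapidly with d.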
“…Many of these works center on the representational gap between two-layer and three-layer networks [3,6]. In particular, recent works have focused on generalizing the family of functions that realize these separations to various radial functions [18] and non-radial functions [26].…”
Section: Depth Separation
confidence: 99%
“…In this work we show that deep networks have significantly more memorization power. Quite a few theoretical works in recent years have explored the beneficial effect of depth on increasing the expressiveness of neural networks (e.g., [23,15,33,22,12,28,38,29,10,34,6,36,35]). The benefits of depth in the context of the VC dimension are implied by, e.g., [3].…”
Section: Related Work
confidence: 99%