Depth separation beyond radial functions

Venturi, Luca; Jelassi, Samy; Ozuch, Tristan; Bruna, Joan

doi:10.48550/arxiv.2102.01621

Cited by 4 publications

(5 citation statements)

References 6 publications

“…That is, Eldan and Shamir [2016] show that for any g is expressed as a two-layer network of width at most $c e^{cd}$ for some universal constant $c > 0$, then $\mathbb{E}_{x \sim D}(f(x) - g(x))^2 > c$. Daniely [2017] shows a simpler setting where the exponential dependency is improved to $d \log(d)$ and the non-approximation results extend to networks with polynomial weight magnitude. Safran and Shamir [2017] provide other examples where similar behavior holds, Telgarsky [2016] gives separation results beyond depth 3, and Venturi et al [2021] generalize the work of Eldan and Shamir [2016]. Note that all the results in these works concern function approximations in the $L^2(D)$ norm.…”

Section: Introduction (mentioning)

confidence: 75%

Depth and Feature Learning are Provably Beneficial for Neural Network Discriminators

Domingo-Enrich

2021

Preprint

We construct pairs of distributions $\mu_d, \nu_d$ on $\mathbb{R}^d$ such that the quantity […] for some three-layer ReLU network $F$ with polynomial width and weights, while declining exponentially in $d$ if $F$ is any two-layer network with polynomial weights. This shows that deep GAN discriminators are able to distinguish distributions that shallow discriminators cannot. Analogously, we build pairs of distributions […] for two-layer ReLU networks with polynomial weights, while declining exponentially for bounded-norm functions in the associated RKHS. This confirms that feature learning is beneficial for discriminators. Our bounds are based on Fourier transforms.

“…Many of these works center on the representational gap between two-layer and three-layer networks [3,6]. In particular, recent works have focused on generalizing the family of functions that realize these separations, to various radial functions [18] and non-radial functions [26].…”

Section: Depth Separation (mentioning)

confidence: 99%

Exponential Separations in Symmetric Neural Networks

Zweig, Bruna

2022

Preprint

Self Cite

In this work we demonstrate a novel separation between symmetric neural network architectures. Specifically, we consider the Relational Network [19] architecture as a natural generalization of the DeepSets [30] architecture, and study their representational gap. Under the restriction to analytic activation functions, we construct a symmetric function acting on sets of size N with elements in dimension D, which can be efficiently approximated by the former architecture, but provably requires width exponential in N and D for the latter.

“…In this work we show that deep networks have significantly more memorization power. Quite a few theoretical works in recent years have explored the beneficial effect of depth on increasing the expressiveness of neural networks (e.g., [23,15,33,22,12,28,38,29,10,34,6,36,35]). The benefits of depth in the context of the VC dimension is implied by, e.g., [3].…”

Section: Related Work (mentioning)

confidence: 99%

On the Optimal Memorization Power of ReLU Neural Networks

Vardi, Yehudai, Shamir

2021

Preprint

We study the memorization power of feedforward ReLU neural networks. We show that such networks can memorize any $N$ points that satisfy a mild separability assumption using $\tilde{O}(\sqrt{N})$ parameters. Known VC-dimension upper bounds imply that memorizing $N$ samples requires $\Omega(\sqrt{N})$ parameters, and hence our construction is optimal up to logarithmic factors. We also give a generalized construction for networks with depth bounded by $1 \le L \le \sqrt{N}$, for memorizing $N$ samples using $\tilde{O}(N/L)$ parameters. This bound is also optimal up to logarithmic factors. Our construction uses weights with large bit complexity. We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.
