Simone Bombari scite author profile

Simone Bombari

4 Publications

0 Citation Statements Received

162 Citation Statements Given

Affiliations

Institut Sains dan Teknologi Al-Kamal

Publications

Beyond the Universal Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels

Bombari, Kiyani, Mondelli

2023

Preprint

Machine learning models are vulnerable to adversarial perturbations, and a thought-provoking paper by Bubeck and Sellke has analyzed this phenomenon through the lens of over-parameterization: smoothly interpolating the data requires significantly more parameters than simply memorizing it. However, this "universal" law provides only a necessary condition for robustness, and it is unable to discriminate between models. In this paper, we address these gaps by focusing on empirical risk minimization in two prototypical settings, namely, random features and the neural tangent kernel (NTK). We prove that, for random features, the model is not robust for any degree of over-parameterization, even when the necessary condition coming from the universal law of robustness is satisfied. In contrast, for even activations, the NTK model meets the universal lower bound, and it is robust as soon as the necessary condition on over-parameterization is fulfilled. This also addresses a conjecture in prior work by Bubeck, Li and Nagaraj. Our analysis decouples the effect of the kernel of the model from an "interaction matrix", which describes the interaction with the test data and captures the effect of the activation. Our theoretical results are corroborated by numerical evidence on both synthetic and standard datasets (MNIST, CIFAR-10).
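As a rough illustration of the random features setting described in this abstract, the sketch below fits an over-parameterized random features model and measures how strongly its output reacts to a small change of a test input. Everything specific here is an assumption made for illustration and is not taken from the paper: the tanh activation, the minimum-norm interpolating solution as the empirical risk minimizer, the Gaussian data, and the use of the input-gradient norm as a sensitivity proxy.

```python
import numpy as np

def fit_random_features(X, y, V):
    """Minimum-norm interpolating weights theta for f(x) = theta . tanh(V x)."""
    Phi = np.tanh(X @ V.T)            # (N, k) random feature matrix
    return np.linalg.pinv(Phi) @ y    # minimum-norm least-squares solution

def predict(x, V, theta):
    """Model output at a single input x."""
    return float(theta @ np.tanh(V @ x))

def input_gradient(x, V, theta):
    """Gradient of f with respect to the input x (chain rule through tanh)."""
    pre = V @ x
    return V.T @ (theta * (1.0 - np.tanh(pre) ** 2))

rng = np.random.default_rng(0)
N, d, k = 10, 5, 200                  # samples, input dimension, random features (k > N)
X = rng.standard_normal((N, d))       # hypothetical synthetic training inputs
y = rng.standard_normal(N)            # hypothetical labels
V = rng.standard_normal((k, d)) / np.sqrt(d)   # fixed random first-layer weights

theta = fit_random_features(X, y, V)
x_test = rng.standard_normal(d)

# Sensitivity proxy (an assumption, not the paper's definition):
# norm of the gradient of the model output with respect to the test input.
sensitivity = float(np.linalg.norm(input_gradient(x_test, V, theta)))
print(sensitivity)
```

Repeating the experiment for growing k would show how this proxy behaves as the model becomes more over-parameterized, which is the kind of dependence the abstract's robustness statements concern.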

Sharp asymptotics on the compression of two-layer neural networks

Amani, Bombari, Mondelli, et al.

2022

Sharp asymptotics on the compression of two-layer neural networks

Amani, Bombari, Mondelli, et al.

2022

Preprint

In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M < N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently overparameterized, and provide the error rate of this approximation as a function of the input dimension and N. For a ReLU activation function, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.
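The objective named in this abstract, the population L2 loss between a target and a compressed two-layer network under Gaussian inputs, can be estimated by Monte Carlo sampling. The sketch below is only an illustration under stated assumptions: the output-weight parameterization, the Gaussian target weights (one instance of sub-Gaussian weights), and the naive "keep the first M nodes" compressed network are all hypothetical choices, not the paper's construction or its conjectured ETF optimum.

```python
import numpy as np

def two_layer_relu(X, W, a):
    """Outputs of f(x) = sum_j a_j * relu(w_j . x) for each row x of X."""
    return np.maximum(X @ W.T, 0.0) @ a

def population_l2_loss(W_t, a_t, W_c, a_c, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[(f_target(x) - f_comp(x))^2] with x ~ N(0, I_d)."""
    rng = np.random.default_rng(seed)
    d = W_t.shape[1]
    X = rng.standard_normal((n_samples, d))
    diff = two_layer_relu(X, W_t, a_t) - two_layer_relu(X, W_c, a_c)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(1)
d, N, M = 8, 64, 4                       # input dimension, target nodes, compressed nodes

# Hypothetical target network: i.i.d. Gaussian weights, uniform output weights.
W_target = rng.standard_normal((N, d)) / np.sqrt(d)
a_target = np.full(N, 1.0 / N)

# Hypothetical baseline compression: keep the first M nodes and rescale the outputs.
W_comp = W_target[:M]
a_comp = np.full(M, 1.0 / M)

loss = population_l2_loss(W_target, a_target, W_comp, a_comp)
print(loss)
```

A loss estimator of this kind could serve as the objective for an optimizer over the compressed weights, for example to compare a learned solution against an ETF-shaped candidate.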

Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization

Bombari, Amani, Mondelli

2022

Preprint

The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least one layer with Ω(N) neurons, N being the number of training samples. Furthermore, there is increasing evidence suggesting that deep networks with sub-linear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. Thus, a natural open question is whether the NTK is well conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: the number of parameters is roughly Ω(N) and, hence, the number of neurons is as small as Ω(√N). To showcase the applicability of our NTK bounds, we provide two results concerning memorization capacity and optimization guarantees for gradient descent training.
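The quantity at the center of this abstract, the smallest eigenvalue of the NTK Gram matrix on the training set, can be computed numerically for small examples. The sketch below does this for a two-layer ReLU network, which is a simplification assumed here for brevity; the paper concerns deep networks, and the Gaussian data, the initialization scaling, and the function name `two_layer_ntk_gram` are hypothetical choices for illustration.

```python
import numpy as np

def two_layer_ntk_gram(X, W, a):
    """Empirical NTK Gram matrix of f(x) = a . relu(W x) with respect to (W, a).

    K[i, l] is the inner product of the parameter gradients at x_i and x_l.
    """
    pre = X @ W.T                          # (N, m) pre-activations
    act = np.maximum(pre, 0.0)             # gradient with respect to a
    D = (pre > 0.0) * a                    # a_j * relu'(w_j . x_i), shape (N, m)
    # Gradient with respect to row w_j is D[i, j] * x_i, so the W-part of the
    # kernel is the elementwise product of X X^T and D D^T.
    return act @ act.T + (X @ X.T) * (D @ D.T)

rng = np.random.default_rng(0)
N, d, m = 12, 6, 5                         # samples, input dimension, neurons
X = rng.standard_normal((N, d))            # hypothetical training inputs
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m)

K = two_layer_ntk_gram(X, W, a)
lambda_min = float(np.linalg.eigvalsh(K)[0])   # eigvalsh returns ascending eigenvalues
print(lambda_min)
```

Here the network has m·d + m = 35 parameters for N = 12 samples, so the parameter count exceeds the sample count while the number of neurons stays below N. A strictly positive `lambda_min` means the gradients at the training points are linearly independent.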
