2022
DOI: 10.48550/arxiv.2203.14661
Preprint
Random matrix analysis of deep neural network weight matrices

Cited by 3 publications (4 citation statements) | References 73 publications
“…Note that the idea of implicit regularisation of neural networks via stochastic gradient descent pre-dates this work by several years [NTS14; Ney+17a; Ney+17b; Ney17]. Finally, we mention [TSR22], in which the spectra of random and trained neural network weight matrices were analysed on the local scale, rather than the global scale pursued by [MM18]. This work followed on from our own in Chapter 7 [BGK22] and similarly discovered the robust presence of universal GOE random matrix spacing statistics in the spectra.…”
Section: Spectra of Neural Networks
confidence: 98%
“…At the macroscopic scale, there are results relevant to neural networks: for example, [PSG18; Pas20] consider random neural networks with Gaussian weights and establish results that are then generalised to arbitrary distributions under optimal conditions, thereby demonstrating universality. On the microscopic scale, our work in Chapter 7 provided the first evidence of universal random matrix theory statistics in neural networks and was subsequently extended to the weight matrices of neural networks in [TSR22], but no prior work has considered the implications of these statistics, which is the central contribution of Chapter 8. Our main mathematical result is a significant generalisation of the Hessian spectral outlier result recently presented by [GZR20].…”
Section: Discovery of RMT Universality in Loss Surfaces and Consequen...
confidence: 99%
“…On the microscopic scale, [BGK22] provided the first experimental demonstration of the presence of universal local random matrix statistics in deep neural networks, specifically in the Hessians and Gauss-Newton matrices of their loss surfaces. This work has recently been extended to the weight matrices of neural networks [TSR22]. This paper explores the consequences of random matrix universality in deep neural networks.…”
Section: Introduction
confidence: 99%
“…On the microscopic scale, [BGK22] provided the first experimental demonstration of the presence of universal local random matrix statistics in DNNs, specifically in the Hessians and Gauss-Newton matrices of their loss surfaces. This work has recently been extended to the weight matrices of neural networks [TSR22].…”
Section: Introduction
confidence: 99%