2022
DOI: 10.1016/j.physa.2021.126742
Appearance of Random Matrix Theory in deep learning

Cited by 9 publications (9 citation statements)
References 26 publications
“…As we have discussed at length hitherto, we conjecture that a local law is a reasonable assumption to make on random matrices arising in deep neural networks. In particular, in Chapter 7 [BGK22] we demonstrated universal local random matrix theory statistics not just for Hessians of deep networks but also for Generalised Gauss-Newton matrices. Our aim here is to demonstrate how a local law on Ĥ_t dramatically simplifies the statistics of (8.136).…”
Section: Implications for Curvature from Local Laws
confidence: 94%
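
The excerpts above invoke a "local law" without restating it. As a point of orientation (paraphrased from the standard random matrix literature, not from the cited chapter), an entrywise local law for an N×N symmetric matrix H asserts that the resolvent is well approximated by the Stieltjes transform m(z) of the limiting spectral density down to scales just above the mean eigenvalue spacing:

\[
\left| G_{ij}(z) - \delta_{ij}\, m(z) \right| \lesssim N^{\varepsilon} \left( \sqrt{\frac{\operatorname{Im} m(z)}{N\eta}} + \frac{1}{N\eta} \right),
\qquad G(z) = (H - zI)^{-1},\quad z = E + i\eta,\quad \eta \gg N^{-1},
\]

holding with high probability, uniformly over bulk energies E. Control at these microscopic scales η ≫ N⁻¹ is what allows Dyson Brownian Motion to relax local statistics on short time scales, as the next excerpt describes.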
“…The theoretical picture that has emerged is that, for very general random matrices, when universal local eigenvalue statistics are observed, this is due to the mechanism of short-time-scale relaxation of local statistics under Dyson Brownian Motion, made possible by a local law. In Chapter 7 [BGK22] we observed that universal local eigenvalue statistics do indeed appear to be present in the Hessians of real, albeit quite small, deep neural networks. Given all of this context, we propose that a local law assumption of some kind is reasonable for deep neural network Hessians and not particularly restrictive.…”
Section: Justification and Motivation of QUE
confidence: 95%
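
A standard way to test for the "universal local eigenvalue statistics" mentioned in this excerpt is the consecutive-gap ratio diagnostic. The sketch below is not taken from the cited chapter; it is a minimal NumPy illustration of that diagnostic, using the known reference values ⟨r⟩ ≈ 0.5359 for the GOE and ⟨r⟩ ≈ 0.386 for uncorrelated (Poisson) levels. The eigenvalues of a trained network's Hessian could be passed to mean_gap_ratio in place of the GOE sample.

```python
# Sketch (not from the cited works): the consecutive-gap-ratio statistic,
# a standard test for universal local eigenvalue statistics.
import numpy as np

def mean_gap_ratio(eigs: np.ndarray) -> float:
    """Mean of r_i = min(s_i, s_{i+1}) / max(s_i, s_{i+1}),
    where s_i are consecutive spacings of the sorted eigenvalues."""
    s = np.diff(np.sort(eigs))                          # level spacings
    r = np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])
    return float(np.mean(r))

rng = np.random.default_rng(0)
n = 2000
a = rng.standard_normal((n, n))
goe = (a + a.T) / np.sqrt(2 * n)                        # GOE-normalised symmetric matrix
print(mean_gap_ratio(np.linalg.eigvalsh(goe)))          # ~0.536 (GOE)
print(mean_gap_ratio(rng.uniform(size=n)))              # ~0.386 (Poisson)
```

The ratio statistic is preferred over raw spacing histograms because it needs no unfolding of the spectral density, which is awkward for empirical Hessian spectra.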
“…Challenging the above-mentioned works, an experimental line of work has demonstrated convincingly that special RMT ensembles like the GOE do not appear to be present in DNNs [Pap18, Gra20, BGK22], for example in their Hessians. In addition, there have been challenges in the literature to the practical relevance of spin glass loss surface results for DNNs [BJSG+19].…”
Section: Introduction
confidence: 95%
“…where Z_β is the normalization constant. This distribution has found many applications: to study eigenvalue statistics in spin systems [25, 28–35], in triangular billiards [36], in the Hessians of artificial neural networks [37], in the Sachdev-Ye-Kitaev model [38–42], in quantum field theory [43], and to quantify symmetries in various complex systems [44, 45].…”
confidence: 99%
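
The equation preceding "where Z_β is the normalization constant" is not reproduced in this report. Judging from the β-dependent normalization and the applications listed, it is presumably the joint eigenvalue density of the Gaussian β-ensembles; this is an inference from context, not taken from the citing paper. In standard form,

\[
P_\beta(\lambda_1, \ldots, \lambda_N) = \frac{1}{Z_\beta} \prod_{i<j} |\lambda_i - \lambda_j|^{\beta} \prod_{k=1}^{N} e^{-\beta \lambda_k^2 / 4},
\]

with β = 1, 2, 4 recovering the GOE, GUE and GSE respectively. The closely related Wigner-surmise spacing densities P_β(s) ∝ s^β e^{-c_β s²}, widely used in the spin-system and SYK studies cited, carry the same style of β-dependent normalization.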