Proceedings of the 2020 SIAM International Conference on Data Mining (SDM 2020)
DOI: 10.1137/1.9781611976236.57

Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks

Abstract: Given two or more Deep Neural Networks (DNNs) with the same or similar architectures, and trained on the same dataset, but trained with different solvers, parameters, hyper-parameters, regularization, etc., can we predict which DNN will have the best test accuracy, and can we do so without peeking at the test data? In this paper, we show how to use a new Theory of Heavy-Tailed Self-Regularization (HT-SR) to answer this. HT-SR suggests, among other things, that modern DNNs exhibit what we call Heavy-Tailed Mech…
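As a rough illustration of the idea in the abstract, the sketch below estimates a layer-wise power-law (PL) exponent α from the eigenvalue spectrum of each weight matrix and averages it across layers, so that two trained models could be compared without any test data. This is a minimal sketch, not the paper's procedure: the Hill-type maximum-likelihood fit and the crude choice of xmin are stand-ins for the more careful ESD fitting used in the HT-SR work, and the toy "models" here are random matrices rather than trained networks.

```python
# Hedged sketch: layer-wise power-law exponent alpha from the eigenvalue
# spectrum (ESD) of weight matrices, in the spirit of HT-SR Theory.
# The Hill-type MLE and the median-based xmin are illustrative assumptions.
import numpy as np

def esd(weight):
    """Eigenvalues of the correlation matrix X = W^T W / n_rows (the ESD)."""
    n_rows = weight.shape[0]
    x = weight.T @ weight / n_rows
    return np.linalg.eigvalsh(x)

def fit_powerlaw_alpha(eigs, xmin=None):
    """Continuous maximum-likelihood (Hill-type) estimate of the tail exponent alpha."""
    eigs = np.asarray(eigs, dtype=float)
    eigs = eigs[eigs > 0]
    if xmin is None:
        xmin = np.quantile(eigs, 0.5)  # crude choice; careful fits tune xmin
    tail = eigs[eigs >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

# Toy usage: compare two hypothetical "models", each a list of weight matrices.
rng = np.random.default_rng(0)
model_a = [rng.standard_normal((512, 512)) for _ in range(4)]
model_b = [rng.standard_t(df=3, size=(512, 512)) for _ in range(4)]  # heavier tails

alpha_a = np.mean([fit_powerlaw_alpha(esd(w)) for w in model_a])
alpha_b = np.mean([fit_powerlaw_alpha(esd(w)) for w in model_b])
print(f"mean alpha, model A: {alpha_a:.2f}, model B: {alpha_b:.2f}")
```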

Cited by 23 publications (47 citation statements) | References 42 publications

Citation statements:
“…The first two metrics are well-known in ML. The last two metrics deserve special mention, as they depend on an empirical parameter α that is the PL exponent that arises in the recently developed Heavy Tailed Self Regularization (HT-SR) Theory [1][2][3].…”
Section: Results
confidence: 99%
“…In the HT-SR Theory, one analyzes the eigenvalue spectrum, i.e., the Empirical Spectral Density (ESD), of the associated correlation matrices [1][2][3]. From this, one characterizes the amount and form of correlation, and therefore implicit self-regularization, present in the DNN's weight matrices.…”
Section: Results
confidence: 99%
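One way to make the quoted ESD analysis concrete is to compare a layer's empirical spectrum against the Marchenko-Pastur bulk expected for a purely random matrix of the same shape: spectral mass escaping the bulk edge is one signature of the correlation (and implicit self-regularization) that HT-SR attributes to training. The sketch below is an illustrative toy under stated assumptions, not the papers' pipeline; it uses a unit-variance Gaussian baseline and a hand-planted rank-one "spike" in place of trained weight matrices.

```python
# Hedged sketch: fraction of a layer's ESD lying beyond the Marchenko-Pastur
# (MP) bulk edge expected for a random matrix of the same shape.
import numpy as np

def mp_bulk_edge(n_rows, n_cols, sigma=1.0):
    """Upper MP edge for X = W^T W / n_rows, with W of shape (n_rows, n_cols)."""
    q = n_rows / n_cols
    return sigma**2 * (1.0 + 1.0 / np.sqrt(q))**2

def fraction_outside_bulk(weight, sigma=1.0):
    n_rows, n_cols = weight.shape
    eigs = np.linalg.eigvalsh(weight.T @ weight / n_rows)
    return float(np.mean(eigs > mp_bulk_edge(n_rows, n_cols, sigma)))

rng = np.random.default_rng(1)
w_random = rng.standard_normal((1000, 500))               # random-like layer
w_spiked = w_random + 5.0 * np.outer(rng.standard_normal(1000),
                                     rng.standard_normal(500)) / np.sqrt(500)
print("random :", fraction_outside_bulk(w_random))        # ~0: all mass in the MP bulk
print("spiked :", fraction_outside_bulk(w_spiked))        # > 0: correlation escapes the bulk
```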
“…4 and 5) imposes intriguing characteristics on ML-generated disordered structures: "scale-free" properties on waves. Scale-free properties, which represent the power-law probabilistic distribution with heavy-tailed statistics, have been one of the most influential concepts in network science 2,52, data science 50,51, and random matrix theory 53,54. In addition to its ubiquitous nature in biological, social, and technological systems 2, the most important impact of the scale-free property is the emergence of core nodes, also known as "hubs", which possess a very large number of links or interactions, thereby governing signal transport inside the system 2,42,52.…”
Section: Results
confidence: 99%
“…Because the ML-generated lattice deformation is strongly related to the weights of the output neurons in the L2D CNN, the apparent stochastic difference between normal-random seed structures and scale-free L2D CNN outputs raises an interesting open question: the training process of deep NNs could inherently possess the scale-free property. Recently, in random matrix theory, it was demonstrated that the correlations in the weight matrices of well-trained deep NNs can be fit to a power law with a heavy-tailed distribution 53,54. This theory enables a successful analogy between NN structures and ML-generated real-space wave structures in our result: the identification of the "heavy-tailed perturbation distribution" of atomic sites using the "heavy-tailed weight distribution" of CNN neurons.…”
Section: Discussion
confidence: 99%