2019
DOI: 10.1088/1742-5468/ab3430

Entropy and mutual information in models of deep neural networks*

Abstract: We examine a class of stochastic deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are three-fold: (i) we show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that weight matrices are independent and orthogonally invariant; (ii) we extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layer networks with Gaussian random weights, using the recently introduced adaptive interpolation method […]
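
For orientation, the following is a minimal sketch of the kind of stochastic feed-forward model the abstract refers to: each layer applies random weights, a pointwise non-linearity, and a noisy channel. The widths, the tanh activation, and the noise level are illustrative choices, not the paper's settings, and the entropy/mutual-information computations themselves (the paper's actual contribution) are not shown here.

```python
import numpy as np

# Sketch of a stochastic two-layer model: random weights, pointwise
# non-linearity, additive Gaussian noise at each layer. All sizes and
# parameters below are arbitrary illustrative choices.
rng = np.random.default_rng(0)

n0, n1, n2 = 500, 400, 300   # layer widths
sigma = 0.1                  # std of the per-layer Gaussian noise

x = rng.standard_normal(n0)                       # input t0 ~ N(0, I)
W1 = rng.standard_normal((n1, n0)) / np.sqrt(n0)  # i.i.d. Gaussian weights
W2 = rng.standard_normal((n2, n1)) / np.sqrt(n1)

t1 = np.tanh(W1 @ x) + sigma * rng.standard_normal(n1)   # layer 1 output
t2 = np.tanh(W2 @ t1) + sigma * rng.standard_normal(n2)  # layer 2 output
```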


Cited by 102 publications (111 citation statements); references 42 publications.
“…It is useful to compare the predicted MSE with the predicted optimal values. The works [47], [48] postulate the optimal MSE for inference in deep networks under the LSL model described above using the replica method from statistical physics. Interestingly, it is shown in [47, Thm. 2] that the predicted minimum MSE satisfies equations that exactly agree with the fixed points of the updates (39).…”
Section: MMSE Estimation and Connections to the Replica Predictions
mentioning, confidence: 99%
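
The agreement between a replica-predicted MMSE and the fixed points of an iterative update can be illustrated in the simplest possible setting. Below is a minimal sketch assuming a toy linear-Gaussian model y = Ax + noise with a standard Gaussian prior, i.i.d. Gaussian A at measurement ratio alpha, and noise variance sigma2; the scalar map is the textbook state evolution for that toy model, a stand-in for the multi-layer update (39) cited above, which is not reproduced here.

```python
import numpy as np

# Toy state evolution for y = A x + noise, x_i ~ N(0, 1), A i.i.d. Gaussian
# with m/n = alpha, noise variance sigma2. For a Gaussian prior the scalar
# mmse at effective SNR s is 1 / (1 + s), so the replica-predicted MMSE
# solves  E = 1 / (1 + alpha / (sigma2 + E)).
alpha, sigma2 = 0.5, 0.01

E = 1.0  # initial MSE = prior variance
for _ in range(200):
    s = alpha / (sigma2 + E)   # effective scalar SNR
    E = 1.0 / (1.0 + s)        # Gaussian-prior mmse at SNR s

# The iteration converges; plugging the limit back in verifies that it
# satisfies the replica fixed-point equation.
s = alpha / (sigma2 + E)
assert abs(E - 1.0 / (1.0 + s)) < 1e-10
print(f"state-evolution fixed point = replica MMSE: {E:.6f}")
```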
“…So far these include matrix and tensor factorisation [24], estimation in traditional and generalised linear models [25] (e.g., compressed sensing and many of its non-linear variants), with random i.i.d. as well as specially structured measurement matrices [26], learning problems in the teacher-student setting [25, 27] (e.g., the single-layer perceptron network), and even multi-layer versions [28]. For inference problems with an underlying sparse graphical structure, full proofs of replica formulas are scarce and much more involved, e.g., [11, 17].…”
mentioning, confidence: 99%
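
A minimal sketch of the teacher-student setting mentioned above, assuming a single-layer sign-activation teacher; the prior, the activation, and the problem sizes are illustrative choices, not those of the cited works.

```python
import numpy as np

# Teacher-student setup for a generalised linear model (GLM): a teacher
# with hidden weights w_star generates labels, and the student must infer
# w_star from the (X, y) pairs alone.
rng = np.random.default_rng(1)
n, m = 200, 400                               # dimension, number of samples

w_star = rng.standard_normal(n)               # teacher weights (the signal)
X = rng.standard_normal((m, n)) / np.sqrt(n)  # i.i.d. Gaussian inputs
y = np.sign(X @ w_star)                       # single-layer perceptron teacher

# In the high-dimensional limit n, m -> inf with m/n fixed, replica formulas
# characterise the mutual information between w_star and y given X, and the
# Bayes-optimal error achievable by the student.
```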
“…In [30], several of the claims from [29] are examined and challenged. In [31], useful methods for the computation of information-theoretic quantities are proposed for several deep neural network models.…”
Section: Related Work
mentioning, confidence: 99%