Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as is relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or "free entropy"), from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies in the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of the performance of the generalized approximate message-passing (GAMP) algorithm. Furthermore, we tightly characterize, for many learning problems, the regions of parameters for which this algorithm achieves optimal performance, and we locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.
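For a Gaussian output channel, the GAMP algorithm mentioned above reduces to the standard AMP iteration. The following is a minimal, hedged NumPy sketch of that special case for sparse linear regression; the soft threshold `theta`, the problem sizes, and the iteration count are illustrative assumptions rather than the paper's tuned choices (in practice the threshold would follow from state evolution).

```python
import numpy as np

def soft_threshold(u, theta):
    """Soft-thresholding denoiser, a standard nonlinearity for sparse priors."""
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

def amp_sparse_linear(y, A, theta=0.1, n_iter=30):
    """Hedged AMP sketch for y = A x + noise with a sparse signal x.
    `theta` and `n_iter` are illustrative, untuned parameters."""
    m, n = A.shape
    delta = m / n                    # measurement rate alpha = m/n
    x = np.zeros(n)
    z = y.copy()
    for _ in range(n_iter):
        # Effective AWGN observation of x (per state evolution)
        r = x + A.T @ z
        x_new = soft_threshold(r, theta)
        # Onsager correction: (1/delta) * average derivative of the denoiser,
        # i.e. the fraction of coordinates above threshold
        onsager = (1.0 / delta) * np.mean(np.abs(x_new) > 0)
        z = y - A @ x_new + onsager * z
        x = x_new
    return x

# Toy usage: sparse recovery from random Gaussian projections
rng = np.random.default_rng(0)
n, m, k = 500, 250, 25
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
y = A @ x_true + 0.01 * rng.normal(size=m)
x_hat = amp_sparse_linear(y, A)
print("relative MSE:", np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2))
```

The Onsager correction term is what distinguishes AMP from naive iterative thresholding, and it is what makes the state-evolution characterization of the algorithm's error exact in the high-dimensional limit.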
In recent years, important progress has been made towards proving the validity of the replica predictions for the (asymptotic) mutual information (or "free energy") in Bayesian inference problems. The proof techniques that have emerged appear to be quite general, even though they have been worked out on a case-by-case basis. Unfortunately, a common point between all these schemes is their relatively high level of technicality. We present a new proof scheme that is considerably more straightforward than the previous ones. We call it the adaptive interpolation method because it can be seen as an extension of the interpolation method developed by Guerra and Toninelli in the context of spin glasses, with an interpolation path that is adaptive. To illustrate our method, we show how to prove the replica formula for three non-trivial inference problems. The first one is symmetric rank-one matrix estimation (or factorisation), which is the simplest problem considered here and the one for which the method is presented in full detail. We then generalize to symmetric tensor estimation and random linear estimation. We believe that the present method has a much wider range of applicability and also sheds new light on the reasons for the validity of replica formulas in Bayesian inference.
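For concreteness, in the rank-one case the replica formula established by this method can be written as a single-letter variational problem. The following is a hedged transcription in our own notation (the paper's conventions may differ): for the spiked model Y = √(λ/n) xxᵀ + Z with i.i.d. entries x_i ~ P_0, ρ = E[X²], and Gaussian noise Z,

```latex
\lim_{n\to\infty} \frac{1}{n}\, I(\mathbf{X};\mathbf{Y})
  \;=\; \min_{q\in[0,\rho]} \left\{ \frac{\lambda}{4}\,(\rho-q)^2
        \;+\; I\!\left(X_0;\ \sqrt{\lambda q}\,X_0+Z_0\right) \right\},
\qquad X_0\sim P_0,\ \ Z_0\sim\mathcal{N}(0,1).
```

The second term is the mutual information of an effective scalar Gaussian channel, and the minimizer q* is the asymptotic overlap between the signal and the posterior mean; the adaptive interpolation argument matches upper and lower bounds on the free entropy against this scalar potential.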
Continuum percolation models in which pairs of points of a two-dimensional Poisson point process are connected if they are within some prescribed distance of each other have been extensively studied. This paper considers a variation in which a connection between two points depends not only on their Euclidean distance, but also on the positions of all other points of the point process. This model was recently proposed to model interference in radio communication networks. Our main result shows that, despite the infinite-range dependencies, percolation occurs in the model when the density λ of the Poisson point process is greater than the critical density λ_c of the independent model, provided that interference from other nodes can be sufficiently reduced (without vanishing).
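The "independent model" against which the result is stated is the classical Gilbert disk graph, whose critical density λ_c can be probed numerically. Below is a hedged Monte-Carlo sketch of that baseline (not of the interference model itself, whose connections depend on all points); the box size, connection radius r = 1, and seed are illustrative, and λ_c ≈ 1.44 for r = 1 is a known numerical estimate.

```python
import numpy as np
from scipy.spatial import cKDTree

def largest_cluster_fraction(lam, box=50.0, r=1.0, seed=0):
    """Fraction of points in the largest cluster of the independent
    (Gilbert) disk model: Poisson points of density lam in a box,
    connected when within distance r. Illustrative finite-size probe."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(lam * box * box)
    pts = rng.uniform(0, box, size=(n, 2))
    # Union-find (with path halving) over all pairs within range r
    parent = np.arange(n)
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    tree = cKDTree(pts)
    for i, j in tree.query_pairs(r):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    roots = np.array([find(i) for i in range(n)])
    _, counts = np.unique(roots, return_counts=True)
    return counts.max() / n if n else 0.0

# The fraction jumps near the known numerical estimate lambda_c ~ 1.44 for r = 1
for lam in [0.5, 1.0, 1.4, 1.8, 2.5]:
    print(lam, largest_cluster_fraction(lam))
```

In the interference model, a connection additionally requires the signal-to-interference ratio at both endpoints to exceed a threshold, which couples every edge to all other points; the paper shows percolation survives as long as the interference term is damped by a small enough, but nonzero, factor.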
We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections, a problem relevant in compressed sensing, sparse superposition codes, or code division multiple access, to cite only a few applications. There have been a number of works considering the mutual information for this problem using the heuristic replica method from statistical physics. Here we put these considerations on a firm rigorous basis. First, we show, using a Guerra-type interpolation, that the replica formula yields an upper bound on the exact mutual information. Second, for many relevant practical cases, we present a converse lower bound via a method that uses spatial coupling, state evolution analysis, and the I-MMSE theorem. This yields, in particular, a single-letter formula for the mutual information and the minimum mean-square error of random Gaussian linear estimation for all discrete bounded signals.

Random linear projections and random matrices are ubiquitous in computer science, playing an important role in machine learning [1], statistics [2], and communication [3]. In particular, the task of estimating a signal from its linear random projections has a myriad of applications, such as compressed sensing (CS) [4], code division multiple access (CDMA) in communication [5], error correction via sparse superposition codes [6], and Boolean group testing [7]. It is thus natural to ask what the information-theoretic limits are for the estimation of a signal from the knowledge of a few of its (noisy) random linear projections.

A particularly influential approach to this question has been the heuristic replica method of statistical physics [8], which allows one to compute, nonrigorously, the mutual information (MI) and the associated theoretically achievable minimum mean-square error (MMSE). The replica method typically predicts the optimal performance through the solution of non-linear equations, which interestingly coincide in many cases with the predictions for the performance of a message-passing, belief-propagation-type algorithm. In this context the algorithm is usually called approximate message-passing (AMP) [9][10][11].

In this contribution we prove rigorously that the replica formula for the MI is asymptotically exact for discrete bounded prior distributions of the signal, in the case of random Gaussian linear projections. In particular, our results put the Tanaka formula for CDMA [12] on a firm rigorous basis, and they allow us to rigorously obtain the Bayesian "measurement" MMSE in CS. In addition, our analysis strongly suggests that AMP reaches the MMSE in polynomial time for a large class of such problems, except in a region called the hard phase. In the hard phase the MMSE can be reached only through the use of a technique called spatial coupling (SC) [10,11,13], originally developed in the context of communication as a practical code construction that allows one to reach the Shannon capacity [14]. Finally, we stress that our proof technique has an interest of its own, as it is probably transposable…
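The non-linear equations mentioned above take the form of a scalar state-evolution fixed point. Here is a small hedged sketch of that recursion under specific assumptions (measurement matrix entries of variance 1/n, a Rademacher ±1 signal, measurement rate α = m/n); `alpha`, `sigma2`, the iteration count, and the Monte-Carlo sample size are illustrative choices.

```python
import numpy as np

def mmse_rademacher(gamma, n_mc=100_000, seed=0):
    """Scalar MMSE for x = +/-1 observed through y = sqrt(gamma)*x + z, z ~ N(0,1).
    Uses the identity mmse(gamma) = 1 - E[tanh(gamma + sqrt(gamma) Z)],
    which follows from the Nishimori property in the Bayes-optimal setting."""
    z = np.random.default_rng(seed).normal(size=n_mc)
    return 1.0 - np.mean(np.tanh(gamma + np.sqrt(gamma) * z))

def state_evolution(alpha, sigma2, n_iter=50):
    """State-evolution recursion for y = A x + noise with A i.i.d. N(0, 1/n),
    Rademacher signal, alpha = m/n measurements per unknown.
    E tracks the per-coordinate mean-square error across iterations."""
    E = 1.0  # start from the prior variance (uninformative initialization)
    for _ in range(n_iter):
        gamma = alpha / (sigma2 + E)   # effective scalar signal-to-noise ratio
        E = mmse_rademacher(gamma)
    return E

# Fixed-point MSE at a few measurement rates (illustrative noise level)
for alpha in [0.5, 1.0, 1.5]:
    print(alpha, state_evolution(alpha, sigma2=0.01))
```

Iterating from the uninformative point E = 1 tracks AMP's performance; in the hard phase the MMSE is instead the global minimizer of the associated replica potential, which spatial coupling allows one to reach.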
We examine a class of stochastic deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are threefold: (i) we show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that the weight matrices are independent and orthogonally invariant; (ii) we extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layer networks with Gaussian random weights, using the recently introduced adaptive interpolation method; (iii) we propose an experimental framework with generative models of synthetic datasets, on which we train deep neural networks with a weight constraint designed so that the assumption in (i) is verified during learning. We study the behavior of entropies and mutual informations throughout learning and conclude that, in the proposed setting, the relationship between compression and generalization remains elusive.
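As a hedged illustration of why such quantities are tractable in this stochastic setting, the sketch below estimates I(X; Y) for a single stochastic layer Y = f(X) + σZ with additive Gaussian noise, where H(Y|X) is exact in closed form and H(Y) is approximated by a naive plug-in mixture estimate. The two-layer network, the sizes, and σ are illustrative assumptions; the paper's actual formulas come from the replica/adaptive-interpolation analysis, not from this Monte-Carlo estimator.

```python
import numpy as np
from scipy.special import logsumexp

def mi_stochastic_layer(f, X, sigma, seed=0):
    """Plug-in Monte-Carlo estimate of I(X; Y) for Y = f(X) + sigma * Z.
    H(Y|X) is exact for additive Gaussian noise; H(Y) uses a crude
    same-sample Gaussian-mixture estimate of the marginal (illustrative)."""
    rng = np.random.default_rng(seed)
    FX = f(X)                                  # pre-noise activations, shape (n, d)
    n, d = FX.shape
    Y = FX + sigma * rng.normal(size=FX.shape)
    # Squared distances between every noisy output and every mixture center
    sq = (Y**2).sum(1)[:, None] + (FX**2).sum(1)[None, :] - 2.0 * Y @ FX.T
    log_p = (logsumexp(-sq / (2.0 * sigma**2), axis=1) - np.log(n)
             - 0.5 * d * np.log(2.0 * np.pi * sigma**2))
    H_Y = -log_p.mean()                                            # estimated
    H_Y_given_X = 0.5 * d * np.log(2.0 * np.pi * np.e * sigma**2)  # exact
    return H_Y - H_Y_given_X

# Toy two-layer network with Gaussian random weights (illustrative sizes)
rng = np.random.default_rng(1)
n, d_in, d_h, d_out = 500, 20, 50, 5
W1 = rng.normal(size=(d_in, d_h)) / np.sqrt(d_in)
W2 = rng.normal(size=(d_h, d_out)) / np.sqrt(d_h)
f = lambda X: np.tanh(np.maximum(X @ W1, 0.0) @ W2)
X = rng.normal(size=(n, d_in))
print("I(X;Y) estimate (nats):", mi_stochastic_layer(f, X, sigma=0.5))
```

The same-sample plug-in estimate of H(Y) is biased (each point sees its own mixture center), so this is only a qualitative probe; the replica formulas of the paper avoid sampling altogether.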