Nearly all statistical inference methods were developed for the regime where the number N of data samples is much larger than the data dimension p. Inference protocols such as maximum likelihood (ML) or maximum a posteriori probability (MAP) are unreliable if p = O(N), due to overfitting. For many disciplines with increasingly high-dimensional data, this limitation has become a serious bottleneck. We recently showed that in Cox regression for time-to-event data the overfitting errors are not just noise but mostly take the form of a bias, and that the replica method from statistical physics can be used to model and predict both this bias and the noise statistics. Here we extend our approach to arbitrary generalized linear regression models (GLM), with possibly correlated covariates. First, we analyse overfitting in ML/MAP inference without having to specify data types or regression models, relying only on the GLM form, and derive generic order parameter equations for the case of L2 priors. Second, we derive the probabilistic relationship between true and inferred regression coefficients in GLMs, and show that, for the relevant hyperparameter scaling and correlated covariates, the L2 regularization causes a predictable direction change of the coefficient vector. Our results, illustrated by application to linear, logistic, and Cox regression, enable one to correct ML and MAP inferences in GLMs systematically for overfitting bias, and thus extend their applicability into the hitherto forbidden regime p = O(N).
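As a small illustration of how overfitting shows up as a bias rather than as noise alone, consider the simplest GLM, Gaussian linear regression. The following is a standard textbook result, not the replica derivation of the paper; the symbols X, β, ε, σ² and the ratio ζ = p/N are notation assumed here for the sketch, and X is assumed to have full column rank with p < N.

```latex
% Model: y = X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_N), \quad X \in \mathbb{R}^{N \times p}
\hat{\beta}_{\mathrm{ML}} = (X^{\top} X)^{-1} X^{\top} y ,
\qquad
\hat{\sigma}^2_{\mathrm{ML}} = \frac{1}{N} \bigl\| y - X \hat{\beta}_{\mathrm{ML}} \bigr\|^2 .

% The residual sum of squares is \sigma^2 times a \chi^2 variable with N - p degrees of freedom, hence
\mathbb{E}\bigl[ \hat{\sigma}^2_{\mathrm{ML}} \bigr]
  = \sigma^2 \, \frac{N - p}{N}
  = \sigma^2 \, (1 - \zeta) ,
\qquad \zeta = \frac{p}{N} .
```

For p fixed and N growing, ζ tends to zero and the bias disappears. For p = O(N), ζ stays finite and the ML noise estimate is systematically too small by the factor 1 - ζ, which is a deterministic distortion that can be corrected once ζ is known.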
We introduce and analyse ensembles of 2-regular random graphs with a tuneable distribution of short cycles. The phenomenology of these graphs depends critically on the scaling of the ensembles' control parameters relative to the number of nodes. A phase diagram is presented, showing a second-order phase transition from a connected to a disconnected phase. We study both the canonical formulation, where the size is large but fixed, and the grand canonical formulation, where the size is sampled from a discrete distribution, and show their equivalence in the thermodynamic limit. We also compute analytically the spectral density, which consists of a discrete set of isolated eigenvalues, representing short cycles, and a continuous part, representing cycles of diverging size.
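The split of the spectrum into isolated eigenvalues and a continuous part can be made concrete with a minimal sketch. It assumes only the elementary facts that a 2-regular graph is a disjoint union of cycles and that a cycle of n nodes has adjacency eigenvalues 2cos(2πk/n); the function names are hypothetical, and this is not the ensemble calculation of the paper.

```python
import math


def cycle_spectrum(n):
    """Adjacency eigenvalues of a single cycle of n >= 3 nodes: 2*cos(2*pi*k/n)."""
    return sorted(2.0 * math.cos(2.0 * math.pi * k / n) for k in range(n))


def two_regular_spectrum(cycle_lengths):
    """Spectrum of a 2-regular graph, given as the lengths of its disjoint cycles."""
    eigenvalues = []
    for n in cycle_lengths:
        eigenvalues.extend(cycle_spectrum(n))
    return sorted(eigenvalues)


def check_eigenpair(n, k, tol=1e-9):
    """Check directly that v_j = cos(2*pi*k*j/n) satisfies A v = lambda v on the cycle."""
    lam = 2.0 * math.cos(2.0 * math.pi * k / n)
    v = [math.cos(2.0 * math.pi * k * j / n) for j in range(n)]
    # On a cycle, (A v)_j = v_{j-1} + v_{j+1}, with indices taken modulo n.
    return all(
        abs(v[(j - 1) % n] + v[(j + 1) % n] - lam * v[j]) < tol for j in range(n)
    )
```

For instance, `cycle_spectrum(3)` returns values equal to -1, -1 and 2 up to rounding, so every triangle contributes the same few isolated eigenvalues. As n grows, the values 2cos(2πk/n) fill the interval [-2, 2] ever more densely, which is the origin of the continuous part.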
We analyze maximum entropy random graph ensembles with constrained degrees, drawn from arbitrary degree distributions, and a tuneable number of three-cycles (triangles). We find that such ensembles generally exhibit two transitions, a clustering and a shattering transition, separating three distinct regimes. At the clustering transition, the graphs change from typically having only isolated cycles to forming cycle clusters. At the shattering transition the graphs break up into many small cliques to achieve the desired three-cycle density. The locations of both transitions depend nontrivially on the system size. We derive a general formula for the three-cycle density in the regime of isolated cycles, for graphs with degree distributions that have finite first and second moments. For bounded degree distributions we present further analytical results on cycle densities and phase transition locations, which, while non-rigorous, are all validated via MCMC sampling simulations. We show that the shattering transition is of an entropic nature, occurring for all three-cycle density values, provided the system is large enough.
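To make the link between three-cycle counts and small cliques concrete, here is a minimal sketch that counts triangles in an undirected graph and compares two 3-regular graphs on 8 nodes. The helper names and the two example graphs are assumptions chosen for illustration; the sketch does not sample from the maximum entropy ensembles studied in the paper.

```python
from itertools import combinations


def count_triangles(edges):
    """Count three-cycles in a simple undirected graph given as a list of node pairs."""
    # Drop self-loops and duplicate edges so each undirected edge appears once.
    edge_set = {frozenset((u, v)) for u, v in edges if u != v}
    adj = {}
    for u, v in edge_set:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each common neighbour of the endpoints of an edge closes one triangle on it.
    closed = sum(len(adj[u] & adj[v]) for u, v in edge_set)
    return closed // 3  # every triangle is seen once from each of its three edges


def complete_graph(nodes):
    """All pairs of the given nodes, i.e. a clique."""
    return list(combinations(nodes, 2))


def cube_graph():
    """The 3-regular cube graph on nodes 0..7: edges join labels differing in one bit."""
    return [(u, u ^ (1 << b)) for u in range(8) for b in range(3) if u < u ^ (1 << b)]


two_cliques = complete_graph(range(4)) + complete_graph(range(4, 8))
```

Both `two_cliques` and `cube_graph()` give every node degree 3, but `count_triangles` returns 8 for the two disjoint 4-cliques and 0 for the cube, which is bipartite. At fixed degrees, a high triangle density is therefore most easily reached by graphs that have fallen apart into small cliques.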