Fabián Aguirre López scite author profile

Nearly all statistical inference methods were developed for the regime where the number N of data samples is much larger than the data dimension p. Inference protocols such as maximum likelihood (ML) or maximum a posteriori probability (MAP) are unreliable if p = O(N), due to overfitting. This limitation has for many disciplines with increasingly high-dimensional data become a serious bottleneck. We recently showed that in Cox regression for time-to-event data the overfitting errors are not just noise but take mostly the form of a bias, and how with the replica method from statistical physics one can model and predict this bias and the noise statistics. Here we extend our approach to arbitrary generalized linear regression models (GLM), with possibly correlated covariates. We analyse overfitting in ML/MAP inference without having to specify data types or regression models, relying only on the GLM form, and derive generic order parameter equations for the case of L2 priors. Second, we derive the probabilistic relationship between true and inferred regression coefficients in GLMs, and show that, for the relevant hyperparameter scaling and correlated covariates, the L2 regularization causes a predictable direction change of the coefficient vector. Our results, illustrated by application to linear, logistic, and Cox regression, enable one to correct ML and MAP inferences in GLMs systematically for overfitting bias, and thus extend their applicability into the hitherto forbidden regime p= O(N).

show abstract

Exactly solvable random graph ensemble with extensively many short cycles

López

Barucca

Fekom

et al. 2018

J. Phys. A: Math. Theor.

View full text Add to dashboard Cite

Abstract. We introduce and analyse ensembles of 2-regular random graphs with a tuneable distribution of short cycles. The phenomenology of these graphs depends critically on the scaling of the ensembles' control parameters relative to the number of nodes. A phase diagram is presented, showing a second order phase transition from a connected to a disconnected phase. We study both the canonical formulation, where the size is large but fixed, and the grand canonical formulation, where the size is sampled from a discrete distribution, and show their equivalence in the thermodynamical limit. We also compute analytically the spectral density, which consists of a discrete set of isolated eigenvalues, representing short cycles, and a continuous part, representing cycles of diverging size.

show abstract

Transitions in random graphs of fixed degrees with many short cycles

López

Coolen

2021

J. Phys. Complex.

View full text Add to dashboard Cite

We analyze maximum entropy random graph ensembles with constrained degrees, drawn from arbitrary degree distributions, and a tuneable number of three-cycles (triangles). We find that such ensembles generally exhibit two transitions, a clustering and a shattering transition, separating three distinct regimes. At the clustering transition, the graphs change from typically having only isolated cycles to forming cycle clusters. At the shattering transition the graphs break up into many small cliques to achieve the desired three-cycle density. The locations of both transitions depend nontrivially on the system size. We derive a general formula for the three-cycle density in the regime of isolated cycles, for graphs with degree distributions that have finite first and second moments. For bounded degree distributions we present further analytical results on cycle densities and phase transition locations, which, while non-rigorous, are all validated via MCMC sampling simulations. We show that the shattering transition is of an entropic nature, occurring for all three-cycle density values, provided the system is large enough.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.