Abstract. Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox's proportional hazards model (the main tool of medical statisticians), one finds in the literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
Rare event statistics for random walks on complex networks are investigated using the large deviations formalism. Within this formalism, rare events are realized as typical events in a suitably deformed path-ensemble, and their statistics can be studied in terms of spectral properties of a deformed Markov transition matrix. We observe two different types of phase transition in such systems: (i) rare events which are singled out for sufficiently large values of the deformation parameter may correspond to localized modes of the deformed transition matrix; (ii) "mode-switching transitions" may occur as the deformation parameter is varied. Details depend on the nature of the observable for which the rare event statistics is studied, as well as on the underlying graph ensemble. In the present letter we report on the statistics of the average degree of the nodes visited along a random walk trajectory in Erdős-Rényi networks. Large deviations rate functions and localization properties are studied numerically. For observables of the type considered here, we also derive an analytical approximation for the Legendre transform of the large-deviations rate function, which is valid in the large connectivity limit. It is found to agree well with simulations.
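As a rough illustration of the deformed-matrix construction described above, the sketch below assumes an unbiased random walk with transition probabilities W_ij = A_ij / k_j, a small Erdős-Rényi graph restricted to its largest connected component, and the degree of the node arrived at as the per-step observable. The function names, the graph size, and the mean connectivity are illustrative choices and are not taken from the letter. Under these assumptions the scaled cumulant generating function is the logarithm of the largest eigenvalue of the deformed matrix, and the rate function follows from it by Legendre transform.

```python
import numpy as np

def er_giant_component(n, c, rng):
    """Adjacency matrix of the largest component of an Erdős-Rényi graph G(n, c/n)."""
    upper = np.triu(rng.random((n, n)) < c / n, k=1)
    adj = (upper | upper.T).astype(float)
    unseen = set(range(n))
    best = []
    while unseen:
        start = unseen.pop()
        comp, stack = [start], [start]
        while stack:  # depth-first search over one component
            v = stack.pop()
            for u in np.flatnonzero(adj[v]):
                u = int(u)
                if u in unseen:
                    unseen.remove(u)
                    comp.append(u)
                    stack.append(u)
        if len(comp) > len(best):
            best = comp
    idx = np.array(sorted(best))
    return adj[np.ix_(idx, idx)]

def scgf(adj, s):
    """log of the largest eigenvalue of the deformed transition matrix.

    Assumed form: W_s[i, j] = exp(s * k_i) * A[i, j] / k_j, where s is the
    deformation parameter and k_i is the degree of the node arrived at.
    """
    k = adj.sum(axis=0)
    W = adj / k                          # column j divided by k_j (column-stochastic)
    W_s = np.exp(s * k)[:, None] * W     # row i reweighted by exp(s * k_i)
    lam = np.max(np.linalg.eigvals(W_s).real)
    return np.log(lam)

rng = np.random.default_rng(0)
adj = er_giant_component(60, 4.0, rng)   # hypothetical size and connectivity
```

At s = 0 the matrix is stochastic, so `scgf(adj, 0.0)` vanishes up to rounding, and the slope at s = 0 equals the stationary average of the visited degree, sum(k^2) / sum(k). For large graphs a sparse eigensolver would be the more natural choice; dense `eigvals` is used here only to keep the sketch short.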
We investigate the credit risk model defined in [1] under more general assumptions, in particular using a general degree distribution for sparse graphs. Expanding upon earlier results, we show that the model is exactly solvable in the N → ∞ limit and demonstrate that the exact solution is described by the message-passing approach outlined in Karrer and Newman [2], generalized to include heterogeneous agents and couplings. We provide comparisons with simulations of graph ensembles with power-law degree distributions.
We study the large deviations of the magnetization at some finite time in the Curie-Weiss random field Ising model with parallel updating. While relaxation dynamics in an infinite-time horizon gives rise to unique dynamical trajectories [specified by initial conditions and governed by first-order dynamics of the form m_{t+1} = f(m_t)], we observe that the introduction of a finite-time horizon and the specification of terminal conditions can generate a host of metastable solutions obeying second-order dynamics. We show that these solutions are governed by a Newtonian-like dynamics in discrete time which permits solutions in terms of both the first-order relaxation ("forward") dynamics and the backward dynamics m_{t+1} = f^{-1}(m_t). Our approach allows us to classify trajectories for a given final magnetization as stable or metastable according to the value of the rate function associated with them. We find that in analogy to the Freidlin-Wentzell description of the stochastic dynamics of escape from metastable states, the dominant trajectories may switch between the two types (forward and backward) of first-order dynamics. Additionally, we show how to compute rate functions when uncertainty in the quenched disorder is introduced.
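The two first-order dynamics mentioned above can be illustrated with a minimal sketch. The abstract does not specify the map f, so the code assumes a common mean-field form for bimodal random fields ±h0 with equal probability, f(m) = ½[tanh(β(Jm + h0)) + tanh(β(Jm − h0))]. The parameter values and function names are hypothetical. Since this assumed f is strictly increasing, the backward step can be obtained by bisection.

```python
import math

# Hypothetical parameters: inverse temperature, coupling, random-field strength.
BETA, J, H0 = 2.0, 1.0, 0.3

def f(m):
    """Assumed forward map m_{t+1} = f(m_t) for fields +H0 / -H0 with probability 1/2."""
    a = BETA * J * m
    b = BETA * H0
    return 0.5 * (math.tanh(a + b) + math.tanh(a - b))

def f_inverse(y, lo=-1.0, hi=1.0, tol=1e-12):
    """Backward map m_{t+1} = f^{-1}(m_t), found by bisection on [lo, hi].

    Relies on f being strictly increasing; only values inside the range
    [f(lo), f(hi)] have a preimage.
    """
    if not f(lo) <= y <= f(hi):
        raise ValueError("y lies outside the range of f on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def forward_trajectory(m0, steps):
    """Iterate the forward (relaxation) dynamics starting from m0."""
    traj = [m0]
    for _ in range(steps):
        traj.append(f(traj[-1]))
    return traj
```

With these parameters the slope of f at m = 0 exceeds 1, so a forward trajectory started at a small positive magnetization relaxes to a nonzero fixed point, while a backward step followed by a forward step returns the starting value. The sketch does not evaluate the rate function that ranks trajectories in the paper.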