Abstract. Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox's proportional hazards model (the main tool of medical statisticians), one finds in the literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
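For readers unfamiliar with the Cox model mentioned above, the following is a minimal sketch of the objective that maximum likelihood regression minimizes in that setting: the negative Cox partial log-likelihood. It is a textbook form written under the assumption of no tied event times, not the paper's own code; the function name and the argument names `beta`, `X`, `times`, and `events` are hypothetical choices made for illustration.

```python
import numpy as np

def cox_neg_log_partial_likelihood(beta, X, times, events):
    """Negative Cox partial log-likelihood (assumes no tied event times).

    beta   : (p,) regression coefficients
    X      : (N, p) covariate matrix
    times  : (N,) observed event or censoring times
    events : (N,) True where the event was observed, False where censored
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)

    # Sort by decreasing time, so the risk set of sample i
    # (everyone with time >= times[i]) is the prefix 0..i.
    order = np.argsort(-times)
    X, events = X[order], events[order]

    eta = X @ beta                            # linear predictors
    log_risk = np.logaddexp.accumulate(eta)   # log of sum over risk set of exp(eta)
    # Only observed events contribute a term to the partial likelihood.
    return -np.sum((eta - log_risk)[events])
```

The overfitting regime discussed in the abstract corresponds to the number of covariates `p` (columns of `X`) being non-negligible relative to the number of samples `N` (rows of `X`) when this objective is minimized over `beta`.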
We investigate an XY spin-glass model in which both spins and interactions (or couplings) evolve in time, but with widely separated time-scales. For large times this model can be solved using replica theory, requiring two levels of replicas, one level for the spins and one for the couplings. We define the relevant order parameters, and derive a phase diagram in the replica-symmetric approximation, which exhibits two distinct spin-glass phases. The first phase is characterized by freezing of the spins only, whereas in the second phase both spins and couplings are frozen. A detailed stability analysis leads also to two distinct corresponding de Almeida-Thouless lines, each marking continuous replica-symmetry breaking. Numerical simulations support our theoretical study.
We have studied two specific models of frustrated and disordered coupled Kuramoto oscillators, all driven with the same natural frequency, in the presence of random external pinning fields. Our models are structurally similar, but differ in their degree of bond frustration and in their finite-size ground-state properties (one has random ferro- and anti-ferromagnetic interactions; the other has random chiral interactions). We have calculated the equilibrium properties of both models in the thermodynamic limit using the replica method, with emphasis on the role played by symmetries of the pinning field distribution, leading to explicit predictions for observables, transitions, and phase diagrams. In the absence of pinning fields our two models are found to behave identically, but pinning fields (provided they have appropriate statistical properties) break this symmetry. Simulation data lend satisfactory support to our theoretical predictions.
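As a rough illustration of the kind of system described above, the sketch below relaxes a set of oscillator phases under noiseless gradient dynamics. Everything specific here is an assumption made for illustration and is not taken from the paper: the energy function, the use of a frame co-rotating with the common natural frequency, a symmetric coupling matrix `J` with zero diagonal, a single pinning strength `h`, the pinning angles `phi`, and the simple Euler integrator. It covers only real-valued (ferro- and anti-ferromagnetic) couplings, not the chiral variant.

```python
import numpy as np

def energy(theta, J, h, phi):
    """Assumed energy: -(1/2) sum_ij J_ij cos(theta_i - theta_j)
                       - h sum_i cos(theta_i - phi_i)."""
    diff = theta[:, None] - theta[None, :]
    return -0.5 * np.sum(J * np.cos(diff)) - h * np.sum(np.cos(theta - phi))

def relax(theta, J, h, phi, dt=0.01, steps=2000):
    """Euler integration of d(theta_i)/dt = -dE/d(theta_i), for symmetric J."""
    theta = np.array(theta, dtype=float)
    for _ in range(steps):
        diff = theta[None, :] - theta[:, None]   # diff[i, j] = theta_j - theta_i
        force = np.sum(J * np.sin(diff), axis=1) + h * np.sin(phi - theta)
        theta = theta + dt * force
    return theta

# Hypothetical usage: Gaussian couplings and uniformly random pinning angles.
rng = np.random.default_rng(0)
N = 50
J = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
J = 0.5 * (J + J.T)            # symmetrize
np.fill_diagonal(J, 0.0)       # no self-coupling
phi = rng.uniform(0.0, 2.0 * np.pi, size=N)
theta0 = rng.uniform(0.0, 2.0 * np.pi, size=N)
theta_final = relax(theta0, J, h=0.5, phi=phi)
```

Because the dynamics are a gradient descent on the assumed energy, `energy(theta_final, J, 0.5, phi)` should not exceed `energy(theta0, J, 0.5, phi)` for a sufficiently small step `dt`. Studying equilibrium at finite temperature, as in the abstract, would additionally require a noise term.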
We study the coupled dynamics of primary and secondary structure formation (i.e. slow genetic sequence selection and fast folding) in the context of a solvable microscopic model that includes both short-range steric forces and long-range polarity-driven forces. Our solution is based on the diagonalization of replicated transfer matrices, and leads in the thermodynamic limit to explicit predictions regarding phase transitions and phase diagrams at genetic equilibrium. The predicted phenomenology allows for natural physical interpretations, and finds satisfactory support in numerical simulations.