2020
DOI: 10.1088/1751-8121/aba028

Replica analysis of overfitting in generalized linear regression models

Abstract: Nearly all statistical inference methods were developed for the regime where the number N of data samples is much larger than the data dimension p. Inference protocols such as maximum likelihood (ML) or maximum a posteriori probability (MAP) are unreliable if p = O(N), due to overfitting. This limitation has for many disciplines with increasingly high-dimensional data become a serious bottleneck. We recently showed that in Cox regression for time-to-event data the overfitting errors are not just noise but take…

Cited by 13 publications (34 citation statements)
References 93 publications (134 reference statements)
“…This may be due to the redundant information generated by the accumulation of features from multiple growth stages. In addition, the excessive dimensionality of input features also poses the risk of overfitting the machine learning model ( Feng et al, 2017 ; Coolen et al, 2020 ). Among the combinations of EMF, the prediction accuracy of C2 was comparable to a combination with the highest prediction accuracy of C4.…”
Section: Discussion (mentioning)
confidence: 99%
“…The platform supports the deployment of Interpretable Artificial Intelligence (IAI) and Bayesian inference methods for rapid and scalable risk stratification of prostate cancer. These algorithms will include novel findings around overfitting of data ( Coolen et al, 2017 ; Coolen et al, 2020 ) and latent class models ( Rowley et al, 2017 ) which will help us to stratify patients more correctly.…”
Section: Methods (mentioning)
confidence: 99%
“…To address these combined challenges of high data dimensionality, covariate disparity, and latent cohort heterogeneity, we build a data analytics pipeline (based on the libraries underlying the SaddlePoint-Signature and SaddlePoint-Mosaics software packages https://www.saddlepointscience.com/ ) which combine cross-validation protocols, optimisation tools for covariate selection, and modern mathematical techniques with which to “decontaminate” regression outcomes for the effects of overfitting [see e.g. ( Coolen et al, 2017 ; Sheikh and Coolen, 2019 ; Coolen et al, 2020 )], with the use of modality-specific “meta-covariates”. The latter are personalised and optimised modality-specific risk scores (decontaminated for overfitting), which are subsequently used as integrated digital biomarkers that capture the relevant predictive information in each of the data sources.…”
Section: Methods (mentioning)
confidence: 99%
“…The study by ACC Coolen titled "Replica analysis of overfitting in generalized linear regression models" presents a derivation that depends only on the generalized linear form of the GLM and on the choice of an L2 prior. The replica calculation therefore need not be repeated for each new GLM instance; as usual, the replica method serves as a relatively painless and elegant yet powerful vehicle for arriving at a closed set of order parameter equations, together with formulae expressing the relationship between the ML/MAP parameter estimators and the true values of these parameters [13].…”
Section: Introduction (unclassified)