Preventing dataset shift from breaking machine-learning biomarkers

Dockès, Jérôme; Varoquaux, Gaël; Poline, Jean-Baptiste

doi:10.1093/gigascience/giab055

Cited by 56 publications

(35 citation statements)

References 53 publications

(58 reference statements)

“…As the researcher may be unaware of the corresponding dataset bias, it can lead to important shortcomings of the study. Dataset bias occurs when the data used to build the decision model (the training data) has a different distribution than the data on which it should be applied [17] (the test data). To assess clinically-relevant predictions, the test data must match the actual target population, rather than be a random subset of the same data pool as the train data, the common practice in machine-learning studies.…”

Section: Data An Imperfect Window On the Clinic (mentioning)

confidence: 99%

Machine learning for medical imaging: methodological failures and recommendations for the future

2022

Self Cite

Research in computer analysis of medical images bears many promises to improve patients’ health. However, a number of systematic challenges are slowing down the progress of the field, from limitations of the data, such as biases, to research incentives, such as optimizing for publication. In this paper we review roadblocks to developing and assessing methods. Building our analysis on evidence from the literature and data challenges, we show that at every step, potential biases can creep in. On a positive note, we also discuss on-going efforts to counteract these problems. Finally we provide recommendations on how to further address these problems in the future.

“…A major concern in neuroimaging research is the effect of site on the generalizability of ML models (Dockes et al, 2021; Solanes et al, 2021). Sites may differ in terms of scanner infrastructure, acquisition protocols and neuroimaging feature extraction pipelines as well as sample composition.…”

Section: Discussion (mentioning)

confidence: 99%

Systematic Evaluation of Machine Learning Algorithms for Neuroanatomically-Based Age Prediction in Youth

Modabbernia, Whalley, Glahn, et al. 2021

Preprint

Application of machine learning algorithms to structural magnetic resonance imaging (sMRI) data has yielded behaviorally meaningful estimates of the biological age of the brain (brain-age). The choice of the machine learning approach in estimating brain-age in children and adolescents is important because age-related brain changes in these age-groups are dynamic. However, the comparative performance of the multiple machine learning algorithms available has not been systematically appraised. To address this gap, the present study evaluated the accuracy (Mean Absolute Error; MAE) and computational efficiency of 21 machine learning algorithms using sMRI data from 2,105 typically developing individuals aged 5 to 22 years from five cohorts. The trained models were then tested in an independent holdout dataset comprising 4,078 pre-adolescents (aged 9-10 years). The algorithms encompassed parametric and nonparametric, Bayesian, linear and nonlinear, tree-based, and kernel-based models. Sensitivity analyses were performed for parcellation scheme, number of neuroimaging input features, number of cross-validation folds, and sample size. The best performing algorithms were Extreme Gradient Boosting (MAE of 1.25 years for females and 1.57 years for males), Random Forest Regression (MAE of 1.23 years for females and 1.65 years for males) and Support Vector Regression with Radial Basis Function Kernel (MAE of 1.47 years for females and 1.72 years for males), which had acceptable and comparable computational efficiency. Findings of the present study could be used as a guide for optimizing methodology when quantifying age-related changes during development.

Highlights: Ensemble-based algorithms performed best in predicting brain age during development. Support vector regression offers optimal prediction accuracy and computational costs. A 400-parcel resolution provided the best accuracy and computational efficiency.

“…This usually requires collecting a big set of patient data (~millions of samples)—both clinical histories and biochemical measurements for the biomarker panel (Swan et al 2015), which, at present, is prohibitive to generating such models (Krassowski et al 2020). However, recent developments in machine learning, for example, utilising a Bayesian interface (Polson and Sokolov 2017), mean that it is possible to train the models with datasets that are an order of magnitude smaller (Assawamakin et al 2013; Zhang and Ling 2018; Dockès et al 2021; Ko et al 2021).…”

Section: Checkpoint Inhibitor Genes As Biomarkers For Cancer Clinical... (mentioning)

confidence: 99%

Transcriptional and post-transcriptional regulation of checkpoint genes on the tumour side of the immunological synapse

Dobosz, Stempor, Moreno, et al. 2022

Heredity

Cancer is a disease of the genome; its development therefore has a clear Mendelian component, demonstrated by well-studied genes such as BRCA1 and BRCA2 in breast cancer risk. However, it is known that a single genetic variant is not enough for cancer to develop, which led to the theory of multistage carcinogenesis. In many cases, it is a sequence of events, acquired somatic mutations, or simply polygenic components with strong epigenetic effects, such as in the case of brain tumours. The expression of many genes is the product of the complex interplay between several factors, including the organism’s genotype (in most cases Mendelian-inherited), genetic instability, epigenetic factors (non-Mendelian-inherited) as well as the immune response of the host, to name just a few. In recent years the importance of the immune system has been elevated, especially in light of the discovery of the immune checkpoint genes and the subsequent development of their inhibitors. As the expression of these genes normally suppresses self-immunoreactivity, their expression by tumour cells prevents the elimination of the tumour by the immune system. These discoveries led to the rapid growth of the field of immuno-oncology, which offers new possibilities for long-lasting and effective treatment. Here we discuss the recent advances in the understanding of the key mechanisms controlling the expression of immune checkpoint genes in tumour cells.
