Deep transfer learning provides a Pareto improvement for multi-ancestral clinico-genomic prediction of diseases

Gao, Yan; Cui, Yan

doi:10.1101/2022.09.22.509055

Cited by 2 publications

(4 citation statements)

References 68 publications

(89 reference statements)

“…The synthetic datasets are available from https://figshare.com/articles/media/TLGP_GM/25377532 [107]. The source code is available from https://github.…”

Section: Supplementary Information (mentioning)

confidence: 99%

Optimizing clinico-genomic disease prediction across ancestries: a machine learning strategy with Pareto improvement

Gao, Cui

2024

Genome Med

Background: Accurate prediction of an individual’s predisposition to diseases is vital for preventive medicine and early intervention. Various statistical and machine learning models have been developed for disease prediction using clinico-genomic data. However, the accuracy of clinico-genomic prediction of diseases may vary significantly across ancestry groups due to their unequal representation in clinical genomic datasets.

Methods: We introduced a deep transfer learning approach to improve the performance of clinico-genomic prediction models for data-disadvantaged ancestry groups. We conducted machine learning experiments on multi-ancestral genomic datasets of lung cancer, prostate cancer, and Alzheimer’s disease, as well as on synthetic datasets with built-in data inequality and distribution shifts across ancestry groups.

Results: Deep transfer learning significantly improved disease prediction accuracy for data-disadvantaged populations in our multi-ancestral machine learning experiments. In contrast, transfer learning based on linear frameworks did not achieve comparable improvements for these data-disadvantaged populations.

Conclusions: This study shows that deep transfer learning can enhance fairness in multi-ancestral machine learning by improving prediction accuracy for data-disadvantaged populations without compromising prediction accuracy for other populations, thus providing a Pareto improvement towards equitable clinico-genomic prediction of diseases.

“…The current prevalent machine learning scheme for multiethnic data, the mixture learning scheme, and its main alternative, the independent learning scheme, have major obstacles in training optimal machine learning models for data-disadvantaged subpopulations (19, 93-95). The two Multiethnic machine learning schemes.…”

Section: Multiethnic Machine Learning (mentioning)

confidence: 99%

“…In transfer learning, a machine learning model trained on a data-rich subpopulation (source domain) can aid in training a model for a data-disadvantaged subpopulation (target domain) without affecting its own prediction accuracy. Thus, transfer learning provides a Pareto improvement (112) for multiethnic machine learning (95). Pareto improvement is a generally desired scenario in which some parties are better off without negatively impacting other parties in the system.…”

Section: Loss Function: the Difference Between Estimated And True Out... (mentioning)

confidence: 99%
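
The Pareto improvement criterion quoted in the statement above (some parties are better off, none are worse off) can be sketched as a small check over per-group scores. This is only an illustration, not code from the cited papers: the function name `is_pareto_improvement`, the group labels, and every number below are hypothetical, and scores are assumed to be higher-is-better (for example AUROC).

```python
from typing import Mapping


def is_pareto_improvement(
    baseline: Mapping[str, float],
    candidate: Mapping[str, float],
    tol: float = 0.0,
) -> bool:
    """Return True if `candidate` is a Pareto improvement over `baseline`.

    Both arguments map a group name to a higher-is-better score.
    `tol` is a hypothetical tolerance for treating tiny differences as ties.
    """
    if baseline.keys() != candidate.keys():
        raise ValueError("both schemes must report the same groups")
    # No group may end up worse off than under the baseline scheme.
    no_group_worse = all(candidate[g] >= baseline[g] - tol for g in baseline)
    # At least one group must end up strictly better off.
    some_group_better = any(candidate[g] > baseline[g] + tol for g in baseline)
    return no_group_worse and some_group_better


# Made-up scores, for illustration only (not results from any paper).
baseline_scores = {"source": 0.80, "target": 0.65}
transfer_scores = {"source": 0.80, "target": 0.72}  # target gains, source unchanged
tradeoff_scores = {"source": 0.78, "target": 0.75}  # target gains, source loses

print(is_pareto_improvement(baseline_scores, transfer_scores))  # True
print(is_pareto_improvement(baseline_scores, tradeoff_scores))  # False
```

The second case shows why a gain for the data-disadvantaged group alone is not enough: a scheme that lowers the score of any other group is a trade-off rather than a Pareto improvement.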

“…Machine learning experiments on synthetic data show that data inequality and subpopulation shift are the key factors underlying model performance disparities (19,95). Currently, these challenges in multiethnic machine learning are being addressed on two fronts: data collection and algorithmic intervention (Figure 6).…”

Section: Machine Learning With More Ancestrally Balanced Data (mentioning)

confidence: 99%

Addressing the Challenge of Biomedical Data Inequality: An Artificial Intelligence Perspective

Gao, Sharma, Cui

2023

Annu. Rev. Biomed. Data Sci.

Self Cite

Artificial intelligence (AI) and other data-driven technologies hold great promise to transform healthcare and confer the predictive power essential to precision medicine. However, the existing biomedical data, which are a vital resource and foundation for developing medical AI models, do not reflect the diversity of the human population. The low representation in biomedical data has become a significant health risk for non-European populations, and the growing application of AI opens a new pathway for this health risk to manifest and amplify. Here we review the current status of biomedical data inequality and present a conceptual framework for understanding its impacts on machine learning. We also discuss the recent advances in algorithmic interventions for mitigating health disparities arising from biomedical data inequality. Finally, we briefly discuss the newly identified disparity in data quality among ethnic groups and its potential impacts on machine learning. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 6 is August 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
