Artificial intelligence (AI) and other data-driven technologies hold great promise to transform healthcare and confer the predictive power essential to precision medicine. However, existing biomedical data, which are a vital resource and the foundation for developing medical AI models, do not reflect the diversity of the human population. The underrepresentation of non-European populations in biomedical data has become a significant health risk for these populations, and the growing application of AI opens a new pathway for this risk to manifest and amplify. Here we review the current status of biomedical data inequality and present a conceptual framework for understanding its impacts on machine learning. We also discuss recent advances in algorithmic interventions for mitigating health disparities arising from biomedical data inequality. Finally, we briefly discuss the newly identified disparity in data quality among ethnic groups and its potential impacts on machine learning. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 6 is August 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Somatic DNA copy number alterations (CNAs) arise in tumor tissue because of underlying genomic instability. Recurrent CNAs that occur in the same genomic region across multiple independent samples are of interest to researchers because they may contain genes that contribute to the cancer phenotype. However, differences in copy number states between cancers are also commonly of interest, for example when comparing tumors with distinct morphologies in the same anatomic location. Current methodologies are limited by their inability to perform direct comparisons of CNAs between tumor cohorts, and thus they cannot formally assess the statistical significance of observed copy number differences or identify regions of the genome where these differences occur. We introduce the DiNAMIC.Duo R package that can be used to identify recurrent CNAs in a single cohort or recurrent copy number differences between two cohorts, including when neither cohort is copy neutral. The TCGA studies of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) identified statistically significant CNAs in many known cancer-related genes, including gains of EGFR and losses of CDKN2A in each tumor type separately. By directly comparing the two cohorts, DiNAMIC.Duo detects statistically significant copy number differences for CDKN2A, thus suggesting that losses are more pronounced in LUSC; in contrast, differences for EGFR are not statistically significant, which suggests similar levels of gain. Existing methods that detect recurrent CNAs in a single cohort cannot make this distinction. Recent studies have leveraged TCGA data to find known cancer genes in chr3q, chr14q13, and chr20q11 that are differentially expressed in LUAD vs. LUSC. DiNAMIC.Duo identifies statistically significant copy number differences in these regions, which suggests that the observed expression changes may be driven by underlying differences in copy number. 
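To make the idea of a formal two-cohort comparison concrete, the following is a minimal sketch of a generic label-permutation test on per-region mean copy number. It is not the DiNAMIC.Duo algorithm, which this abstract does not describe. The function names, the max-absolute-difference statistic, and the toy log2-ratio values are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def max_abs_mean_difference(cohort_a, cohort_b):
    """Largest absolute difference in per-region mean copy number.

    Each cohort is a list of samples; each sample is a list of copy number
    values (for example log2 ratios), one per genomic region.
    """
    n_regions = len(cohort_a[0])
    diffs = []
    for j in range(n_regions):
        mean_a = mean(sample[j] for sample in cohort_a)
        mean_b = mean(sample[j] for sample in cohort_b)
        diffs.append(abs(mean_a - mean_b))
    return max(diffs)


def permutation_p_value(cohort_a, cohort_b, n_permutations=2000, seed=0):
    """Permutation p-value for the max statistic under shuffled cohort labels.

    Illustrative assumption: samples are exchangeable between cohorts under
    the null hypothesis of no copy number difference in any region.
    """
    rng = random.Random(seed)
    observed = max_abs_mean_difference(cohort_a, cohort_b)
    pooled = list(cohort_a) + list(cohort_b)
    n_a = len(cohort_a)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign samples to cohorts at random
        stat = max_abs_mean_difference(pooled[:n_a], pooled[n_a:])
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical toy data: 8 samples per cohort, 3 regions.
# Region 1 is lost in cohort A only; regions 0 and 2 match across cohorts.
cohort_a = [[0.0, -1.0 + 0.01 * i, 0.1] for i in range(8)]
cohort_b = [[0.0, 0.0 + 0.01 * i, 0.1] for i in range(8)]

p_value = permutation_p_value(cohort_a, cohort_b)
print(f"permutation p-value: {p_value:.4f}")
```

Taking the maximum over regions as the test statistic gives a single p-value that accounts for testing many regions at once. This sketch deliberately ignores the dependence between neighboring genomic regions and any region-localization step that a dedicated method would need to handle.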
Citation Format: Vonn Walter, Hyo Young Choi, Xiaobei Zhao, Yan Gao, Jeremiah Holt, D. Neil Hayes. Detecting somatic DNA copy number differences with DiNAMIC.Duo [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 2071.