Biobank projects are generating genomic data for many thousands of individuals. Computational methods are needed to handle these massive data sets, including genetic ancestry (GA) inference tools. Current methods for GA inference do not scale to biobank-size genomic datasets. We present Rye, a new algorithm for GA inference at biobank scale. We compared the accuracy and runtime performance of Rye to the widely used RFMix, ADMIXTURE and iAdmix programs, and we applied it to a dataset of 488,221 genome-wide variant samples from the UK Biobank. Rye infers GA based on principal component analysis of genomic variant samples from ancestral reference populations and query individuals. The algorithm's accuracy is powered by Metropolis-Hastings optimization, and its speed is provided by non-negative least squares regression. Rye produces highly accurate GA estimates for three-way admixed populations (African, European and Native American) compared to RFMix and ADMIXTURE ($R^2 = 0.998$–$1.00$). It also shows a 50× runtime improvement over ADMIXTURE on the UK Biobank dataset. Rye analysis of UK Biobank samples demonstrates how it can be used to infer GA at both continental and subcontinental levels. We discuss user considerations and options for the use of Rye; the program and its documentation are distributed on the GitHub repository: https://github.com/healthdisparities/rye.
Biobank projects around the world are generating genomic data for many thousands and even millions of individuals. Computational methods are needed to handle these massive data sets, including tools for genetic ancestry (GA) inference. Current methods for GA inference are generally accurate, but they are slow and do not scale to biobank-size genomic datasets. Here we present Rye, a new algorithm for GA inference at biobank scale. We compare the accuracy and runtime performance of Rye to the widely used RFMix and ADMIXTURE programs, and we apply it to a dataset of 488,221 genome-wide variant samples from the UK Biobank. Rye infers GA based on principal component analysis (PCA) of genomic variant samples from ancestral reference populations and query individuals. The algorithm's accuracy is powered by Metropolis-Hastings optimization, and its speed is provided by non-negative least squares (NNLS) regression. Rye produces highly accurate GA estimates for three-way admixed populations (African, European, and Native American) compared to RFMix and ADMIXTURE ($R^2 = 0.998$–$1.00$). It also shows a 50× runtime improvement over ADMIXTURE on the UK Biobank dataset. Rye analysis of UK Biobank samples demonstrates how it can be used to infer GA at different levels of relatedness. We discuss user considerations and options for the use of Rye; the program and its documentation are distributed on the GitHub repository: https://github.com/healthdisparities/rye.
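The abstract says only that Rye combines PCA of reference and query samples with NNLS regression. The sketch below shows one way that idea could work under our own assumptions. It is not Rye's implementation. Each query individual's PC scores are expressed as a non-negative mix of the mean PC scores (centroids) of the reference populations. PCA scores are a centered, affine projection of the genotypes, so mixture weights are only preserved if they sum to 1. For that reason the sketch appends a hypothetical soft sum-to-one equation, weighted by `sum_penalty`, and then renormalizes. The helper names (`pca_scores`, `population_centroids`, `estimate_ancestry`), the population labels and the simulated allele frequencies are all illustrative. The Metropolis-Hastings step that Rye uses to tune its fit is not shown. The code relies on NumPy and `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls


def pca_scores(genotypes, n_pcs):
    """Project samples (rows) onto their top principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def population_centroids(ref_scores, ref_labels):
    """Mean PC scores of each reference population, in sorted label order."""
    labels = np.asarray(ref_labels)
    pops = sorted(set(labels.tolist()))
    centroids = np.vstack([ref_scores[labels == p].mean(axis=0) for p in pops])
    return pops, centroids  # shape: (n_pops, n_pcs)


def estimate_ancestry(query_score, centroids, sum_penalty=100.0):
    """NNLS fit of one query's PC scores as a non-negative mix of centroids.

    Assumption (not from the source): an extra equation, a row of
    `sum_penalty` values, softly pushes the weights to sum to 1 before
    they are renormalized into ancestry proportions.
    """
    n_pops = centroids.shape[0]
    a = np.vstack([centroids.T, np.full((1, n_pops), sum_penalty)])
    b = np.append(query_score, sum_penalty)
    weights, _rnorm = nnls(a, b)
    total = weights.sum()
    if total == 0:
        return np.full(n_pops, 1.0 / n_pops)  # degenerate fit: fall back to uniform
    return weights / total


# Hypothetical usage: three simulated ancestral populations, noise-free references.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.05, 0.95, size=(3, 500))    # made-up allele frequencies
ref = np.repeat(2 * freqs, 20, axis=0)            # 20 reference samples per population
labels = np.repeat(["AFR", "EUR", "NAT"], 20)
query = 2 * (np.array([0.5, 0.3, 0.2]) @ freqs)   # expected dosages of an admixed query
scores = pca_scores(np.vstack([ref, query]), n_pcs=2)
pops, cents = population_centroids(scores[:-1], labels)
print(pops, estimate_ancestry(scores[-1], cents).round(3))
```

In this idealized, noise-free simulation the fit should recover proportions close to the 0.5/0.3/0.2 mix used to build the query. Real genotype data would add sampling noise, and Rye's own optimization would add further adjustments.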
Ethnic minorities in developed countries suffer a disproportionately high burden of COVID-19 morbidity and mortality, and COVID-19 ethnic disparities have been attributed to social determinants of health. Vitamin D has been proposed as a modifiable risk factor that could mitigate COVID-19 health disparities. We investigated the relationship between vitamin D and COVID-19 susceptibility and severity using the UK Biobank, a large prospective cohort study of the United Kingdom population. Structural equation modelling was used to evaluate the ability of vitamin D, socioeconomic deprivation, and other known risk factors to mediate COVID-19 ethnic health disparities. Compared with the majority White population, Asian ethnicity is associated with higher COVID-19 susceptibility, and both Asian and Black ethnicity are associated with higher COVID-19 severity. Socioeconomic deprivation mediates all three ethnic disparities and shows the highest overall signal of mediation for any COVID-19 risk factor. Vitamin supplements, including vitamin D, mediate the Asian disparity in COVID-19 susceptibility, and serum 25-hydroxyvitamin D (calcifediol) levels mediate the Asian and Black COVID-19 severity disparities. Several measures of overall health also mediate COVID-19 ethnic disparities, underscoring the importance of comorbidities. Our results support ethnic minorities' use of vitamin D as both a prophylactic and a supplemental therapeutic for COVID-19.