Admixed populations constitute a large portion of global human genetic diversity, yet they are often left out of genomics analyses. This exclusion is problematic, as it leads to disparities in the understanding of the genetic structure and history of diverse cohorts and in the performance of genomic medicine across populations. Admixed populations pose particular statistical challenges because they inherit genomic segments from multiple source populations, which is the primary reason they have historically been excluded from genetic studies. In recent years, however, an increasing number of statistical methods and software tools have been developed to account for and leverage admixture in genomics analyses. Here, we survey such computational strategies for the informed consideration of admixture, which allow for the well-calibrated inclusion of mixed-ancestry populations in large-scale genomics studies, and we detail persisting gaps in existing tools. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 6 is August 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Polygenic risk scores (PRS) built from multi-ancestry genome-wide association studies (GWAS), denoted PRSmulti, have the potential to improve PRS accuracy and generalizability across populations. To provide best practices for leveraging the increasing diversity of genomic studies, we used large-scale simulated and empirical data to investigate how ancestry composition, trait-specific genetic architecture, and PRS methodology affect the performance of PRSmulti as compared to PRS constructed from single-ancestry GWAS (PRSsingle). In both simulations of six scenarios and empirical analyses of 17 anthropometric and blood panel traits, we showed that PRSmulti was overall more accurate than PRSsingle in the understudied target populations, except for a few comparisons where the understudied population accounted for only a very small proportion of the multi-ancestry GWAS. Further, for traits such as height and mean corpuscular volume, using substantially fewer samples from Biobank Japan (BBJ) may achieve accuracies comparable to using 320,000 European (EUR) individuals from UK Biobank (UKBB). Finally, we find that combining PRS based on local ancestry-informed GWAS with large-scale EUR-based PRS improved predictive performance over using EUR-based PRS alone in the understudied African (AFR) population, especially for less polygenic traits with variants that have large ancestry-specific effects. Overall, our study provides insights into how ancestry composition and genetic architecture impact polygenic prediction across populations, particularly when sample sizes are imbalanced. Our work also highlights the need for increasing diversity in genetic studies to achieve equitable PRS performance across ancestral populations and provides practical guidance on developing PRS from multiple resources.
The spontaneous, curly whiskers mutation (abbreviated cw) generates kinky, brittle vibrissae in homozygous mice. Although cw has been mapped to the centromeric end of mouse Chromosome 9, no particular gene has been causally implicated, and this lack of genetic assignment has stymied cw's complete molecular and functional analysis. As a foundation for its positional cloning, we have fine-mapped cw to a small, 0.57 Mb interval that contains only three skin-expressed genes, including hephaestin-like 1 (Hephl1), which encodes a membrane-bound, multi-copper ferroxidase. Sequence analysis of all Hephl1 coding regions in cw/cw mutants revealed a single-base-pair substitution that alters Hephl1 mRNA splicing and is specific to the cw allele. Sequence analysis of a second, independent re-mutation to curly whiskers (which we verified by complementation testing with cw and have designated cw2J) revealed a distinct defect in Hephl1 (a frame-shifting, single-base-pair insertion) that is specific to cw2J. The results presented strongly suggest that defects in the Hephl1 gene are the molecular basis of the classical curly-whiskers mutant phenotypes.