Summary/Abstract
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development, and clinical guidelines. However, the dominance of European-ancestry populations in GWAS creates a biased view of the role of human variation in disease, and hinders the equitable translation of genetic associations into clinical and public health applications. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioral phenotypes in 49,839 non-European individuals. Using strategies designed for analysis of multi-ethnic and admixed populations, we confirm 574 GWAS catalog variants across these traits, and find 38 secondary signals in known loci and 27 novel loci. Our data show strong evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts, and insights into clinical implications. We strongly advocate for continued, large genome-wide efforts in diverse populations to reduce health disparities.
The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. Consequently, a new generation of genotyping arrays is being developed, designed with tag single nucleotide polymorphisms (SNPs) to improve rare variant imputation. Selection of these tag SNPs poses several challenges, as rare variants tend to be continentally or even population-specific and reflect fine-scale linkage disequilibrium (LD) structure impacted by recent demographic events. To explore the landscape of tag-able variation and guide design considerations for large-cohort and biobank arrays, we developed a novel pipeline to select tag SNPs using the 26-population reference panel from Phase 3 of the 1000 Genomes Project. We evaluate our approach using leave-one-out internal validation via standard imputation methods, which allows the direct comparison of tag SNP performance by estimating the correlation of the imputed and real genotypes for each iteration of potential array sites. We show how this approach allows for an assessment of array design and performance that can take advantage of the development of deeper and more diverse sequenced reference panels. We quantify the impact of demography on tag SNP performance across populations and provide population-specific guidelines for tag SNP selection. We also examine array design strategies that target single populations versus multi-ethnic cohorts, and demonstrate that a boost in performance for the latter can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Finally, we demonstrate the utility of improved array design to provide meaningful improvements in power, particularly in trans-ethnic studies.
The unified framework presented will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
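To make the idea of prioritizing tag SNPs that carry information across several populations more concrete, the following is a minimal sketch and not the authors' pipeline. It swaps in pairwise LD (squared Pearson correlation between genotype vectors) as a stand-in for the leave-one-out imputation accuracy described in the abstract, and it picks tags greedily. The function names `r2` and `greedy_tags`, the toy `PANELS` data, and the 0.8 threshold are all assumptions made for illustration.

```python
def r2(x, y):
    """Squared Pearson correlation of two genotype vectors (coded 0/1/2).

    Returns 0.0 if either vector is constant (a monomorphic site).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def greedy_tags(panels, n_tags, threshold=0.8):
    """Greedily choose up to n_tags sites covering the most (population, site) pairs.

    panels: dict mapping population name -> dict mapping site id -> genotype list.
    A candidate "covers" (pop, site) when its r2 with that site in that
    population is at least `threshold` (an assumed cutoff).
    """
    sites = sorted(next(iter(panels.values())))
    covers = {
        cand: {
            (pop, site)
            for pop, geno in panels.items()
            for site in sites
            if r2(geno[cand], geno[site]) >= threshold
        }
        for cand in sites
    }
    chosen, tagged = [], set()
    for _ in range(n_tags):
        remaining = [s for s in sites if s not in chosen]
        if not remaining:
            break
        # Prefer the candidate that adds the most not-yet-tagged pairs.
        best = max(remaining, key=lambda s: len(covers[s] - tagged))
        if not covers[best] - tagged:
            break  # no candidate adds new coverage
        chosen.append(best)
        tagged |= covers[best]
    return chosen, tagged


# Toy data (hypothetical): s1 and s2 are in perfect LD in population A only,
# while s2 and s3 are in perfect LD in population B only.
PANELS = {
    "A": {
        "s1": [0, 1, 2, 1, 0, 2],
        "s2": [0, 1, 2, 1, 0, 2],
        "s3": [2, 0, 1, 0, 2, 1],
        "s4": [0, 0, 1, 1, 2, 2],
    },
    "B": {
        "s1": [0, 1, 2, 1, 0, 2],
        "s2": [2, 0, 1, 0, 2, 1],
        "s3": [2, 0, 1, 0, 2, 1],
        "s4": [0, 0, 1, 1, 2, 2],
    },
}

if __name__ == "__main__":
    tags, tagged_pairs = greedy_tags(PANELS, n_tags=2)
    print(tags, len(tagged_pairs))
```

In this toy panel, `s2` is chosen first because it tags a neighbor in both populations, whereas `s1` and `s3` each tag a neighbor in only one. A realistic evaluation would replace the pairwise `r2` coverage with imputation accuracy measured against held-out samples, as the abstract describes.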