Underrepresented populations are often excluded from genomic studies due in part to a lack of resources supporting their analysis. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high quality set of 4,096 whole genomes from HGDP and 1kGP with data from gnomAD and identified over 155 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also demonstrate substantial added value from this dataset compared to the prior versions of the component resources, typically combined via liftover and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared to previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.
Asthma is a complex disease that affects millions of people and varies in prevalence by an order of magnitude across geographic regions and populations. However, the extent to which genetic variation contributes to these disparities is unclear, as studies probing the genetics of asthma have been primarily limited to populations of European (EUR) descent. As part of the Global Biobank Meta-analysis Initiative (GBMI), we conducted the largest genome-wide association study of asthma to date (153,763 cases and 1,647,022 controls) via meta-analysis across 18 biobanks spanning multiple countries and ancestries. Altogether, we discovered 180 genome-wide significant loci (p < 5x10-8) associated with asthma, 49 of which are not previously reported. We replicate well-known associations such as IL1RL1 and STAT6, and find that overall the novel associations have smaller effects than previously-discovered loci, highlighting our substantial increase in statistical power. Despite the considerable range in prevalence among biobanks, from 3% to 24%, the genetic effects of associated loci are largely consistent across the biobanks and ancestries. To further investigate the polygenic architecture of asthma, we construct polygenic risk scores (PRS) using a multi-ancestry approach, which yields higher predictive power for asthma in non-EUR populations compared to PRS derived from previous asthma meta-analyses and using other methods. Additionally, we find considerable genetic overlap between asthma and chronic obstructive pulmonary disease (COPD) across ancestries but minimal overlap in enriched biological pathways. Our work underscores the multifactorial nature of asthma development and offers insight into the shared genetic architecture of asthma that may be differentially perturbed by environmental factors and contribute to variation in prevalence.
Although there is an exponential increase and extensive availability of genome-wide association studies data, the visualization of this data remains difficult for non-specialist users. Current software and packages for visualizing GWAS data are intended for specialists and have been developed to accomplish specific functions, favouring functionality over user experience. To facilitate this, we have developed an R shiny web application, gwaRs, that allows any general user to visualize GWAS data efficiently and effortlessly. The gwaRs web-browser interface allows users to visualize GWAS data using SNP-density, quantile-quantile, Manhattan, and Principal Component Analysis plots.
Availability:The gwaRs web application is publicly hosted at https://gwasviz.shinyapps.io/gwaRs/ and R source code is released under the GNU General Public License and freely available at GitHub: https://github.com/LindoNkambule/gwaRs.
Contact
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.