Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40–50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes¹. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel²) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10–20% (14–24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.
We develop a new method, SBayesRC, that integrates GWAS summary statistics with functional genomic annotations to improve polygenic prediction of complex traits. Our method is scalable to whole-genome variant analysis and refines signals from functional annotations by allowing them to affect both causal variant probability and causal effect distribution. We analyse 28 traits in the UK Biobank using ~7 million common SNPs and 96 annotations. SBayesRC improves prediction accuracy by 14% in European ancestry and by up to 33% in trans-ancestry prediction compared with the baseline method SBayesR, which does not use annotations, and outperforms state-of-the-art methods LDpred-funct, PolyPred-S and PRS-CSx by 12–15%. Investigation of factors affecting prediction accuracy identified a significant interaction between SNP density and annotation information, encouraging future use of whole-genome sequence variants for prediction. Functional partitioning analysis highlights a major contribution of evolutionarily constrained regions to prediction accuracy and the largest per-SNP contribution from non-synonymous SNPs.
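One way to picture annotations affecting "both causal variant probability and causal effect distribution" is a mixture prior on each SNP effect: a zero component plus several normal components of increasing variance, with the mixture weights depending on the SNP's annotations. The weight on the zero component sets the probability that the SNP is causal, and the weights on the other components set how large its effect is likely to be. The sketch below assumes a softmax link from annotations to mixture weights and uses hypothetical variance fractions (`GAMMAS`) and coefficients. These are illustrative assumptions, not SBayesRC's actual parameterization or fitting procedure.

```python
import math

# Hypothetical mixture: component 0 is "no effect"; components 1..4 are
# normal distributions whose variances are these fractions of sigma2.
GAMMAS = [0.0, 1e-5, 1e-4, 1e-3, 1e-2]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_probs(annotations, coefs):
    """Map one SNP's annotation vector to mixture-component probabilities.

    coefs[k] holds an intercept followed by one coefficient per annotation
    for component k (illustrative softmax link, an assumption).
    """
    x = [1.0] + list(annotations)
    logits = [sum(c * a for c, a in zip(ck, x)) for ck in coefs]
    return softmax(logits)


def prior_summary(annotations, coefs, sigma2):
    """Return (P(causal), expected prior effect variance) for one SNP."""
    probs = mixture_probs(annotations, coefs)
    p_causal = 1.0 - probs[0]
    expected_var = sum(p * g * sigma2 for p, g in zip(probs, GAMMAS))
    return p_causal, expected_var


# Two hypothetical binary annotations, e.g. "non-synonymous" and "conserved".
coefs = [
    [3.0, 0.0, 0.0],  # null component favoured by default
    [0.0, 0.5, 0.5],
    [0.0, 1.0, 1.0],
    [0.0, 1.5, 1.5],
    [0.0, 2.0, 2.0],  # largest-effect component favoured by annotations
]
print(prior_summary([0, 0], coefs, sigma2=1.0))  # unannotated SNP
print(prior_summary([1, 1], coefs, sigma2=1.0))  # annotated SNP
```

With these coefficients the annotated SNP gets both a higher probability of being causal and more prior mass on the large-effect components, which is the dual role the paragraph describes.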
Summary: Quality control (QC) of genome-wide association study (GWAS) result files has become increasingly difficult due to advances in genomic technology. The main challenges include continuous increases in the number of polymorphic genetic variants contained in recent GWASs and reference panels, the rising number of cohorts participating in a GWAS consortium, and inclusion of new variant types. Here, we present GWASinspector, a flexible R package for comprehensive QC of GWAS results. This package is compatible with recent imputation reference panels, handles insertion/deletion and multi-allelic variants, provides extensive QC reports and efficiently processes big data files. Reference panels covering three human genome builds (NCBI36, GRCh37 and GRCh38) are available. GWASinspector has a user-friendly design and allows easy setup of the QC pipeline through a configuration file. In addition to checking and reporting on individual files, it can be used in preparation of a meta-analysis by testing for systematic differences between studies and generating cleaned, harmonized GWAS files. Comparison with existing GWAS QC tools shows that the main advantages of GWASinspector are its ability to deal more effectively with insertion/deletion and multi-allelic variants and its relatively low memory use.

Availability and implementation: Our package is available at The Comprehensive R Archive Network (CRAN): https://CRAN.R-project.org/package=GWASinspector. Reference datasets and a detailed tutorial can be found at the package website at http://gwasinspector.com/

Supplementary information: Supplementary data are available at Bioinformatics online.
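A core step in producing "cleaned, harmonized GWAS files" is aligning each reported variant's alleles to a reference panel so that every study reports effects for the same allele. The sketch below shows one common approach: accept a direct match, swap alleles (negating the effect and taking the complementary frequency), try the opposite strand for non-palindromic SNPs, and otherwise flag a mismatch. It is a hypothetical Python helper written for illustration. It is not GWASinspector's implementation or its R interface, and it assumes the reference panel's `alt` allele is the target effect allele.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(allele):
    return "".join(COMPLEMENT[base] for base in allele)


def is_palindromic(a1, a2):
    """A/T and C/G SNPs look identical on both strands."""
    return len(a1) == len(a2) == 1 and COMPLEMENT.get(a1) == a2


def harmonize(effect, other, beta, eaf, ref, alt):
    """Align one GWAS record to the reference (ref, alt), with alt as effect allele.

    Returns (beta, eaf, status); beta and eaf are None on mismatch.
    """
    effect, other, ref, alt = (a.upper() for a in (effect, other, ref, alt))
    # Direct match or simple swap; works for SNPs and indels alike.
    # Palindromic SNPs are accepted here as reported, although a real
    # pipeline may also check allele frequencies to detect hidden flips.
    if (effect, other) == (alt, ref):
        return beta, eaf, "match"
    if (effect, other) == (ref, alt):
        return -beta, 1.0 - eaf, "swapped"
    # Strand flip: only attempted for unambiguous single-base SNPs.
    is_snp = len(effect) == len(other) == 1
    if is_snp and effect in COMPLEMENT and other in COMPLEMENT \
            and not is_palindromic(effect, other):
        fe, fo = complement(effect), complement(other)
        if (fe, fo) == (alt, ref):
            return beta, eaf, "strand_flipped"
        if (fe, fo) == (ref, alt):
            return -beta, 1.0 - eaf, "strand_flipped_swapped"
    return None, None, "mismatch"


print(harmonize("C", "T", 0.2, 0.25, ref="C", alt="T"))
print(harmonize("AT", "A", 0.3, 0.25, ref="A", alt="AT"))
```

Multi-allelic sites would additionally need the reference panel to list each alternate allele as a separate record before this per-record check applies.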
Hypertension is a leading cause of premature death, affecting more than a billion individuals worldwide. Here we report on the genetic determinants of blood pressure (BP) traits (systolic, diastolic and pulse pressure) in the largest single-stage genome-wide analysis to date (N = 1,028,980 European-descent individuals). We identified 2,103 independent genetic signals (P < 5 × 10⁻⁸) for BP traits, including 113 novel loci. These associations explain ~40% of common SNP heritability of systolic and diastolic BP. Comparison of the top versus bottom deciles of polygenic risk scores (PRS) based on these results reveals clinically meaningful differences in BP (12.9 mm Hg for systolic BP, 95% CI 11.5–14.2 mm Hg, P = 9.08 × 10⁻⁷³) and hypertension risk (OR 5.41, 95% CI 4.12–7.10, P = 9.71 × 10⁻³³) in an independent dataset. For hypertension discrimination, adding systolic and diastolic BP PRS to a model with sex, age, BMI and genetic ancestry increased the area under the curve (AUC) from 0.791 (95% CI 0.781–0.801) to 0.814 (95% CI 0.805–0.824; ΔAUC = 0.023, P = 2.27 × 10⁻²²). Our transcriptome-wide association study detected 2,793 BP colocalized associations with genetically predicted expression of 1,070 genes in five cardiovascular tissues, of which 500 are previously unreported for BP traits. These findings represent an advance in our understanding of hypertension and highlight the role of increasingly large genomic studies in developing more accurate PRS, which may inform precision health research.
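The two PRS evaluations above can be sketched in a few lines: the difference in mean BP between the top and bottom PRS deciles, and the AUC of a risk score for separating hypertension cases from controls. The sketch assumes AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control (the Mann–Whitney formulation, with ties counted as one half). Fitting the covariate-only and covariate-plus-PRS models is outside the sketch; ΔAUC would be the difference between the AUCs of their predicted risks. The function names are hypothetical.

```python
from statistics import mean


def top_vs_bottom_decile(prs, trait):
    """Mean trait value in the top PRS decile minus that in the bottom decile."""
    pairs = sorted(zip(prs, trait))  # order individuals by PRS
    n = len(pairs) // 10
    if n == 0:
        raise ValueError("need at least 10 individuals")
    bottom = [t for _, t in pairs[:n]]
    top = [t for _, t in pairs[-n:]]
    return mean(top) - mean(bottom)


def auc(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as 1/2.

    Quadratic in sample size; fine for illustration, not for biobank scale.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks from two models for the same individuals.
base_cases, base_controls = [0.6, 0.4, 0.7], [0.3, 0.5, 0.2]
full_cases, full_controls = [0.7, 0.6, 0.8], [0.3, 0.5, 0.2]
delta_auc = auc(full_cases, full_controls) - auc(base_cases, base_controls)
print(round(delta_auc, 3))
```

For large cohorts, a rank-based AUC computation (sorting once) avoids the quadratic loop while giving the same value.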