Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11-13% of the variance in educational attainment and 7-10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
Recent genome-wide association studies (GWAS) of height and body mass index (BMI) in ∼250000 European participants have led to the discovery of ∼700 and ∼100 nearly independent single nucleotide polymorphisms (SNPs) associated with these traits, respectively. Here we combine summary statistics from those two studies with GWAS of height and BMI performed in ∼450000 UK Biobank participants of European ancestry. Overall, our combined GWAS meta-analysis reaches N ∼700000 individuals and substantially increases the number of GWAS signals associated with these traits. We identified 3290 and 941 near-independent SNPs associated with height and BMI, respectively (at a revised genome-wide significance threshold of P < 1 × 10⁻⁸), including 1185 height-associated SNPs and 751 BMI-associated SNPs located within loci not previously identified by these two GWAS. The near-independent genome-wide significant SNPs explain ∼24.6% of the variance of height and ∼6.0% of the variance of BMI in an independent sample from the Health and Retirement Study (HRS). Correlations between polygenic scores based upon these SNPs and actual height and BMI in HRS participants were ∼0.44 and ∼0.22, respectively. By integrating GWAS and expression quantitative trait loci (eQTL) data using summary-data-based Mendelian randomization, we identified an enrichment of eQTLs among lead height and BMI signals, prioritizing 610 and 138 genes, respectively. Our study demonstrates that, as previously predicted, increasing GWAS sample sizes continues to deliver: it discovers new loci, increases prediction accuracy and provides additional data to achieve deeper insight into complex trait biology. All summary statistics are made available for follow-up studies.
Health risk factors such as body mass index (BMI) and serum cholesterol are associated with many common diseases. It often remains unclear whether the risk factors are cause or consequence of disease, or whether the associations are the result of confounding. We develop and apply a method (called GSMR) that performs a multi-SNP Mendelian randomization analysis using summary-level data from genome-wide association studies to test the causal associations of BMI, waist-to-hip ratio, serum cholesterols, blood pressures, height, and years of schooling (EduYears) with common diseases (sample sizes of up to 405,072). We identify a number of causal associations including a protective effect of LDL-cholesterol against type-2 diabetes (T2D) that might explain the side effects of statins on T2D, a protective effect of EduYears against Alzheimer’s disease, and bidirectional associations with opposite effects (e.g., higher BMI increases the risk of T2D but the effect of T2D on BMI is negative).
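As a rough illustration of what a multi-SNP Mendelian randomization on summary-level data computes, the sketch below forms a per-SNP ratio estimate (SNP effect on the outcome divided by SNP effect on the exposure), approximates its variance with the delta method, and pools the estimates by inverse-variance weighting. It is a simplified sketch that assumes independent instruments. The published GSMR method additionally accounts for correlation among SNPs and filters pleiotropic outliers, which are omitted here. The function names and the input numbers are hypothetical and chosen only for illustration.

```python
import math


def ratio_estimate(b_zx, se_zx, b_zy, se_zy):
    """Per-SNP causal estimate b_xy = b_zy / b_zx with a delta-method variance.

    b_zx, se_zx: SNP effect on the exposure and its standard error.
    b_zy, se_zy: SNP effect on the outcome and its standard error.
    """
    b_xy = b_zy / b_zx
    # Delta-method approximation, written so that b_zy == 0 is safe.
    var_xy = se_zy**2 / b_zx**2 + (b_zy**2 * se_zx**2) / b_zx**4
    return b_xy, var_xy


def pool_inverse_variance(estimates):
    """Pool per-SNP estimates, assuming the SNPs are independent."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    b = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return b, math.sqrt(1.0 / total)


# Hypothetical summary statistics: (b_zx, se_zx, b_zy, se_zy) per SNP.
snps = [
    (0.10, 0.01, 0.050, 0.01),
    (0.20, 0.01, 0.100, 0.01),
    (0.05, 0.01, 0.025, 0.01),
]
b_xy, se_xy = pool_inverse_variance([ratio_estimate(*s) for s in snps])
print(f"pooled b_xy = {b_xy:.3f} (SE {se_xy:.3f})")
```

SNPs with stronger effects on the exposure yield more precise ratio estimates and therefore carry more weight in the pooled estimate.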
Type 2 diabetes (T2D) is a very common disease in humans. Here we conduct a meta-analysis of genome-wide association studies (GWAS) with ~16 million genetic variants in 62,892 T2D cases and 596,424 controls of European ancestry. We identify 139 common and 4 rare variants associated with T2D, 42 of which (39 common and 3 rare variants) are independent of the known variants. Integration of the gene expression data from blood (n = 14,115 and 2765) with the GWAS results identifies 33 putative functional genes for T2D, 3 of which were targeted by approved drugs. A further integration of DNA methylation (n = 1980) and epigenomic annotation data highlights 3 genes (CAMK1D, TP53INP1, and ATP5G1) with plausible regulatory mechanisms, whereby a genetic variant exerts an effect on T2D through epigenetic regulation of gene expression. Our study uncovers additional loci, proposes putative genetic regulatory mechanisms for T2D, and provides evidence of purifying selection for T2D-associated variants.
The identification of genes and regulatory elements underlying the associations discovered by GWAS is essential to understanding the aetiology of complex traits (including diseases). Here, we demonstrate an analytical paradigm of prioritizing genes and regulatory elements at GWAS loci for follow-up functional studies. We perform an integrative analysis that uses summary-level SNP data from multi-omics studies to detect DNA methylation (DNAm) sites associated with gene expression and phenotype through shared genetic effects (i.e., pleiotropy). We identify pleiotropic associations between 7858 DNAm sites and 2733 genes. These DNAm sites are enriched in enhancers and promoters, and >40% of them are mapped to distal genes. Further pleiotropic association analyses, which link both the methylome and transcriptome to 12 complex traits, identify 149 DNAm sites and 66 genes, indicating a plausible mechanism whereby the effect of a genetic variant on phenotype is mediated by genetic regulation of transcription through DNAm.
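Here we reiterate the fastGWA model

$$\mathbf{y} = \mathbf{x}_{snp}\beta_{snp} + \mathbf{X}_c\boldsymbol{\beta}_c + \mathbf{g} + \mathbf{e} \qquad \text{[S1]}$$

where $\mathbf{y}$ is an $n \times 1$ vector of mean-centred phenotypes with $n$ being the sample size; $\mathbf{x}_{snp}$ is a vector of mean-centred genotype variables of a variant of interest with its effect $\beta_{snp}$; $\mathbf{X}_c$ is the incidence matrix of fixed covariates with their corresponding coefficients $\boldsymbol{\beta}_c$; $\mathbf{g}$ is a vector of the total genetic effects captured by pedigree relatedness with $\mathbf{g} \sim N(\mathbf{0}, \boldsymbol{\pi}\sigma_g^2)$; $\boldsymbol{\pi}$ is the family relatedness matrix based on pedigree structure; $\mathbf{e}$ is a vector of residuals with $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$. The variance-covariance matrix of $\mathbf{y}$ is $\mathbf{V} = \boldsymbol{\pi}\sigma_g^2 + \mathbf{I}\sigma_e^2$, and the generalized least squares estimate of $\beta_{snp}$ depends on $\mathbf{V}$. Therefore, to test whether $\beta_{snp} = 0$, we first need to estimate the variance components $\sigma_g^2$ and $\sigma_e^2$. As in most existing MLM-based association tools [1-7], to avoid running the variance estimation analysis repeatedly for each target variant, we estimate $\sigma_g^2$ and $\sigma_e^2$ under the null model

$$\mathbf{y} = \mathbf{X}_c\boldsymbol{\beta}_c + \mathbf{g} + \mathbf{e} \qquad \text{[S2]}$$

assuming the effect of a single variant on $\hat{\sigma}_g^2$ is negligible. The REML log-likelihood ($L$) function of model [S2] can be written, up to a constant, as

$$L = -\tfrac{1}{2}\left(\log|\mathbf{V}| + \log|\mathbf{X}_c^{T}\mathbf{V}^{-1}\mathbf{X}_c| + \mathbf{y}^{T}\mathbf{P}\mathbf{y}\right), \qquad \mathbf{P} = \mathbf{V}^{-1} - \mathbf{V}^{-1}\mathbf{X}_c(\mathbf{X}_c^{T}\mathbf{V}^{-1}\mathbf{X}_c)^{-1}\mathbf{X}_c^{T}\mathbf{V}^{-1}.$$

Conventional REML algorithms such as the average information (AI) algorithm [8] involve the computation of $\mathbf{V}^{-1}$, $\mathbf{P}$ and $\mathbf{P}\boldsymbol{\pi}$, which is computationally intensive when $n$ is large even if $\boldsymbol{\pi}$ is sparse. Here we describe an algorithm (termed fastGWA-REML) that uses grid search to estimate $\sigma_g^2$ without the need to compute $\mathbf{V}^{-1}$, $\mathbf{P}$ and $\mathbf{P}\boldsymbol{\pi}$. For ease of computation, we first adjust the phenotype for covariates by linear regression (let $\mathbf{y}_{adj}$ denote a vector of phenotypes after adjustment). We can rewrite $L$ as

$$L = -\tfrac{1}{2}\left[\log|\mathbf{V}| + \log\left(\mathbf{1}^{T}\mathbf{V}^{-1}\mathbf{1}\right) + \mathbf{y}_{adj}^{T}\mathbf{V}^{-1}\mathbf{y}_{adj} - \frac{\left(\mathbf{1}^{T}\mathbf{V}^{-1}\mathbf{y}_{adj}\right)^{2}}{\mathbf{1}^{T}\mathbf{V}^{-1}\mathbf{1}}\right]$$

with $\mathbf{1}$ being an $n \times 1$ vector of 1's. All the elements in $L$, including $|\mathbf{V}|$, $\mathbf{V}^{-1}\mathbf{1}$ and $\mathbf{V}^{-1}\mathbf{y}_{adj}$, can be computed efficiently by the Cholesky decomposition of $\mathbf{V}$ (without the need to compute $\mathbf{V}^{-1}$) in a sparse matrix setting.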
Because the computation of $L$ is extremely fast, we can use a grid search to obtain an estimate of $\sigma_g^2$ (note that $\hat{\sigma}_e^2$ can be computed as $\hat{\sigma}_P^2 - \hat{\sigma}_g^2$, with $\hat{\sigma}_P^2$ being the empirical variance of the phenotype after adjustment). The rationale underlying this grid-search method is similar to that in Runcie et al. [9]. We compute the log-likelihood scores given a grid of possible values of $\hat{\sigma}_g^2$ (e.g., $\hat{\sigma}_g^2 \in [0, 1.6\hat{\sigma}_P^2]$ with 100 steps, i.e., a step size of $0.016\hat{\sigma}_P^2$). Note that we define the upper limit to be larger than $\hat{\sigma}_P^2$ to accommodate rare scenarios where the estimate of $\hat{\sigma}_g^2$ from the fastGWA model can be larger than $\hat{\sigma}_P^2$ if the true heritability is large in the presence of substantial common environmental effects. Next, we refine the search in a window around the $\hat{\sigma}_g^2$ value that produces the highest log-likelihood score (denoted by $\hat{\sigma}_{g(max)}^2$) with a window size of $0.2\hat{\sigma}_{g(max)}^2$ and 16 steps. For example, if $\hat{\sigma}_{g(max)}^2 = 0.16\hat{\sigma}_P^2$, we will refine the search in $\hat{\sigma}_g^2 \in [0.144\hat{\sigma}_P^2, 0.176\hat{\sigma}_P^2]$ with 16 steps (i.e., a step size of $0.002\hat{\sigma}_P^2$). We repeat this process iteratively until the difference in $\hat{\sigma}_g^2$ with the highest log-likelihood score between two adjacent iterations is smalle...
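The grid search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the fastGWA implementation. It uses a dense NumPy Cholesky factorization as a stand-in for the sparse Cholesky the text describes. The stopping tolerance `tol` and the iteration cap `max_iter` are assumptions, because the stopping rule is truncated in the text. The function names and the simulated sibling-pair data are hypothetical.

```python
import numpy as np


def reml_loglik(sigma_g2, sigma_p2, fam, y_adj):
    """REML log-likelihood (up to a constant) for V = fam*sigma_g2 + I*sigma_e2,
    with the mean as the only fixed effect."""
    n = y_adj.shape[0]
    sigma_e2 = sigma_p2 - sigma_g2
    V = fam * sigma_g2 + np.eye(n) * sigma_e2
    try:
        chol = np.linalg.cholesky(V)  # V = chol @ chol.T
    except np.linalg.LinAlgError:
        return -np.inf  # V is not positive definite at this grid point
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    # Solve V x = b through the Cholesky factor without forming V^{-1}.
    # A general solver is used here; a triangular solver would be faster.
    rhs = np.column_stack([np.ones(n), y_adj])
    sol = np.linalg.solve(chol.T, np.linalg.solve(chol, rhs))
    vi_one, vi_y = sol[:, 0], sol[:, 1]
    a = np.sum(vi_one)       # 1' V^{-1} 1
    b = np.sum(vi_y)         # 1' V^{-1} y_adj
    quad = y_adj @ vi_y - b**2 / a
    return -0.5 * (logdet + np.log(a) + quad)


def grid_search_reml(fam, y_adj, upper=1.6, coarse_steps=100,
                     window=0.2, fine_steps=16, tol=1e-4, max_iter=50):
    """Coarse grid on [0, upper*sigma_p2], then iterative local refinement."""
    sigma_p2 = np.var(y_adj, ddof=1)

    def best_on(grid):
        scores = [reml_loglik(s, sigma_p2, fam, y_adj) for s in grid]
        return grid[int(np.argmax(scores))]

    best = best_on(np.linspace(0.0, upper * sigma_p2, coarse_steps + 1))
    for _ in range(max_iter):
        half = 0.5 * window * best  # window size is 0.2 * current best
        hi = min(upper * sigma_p2, best + half)
        new_best = best_on(np.linspace(best - half, hi, fine_steps + 1))
        converged = abs(new_best - best) < tol * sigma_p2  # assumed rule
        best = new_best
        if converged:
            break
    return best, sigma_p2 - best, sigma_p2


# Hypothetical example: 150 sibling pairs with relatedness 0.5.
rng = np.random.default_rng(0)
n_pairs = 150
fam = np.kron(np.eye(n_pairs), np.array([[1.0, 0.5], [0.5, 1.0]]))
g = np.linalg.cholesky(fam * 0.6) @ rng.standard_normal(2 * n_pairs)
e = np.sqrt(0.4) * rng.standard_normal(2 * n_pairs)
y_adj = g + e
y_adj = y_adj - y_adj.mean()

sigma_g2_hat, sigma_e2_hat, sigma_p2_hat = grid_search_reml(fam, y_adj)
print(sigma_g2_hat, sigma_e2_hat, sigma_p2_hat)
```

Each grid point needs only one factorization of V and two solves, which is why evaluating the likelihood on a grid stays cheap when the relatedness matrix is sparse.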
The capacity to accurately predict an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. Recently, Bayesian methods for generating polygenic predictors have been successfully applied in human genomics, but they require individual-level data, which are often difficult to access because of privacy or logistical concerns, and they are computationally very intensive. This has motivated methodological frameworks that utilise publicly available genome-wide association studies (GWAS) summary data, which now for some traits include results from greater than a million individuals. In this study, we extend the established summary statistics methodological framework to include a class of point-normal mixture prior Bayesian regression models, which have been shown to generate optimal genetic predictions and can perform heritability estimation, variant mapping and estimation of the distribution of the genetic effects. In a wide range of simulations and cross-validation using 10 real quantitative traits and 1.1 million variants on 350,000 individuals from the UK Biobank (UKB), we establish that our summary-based method, SBayesR, performs similarly to methods that use the individual-level data and outperforms other state-of-the-art summary statistics methods in terms of prediction accuracy and heritability estimation at a fraction of the computational resources. We generate polygenic predictors for body mass index and height in two independent data sets and show that, by exploiting summary statistics on 1.1 million variants from the largest GWAS meta-analysis (n ≈ 700,000), the SBayesR prediction R² improved on average across traits by 6.8% relative to that estimated from an individual-level data BayesR analysis of data from the UKB (n ≈ 450,000).
Compared with commonly used state-of-the-art summary-based methods, SBayesR improved the prediction R² by 4.1% relative to LDpred and by 28.7% relative to clumping and p-value thresholding. SBayesR gave comparable prediction accuracy to the recent RSS method, which has a similar model, but at a computational time that is two orders of magnitude smaller. The methodology is implemented in a very efficient and user-friendly software tool titled GCTB.

Introduction

The capacity to accurately predict an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine [1-5], recognising that the accuracy of a genetic risk predictor is dependent on the genetic contribution to variation in the trait. It is anticipated that genetic risk prediction will be useful for informing early disease intervention and aiding diagnosis by identifying individuals with an increased genetic risk of disease [5-7]. Accurate genetic predictors for complex traits and disorders are currently limited, due mainly to an incomplete understanding of complex genetic variation, small training sample sizes and suboptimal modelling [4,8,9]. Through large consortia and biobank initiatives, sample sizes for gen...
Understanding the difference in genetic regulation of gene expression between brain and blood is important for discovering genes for brain-related traits and disorders. Here, we estimate the correlation of genetic effects at the top-associated cis-expression or -DNA methylation (DNAm) quantitative trait loci (cis-eQTLs or cis-mQTLs) between brain and blood (rb). Using publicly available data, we find that genetic effects at the top cis-eQTLs or mQTLs are highly correlated between independent brain and blood samples ( for cis-eQTLs and for cis-mQTLs). Using meta-analyzed brain cis-eQTL/mQTL data (n = 526 to 1194), we identify 61 genes and 167 DNAm sites associated with four brain-related phenotypes, most of which are a subset of the discoveries (97 genes and 295 DNAm sites) using data from blood with larger sample sizes (n = 1980 to 14,115). Our results demonstrate the gain of power in gene discovery for brain-related phenotypes using blood cis-eQTL/mQTL data with large sample sizes.