To characterise type 2 diabetes (T2D)-associated variation across the allele frequency spectrum, we conducted a meta-analysis of genome-wide association data from 26,676 T2D cases and 132,532 controls of European ancestry after imputation using the 1000 Genomes multi-ethnic reference panel. Promising association signals were followed up in additional data sets (of 14,545 or 7,397 T2D cases and 38,994 or 71,604 controls). We identified 13 novel T2D-associated loci ($p < 5 \times 10^{-8}$), including variants near the GLP2R, GIP, and HLA-DQA1 genes. Our analysis brought the total number of independent T2D associations to 128 distinct signals at 113 loci. Despite substantially increased sample size and more complete coverage of low-frequency variation, all novel associations were driven by common SNVs. Credible sets of potentially causal variants were generally larger than those based on imputation with earlier reference panels, consistent with resolution of causal signals to common risk haplotypes. Stratification of T2D-associated loci based on T2D-related quantitative trait associations revealed tissue-specific enrichment of regulatory annotations in pancreatic islet enhancers for loci influencing insulin secretion, and in adipocytes, monocytes and hepatocytes for insulin action-associated loci. These findings highlight the predominant role played by common variants of modest effect and the diversity of biological mechanisms influencing T2D pathophysiology.
Here we reiterate the fastGWA model

$$\mathbf{y} = \mathbf{x}_{snp}\beta_{snp} + \mathbf{X}_c\boldsymbol{\beta}_c + \mathbf{g} + \mathbf{e} \qquad [\mathrm{S1}]$$

where $\mathbf{y}$ is an $n \times 1$ vector of mean-centred phenotypes with $n$ being the sample size; $\mathbf{x}_{snp}$ is a vector of mean-centred genotype variables of a variant of interest with its effect $\beta_{snp}$; $\mathbf{X}_c$ is the incidence matrix of fixed covariates with their corresponding coefficients $\boldsymbol{\beta}_c$; $\mathbf{g}$ is a vector of the total genetic effects captured by pedigree relatedness with $\mathbf{g} \sim N(\mathbf{0}, \boldsymbol{\pi}\sigma_g^2)$; $\boldsymbol{\pi}$ is the family relatedness matrix based on pedigree structure; $\mathbf{e}$ is a vector of residuals with $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$. The variance-covariance matrix of $\mathbf{y}$ is $\mathbf{V} = \boldsymbol{\pi}\sigma_g^2 + \mathbf{I}\sigma_e^2$, and the generalized least squares estimate of $\beta_{snp}$ depends on $\mathbf{V}$. Therefore, to test whether $\beta_{snp} = 0$, we first need to estimate the variance components $\sigma_g^2$ and $\sigma_e^2$. As in most existing MLM-based association tools (refs. 1-7), to avoid running the variance estimation analysis repeatedly for each target variant, we estimate $\sigma_g^2$ and $\sigma_e^2$ under the null model [2], that is, model [S1] without the $\mathbf{x}_{snp}\beta_{snp}$ term, assuming the effect of a single variant on $\hat{\sigma}_g^2$ is negligible. The REML log-likelihood ($L$) function of model [2] can be written as […].

Conventional REML algorithms such as the average information (AI) algorithm (ref. 8) involve the computation of $\mathbf{V}^{-1}$, $\mathbf{P}$ and $\mathbf{P}\boldsymbol{\pi}$, which is computationally intensive when $n$ is large even if $\boldsymbol{\pi}$ is sparse. Here we describe an algorithm (termed fastGWA-REML) that uses a grid search to estimate $\sigma_g^2$ without the need to compute $\mathbf{V}^{-1}$, $\mathbf{P}$ and $\mathbf{P}\boldsymbol{\pi}$. For ease of computation, we first adjust the phenotype for covariates by linear regression (let $\mathbf{y}_{adj}$ denote a vector of phenotypes after adjustment). We can rewrite $L$ as […], with $\mathbf{1}$ being an $n \times 1$ vector of 1's. All the elements in $L$, including $|\mathbf{V}|$, $\mathbf{V}^{-1}\mathbf{1}$ and $\mathbf{V}^{-1}\mathbf{y}_{adj}$, can be computed efficiently by the Cholesky decomposition of $\mathbf{V}$ (without the need to compute $\mathbf{V}^{-1}$) in a sparse matrix setting.
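As a rough illustration of how these quantities come out of a single Cholesky factorisation, the sketch below evaluates a standard form of the REML log-likelihood for an intercept-only model on the adjusted phenotype, $L = -\tfrac{1}{2}\left[\log|\mathbf{V}| + \log(\mathbf{1}'\mathbf{V}^{-1}\mathbf{1}) + \mathbf{y}_{adj}'\mathbf{P}\mathbf{y}_{adj}\right]$ with $\mathbf{P} = \mathbf{V}^{-1} - \mathbf{V}^{-1}\mathbf{1}(\mathbf{1}'\mathbf{V}^{-1}\mathbf{1})^{-1}\mathbf{1}'\mathbf{V}^{-1}$. This is the textbook expression and may differ from the authors' formula by constant terms. The function name `reml_loglik` and its arguments are hypothetical, and dense NumPy routines stand in for the sparse Cholesky solver that a biobank-scale implementation would need.

```python
import numpy as np

def reml_loglik(sigma_g2, sigma_e2, relatedness, y_adj):
    """Textbook REML log-likelihood (up to a constant) for an intercept-only model.

    relatedness : (n, n) relatedness matrix (dense here; an assumption for brevity)
    y_adj       : (n,) phenotype already adjusted for covariates
    """
    n = y_adj.shape[0]
    V = sigma_g2 * relatedness + sigma_e2 * np.eye(n)

    # One Cholesky factorisation V = C C^T gives everything we need.
    chol = np.linalg.cholesky(V)
    logdet_V = 2.0 * np.sum(np.log(np.diag(chol)))

    def solve_V(b):
        # Solve V x = b with two solves against the factor, never forming V^{-1}.
        return np.linalg.solve(chol.T, np.linalg.solve(chol, b))

    ones = np.ones(n)
    Vi_1 = solve_V(ones)    # V^{-1} 1
    Vi_y = solve_V(y_adj)   # V^{-1} y_adj

    a = ones @ Vi_1         # 1' V^{-1} 1
    b = ones @ Vi_y         # 1' V^{-1} y_adj
    yPy = y_adj @ Vi_y - b * b / a   # y' P y without building P

    return -0.5 * (logdet_V + np.log(a) + yPy)
```

Because only `solve_V` touches $\mathbf{V}$, swapping in a sparse factorisation would leave the rest of the function unchanged.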
Because the computation of $L$ is extremely fast, we can use a grid search to obtain an estimate of $\sigma_g^2$ (note that $\hat{\sigma}_e^2$ can be computed as $\hat{\sigma}_P^2 - \hat{\sigma}_g^2$, with $\hat{\sigma}_P^2$ being the empirical variance of the phenotype after adjustment). The rationale underlying this grid-search method is similar to that in Runcie et al. (ref. 9). We compute the log-likelihood scores given a grid of possible values of $\hat{\sigma}_g^2$ (e.g., $\hat{\sigma}_g^2 \in [0, 1.6\hat{\sigma}_P^2]$ with 100 steps, i.e., a step size of $0.016\hat{\sigma}_P^2$). Note that we define the upper limit to be larger than $\hat{\sigma}_P^2$ to accommodate rare scenarios where the estimate of $\hat{\sigma}_g^2$ from the fastGWA model can be larger than $\hat{\sigma}_P^2$ if the true heritability is large in the presence of substantial common environmental effects. Next, we refine the search in a window around the $\hat{\sigma}_g^2$ value that produces the highest log-likelihood score (denoted by $\hat{\sigma}_{g(max)}^2$) with a window size of $0.2\hat{\sigma}_{g(max)}^2$ and 16 steps. For example, if $\hat{\sigma}_{g(max)}^2 = 0.16\hat{\sigma}_P^2$, we will refine the search in $\hat{\sigma}_g^2 \in [0.144\hat{\sigma}_P^2, 0.176\hat{\sigma}_P^2]$ with 16 steps (i.e., a step size of $0.002\hat{\sigma}_P^2$). We repeat this process iteratively until the difference in $\hat{\sigma}_g^2$ with the highest log-likelihood score between two adjacent iterations is smalle...
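The coarse-then-refine grid search described above can be sketched as follows. The coarse grid (100 steps over $[0, 1.6\hat{\sigma}_P^2]$) and the first refinement (window of $0.2\hat{\sigma}_{g(max)}^2$, 16 steps) follow the text. Everything else is an assumption made for illustration: the function name `grid_search_sigma_g2`, the stopping tolerance `tol` (the source text is cut off before stating it), the `max_iter` cap, and the rule that later refinements use a window two previous grid steps wide. The log-likelihood is passed in as a callable, so any implementation can be plugged in.

```python
import numpy as np

def grid_search_sigma_g2(loglik, var_p, n_coarse=100, n_fine=16,
                         rel_window=0.2, tol=1e-6, max_iter=20):
    """Maximise loglik(sigma_g2) by a coarse grid followed by local refinements."""
    # Coarse pass: [0, 1.6 * var_p] in 100 steps (step = 0.016 * var_p).
    grid = np.linspace(0.0, 1.6 * var_p, n_coarse + 1)
    scores = np.array([loglik(s) for s in grid])
    best = grid[np.argmax(scores)]
    step = grid[1] - grid[0]

    for it in range(max_iter):
        # First refinement: window of width rel_window * best, as in the text.
        # Later refinements: two previous steps wide (assumption, not from the source).
        width = rel_window * best if (it == 0 and best > 0) else 2.0 * step
        lo = max(best - width / 2.0, 0.0)   # variance cannot be negative
        hi = best + width / 2.0
        grid = np.linspace(lo, hi, n_fine + 1)
        scores = np.array([loglik(s) for s in grid])
        new_best = grid[np.argmax(scores)]
        step = grid[1] - grid[0]

        # Stop when the maximiser barely moves between passes (tol is assumed).
        converged = abs(new_best - best) < tol * var_p
        best = new_best
        if converged:
            break
    return best

# Toy usage with a made-up concave "log-likelihood" peaking at 0.1637.
estimate = grid_search_sigma_g2(lambda s: -(s - 0.1637) ** 2, var_p=1.0)
```

With `var_p = 1.0` and a coarse maximiser of 0.16, the first refinement grid spans 0.144 to 0.176 in steps of 0.002, matching the worked example in the text.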
Understanding the difference in genetic regulation of gene expression between brain and blood is important for discovering genes for brain-related traits and disorders. Here, we estimate the correlation of genetic effects at the top-associated cis-expression or -DNA methylation (DNAm) quantitative trait loci (cis-eQTLs or cis-mQTLs) between brain and blood ($r_b$). Using publicly available data, we find that genetic effects at the top cis-eQTLs or mQTLs are highly correlated between independent brain and blood samples ($\hat{r}_b = 0.70$ for cis-eQTLs and $\hat{r}_b = 0.78$ for cis-mQTLs). Using meta-analyzed brain cis-eQTL/mQTL data (n = 526 to 1,194), we identify 61 genes and 167 DNAm sites associated with four brain-related phenotypes, most of which are a subset of the discoveries (97 genes and 295 DNAm sites) using data from blood with larger sample sizes (n = 1,980 to 14,115). Our results demonstrate the gain of power in gene discovery for brain-related phenotypes using blood cis-eQTL/mQTL data with large sample sizes.
Understanding the difference in genetic regulation of gene expression between brain and blood is important for discovering genes associated with brain-related traits and disorders. Here, we estimate the correlation of genetic effects at the top associated cis-expression or cis-DNA methylation quantitative trait loci (cis-eQTLs or cis-mQTLs) between brain and blood for genes expressed (or CpG sites methylated) in both tissues, while accounting for errors in their estimated effects ($r_b$). Using publicly available data (n = 72 to 1,366), we find that the genetic effects of cis-eQTLs ($P_{\mathrm{eQTL}} < 5 \times 10^{-8}$) or mQTLs ($P_{\mathrm{mQTL}} < 1 \times 10^{-10}$) are highly correlated between independent brain and blood samples ($\hat{r}_b = 0.70$ with SE = 0.015 for cis-eQTLs and $\hat{r}_b = 0.78$ with SE = 0.006 for cis-mQTLs). Using meta-analyzed brain eQTL/mQTL data (n = 526 to 1,194), we identify 61 genes and 167 DNA methylation (DNAm) sites associated with 4 brain-related traits and disorders. Most of these associations are a subset of the discoveries (97 genes and 295 DNAm sites) using data from blood with larger sample sizes (n = 1,980 to 14,115). We further find that cis-eQTLs with tissue-specific effects are approximately uniformly distributed across all the functional annotation categories, and that the mean difference in gene expression level between brain and blood is almost independent of the difference in the corresponding cis-eQTL effect. Our results demonstrate the gain of power in gene discovery for brain-related phenotypes using blood cis-eQTL or cis-mQTL data with large sample sizes.

[…] Charitable Foundation. This study makes use of data from dbGaP (accessions: phs000428.v1.p1 and phs000424.v6.p1), UK Biobank Resource (application number: 12514), UK10K project and CommonMind Consortium. A full list of acknowledgements to these data sets can be found in the Supplementary Note. The members of the eQTLGen Consortium are (in alphabetical order):
The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test-statistics and thereby spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we developed an MLM-based tool (called fastGWA) that controls for population stratification by principal components and relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrated by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then applied fastGWA to 2,173 traits on 456,422 array-genotyped and imputed individuals and 2,048 traits on 46,191 whole-exome-sequenced individuals in the UKB.
Differences between sexes contribute to variation in the levels of fasting glucose and insulin. Epidemiological studies established a higher prevalence of impaired fasting glucose in men and impaired glucose tolerance in women; however, the genetic component underlying this phenomenon is not established. We assess sex-dimorphic (73,089/50,404 women and 67,506/47,806 men) and sex-combined (151,188/105,056 individuals) fasting glucose/fasting insulin genetic effects via genome-wide association study meta-analyses in individuals of European descent without diabetes. Here we report sex dimorphism in allelic effects on fasting insulin at IRS1 and ZNF12 loci, the latter showing higher RNA expression in whole blood in women compared to men. We also observe sex-homogeneous effects on fasting glucose at seven novel loci. Fasting insulin in women shows stronger genetic correlations than in men with waist-to-hip ratio and anorexia nervosa. Furthermore, waist-to-hip ratio is causally related to insulin resistance in women, but not in men. These results position dissection of metabolic and glycemic health sex dimorphism as a steppingstone for understanding differences in genetic effects between women and men in related phenotypes.
Compared to linear mixed model-based genome-wide association (GWA) methods, generalized linear mixed model (GLMM)-based methods have better statistical properties when applied to binary traits but are computationally much slower. Here, leveraging efficient sparse matrix-based algorithms, we developed a GLMM-based GWA tool (called fastGWA-GLMM) that is orders of magnitude faster than the state-of-the-art tool (e.g., ~37 times faster when 𝑛 = 400,000) with more scalable memory usage. We show by simulation that the fastGWA-GLMM test-statistics of both common and rare variants are well-calibrated under the null, even for traits with an extreme case-control ratio (e.g., 0.1%). We applied fastGWA-GLMM to the UK Biobank data of 456,348 individuals, 11,842,647 variants and 2,989 binary traits (full summary statistics available at http://fastgwa.info/ukbimpbin) and identified 259 rare variants associated with 75 traits, demonstrating the use of imputed genotype data in a large cohort to discover rare variants for binary complex traits.
A variety of methods have been developed to demultiplex pooled samples in a single-cell RNA sequencing (scRNA-seq) experiment, which require either hashtag barcodes or sample genotypes prior to pooling. We introduce scSplit, which utilizes genetic differences inferred from scRNA-seq data alone to demultiplex pooled samples. scSplit also enables mapping clusters to original samples. Using simulated, merged, and pooled multi-individual datasets, we show that scSplit prediction is highly concordant with demuxlet predictions and is highly consistent with the known truth in a cell-hashing dataset. scSplit is ideally suited to samples without external genotype information and is