Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and gene expression datasets to identify gene-trait associations. In this Perspective, we explore properties of TWAS as a potential approach to prioritize causal genes at GWAS loci, by using simulations and case studies of literature-curated candidate causal genes for schizophrenia, low-density-lipoprotein cholesterol and Crohn's disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene as well as loci where TWAS prioritizes multiple genes, some likely to be non-causal, owing to sharing of expression quantitative trait loci (eQTL). TWAS is especially prone to spurious prioritization with expression data from non-trait-related tissues or cell types, owing to substantial cross-cell-type variation in expression levels and eQTL strengths. Nonetheless, TWAS prioritizes candidate causal genes more accurately than simple baselines. We suggest best practices for causal-gene prioritization with TWAS and discuss future opportunities for improvement. Our results showcase the strengths and limitations of using eQTL datasets to determine causal genes at GWAS loci.
Transcriptome-wide association studies using predicted expression have identified thousands of genes whose locally regulated expression is associated with complex traits and diseases. In this work, we show that linkage disequilibrium induces significant gene-trait associations at non-causal genes as a function of the expression quantitative trait loci weights used in expression prediction. We introduce a probabilistic framework that models correlation among transcriptome-wide association study signals to assign a probability for every gene in the risk region to explain the observed association signal. Importantly, our approach remains accurate when expression data for causal genes are not available in the causal tissue by leveraging expression prediction from other tissues. Our approach yields credible sets of genes containing the causal gene at a nominal confidence level (for example, 90%) that can be used to prioritize genes for functional assays. We illustrate our approach by using an integrative analysis of lipid traits, where our approach prioritizes genes with strong evidence for causality.
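The credible-set idea in the abstract above can be illustrated with a minimal sketch. It assumes each gene in a risk region already has a posterior probability of being causal and that these probabilities sum to one across the candidates; the function name `credible_set` and the gene labels are hypothetical, and the sketch is not the authors' implementation.

```python
def credible_set(posteriors, level=0.9):
    """Return the smallest set of genes whose cumulative posterior reaches `level`.

    `posteriors` maps a gene label to its posterior probability of being
    causal; the values are assumed (for this sketch) to sum to one.
    """
    # Rank genes from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)

    selected = []
    cumulative = 0.0
    for gene, prob in ranked:
        selected.append(gene)
        cumulative += prob
        # Stop as soon as the requested confidence level is covered.
        if cumulative >= level:
            break
    return selected


# Hypothetical posteriors for four genes at one risk region.
example = {"GENE_A": 0.70, "GENE_B": 0.15, "GENE_C": 0.10, "GENE_D": 0.05}
genes_90 = credible_set(example, level=0.9)
```

With these illustrative values, the first two genes cover only 0.85 of the posterior mass, so a third gene is needed to reach the 90% level.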
Parasitic nematodes infect over 1 billion people worldwide and cause some of the most common neglected tropical diseases. Despite their prevalence, our understanding of the biology of parasitic nematodes has been limited by the lack of tools for genetic intervention. In particular, it has not yet been possible to generate targeted gene disruptions and mutant phenotypes in any parasitic nematode. Here, we report the development of a method for introducing CRISPR-Cas9-mediated gene disruptions in the human-parasitic threadworm Strongyloides stercoralis. We disrupted the S. stercoralis twitchin gene unc-22, resulting in nematodes with severe motility defects. Ss-unc-22 mutations were resolved by homology-directed repair when a repair template was provided. Omission of a repair template resulted in deletions at the target locus. Ss-unc-22 mutations were heritable; we passed Ss-unc-22 mutants through a host and successfully recovered mutant progeny. Using a similar approach, we also disrupted the unc-22 gene of the rat-parasitic nematode Strongyloides ratti. Our results demonstrate the applicability of CRISPR-Cas9 to parasitic nematodes, and thereby enable future studies of gene function in these medically relevant but previously genetically intractable parasites.
Transcriptome-wide association studies (TWAS) using predicted expression have identified thousands of genes whose locally regulated expression is associated with complex traits and diseases. In this work, we show that linkage disequilibrium (LD) among SNPs induces significant gene-trait associations at non-causal genes as a function of the overlap between eQTL weights used in expression prediction. We introduce a probabilistic framework that models the induced correlation among TWAS signals to assign a probability for every gene in the risk region to explain the observed association signal. Our approach yields credible sets of genes containing the causal gene at a nominal confidence level (e.g., 90%) that can be used to prioritize and select genes for functional assays. Importantly, our approach remains accurate when expression data for causal genes are not available in the causal tissue by leveraging expression prediction from other tissues. We illustrate our approach using an integrative analysis of lipid traits, where we correctly identify known causal genes.
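To make the LD-induced correlation concrete, here is a minimal sketch under stated assumptions: genotypes are standardized, `ld` is the SNP correlation (LD) matrix, and each gene's predicted expression is a weighted sum of SNPs with eQTL weights `w`. Under those assumptions the covariance between two genes' predicted expression is the quadratic form w_a^T V w_b. The helper names and numeric values are hypothetical.

```python
from math import sqrt


def quad_form(w_a, ld, w_b):
    """Compute w_a^T * ld * w_b using plain Python lists."""
    return sum(
        w_a[i] * ld[i][j] * w_b[j]
        for i in range(len(w_a))
        for j in range(len(w_b))
    )


def predicted_expression_corr(w_a, w_b, ld):
    """Correlation between two genes' predicted expression given SNP LD."""
    cov = quad_form(w_a, ld, w_b)
    var_a = quad_form(w_a, ld, w_a)
    var_b = quad_form(w_b, ld, w_b)
    return cov / sqrt(var_a * var_b)


# Two SNPs in LD (r = 0.8). Gene A has an eQTL only at SNP 1 and
# gene B only at SNP 2, so their eQTL weights do not overlap at all.
ld = [[1.0, 0.8],
      [0.8, 1.0]]
w_gene_a = [0.5, 0.0]
w_gene_b = [0.0, 0.3]

r = predicted_expression_corr(w_gene_a, w_gene_b, ld)
```

In this toy setting the predicted expression of the two genes is correlated at the LD level between the SNPs (0.8) even though the genes share no eQTL, which is the kind of effect that can produce a significant association at a non-causal gene.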
Genome-wide association studies (GWAS) have identified over 100 risk loci for schizophrenia, but the causal mechanisms remain largely unknown. We performed a transcriptome-wide association study (TWAS) integrating expression data from brain, blood, and adipose tissues across 3,693 individuals with a schizophrenia GWAS of 79,845 individuals from the Psychiatric Genomics Consortium. We identified 157 genes with a transcriptome-wide significant association, of which 35 did not overlap a known GWAS locus; the largest number involved alternative splicing in brain. Of the 157 genes, 42 were also associated with specific chromatin phenotypes measured in 121 independent samples (a 4-fold enrichment over background genes). This high-throughput connection of GWAS findings to specific genes, tissues, and regulatory mechanisms is an essential step toward understanding the biology of schizophrenia and moving toward therapeutic interventions.

Introduction

Genome-wide association studies (GWAS) have yielded thousands of robustly associated variants for schizophrenia (SCZ) and many other complex traits, but relatively few of these associations have implicated specific biological mechanisms [1,2], as GWAS association signals often span many putative target genes, may affect gene expression through regulatory [3] or structural elements [4], and may affect genes at considerable genomic distances via chromatin looping [5,6]. A growing body of research has demonstrated the enrichment of SCZ GWAS risk variants and heritability within regulatory elements identified through maps of chromatin modifications and accessibility [1,7-13]. Since chromatin modifications are themselves under genetic control [6,14-19], a causal mechanism for SCZ loci could lead from genetic variation to chromatin modifiers to gene expression and finally to disease risk. Indeed, QTLs for chromatin (and other molecular phenotypes) are enriched within GWAS associations, further supporting this hypothesis [6,18,20,21].
In this work, we leveraged large gene expression cohorts from multiple tissues, as well as splice variants in brain, to perform a transcriptome-wide association study (TWAS) [22-24] in a large SCZ GWAS data set [1] to identify genes whose expression is associated with SCZ and mediated by genetics. We subsequently performed a TWAS for a diverse set of chromatin phenotypes to identify SCZ susceptibility genes that are also associated with specific regulatory elements. To our knowledge, this is the first TWAS to integrate analysis of gene expression, differential splicing, and chromatin variation, moving beyond top SNPs to implicate SCZ-associated molecular features across the regulatory cascade (Figure 1A).

Results

TWAS for SCZ identifies new susceptibility genes

We analyzed gene expression and genome-wide SNP array data in 3,693 individuals across four expression reference panels spanning three tissues: RNA-seq from the dorsolateral prefrontal cortex (PFC) of 621 individuals, including SCZ and bipolar (BIP) cas...
We report targeted sequencing of 63 known prostate cancer risk regions in a multi-ancestry study of 9,237 men and use the data to explore the contribution of low-frequency variation to disease risk. We show that SNPs with minor allele frequencies (MAFs) of 0.1-1% explain a substantial fraction of prostate cancer risk in men of African ancestry. We estimate that these SNPs account for 0.12 (standard error (s.e.) = 0.05) of variance in risk (∼42% of the variance contributed by SNPs with MAF of 0.1-50%). This contribution is much larger than the fraction of neutral variation due to SNPs in this class, implying that natural selection has driven down the frequency of many prostate cancer risk alleles; we estimate the coupling between selection and allelic effects at 0.48 (95% confidence interval [0.19, 0.78]) under the Eyre-Walker model. Our results indicate that rare variants make a disproportionate contribution to genetic risk for prostate cancer and suggest the possibility that rare variants may also have an outsize effect on other common traits.
Although genome-wide association studies (GWAS) for prostate cancer (PrCa) have identified more than 100 risk regions, most of the risk genes at these regions remain unknown. Here we integrate the largest PrCa GWAS (N = 142,392) with gene expression measured in 45 tissues (N = 4,458), including normal and tumor prostate, to perform a multi-tissue transcriptome-wide association study (TWAS) for PrCa. We identify 217 genes at 84 independent 1 Mb regions associated with PrCa risk, 9 of which are regions with no genome-wide significant SNP within 2 Mb. Twenty-three genes are significant in TWAS only for alternative splicing models in prostate tumor, supporting the hypothesis of splicing driving risk for continued oncogenesis. Finally, we use a Bayesian probabilistic approach to estimate credible sets of genes containing the causal gene at a pre-defined level; this reduced the list of 217 associations to 109 genes in the 90% credible set. Overall, our findings highlight the power of integrating expression with PrCa GWAS to identify novel risk loci and prioritize putative causal genes at known risk loci.
Despite rapid progress in characterizing the role of host genetics in SARS-CoV-2 infection, there is limited understanding of the genes and pathways that contribute to COVID-19. Here, we integrate a genome-wide association study of COVID-19 hospitalization (7,885 cases and 961,804 controls from the COVID-19 Host Genetics Initiative) with mRNA expression, splicing, and protein levels (n = 18,502). We identify 27 genes related to inflammation and coagulation pathways whose genetically predicted expression is associated with COVID-19 hospitalization. We functionally characterize the 27 genes using phenome- and laboratory-wide association scans in Vanderbilt Biobank (n = 85,460) and identify coagulation-related clinical symptoms and immunologic and blood-cell-related biomarkers. We replicate these findings across trans-ethnic studies and observe consistent effects in individuals of diverse ancestral backgrounds in Vanderbilt Biobank, pan-UK Biobank, and Biobank Japan. Our study highlights and reconfirms putative causal genes impacting COVID-19 severity and symptomology through the host inflammatory response.