The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues, and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the v8 data, based on 17,382 RNA-sequencing samples from 54 tissues of 948 post-mortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue-specificity of genetic effects, and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.
Most variants implicated in common human disease by Genome-Wide Association Studies (GWAS) lie in non-coding sequence intervals. Despite the suggestion that regulatory element disruption represents a common theme, identifying causal risk variants within indicted genomic regions remains a significant challenge. Here we present a novel sequence-based computational method to predict the effect of regulatory variation, using a classifier (gkm-SVM) which encodes cell-specific regulatory sequence vocabularies. The induced change in the gkm-SVM score, deltaSVM, quantifies the effect of variants. We show that deltaSVM accurately predicts the impact of SNPs on DNase I sensitivity in their native genomic context, and accurately predicts the results of dense mutagenesis of several enhancers in reporter assays. Previously validated GWAS SNPs yield large deltaSVM scores, and we predict novel risk SNPs for several autoimmune diseases. Thus, deltaSVM provides a powerful computational approach for systematically identifying functional regulatory variants.
Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk1,2,3,4. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants1,5. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles1,6,7, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues8,9,10,11, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release12. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
Genetic regulation of gene expression is dynamic, as transcription can change during cell differentiation and across cell types. We mapped expression quantitative trait loci (eQTLs) throughout differentiation to elucidate the dynamics of genetic effects on cell type specific gene expression. We generated time-series RNA-sequencing data, capturing 16 time points from induced pluripotent stem cells to cardiomyocytes, in 19 human cell lines. We identified hundreds of dynamic eQTLs that change over time, with enrichment in enhancers of relevant cell types. We also found nonlinear dynamic eQTLs, which affect only intermediate stages of differentiation, and cannot be found by using data from mature tissues. These fleeting genetic associations with gene regulation may represent a new mechanism to explain complex traits and disease. We highlight one example of a nonlinear eQTL that is associated with body mass index.
Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.
Rare genetic variants are abundant across the human genome, and identifying their function and phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes of RVs. We developed Watershed, a probabilistic model that integrates multiple genomic and transcriptomic signals to predict variant function, validated these predictions in additional cohorts and through experimental assays, and used them to assess RVs in the UK Biobank, the Million Veterans Program, and the Jackson Heart Study. Our results link thousands of RVs to diverse molecular effects and provide evidence to associate RVs affecting the transcriptome with human traits.
Merkel cell carcinoma (MCC) frequently contains integrated copies of Merkel cell polyomavirus DNA that express a truncated form of Large T antigen (LT) and an intact Small T antigen (ST). While LT binds RB and inactivates its tumor suppressor function, it is less clear how ST contributes to MCC tumorigenesis. Here we show that ST binds specifically to the MYC homolog MYCL (L-MYC) and recruits it to the 15-component EP400 histone acetyltransferase and chromatin remodeling complex. We performed a large-scale immunoprecipitation for ST and identified co-precipitating proteins by mass spectrometry. In addition to protein phosphatase 2A (PP2A) subunits, we identified MYCL and its heterodimeric partner MAX plus the EP400 complex. Immunoprecipitation for MAX and EP400 complex components confirmed their association with ST. We determined that the ST-MYCL-EP400 complex binds together to specific gene promoters and activates their expression by integrating chromatin immunoprecipitation with sequencing (ChIP-seq) and RNA-seq. MYCL and EP400 were required for maintenance of cell viability and cooperated with ST to promote gene expression in MCC cell lines. A genome-wide CRISPR-Cas9 screen confirmed the requirement for MYCL and EP400 in MCPyV-positive MCC cell lines. We demonstrate that ST can activate gene expression in a EP400 and MYCL dependent manner and this activity contributes to cellular transformation and generation of induced pluripotent stem cells.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.