The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.
Identifying regulatory genetic effects in pluripotent cells provides important insights into disease variants with potentially transient or developmental origins. Combining existing and newly generated data, we established a population-scale resource of 1,367 induced pluripotent stem cell (iPSC) lines derived from 948 unique donors, with matched RNA-sequencing (RNA-seq) and genetic information. The sample size of our study allowed us to significantly expand our knowledge of quantitative trait loci (QTL) in pluripotent human cells and their broad trait and disease relevance. We identified cis-QTL not only for conventional gene expression levels but also for transcript ratios, exon expression levels, alternative splicing and alternative polyadenylation usage, cumulatively yielding cis regulatory effects for 18,556 genes. We assessed the effects of rare variants identified from whole-genome sequencing data by relating them to gene expression outliers, observing an enrichment for rare, deleterious variants in iPSCs with outlying expression profiles that exceeds previous observations in differentiated tissues. Finally, we assessed distal QTL on a genome-wide scale, identifying 193 trans-eQTL linked to 191 trans-eGenes (FDR <10%), 38% of which were replicated in independent samples. By linking our regulatory map of rare and common QTL to comprehensive GWAS data, we identified high-confidence colocalization events for 4,336 individual GWAS loci, including loci for physical traits such as height and for diseases such as coronary artery disease. In addition, we identify rare variant associations for metabolic rate and trans-eQTL linked to both cancer and height. Collectively, our data greatly expand the regulatory landscape in human pluripotent cells and catalogue trait-associated variants that have potential developmental or transient contexts.
RNA sequencing (RNA-seq) is a complementary approach for Mendelian disease diagnosis for patients in whom exome-sequencing is not informative. For both rare neuromuscular and mitochondrial disorders, its application has improved diagnostic rates. However, the generalizability of this approach to diverse Mendelian diseases has yet to be evaluated. We sequenced whole blood RNA from 56 cases with undiagnosed rare diseases spanning 11 diverse disease categories to evaluate the general application of RNA-seq to Mendelian disease diagnosis. We developed a robust approach to compare rare disease cases to existing large sets of RNA-seq controls (N=1,594 external and N=31 family-based controls) and demonstrated the substantial impacts of gene and variant filtering strategies on disease gene identification when combined with RNA-seq. Across our cohort, we observed that RNA-seq yields an 8.5% diagnostic rate. These diagnoses included diseases where blood would not intuitively reflect evidence of disease. We identified RARS2 as an under-expression outlier containing compound heterozygous pathogenic variants for an individual exhibiting profound global developmental delay, seizures, microcephaly, hypotonia, and progressive scoliosis. We also identified a new splicing junction in KCTD7 for an individual with global developmental delay, loss of milestones, tremors and seizures. Our study provides a broad evaluation of blood RNA-seq for the diagnosis of rare disease.
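The idea of flagging a gene as an under-expression outlier relative to a control set can be illustrated with a minimal sketch. It assumes a plain z-score against control samples and a cutoff of -2; the abstract does not specify the statistic or threshold actually used, so the function names, the cutoff, and the toy values below are all hypothetical.

```python
from statistics import mean, stdev


def expression_z_score(case_value, control_values):
    """Z-score of one case's expression for a gene against control samples.

    Assumption: a simple z-score is used here only for illustration;
    the study's actual outlier statistic is not given in the abstract.
    """
    mu = mean(control_values)
    sd = stdev(control_values)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("controls have zero variance; z-score undefined")
    return (case_value - mu) / sd


def is_under_expression_outlier(case_value, control_values, threshold=-2.0):
    """True if the case falls at or below the (hypothetical) z-score cutoff."""
    return expression_z_score(case_value, control_values) <= threshold


# Toy, made-up normalized expression values for a single gene.
controls = [10.0, 11.0, 9.0, 10.0, 10.0]
case = 7.0
z = expression_z_score(case, controls)
flagged = is_under_expression_outlier(case, controls)
```

In practice such a comparison would be run per gene after normalization and correction for technical covariates, which this sketch leaves out.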
Long non-coding RNA (lncRNA) genes are known to have diverse impacts on gene regulation. However, it is still a major challenge to distinguish functional lncRNAs from those that are byproducts of surrounding transcriptional activity. To systematically identify hallmarks of biological function, we used the GTEx v8 data to profile the expression, regulation, network relationships and trait associations of lncRNA genes across 49 tissues encompassing 87 distinct traits. In addition to revealing widespread differences in regulatory patterns between lncRNA and protein-coding genes, we identified novel disease-associated lncRNAs, such as C6orf3 for psoriasis and LINC01475/RP11-129J12.1 for ulcerative colitis. This work provides a comprehensive resource to interrogate lncRNA genes of interest and annotate cell type and human trait relevance. One Sentence Summary: lncRNA genes have distinctive regulatory patterns and unique trait associations compared to protein-coding genes.
Polygenic risk scores (PRS) aim to quantify the contribution of multiple genetic loci to an individual’s likelihood of a complex trait or disease. However, existing PRS estimate genetic liability using common genetic variants, excluding the impact of rare variants. We identified rare, large-effect variants in individuals with outlier gene expression from the GTEx project and then assessed their impact on PRS predictions in the UK Biobank (UKB). We observed large deviations from the PRS-predicted phenotypes for carriers of multiple outlier rare variants; for example, individuals classified as “low-risk” but in the top 1% of outlier rare variant burden had a 6-fold higher rate of severe obesity. We replicated these findings using data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) biobank and the Million Veteran Program, and demonstrated that PRS across multiple traits will significantly benefit from the inclusion of rare genetic variants.
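As a rough illustration of what a PRS aggregates, the sketch below computes the standard additive form: a weighted sum of an individual's allele dosages, with one effect-size weight per variant. This is a generic textbook formulation, not the specific scoring method of the study; the function name, dosages, and weights are made up for the example.

```python
def polygenic_score(dosages, weights):
    """Additive polygenic score for one individual.

    dosages: allele counts per variant (0, 1, or 2 copies of the effect allele)
    weights: per-allele effect sizes, in the same variant order
    Both inputs are hypothetical toy values in this sketch.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Toy example: three variants with made-up effect sizes.
toy_weights = [0.5, -0.25, 1.0]
score_a = polygenic_score([0, 1, 2], toy_weights)
score_b = polygenic_score([2, 0, 1], toy_weights)
```

A score built this way from common variants only would not reflect a rare, large-effect variant carried by an individual, which is the gap the abstract describes.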