Highlights
- SpliceAI, a 32-layer deep neural network, predicts splicing from a pre-mRNA sequence
- 75% of predicted cryptic splice variants validate on RNA-seq
- Cryptic splicing may yield 10% of pathogenic variants in neurodevelopmental disorders
- Cryptic splice variants frequently give rise to alternative splicing

A deep neural network precisely models mRNA splicing from a genomic sequence and accurately predicts noncoding cryptic splice mutations in patients with rare genetic diseases.

SUMMARY
The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%-11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.

Figure legend (continued):
(F) Relationship between exon-intron length and the strength of the adjoining splice sites, as predicted by SpliceAI-80 nt (local motif score) and SpliceAI-10k. The genome-wide distributions of exon length (yellow) and intron length (pink) are shown in the background. The x axis is in log-scale.
(G) A pair of splice acceptor and donor motifs, placed 150 nt apart, are walked along the HMGCR gene. Shown are, at each position, K562 nucleosome signal and the likelihood of the pair forming an exon at that position, as predicted by SpliceAI-10k. The genome-wide Spearman correlation between the two tracks is shown.
(H) Average K562 and GM12878 nucleosome signal near private mutations that are predicted by the SpliceAI-10k model to create novel exons in the GTEx cohort.
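Panel (G) reports a Spearman correlation between two per-position tracks (nucleosome signal and predicted exon likelihood). As a minimal sketch of that statistic, the following computes it in plain Python by ranking both tracks and taking the Pearson correlation of the ranks. The function names and the toy values are hypothetical and purely illustrative; they are not data or code from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant track")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("tracks must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy tracks, one value per genomic position (not real data).
nucleosome_signal = [0.2, 0.9, 1.4, 0.7, 2.1, 1.8]
exon_likelihood = [0.01, 0.10, 0.35, 0.05, 0.80, 0.60]
rho = spearman(nucleosome_signal, exon_likelihood)
```

Because the statistic depends only on ranks, it captures any monotone relationship between the tracks and is insensitive to their differing scales.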
Sequence-based variation in gene expression is a key driver of disease risk. Common variants regulating expression in cis have been mapped in many expression quantitative trait locus (eQTL) studies, typically in single tissues from unrelated individuals. Here, we present a comprehensive analysis of gene expression across multiple tissues conducted in a large set of mono- and dizygotic twins that allows systematic dissection of genetic (cis and trans) and non-genetic effects on gene expression. Using identity-by-descent estimates, we show that at least 40% of the total heritable cis effect on expression cannot be accounted for by common cis variants, a finding that reveals the contribution of low-frequency and rare regulatory variants with respect to both transcriptional regulation and complex trait susceptibility. We show that a substantial proportion of gene expression heritability is trans to the structural gene, and we identify several replicating trans variants that act predominantly in a tissue-restricted manner and may regulate the transcription of many genes.
Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and gene expression datasets to identify gene-trait associations. In this Perspective, we explore properties of TWAS as a potential approach to prioritize causal genes at GWAS loci, by using simulations and case studies of literature-curated candidate causal genes for schizophrenia, low-density-lipoprotein cholesterol and Crohn's disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene as well as loci where TWAS prioritizes multiple genes, some likely to be non-causal, owing to sharing of expression quantitative trait loci (eQTL). TWAS is especially prone to spurious prioritization with expression data from non-trait-related tissues or cell types, owing to substantial cross-cell-type variation in expression levels and eQTL strengths. Nonetheless, TWAS prioritizes candidate causal genes more accurately than simple baselines. We suggest best practices for causal-gene prioritization with TWAS and discuss future opportunities for improvement. Our results showcase the strengths and limitations of using eQTL datasets to determine causal genes at GWAS loci.
The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable splicing events from short-read RNA-seq data and finds events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both for detecting differential splicing between sample groups, and for mapping splicing quantitative trait loci (sQTLs). Compared to contemporary methods, we find 1.4–2.1 times more sQTLs, many of which help us ascribe molecular effects to disease-associated variants. Strikingly, transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at 5% FDR by an average of 2.1-fold as compared to using gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available online.
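The abstract above does not spell out how intron usage is quantified without transcript annotations. As a rough sketch, and assuming an approach in which introns that share a splice site are grouped into clusters and each intron's usage is expressed as its fraction of the cluster's split reads, the idea might look like the following. This is a simplified illustration under that stated assumption, not LeafCutter's actual code or API; all names and read counts are hypothetical, and strand is ignored for brevity.

```python
from collections import defaultdict


def cluster_introns(introns):
    """Group introns (chrom, start, end) that share a start or end coordinate.

    Clusters are the connected components of the "shares a splice site"
    relation, found here with a small union-find structure.
    """
    parent = {intron: intron for intron in introns}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # splice site -> first intron observed using it
    for intron in introns:
        chrom, start, end = intron
        for site in ((chrom, "start", start), (chrom, "end", end)):
            if site in first_seen:
                union(first_seen[site], intron)
            else:
                first_seen[site] = intron

    clusters = defaultdict(list)
    for intron in introns:
        clusters[find(intron)].append(intron)
    return list(clusters.values())


def excision_ratios(counts):
    """Map each intron to its share of the split reads in its cluster."""
    ratios = {}
    for cluster in cluster_introns(list(counts)):
        total = sum(counts[intron] for intron in cluster)
        for intron in cluster:
            ratios[intron] = counts[intron] / total if total else 0.0
    return ratios


# Hypothetical split-read counts per intron (not real data).
toy_counts = {
    ("chr1", 100, 200): 30,
    ("chr1", 100, 300): 10,
    ("chr1", 250, 300): 20,
    ("chr1", 500, 600): 5,
}
ratios = excision_ratios(toy_counts)
```

In this toy input the first three introns form one cluster, because the first two share a start coordinate and the last two share an end coordinate, while the fourth intron forms a cluster of its own.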
Noncoding variants play a central role in the genetics of complex traits, but we still lack a full understanding of the molecular pathways through which they act. We quantified the contribution of cis-acting genetic effects at all major stages of gene regulation from chromatin to proteins, in Yoruba lymphoblastoid cell lines (LCLs). About 65% of expression quantitative trait loci (eQTLs) have primary effects on chromatin, whereas the remaining eQTLs are enriched in transcribed regions. Using a novel method, we also detected 2893 splicing QTLs, most of which have little or no effect on gene-level expression. These splicing QTLs are major contributors to complex traits, roughly on a par with variants that affect gene expression levels. Our study provides a comprehensive view of the mechanisms linking genetic variation to variation in human gene regulation.
Amyotrophic lateral sclerosis (ALS) is a rapidly progressing neurodegenerative disease characterized by motor neuron loss, leading to paralysis and death 2–5 years following disease onset [1]. Nearly all ALS patients contain aggregates of the RNA-binding protein TDP-43 in the brain and spinal cord [2], and rare mutations in the gene encoding TDP-43 can cause ALS [3]. There are no effective TDP-43-directed therapies for ALS or related TDP-43 proteinopathies, such as frontotemporal dementia (FTD). Antisense oligonucleotides (ASOs) and RNA interference approaches are emerging as attractive therapeutic strategies in neurological diseases [4]. Indeed, treating a rodent model of inherited ALS (caused by a mutation in SOD1) with ASOs to SOD1 significantly slowed disease progression [5]. But since SOD1 mutations account for only ~2–5% of ALS cases, additional therapeutic strategies are needed. Silencing TDP-43 itself is probably not warranted given its critical cellular functions [1,6]. Here we present an unexpectedly powerful alternative therapeutic strategy for ALS, by targeting ataxin 2. Lowering ataxin 2 suppresses TDP-43 toxicity in yeast and flies [7], and intermediate-length polyglutamine expansions in the ataxin 2 gene increase risk of ALS [7,8]. We used two independent approaches to test whether reducing ataxin 2 levels could mitigate disease in a mouse model of TDP-43 proteinopathy [9]. First, we crossed ataxin 2 knockout mice to TDP-43 transgenic mice. Lowering ataxin 2 reduced TDP-43 aggregation, had a dramatic effect on survival and improved motor function. Second, in a more therapeutically applicable approach, we administered ASOs targeting ataxin 2 to the central nervous system of TDP-43 mice. This single treatment markedly extended survival. Because TDP-43 aggregation is a component of nearly all ALS cases [6], targeting ataxin 2 could represent a broadly effective therapeutic strategy.
A hallmark of the immune system is the interplay among specialized cell types transitioning between resting and stimulated states. The gene regulatory landscape of this dynamic system has not been fully characterized in human cells. Here, we collected ATAC-seq and RNA-seq data under resting and stimulated conditions for up to 32 immune cell populations. Stimulation caused widespread chromatin remodeling, including response elements shared between stimulated B and T cells. Furthermore, several autoimmune traits showed significant heritability in stimulation-responsive elements from distinct cell types, highlighting the importance of these cell states in autoimmunity. Use of allele-specific read-mapping identified variants that alter chromatin accessibility in particular conditions, allowing us to observe evidence of function for a candidate causal variant that is undetected by existing large-scale studies in resting cells. Our results provide a resource of chromatin dynamics and highlight the need for characterization of effects of genetic variation in stimulated cells.