The lack of tools to identify causative variants from sequencing data greatly limits the promise of precision medicine. Previous studies suggest that one-third of disease-associated alleles alter splicing. We discovered that the alleles causing splicing defects cluster in disease-associated genes (for example, haploinsufficient genes). We analyzed 4,964 published disease-causing exonic mutations using a massively parallel splicing assay (MaPSy), which showed an 81% concordance rate with splicing in patient tissue. Approximately 10% of exonic mutations altered splicing, mostly by disrupting multiple stages of spliceosome assembly. We present a large-scale characterization of exonic splicing mutations using a new technology that facilitates variant classification and keeps pace with variant discovery.
Predicting the effects of genetic variants on splicing is highly relevant for human genetics. We describe the framework MMSplice (modular modeling of splicing) with which we built the winning model of the CAGI5 exon skipping prediction challenge. The MMSplice modules are neural networks scoring exon, intron, and splice sites, trained on distinct large-scale genomics datasets. These modules are combined to predict effects of variants on exon skipping, splice site choice, splicing efficiency, and pathogenicity, with performance matching or exceeding the state of the art. Our models, available in the repository Kipoi, apply to variants including indels directly from VCF files. Electronic supplementary material: the online version of this article (doi:10.1186/s13059-019-1653-z) contains supplementary material, which is available to authorized users.
Decades of research have shown that mutations in the p53 stress response pathway affect the incidence of diverse cancers more than mutations in other pathways. However, most evidence is limited to somatic mutations and rare inherited mutations. Using newly abundant genomic data, we demonstrate that commonly inherited genetic variants in the p53 pathway also affect the incidence of a broad range of cancers more than variants in other pathways. The cancer-associated single nucleotide polymorphisms (SNPs) of the p53 pathway have strikingly similar genetic characteristics to well-studied p53 pathway cancer-causing somatic mutations. Our results enable insights into p53-mediated tumour suppression in humans and into p53 pathway-based cancer surveillance and treatment strategies.
Research into the problem of splice site selection has followed a reductionist approach focused on how individual splice sites are recognized. Early applications of information theory uncovered an inconsistency. Human splice signals do not contain enough information to explain the observed fidelity of splicing. Here, we conclude that introns do not necessarily contain ‘missing’ information but rather may require definition from neighboring processing events. For example, there are known cases where an intronic mutation disrupts the splicing of not only the local intron but also adjacent introns. We present a genome-wide measurement of the order of splicing within human transcripts. The observed order of splicing cannot be explained by a simple kinetic model. Simulations reveal a bias toward a particular, transcript-specific order of intron removal in human genes. We validate an extreme class of intron that can only splice in a multi-intron context. Special categories of splicing such as exon circularization, first and last intron processing, alternative 5′ and 3′ss usage and exon skipping are marked by distinct patterns of ordered intron removal. Excessive intronic length and silencer density tend to delay splicing. Shorter introns that contain enhancers splice early.
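The information-theoretic argument mentioned in this abstract can be illustrated with a minimal sketch that computes per-position information content (in bits) for a set of aligned splice site sequences. This is not the authors' code: the function name `information_content`, the uniform background assumption, and the toy sequences are all hypothetical, and no small-sample correction is applied.

```python
import math
from collections import Counter

def information_content(aligned_sites, alphabet="ACGT"):
    """Per-position information (bits) for equal-length aligned sequences.

    Assumes a uniform background, so each position can carry at most
    log2(len(alphabet)) bits. No small-sample correction is applied.
    """
    if not aligned_sites:
        raise ValueError("need at least one sequence")
    length = len(aligned_sites[0])
    if any(len(s) != length for s in aligned_sites):
        raise ValueError("sequences must have equal length")

    max_bits = math.log2(len(alphabet))
    bits = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned_sites)
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            # No recognised bases at this position: treat as uninformative.
            bits.append(0.0)
            continue
        entropy = 0.0
        for b in alphabet:
            p = counts[b] / total
            if p > 0:
                entropy -= p * math.log2(p)
        bits.append(max_bits - entropy)
    return bits

# Hypothetical toy 5' splice site windows, for illustration only (not real data).
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TGGGTAAGT"]
per_position = information_content(toy_sites)
total_bits = sum(per_position)
```

Summing the per-position values gives the total information in the motif, which is the quantity that would be compared against the information needed to locate a site in a long transcript.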
Pre-mRNA splicing is mediated by interactions of the core spliceosome and an array of accessory RNA binding proteins with cis-sequence elements. Splicing is a major regulatory component in higher eukaryotes. Disruptions in splicing are a major contributor to human disease. One in three hereditary disease alleles is believed to cause aberrant splicing. Hereditary disease alleles can alter splicing by disrupting a splicing element, creating a toxic RNA, or affecting splicing factors. One of the challenges of medical genetics is identifying causal variants from the thousands of possibilities discovered in a clinical sequencing experiment. Here we review the basic biochemistry of splicing, the mechanisms of splicing mutations, the methods for identifying splicing mutants, and the potential of therapeutic interventions.
RNA secondary structure plays an integral role in catalytic, ribosomal, small nuclear, micro, and transfer RNAs. Discovering a prevalent role for secondary structure in pre-mRNAs has proven more elusive. By utilizing a variety of computational and biochemical approaches, we present evidence for a class of nuclear introns that relies upon secondary structure for correct splicing. These introns are defined by simple repeat expansions of complementary AC and GT dimers that co-occur at opposite boundaries of an intron to form a bridging structure that enforces correct splice site pairing. Remarkably, this class of introns does not require U2AF2, a core component of the spliceosome, for its processing. Phylogenetic analysis suggests that this mechanism was present in the ancestral vertebrate lineage prior to the divergence of tetrapods from teleosts. While largely lost from land-dwelling vertebrates, this class of introns is found in 10% of all zebrafish genes. [Supplemental material is available for this article.]
RNA splicing is a process that removes an internal segment of RNA (i.e., the intron) and rejoins the two flanking segments (exons). Distinct but evolutionarily related versions of this processing reaction are found in prokaryotes and eukaryotes in a variety of different contexts. In eukaryotes, the splicing of nuclear introns is catalyzed by a large riboprotein complex called the spliceosome (Matlin and Moore 2007). RNAs encoded by genes in organelles and some bacterial genomes contain self-splicing group I and II introns which catalyze their own removal (Cech et al. 1981). A basic problem for all introns is the correct identification and pairing of the splice sites. In group I and II introns, this pairing function is performed by RNA secondary structure alone, whereas in spliceosomal introns, small nuclear ribonucleoproteins (snRNPs) recognize and pair together the correct 5′ splice site (5′ss) and branchpoint site (BP).
However, there are some examples where the pairing of sites is assisted by intramolecular secondary structure in the intron (Goguel and Rosbash 1993; Libri et al. 1995; Charpentier and Rosbash 1996; Howe and Ares 1997; Spingola et al. 1999). In addition, there are some fascinating examples of how secondary structures can regulate mutually exclusive alternative splicing (Warf and Berglund 2007; McManus and Graveley 2011): Several regions of the Dscam1 pre-mRNA undergo extensive alternative splicing. In one of these regions, an upstream "selector" sequence near exon 5 can select from an array of 48 complementary downstream "docking" sequences. Each "docking" sequence can potentially base-pair with the "selector" sequence, thereby bringing an alternate version of exon 6 to splice to exon 5.
Secondary structure in RNA can be identified experimentally or computationally. There are currently around a thousand publicly available structures: 53% determined by X-ray crystallography and 47% by solution NMR (Bernstein et al. 1977). There have been a great many advances in computational approaches t...
Pre-mRNA molecules can form a variety of structures, and both secondary and tertiary structures have important effects on processing, function and stability of these molecules. The prediction of RNA secondary structure is a challenging problem, and various algorithms that use minimum free energy, maximum expected accuracy and comparative evolutionary methods have been developed to predict secondary structures. However, these tools are not perfect, and this remains an active area of research. The secondary structure of pre-mRNA molecules can have an enhancing or inhibitory effect on pre-mRNA splicing. An example of enhancing structure can be found in a novel class of introns in zebrafish. About 10% of zebrafish genes contain a structured intron that forms a bridging hairpin that enforces correct splice site pairing. Examples of inhibitory structure include local structures around splice sites that decrease splicing efficiency and potentially cause mis-splicing leading to disease. Splicing mutations are a frequent cause of hereditary disease. The transcripts of disease genes are significantly more structured around the splice sites, and point mutations that increase the local structure often cause splicing disruptions. Post-splicing, RNA secondary structure can also affect the stability of the spliced intron and regulatory RNA interference pathway intermediates, such as pre-microRNAs. Additionally, RNA secondary structure has important roles in the innate immune defense against viruses. Finally, tertiary structure can also play a large role in pre-mRNA splicing. One example is the G-quadruplex structure, which, similar to secondary structure, can either enhance or inhibit splicing through mechanisms such as creating or obscuring RNA binding protein sites.
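The structured zebrafish introns described here are defined by complementary AC and GT repeats at opposite intron boundaries. A minimal sketch of how such candidates might be flagged from sequence alone follows. It is a crude heuristic and not the published method: the function names, the 100-nt search window, and the 12-nt minimum repeat length are hypothetical choices made for illustration.

```python
import re

def longest_dimer_run(seq, dimer):
    """Length in nucleotides of the longest uninterrupted run of `dimer` in `seq`."""
    best = 0
    for match in re.finditer(f"(?:{re.escape(dimer)})+", seq):
        best = max(best, match.end() - match.start())
    return best

def has_bridging_repeats(intron, window=100, min_repeat_nt=12):
    """Flag introns with an AC run near one end and a GT run near the other.

    `window` and `min_repeat_nt` are hypothetical thresholds. RNA input
    is accepted by converting U to T before searching.
    """
    intron = intron.upper().replace("U", "T")
    head, tail = intron[:window], intron[-window:]
    ac_then_gt = (longest_dimer_run(head, "AC") >= min_repeat_nt
                  and longest_dimer_run(tail, "GT") >= min_repeat_nt)
    gt_then_ac = (longest_dimer_run(head, "GT") >= min_repeat_nt
                  and longest_dimer_run(tail, "AC") >= min_repeat_nt)
    return ac_then_gt or gt_then_ac

# Hypothetical toy introns, for illustration only (not real sequences).
filler = "TTTTCCCC" * 10
structured = "GTAAGT" + "AC" * 8 + filler + "GT" * 8 + "TTTCAG"
unstructured = "GTAAGT" + filler + "TTTCAG"

is_structured = has_bridging_repeats(structured)
is_unstructured = has_bridging_repeats(unstructured)
```

A sequence-only filter like this says nothing about whether the repeats actually base-pair in vivo; a structure prediction tool or experimental probing would be needed to support that.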
Substitutions that disrupt pre-mRNA splicing are a common cause of genetic disease. On average, 13.4% of all hereditary disease alleles are classified as splicing mutations mapping to the canonical 5′ and 3′ splice sites. However, splicing mutations present in exons and deeper intronic positions are vastly underreported. A recent re-analysis of coding mutations in exon 10 of the Lynch Syndrome gene, MLH1, revealed an extremely high rate (77%) of mutations that lead to defective splicing. This finding is confirmed by extending the sampling to five other exons in the MLH1 gene. Further analysis suggests a more general phenomenon of defective splicing driving Lynch Syndrome. Of the 36 mutations tested, 11 disrupted splicing. Furthermore, an analysis of past reports suggests that MLH1 mutations in canonical splice sites also occupy a much higher fraction (36%) of total mutations than expected. When performing a comprehensive analysis of splicing mutations in human disease genes, we found that three main causal genes of Lynch Syndrome, MLH1, MSH2, and PMS2, belonged to a class of 86 disease genes which are enriched for splicing mutations. Other cancer genes were also enriched in the 86 susceptible genes. The enrichment of splicing mutations in hereditary cancers strongly argues for additional priority in interpreting clinical sequencing data in relation to cancer and splicing.