Specific short oligonucleotide sequences that enhance pre-mRNA splicing when present in exons, termed exonic splicing enhancers (ESEs), play important roles in constitutive and alternative splicing. A computational method, RESCUE-ESE, was developed that predicts which sequences have ESE activity by statistical analysis of exon-intron and splice site composition. When large data sets of human gene sequences were used, this method identified 10 predicted ESE motifs. Representatives of all 10 motifs were found to display enhancer activity in vivo, whereas point mutants of these sequences exhibited sharply reduced activity. The motifs identified enable prediction of the splicing phenotypes of exonic mutations in human genes.
A typical gene contains two levels of information: a sequence that encodes a particular protein and a host of other signals that are necessary for the correct expression of the transcript. While much attention has been focused on the effects of sequence variation on the amino acid sequence, variations that disrupt gene processing signals can dramatically impact gene function. A variation that disrupts an exonic splicing enhancer (ESE), for example, could cause exon skipping, which would exclude an entire exon from the mRNA transcript. RESCUE-ESE, a computational approach used in conjunction with experimental validation, previously identified 238 candidate ESE hexamers in human genes. The RESCUE-ESE method has recently been implemented in three additional species: mouse, zebrafish and pufferfish. Here we describe an online ESE analysis tool (http://genes.mit.edu/burgelab/rescue-ese/) that annotates RESCUE-ESE hexamers in vertebrate exons and can be used to predict splicing phenotypes by identifying sequence changes that disrupt or alter predicted ESEs.
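The annotation step described in this abstract (marking candidate ESE hexamers in an exon, then checking which ones a sequence change disrupts or creates) can be illustrated with a minimal Python sketch. This is not the code behind the online tool. The function names are hypothetical, and `EXAMPLE_ESE_HEXAMERS` is a three-entry placeholder set invented for illustration, not the published list of 238 RESCUE-ESE hexamers.

```python
from typing import Dict, List, Set

# Placeholder hexamers for illustration only; NOT the published RESCUE-ESE set.
EXAMPLE_ESE_HEXAMERS: Set[str] = {"GAAGAA", "AAGAAG", "TGAAGA"}


def ese_hits(exon: str, hexamers: Set[str]) -> List[int]:
    """Return 0-based start positions of every 6-mer in `exon` found in `hexamers`."""
    exon = exon.upper()
    return [i for i in range(len(exon) - 5) if exon[i:i + 6] in hexamers]


def variant_effect(exon: str, pos: int, alt: str,
                   hexamers: Set[str]) -> Dict[str, List[int]]:
    """Compare hexamer hits before and after a single-base substitution.

    `pos` is the 0-based position of the substituted base and `alt` the new base.
    Returns the start positions of hexamer hits that are lost and gained.
    """
    exon = exon.upper()
    mutant = exon[:pos] + alt.upper() + exon[pos + 1:]
    before = set(ese_hits(exon, hexamers))
    after = set(ese_hits(mutant, hexamers))
    return {"lost": sorted(before - after), "gained": sorted(after - before)}
```

For example, `variant_effect("CCTGAAGAAGCC", 5, "C", EXAMPLE_ESE_HEXAMERS)` reports the three overlapping placeholder hexamers starting at positions 2, 3 and 4 as lost. A single substitution can remove several overlapping hexamers at once, which is why the sketch compares whole sets of hits rather than one motif at a time.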
The lack of tools to identify causative variants from sequencing data greatly limits the promise of precision medicine. Previous studies suggest that one-third of disease-associated alleles alter splicing. We discovered that the alleles causing splicing defects cluster in disease-associated genes (for example, haploinsufficient genes). We analyzed 4,964 published disease-causing exonic mutations using a massively parallel splicing assay (MaPSy), which showed an 81% concordance rate with splicing in patient tissue. Approximately 10% of exonic mutations altered splicing, mostly by disrupting multiple stages of spliceosome assembly. We present a large-scale characterization of exonic splicing mutations using a new technology that facilitates variant classification and keeps pace with variant discovery.
We present an intuitive strategy for predicting the effect of sequence variation on splicing. In contrast to transcriptional elements, splicing elements appear to be strongly position dependent. We demonstrated that exonic binding of the normally intronic splicing factor, U2AF65, inhibits splicing. Reasoning that the positional distribution of a splicing element is a signature of its function, we developed a method for organizing all possible sequence motifs into clusters based on the genomic profile of their positional distribution around splice sites. Binding sites for serine/arginine rich (SR) proteins tended to be exonic whereas heterogeneous nuclear ribonucleoprotein (hnRNP) recognition elements were mostly intronic. In addition to the known elements, novel motifs were returned and validated. This method was also predictive of splicing mutations. A mutation in a motif creates a new motif that sometimes has a similar distribution shape to the original motif and sometimes has a different distribution. We created an intraallelic distance measure to capture this property and found that mutations that created large intraallelic distances disrupted splicing in vivo whereas mutations with small distances did not alter splicing. Analyzing the dataset of human disease alleles revealed known splicing mutants to have high intraallelic distances and suggested that 22% of disease alleles that were originally classified as missense mutations may also affect splicing. This category, together with mutations in the canonical splicing signals, suggests that approximately one third of all disease-causing mutations alter pre-mRNA splicing.
Splicing is catalyzed by the spliceosome, a riboprotein complex that rivals the ribosome in size and complexity. The ribosome has a large and small subunit whose assembly on the mRNA substrate corresponds to a functional switch from initiation to elongation. The spliceosome is composed of five subunits that appear to exist in at least four different stable configurations and, like the ribosomal subunits, transition between different assembled states corresponding to different stages of function (1-3). Mass spectrometry has identified at least 300 RNA and protein components in this catalytic complex and studies have demonstrated heterogeneity in spliceosomal complexes isolated from different splicing substrates (4-6). The spliceosomal components that recognize the basic cis-elements of the splicing process are known. How the spliceosome assembles and reorganizes on these elements is also fairly well understood. However, several computational analyses estimate that these basic splicing elements contain at most half the information necessary for splice site recognition (7,8). The remaining information lies outside these splice sites, presumably as enhancers or silencers. This information required to specify splicing presents a considerable mutational target: estimates of the fraction of disease mutations that affect splicing range from 15% (9) to 62% (10). Transcript analysis of genotyped cell lines has dis...
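The idea of comparing a motif's positional distribution with that of its mutated form, described in the abstract above, can be sketched in Python. The abstract does not give the distance formula, so the total variation distance used here is an assumption, as are the function names and the representation of the input as equal-length sequence windows aligned on a splice site.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def positional_profiles(windows: Sequence[str], k: int = 6) -> Dict[str, List[float]]:
    """Build a normalised positional profile for every k-mer.

    `windows` are equal-length sequences aligned on a splice site. For each
    k-mer, count how often it starts at each offset, then scale the counts
    so that each profile sums to 1.
    """
    if len({len(w) for w in windows}) != 1:
        raise ValueError("all windows must have the same length")
    width = len(windows[0]) - k + 1
    counts: Dict[str, List[float]] = defaultdict(lambda: [0.0] * width)
    for w in windows:
        w = w.upper()
        for i in range(width):
            counts[w[i:i + k]][i] += 1.0
    return {motif: [c / sum(p) for c in p] for motif, p in counts.items()}


def profile_distance(profiles: Dict[str, List[float]],
                     ref_motif: str, alt_motif: str) -> float:
    """Total variation distance between two positional profiles (assumed metric).

    Returns 0.0 for identical distributions and 1.0 for non-overlapping ones.
    Raises KeyError if either motif never occurs in the windows.
    """
    p, q = profiles[ref_motif], profiles[alt_motif]
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

Under this assumed metric, a substitution that turns a motif into one with a very different positional profile scores close to 1, and one that preserves the profile scores close to 0. This mirrors the large versus small intraallelic distances contrasted in the abstract.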
Predicting the effects of genetic variants on splicing is highly relevant for human genetics. We describe the framework MMSplice (modular modeling of splicing) with which we built the winning model of the CAGI5 exon skipping prediction challenge. The MMSplice modules are neural networks scoring exon, intron, and splice sites, trained on distinct large-scale genomics datasets. These modules are combined to predict effects of variants on exon skipping, splice site choice, splicing efficiency, and pathogenicity, with performance matching or exceeding the state of the art. Our models, available in the repository Kipoi, apply to variants including indels directly from VCF files. Electronic supplementary material: the online version of this article (10.1186/s13059-019-1653-z) contains supplementary material, which is available to authorized users.
Because deleterious alleles arising from mutation are filtered by natural selection, mutations that create such alleles will be underrepresented in the set of common genetic variation existing in a population at any given time. Here, we describe an approach based on this idea called VERIFY (variant elimination reinforces functionality), which can be used to assess the extent of natural selection acting on an oligonucleotide motif or set of motifs predicted to have biological activity. As an application of this approach, we analyzed a set of 238 hexanucleotides previously predicted to have exonic splicing enhancer (ESE) activity in human exons using the relative enhancer and silencer classification by unanimous enrichment (RESCUE)-ESE method. Aligning the single nucleotide polymorphisms (SNPs) from the public human SNP database to the chimpanzee genome allowed inference of the direction of the mutations that created present-day SNPs. Analyzing the set of SNPs that overlap RESCUE-ESE hexamers, we conclude that nearly one-fifth of the mutations that disrupt predicted ESEs have been eliminated by natural selection (odds ratio = 0.82 ± 0.05). This selection is strongest for the predicted ESEs that are located near splice sites. Our results demonstrate a novel approach for quantifying the extent of natural selection acting on candidate functional motifs and also suggest certain features of mutations/SNPs, such as proximity to the splice site and disruption or alteration of predicted ESEs, that should be useful in identifying variants that might cause a biological phenotype.
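The odds ratio reported in this abstract can be read as a ratio of odds in a 2x2 table. The Python sketch below shows that arithmetic with a standard Wald (Woolf) confidence interval. The table layout, the function name and the interval method are assumptions made for illustration. The abstract does not state how its ± 0.05 uncertainty was derived, so this should not be read as the authors' procedure.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """Odds ratio (a/b) / (c/d) with a 95% Wald confidence interval.

    Assumed layout of the 2x2 table (hypothetical, for illustration):
      a = observed SNPs that disrupt a predicted ESE
      b = observed SNPs that do not
      c = reference (expected) mutations that disrupt a predicted ESE
      d = reference (expected) mutations that do not
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four counts must be positive")
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio), Woolf's method.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - 1.96 * se)
    upper = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, lower, upper
```

An odds ratio of 0.82 corresponds to an 18% depletion of ESE-disrupting variants relative to expectation, which is the sense in which the abstract says that nearly one-fifth of such mutations have been eliminated by selection.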
Objective: To identify causative genes for centronuclear myopathies (CNM), a heterogeneous group of rare inherited muscle disorders that often present in infancy or early life with weakness and hypotonia, using next-generation sequencing of whole exomes and genomes. Methods: Whole-exome or -genome sequencing was performed in a cohort of 29 unrelated patients with clinicopathologic diagnoses of CNM or related myopathy depleted for cases with mutations of MTM1, DNM2, and BIN1. Immunofluorescence analyses on muscle biopsies, splicing assays, and gel electrophoresis of patient muscle proteins were performed to determine the molecular consequences of mutations of interest. Results: Autosomal recessive compound heterozygous truncating mutations of the titin gene, TTN, were identified in 5 individuals. Biochemical analyses demonstrated increased titin degradation and truncated titin proteins in patient muscles, establishing the impact of the mutations. Conclusions: Our study identifies truncating TTN mutations as a cause of congenital myopathy that is reported as CNM. Unlike the classic CNM genes that are all involved in excitation-contraction coupling at the triad, TTN encodes the giant sarcomeric protein titin, which forms a myofibrillar backbone for the components of the contractile machinery. This study expands the phenotypic spectrum associated with TTN mutations and indicates that TTN mutation analysis should be considered in cases of possible CNM without mutations in the classic CNM genes.
The coding sequence of each human pre-mRNA is interrupted, on average, by 11 introns that must be spliced out for proper gene expression. Each intron contains three obligate signals: a 5' splice site, a branch site, and a 3' splice site. Splice site usage has been mapped exhaustively across different species, cell types, and cellular states. In contrast, only a small fraction of branch sites have been identified even once. The few reported annotations of branch sites are imprecise because reverse transcriptase skips several nucleotides while traversing a 2'-5' linkage. Here, we report large-scale mapping of the branchpoints from deep sequencing data in three different species and in the SF3B1 K700E oncogenic mutant background. We have developed a novel method whereby raw lariat reads are refined by U2 snRNP/pre-mRNA base-pairing models to return the largest current data set of branchpoint sequences with quality metrics. This analysis discovers novel modes of U2 snRNA:pre-mRNA base-pairing conserved in yeast and provides insight into the biogenesis of intron circles. Finally, matching branch site usage with isoform selection across the extensive panel of ENCODE RNA-seq data sets offers insight into the mechanisms by which branchpoint usage drives alternative splicing.