“…Several splice-site effect predictors (discussed later) predict the impact of both intronic and exonic variations on splicing. The Silent Variation Analyzer (SilVA) is a method for prioritization of harmful synonymous variations [Buske et al, 2013]. The majority of the variations in the training data lead to a splicing defect but there are also variations that alter the methylation pattern or translational efficiency.…”
Section: Predictors For Synonymous Variations (mentioning)
Next-generation sequencing methods have revolutionized the speed of generating variation information. Sequence data have a plethora of applications and will increasingly be used for disease diagnosis. Interpretation of the identified variants is usually not possible with experimental methods. This has caused a bottleneck that many computational methods aim at addressing. Fast and efficient methods for explaining the significance and mechanisms of detected variants are required for efficient precision/personalized medicine. Computational prediction methods have been developed in three areas to address the issue. There are generic tolerance (pathogenicity) predictors for filtering harmful variants. Gene/protein/disease-specific tools are available for some applications. Mechanism and effect-specific computer programs aim at explaining the consequences of variations. Here, we discuss the different types of predictors and their applications. We review available variation databases and prediction methods useful for variation interpretation. We discuss how the performance of methods is assessed and summarize existing assessment studies. A brief introduction is provided to the principles of the methods developed for variation interpretation as well as guidelines for how to choose the optimal tools and where the field is heading in the future.
“…One of the challenges in bioinformatics is accurate identification of splice sites in DNA sequences. The discovery of splicing has elucidated the diversity of protein production and explained the increased coding potential of the genome. The DNA sequence is formed of alternating introns and exons. In the first stage, the DNA sequence is transcribed into pre-mRNA; then the splicing process takes place, removing the non-coding sequences (introns) from the pre-mRNA to form the mRNA sequence.…”
The success of deep learning has been shown in various fields including computer vision, speech recognition, natural language processing and bioinformatics. The advance of deep learning in computer vision has been an important source of inspiration for other research fields. The objective of this work is to adapt known deep learning models borrowed from computer vision, such as VGGNet, ResNet and AlexNet, for the classification of biological sequences. In particular, we are interested in the task of splice site identification based on raw DNA sequences. We focus on the role of model architecture depth in model training and classification performance. We show that deep learning models outperform traditional classification methods (SVM, Random Forests, and Logistic Regression) for large training sets of raw DNA sequences. Three model families are analyzed in this work, namely VGGNet, AlexNet and ResNet. Three depth levels are defined for each model family. The models are benchmarked using the following metrics: area under the ROC curve (AUC), number of model parameters, and number of floating-point operations. Our extensive experimental evaluation shows that shallow architectures have an overall better performance than deep models. We introduce a shallow version of ResNet, named S-ResNet, and show that it gives a good trade-off between model complexity and classification performance.
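Convolutional models of the kind named in this abstract take numeric arrays as input, so a raw DNA sequence has to be encoded first. The sketch below shows one-hot encoding, a common choice for this step; the abstract does not say which encoding the authors used, so the approach, the `one_hot` helper and the A, C, G, T column order are all assumptions made for illustration.

```python
# Hypothetical helper, not taken from the cited paper.
# Column order A, C, G, T is an arbitrary but common convention.
ALPHABET = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element indicator rows."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        idx = ALPHABET.find(base)
        if idx >= 0:
            # Known nucleotide: set its column to 1.
            row[idx] = 1
        # Ambiguous symbols such as N are left as an all-zero row.
        rows.append(row)
    return rows


# Four positions in, four rows out; the lowercase "n" becomes all zeros.
encoded = one_hot("GTnA")
```

Each row has one entry per nucleotide, so a sequence of length L becomes an L x 4 matrix that a 1D convolution can slide over.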
Author summary
Deep Learning has been widely applied to various fields in research and industry. It has also been successfully applied to genomics and in particular to splice site identification. We are interested in the use of advanced neural networks borrowed from computer…
“…The splicing regulatory elements used in our models include ESE SR‐protein SF2/ASF from ESEfinder (Smith et al, ), ESS FAS‐hex3 hexamer from FAS‐ESS (Wang et al, ), and putative ESE and ESS pESE/pESS (Zhang, Kangsamaksin, Chao, Banerjee, & Chasin, ). These features were scored using scripts provided by SilVA program (Buske, Manickaraj, Mital, Ray, & Brudno, ). As SilVA was designed for only synonymous mutations, we slightly modified the scripts so that they can be applied to other single‐nucleotide variants (SNVs) or indels, in exons or introns.…”
Alternative splicing can be disrupted by genetic variants that are related to diseases such as cancers. Discovering the influence of genetic variations on alternative splicing will improve the understanding of the pathogenesis of variants. Here, we developed a new approach, PredPSI‐SVR, to predict the impact of variants on exon skipping events using support vector regression. From the sequence of a particular exon and its flanking regions, 42 comprehensive features related to splicing events were extracted. Using a greedy feature selection algorithm, we found the eight features that contribute most to the prediction. The trained model achieved a Pearson correlation coefficient (PCC) of 0.570 in 10‐fold cross‐validation on the training data set provided by the “vex‐seq” challenge of the 5th Critical Assessment of Genome Interpretation. In the blind test also held by the challenge, our prediction ranked 2nd with a PCC of 0.566, which demonstrates the robustness of our method. A further test indicated that PredPSI‐SVR is helpful in prioritizing deleterious synonymous mutations.
The method is available on https://github.com/chenkenbio/PredPSI-SVR.
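The abstract above combines two generic ingredients: a greedy feature selection loop and the Pearson correlation coefficient as the score. The sketch below illustrates both under stated assumptions. It assumes forward selection (the abstract does not say which greedy variant was used), and the `toy_score` callback is a stand-in that correlates summed feature columns with the target; in the described method, that callback would instead be the cross-validated PCC of a support vector regression model. All names and data here are hypothetical and are not taken from the PredPSI-SVR repository.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant list
    return cov / math.sqrt(vx * vy)


def greedy_select(n_features, score, max_features):
    """Forward selection: add the best-scoring feature until no gain."""
    selected = []
    best = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        top_score, top_feat = max((score(selected + [f]), f) for f in candidates)
        if top_score <= best:
            break  # no candidate improves the current score
        selected.append(top_feat)
        best = top_score
    return selected, best


# Toy data (made up): column 0 tracks the target, the others do not.
X = [[1, 5, 0], [2, 1, 1], [3, 4, 0], [4, 2, 1], [5, 3, 0]]
y = [1, 2, 3, 4, 5]


def toy_score(features):
    """Stand-in scorer: PCC between summed selected columns and y."""
    predictions = [sum(row[f] for f in features) for row in X]
    return pearson(predictions, y)


selected, best = greedy_select(n_features=3, score=toy_score, max_features=2)
```

Passing the scorer in as a callback keeps the selection loop independent of the model, so the same loop could wrap any regressor and any validation scheme.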