2017
DOI: 10.1101/185868
Preprint

A sequence-based, deep learning model accurately predicts RNA splicing branchpoints

Abstract: Experimental detection of RNA splicing branchpoints, the nucleotide serving as the nucleophile in the first catalytic step of splicing, is difficult. To date, annotations exist for only 16-21% of 3' splice sites in the human genome and even these limited annotations have been shown to be plagued by noise. We develop a sequence-only, deep learning based branchpoint predictor, LaBranchoR, which we conclude predicts a correct branchpoint for over 90% of 3' splice sites genome-wide. Our predicted branchpoints show…

Cited by 23 publications
(50 citation statements)
References 23 publications
“…BP prediction tools (Table 1) have demonstrated poor specificity due to BP motif degeneracy combined with a lack of experimental data to train algorithms (Corvelo, Hallegger, Smith, & Eyras, 2010). BP characterization has lagged far behind that of 5′ and 3′ splice sites because of experimental difficulties in detecting BPs (Paggi & Bejerano, 2018). A large genome‐wide data set of experimentally confirmed BPs (Mercer et al, 2015) has been used to develop the BP prediction tools Branchpointer and LaBranchoR.…”
Section: Bioinformatic Prediction Of BP Site Abrogation
Citation type: mentioning
confidence: 99%
“…The deep learning tools, LaBranchoR and RNABS showed the maximum number of common predicted BPs from Ensembl (28.63 %) and from RNA-seq (33.57 %) data. Indeed, these two tools are both based on the same deep learning approach (bidirectional long short-term memory) and used the same sequence length (70 nt) as input [20,21]. By comparison, RNABPS employed a dilated convolution model explaining and showed an improvement of prediction compared to LaBranchoR (73.06 % against 64.77% of accuracy) using the Ensembl data (Table 3).…”
Section: Discussion
Citation type: mentioning
confidence: 99%
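
The statement above mentions a fixed 70 nt input sequence. A minimal sketch of how such an input could be prepared is shown below. It assumes the window is the last 70 nt of the intron (ending at the 3' splice site) and a one-hot encoding; neither detail is given in the quoted text, and the function names `upstream_window` and `one_hot` are hypothetical.

```python
ALPHABET = "ACGT"

def upstream_window(intron, length=70):
    """Return the last `length` nt of an intron (5'->3').

    Assumption: shorter introns are left-padded with 'N'.
    """
    window = intron[-length:]
    return "N" * (length - len(window)) + window

def one_hot(seq):
    """Encode a sequence as one 4-element row per nucleotide (A, C, G, T).

    RNA 'U' is treated as 'T'; any other symbol (e.g. 'N') becomes all zeros.
    """
    seq = seq.upper().replace("U", "T")
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]

# Usage: a 120 nt toy intron is trimmed to a 70 x 4 matrix.
encoded = one_hot(upstream_window("ACGT" * 30))
```

The resulting 70 x 4 matrix is the kind of fixed-size input a sequence model can consume; the models named in the statement may preprocess their input differently.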
“…This collection of BPs was extended by two further studies: the first used 1.31 trillion reads from 17,164 RNA-seq data sets [16], and the second identified BPs by the spliceosome iCLIP method [17]. Thus, several bioinformatics tools for BP prediction have recently emerged: Branch Point Prediction (BPP) [18], Branchpointer [19], LaBranchoR [20] and RNA Branch Point Selection (RNABPS) [21] (Table 1). Briefly, HSF uses a position weighted matrix approach with a 7-mer motif as a reference (5 nt upstream and 1 nt downstream of the branch point A) (Figure 1).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
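
The position weight matrix approach described in the statement above can be sketched as a sliding 7-mer scan in which index 5 of each window is the candidate branchpoint (5 nt upstream, the branchpoint A, 1 nt downstream). The frequencies in `TOY_FREQS` are invented for illustration and are not HSF's matrix; the log-odds scoring against a uniform background is likewise an assumption.

```python
import math

# Toy per-position nucleotide frequencies for a 7-mer window.
# Index 5 is the branchpoint position. Values are invented, NOT HSF's matrix.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
PYRIMIDINE = {"A": 0.1, "C": 0.4, "G": 0.1, "T": 0.4}
TOY_FREQS = [
    UNIFORM,
    UNIFORM,
    PYRIMIDINE,
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    UNIFORM,
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    PYRIMIDINE,
]
BACKGROUND = 0.25  # assumed uniform background frequency

def score_window(window):
    """Sum of log2 odds (motif frequency / background) over the 7 positions."""
    return sum(math.log2(freqs[base] / BACKGROUND)
               for freqs, base in zip(TOY_FREQS, window))

def best_branchpoint(seq):
    """Return (branchpoint index, score) of the highest-scoring 7-mer window."""
    width = len(TOY_FREQS)
    candidates = [(start + 5, score_window(seq[start:start + width]))
                  for start in range(len(seq) - width + 1)]
    return max(candidates, key=lambda item: item[1])

# Usage: the only A in this toy sequence sits at index 9.
position, score = best_branchpoint("GGGGGCCTGACGGGG")
```

A matrix scan of this kind scores each position independently, which is one reason motif degeneracy limits its specificity.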
“…We also included a feature to capture the change in 3-mer content induced by a variant [53]. Additionally, we included region-specific features, such as a branchpoint disruption term for the 3' intronic region [42] and a 5' cryptic splice site creation term for the 5' intronic region (see Online Methods for a complete description of features).…”
Section: S-CAP Features
Citation type: mentioning
confidence: 99%
“…The 3' intronic S-CAP model includes a branchpoint feature modeled using LaBranchoR, a bi-directional LSTM (long short-term memory) model trained on the genome sequence surrounding experimentally validated branchpoint sites [42]. Specifically, we used the in silico mutagenesis scores available online (see URLs) as a feature.…”
Section: Region Specific Features
Citation type: mentioning
confidence: 99%
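
In silico mutagenesis, as mentioned in the statement above, generally means scoring every single-nucleotide substitution of a sequence and recording the change relative to the reference score. The sketch below shows that general idea only. `toy_score` is a hypothetical stand-in for a trained model; it is not LaBranchoR, and the published scores referenced in the statement were not produced by this code.

```python
def in_silico_mutagenesis(seq, score):
    """Map (position, alternative base) -> score(mutant) - score(reference).

    `score` is any callable that takes a sequence and returns a number.
    """
    reference = score(seq)
    deltas = {}
    for i, ref_base in enumerate(seq):
        for alt in "ACGT":
            if alt == ref_base:
                continue  # skip the unchanged base
            mutant = seq[:i] + alt + seq[i + 1:]
            deltas[(i, alt)] = score(mutant) - reference
    return deltas

def toy_score(seq):
    """Hypothetical scorer: 1.0 if the toy motif 'CTGAC' is present, else 0.0."""
    return 1.0 if "CTGAC" in seq else 0.0

# Usage: mutating the A at index 5 destroys the toy motif.
deltas = in_silico_mutagenesis("GGCTGACGG", toy_score)
```

A strongly negative delta at a position suggests that a variant there disrupts the predicted branchpoint, which is how such scores can serve as a feature in a downstream variant classifier.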