2016
DOI: 10.1186/s13040-016-0086-4

Prediction of donor splice sites using random forest with a new sequence encoding approach

Abstract: Background: Detection of splice sites plays a key role for predicting the gene structure and thus development of efficient analytical methods for splice site prediction is vital. This paper presents a novel sequence encoding approach based on the adjacent di-nucleotide dependencies in which the donor splice site motifs are encoded into numeric vectors. The encoded vectors are then used as input in Random Forest (RF), Support Vector Machines (SVM) and Artificial Neural Network (ANN), Bagging, Boosting, Logistic r…
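
To make the idea of turning a splice site motif into a numeric vector concrete, here is a minimal sketch. It is not the paper's encoding: the truncated abstract does not give the actual scheme, so this example assumes a plain one-hot code over the 16 possible dinucleotides at each pair of adjacent positions. The function name `encode_adjacent_dinucleotides` and the example 9-mer are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 ordered dinucleotides in a fixed order: AA, AC, AG, AT, CA, ..., TT
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]
DINUC_INDEX = {dinuc: i for i, dinuc in enumerate(DINUCLEOTIDES)}


def encode_adjacent_dinucleotides(seq):
    """One-hot encode every adjacent pair of positions in `seq`.

    Assumption: this is an illustrative stand-in, not the encoding from the
    paper. A sequence of length n has n - 1 adjacent pairs, so the returned
    list has 16 * (n - 1) entries, exactly one of which is 1 per pair.
    """
    seq = seq.upper()
    vector = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair not in DINUC_INDEX:
            raise ValueError(f"unsupported dinucleotide {pair!r} at position {i}")
        block = [0] * len(DINUCLEOTIDES)
        block[DINUC_INDEX[pair]] = 1
        vector.extend(block)
    return vector


# Hypothetical 9-base window around a donor site (GT at positions 4-5).
window = "CAGGTAAGT"
features = encode_adjacent_dinucleotides(window)
print(len(features))  # 8 adjacent pairs * 16 dinucleotides = 128
```

Vectors of this kind have a fixed length for a fixed window size, so they could be stacked into a feature matrix and passed to any of the classifiers the abstract lists, such as a Random Forest.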

Cited by 40 publications
(28 citation statements)
References 33 publications
“…The ML based techniques can predict non-canonical sites as well by appropriate training. Different ML approaches has been used such as Support Vector Machines (SVM) [16,17,18,19] Random Forest (RF) [20], Decision Trees (DT) [21], Naïve Bayesian (NB) [22], Markov Model [23] and AdaBoost [24] to identify splice or non-splice sites. Among them SVM models have been used very often due to their capability to handle high-dimensional datasets.…”
Section: Splice Site Recognition Problem (mentioning)
confidence: 99%
“…Due to complex dependencies existing among the bases around splice sites, none of the frequently used programs perfectly predict the impact on pre-mRNA splicing. Learning algorithms such as AdaBoost (Pashaei, Yilmaz, Ozen, & Aydin, 2016) and Random Forest (Meher, Sahu, & Rao, 2016) are now emerging and need to be validated for their use in research and clinical practice.…”
Section: Introduction (mentioning)
confidence: 99%
“…Matrix metalloproteinase (MMP) are key factors for the degradation of extracellular matrix components and modification of cytokines, protease inhibitors, and cell surface signaling systems [103][104][105][106]. Polymorphisms on the MMP-9 promoter can affect the development of visceral involvement in Korean people with BD [107].…”
Section: Cell Adhesion and Signal Transduction Related Genes (mentioning)
confidence: 99%
“…The independent test dataset was frequently constructed to evaluate the performances of protein function predictors in recent years [99][100][101][102][103][104]. To construct a valid set of data for building the predictor of each family, the datasets of the training, testing and independent test were generated by a strictly defined process after the data collection described in Section 2.1.…”
Section: Construction Of the Training And Testing Datasets (mentioning)
confidence: 99%