2014
DOI: 10.2174/1574893608999140109121721

Hybrid Approach Using SVM and MM2 in Splice Site Junction Identification

Abstract: Prediction of coding region from genomic DNA sequence is the foremost step in the quest of gene identification. In the eukaryotic organism, the gene structure consists of promoter, intron, start codon, exon and stop codon, etc. In the prediction of splice site, which is the separation between exons and introns, the accuracy is lower than 90% even when the sequences adjacent to the splice sites have a high conservation. Therefore, the algorithms used in the splice sites identification must be improved in order …

Cited by 19 publications (7 citation statements)
References 55 publications (52 reference statements)
“…Most of them exploit machine learning (ML) algorithms and use several features to describe SS, covering the consensus motifs or other nucleotides in proximity to the SS [29]. The most widely used ML algorithms include Support Vector Machines [30][31][32], Markov models [33,34], Random Forest [35,36] and Bayesian networks [37]. However, these methods are limited by the lack of knowledge about the input sequence (patterns, secondary structures, etc.…”
mentioning
confidence: 99%
“…29 global and intrinsic folding features were extracted from secondary structures of real/pseudo pre-miRNAs defined in miPred. These features include the following: (i) %G + C content and 16 dinucleotide frequencies defined as %XY, where X, Y in {A, C, G, U}; (ii) adjusted base pairing propensity denoted as dP [20]; (iii) the MFE of folding denoted as dG [21]; (iv) the adjusted base pair distance denoted as dD [22]; (v) the adjusted Shannon entropy denoted as dQ [23]; (vi) the MFE index denoted as MFEI1 and MFEI2 [24], a topological descriptor of the degree of compactness denoted as dF; and (vii) 5 normalized variants of dP, dG, dQ, dD and dF denoted as zP, zG, zQ, zD and zF, respectively [25].…”
Section: Methods
mentioning
confidence: 99%
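
The sequence-composition features in item (i) of the statement above can be illustrated with a minimal Python sketch. It assumes that %XY is the percentage of overlapping dinucleotides out of the L - 1 positions in a sequence of length L; the quoted text does not state the normalization, so that choice, along with the function names `gc_content` and `dinucleotide_frequencies`, is hypothetical. The folding-based features (dP, dG, dD, dQ, dF and their normalized variants) need a secondary-structure predictor and are not covered here.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet used in the quoted feature definition


def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in an RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def dinucleotide_frequencies(seq: str) -> dict:
    """Return %XY for all 16 dinucleotides XY over {A, C, G, U}.

    Assumption: overlapping dinucleotides, normalized by len(seq) - 1.
    Pairs containing other symbols (e.g. N) are ignored in the counts.
    """
    seq = seq.upper()
    total = len(seq) - 1
    if total < 1:
        raise ValueError("sequence must contain at least two bases")
    counts = {x + y: 0 for x, y in product(ALPHABET, repeat=2)}
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return {pair: 100.0 * n / total for pair, n in counts.items()}
```

Together the two functions yield 17 of the 29 features listed in the statement (one %G + C value plus 16 dinucleotide percentages).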
“…Although relatively high accuracy has been achieved with the methods currently available (e.g., the accuracy for most donor splice site prediction based on the HS3D dataset has exceeded 90% [6, 10, 12, 13, 19, 24, 31]), further study is still necessary due to the following factors: 1) Determining a suitable window size prior to the application of any prediction method is essential [32]. Overly long window size may introduce some irrelevant features that would reduce predictive accuracy, and may take more computational time and memory space.…”
Section: Introduction
mentioning
confidence: 99%
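
The role of window size in the statement above can be made concrete with a small Python sketch. It assumes candidate donor sites are marked by the canonical GT dinucleotide and that a window is a fixed number of bases upstream and downstream of it; the function name `donor_windows` and the parameters `upstream` and `downstream` are hypothetical and do not come from the cited papers.

```python
def donor_windows(seq: str, upstream: int, downstream: int) -> list:
    """Extract fixed-size windows around every GT dinucleotide.

    Each window holds `upstream` bases before the GT, the GT itself,
    and `downstream` bases after it. Candidates whose window would
    run past either end of the sequence are skipped.
    Returns a list of (gt_position, window) tuples.
    """
    seq = seq.upper()
    windows = []
    start = seq.find("GT")
    while start != -1:
        lo = start - upstream
        hi = start + 2 + downstream
        if lo >= 0 and hi <= len(seq):
            windows.append((start, seq[lo:hi]))
        start = seq.find("GT", start + 1)
    return windows
```

Each window has length `upstream + 2 + downstream`, so a larger window means more input features per candidate site for a downstream classifier, and fewer candidates near the sequence ends survive extraction.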