2015
DOI: 10.1371/journal.pone.0119721

Assessing the Effects of Data Selection and Representation on the Development of Reliable E. coli Sigma 70 Promoter Region Predictors

Abstract: As the number of sequenced bacterial genomes increases, the need for rapid and reliable tools for the annotation of functional elements (e.g., transcriptional regulatory elements) becomes more desirable. Promoters are the key regulatory elements, which recruit the transcriptional machinery through binding to a variety of regulatory proteins (known as sigma factors). The identification of the promoter regions is very challenging because these regions do not adhere to specific sequence patterns or motifs and are…

Cited by 10 publications (6 citation statements)
References 64 publications (93 reference statements)
“…As can be seen from the results above, new tools have emerged with enhanced performance compared to widely used ones. Although the best performing tool uses just sequence-based features (a result that corroborates with Abbas et al [ 42 ]), in general, algorithms using feature extraction that combines attributes derived from sequence together with physicochemical properties of DNA achieved better results. It is also clear from our results that choosing the appropriate control (or negative) data set to construct these algorithms is crucial to avoid false-positive results.…”
Section: Results; citation type: supporting
confidence: 77%
“…The average accuracy of the training and test data were obtained after 10 operations. Tables 1 and 2 (Model 2–11) show that the average accuracy of the training and test data are 100% and 100%, respectively, which indicates that the model has good classification ability [26, 27].…”
Section: Results; citation type: mentioning
confidence: 97%
“…where True positive is the protein complexes predicted as complexes, False Positive is the non-protein complexes predicted as complexes, False Negative is protein complexes predicted as non-protein complexes [40]. In the statistical prediction, independent data set test, K-fold cross validation test, and jackknife cross-validation are usually used to assess the prediction capability of the model.…”
Section: B Performance Measures; citation type: mentioning
confidence: 99%
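
The last citation statement names K-fold cross-validation and the jackknife test as common ways to assess a model's prediction capability. As a rough illustration only, the sketch below shows how the two schemes partition sample indices, treating the jackknife as the leave-one-out case where the number of folds equals the number of samples. The function names `k_fold_indices` and `jackknife_indices` are hypothetical and do not come from the cited papers, and real studies usually shuffle or stratify the data first, which this sketch does not do.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds.

    Hypothetical helper for illustration; no shuffling or stratification.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    # Spread the remainder over the first `extra` folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def jackknife_indices(n_samples):
    """Leave-one-out splits: each sample is the test set exactly once."""
    return k_fold_indices(n_samples, n_samples)


if __name__ == "__main__":
    for train, test in k_fold_indices(5, 2):
        print("train:", train, "test:", test)
```

Each sample lands in exactly one test fold, so averaging a metric over the folds uses every sample for testing once. The jackknife variant costs one model fit per sample, which is why K-fold is often preferred on larger data sets.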