2019
DOI: 10.3390/molecules25010098

PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction

Abstract: Interactions between proteins and DNAs play essential roles in many biological processes. DNA binding proteins can be classified into two categories. Double-stranded DNA-binding proteins (DSBs) bind to double-stranded DNA and are involved in a series of cell functions such as gene expression and regulation. Single-stranded DNA-binding proteins (SSBs) are necessary for DNA replication, recombination, and repair and are responsible for binding to the single-stranded DNA. Therefore, the effective classification o…
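
The title names gradient tree boosting as the classifier. The truncated abstract does not give PredPSD's features, tree depth, or hyperparameters. The sketch below is therefore only a hypothetical, minimal illustration of the general technique. It boosts depth-1 regression trees (stumps) under logistic loss for a two-class label, with 1 standing for SSB and 0 for DSB, and assumes numeric feature vectors have already been extracted.

Each leaf value is the mean residual, which is a plain gradient step. Some libraries use a Newton-style leaf update instead. All names here (`GradientBoostedStumps`, `fit_stump`) are invented for the example.

```python
import math
from typing import List, Sequence, Tuple

# (feature index, threshold, value if x[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Least-squares depth-1 regression tree fitted to the residuals r."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    if best is None:  # no usable split: every feature is constant
        m = sum(r) / len(r)
        best = (0, math.inf, m, m)
    return best


def predict_stump(stump: Stump, x: Sequence[float]) -> float:
    j, t, lv, rv = stump
    return lv if x[j] <= t else rv


class GradientBoostedStumps:
    """Hypothetical binary classifier: gradient boosting of stumps under log-loss."""

    def __init__(self, n_rounds: int = 50, learning_rate: float = 0.1):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate

    def fit(self, X, y):
        p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.f0 = math.log(p / (1 - p))  # initial raw score = prior log-odds
        self.stumps: List[Stump] = []
        F = [self.f0] * len(X)
        for _ in range(self.n_rounds):
            # negative gradient of the log-loss with respect to the raw score F
            r = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
            stump = fit_stump(X, r)
            self.stumps.append(stump)
            F = [fi + self.learning_rate * predict_stump(stump, x) for fi, x in zip(F, X)]
        return self

    def predict_proba(self, x) -> float:
        score = self.f0 + self.learning_rate * sum(predict_stump(s, x) for s in self.stumps)
        return sigmoid(score)

    def predict(self, x) -> int:
        return 1 if self.predict_proba(x) >= 0.5 else 0


# Synthetic toy data; 1 = SSB, 0 = DSB is a hypothetical encoding.
X = [[0.1, 1.0], [0.2, 0.8], [0.3, 0.9], [0.7, 0.2], [0.8, 0.1], [0.9, 0.3]]
y = [0, 0, 0, 1, 1, 1]
model = GradientBoostedStumps(n_rounds=50, learning_rate=0.1).fit(X, y)
print([model.predict(x) for x in X])  # [0, 0, 0, 1, 1, 1]
```

Each round fits a stump to the current residuals (label minus predicted probability) and adds a shrunken copy of it to the raw score. `learning_rate` trades the number of rounds against overfitting. Practical implementations typically grow deeper trees and add regularization.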

Cited by 10 publications (15 citation statements). References: 71 publications.

“…The binary classification assembled model generated using the methodology proposed in this work had 84.19% accuracy and 84.96% precision, a significant improvement over the models previously developed by Rahman et al. (2018), Wei et al. (2017), and Adilina et al. (2019), which reached 77.42%, 79.00%, and 82.26% accuracy, respectively (see Table 3 for more details). However, it did not achieve the best performance among the previously reported methods, since Tan et al. (2020) achieved an accuracy of 91.20%. Despite this, our method is effectively the second-best, a significant achievement for a generic strategy applied to a specific and highly complex problem such as DNA-binding protein classification, and it demonstrates the advantages of combining digital signal processing with assembled strategies for developing predictive models.…”
Section: Case Study: DNA-Binding Proteins (DBP)
Citation type: mentioning (confidence: 83%)
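
The statement above reports accuracy and precision. Under the standard definitions, accuracy is (TP + TN) / (TP + TN + FP + FN), and precision is TP / (TP + FP). The sketch below computes both from binary labels. The labels are invented and are not data from any of the cited studies. Returning 0.0 when nothing is predicted positive is an assumed convention, and `confusion_counts` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for a binary labeling."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive (0.0 if none predicted)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Invented labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)            # 2 3 1 2
print(accuracy(tp, tn, fp, fn))  # 0.625
print(precision(tp, fp))         # 0.6666666666666666
```

Accuracy and precision can diverge, as they do here. Precision ignores true negatives and missed positives, which is why such comparisons usually report several metrics side by side.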
“…Compared with the large number of publications on DNA-binding protein prediction, research on ssDNA-binding protein prediction has so far been limited. To our knowledge, there are currently only four published studies on SSB prediction using machine learning-based approaches [95, 96, 97, 98]. These methods typically consist of four major steps, as shown in Figure 3: (1) dataset generation for training and testing; (2) features for learning and prediction; (3) classification models; and (4) performance evaluation.…”
Section: Machine Learning-Based Methods for SSB Prediction
Citation type: mentioning (confidence: 99%)
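
The four steps in the statement above can be sketched as a toy pipeline. Every concrete choice in it is an assumption made for illustration and is not taken from the four cited studies:

- amino acid composition (AAC) as the feature set;
- a random hold-out split;
- a nearest-centroid classifier swapped in for whatever models those studies actually use.

The sequences are invented and are not real SSBs. The function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> List[float]:
    """Step 2 (features): 20-dim amino acid composition; non-standard letters are skipped."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    n = sum(counts.values())
    return [counts[aa] / n if n else 0.0 for aa in AMINO_ACIDS]


def split(data: Sequence[Tuple[str, int]], test_fraction: float = 0.25, seed: int = 0):
    """Step 1 (dataset): shuffle labeled sequences and hold out a test fraction."""
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - test_fraction))
    return items[:cut], items[cut:]


class NearestCentroid:
    """Step 3 (model): stand-in classifier that assigns the label of the closest class mean."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for label in set(y):
            rows = [x for x, yi in zip(X, y) if yi == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x: Sequence[float]) -> int:
        def sq_dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def evaluate(model: NearestCentroid, X, y) -> float:
    """Step 4 (evaluation): accuracy on held-out data."""
    return sum(model.predict(x) == yi for x, yi in zip(X, y)) / len(y)


# Invented toy sequences: label 1 stands in for "SSB", 0 for "non-SSB".
train = [("KRKRKRAG", 1), ("RKRKKRGA", 1), ("KKRRKAGS", 1),
         ("DEDEDEAG", 0), ("EDEDDEGA", 0), ("DDEEDAGS", 0)]
test = [("KRRKKRSA", 1), ("EDDEEDSA", 0)]
model = NearestCentroid().fit([aac(s) for s, _ in train], [lab for _, lab in train])
print(evaluate(model, [aac(s) for s, _ in test], [lab for _, lab in test]))  # 1.0
tr, te = split(train + test)
print(len(tr), len(te))  # 6 2
```

Real studies would typically use curated, redundancy-reduced datasets, richer sequence or evolutionary features, and cross-validation or independent test sets. The toy split here only shows where each step plugs in.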
“…Identifying DNABPs is considered a major challenge in genome annotation because they are linked to several cellular functions. The identification task may involve distinguishing DNABPs (positive samples) from non-DNABPs (negative samples) [1], single-stranded DNABPs from double-stranded DNABPs [2], or DNABPs from ribonucleic acid-binding proteins (RNABPs) [3][4][5]. In this paper, the identification process is formulated as a binary classification problem to distinguish DNABPs from non-DNABPs.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)