2020
DOI: 10.1101/2020.08.24.264267
Preprint

StackPDB: predicting DNA-binding proteins based on XGB-RFE feature optimization and stacked ensemble classifier

Abstract: DNA-binding proteins (DBPs) not only play an important role in all aspects of genetic activity, such as DNA replication, recombination, repair, and modification, but are also used as key components of antibiotics, steroids, and anticancer drugs in drug discovery. Identifying DBPs has become one of the most challenging problems in proteomics research. Given the high cost and inefficiency of experimental methods, constructing a detailed DBP prediction model has become an urgent pr…

Citations: cited by 4 publications (5 citation statements)
References: 72 publications
“…Performance Comparison with Existing Predictors. In previous studies, some researchers predicted interaction sites only from a single feature and method of constructing the single feature space rather than the ways used in our experiments [38,39]. To further evaluate the effectiveness of the prediction method in this work, three additional experiments are implemented to predict interaction sites by utilizing the methods of Wang et al's [8], Nguyen and Rajapakse's [27], Ofran and Rost's [6] studies and the present study.…”
Section: Performance Evaluation With Different Parameters (mentioning)
confidence: 95%
“…The stacking ensemble classifier is an ensemble method that uses the prediction results of multiple classifiers as new features for retraining. By integrating information from multiple prediction models, the stacking ensemble classifier can achieve the purpose of minimizing the generalization error and obtain better prediction performance than the single classifier [28–30]. In this study, we build BERT-m7G based on the stacking ensemble classifier to identify m7G sites.…”
Section: Methods (mentioning)
confidence: 99%
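
The stacking idea quoted above (base-classifier predictions reused as new features for a second-stage model) can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the toy data, the helper names `fit_stump`, `fit_logistic`, and `fit_stacking`, the choice of threshold rules as base learners, and the use of out-of-fold predictions to build the meta-features. It is not the StackPDB or BERT-m7G implementation.

```python
import math

def fit_stump(X, y, feature):
    """Hypothetical base learner: a one-feature threshold rule chosen by training accuracy."""
    best_acc, best_t, best_sign = -1, None, 1
    for t in sorted({row[feature] for row in X}):
        for sign in (1, -1):
            acc = sum(int(sign * (row[feature] - t) >= 0) == label
                      for row, label in zip(X, y))
            if acc > best_acc:
                best_acc, best_t, best_sign = acc, t, sign
    return lambda row: float(best_sign * (row[feature] - best_t) >= 0)

def fit_logistic(Z, y, lr=1.0, epochs=500):
    """Hypothetical meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, label in zip(Z, y):
            logit = sum(wi * zi for wi, zi in zip(w, z)) + b
            p = 1.0 / (1.0 + math.exp(-logit))
            for j, zj in enumerate(z):
                grad_w[j] += (p - label) * zj / n
            grad_b += (p - label) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return lambda z: int(sum(wi * zi for wi, zi in zip(w, z)) + b >= 0)

def fit_stacking(X, y, base_fitters, k=5):
    """Build out-of-fold base predictions, then train the meta-learner on them."""
    n = len(X)
    meta = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held = [i for i in range(n) if i % k == fold]
        for j, fit in enumerate(base_fitters):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in held:
                # Each sample's meta-feature comes from a model that never saw it.
                meta[i][j] = model(X[i])
    meta_model = fit_logistic(meta, y)
    # Refit the base learners on all data for use at prediction time.
    bases = [fit(X, y) for fit in base_fitters]
    return (lambda row: meta_model([m(row) for m in bases])), meta

# Toy data (assumed): the label is 1 only when both features exceed 0.5.
values = [0.2, 0.4, 0.6, 0.7, 0.8, 0.9]
X = [(a, b) for a in values for b in values]
y = [int(a > 0.5 and b > 0.5) for a, b in X]

base_fitters = [lambda X, y: fit_stump(X, y, 0),
                lambda X, y: fit_stump(X, y, 1)]
predict, meta = fit_stacking(X, y, base_fitters)

stacked_acc = sum(predict(row) == label for row, label in zip(X, y)) / len(X)
print("stacked training accuracy:", stacked_acc)
```

Each base learner sees only one feature, so neither can represent the label alone; the second-stage model combines their outputs. Building the meta-features from held-out folds is a common precaution against the meta-learner trusting overfitted base predictions, though the quoted statement does not specify how the cited work constructs them.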
“…Second, the feature extraction of these methods is complex and computationally expensive, e.g., the generation of PSSM requires PSI-BLAST [28] search in a huge protein database, the protein secondary structure and relative solvent accessibility information requires extra prediction in advance by using specific software, e.g., SSPro, ACCPro [29]. Third, the fusion of multiple heterogeneous features may bring redundancy and noise which reduce the efficiency of the model [30]. Fourth, for the methods that use the 3D structure information, it is only applicable when the high-resolution 3D structure of protein is available, however, there only exists a small portion of samples that have 3D structure data, which will limit their further application [27].…”
Section: /9 (mentioning)
confidence: 99%