PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction

Tan, Changgeng; Wang, Tong; Yang, Wenyi; Deng, Lei

doi:10.3390/molecules25010098

Cited by 10 publications

(15 citation statements)

References 71 publications

“…The binary classification assembled model generated using the methodology proposed in this work had 84.19% accuracy and 84.96% precision, achieving a significant improvement with respect to the models previously developed by Rahman et al (2018), Wei et al (2017), and Adilina et al (2019) with 77.42%, 79.00%, and 82.26% accuracy, respectively (See Table 3 for more details). However, it was not possible to obtain the best performance compared to the previously reported methods since Tan et al (2020) achieved an accuracy of 91.20%. Despite this, our method is practically the second-best, being a significant achievement for a generic strategy, applied to a specific and highly complex problem such as the DNA-Binding proteins classification, demonstrating the advantages of the combination of digital signal processing and assembled strategies for the development of predictive models.…”

Section: Case Study: DNA-Binding Proteins (DBP) (mentioning)

confidence: 83%

Combination of digital signal processing and assembled predictive models facilitates the rational design of proteins

Medina-Ortiz, Contreras, Amado-Hinojosa et al. 2020

Preprint

Predicting the effect of mutations in proteins is one of the most critical challenges in protein engineering; by knowing the effect that a substitution of one (or several) residues in the protein's sequence has on its overall properties, one could design a variant with a desirable function. New strategies and methodologies to create predictive models are continually being developed. However, those that claim to be general often do not reach adequate performance, and those that aim at a particular task improve their predictive performance at the cost of the method's generality. Moreover, these approaches typically require a particular decision on how to encode the amino acid sequence, without an explicit methodological agreement on how to do so. To address these issues, in this work, we applied clustering, embedding, and dimensionality reduction techniques to the AAIndex database to select meaningful combinations of physicochemical properties for the encoding stage. We then used the chosen set of properties to obtain several encodings of the same sequence and subsequently applied the Fast Fourier Transform (FFT) to them. We performed an exploratory stage of machine-learning models in the frequency space, using different algorithms and hyperparameters. Finally, we selected the best-performing predictive models in each set of properties and created an assembled model. We extensively tested the proposed methodology on different datasets and demonstrated that the generated assembled model achieved notably better performance metrics than those models based on a single encoding and, in most cases, better than those previously reported. The proposed method is available as a Python library for non-commercial use under the GNU General Public License (GPLv3).

Keywords: Protein Engineering - predictive models • Protein Engineering - rational design • machine-learning algorithms • digital signal processing • assembled models
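The encoding and FFT stage described in this abstract can be illustrated with a minimal sketch. It is not the authors' library: the Kyte-Doolittle hydropathy scale is an assumed stand-in for the AAIndex property sets the abstract mentions, and the function names (`encode`, `fft`, `spectrum`), the example sequence, and the padding length are all hypothetical.

```python
import cmath

# Kyte-Doolittle hydropathy values. This scale is an assumption made for
# illustration; the abstract selects property sets from AAIndex and does
# not say which ones were used.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale=HYDROPATHY):
    """Map each residue to its numeric property value."""
    return [scale[residue] for residue in sequence.upper()]


def fft(signal):
    """Recursive radix-2 Cooley-Tukey FFT; len(signal) must be a power of two."""
    n = len(signal)
    if n == 1:
        return [complex(signal[0])]
    even = fft(signal[0::2])
    odd = fft(signal[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrum(sequence, length=64):
    """Truncate or zero-pad the encoded sequence to `length` (a power of two)
    and return the magnitude of each frequency component."""
    values = encode(sequence)[:length]
    padded = values + [0.0] * (length - len(values))
    return [abs(component) for component in fft(padded)]


# Hypothetical example sequence, padded to 16 positions.
features = spectrum("MKTAYIAKQR", length=16)
```

Zero-padding gives every sequence a feature vector of the same length, which is what a downstream classifier needs; repeating the call with a different property scale would yield one of the "several encodings of the same sequence" the abstract refers to.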

“…Compared to the large number of publications on prediction of DNA binding proteins, the investigation on ssDNA binding protein prediction is limited so far. To our knowledge, currently there are only four published studies on SSB prediction using machine learning-based approaches [95, 96, 97, 98]. These methods typically consist of four major steps as shown in Figure 3: (1) dataset generation for training and testing; (2) features for learning and prediction; (3) classification models; and (4) performance evaluation.…”

Section: Machine Learning-Based Methods for SSB Prediction (mentioning)

confidence: 99%

Single-Stranded DNA Binding Proteins and Their Identification Using Machine Learning-Based Approaches

Guo, Malik 2022

Biomolecules

Single-stranded DNA (ssDNA) binding proteins (SSBs) are critical in maintaining genome stability by protecting the transient existence of ssDNA from damage during essential biological processes, such as DNA replication and gene transcription. The single-stranded region of telomeres also requires protection by ssDNA binding proteins from being attacked in case it is wrongly recognized as an anomaly. In addition to their critical roles in genome stability and integrity, it has been demonstrated that ssDNA and SSB–ssDNA interactions play critical roles in transcriptional regulation in all three domains of life and viruses. In this review, we present our current knowledge of the structure and function of SSBs and the structural features for SSB binding specificity. We then discuss the machine learning-based approaches that have been developed for the prediction of SSBs from double-stranded DNA (dsDNA) binding proteins (DSBs).

“…DNABPs identification is considered a major challenge of genome annotation because they have several linked cellular functions. The identification process may include: identifying the DNABPs (positive sample) from the non-DNABPs (negative sample) [1], identifying the single-stranded DNABPs from the double-stranded DNABPs [2], or identifying the DNABPs from the Ribonucleic acid-binding proteins (RNABPs) [3][4][5]. In this paper, the identification process is formulated as a binary classification problem to identify DNABPs and non-DNABPs.…”

Section: Introduction (mentioning)

confidence: 99%

DTLM-DBP: Deep Transfer Learning Models for DNA Binding Proteins Identification

Saber, Khairuddin, Yusof et al. 2021

Computers, Materials & Continua

The identification of DNA binding proteins (DNABPs) is considered a major challenge in genome annotation because they are linked to several important applied and research applications of cellular functions, e.g., in the study of the biological, biophysical, and biochemical effects of antibiotics, drugs, and steroids on DNA. This paper presents an efficient approach for DNABPs identification based on deep transfer learning, named "DTLM-DBP." Two transfer learning methods are used in the identification process. The first uses the pre-trained deep learning model as both feature extractor and classifier. Two different pre-trained Convolutional Neural Networks (CNN), AlexNet 8 and VGG 16, are tested and compared. The second method uses the deep learning model as a feature extractor only, paired with a separate classifier for the identification process. Two classifiers, Support Vector Machine (SVM) and Random Forest (RF), are tested and compared. The proposed approach is tested using different DNA protein datasets. The performance of the identification process is evaluated in terms of identification accuracy, sensitivity, specificity and MCC, with four available DNA protein datasets: PDB1075, PDB186, PDNA-543, and PDNA-316. The results show that the RF classifier, with VGG-Net pre-trained deep transfer learning features, gives the highest performance. DTLM-DBP was compared with other published methods and it provides a considerable improvement in the performance of DNABPs identification.
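The four evaluation measures named in this abstract (accuracy, sensitivity, specificity, and MCC) follow from the counts of a binary confusion matrix. The sketch below uses their standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fn count binders predicted as binders/non-binders;
    tn/fp count non-binders predicted as non-binders/binders.
    """
    total = tp + tn + fp + fn
    # MCC denominator; it is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these illustrative counts the accuracy is 0.85, the sensitivity 0.80, and the specificity 0.90. MCC is often reported alongside accuracy because it remains informative when the positive and negative classes are imbalanced.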
