2017
DOI: 10.1186/s12859-017-1715-8

Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences

Abstract: Background: DNA-binding proteins perform important functions in a great number of biological activities. DNA-binding proteins can interact with ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA), and DNA-binding proteins can be categorized as single-stranded DNA-binding proteins (SSBs) and double-stranded DNA-binding proteins (DSBs). The identification of DNA-binding proteins from amino acid sequences can help to annotate protein functions and understand the binding specificity. In this study, we syste…

Citations: cited by 13 publications (22 citation statements)
References: 56 publications
“…A most recent work predicted DNA-binding proteins interacting with ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA) using OAAC (overall amino acid composition) features, dipeptide compositions, PSSM (position-specific scoring matrix profiles) and split amino acid composition (SAA) [ 33 ]. Testing by SVM (support vector machine) and RF (random forest) classification model, their method can achieve the accuracy of 88.7% and AUC of 0.919.…”
Section: Discussion (mentioning)
confidence: 99%
“…There are some results showing that sequence-based calculation methods are of great use to predict binding sites ( Wang et al, 2017 ; Wang et al, 2019c ). The evolutionary information of the protein sequence is encoded by the position-specific scoring matrix (PSSM).…”
Section: Methods (mentioning)
confidence: 99%
“…For sequence-based feature calculation, we extracted 8833 DNA-binding proteins. Which contains 2136 DSBs and 339 SSBs obtained from the literature of Wang et al [37] And the other part is collected from UniProtKB/Swiss-Prot (www.uniprot.org). To eliminate redundancy, CD-HIT was used to remove proteins with a sequence similarity > 70% [40].…”
Section: Datasets (mentioning)
confidence: 99%
“…Because the gap between available sequences and structures of DNA binding proteins in UniProtKB/Swiss-Prot (www.uniprot.org) and the PDB (www.rcsb.org/pdb/) has been growing exponentially, structure-based methods can no longer meet the needs of high-throughput research [35,36]. Subsequently, Wei Wang et al [37] developed a machine learning method (Wang, 2017) with only single sequence information such as overall amino acid composition (OAAC) features, dipeptide compositions, and position-specific scoring matrix profiles (PSSMs). The results showed an accuracy of 88.7% and an AUC (area under the curve) of 0.919 on the benchmark datasets.…”
Section: Introduction (mentioning)
confidence: 99%
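
Note on the sequence features named in the statements above: overall amino acid composition (OAAC) and dipeptide composition can be computed directly from a protein sequence. The Python sketch below is only an illustration under stated assumptions, not the cited authors' implementation. The function names are hypothetical, and the normalization (simple frequencies over the 20 standard residues, skipping any pair that contains a non-standard residue) is an assumption, since the statements do not specify it. PSSM profiles and split amino acid composition are not covered here.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the frequency of each standard residue (20 values).

    Assumption: non-standard characters (e.g. X, B, Z) are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: residues.count(aa) / total for aa in AMINO_ACIDS}


def dipeptide_composition(sequence):
    """Return the frequency of each adjacent residue pair (400 values).

    Assumption: a pair is counted only if both residues are standard,
    so residues are never joined across a skipped character.
    """
    seq = sequence.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}


def sequence_features(sequence):
    """Concatenate both compositions into one 420-value feature list."""
    aac = amino_acid_composition(sequence)
    dpc = dipeptide_composition(sequence)
    return [aac[aa] for aa in AMINO_ACIDS] + [dpc[p] for p in sorted(dpc)]
```

A vector built this way could be passed to a classifier such as an SVM or random forest, the two model types mentioned in the statements; which features and parameters the original study actually used should be taken from the paper itself.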