2015
DOI: 10.1107/s1600576715018531

Logistic regression models to predict solvent accessible residues using sequence- and homology-based qualitative and quantitative descriptors applied to a domain-complete X-ray structure learning set

Abstract: A working example of relative solvent accessibility (RSA) prediction for proteins is presented. Novel logistic regression models have been built and validated. Their qualitative descriptors include amino acid type, and their quantitative descriptors include 20- and six-term sequence entropy. A domain-complete learning set of over 1300 proteins is used to fit initial models with various sequence homology descriptors as well as query residue qualitative descriptors. Homology descriptors are derived from …
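
The abstract names 20- and six-term sequence entropy as quantitative descriptors. The sketch below shows one plausible way to compute such a descriptor: Shannon entropy (base 2) of a single alignment column. Two versions are shown, one counting the 20 amino acids directly and one that first collapses them into six classes. The `SIX_CLASS` grouping, the gap handling and the function names are assumptions made for illustration. The abstract does not specify which six-class alphabet the paper actually uses.

```python
import math
from collections import Counter

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical six-class reduced alphabet, for illustration only; the grouping
# behind the paper's six-term entropy is not given in the abstract.
SIX_CLASS = {
    **dict.fromkeys("AVLIMC", "aliphatic"),
    **dict.fromkeys("FWYH", "aromatic"),
    **dict.fromkeys("STNQ", "polar"),
    **dict.fromkeys("KR", "positive"),
    **dict.fromkeys("DE", "negative"),
    **dict.fromkeys("GP", "special"),
}


def column_entropy(column, alphabet_map=None):
    """Shannon entropy (bits) of one alignment column.

    Gaps and non-standard residues are skipped (an assumption). With
    alphabet_map=None the 20 amino acids are counted directly (maximum
    log2(20), about 4.32 bits); with a mapping such as SIX_CLASS they are
    first collapsed to reduced classes (maximum log2(6), about 2.58 bits).
    """
    symbols = [c for c in column.upper() if c in STANDARD_AA]
    if alphabet_map is not None:
        symbols = [alphabet_map[s] for s in symbols]
    if not symbols:
        return 0.0
    n = len(symbols)
    # Sum of p * log2(1/p); a fully conserved column gives exactly 0.0.
    return sum((k / n) * math.log2(n / k) for k in Counter(symbols).values())


column = "KKRKE-KR"  # one hypothetical alignment column across 8 homologs
print(column_entropy(column))             # 20-term entropy
print(column_entropy(column, SIX_CLASS))  # six-term entropy
```

The reduced alphabet treats conservative substitutions (for example K to R) as conserved. A six-term entropy can therefore be low even where the 20-term entropy is high.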

Cited by 5 publications (7 citation statements); references 56 publications.

“…For simplicity, this problem is usually structured as a binary classification task in which Bioinformatics methods aim at predicting the exposed/buried state of each residue in the protein sequence. Residues with more than 25% of relative exposure to solvent belongs to the first class and all the others to the second 4 . For more details about this task, see 4,5 .…”
Section: Results (mentioning)

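
The quoted convention turns RSA into a binary label: residues with more than 25% relative exposure are "exposed" and all others are "buried". The minimal sketch below applies that rule. It assumes RSA is computed as observed ASA divided by a per-residue-type maximum ASA that the caller supplies from a reference table of their choice, since no such table is given here. The function names and example numbers are hypothetical.

```python
def relative_solvent_accessibility(asa, max_asa):
    """RSA = observed ASA / reference maximum ASA for the residue type.

    max_asa must come from a reference table chosen by the user; none is
    embedded here. The ratio is not clipped, so values above 1 can occur.
    """
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa


def exposure_label(rsa, threshold=0.25):
    """1 = exposed (RSA strictly above threshold), 0 = buried."""
    return 1 if rsa > threshold else 0


# Hypothetical (ASA, max ASA) pairs in square angstroms, for illustration only.
residues = [(12.0, 200.0), (50.0, 200.0), (90.0, 180.0)]
labels = [exposure_label(relative_solvent_accessibility(a, m)) for a, m in residues]
print(labels)  # [0, 0, 1]
```

Under the "more than 25%" wording, a residue at exactly 25% RSA falls in the buried class, which is why the comparison is strict.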
“…Fan et al [ 22 ] used gradient boosted regression trees to predict rASA and achieved a state-of-the-art performance, which is 9.4% MAE and 0.73 Pearson’s correlation coefficient (PCC) on the CB502 dataset. Another benchmark dataset Manesh215 [ 25 ] is also widely used by researchers [ 10 , 13 , 14 , 22 , 24 , 26 ] to validate prediction methods. Table 1 summarizes the recent developments in predicting the values of rASA.…”
Section: Introduction (mentioning)

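
The quoted benchmark figures use two standard metrics for continuous RSA prediction: mean absolute error (MAE) and Pearson's correlation coefficient (PCC). Below is a plain-Python sketch of both. When RSA is expressed in percent, the MAE comes out in percentage points, which matches the "9.4% MAE" style of reporting. The function names and sample values are illustrative.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Mean of |y_true - y_pred| over paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observed vs. predicted RSA values (percent), for illustration only.
observed = [5.0, 40.0, 62.0, 18.0]
predicted = [10.0, 35.0, 50.0, 20.0]
print(mean_absolute_error(observed, predicted), pearson_r(observed, predicted))
```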
“…Applying the aforementioned descriptors, our binary event describes whether or not a protein chain residue in question has switch-like characteristics. Using logistic regression with proteins is not new, and was done previously by the Lustig group predicting from sequence which R groups were on the inside of a folded protein [17] as well as residues of one protein interacting with residues on another.…”
Section: Applying Logistic Regression (mentioning)
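
The abstract and the last quotation both describe logistic regression that predicts a residue's buried/exposed state from qualitative and quantitative descriptors. The minimal sketch below is written under stated assumptions. The features are a one-hot amino acid type plus a single 20-term entropy value and a bias term. Fitting uses plain batch gradient descent on the mean log-loss, with no regularization. The feature layout, hyperparameters and toy data are hypothetical, and this does not reproduce the paper's descriptor set or fitting procedure.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def features(residue, entropy20):
    """Hypothetical feature vector: one-hot amino acid type + 20-term entropy + bias."""
    onehot = [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]
    return onehot + [entropy20, 1.0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.3, epochs=3000):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    """Estimated probability that the residue is exposed (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


# Synthetic toy data (not from the paper): (residue, entropy, exposed?).
toy = [("K", 3.0, 1), ("E", 2.8, 1), ("D", 3.1, 1), ("R", 2.9, 1),
       ("L", 0.5, 0), ("I", 0.7, 0), ("V", 0.4, 0), ("F", 0.6, 0)]
X = [features(r, h) for r, h, _ in toy]
y = [label for _, _, label in toy]
w = fit_logistic(X, y)
print(predict_proba(w, features("K", 3.0)), predict_proba(w, features("L", 0.5)))
```

For real learning sets with many correlated descriptors, a regularized fit from a statistics or machine-learning library would normally be preferred over this hand-rolled loop. The loop is kept here only to make the model form explicit.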