2007
DOI: 10.1110/ps.062523907

Evaluation of features for catalytic residue prediction in novel folds

Abstract: Structural genomics projects are determining the three-dimensional structure of proteins without full characterization of their function. A critical part of the annotation process involves appropriate knowledge representation and prediction of functionally important residue environments. We have developed a method to extract features from sequence, sequence alignments, three-dimensional structure, and structural environment conservation, and used support vector machines to annotate homologous and nonhomologous…
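The abstract describes per-residue feature vectors fed to a support vector machine. The sketch below shows only that general idea. It trains a linear SVM by full-batch subgradient descent on the L2-regularized hinge loss. The two features (a conservation score and relative solvent accessibility), the toy values, the hyperparameters, and the names `train_linear_svm`, `predict`, `TOY_X` and `TOY_Y` are all hypothetical. They are not the paper's feature set, kernel, or training procedure.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy feature vectors: [conservation score, relative solvent accessibility].
# These values are invented for illustration; they are not data from the paper.
TOY_X: List[List[float]] = [
    [0.90, 0.10], [0.80, 0.20], [0.95, 0.05],  # conserved, buried: labelled catalytic (+1)
    [0.20, 0.80], [0.30, 0.70], [0.10, 0.90],  # variable, exposed: labelled non-catalytic (-1)
]
TOY_Y: List[int] = [1, 1, 1, -1, -1, -1]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 1000,
) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the L2-regularized hinge loss
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w.x_i + b))."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0                     # the bias term is not regularized
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (predicted catalytic) or -1 (predicted non-catalytic)."""
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    w, b = train_linear_svm(TOY_X, TOY_Y)
    print([predict(w, b, x) for x in TOY_X])
```

A real residue classifier would use many more features and residues, and it would likely need a kernel or class weighting because catalytic residues are rare. The sketch keeps a linear model so that the hinge-loss update stays visible.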

Cited by 66 publications (107 citation statements)
References 34 publications
“…For the benchmark data set, an MCC of 0.23 was reported (compared with an MCC of 0.31 for THEMATICS-SVM). Youn et al (2007) used SVM with sequence alignments, 3D structure properties, and structural environment conservation; these authors also reported a 57% residue recall rate and an 18.5% precision with the ASTRAL 40 database (compared with a 61% recall and 20% precision for THEMATICS-SVM). THEMATICS-SVM performance, achieved with the 3D structure of the query protein alone, is therefore of similar quality to that of methods that take advantage of similarities in sequence and structure, if quality is measured by performance on sets of well-characterized enzymes that tend to have many sequence homologues.…”
Section: Some Specific Examples (mentioning, confidence: 99%)
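The comparison above reports recall, precision and the Matthews correlation coefficient (MCC). Catalytic residues are a small minority of all residues, so a method can reach about 60% recall and still have precision near 20%. MCC uses all four cells of the confusion matrix, which makes it more informative than accuracy under that imbalance. The minimal sketch below computes all three metrics from counts. The function name and the example counts are hypothetical. The counts were chosen only to reproduce the quoted 57% recall and roughly 18.5% precision, and `tn` is arbitrary.

```python
import math
from typing import Tuple


def precision_recall_mcc(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Per-residue precision, recall and Matthews correlation coefficient
    from confusion-matrix counts (positives = catalytic residues)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report MCC = 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


if __name__ == "__main__":
    # Hypothetical counts chosen to give 57% recall and ~18.5% precision;
    # tn is arbitrary and not taken from any paper.
    p, r, m = precision_recall_mcc(tp=57, fp=251, tn=9649, fn=43)
    print(f"precision={p:.3f} recall={r:.3f} mcc={m:.3f}")
```

With these counts, precision is 57/308 ≈ 0.185 and recall is 0.570. The MCC value depends on the arbitrary `tn`, so only the first two numbers relate to the quoted figures.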
“…Other methods use structural relationships in conjunction with sequence analysis (de Rinaldis et al 1998; Aloy et al 2001; Carter et al 2001; Landgraf et al 2001; Gutteridge et al 2003; Ota et al 2003; Innis et al 2004; Meng et al 2004; Petrova and Wu 2006; Youn et al 2007). …”
(mentioning, confidence: 99%)
“…Our method significantly outperforms existing methods for functional site prediction both in terms of precision and recall (Table 1). In comparison with the existing ML methods for functional site prediction, we obtained 18.78 percentage point improvement over relational classifier and CRF used by Sankararaman et al [30], 39 percentage point over neural network based predictor [11] and 31 percentage point improvement over SVM based predictor [40] at 18% precision. The recall obtained by our method is 70 percentage point more than the methods using sequence conservation features alone [31,3,21].…”
Section: Performance Of Our Methods (mentioning, confidence: 79%)
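The statement above compares recall "at 18% precision" and reports gains in percentage points. Recall at a fixed precision is typically read from a ranked list of residue scores. It is the largest recall reached by any score cutoff whose precision still meets the target. A percentage-point gain is an absolute difference between two rates. For example, the 61% versus 57% recall quoted earlier is a 4 percentage-point gap. The sketch below is one common way to compute both. The function names are hypothetical, and the cited authors' exact evaluation protocol may differ.

```python
from typing import Sequence


def recall_at_precision(
    scores: Sequence[float], labels: Sequence[int], min_precision: float
) -> float:
    """Highest recall achievable by any score cutoff whose precision is at least
    min_precision. labels: 1 = catalytic residue, 0 = other residue."""
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Residues with tied scores fall on the same side of any cutoff,
        # so only evaluate after the last member of a tie group.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / total_pos)
    return best


def percentage_point_gap(rate_a: float, rate_b: float) -> float:
    """Absolute difference between two rates, expressed in percentage points."""
    return 100.0 * (rate_a - rate_b)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    labels = [1, 0, 1, 0, 1]
    print(recall_at_precision(scores, labels, min_precision=0.18))
    print(percentage_point_gap(0.61, 0.57))
```

Evaluating only between distinct scores avoids crediting a method for an ordering among tied residues that no real threshold could produce.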
“…These methods classify residues into functional and non-functional classes based on their structural, physicochemical, evolutionary and electrostatic features. The following are some of the examples of ML techniques used for addressing the problem: Wei et al [37] used naïve Bayes classifier, Gutteridge et al [11] used neural network, Youn et al [40] used support vector machines (SVMs) and Sankararaman et al [30] used L1 logistic regression classifier. Among all the existing methods, the L1 logistic regression classifier achieves the best performance on standard benchmark datasets.…”
Section: Introduction (mentioning, confidence: 99%)
“…In the prediction PTM sites, this specific problem is particularly prominent due to the sequence diversity. For instance, some motifs are very weak and some are not available without the sequence evolutionary information [30][31][32][33][34][35]. To address this [Figure 1: A brief flowchart template for computational prediction PTM sites.]…”
Section: Feature Representation (mentioning, confidence: 99%)