Prediction of protein surface accessibility with information theory

Naderi-Manesh, Hossein; Sadeghi, Mehdi; Arab, Seyed Shahriar; Moosavi-Movahedi, Ali Akbar

doi:10.1002/1097-0134(20010301)42:4<452::aid-prot40>3.0.co;2-q

Cited by 125 publications

(109 citation statements)

References 47 publications

Supporting

Mentioning

106

Contrasting

“…Competitors use very different approaches to predict the exposure state of residues. For each such approach we selected the best performing tool: [6] for the Information Theory (IT) approach, [8] for Probability Profiles (PP), SARpred [14] for Neural Networks (NN), RSA-PRP [20] for Support Vector Regression (SVR), [23] for a combination of Linear Regression and Support Vector Regression (LR+SVR), and SABLE [13] for a combination of Neural Networks and Linear Regression (NN+LR). We did not compare our tool against those that used a Real Values approach [33,21,15] (including the look-up table approach by Carugo et al [5]), as these are not binary classifiers, which makes output comparison not straightforward.…”

Section: Results. Citation type: mentioning

confidence: 99%

“…This dataset consists of 215 non-homologous protein chains (50878 residues) with no more than 25% pairwise-sequence identity and crystallographic resolution < 2.5Å [6].…”

Section: Dataset 1 (Nm215). Citation type: mentioning

confidence: 99%

“…Depending on this percent value, amino acids are then classified as either buried or exposed on a binary basis, or with discrete exposure levels in case of multiple threshold systems [4]. Several different approaches have been proposed to cope with the solvent accessibility problem: Information Theory [5,6], Bayesian Statistics [7], Probability Profiles [8], Neural Networks [4,9,11,12,13,14,15], Linear Regression [16,17], Support Vector Machines [18,19], Support Vector Regression [20], Look-up Tables [21], meta-methods [22] and many others [23]. However, exploiting sequence similarity to known structures, namely sequence homology, proved to be a substantial improvement strategy for all these methods, both for secondary structure and Solvent Accessibility prediction [9,24].…”

Section: Introduction. Citation type: mentioning

confidence: 99%
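
The thresholding step described in the Introduction excerpt above can be sketched as follows. This is a minimal illustration, not code from any of the cited tools: the function names and the cut-off values (25% for the two-state case, 9% and 36% for the multi-state case) are assumptions chosen for the example, since the excerpt does not say which thresholds are used.

```python
from bisect import bisect_left

def binary_state(rsa_percent, threshold=25.0):
    """Two-state label: 'E' (exposed) if RSA is above the threshold, else 'B' (buried).

    The 25% default is an illustrative assumption, not a value from the source.
    """
    return "E" if rsa_percent > threshold else "B"

def multi_state(rsa_percent, thresholds=(9.0, 36.0)):
    """Discrete exposure level for a multiple-threshold system.

    Returns the number of thresholds strictly below the RSA value,
    so 0 is the most buried level. Thresholds must be sorted ascending;
    the defaults are illustrative assumptions.
    """
    return bisect_left(thresholds, rsa_percent)

if __name__ == "__main__":
    for rsa in (5.0, 20.0, 30.0, 60.0):
        print(rsa, binary_state(rsa), multi_state(rsa))
```

A value exactly equal to a threshold is placed in the lower (more buried) class in both functions, so the two-state and multi-state labels stay consistent with each other.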

A High Performing Tool for Residue Solvent Accessibility Prediction

Palmieri, Federico, Leoncini, et al. 2011

Information Technology in Bio- And Medical Informatics

Abstract. Many efforts were spent in the last years in bridging the gap between the huge number of sequenced proteins and the relatively few solved structures. Relative Solvent Accessibility (RSA) prediction of residues in protein complexes is a key step towards secondary structure and protein-protein interaction sites prediction. With very different approaches, a number of software tools for RSA prediction have been produced throughout the last twenty years. Here, we present a binary classifier which implements a new method mainly based on sequence homology and implemented by means of look-up tables. The tool exploits residue similarity in solvent exposure pattern of neighboring context in similar protein chains, using BLAST search and DSSP structure. A two-state classification with 89.5% accuracy and 0.79 correlation coefficient against the real data is achieved on a widely used dataset.
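
The two figures of merit quoted in the abstract above, two-state accuracy and a correlation coefficient, can be computed as in the sketch below. It assumes the correlation coefficient is the Matthews correlation coefficient, which is common for binary classifiers but is not stated in the abstract; the function names and the 'E'/'B' label convention are likewise assumptions made for illustration.

```python
import math

def confusion_counts(actual, predicted, positive="E"):
    """Count true/false positives and negatives for two-state labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Fraction of residues whose predicted state matches the observed state."""
    return (tp + tn) / (tp + fp + tn + fn)

def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

if __name__ == "__main__":
    observed = "EEBBEB"    # toy labels, not data from the paper
    predicted = "EBBBEE"
    counts = confusion_counts(observed, predicted)
    print(accuracy(*counts), matthews_cc(*counts))
```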

“…Knowledge of surface topology and the geometric neighbors of residues used in the other studies were not used in our study. Several authors have reported success in predicting surface residues from the amino acid sequence [2,10,12,19,20,21]. This raises the possibility of first predicting surface residues based on sequence information, and then using the predicted surface residue information to predict the interaction sites using an SVM classifier.…”

Section: Discussion. Citation type: mentioning

confidence: 99%

“…After the removal of redundant proteins and proteins with fewer than ten residues, we obtained a data set of 115 proteins belonging to six different categories of complexes. The six categories and the number of proteins in each category are: antibody-antigen (31), protease-inhibitor (19), enzyme complexes (14), large protease complexes (8), G-proteins, cell cycle, signal transduction (22), and miscellaneous (21). In the study described here, we focused on the proteins from two categories: 19 proteins from protease-inhibitor complexes and 31 proteins from antibody-antigen complexes (the protein list is available at http://www.public.iastate.edu/~chhyan/isda2003/sup.htm).…”

Section: Protein Complexes Proteins and Amino Acid Residues. Citation type: mentioning

confidence: 99%

Identification of interface residues in protease-inhibitor and antigen-antibody complexes: a support vector machine approach

Yan, Honavar, Dobbs 2004

Neural Comput & Applic

In this paper, we describe a machine learning approach for sequence-based prediction of protein-protein interaction sites. A support vector machine (SVM) classifier was trained to predict whether or not a surface residue is an interface residue (i.e., is located in the protein-protein interaction surface), based on the identity of the target residue and its ten sequence neighbors. Separate classifiers were trained on proteins from two categories of complexes, antibody-antigen and protease-inhibitor. The effectiveness of each classifier was evaluated using leave-one-out (jack-knife) cross-validation. Interface and non-interface residues were classified with relatively high sensitivity (82.3% and 78.5%) and specificity (81.0% and 77.6%) for proteins in the antigen-antibody and protease-inhibitor complexes, respectively. The correlation between predicted and actual labels was 0.430 and 0.462, indicating that the method performs substantially better than chance (zero correlation). Combined with recently developed methods for identification of surface residues from sequence information, this offers a promising approach to predict residues involved in protein-protein interactions from sequence information alone.
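
The input representation mentioned in this abstract, the target residue plus its ten sequence neighbors, might be built roughly as sketched below. The abstract does not describe the encoding, so several things here are assumptions: that the ten neighbors are five on each side of the target, that each position is one-hot encoded over the 20 standard amino acids, and that positions beyond the chain termini are encoded as all zeros. The names `AMINO_ACIDS` and `encode_window` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes

def encode_window(sequence, center, flank=5):
    """One-hot encode the residue at `center` and `flank` neighbors on each side.

    Returns a flat list of (2 * flank + 1) * 20 values. Window positions that
    fall outside the chain, or residues that are not one of the 20 standard
    letters, are left as all zeros (an assumption for this sketch).
    """
    features = []
    for pos in range(center - flank, center + flank + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                one_hot[idx] = 1
        features.extend(one_hot)
    return features

if __name__ == "__main__":
    toy_chain = "ACDEFGHIKLM"  # toy sequence, not from the paper's data set
    vector = encode_window(toy_chain, center=5)
    print(len(vector), sum(vector))
```

Each such vector, paired with an interface or non-interface label for the target residue, would form one training example for a binary classifier such as an SVM.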
