We present a method for predicting protein folding class based on global protein chain description and a voting process. Selection of the best descriptors was achieved by a computer-simulated neural network trained on a data base consisting of 83 folding classes. Protein-chain descriptors include overall composition, transition, and distribution of amino acid attributes, such as relative hydrophobicity, predicted secondary structure, and predicted solvent exposure. Cross-validation testing was performed on 15 of the largest classes. The test shows that proteins were assigned to the correct class (correct positive prediction) with an average accuracy of 71.7%, whereas the inverse prediction of proteins as not belonging to a particular class (correct negative prediction) was 90-95% accurate. When tested on 254 structures used in this study, the top two predictions contained the correct class in 91% of the cases.
Examination of three-dimensional (3D) structures of proteins determined by x-ray diffraction and NMR has shown that the variety of folding patterns of proteins is significantly restricted (1, 2). Since protein sequence information grows significantly faster than information on protein 3D structure, the need for predicting the folding pattern of a given protein sequence naturally arises. Since the first relatively full classification of folding patterns of globular proteins (3), researchers have developed various schemes for classification of protein 3D structures (4-6) that are essentially based on the same spatial motifs.
If the prediction is restricted to a small number of structural classes (less than five), a prediction performance >70% can be easily achieved by using various methods based on a simple representation of sequences as vectors of a small number of general parameters.
In the simplest classification, proteins are usually described in terms of the following "tertiary super classes": all α (proteins have only α-helix secondary structure), all β (mainly β-sheet secondary structure), α+β (α-helix and β-strand secondary structure segments that do not mix), α/β (mixed or alternating segments of α-helical and β-strand secondary structure), and irregular (7-9). Several statistical methods were developed to predict whether a protein belongs to one of these classes (10-17). In a recent study on predicting protein structural class (all α, all β, or composed of α and β elements) from amino acid composition and hydrophobic pattern frequency information using computer-simulated neural networks (NNs) and statistical clustering, Metfessel et al. (18) obtained a prediction accuracy of 80.2%. Consideration of specific features of folding classes in the form of so-called hidden Markov models or probabilistic grammars allows a >2-fold increase in the number of classes of recognition (9). This method accurately predicts 12 classes; however, the study gives test results only for 16 sequences.
It is obvious that difficulty of folding pattern prediction grows rapidly with the number...
A computational method has been developed for assigning a protein sequence to a folding class in the Structural Classification of Proteins (SCOP). The method uses global descriptors of a primary protein sequence, expressed in terms of the physical, chemical, and structural properties of the constituent amino acids. Neural networks combine these descriptors so as to discriminate members of a given fold from members of all other folds. The method has been tested extensively to evaluate its prediction accuracy.
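The global descriptors named above (composition, transition, and distribution of an amino acid attribute) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The three-way hydrophobicity grouping in `GROUPS`, the quartile fractions used for the distribution, and all function names are assumptions made for illustration, since the text does not specify them.

```python
import math
from itertools import combinations

# Assumed three-way hydrophobicity grouping, for illustration only;
# the text does not say how residues are partitioned.
GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def encode(sequence):
    """Map each residue to its attribute class; unrecognised letters are skipped."""
    return [CLASS_OF[aa] for aa in sequence.upper() if aa in CLASS_OF]


def composition(classes):
    """Fraction of residues that fall in each class."""
    n = len(classes)
    return {g: classes.count(g) / n for g in GROUPS}


def transition(classes):
    """Fraction of adjacent residue pairs that switch between two given classes."""
    pairs = list(zip(classes, classes[1:]))
    result = {}
    for a, b in combinations(GROUPS, 2):
        hits = sum(1 for x, y in pairs if {x, y} == {a, b})
        result[(a, b)] = hits / len(pairs)
    return result


def distribution(classes, fractions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Relative chain position at which a given fraction of each class has occurred."""
    n = len(classes)
    result = {}
    for g in GROUPS:
        positions = [i + 1 for i, c in enumerate(classes) if c == g]
        if not positions:
            result[g] = [0.0] * len(fractions)
            continue
        marks = []
        for f in fractions:
            k = max(1, math.ceil(f * len(positions)))  # index of the k-th occurrence
            marks.append(positions[k - 1] / n)
        result[g] = marks
    return result


def describe(sequence):
    """Concatenate the three descriptor groups into one fixed-length vector."""
    classes = encode(sequence)
    if len(classes) < 2:
        raise ValueError("need at least two recognised residues")
    vector = list(composition(classes).values())
    vector += list(transition(classes).values())
    for marks in distribution(classes).values():
        vector += marks
    return vector
```

Under these assumptions each attribute contributes 3 + 3 + 15 = 21 values, so a chain of any length maps to a vector of the same size, which is what lets a neural network take whole-chain descriptors as input. The same functions could be applied to other attributes, such as predicted secondary structure, by swapping in a different grouping.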