The main-chain conformations of 237 384 amino acids in 1042 protein subunits from the PDB were analyzed with Ramachandran plots. The populated areas of the empirical Ramachandran plot differed markedly from the classical plot in all regions. All amino acids in α-helices are found within a very narrow range of φ, ψ angles. As many as 40% of all amino acids are found in this most populated region, which covers only 2% of the Ramachandran plot. The β-sheet region is clearly subdivided into two distinct regions. These do not arise from parallel and antiparallel β-strands, which have quite similar conformations; instead, one of the two regions is populated mainly by amino acids in random coil. The third and smallest populated area of the Ramachandran plot, often denoted the left-handed α-helix, lies in a different position from that originally suggested by Ramachandran. Each of the 20 amino acids has its own very characteristic Ramachandran plot. Most of the glycines have conformations that were considered less favoured. These results may be useful for checking secondary-structure assignments in the PDB and for predicting protein folding.
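Each point on a Ramachandran plot is a pair of backbone torsion angles: φ is the dihedral C(i−1)–N–Cα–C and ψ is the dihedral N–Cα–C–N(i+1) of residue i. The minimal sketch below computes a signed dihedral angle from four atom coordinates using the common atan2 formulation; the function names `dihedral` and `phi_psi` are hypothetical helpers for illustration, not part of any tool described above, and parsing coordinates out of PDB files is left out.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    k0 = _dot(b0, b1)
    k2 = _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    # atan2 of the sine-like and cosine-like components gives the signed angle.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec, n: Vec, ca: Vec, c: Vec, n_next: Vec) -> tuple[float, float]:
    """(phi, psi) of residue i: phi = C(i-1)-N-CA-C, psi = N-CA-C-N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

With this sign convention, eclipsed (cis) outer atoms give 0°, fully staggered (trans) atoms give ±180°, and a clockwise twist viewed down the central bond is positive.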
The program is available for download at www.fos.su.se/~nanjiang/zincpred/download/
By exploiting the vast protein sequence and protein structure data available, we have brought secondary-structure prediction closer to the expected theoretical limit. When tested by leave-one-out cross-validation on a non-redundant set of 5860 PDB protein chains culled at 30% sequence identity, the overall per-residue accuracy for three-state secondary-structure prediction (Q3) is 82.9%. The overall per-residue accuracies for three- and eight-state Shape Strings are 85.1% and 71.5%, respectively. We also benchmarked our program against the latest version of PSIPRED for secondary-structure prediction: when tested on 2241 chains with the same training set, our program's Q3 was 0.3% higher. For Shape Strings, we compared our method with a recently published method, using the same dataset and definition as that method; our program was 2.2% more accurate for three-state Shape Strings. By quantitatively investigating the effect of database size on 1D structure prediction, we show that the accuracy increases by approximately 1% with every doubling of the database size.
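Q3 is the fraction of residues whose predicted state matches the observed state, and the "overall" figure pools residues across all chains rather than averaging per-chain scores. The sketch below illustrates both, plus the arithmetic implied by a fixed gain per doubling of the database (gain = per_doubling × log2(new/old)). The function names, the H/E/C letters, and the assumption that the reported log-linear trend extrapolates are illustrative choices, not taken from the authors' program. The same scoring applies unchanged to eight-state Shape Strings, since it only compares letters position by position.

```python
import math


def q3(predicted: str, observed: str) -> float:
    """Per-residue accuracy (percent) for one chain, e.g. states H, E, C."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def q3_overall(chains: list[tuple[str, str]]) -> float:
    """Overall per-residue accuracy, pooled over every residue of every chain."""
    correct = total = 0
    for predicted, observed in chains:
        if len(predicted) != len(observed):
            raise ValueError("length mismatch between prediction and observation")
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to score")
    return 100.0 * correct / total


def gain_from_growth(old_size: int, new_size: int, per_doubling: float = 1.0) -> float:
    """Accuracy gain implied by a constant gain per doubling (assumed trend)."""
    return per_doubling * math.log2(new_size / old_size)
```

Under that assumed trend, quadrupling the database (two doublings) would correspond to roughly 2% higher accuracy.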
Protein folding starts before the whole polypeptide has been synthesized by the ribosome. No matter how long the polypeptide is or how intricate the fold, both ends of the chain always end up on the surface. From a topological point of view this is surprising: one would expect to find the starting (N-terminal) end inside the core of the folded protein, just as in a ball of yarn. We suggest that the reason for this apparent paradox is that the first amino acid of the emerging polypeptide chain is gripped during protein synthesis, perhaps by the ribosome, and is not released until the whole polypeptide has been synthesized. This binding would greatly decrease the degrees of freedom in the protein-folding process and could also explain why knots are so rare in proteins. Gripping would also guarantee that the N-terminus is accessible on the protein surface, as required for binding of ubiquitin, which regulates the natural degradation of proteins and prevents the buildup of protein aggregates such as those found in Huntington's, Alzheimer's, Parkinson's, and other neurodegenerative diseases.
Different methods for describing and comparing the structures of the tens of thousands of proteins that have been determined by X-ray crystallography are reviewed. Such comparisons are important for understanding the structures and functions of proteins, for facilitating structure prediction, and for assessing structure prediction methods. We summarize methods in this field, emphasizing ways of representing protein structures as one-dimensional geometrical strings. Such strings are built from shape symbols assigned to clustered regions of the φ/ψ dihedral angle pairs of the polypeptide backbone, as described by the Ramachandran plot. These one-dimensional representations are as compact as secondary-structure descriptions but carry more information in loop regions. They can be used for fast searches for similar structures in databases and for measuring similarity between proteins and between predicted and native structures.
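To make the idea of a one-dimensional geometrical string concrete, the sketch below maps each (φ, ψ) pair to a single letter by testing which Ramachandran region it falls in, then scores two equal-length strings by position-wise identity. The letters A, B, G, O and every region boundary here are hypothetical values chosen for illustration only; they are not the published Shape String alphabet or its cluster definitions, and a real database search would also need sequence-style alignment of strings of different lengths.

```python
def shape_symbol(phi: float, psi: float) -> str:
    """Assign one letter per residue. Boundaries (degrees) are hypothetical."""
    if -160.0 <= phi < 0.0 and -120.0 <= psi < 50.0:
        return "A"  # right-handed helical region
    if -180.0 <= phi < -45.0 and (psi >= 50.0 or psi < -150.0):
        return "B"  # extended (beta) region
    if phi > 0.0 and -90.0 < psi < 90.0:
        return "G"  # left-handed helical region
    return "O"  # any other conformation


def shape_string(angles: list[tuple[float, float]]) -> str:
    """Turn a backbone's (phi, psi) pairs into a one-letter-per-residue string."""
    return "".join(shape_symbol(phi, psi) for phi, psi in angles)


def identity(a: str, b: str) -> float:
    """Fraction of positions with the same symbol in two equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Example: a helical, an extended, a left-handed and an unusual residue.
s = shape_string([(-60.0, -45.0), (-120.0, 130.0), (60.0, 40.0), (80.0, -170.0)])
```

Here `s` is `"ABGO"`, and comparing it with `"ABGA"` gives an identity of 0.75; because each residue becomes one character, such strings can be stored and compared as cheaply as secondary-structure strings.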