Recent ab initio folding simulations for a limited number of small proteins have corroborated a previous suggestion that atomic burial information obtainable from sequence could be sufficient for tertiary structure determination when combined with sequence-independent geometrical constraints. Here, we use simulations parameterized by native burials to investigate the required amount of information in a diverse set of globular proteins comprising different structural classes and a wide size range. Burial information is provided by a potential term pushing each atom towards one among a small number L of equiprobable concentric layers. An upper bound for the required information is provided by the minimal number of layers L(min) still compatible with correct folding behavior. We obtain L(min) between 3 and 5 for seven small to medium proteins with 50 ≤ Nr ≤ 110 residues, while for a larger protein with Nr = 141 we find that L ≥ 6 is required to maintain native stability. We additionally estimate the usable redundancy for a given L ≥ L(min) from the burial entropy associated with the largest folding-compatible fraction of "superfluous" atoms, for which the burial term can be turned off or target layers can be chosen randomly. The estimated redundancy for small proteins with L = 4 is close to 0.8. Our results are consistent with the above-average quality of burial predictions used in previous simulations and indicate that the fraction of approachable proteins could increase significantly with even a mild, plausible improvement in sequence-dependent burial prediction or in sequence-independent constraints that augment the detectable redundancy during simulations.
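As a rough illustration of the layer discretization described above, the following Python sketch assigns atoms to L equiprobable layers by ranking their distances to the molecular center. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the center is taken as the unweighted geometric center of the atoms, and the information figure is only the simple bound of log2(L) bits per atom, which ignores any correlation between atoms.

```python
import math


def central_distances(coords):
    """Distance of each atom to the geometric center (assumed unweighted)."""
    n = len(coords)
    center = tuple(sum(p[k] for p in coords) / n for k in range(3))
    return [math.dist(p, center) for p in coords]


def equiprobable_layers(distances, n_layers):
    """Assign each atom a layer index 0..n_layers-1 (0 = most buried).

    Atoms are ranked by central distance and split into groups of
    near-equal size, so every layer holds about the same number of atoms.
    """
    n = len(distances)
    order = sorted(range(n), key=lambda i: distances[i])
    layers = [0] * n
    for rank, i in enumerate(order):
        layers[i] = rank * n_layers // n
    return layers


def max_burial_bits(n_atoms, n_layers):
    """Upper bound on burial information: log2(L) bits per atom."""
    return n_atoms * math.log2(n_layers)


# Toy example with four atoms and L = 2 layers.
toy_distances = [5.0, 1.0, 3.0, 7.0]
toy_layers = equiprobable_layers(toy_distances, 2)
```

With this convention, raising L thins the layers and increases the bound on information per atom, which is why the minimal folding-compatible L serves as an upper bound on the information required.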
The dataset of protein structures and the prediction implementations are available at http://www.btc.unb.br/ (in 'Software').
The three-dimensional structure of proteins is determined by their linear amino acid sequences but decipherment of the underlying protein folding code has remained elusive. Recent studies have suggested that burials, as expressed by atomic distances to the molecular center, are sufficiently informative for structural determination while potentially obtainable from sequences. Here we provide direct evidence for this distinctive role of burials in the folding code, demonstrating that burial propensities estimated from local sequence can indeed be used to fold globular proteins in ab initio simulations. We have used a statistical scheme based on a Hidden Markov Model (HMM) to classify all heavy atoms of a protein into a small number of burial atomic types depending on sequence context. Molecular dynamics simulations were then performed with a potential that forces all atoms of each type towards their predicted burial level, while simple geometric constraints were imposed on covalent structure and hydrogen bond formation. The correct folded conformation was obtained and distinguished in simulations that started from extended chains for a selection of structures comprising all three folding classes and high burial prediction quality. These results demonstrate that atomic burials can act as informational intermediates between sequence and structure, providing a new conceptual framework for improving structural prediction and understanding the fundamentals of protein folding.
The connection between protein sequences and tertiary structures has intrigued investigators for decades. A plausible hypothesis for the coding scheme postulates that atomic burial information obtainable from the sequence could be sufficient for structural determination when combined with sequence-independent constraints. Accordingly, folding simulations using native burial information expressed by atomic central distances, discretized into a small number L of equiprobable burial layers, have indeed been successful in reaching and distinguishing the native structure of several globular proteins. Attempted predictions of layers from sequence, however, turned out to be insufficiently accurate for most proteins. Here we explore the possibility that a nonuniform assignment of layers, which is intended to account for constraints imposed by chain connectivity, might provide a more efficient burial encoding of tertiary structures. We consider the condition that adjacent Cα-atoms along the sequence cannot occupy nonadjacent layers, in which case the information required to specify sequences of burials would be smaller. It is shown that appropriate folding behavior can still be observed in this explicitly more constrained scenario with a structure-dependent assignment intended to produce the thinnest possible layers still compatible with the imposed burial constraint. This thinnest assignment turns out to be sufficiently restrictive for the observed examples. It provides appropriately thinner layers or, equivalently, a larger number of layers, for examples previously observed to require more restrictive constraints than counterparts of similar size, as well as the appropriate increase in the number of layers for larger proteins. Implications for the general understanding of the protein folding code are discussed.
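To make the adjacency condition above concrete, the following Python sketch checks whether a sequence of Cα layer indices ever jumps between nonadjacent layers, and counts how many layer sequences satisfy the condition. It is an illustrative sketch only: the function names are hypothetical, layers are assumed to be numbered consecutively from 0, and the count is a simple combinatorial bound rather than the assignment procedure used in the study.

```python
import math


def adjacency_violations(layers):
    """Indices i where consecutive residues i, i+1 sit in nonadjacent layers."""
    return [i for i in range(len(layers) - 1)
            if abs(layers[i + 1] - layers[i]) > 1]


def count_allowed_sequences(n_residues, n_layers):
    """Count layer sequences in which consecutive layers differ by at most 1.

    Dynamic programming over the chain: counts[j] is the number of valid
    sequences of the current length that end in layer j.
    """
    counts = [1] * n_layers
    for _ in range(n_residues - 1):
        counts = [
            sum(counts[k] for k in (j - 1, j, j + 1) if 0 <= k < n_layers)
            for j in range(n_layers)
        ]
    return sum(counts)


def bits_saved(n_residues, n_layers):
    """Bits needed without the constraint minus bits needed with it."""
    unconstrained = n_residues * math.log2(n_layers)
    constrained = math.log2(count_allowed_sequences(n_residues, n_layers))
    return unconstrained - constrained


# Toy check: the jump from layer 0 to layer 2 violates the condition.
toy_violations = adjacency_violations([0, 1, 1, 0, 2])
```

Because only a subset of all layer sequences remains allowed, the logarithm of the count is smaller than the unconstrained value of Nr·log2(L), which is the sense in which the constrained encoding requires less information.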