Protein tertiary structures are known to be encoded in amino acid sequences, but the problem of structure prediction from sequence continues to be a challenge. With this question in mind, recent simulations have shown that atomic burials, as expressed by atom distances to the molecular geometrical center, are sufficiently informative for determining native conformations of small globular proteins. Here we use a simple computational experiment to estimate the amount of this required burial information and find it to be surprisingly small, actually comparable with the stringent limit imposed by sequence statistics. Atomic burials appear to satisfy, therefore, minimal requirements for a putative dominating property in the folding code because they provide an amount of information sufficiently large for structural determination but, at the same time, sufficiently small to be encodable in sequences. In a simple analogy with human communication, atomic burials could correspond to the actual "language" encoded in the amino acid "script" from which the complexity of native conformations is recovered during the folding process.
protein folding | structure prediction | information theory | folding code
During the last two decades, a physical picture of the folding process has emerged with the advent of energy landscape theory (1-5) but, despite many recent advances, a general solution to the problem of structure prediction from sequence has remained elusive. Most attempts in this direction have assumed sequences to encode partial information about many structural properties, such as likelihood of tertiary contacts or secondary structure propensities, that could eventually be combined to provide a general predictive algorithm (6-10).
An alternative scheme would assume a single (or few) conformational property to be directly encoded in sequences, resulting in a small number of sequence-dependent parameters, whereas other conformational features would arise from sequence-independent constraints. The importance of such constraints has been recently emphasized by Banavar and collaborators (11).
The amount of information provided by a putative single property dominating the code should satisfy two conditions: It should be sufficiently large for structural determination but sufficiently small for being encodable in sequences (12). The widely recognized importance of hydrophobic interactions on protein structure formation (13,14) suggests atomic burials to constitute a natural candidate for this putative dominant property. There has been some discussion, in the simplified context of lattice models, on the possibility that intrinsically unspecific hydrophobicity could satisfy the first condition (15), including a dependence on the choice of native conformation (16,17). Encouraging results from recent Monte Carlo simulations, on the other hand, indicate that the first condition is satisfied by atomic burials, as measured by distances from the molecular geometrical center, for small globular proteins represented by off-lattice, geome...
By Monte Carlo simulations, we explored the effect of single mutations on the thermodynamics and kinetics of the folding of a two-dimensional, energetically frustrated, hydrophobic protein model. Phi-value analysis, corroborated by simulations beginning from given sets of judiciously chosen initial contacts, suggests that the transition state of the model consists of a limited region of the native structure, that is, a folding nucleus. It seems that the most important contacts in the transition state (large and positive Phi) are not the ones with the highest contact order, because in this case the entropic cost of their formation would be too high, but exactly the ones that decrease the entropic cost of difficult contacts, reducing their effective contact order. Mutations of internal monomers involved in high-order contacts were actually the ones resulting in the fastest kinetics (and Phi < 0), indicating they tend to make low-order, non-native contacts of low entropic cost that stabilize the unfolded state with respect to the transition state. Folding acceleration by other non-native interactions was also observed and a simple general mechanism is proposed according to which non-native contacts can act indirectly over the folding nucleus, "chelating" out potentially harmful contacts. The polymer graph of our model, which facilitates the visualization of effective contact orders, successfully suggests the relative kinetic importance of different contacts and is reasonably consistent with analogous graphs for the well-characterized family of SH3 domains.
Recent ab initio folding simulations for a limited number of small proteins have corroborated a previous suggestion that atomic burial information obtainable from sequence could be sufficient for tertiary structure determination when combined with sequence-independent geometrical constraints. Here, we use simulations parameterized by native burials to investigate the required amount of information in a diverse set of globular proteins comprising different structural classes and a wide size range. Burial information is provided by a potential term pushing each atom towards one among a small number L of equiprobable concentric layers. An upper bound for the required information is provided by the minimal number of layers L_min still compatible with correct folding behavior. We obtain L_min between 3 and 5 for seven small to medium proteins with 50 ≤ Nr ≤ 110 residues, while for a larger protein with Nr = 141 we find that L ≥ 6 is required to maintain native stability. We additionally estimate the usable redundancy for a given L ≥ L_min from the burial entropy associated with the largest folding-compatible fraction of "superfluous" atoms, for which the burial term can be turned off or target layers can be chosen randomly. The estimated redundancy for small proteins with L = 4 is close to 0.8. Our results are consistent with the above-average quality of burial predictions used in previous simulations and indicate that the fraction of approachable proteins could increase significantly with even a mild, plausible improvement in sequence-dependent burial prediction or in sequence-independent constraints that augment the detectable redundancy during simulations.
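The layer discretization described in this abstract can be illustrated with a minimal sketch. It assumes (this is not stated in the source) that "equiprobable" layers can be approximated by rank-based, equally populated bins of central distance, and that the naive information bound for independent, equiprobable layers is log2(L) bits per atom. The function names are hypothetical, and the sketch does not reproduce the authors' potential term or their redundancy estimate.

```python
import math

import numpy as np


def central_distances(coords):
    """Distance of each atom from the geometrical center of all atoms."""
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    return np.linalg.norm(coords - center, axis=1)


def equiprobable_layers(distances, n_layers):
    """Assign each atom to one of n_layers equally populated layers.

    Assumption: layers are defined by the rank of the central distance,
    with layer 0 the innermost. The source does not give this rule.
    """
    distances = np.asarray(distances, dtype=float)
    # Double argsort converts distances into ranks 0..N-1.
    ranks = np.argsort(np.argsort(distances, kind="stable"), kind="stable")
    return (ranks * n_layers) // len(distances)


def layer_information_bits(n_atoms, n_layers):
    """Naive upper bound: log2(L) bits per atom for independent layers."""
    return n_atoms * math.log2(n_layers)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(size=(12, 3))  # toy coordinates, not a real protein
    layers = equiprobable_layers(central_distances(coords), n_layers=4)
    print(layers)
    print(layer_information_bits(len(coords), 4), "bits")
```

Under these assumptions, reducing L lowers the bound logarithmically, which is why the minimal folding-compatible L serves as an upper bound on the required information.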
The dataset of protein structures and the prediction implementations are available at http://www.btc.unb.br/ (in 'Software').
We investigate the possibility that atomic burials, as measured by their distances from the structural geometrical center, contain sufficient information to determine the tertiary structure of globular proteins. We report Monte Carlo simulated annealing results of all-atom hard-sphere models in continuous space for four small proteins: the all-beta WW-domain 1E0L, the alpha/beta protein-G 1IGD, the all-alpha engrailed homeo-domain 1ENH, and the alpha + beta engineered monomeric form of the Cro protein 1ORC. We used as energy function the sum over all atoms, labeled by i, of |R_i - R_i^*|, where R_i is the atomic distance from the center of coordinates, or central distance, and R_i^* is the "ideal" central distance obtained from the native structure. Hydrogen bonds were taken into consideration by the assignment of two ideal distances for backbone atoms forming hydrogen bonds in the native structure depending on the formation of a geometrically defined bond, independently of bond partner. Lowest energy final conformations turned out to be very similar to the native structure for the four proteins under investigation and a strong correlation was observed between energy and distance root mean square deviation (DRMS) from the native in the case of all-beta 1E0L and alpha/beta 1IGD. For all-alpha 1ENH and alpha + beta 1ORC the overall correlation between energy and DRMS among final conformations was not as high because some trajectories resulted in high DRMS but low energy final conformations in which alpha-helices adopted a non-native mutual orientation. Comparison between central distances and actual accessible surface areas corroborated the implicit assumption of correlation between these two quantities.
The Z-score obtained with this native-centric potential in the discrimination of native 1ORC from a set of random compact structures confirmed that it contains a much smaller amount of native information when compared to a traditional contact Go potential but indicated that simple sequence-dependent burial potentials still need some improvement in order to attain a similar discriminability. Taken together, our results suggest that central distances, in conjunction with physically motivated hydrogen bond constraints, contain sufficient information to determine the native conformation of these small proteins and that a solution to the folding problem for globular proteins could arise from sufficiently accurate burial predictions from sequence followed by minimization of a burial-dependent energy function.
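The energy function described in this abstract, the sum over atoms of the absolute difference between each central distance and its ideal native value, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the coordinates are toy values, and the hydrogen-bond rule (two ideal distances for backbone atoms), the hard-sphere model, and the simulated annealing procedure are all omitted.

```python
import numpy as np


def central_distances(coords):
    """Distance of each atom from the center of coordinates."""
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    return np.linalg.norm(coords - center, axis=1)


def burial_energy(coords, ideal_distances):
    """Sum over atoms of |R_i - R_i^*|.

    ideal_distances holds one ideal central distance per atom, taken here
    from a reference ("native") structure. The two-distance hydrogen-bond
    treatment mentioned in the abstract is NOT modeled.
    """
    ideal = np.asarray(ideal_distances, dtype=float)
    return float(np.abs(central_distances(coords) - ideal).sum())


if __name__ == "__main__":
    # Toy "native" coordinates (hypothetical, not a real protein).
    native = np.array([[1, 0, 0], [-1, 0, 0], [0, 2, 0], [0, -2, 0]], float)
    ideal = central_distances(native)
    print(burial_energy(native, ideal))        # native conformation
    print(burial_energy(2.0 * native, ideal))  # uniformly expanded conformation
```

Because the energy depends only on distances from the center, any rigid rotation or translation of a conformation leaves it unchanged.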