General principles of protein structure, stability, and folding kinetics have recently been explored in computer simulations of simple exact lattice models. These models represent protein chains at a rudimentary level, but they involve few parameters, approximations, or implicit biases, and they allow complete explorations of conformational and sequence spaces. Such simulations have resulted in testable predictions that are sometimes unanticipated: The folding code is mainly binary and delocalized throughout the amino acid sequence. The secondary and tertiary structures of a protein are specified mainly by the sequence of polar and nonpolar monomers. More specific interactions may refine the structure, rather than dominate the folding code. Simple exact models can account for the properties that characterize protein folding: two-state cooperativity, secondary and tertiary structures, and multistage folding kinetics (fast hydrophobic collapse followed by slower annealing). These studies suggest the possibility of creating "foldable" chain molecules other than proteins. The encoding of a unique compact chain conformation may not require amino acids; it may require only the ability to synthesize specific monomer sequences in which at least one monomer type is solvent-averse.

Keywords: chain collapse; hydrophobic interactions; lattice models; protein conformations; protein folding; protein stability

We review the principles of protein structure, stability, and folding kinetics from the perspective of simple exact models. We focus on the "folding code": how the tertiary structure and folding pathway of a protein are encoded in its amino acid sequence. Although native proteins are specific, compact, and often remarkably symmetrical structures, ordinary synthetic polymers in solution, glasses, or melts adopt large ensembles of more expanded conformations, with little intrachain organization.
With simple exact models, we ask what are the fundamental causes of the differences between proteins and other polymers: What makes proteins special?

One view of protein folding assumes that the "local" interactions among the near neighbors in the amino acid sequence, the interactions that form helices and turns, are the main determinants of protein structure. This assumption implies that isolated helices form early in the protein folding pathway and then assemble into the native tertiary structure (see Fig. 1). It is the premise behind the paradigm, primary → secondary → tertiary structure, that seeks computer algorithms to predict secondary structures from the sequence, and then to assemble them into the tertiary native structure.

Here we review a simple model of an alternative view, its basis in experimental results, and its implications. We show how the nonlocal interactions that drive collapse processes in heteropolymers can give rise to protein structure, stability, and folding kinetics. This perspective is based on evidence that the folding code is not predominantly localized in short windows of the amino acid sequence. It...
We report a blind test of lattice-model-based search strategies for finding global minima of model protein chains. One of us (E.I.S.) selected 10 compact conformations of 48-mer chains on the three-dimensional cubic lattice and used their inverse folding algorithm to design HP (H, hydrophobic; P, polar) sequences that should fold to those "target" structures. The sequences, but not the structures, were sent to the UCSF group (K.Y., K.M.F., P.D.T., H.S.C., and K.A.D.), who used two methods to attempt to find the globally optimal conformations: "hydrophobic zippers" and a constraint-based hydrophobic core construction (CHCC) method. The CHCC method found global minima in all cases, and the hydrophobic zippers method found global minima in some cases, in minutes to hours on workstations. In 9 out of 10 sequences, the CHCC method found lower energy conformations than the 48-mers were designed to fold to. Thus the search strategies succeed for the HP model but the design strategy does not. For every sequence the global energy minimum was found to have multiple degeneracy with 10^3 to 10^6 conformations. We discuss the implications of these results for (i) searching conformational spaces of simple models of proteins and (ii) how these simple models relate to proteins.
The tertiary structures of globular proteins have remarkable and complex symmetries. What forces cause them? We find that a very simple model reproduces some of those symmetries. Proteins are modeled as copolymers of specific sequences of hydrophobic (H) and polar (P) monomers (HP model) configured as self-avoiding flights on simple three-dimensional cubic lattices. The model has no parameters; we just seek the conformations that have the global maximum number of HH contacts for any given sequence. Finding global optima for chains in this model has not been computationally possible before for chains longer than 36-mers. We report here a procedure that can find all the globally optimal conformations, the number of which defines the degeneracy of a sequence, for chains up to 88 monomers long. It is about 37 orders of magnitude faster than previous exact methods. We find that degeneracy is an important aspect of sequence design. So far, we have found that four-helix bundles, alpha/beta-barrels, and parallel beta-helices are globally optimal conformations of polar/nonpolar sequences that have minimal degeneracy.
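The scoring rule described above (maximize the number of HH contacts for a given sequence) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `hh_contacts`, the coordinate representation, and the validity checks are assumptions made here. A contact is taken to be a pair of H monomers on adjacent lattice sites that are not consecutive along the chain, which is the usual HP-model convention.

```python
from itertools import combinations


def hh_contacts(sequence, coords):
    """Count HH contacts of an HP chain on the 3D cubic lattice.

    sequence: string of 'H' and 'P' (hypothetical input format).
    coords:   list of (x, y, z) integer tuples, one per monomer.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    # Consecutive monomers must sit on adjacent lattice sites.
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("chain is not connected on the lattice")

    contacts = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # bonded neighbors do not count as contacts
        if sequence[i] == "H" and sequence[j] == "H":
            distance = sum(abs(p - q) for p, q in zip(coords[i], coords[j]))
            if distance == 1:
                contacts += 1
    return contacts


# A 4-mer folded into a square in the z = 0 plane: the two ends touch.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hh_contacts("HPPH", square))  # 1
```

A globally optimal conformation is one that maximizes this count over all self-avoiding conformations of the sequence. The function only scores a single conformation; it does not perform the search.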
We consider the question of how to design proteins. How can we find "good" amino acid sequences (i) that fold to a desired "target" structure as a native conformation of lowest accessible free energy and (ii) that will not simultaneously fold to many other conformations of the same free energy? Current protein designs often focus on helix propensities and turns. We focus here on designing the hydrophobicity. For a model of self-avoiding hydrophobic/polar chains on two-dimensional square lattices, geometric proofs and exhaustive enumerations show the following results. (i) The strategy "hydrophobic residues inside/polar residues outside" is not optimal. Placement of additional hydrophobic residues on the surface is often necessary. (ii) To avoid unwanted conformations, the designed sequence must have neither too many nor too few hydrophobic residues. (iii) The computational complexity of inverse folding appears to be in a different class than folding: unlike the folding problem, the design problem does not scale exponentially with chain length. Some design strategies, described here for the lattice model, produce good sequences and scale only linearly with chain length.

Recently there have been major advances in protein design [i.e., in the design of amino acid sequences that will fold to desired "target" native conformations (1-11)]. However, the following several hurdles remain to be overcome. (i) So far, most designed proteins have only simple symmetries, such as 4-helix bundles and all-sheet conformations (2-4, 9, 10). (ii) Most of the "rational" aspects of design currently focus on the interactions among the "connected" neighbors [i.e., on the intrinsic propensities of monomeric and dimeric amino acids to form helices and turns (12-16)]. However, major forces of folding are due to hydrophobic and other "nonlocal" interactions [i.e., among monomers far apart in the sequence (17-19)].
The main design principle currently used for them is hydrophobic (H) residues inside/polar (P) residues outside. Nevertheless, many real proteins have exposed nonpolar and buried polar units (20, 21), and some single-site mutations contradict the hydrophobic residues inside/polar residues outside principle (22). Are there more subtle principles that are important? (iii) A major design problem has been how to avoid simultaneous folding to wrong conformations; designed sequences sometimes appear to fold to "gemisch" states, involving multiple or wrong native structures (ref. 4 and T. Handel, personal communication). It is not known how to control the multiplicity of stable structures (i.e., how to design the desired structure uniquely into the sequence). The problem of sequence design is the "inverse protein folding" problem (23-25). Whereas the input of a protein folding algorithm would be an amino acid sequence and the output would be a native structure, the input for an inverse folding algorithm would be a desired native structure and the output would be a sequence that will fold to it. Can heuristic rules be f...
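The exhaustive enumerations on two-dimensional square lattices referred to in this abstract can be illustrated with a brute-force sketch. It is an assumption-laden toy, not the method of the paper: the function name `ground_states` is hypothetical, rotations and reflections are removed by fixing the first bond along +x and forcing the first turn toward +y, and the cost grows exponentially with chain length, so it is only practical for short chains. It returns the maximum number of HH contacts and every conformation that attains it; the length of that list is the degeneracy that a good design tries to keep small.

```python
MOVES = ((1, 0), (0, 1), (-1, 0), (0, -1))


def ground_states(sequence):
    """Enumerate all self-avoiding conformations of an HP chain on the
    2D square lattice; return (max HH contacts, list of optimal paths)."""
    n = len(sequence)
    if n < 2:
        raise ValueError("need at least two monomers")
    path = [(0, 0), (1, 0)]            # first bond fixed along +x
    occupied = {(0, 0): 0, (1, 0): 1}  # lattice site -> chain index
    best = {"contacts": -1, "paths": []}

    def extend(contacts, bent):
        i = len(path)
        if i == n:
            if contacts > best["contacts"]:
                best["contacts"], best["paths"] = contacts, [tuple(path)]
            elif contacts == best["contacts"]:
                best["paths"].append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            site = (x + dx, y + dy)
            if site in occupied:
                continue  # self-avoidance
            if not bent and dy == -1:
                continue  # skip mirror images: first turn must go to +y
            gained = 0
            if sequence[i] == "H":
                for ex, ey in MOVES:
                    j = occupied.get((site[0] + ex, site[1] + ey))
                    # j < i - 1 excludes the bonded predecessor
                    if j is not None and j < i - 1 and sequence[j] == "H":
                        gained += 1
            occupied[site] = i
            path.append(site)
            extend(contacts + gained, bent or dy != 0)
            path.pop()
            del occupied[site]

    extend(0, False)
    return best["contacts"], best["paths"]


contacts, paths = ground_states("HPPH")
print(contacts, len(paths))  # 1 1
```

In this toy, a sequence for which the returned list has exactly one entry encodes a unique lowest-energy conformation, while an all-P sequence is maximally degenerate because every conformation scores zero.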