Long chains of the HP lattice protein model are studied by the multi-self-overlap ensemble Monte Carlo method, which was developed recently by Iba, Chikenji, and Kikuchi. This method successfully finds the lowest energy states reported before for sequences of chain length N = 42–100 in two and three dimensions. Moreover, the method realizes the lowest energy state ever found for a case of N = 100. Finite-temperature properties of these sequences are also investigated by this method. Two successive transitions are observed between the native and random coil states. Thermodynamic analysis suggests that the ground state degeneracy is relevant to the order of the transitions.
A novel family of dynamical Monte Carlo algorithms for lattice polymers is proposed. Our central idea is to simulate an extended ensemble in which the self-avoiding condition is systematically weakened. The degree of self-overlap is controlled in a manner similar to the multicanonical ensemble. As a consequence, the ensemble (the multi-self-overlap ensemble) contains adequate portions of self-overlapping conformations as well as higher energy ones. It is shown that the multi-self-overlap ensemble algorithm correctly reproduces the canonical averages at finite temperatures of the HP model of lattice proteins. Moreover, it is superior in performance to the standard multicanonical algorithm when applied to a complicated problem of a polymer with eight stickers. An alternative algorithm based on the exchange Monte Carlo method is also discussed.

KEYWORDS: heteropolymers, Monte Carlo, slow relaxation, self-avoidance, extended ensemble, multicanonical, exchange Monte Carlo

Monte Carlo simulations of lattice polymers are an important subject in various fields of scientific research, for example, statistical physics, physical chemistry and theoretical biology. Simulations of lattice heteropolymers, which consist of several different types of monomers, are particularly interesting because they are minimal models of protein folding. 1,2) Such simulations, however, often suffer from slow relaxation and metastability caused by the competition between short-ranged interactions and connectivity constraints among the monomers. For other systems with metastability, such as a spin system which exhibits a first-order phase transition, the multicanonical ensemble method is known to work well. 3,4) But for the lattice heteropolymer, the multicanonical ensemble is not the best solution.
In fact, even a self-avoiding walk, which is the simplest lattice polymer, is difficult to generate, 5) although no interaction energy is assumed between monomers other than the constraint of self-avoidance.

In this letter, we propose a novel approach to the dynamical Monte Carlo simulation of lattice polymers. The present approach is a variant of Monte Carlo algorithms based on extended ensembles 3,4,6-13) and can be applied to a wide range of models including lattice heteropolymers and protein models on lattices. Our starting point is the introduction of an artificial ensemble that contains conformations with finite self-overlaps. With this relaxation of the self-avoidance constraint, the conformations with self-overlaps play the role of "bridges" between metastable states and a rapid mixing of the Markov chain is expected. In fact, Shakhnovich et al. have reported that folding becomes considerably faster than usual in a lattice protein model that allows double and triple self-overlapping conformations. 14,15) But it is not easy to produce an adequate amount of self-avoiding conformations. Consider a dynamical simulation of a self-avoiding walk....
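The idea of weakening self-avoidance can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes a fixed inverse temperature `beta`, measures the degree of self-overlap as the number of excess monomers per lattice site, and uses a hypothetical penalty table `g` indexed by that degree, so that a conformation has weight exp(-beta*E - g[V]). The names `overlap_degree`, `accept_move`, and `g` are illustrative only.

```python
import math
import random
from collections import Counter


def overlap_degree(conformation):
    """Degree of self-overlap V: excess monomers summed over occupied sites.

    `conformation` is a sequence of lattice sites (tuples). V == 0 means the
    chain is self-avoiding. This definition of V is an assumption of the sketch.
    """
    occupancy = Counter(conformation)
    return sum(n - 1 for n in occupancy.values())


def accept_move(v_old, v_new, e_old, e_new, g, beta, rng=random):
    """Metropolis test for the assumed weight exp(-beta * E - g[V]).

    `g` is a hypothetical list of penalties, one per allowed overlap degree;
    moves that exceed the largest tabulated degree are rejected outright.
    """
    if v_new >= len(g):
        return False
    log_ratio = -beta * (e_new - e_old) - (g[v_new] - g[v_old])
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)
```

With a weight of this form, the samples that happen to have V = 0 are distributed as exp(-beta*E) over self-avoiding conformations, so canonical averages can be taken over that subset. How the penalties `g` are tuned is left open here.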
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of "chimera proteins." In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability, indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape.

computational protein design | energy landscape | fragment assembly | Go model | SIMFOLD

Natural proteins have the algorithm of finding the global minimum of their free energy surface within a biologically relevant timescale (1). One of the most stringent tests of how much we understand the algorithm may be to predict protein tertiary structures by simulating processes that are analogous to folding, which is often called de novo structure prediction. Recently, significant progress has been made in de novo structure prediction, in which the most successful method is the fragment assembly (FA) method developed by Baker and coworkers (2) and others (3-6).
The FA method shows considerable promise for new fold targets of recent Critical Assessments of Techniques for Protein Structure Prediction (CASPs), the community-wide blind tests of structure prediction (7-11). In the FA method, the protocol is separated into two stages: First, we collect structural candidates for every short segment of the target sequence, retrieving them from the structural database. The second stage is to assemble/fold these fragments for constructing tertiary structures that have low energies.

Simple questions arose as to why the FA method is so successful and what we can learn about protein folding from the success of the FA method. According to Baker and coworkers (12, 13), the FA method is based on the experimental observation that local sequence of a protein biases but does not uniquely decide its local structure. To what extent does the modest local bias influence tertiary structures generated? How is the FA method related to recently developed protein folding theory (14, 15)? In this work, we address these questions.

For this purpose, we need structure prediction software that uses the FA method. Here, we use the in-house-developed software, SIMFOLD. SIMFOLD employs a coarse-grained protei...
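The two-stage protocol described above can be sketched in Python as follows. This is a toy illustration, not SIMFOLD or Rosetta: the chain is assumed to be a list of backbone torsion pairs, the fragment library is assumed to be a hypothetical dict mapping each window start to candidate fragments (stage one), and the energy function and temperature schedule are supplied by the caller (stage two). All names are made up for the example.

```python
import math
import random


def fragment_move(torsions, library, frag_len, rng):
    """Return a copy of `torsions` with one window overwritten by a library fragment."""
    start = rng.randrange(len(torsions) - frag_len + 1)
    fragment = rng.choice(library[start])
    new = list(torsions)
    new[start:start + frag_len] = fragment
    return new


def assemble(torsions, library, frag_len, energy, temperatures, rng):
    """Simulated annealing over fragment moves; returns the lowest-energy state seen.

    `temperatures` is an iterable of positive values, one per attempted move.
    """
    current, e_cur = list(torsions), energy(torsions)
    best, e_best = current, e_cur
    for t in temperatures:
        trial = fragment_move(current, library, frag_len, rng)
        e_trial = energy(trial)
        # Metropolis-style acceptance at the current annealing temperature.
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / t):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best
```

A call such as `assemble(start, library, 3, energy, [1.0] * 200, random.Random(0))` runs 200 fragment replacements at a constant temperature; a real schedule would lower the temperature over time.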
The folding energy landscape of proteins has been suggested to be funnel-like with some degree of ruggedness on the slope. How complex the landscape is, however, is still rather unclear. Many experiments for globular proteins suggested relative simplicity, whereas molecular simulations of shorter peptides implied more complexity. Here, by using complete conformational sampling of two globular proteins (protein G and the src SH3 domain) and two related random peptides, we investigated their energy landscapes, topological properties of folding networks, and folding dynamics. The projected energy surfaces of the globular proteins were funneled in the vicinity of the native state but also had other quite deep, accessible minima, whereas the randomized peptides had many local basins, including some leading to seriously misfolded forms. Dynamics in the denatured part of the network exhibited basin-hopping itinerancy among many conformations, whereas the proteins reached relatively well-defined final stages that led to their native states. We also found that the folding network has a hierarchical nature characterized by scale-free and small-world properties.

contact maps | folding pathways | multiple pathways | principal coordinates

Proteins fold on large-dimensional energy landscapes through myriads of conformations. One energy-landscape theory suggests that the global shape of the landscape is primarily funnel-like with some degree of ruggedness on the slope of the funnel (1, 2). How complex/rugged the energy landscape is and how diverse the folding-pathway ensemble is are still rather controversial. Experimentally, many small fast-folding proteins exhibit single-exponential behavior, suggesting simplicity (3). For such proteins, a perfect funnel model, the Go model, has been used, as an extreme of simplicity, to model folding routes, often showing modestly good agreement with experiments (4). Conversely, there exist several clear lines of evidence of complexity in folding.
Under some conditions, proteins show strange and glassy kinetics, suggesting ruggedness of the landscape (5). Some β-sheet proteins, such as β-lactoglobulin, form nonnative α-helices at early stages of folding (6, 7).

The computational approach has been the most direct way to elucidate the complexity of folding energy landscapes. Methods developed in other areas, such as atomic clusters, have been applied to peptides and proteins, illustrating the multiple minima on the landscape (8-11). Recently, building on this background (10, 11), Krivov and Karplus developed the transition disconnectivity graph to visualize quantitatively the free-energy landscape and applied it to peptides, finding a highly rugged non-funnel-like landscape with competing minima (12, 13). Caflisch and coworkers (14, 15) constructed a folding network for a designed peptide and uncovered a highly heterogeneous denatured ensemble. They both used network analyses without the data reduction to lower dimension and warned that the projection to low dimension, as is often done in conventional folding studies, can hide the co...
Background: Protein pairs that have the same secondary structure packing arrangement but have different topologies have attracted much attention in terms of both evolution and physical chemistry of protein structures. Further investigation of such protein relationships would give us a hint as to how proteins can change their fold in the course of evolution, as well as an insight into physico-chemical properties of secondary structure packing. For this purpose, highly accurate sequence order independent structure comparison methods are needed.

Results: We have developed a novel protein structure alignment algorithm, MICAN (a structure alignment algorithm that can handle Multiple-chain complexes, Inverse direction of secondary structures, Cα only models, Alternative alignments, and Non-sequential alignments). The algorithm was designed so as to identify the best structural alignment between protein pairs by disregarding the connectivity between secondary structure elements (SSEs). One of the key features of the algorithm is its use of a multiple vector representation for each SSE, which enables us to correctly treat the bent or twisted nature of long SSEs. We compared MICAN with nine other publicly available structure alignment programs, using both reference-dependent and reference-independent evaluation methods on a variety of benchmark test sets which include both sequential and non-sequential alignments. We show that MICAN outperforms the other existing methods for reproducing reference alignments of non-sequential test sets. Further, although MICAN does not specialize in sequential structure alignment, it showed top-level performance on the sequential test sets. We also show that the MICAN program is the fastest non-sequential structure alignment program among all the programs we examined here.

Conclusions: MICAN is the fastest and the most accurate program among the non-sequential alignment programs we examined here.
These results suggest that MICAN is a highly effective tool for automatically detecting non-trivial structural relationships of proteins, such as circular permutations and segment-swapping, many of which have been identified manually by human experts so far. The source code of MICAN is freely downloadable at http://www.tbp.cse.nagoya-u.ac.jp/MICAN.
Predicting protein tertiary structures by in silico folding is still very difficult for proteins that have new folds. Here, we developed a coarse-grained energy function, SimFold, for de novo structure prediction, performed a benchmark test of prediction with fragment assembly simulations for 38 test proteins, and proposed consensus prediction with Rosetta. The SimFold energy consists of many terms that take into account solvent-induced effects on the basis of physicochemical consideration. In the benchmark test, SimFold succeeded in predicting native structures within 6.5 Å for 12 of 38 proteins; this success rate was the same as that of the publicly available version of Rosetta (ab initio version 1.2) run with default parameters. We investigated which energy terms in SimFold contribute to structure prediction performance, finding that the hydrophobic interaction is the most crucial for the prediction, whereas other sequence-specific terms have weak but positive roles. In the benchmark, the proteins well predicted by SimFold and by Rosetta were not the same for 5 of 12 proteins, which led us to introduce consensus prediction. With combined decoys, we succeeded in prediction for 16 proteins, four more than SimFold or Rosetta separately. For each of the 38 proteins, structural ensembles generated by SimFold and by Rosetta were qualitatively compared by mapping the sampled structural space onto two dimensions. For proteins on which one of the two methods succeeded and the other failed in prediction, the former had a less scattered ensemble located around the native structure. For proteins on which both methods succeeded in prediction, the two ensembles were often mixed.
The mechanism of the α→β transition in folding of β-lactoglobulin is discussed based on free energy landscape analysis of a long lattice model. It is found that helical propensity of β-lactoglobulin is driven by conformational entropy and is intrinsically coded in its native structure. We propose a view on a role of the folding intermediate, which is "on-pathway" but rich in non-native structures. The present results suggest that the native structure topology plays an important role in the α→β transition.

Transitions from α-helix to β-sheet have been observed in folding processes of some mainly β-sheet proteins, such as β-lactoglobulin (LG) (1) and plasminogen activator inhibitor type I (2). In these cases, α-rich structures appear as folding intermediates. Such folding processes via non-native intermediates are rather unusual and thus their mechanism has been attracting much attention. From the theoretical point of view, α→β transitions during the folding processes are of great interest because they seem not to be described by the funnel picture, although it successfully describes the folding mechanism of some small proteins (3-10). To get more general insight into folding mechanism, it is important to understand the mechanism of the α→β transition.

In this paper, we discuss the folding mechanism of LG as a prototype of the α→β transition. LG has the following properties: (i) it has a predominantly β-sheet native structure [up-and-down β barrel (UD) topology] (11), and (ii) the refolding intermediate contains a significantly large amount of non-native α-helical structures. In other words, the folding process consists of two processes, namely, a fast process U→I(α) followed by a slow process I(α)→N, where U and N are the fully unfolded state and the native state, respectively, and I(α) is the highly helical intermediate (1), and (iii) accumulation of the α-helical intermediate also is observed in the equilibrium unfolding experiment using guanidine hydrochloride (ref.
13).

Why does the folding intermediate of LG contain non-native highly α-helical structures? And what is their role? To understand the nature of the α→β transition, we study a simple lattice protein model that has properties similar to those of LG.

The Model

The model we study here is the modified HP model (13) in which a protein consists of a self-avoiding chain on a three-dimensional cubic lattice with two types of amino acids: H (hydrophobic) and P (polar). The energy (or solvent averaged free energy) $E$ of a chain conformation is determined by the number of H–H contacts $n_h$ and that of P–P contacts $n_p$ as $E = -\epsilon (n_h + n_p)$, where $\epsilon > 0$ is a constant (we measure the energy in units of $\epsilon$ hereafter). The chain length is 80. The sequence was originally designed by O'Toole and Panagiotopoulos (13) so as to have a four-helix bundle-like native structure (shown in figure 4 of ref. 14), whose energy E is −94. In the following sections, we will show that this lattice model is indeed a good model of LG for the purpose of qualitative discussions on α→β tra...
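The energy function of the modified HP model described above can be illustrated with a short Python sketch. It assumes the usual lattice convention, which the excerpt does not spell out, that a contact is a pair of monomers on nearest-neighbour lattice sites that are not adjacent along the chain. The function name `modified_hp_energy` is hypothetical, and the energy is returned in units of the contact constant.

```python
def modified_hp_energy(coords, sequence):
    """Energy -(n_h + n_p) of a self-avoiding chain on the cubic lattice.

    `coords` is a list of integer (x, y, z) sites, one per monomer, and
    `sequence` is a string of 'H' and 'P' of the same length. Contacts are
    counted between like monomers on neighbouring sites that are not bonded
    (an assumption of this sketch).
    """
    if len(coords) != len(sequence):
        raise ValueError("coords and sequence must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    index_at = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        # Look only in the positive directions so each pair is counted once.
        for neighbor in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            j = index_at.get(neighbor)
            if j is not None and abs(i - j) > 1 and sequence[i] == sequence[j]:
                contacts += 1
    return -contacts
```

For a four-monomer chain folded into a unit square, only the two end monomers can form a contact, so the sequence `"HPPH"` gives one H–H contact while `"HPPP"` gives none.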
The fragment assembly method is currently one of the most successful methods for de novo protein structure prediction, in which conformational change by fragment replacement is repeated under a simulated annealing scheme. We point out here that the conventional fragment replacement algorithm violates the detailed balance condition. This precludes application of various generalized ensemble algorithms, which would have made conformational sampling more efficient. We develop here a reversible variant of the fragment assembly algorithm which satisfies detailed balance and thus is applicable to the generalized ensemble techniques. We combine this method with multicanonical ensemble Monte Carlo, one of the generalized ensemble approaches, and test its performance in the structure prediction of three proteins. We show that the new method can find low energy conformations more efficiently than the conventional simulated annealing method. Also importantly, the lowest energy structures found by the new method are closer to the native structure than those found by simulated annealing. It seems that conformations with more complex topology can be generated by the new algorithm.
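The detailed balance condition mentioned above can be checked numerically on a toy state space. The Python sketch below is not the reversible fragment assembly algorithm itself: it assumes a handful of discrete states with uniform symmetric proposals, builds the Metropolis transition matrix for a given set of log-weights, and tests whether w_i P_ij = w_j P_ji holds for every pair. For a multicanonical run, the log-weights would be minus the logarithm of an estimated density of states; here they are arbitrary numbers, and the function names are illustrative.

```python
import math


def metropolis_matrix(log_weights):
    """Transition matrix for uniform symmetric proposals with Metropolis acceptance."""
    n = len(log_weights)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Acceptance min(1, w_j / w_i), computed without overflow.
                accept = math.exp(min(0.0, log_weights[j] - log_weights[i]))
                P[i][j] = accept / (n - 1)
        # Rejected proposals leave the chain where it is.
        P[i][i] = 1.0 - sum(P[i])
    return P


def satisfies_detailed_balance(log_weights, P, tol=1e-12):
    """Check w_i * P[i][j] == w_j * P[j][i] for all pairs, within `tol`."""
    w = [math.exp(lw) for lw in log_weights]
    n = len(w)
    return all(
        abs(w[i] * P[i][j] - w[j] * P[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )
```

A move set whose forward and reverse proposal probabilities differ, without a compensating factor in the acceptance rule, would fail this check; that is the kind of violation a reversible variant has to remove.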