The empirical force field Fold-X was developed previously to allow rapid free energy calculations in proteins. Here, we present an enhanced version of the force field allowing prediction of the position of structural water molecules and metal ions, together called single atom ligands. Fold-X picks up 76% of water molecules found to interact with two or more polar atoms of proteins in high-resolution crystal structures and predicts their position to within 0.8 Å on average. The prediction of metal ion-binding sites has success rates between 90% and 97% depending on the metal, with an overall standard deviation on the position of binding of 0.3-0.6 Å. The following metals were included in the force field: Mg²⁺, Ca²⁺, Zn²⁺, Mn²⁺, and Cu²⁺. As a result, the current version of Fold-X can accurately decorate a protein structure with biologically important ions and water molecules. Additionally, the free energy of binding of Ca²⁺ and Zn²⁺ (i.e., the natural logarithm of the dissociation constant) and its dependence on ionic strength correlate reasonably well with the experimental data available in the literature, allowing one to discriminate between high- and low-affinity binding sites. Importantly, the accuracy of the energy prediction presented here is sufficient to efficiently discriminate between Mg²⁺, Ca²⁺, and Zn²⁺ binding.
ion | water bridge | calcium | zinc | structural water
A wide range of strategies have been developed for estimating interaction energies in proteins. These methods generally either derive pseudoenergies from the statistical analysis of protein structural databases or, alternatively, aim at calculating energies based on explicit physical models (1, 2). Empirical force fields such as Fold-X, by contrast, rely directly on structure-activity data from protein-engineering experiments to calculate interaction energies (1, 3). There are multiple advantages to empirical force fields.
First, they are perfectly geared to the simulation of biological macromolecules, because the calibration data and simulated model systems are on a similar scale of complexity. Second, because empirical force fields rely on structure-activity information, they provide a rationale for the physical interpretation of changes in free energy. Finally, empirical force fields such as Fold-X are designed to allow fast and accurate estimations of free energy changes upon mutation in proteins or protein complexes. Fold-X has accuracy similar to that of physical force fields for prediction of free energy changes, yet it is many orders of magnitude faster, because the estimation of entropic contributions to protein interactions is directly derived from the structure using a statistical thermodynamics approach. As such, Fold-X provides a powerful tool for high-throughput structure-activity analyses of proteomes (4, 5), prediction of protein-folding pathways (6, 7), or protein design (8). The Fold-X force field is composed of a solvation term, a van der Waals term, H-bond, and electrostatic terms and entropic terms for the backbo...
A key component of computational biology is to compare the results of computer modelling with experimental measurements. Despite substantial progress in the models and algorithms used in many areas of computational biology, such comparisons sometimes reveal that the computations are not in quantitative agreement with experimental data. The principle of maximum entropy is a general procedure for constructing probability distributions in the light of new data, making it a natural tool in cases when an initial model provides results that are at odds with experiments. The number of maximum entropy applications in our field has grown steadily in recent years, in areas as diverse as sequence analysis, structural modelling, and neurobiology. In this Perspectives article, we give a broad introduction to the method, in an attempt to encourage its further adoption. The general procedure is explained in the context of a simple example, after which we proceed with a real-world application in the field of molecular simulations, where the maximum entropy procedure has recently provided new insight. Given the limited accuracy of force fields, macromolecular simulations sometimes produce results that are not in complete and quantitative accordance with experiments. A common solution to this problem is to explicitly ensure agreement between the two by perturbing the potential energy function towards the experimental data. So far, a general consensus for how such perturbations should be implemented has been lacking. Three very recent papers have explored this problem using the maximum entropy approach, providing both new theoretical and practical insights to the problem. We highlight each of these contributions in turn and conclude with a discussion on remaining challenges.
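The idea of minimally perturbing a model so that it agrees with a measurement can be illustrated with a small sketch. It is not taken from the article: the function name `maxent_weights`, the toy observable values, and the bisection bounds are all assumptions made for illustration. Starting from equally weighted samples, the maximum entropy solution that reproduces a target average has weights proportional to exp(-λ·f_i), and the single multiplier λ is found here by bisection.

```python
import math

def maxent_weights(observable, target, lo=-50.0, hi=50.0, iterations=200):
    """Reweight equally likely samples so that the weighted mean of
    `observable` equals `target`.

    The maximum entropy weights have the form w_i ~ exp(-lam * f_i).
    The weighted mean decreases monotonically with lam, so lam can be
    found by bisection. `target` must lie strictly between the smallest
    and largest observable values (illustrative assumption).
    """
    def reweight(lam):
        # Subtract the smallest exponent so every exp() argument is <= 0.
        shift = min(lam * f for f in observable)
        w = [math.exp(-(lam * f - shift)) for f in observable]
        z = sum(w)
        w = [wi / z for wi in w]
        mean = sum(wi * f for wi, f in zip(w, observable))
        return mean, w

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        mean, _ = reweight(mid)
        if mean > target:
            lo = mid   # mean too high: a larger lam is needed
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, reweight(lam)[1]

# Toy data: four sampled values of some observable, with a prior mean of 2.5.
samples = [1.0, 2.0, 3.0, 4.0]
lam, weights = maxent_weights(samples, target=2.0)
reweighted_mean = sum(w * f for w, f in zip(weights, samples))
```

In a simulation setting the same multiplier would correspond to a linear bias added to the energy function; the sketch only covers the reweighting step for a single observable.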
Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence-structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique by avoiding the drawbacks associated with a discrete and nonprobabilistic approach.
conformational sampling | directional statistics | probabilistic model | TorusDBN | Bayesian network
Protein structure prediction remains one of the greatest challenges in computational biology. The problem itself is easily posed: predict the three-dimensional structure of a protein given its amino acid sequence. Significant progress has been made in the last decade, and, especially, knowledge-based methods are becoming increasingly accurate in predicting structures of small globular proteins (1). In such methods, an explicit treatment of local structure has proven to be an important ingredient. The search through conformational space can be greatly simplified through the restriction of the angular degrees of freedom in the protein backbone by allowing only angles that are known to appear in the native structures of real proteins. In practice, the angular preferences are typically enforced by using a technique called fragment assembly. The idea is to select a set of small structural fragments with strong sequence-structure relationships from the database of solved structures and subsequently assemble these building blocks to form complete structures.
Although the idea was originally conceived in crystallography (2), it had a great impact on the protein structure-prediction field when it was first introduced a decade ago (3). Today, fragment assembly stands as one of the most important single steps forward in tertiary structure prediction, contributing significantly to the progress we have seen in this field in recent years (4, 5). Despite their success, fragment-assembly approaches generally lack a proper statistical foundation, or equivalently, a consistent way to evaluate their contributions to the global free energy. When a fragment-assembly method is used, structure prediction normally proceeds by a Markov Chain Monte Carlo (MCMC) algorithm, where candidate structures are proposed by the fragment assembler and then accepted or rejected based on an energy function. The theoretical basis of MCMC is the existence of a stationary probability distribution dictating the transition probabilities of the Markov chain. In the context of statistical physics, this stationary distribution is given by the conformational ...
Summary: Cancer cells acquire pathological phenotypes through accumulation of mutations that perturb signaling networks. However, global analysis of these events is currently limited. Here, we identify six types of network-attacking mutations (NAMs), including changes in kinase and SH2 modulation, network rewiring, and the genesis and extinction of phosphorylation sites. We developed a computational platform (ReKINect) to identify NAMs and systematically interpreted the exomes and quantitative (phospho-)proteomes of five ovarian cancer cell lines and the global cancer genome repository. We identified and experimentally validated several NAMs, including PKCγ M501I and PKD1 D665N, which encode specificity switches analogous to the appearance of kinases de novo within the kinome. We discover mutant molecular logic gates, a drift toward phospho-threonine signaling, weakening of phosphorylation motifs, and kinase-inactivating hotspots in cancer. Our method pinpoints functional NAMs, scales with the complexity of cancer genomes and cell signaling, and may enhance our capability to therapeutically target tumor-specific networks.
The determination of conformational preferences in unfolded and disordered proteins is an important challenge in structural biology. We here describe an algorithm to optimize energy functions for the simulation of unfolded proteins. The procedure is based on the maximum likelihood principle and employs a fast and efficient gradient descent method to find the set of parameters of the energy function that best explain the experimental data. We first validate the method by using synthetic reference data, and subsequently apply the algorithms to data from nuclear magnetic resonance spin-labeling experiments on the Delta131Delta fragment of Staphylococcal nuclease. A significant strength of the procedure that we present is that it directly uses experimental data to optimize the energy parameters, without relying on the availability of high resolution structures. The procedure is fully general and can be applied to a range of experimental data and energy functions including the force fields used in molecular dynamics simulations.
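As a rough illustration of fitting an energy parameter by maximum likelihood with a gradient method, the sketch below uses a toy model that is not the authors' procedure: a handful of discrete states with energy E_i = θ·f_i (in units of kT), Boltzmann probabilities p_i ∝ exp(-θ·f_i), and observed state counts. The names `boltzmann` and `fit_theta`, the learning rate, and the step count are assumptions. For this model the gradient of the mean log-likelihood with respect to θ is the model average of f minus the observed average of f, so each update moves θ until the two averages agree; stepping along this gradient is the same as gradient descent on the negative log-likelihood.

```python
import math

def boltzmann(theta, features):
    """Boltzmann probabilities for states with energy theta * f_i (units of kT)."""
    # Subtract the smallest exponent so every exp() argument is <= 0.
    shift = min(theta * f for f in features)
    w = [math.exp(-(theta * f - shift)) for f in features]
    z = sum(w)
    return [wi / z for wi in w]

def fit_theta(features, counts, learning_rate=0.5, steps=2000):
    """Maximum likelihood estimate of theta from observed state counts."""
    total = sum(counts)
    f_obs = sum(c * f for c, f in zip(counts, features)) / total
    theta = 0.0
    for _ in range(steps):
        p = boltzmann(theta, features)
        f_model = sum(pi * f for pi, f in zip(p, features))
        # d(mean log-likelihood)/d(theta) = f_model - f_obs
        theta += learning_rate * (f_model - f_obs)
    return theta

# Toy check: build idealized "counts" from a known parameter and recover it.
features = [0.0, 1.0, 2.0, 3.0]
true_theta = 0.7
counts = [1000.0 * p for p in boltzmann(true_theta, features)]
estimate = fit_theta(features, counts)
```

With real data the model averages would come from sampling an ensemble rather than from an exact sum over states, which makes the gradient noisy; the toy model sidesteps that to keep the update rule visible.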
Single nucleotide polymorphisms (SNPs) are an increasingly important tool for genetic and biomedical research. However, the accumulated sequence information on allelic variation is not matched by an understanding of the effect of SNPs on the functional attributes or 'molecular phenotype' of a protein. Towards this aim, we developed SNPeffect, an online resource of human non-synonymous coding SNPs (nsSNPs) mapping phenotypic effects of allelic variation in human genes. SNPeffect contains 31 659 nsSNPs from 12 480 human proteins. The current release of SNPeffect incorporates data on protein stability, integrity of functional sites, protein phosphorylation and glycosylation, subcellular localization, protein turnover rates, protein aggregation, amyloidosis and chaperone interaction. The SNP entries are accessible through both a search and browse interface and are linked to most major biological databases. The data can be displayed as detailed descriptions of individual SNPs or as an overview of all SNPs for a given protein. SNPeffect will be regularly updated and can be accessed at http://snpeffect.vib.be/.
SmartCell has been developed to be a general framework for modelling and simulation of diffusion-reaction networks in a whole-cell context. It supports localisation and diffusion by using a mesoscopic stochastic reaction model. The SmartCell package can handle any cell geometry, considers different cell compartments, allows localisation of species, supports DNA transcription and translation, membrane diffusion and multistep reactions, as well as cell growth. Moreover, different temporal and spatial constraints can be applied to the model. A graphical user interface (GUI) that facilitates model making is also available. In this work we discuss limitations and advantages arising from the approach used in SmartCell and determine the impact of localisation on the behaviour of simple well-defined networks, previously analysed with differential equations. Our results show that this factor might play an important role in the response of networks and cannot be neglected in cell simulations.
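To give a feel for what a stochastic reaction model computes, here is a minimal sketch of Gillespie's direct method for a single first-order reaction A → B in one well-mixed volume. It is not SmartCell's algorithm or interface: it has no diffusion, compartments, or geometry, and the function name `gillespie_decay`, the rate constant, and the seed are assumptions chosen for illustration.

```python
import random

def gillespie_decay(n_a, k, t_end, seed=0):
    """Simulate A -> B with rate constant k using Gillespie's direct method.

    Returns a list of (time, n_a, n_b) tuples, one per reaction event,
    starting with the initial state at time 0.
    """
    rng = random.Random(seed)
    t, n_b = 0.0, 0
    trajectory = [(t, n_a, n_b)]
    while n_a > 0:
        propensity = k * n_a
        # The waiting time to the next event is exponentially distributed
        # with rate equal to the total propensity.
        t += rng.expovariate(propensity)
        if t > t_end:
            break
        n_a -= 1
        n_b += 1
        trajectory.append((t, n_a, n_b))
    return trajectory

# Toy run: 100 molecules of A, rate constant 1.0 per unit time.
trajectory = gillespie_decay(n_a=100, k=1.0, t_end=2.0)
```

Unlike a differential equation, each run yields a different integer-valued trajectory; averaging many runs with different seeds approaches the deterministic exponential decay. Spatial methods extend this scheme by treating jumps between neighbouring subvolumes as additional events.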