Supplemental methods
Homology Modelling
The homology model of the NatA WT complex was built with Modeller version 9.8 (1) using the NatA complex of S. pombe as a template. The sequence identity between the target and the template is around 67% and 37% for the subunits Naa10 and Naa15, respectively (Figure S1). The sequence alignments were obtained using ClustalW. Thirty models were generated and evaluated using the Discrete Optimized Protein Energy potential (DOPE score), and the models with the lowest overall DOPE score were selected. The S37P NatA complex was built from the human WT NatA complex using SCWRL rather than Modeller, so that the starting structures were as similar as possible and the comparison between the wild-type and mutant NatA complexes was not biased.
System Preparation
PROPKA (2) was used to determine the protonation state of histidines. All other titratable groups were modelled in their standard protonation states. Hydrogen atoms were constructed using the HBUILD module of the CHARMM program (3). The complexes were solvated in cubic boxes of TIP3P water (4) with 120 Å edges. Water molecules overlapping the proteins, determined by a cut-off of 2.8 Å, were removed.
Molecular dynamics
Molecular dynamics (MD) simulations were used to explore the conformational space. Because the aim was to uncover differences between the two systems, we ran long simulations (100 ns) to allow the systems to rearrange. These simulations were performed at 300 K using the NAMD program (5) and the CHARMM27 force field (4). The SHAKE algorithm was used to constrain all bonds between hydrogen and heavy atoms. Non-bonded interactions were truncated at a cut-off of 14 Å, using a switching function for both the van der Waals and electrostatic interactions (6). The particle-mesh Ewald algorithm (7) was used to evaluate the long-range electrostatic interactions.
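To illustrate what truncating at 14 Å "using a switching function" means, here is a minimal Python sketch of the CHARMM-style switching function, which scales a pair energy smoothly from full strength to zero between a switch-on distance and the cut-off. The switch-on distance of 12 Å is an assumption (the text gives only the 14 Å cut-off), the helper names `switch_factor` and `switched_lj` are hypothetical, and how NAMD combines switching with PME for electrostatics is not reproduced here.

```python
def switch_factor(r: float, r_on: float = 12.0, r_off: float = 14.0) -> float:
    """CHARMM-style switching function S(r), with r in Å.

    S = 1 up to r_on, S = 0 from r_off on, and a polynomial in between whose
    first derivative vanishes at both ends, so switched energies and forces
    reach zero continuously at r_off.
    The default r_on = 12 Å is an assumption; the text gives only r_off = 14 Å.
    """
    if r_on >= r_off:
        raise ValueError("r_on must be smaller than r_off")
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    r2, on2, off2 = r * r, r_on * r_on, r_off * r_off
    return (off2 - r2) ** 2 * (off2 + 2.0 * r2 - 3.0 * on2) / (off2 - on2) ** 3


def switched_lj(r: float, epsilon: float, rmin: float) -> float:
    """CHARMM-form Lennard-Jones energy eps*[(rmin/r)^12 - 2(rmin/r)^6] times S(r).

    epsilon is the (positive) well depth and rmin the position of the minimum;
    both are illustrative inputs, not CHARMM27 parameters.
    """
    ratio6 = (rmin / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6) * switch_factor(r)


if __name__ == "__main__":
    for r in (10.0, 12.0, 13.0, 14.0, 15.0):
        print(f"r = {r:4.1f} Å   S(r) = {switch_factor(r):.4f}")
```

Switching the energy rather than cutting it abruptly avoids the force discontinuity at 14 Å that would otherwise perturb energy conservation over a 100 ns run.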
The system was subjected to an energy minimization of 1000 steps using the conjugate gradient algorithm, followed by gradual heating consisting of four successive simulations at temperatures of 10 K, 100 K, 200 K and 300 K. This was followed by a 1 ns equilibration phase during which velocities were reassigned every picosecond. The production phase consisted of a 100 ns simulation in the NPT ensemble with a time step of 1 fs. Two simulations (replicas) using a different set of
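The protocol above implies concrete step counts. The following Python sketch works them out, assuming the 1 fs time step (stated only for production) also applies to equilibration; the durations of the four heating runs are not given, so they are listed but not counted. All constant and function names are hypothetical.

```python
FS_PER_PS = 1_000          # femtoseconds per picosecond
FS_PER_NS = 1_000_000      # femtoseconds per nanosecond
TIMESTEP_FS = 1            # 1 fs time step (stated for the production phase)

HEATING_TEMPERATURES_K = (10, 100, 200, 300)  # four successive heating runs
MINIMIZATION_STEPS = 1000                     # conjugate gradient steps


def md_steps(duration_fs: int, timestep_fs: int = TIMESTEP_FS) -> int:
    """Number of integration steps needed to cover duration_fs."""
    if duration_fs % timestep_fs:
        raise ValueError("duration must be a whole number of time steps")
    return duration_fs // timestep_fs


equilibration_steps = md_steps(1 * FS_PER_NS)    # 1 ns equilibration (assumed 1 fs step)
production_steps = md_steps(100 * FS_PER_NS)     # 100 ns NPT production
reassign_every = md_steps(1 * FS_PER_PS)         # velocities reassigned each ps
n_reassignments = equilibration_steps // reassign_every

if __name__ == "__main__":
    print(f"minimization: {MINIMIZATION_STEPS} CG steps")
    print("heating (K):", " -> ".join(str(t) for t in HEATING_TEMPERATURES_K))
    print(f"equilibration: {equilibration_steps:,} steps, "
          f"{n_reassignments} velocity reassignments")
    print(f"production: {production_steps:,} steps per replica")
```

At 1 fs per step, each 100 ns replica is 10^8 integration steps, which is why the small time step and SHAKE constraints matter for the overall cost.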
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions on six multiple-target decoy sets by three criteria: detection of the native state, the correlation between the score and model error, and identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
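Distance-dependent statistical potentials such as DOPE share a common idea: compare how often atom pairs are observed at distance r in native structures with how often they would be expected under a reference state, and convert the ratio into an energy, E(r) = -kT ln(N_obs(r)/N_ref(r)). The Python sketch below shows that inverse-Boltzmann step with a simple uniform (shell-volume) reference. It is explicitly not DOPE: DOPE's finite homogeneous-sphere reference state, atom typing and smoothing are not implemented, and the function names and toy distances are hypothetical.

```python
import math
from typing import List, Sequence


def distance_histogram(distances: Sequence[float], bin_width: float,
                       n_bins: int) -> List[int]:
    """Count atom-pair distances (Å) in n_bins bins of width bin_width."""
    counts = [0] * n_bins
    r_max = bin_width * n_bins
    for d in distances:
        if 0.0 <= d < r_max:
            counts[min(int(d // bin_width), n_bins - 1)] += 1
    return counts


def uniform_reference(n_pairs: int, bin_width: float, n_bins: int) -> List[float]:
    """Expected counts if the same pairs were spread uniformly in space.

    Counts scale with spherical-shell volume (the 4*pi/3 factor cancels on
    normalization). This is a textbook ideal-gas reference state, NOT the
    finite-sphere reference state used by DOPE.
    """
    shells = [((i + 1) ** 3 - i ** 3) * bin_width ** 3 for i in range(n_bins)]
    total = sum(shells)
    return [n_pairs * v / total for v in shells]


def inverse_boltzmann(observed: Sequence[float], reference: Sequence[float],
                      pseudocount: float = 1e-6) -> List[float]:
    """Per-bin energy in units of kT: E(r) = -ln(N_obs(r) / N_ref(r)).

    A small pseudocount keeps empty bins finite (a simplifying assumption).
    """
    return [-math.log((o + pseudocount) / (e + pseudocount))
            for o, e in zip(observed, reference)]


if __name__ == "__main__":
    # Toy distances for one atom-type pair (hypothetical values, in Å).
    toy = [3.7] * 8 + [5.0, 7.0]
    obs = distance_histogram(toy, bin_width=1.0, n_bins=8)
    ref = uniform_reference(len(toy), bin_width=1.0, n_bins=8)
    for i, e in enumerate(inverse_boltzmann(obs, ref)):
        print(f"{i}-{i + 1} Å: E = {e:+.2f} kT")
```

Bins that are over-populated relative to the reference get negative (favourable) energies; the choice of reference state is exactly what DOPE improves on, which is why it is kept as a separate, swappable function here.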
Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
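The target-template alignment step is also where figures such as the 67% and 37% sequence identities quoted in the methods come from. Below is a minimal Python sketch of percent identity computed from two aligned sequences. Identity has several conventions; counting identical residues over columns where neither sequence has a gap is an assumption here, not necessarily the definition ClustalW or the authors used, and the aligned fragments are hypothetical, not Naa10 or Naa15.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Other conventions (e.g. dividing by the length of the shorter sequence)
    give different numbers; this choice is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned fragments of a target and a template.
    target = "MKT-AYIAKQR"
    template = "MKTLAY-SKQR"
    print(f"identity: {percent_identity(target, template):.1f}%")
```

Because the denominator excludes gapped columns, a poorly aligned pair can still report a high identity over a short region; reporting the number of aligned columns alongside the percentage avoids that ambiguity.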
Genome sequencing projects have resulted in a rapid increase in the number of known protein sequences. In contrast, only about one-hundredth of these sequences have been characterized using experimental structure determination methods. Computational protein structure modeling techniques have the potential to bridge this sequence-structure gap. This chapter presents an example that illustrates the use of MODELLER to construct a comparative model for a protein with unknown structure. Automation of similar protocols has resulted in models of useful accuracy for domains in more than half of all known protein sequences.
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct ~85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (ΔRMSD) from 0.63 Å to 0.45 Å, while having a higher Pearson correlation coefficient to RMSD (r = 0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
Keywords: model assessment; comparative modeling; fold assignment; statistical potentials; support vector machine; protein structure prediction
Supplemental material: see www.proteinscience.org
Genomics efforts are providing researchers with the genomes of many species, including Homo sapiens. More difficult tasks lie ahead in annotating, understanding, and modifying the functions of the proteins encoded by these genomes. The structures of proteins aid in these efforts, as the biochemical function of a protein is determined by its structure and dynamics.
Atomic structures can be determined for a small subset of proteins by X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. However, for many proteins of interest, such methods are often costly, time-consuming, and challenging. In the absence of an experimentally determined structure, structure models are often valuable for rationalizing existing evidence and guiding new experiments (Baker and Sali 2001).
The accuracy of a model determines its utility, which makes reliably determining the accuracy of a model an important problem in protein structure prediction (Baker and Sali 2001; Ginalski et al. 2005). Model assessment has previously been applied to (1) determine whether or not a model has the correct fold (Miyazawa and Jernigan 1996; Domingues et al. 1999; Melo et al. 2002; McGuffin and Jones 2003), (2) discriminate between the native and near-native states (Lazaridis and Karplus 1999a; Gatchell et al. 2000; Vorobjev and Hermans 2001; Seok et al. 2003; Tsai et al. 2003; Zhu et al. 2003), and (3) select the most native-like model in a set o...
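The SVMod abstract evaluates scores by ΔRMSD (RMSD of the model a score ranks best minus the lowest RMSD in the set) and by the Pearson correlation between score and RMSD. The Python sketch below computes both for a single target; the SVMod study averages ΔRMSD over 20 targets, and its SVM composite score itself is not reproduced. Function names and the four example models are hypothetical, and lower scores are assumed to be better, as for DOPE.

```python
import math
from typing import Sequence


def delta_rmsd(scores: Sequence[float], rmsds: Sequence[float]) -> float:
    """RMSD of the top-scoring model minus the lowest RMSD in the set (Å).

    Assumes lower scores are better, as for DOPE; negate scores otherwise.
    """
    if len(scores) != len(rmsds) or not scores:
        raise ValueError("need one RMSD per score and at least one model")
    best = min(range(len(scores)), key=lambda i: scores[i])
    return rmsds[best] - min(rmsds)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical scores and RMSDs (Å) for four models of one target.
    scores = [-5200.0, -5350.0, -5100.0, -5300.0]
    rmsds = [2.1, 1.9, 2.6, 1.7]
    print(f"delta RMSD = {delta_rmsd(scores, rmsds):.2f} Å")
    print(f"Pearson r(score, RMSD) = {pearson_r(scores, rmsds):.2f}")
```

The two metrics answer different questions: ΔRMSD only cares whether the single top-ranked model is close to the best available one, while r measures how well the score tracks model error across the whole set.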