We describe an automated procedure for protein design, implemented in a flexible software package called Proteus. System setup and calculation of an energy matrix are done with the XPLOR modeling program and its sophisticated command language, supporting several force fields and solvent models. A second program provides algorithms to search sequence space. It allows a decomposition of the system into groups, which can be combined in different ways in the energy function, for both positive and negative design. The whole procedure can be controlled by editing 2-4 scripts. Two applications consider the tyrosyl-tRNA synthetase enzyme and its successful redesign to bind both O-methyl-tyrosine and D-tyrosine. For the latter, we present Monte Carlo simulations where the D-tyrosine concentration is gradually increased, displacing L-tyrosine from the binding pocket and yielding the binding free energy difference, in good agreement with experiment. Complete redesign of the Crk SH3 domain is presented. The top 10000 sequences are all assigned to the correct fold by the SUPERFAMILY library of Hidden Markov Models. Finally, we report the acid/base behavior of the SNase protein. Sidechain protonation is treated as a form of mutation; it is then straightforward to perform constant-pH Monte Carlo simulations, which yield good agreement with experiment. Overall, the software can be used for a wide range of applications, producing not only native-like sequences but also thermodynamic properties with errors that appear comparable to those of other current software packages.
Titratable residues determine the acid/base behavior of proteins, strongly influencing their function; in addition, proton binding is a valuable reporter on electrostatic interactions. We describe a method for pKa calculations, using constant-pH Monte Carlo (MC) simulations to explore the space of sidechain conformations and protonation states, with an efficient and accurate generalized Born model (GB) for the solvent effects. To overcome the many-body dependency of the GB model, we use a "Native Environment" approximation, whose accuracy is shown to be good. It allows the precalculation and storage of interactions between all sidechain pairs, a strategy borrowed from computational protein design, which makes the MC simulations themselves very fast. The method is tested for 12 proteins and 167 titratable sidechains. It gives an rms error of 1.1 pH units, similar to the trivial "Null" model. The only adjustable parameter is the protein dielectric constant. The best accuracy is achieved for values between 4 and 8, a range that is physically plausible for a protein interior. For sidechains with large pKa shifts, ≥2, the rms error is 1.6, compared to 2.5 with the Null model and 1.5 with the empirical PROPKA method.
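The benefit of precalculating all sidechain pair interactions can be illustrated with a minimal Python sketch. It assumes hypothetical data structures (`e_self`, `e_pair`) and function names that are not part of Proteus or the published method; each "state" index stands for one rotamer/protonation combination of a sidechain, and the toy energies are made up. With such a matrix, the energy change of a single-site MC move needs only table lookups, not a new solvent calculation.

```python
import math
import random

def pair_energy(e_pair, i, si, j, sj):
    """Stored interaction between state si of site i and state sj of site j."""
    if i < j:
        return e_pair[(i, j)][si][sj]
    return e_pair[(j, i)][sj][si]

def total_energy(state, e_self, e_pair):
    """Full energy: self terms plus all pair terms (used here only as a check)."""
    n = len(state)
    e = sum(e_self[i][state[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair_energy(e_pair, i, state[i], j, state[j])
    return e

def delta_energy(state, i, new, e_self, e_pair):
    """Energy change when site i switches to state `new`: one row of lookups."""
    old = state[i]
    d = e_self[i][new] - e_self[i][old]
    for j in range(len(state)):
        if j != i:
            d += pair_energy(e_pair, i, new, j, state[j])
            d -= pair_energy(e_pair, i, old, j, state[j])
    return d

def metropolis_step(state, e_self, e_pair, kT, rng):
    """Propose a random new state for a random site; accept by the Metropolis rule."""
    i = rng.randrange(len(state))
    new = rng.randrange(len(e_self[i]))
    d = delta_energy(state, i, new, e_self, e_pair)
    if d <= 0 or rng.random() < math.exp(-d / kT):
        state[i] = new
        return True
    return False

# Toy system (assumed values, kcal/mol): 3 sites with 2 states each.
e_self = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
e_pair = {
    (0, 1): [[0.0, -1.0], [0.3, 0.0]],
    (0, 2): [[0.1, 0.0], [0.0, -0.4]],
    (1, 2): [[0.0, 0.6], [-0.2, 0.0]],
}
```

The cost of `delta_energy` grows linearly with the number of sites, which is what makes long MC runs cheap once the matrix is stored.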
The acid/base properties of proteins are essential in biochemistry, and proton binding is a valuable reporter on electrostatic interactions. We propose a constant-pH Monte Carlo strategy to compute protonation free energies and pKa values. The solvent is described implicitly, through a generalized Born model. The electronic polarizability and backbone motions of the protein are included through the protein dielectric constant. Side chain motions are described explicitly, by the Monte Carlo scheme. An efficient computational algorithm is described, which allows us to treat the fluctuating shape of the protein/solvent boundary in a way that is numerically exact (within the GB framework); this contrasts with several previous constant-pH approaches. For a test set of six proteins and 78 titratable groups, the model performs well, with an rms error of 1.2 pH units. While this is slightly greater than the error of a simple Null model (rms error of 1.1) and of a fully empirical model (rms error of 0.9), it is obtained using physically meaningful model parameters, including a low protein dielectric of four. Importantly, similar performance is obtained for side chains with large and small pKa shifts (relative to a standard model compound). The titration curve slopes and the conformations sampled are reasonable. Several directions to improve the method further are discussed.
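How the pH enters a constant-pH Monte Carlo test can be sketched for the simplest case of one isolated titratable group. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, conformational sampling is omitted, and the protein environment is reduced to a single assumed shift `ddg_env` (kcal/mol) relative to the model compound. It uses the standard relation ΔG = kT ln(10) (pH − pKa_model) + ΔΔG_env for the deprotonated-to-protonated step.

```python
import math
import random

LN10 = math.log(10.0)

def protonation_free_energy(pH, pka_model, ddg_env, kT):
    """Free energy (kcal/mol) to protonate the group at the given pH."""
    return kT * LN10 * (pH - pka_model) + ddg_env

def sample_protonated_fraction(pH, pka_model, ddg_env, kT=0.593,
                               n_steps=20000, seed=0):
    """Metropolis sampling of the two protonation states at fixed pH."""
    rng = random.Random(seed)
    dg = protonation_free_energy(pH, pka_model, ddg_env, kT)
    protonated = False
    count = 0
    for _ in range(n_steps):
        # Energy change of flipping the current protonation state.
        d = -dg if protonated else dg
        if d <= 0 or rng.random() < math.exp(-d / kT):
            protonated = not protonated
        count += protonated
    return count / n_steps

def exact_protonated_fraction(pH, pka_model, ddg_env, kT=0.593):
    """Two-state Boltzmann average, for comparison with the sampled value."""
    dg = protonation_free_energy(pH, pka_model, ddg_env, kT)
    return 1.0 / (1.0 + math.exp(dg / kT))
```

With `ddg_env = 0` the exact fraction reduces to the Henderson-Hasselbalch form 1/(1 + 10^(pH − pKa_model)); a nonzero `ddg_env` shifts the apparent pKa by −ΔΔG_env/(kT ln 10). The value kT = 0.593 kcal/mol corresponds to roughly 298 K.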
The HLA locus is the strongest risk factor for anti-citrullinated protein antibody (ACPA)+ rheumatoid arthritis (RA). Despite considerable efforts in the last 35 years, this association is poorly understood. Here we identify (citrullinated) vinculin, present in the joints of ACPA+ RA patients, as an autoantigen targeted by ACPA and CD4+ T cells. These T cells recognize an epitope with the core sequence DERAA, which is also found in many microbes and in protective HLA-DRB1*13 molecules, presented by predisposing HLA-DQ molecules. Moreover, these T cells crossreact with vinculin-derived and microbial-derived DERAA epitopes. Intriguingly, DERAA-directed T cells are not detected in HLA-DRB1*13+ donors, indicating that the DERAA epitope from HLA-DRB1*13 mediates (thymic) tolerance in these donors and explaining the protective effects associated with HLA-DRB1*13. Together our data indicate the involvement of pathogen-induced DERAA-directed T cells in the HLA-RA association and provide a molecular basis for the contribution of protective/predisposing HLA alleles.
Computational Protein Design (CPD) is a promising method for high throughput protein and ligand mutagenesis. Recently, we developed a CPD method that used a polar-hydrogen energy function for protein interactions and a Coulomb/Accessible Surface Area (CASA) model for solvent effects. We applied this method to engineer aspartyl-adenylate (AspAMP) specificity into Asparaginyl-tRNA synthetase (AsnRS), whose substrate is asparaginyl-adenylate (AsnAMP). Here, we implement a more accurate function, with an all-atom energy for protein interactions and a residue-pairwise generalized Born model for solvent effects. As a first test, we compute amino acid affinities for several point mutants of Aspartyl-tRNA synthetase (AspRS) and Tyrosyl-tRNA synthetase and stability changes for three helical peptides and compare with experiment. As a second test, we readdress the problem of AsnRS amino acid engineering. We compare three design criteria, which optimize the folding free-energy, the absolute AspAMP affinity, and the relative (AspAMP-AsnAMP) affinity. The sequences and conformations are improved with respect to our previous, polar-hydrogen/CASA study: for several designed complexes, the AspAMP carboxylate forms three interactions with a conserved arginine and a designed lysine, as in the active site of the AspRS:AspAMP complex. The conformations and interactions are well maintained in molecular dynamics simulations and the sequences have an inverted specificity, favoring AspAMP over AsnAMP. The method is not fully successful, since experimental measurements with the seven most promising sequences show that they do not catalyze the adenylation of Asp (or Asn) with ATP at a detectable level. This may be due to weak AspAMP binding and/or disruption of transition-state stabilization.
Generalized Born (GB) solvent models are common in acid/base calculations and protein design. With GB, the interaction between a pair of solute atoms depends on the shape of the protein/solvent boundary and, therefore, the positions of all solute atoms, so that GB is a many-body potential. For compute-intensive applications, the model is often simplified further, by introducing a mean, native-like protein/solvent boundary, which removes the many-body property. We investigate a method for both acid/base calculations and protein design that uses Monte Carlo simulations in which side chains can explore rotamers, bind/release protons, or mutate. The fluctuating protein/solvent dielectric boundary is treated in a way that is numerically exact (within the GB framework), in contrast to a mean boundary. Its originality is that it captures the many-body character while retaining the residue-pairwise complexity given by a fixed boundary. The method is implemented in the Proteus protein design software. It yields a slight but systematic improvement for acid/base constants in nine proteins and a significant improvement for the computational design of three PDZ domains. It eliminates a source of model uncertainty, which will facilitate the analysis of other model limitations. © 2017 Wiley Periodicals, Inc.
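The many-body character described above can be seen in the widely used GB form of Still and co-workers, given here for reference only; the abstract does not state which GB variant is used in Proteus, so the exact functional form is an assumption. Here $q_i$ are atomic charges, $r_{ij}$ interatomic distances, $\epsilon_P$ and $\epsilon_W$ the solute and solvent dielectric constants, and $b_i$ the effective Born radius of atom $i$.

```latex
\Delta G_{\mathrm{GB}} = -\frac{1}{2}\left(\frac{1}{\epsilon_P} - \frac{1}{\epsilon_W}\right)
\sum_{i,j} \frac{q_i q_j}{f_{ij}},
\qquad
f_{ij} = \sqrt{r_{ij}^2 + b_i b_j \exp\!\left(-\frac{r_{ij}^2}{4\, b_i b_j}\right)}
```

Each pair term looks pairwise, but every $b_i$ depends on how deeply atom $i$ is buried, and therefore on the positions of all solute atoms. Fixing the $b_i$ to values from a mean, native-like boundary makes the sum strictly pairwise, which is the simplification that the exact treatment described above avoids.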
Implicit solvent models are important for many biomolecular simulations. The polarity of aqueous solvent is essential and qualitatively captured by continuum electrostatics methods like Generalized Born (GB). However, GB does not account for the solvent-induced interactions between exposed hydrophobic sidechains or solute-solvent dispersion interactions. These "nonpolar" effects are often modeled through surface area (SA) energy terms, which lack realism, create mathematical singularities, and have a many-body character. We have explored an alternate, Lazaridis-Karplus (LK) gaussian energy density for nonpolar effects and a dispersion (DI) energy term proposed earlier, associated with GB electrostatics. We parameterized several combinations of GB, SA, LK, and DI energy terms, to reproduce 62 small molecule solvation free energies, 387 protein stability changes due to point mutations, and the structures of 8 protein loops. With optimized parameters, the models all gave similar results, with GBLK and GBDILK giving no performance loss compared to GBSA, and mean errors of 1.7 kcal/mol for the stability changes and 2 Å deviations for the loop conformations. The optimized GBLK model gave poor results in MD of the Trpcage mini-protein, but parameters optimized specifically for MD performed well for Trpcage and three other small proteins. Overall, the LK and DI nonpolar terms are valid alternatives to SA treatments for a range of applications. © 2017 Wiley Periodicals, Inc.