Water clusters (H2O)20 and (H2O)25 are explored at the Møller-Plesset second-order perturbation (MP2) level of theory. Geometry optimization is carried out on favorable structures, initially generated by the temperature basin paving (TBP) method, utilizing the fragment-based molecular tailoring approach (MTA). MTA-based stabilization energies at the complete basis set limit are accurately estimated by grafting the energy correction using a smaller basis set. For prototypical cases, the minima are established via MTA-based vibrational frequency calculations at the MP2/aug-cc-pVDZ level. The potential of MTA in tackling large clusters is further demonstrated by performing geometry optimization at MP2/aug-cc-pVDZ starting with the global minimum of (H2O)30 reported by Monte Carlo (MC) and molecular dynamics (MD) investigations. The present study brings out the efficacy of MTA in performing computationally expensive ab initio calculations with minimal off-the-shelf hardware without significant loss of accuracy.
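The abstract does not spell out how the energy correction is grafted. A minimal sketch of one common form is given below, assuming the fragment-based error measured in a small basis is added to the fragment-based energy in the large basis. The function name, argument names, and numbers are hypothetical and are not taken from the paper.

```python
def graft_energy(e_mta_large, e_full_small, e_mta_small):
    """Estimate the full-calculation energy in a large basis set.

    Assumed grafting form (not stated in the abstract):
        E_est(large) = E_MTA(large) + [E_full(small) - E_MTA(small)]
    The bracketed term is the fragmentation error measured in the small basis.
    """
    correction = e_full_small - e_mta_small
    return e_mta_large + correction


# Hypothetical placeholder energies (arbitrary units), for illustration only.
estimate = graft_energy(e_mta_large=-100.0, e_full_small=-90.5, e_mta_small=-90.0)
print(estimate)
```

The sketch relies on the fragmentation error being roughly independent of basis set, so the expensive full calculation is only needed in the smaller basis.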
Determining low-energy structures of large water clusters is a challenge for any optimization algorithm. In this work, we have developed a new Monte Carlo (MC)-based method, temperature basin paving (TBP), which is related to the well-known basin-hopping method. In the TBP method, the Boltzmann weight factor used in MC methods is dynamically modified based on the history of the simulation. States that are visited more often are given a lower probability by increasing their temperatures, and vice versa. This allows faster escapes from the states frequently visited in the simulation. We have used the TBP method to find a large number of low-energy minima of water clusters of size 20 and 25. We have found structures energetically equivalent to the global minimum structures known for these two clusters. We have compared the efficiency of this method with that of the basin-hopping method and found that it can locate the minima faster. The statistical efficiency of the new method has been investigated by running a large number of trajectories. The new method can locate low-energy structures of both clusters faster than some of the reported algorithms for water clusters and can switch between high-energy and low-energy structures multiple times in a simulation, illustrating its efficiency. The large number of minima obtained from the simulations is used to extract both general and specific features of the minima. The distribution of minima for these two clusters based on the similarity of their oxygen frames shows that (H2O)20 can adopt a variety of structures, but for (H2O)25, low-energy structures are mostly cagelike. Several (H2O)25 structures are found with similar energy but with different cage architectures. Noncage structures of (H2O)25 are also found, but they are 6-7 kcal/mol higher in energy than the global minimum. The TBP method is likely to play an important role in exploring the complex energy landscape of large molecules.
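The history-dependent acceptance idea can be illustrated with a toy sketch. It assumes a linear rule in which the temperature of a basin grows with its visit count, and a walk over a small ring of precomputed basin energies. The rule, the parameters `t0` and `alpha`, and the function name are illustrative assumptions, not the functional form used in the paper.

```python
import math
import random
from collections import defaultdict


def tbp_search(energies, steps=2000, t0=0.5, alpha=0.1, seed=0):
    """Toy Metropolis walk over basins with visit-dependent temperatures."""
    rng = random.Random(seed)
    visits = defaultdict(int)
    current = rng.randrange(len(energies))
    best = current
    for _ in range(steps):
        visits[current] += 1
        # Assumed rule: the more often a basin was visited, the hotter it is,
        # which makes uphill escapes from it more likely.
        temperature = t0 * (1.0 + alpha * visits[current])
        # Propose one of the two neighbouring basins on a ring.
        proposal = (current + rng.choice((-1, 1))) % len(energies)
        delta = energies[proposal] - energies[current]
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposal
        if energies[current] < energies[best]:
            best = current
    return best, dict(visits)


# Hypothetical basin energies; index 4 is the lowest.
toy_energies = [0.0, 2.0, 1.0, 3.0, -1.0, 2.5]
best_index, visit_counts = tbp_search(toy_energies)
print(best_index, visit_counts)
```

In a real application each "basin" would be a locally minimized cluster geometry rather than an entry in a list, and the proposal step would perturb and reoptimize the structure.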
As a common protein modification, asparagine-linked (N-linked) glycosylation has the capacity to greatly influence the biological and biophysical properties of proteins. However, the routine use of glycosylation as a strategy for engineering proteins with advantageous properties is limited by our inability to construct and screen large collections of glycoproteins for cataloguing the consequences of glycan installation. To address this challenge, we describe a combinatorial strategy termed shotgun scanning glycomutagenesis in which DNA libraries encoding all possible glycosylation site variants of a given protein are constructed and subsequently expressed in glycosylation-competent bacteria, thereby enabling rapid determination of glycosylatable sites in the protein. The resulting neoglycoproteins can be readily subjected to available high-throughput assays, making it possible to systematically investigate the structural and functional consequences of glycan conjugation along a protein backbone. The utility of this approach was demonstrated with three different acceptor proteins, namely bacterial immunity protein Im7, bovine pancreatic ribonuclease A, and human anti-HER2 single-chain Fv antibody, all of which were found to tolerate N-glycan attachment at a large number of positions and with relatively high efficiency. The stability and activity of many glycovariants were measurably altered by N-linked glycans in a manner that critically depended on the precise location of the modification. Structural models suggested that affinity was improved by creating novel interfacial contacts with a glycan at the periphery of a protein–protein interface. Importantly, we anticipate that our glycomutagenesis workflow should provide access to unexplored regions of glycoprotein structural space and to custom-made neoglycoproteins with desirable properties.
The accurate prediction of protein structures achieved by deep learning (DL) methods is a significant milestone and has deeply impacted structural biology. Shortly after its release, AlphaFold2 was evaluated for predicting protein–peptide interactions and was shown to significantly outperform RoseTTAfold as well as a conventional blind docking method, PIPER-FlexPepDock. Since then, new AlphaFold2 models, trained specifically to predict multimeric assemblies, have been released, and a new ab initio folding model, OmegaFold, has become available. Here, we assess docking success rates for these new DL folding models and compare their performance with our state-of-the-art, focused peptide-docking software AutoDock CrankPep (ADCP). The evaluation is done using the same dataset and performance metric for all methods. We show that, for a set of 99 nonredundant protein–peptide complexes, the new AlphaFold2 model outperforms other DL approaches and achieves remarkable docking success rates for peptides. While the docking success rate of ADCP is more modest when considering the top-ranking solution only, it samples correct solutions for around 62% of the complexes. Interestingly, different methods succeed on different complexes, and we describe a consensus docking approach using ADCP and AlphaFold2, which achieves a remarkable 60% for the top-ranking results and 66% for the top 5 results for this set of 99 protein–peptide complexes.
Carbohydrates dynamically and transiently interact with proteins for cell–cell recognition, cellular differentiation, immune response, and many other cellular processes. Despite the molecular importance of these interactions, there are currently few reliable computational tools to predict potential carbohydrate-binding sites on any given protein. Here, we present two deep learning (DL) models, named CArbohydrate–Protein interaction Site IdentiFier (CAPSIF), that predict non-covalent carbohydrate-binding sites on proteins: (1) a 3D-UNet voxel-based neural network model (CAPSIF:V) and (2) an equivariant graph neural network model (CAPSIF:G). While both models outperform previous surrogate methods used for carbohydrate-binding site prediction, CAPSIF:V performs better than CAPSIF:G, achieving test Dice scores of 0.597 and 0.543 and test set Matthews correlation coefficients (MCCs) of 0.599 and 0.538, respectively. We further tested CAPSIF:V on AlphaFold2-predicted protein structures. CAPSIF:V performed equivalently on both experimentally determined structures and AlphaFold2-predicted structures. Finally, we demonstrate how CAPSIF models can be used in conjunction with local glycan-docking protocols, such as GlycanDock, to predict bound protein–carbohydrate structures.
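The two reported metrics can be made concrete with a short sketch that computes the Dice score and the Matthews correlation coefficient from per-residue binary labels (1 for a binding residue, 0 otherwise). The function name and the example labels are hypothetical; the sketch uses the standard textbook definitions and is not the authors' evaluation code.

```python
import math


def dice_and_mcc(predicted, actual):
    """Return (Dice, MCC) for two equal-length sequences of 0/1 labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)

    # Dice = 2TP / (2TP + FP + FN); defined here as 0.0 when nothing is positive.
    dice_denominator = 2 * tp + fp + fn
    dice = 2 * tp / dice_denominator if dice_denominator else 0.0

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0.0 if undefined.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return dice, mcc


# Hypothetical labels for a six-residue toy protein.
predicted_sites = [1, 1, 0, 0, 1, 0]
true_sites = [1, 0, 0, 0, 1, 1]
print(dice_and_mcc(predicted_sites, true_sites))
```

Dice ignores true negatives, whereas MCC uses all four confusion-matrix counts, which is why the two are often reported together for sparse labels such as binding sites.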