The accuracy of sequence‐based tertiary contact predictions was assessed in a blind prediction experiment at the CASP13 meeting. After 4 years of significant improvements in prediction accuracy, another dramatic advance has taken place since CASP12 was held 2 years ago. The precision of predicting the top L/5 contacts in the free modeling category, where L is the corresponding length of the protein in residues, has exceeded 70%. As a comparison, the best‐performing group at CASP12 with a 47% precision would have finished below the top 1/3 of the CASP13 groups. Extensively trained deep neural network approaches dominate the top performing algorithms, which appear to efficiently integrate information on coevolving residues and interacting fragments or possibly utilize memories of sequence similarities and sometimes can deliver accurate results even in the absence of virtually any target specific evolutionary information. If the current performance is evaluated by F‐score on L contacts, it stands around 24% right now, which, despite the tremendous impact and advance in improving its utility for structure modeling, also suggests that there is much room left for further improvement.
With the advance of experimental procedures obtaining chemical crosslinking information is becoming a fast and routine practice. Information on crosslinks can greatly enhance the accuracy of protein structure modeling. Here, we review the current state of the art in modeling protein structures with the assistance of experimentally determined chemical crosslinks within the framework of the 13th meeting of Critical Assessment of Structure Prediction approaches. This largest‐to‐date blind assessment reveals benefits of using data assistance in difficult to model protein structure prediction cases. However, in a broader context, it also suggests that with the unprecedented advance in accuracy to predict contacts in recent years, experimental crosslinks will be useful only if their specificity and accuracy further improved and they are better integrated into computational workflows.
Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods sample intensely the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All atom decoys produced out of EdaFold’s decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from http://www.riken.jp/zhangiru/software/.
Ab initio phasing is one of the remaining challenges in protein crystallography. Recent progress in computational structure prediction has enabled the generation of de novo models with high enough accuracy to solve the phase problem ab initio. This `ab initio phasing with de novo models' method first generates a huge number of de novo models and then selects some lowest energy models to solve the phase problem using molecular replacement. The amount of CPU time required is huge even for small proteins and this has limited the utility of this method. Here, an approach is described that significantly reduces the computing time required to perform ab initio phasing with de novo models. Instead of performing molecular replacement after the completion of all models, molecular replacement is initiated during the course of each simulation. The approach principally focuses on avoiding the refinement of the best and the worst models and terminating the entire simulation early once suitable models for phasing have been obtained. In a benchmark data set of 20 proteins, this method is over two orders of magnitude faster than the conventional approach. It was observed that in most cases molecular-replacement solutions were determined soon after the coarse-grained models were turned into full-atom representations. It was also found that all-atom refinement was hardly able to change the models sufficiently to enable successful molecular replacement if the coarse-grained models were not very close to the native structure. Therefore, it remains critical to generate good-quality coarse-grained models to enable subsequent all-atom refinement for successful ab initio phasing by molecular replacement.
Our C++ software is released under the GNU General Public License. It can be downloaded from http://www.riken.jp/zhangiru/software/durandal_released.tgz.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.