Motivation With the breakthrough of AlphaFold2, the protein structure prediction problem has made remarkable progress through deep learning end-to-end techniques, in which correct folds could be built for nearly all single-domain proteins. However, the full-chain modelling appears to be lower on average accuracy than that for the constituent domains and requires higher demand on computing hardware, indicating the performance of full-chain modelling still needs to be improved. In this study, we investigate whether the predicted accuracy of the full-chain model can be further improved by domain assembly assisted by deep learning. Results In this article, we developed a structural analogue-based protein structure domain assembly method assisted by deep learning, named SADA. In SADA, a multi-domain protein structure database (MPDB) was constructed for the full-chain analogue detection using individual domain models. Starting from the initial model constructed from the analogue, the domain assembly simulation was performed to generate the full-chain model through a two-stage differential evolution algorithm guided by the energy function with an inter-residue distance potential predicted by deep learning. SADA was compared with the state-of-the-art domain assembly methods on 356 benchmark proteins, and the average TM-score of SADA models is 8.1% and 27.0% higher than that of DEMO and AIDA, respectively. We also assembled 293 human multi-domain proteins, where the average TM-score of the full-chain model after the assembly by SADA is 1.1% higher than that of the model by AlphaFold2. To conclude, we find that the domains often interact in the similar way in the quaternary orientations if the domains have similar tertiary structures. Furthermore, homologous templates and structural analogues are complementary for multi-domain protein full-chain modelling. Availability http://zhanglab-bioinf.com/SADA Supplementary information Supplementary data are available at Bioinformatics online.
Motivation: With the breakthrough of AlphaFold2, the protein structure prediction problem has made a remarkable progress through end-to-end deep learning techniques, in which correct folds could be built for nearly all single-domain proteins. However, the full-chain modelling appears to be lower on average accuracy than that for the constituent domains and requires higher demand on computing hardware, indicating the performance of full-chain modelling still needs to be improved. In this study, we investigate whether the predicted accuracy of full-chain model can be further improved by domain assembly assisted by deep learning. Results: In this article, we developed a structural analogue-based protein structure domain assembly method assisted by deep learning, named SADA. In SADA, a multi-domain protein structure database (MPDB) was constructed for the full-chain analogue detection using individual domain models. Starting from the initial model constructed from the analogue, the domain assembly simulation was performed to generate the full-chain model through a two-stage differential evolution algorithm guided by the physics-based force field and an inter-residue distance potential predicted by deep learning. SADA was compared with the state-of-the-art domain assembly methods on 356 benchmark proteins, and the average TM-score of SADA models is 8.1% and 27.0% higher than that of DEMO and AIDA, respectively. We also assembled full-chain models of 20 human multi-domain proteins using individual domain models independently predicted by AlphaFold2, where the SADA full-chain models obtained a 4.8% higher average TM-score than full-chain models directly predicted by AlphaFold2 and fewer computing resources were required. In addition, we also find that the domains often interact in the similar way in the quaternary orientations if the domains have similar tertiary structures. Furthermore, homologous templates and structural analogues are complementary for multi-domain protein full-chain modelling.
Motivation: With the breakthrough of AlphaFold2 and the publication of AlphaFold DB, the protein structure prediction has made remarkable progress, which may further promote many potential applications of proteomics in all areas of life. However, it should be noted that AlphaFold2 models tend to represent only a single static structure, and accurately predicting multiple conformations remains a challenge. Therefore, it is essential to develop methods for predicting multiple conformations, which enable us to gain knowledge of multiple conformational states and the broader conformational landscape to better understand the mechanism of action. Results: In this work, we proposed a multiple conformational states folding method using the distance-based multi-objective evolutionary algorithm framework, named MultiSFold. First, a multi-objective energy landscape with multiple competing constraints generated by deep learning is constructed. Then, an iterative modal exploration and exploitation strategy based on multi-objective optimization, geometric optimization and structural similarity clustering is designed to perform conformational sampling. Finally, the final population is generated using a loop-specific perturbation strategy to adjust the spatial orientations. MultiSFold was compared with state-of-the-art methods on a developed benchmark testset containing 81 proteins with two representative conformational states. Based on the proposed metric, the success ratio of MultiSFold predicting multiple conformations was 70.4% while that of AlphaFold2 was 9.88%, which may indicate that conformational sampling combined with knowledge gained through deep learning has the potential to produce conformations spanned the range between two experimental structures. In addition, MultiSFold was tested on 244 human proteins with low structural accuracy in AlphaFold DB to test whether it could further improve the accuracy of static structures. The experimental results demonstrate that the TM-score of MultiSFold is 2.97% and 7.72% higher than that of AlphaFold2 and RoseTTAFold, respectively, supporting our hypothesis that multiple competing optimization objectives can further assist conformational search to improve prediction accuracy.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.