Motivation: After the outstanding breakthrough of AlphaFold in predicting protein 3D models, new questions appeared and remain unanswered. The ensemble nature of proteins, for example, challenges structure prediction methods because the models should represent a set of conformers instead of single structures. The evolutionary and structural features captured by effective deep learning techniques may unveil the information needed to generate several diverse conformations from a single sequence. Here we address the performance of AlphaFold2 predictions obtained through ColabFold under this ensemble paradigm. Results: Using a curated collection of apo-holo pairs of conformers, we found that AlphaFold2 predicts the holo form of a protein in ∼70% of the cases, but cannot reproduce the observed conformational diversity with the same error for both conformers. More importantly, we found that AlphaFold2's performance worsens as the conformational diversity of the studied protein increases. This impairment is related to the heterogeneity in the degree of conformational diversity found between different members of the homologous family of the protein under study. Finally, we found that main-chain flexibility associated with apo-holo pairs of conformers negatively correlates with the predicted local model quality score pLDDT, indicating that pLDDT values in a single 3D model could be used to infer local conformational changes linked to ligand binding transitions. Availability: Data and code used in this manuscript are publicly available at https://gitlab.com/sbgunq/publications/af2confdiv-oct2021 Supplementary Information: Supplementary data are available at the journal's web site.
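As a rough illustration of the last result, the sketch below reads per-residue pLDDT values from a predicted model and lists the residues that fall under a confidence cutoff, which could then be inspected as candidates for local conformational change. It relies on the fact that AlphaFold writes pLDDT into the B-factor column of its PDB-format output. The function names and the cutoff of 70 are assumptions made for this example; they are not taken from the authors' code.

```python
def ca_plddt(pdb_text):
    """Return {(chain, residue_number): pLDDT} from the C-alpha atoms of a PDB-format model.

    Assumes an AlphaFold-style file, where the B-factor column (columns 61-66)
    holds the per-residue pLDDT score.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB parsing: atom name in columns 13-16, chain in column 22,
        # residue number in columns 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            scores[key] = float(line[60:66])
    return scores


def low_confidence_residues(scores, threshold=70.0):
    """Return the residues whose pLDDT is below `threshold` (an illustrative cutoff)."""
    return sorted(key for key, value in scores.items() if value < threshold)
```

Given the text of a model file, `low_confidence_residues(ca_plddt(text))` returns a sorted list of `(chain, residue_number)` pairs.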
Protein motions are a key feature for understanding biological function. Recently, a large-scale analysis of protein conformational diversity showed a positively skewed distribution with a peak at 0.5 Å C-alpha root-mean-square deviation (RMSD). To understand this distribution in terms of structure-function relationships, we studied a large, well-curated dataset of ~5,000 proteins with experimentally determined conformational diversity. We searched for global behaviour patterns by studying how structure-based features change among the available conformer population for each protein. This procedure allowed us to describe the RMSD distribution in terms of three main protein classes sharing given properties. The largest of these protein subsets (~60%), which we call “rigid” (average RMSD = 0.83 Å), has no disordered regions, shows low conformational diversity, and has the largest tunnels and smaller, buried cavities. The two additional subsets contain disordered regions, but with differential sequence composition and behaviour. Partially disordered proteins have on average 67% of their conformers with disordered regions, average RMSD = 1.1 Å, the highest number of hinges and the longest disordered regions. In contrast, malleable proteins have on average only 25% of disordered conformers and average RMSD = 1.3 Å, have flexible cavities whose size is affected by the presence of disordered regions, and show the highest diversity of cognate ligands. Proteins in each set are mostly non-homologous to each other and share neither a fold class nor functional similarity, but they do share features derived from their conformer population. These shared features could represent conformational mechanisms related to biological functions.
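The C-alpha RMSD that this abstract uses to measure conformational diversity can be sketched as follows. The abstract does not say how conformers were superposed, so this example assumes an optimal rigid-body superposition with the Kabsch algorithm and assumes that both conformers supply the same residues in the same order. The function name is illustrative.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition.

    Assumes row i of P and row i of Q are the C-alpha atoms of the same residue.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centring both coordinate sets on their centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)

    # Kabsch: the SVD of the covariance matrix gives the optimal rotation.
    H = P0.T @ Q0
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the remaining deviation.
    diff = P0 @ R.T - Q0
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

Two conformers that differ only by a rotation and a translation give an RMSD of zero up to floating-point error, so any remaining value reflects an internal structural difference.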
Protein-protein interactions are essential to all aspects of life. Specific interactions result from evolutionary pressure at the interacting interfaces of partner proteins. However, evolutionary pressure is not homogeneous within the interface: for instance, not every residue contributes equally to the binding energy of the complex. To understand functional differences between residues within the interface, we analyzed their properties in the core and rim regions. Here, we characterized protein interfaces with two evolutionary measures, conservation and coevolution, using a comprehensive dataset of 896 protein complexes. These scores can detect different selection pressures at a given position in a multiple sequence alignment. We also analyzed how the number of interactions in which a residue is involved influences those evolutionary signals. We found that the coevolutionary signal is higher in the interface core than in the interface rim region. Additionally, the difference in coevolution between core and rim regions is comparable to the known difference in conservation between those regions. Considering proteins with multiple interactions, we found that conservation and coevolution increase with the number of different interfaces in which a residue is involved, suggesting that more constraints (i.e., a residue that must satisfy a greater number of interactions) allow fewer sequence changes at those positions, resulting in higher conservation and coevolution values. These findings shed light on the evolution of protein interfaces and provide information useful for identifying protein interfaces and predicting protein-protein interactions.
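To make the notion of a per-position conservation score concrete, here is a minimal sketch based on the Shannon entropy of one alignment column. The abstract does not say which conservation measure was used, so the entropy-based score, the function name, and the 20-letter alphabet are assumptions made for this example. Gap characters are counted as ordinary symbols for simplicity.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Entropy-based conservation of one MSA column, scaled so 1.0 means invariant.

    `column` is a string or list with one character per aligned sequence.
    """
    n = len(column)
    if n == 0:
        raise ValueError("column must not be empty")
    counts = Counter(column)
    # Shannon entropy (bits) of the observed residue frequencies.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Normalise by the maximum entropy of the alphabet.
    return 1.0 - entropy / math.log2(alphabet_size)
```

Averaging this score over the core positions and over the rim positions of an interface would give the kind of core-versus-rim comparison described above.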
Structural differences between conformers sustain protein biological function. Here, using a large dataset of 745 intrinsically disordered proteins, we studied how order-disorder transitions modulate structural differences between conformers as derived from crystallographic data. We found that almost 50% of the proteins studied show no transitions and have low conformational diversity, while the rest show transitions and a higher conformational diversity. In this last subset, 60% of the proteins become more ordered after ligand binding, while 40% become more disordered. As protein conformational diversity is inherently connected with protein function, our analysis suggests differences in structure-function relationships related to order-disorder transitions.
Supplementary data are available at Bioinformatics online.
Interprotein contact prediction using multiple sequence alignments (MSAs) is a useful approach to help detect protein–protein interfaces. Different computational methods have been developed in recent years to approach this problem. However, because their results disagree, there is still no consensus on which methodology performs best. To address this problem, I-COMS (interprotein COrrelated Mutations Server) is presented. I-COMS estimates covariation between residues of different proteins using four different covariation methods. It provides a graphical and interactive output that helps compare results obtained using different methods. I-COMS automatically builds the MSA required for the calculation and produces a rich visualization of intraprotein and/or interprotein covarying positions in a circos representation. Furthermore, a comparison between any two methods is available, as well as the overlap between any or all four methodologies. In addition, as a complementary source of information, a matrix visualization of the corresponding scores is made available, and density plots of the inter, intra and inter+intra score distributions are calculated. Finally, all the results (including MSAs, scores and graphics) can be downloaded for comparison, visualization and/or further analysis.
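As an illustration of what a covariation score between two alignment columns looks like, the sketch below computes mutual information, one widely used covariation measure. The abstract does not name the four methods offered by I-COMS, so this is a generic example and not a description of the server. For the interprotein case it assumes a paired alignment in which row i of both columns comes from the same organism. The function name is illustrative.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two MSA columns of equal length.

    For interprotein covariation, col_a comes from protein A's alignment and
    col_b from protein B's, with rows paired by organism (an assumption here).
    """
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi
```

Columns that always change together score high, while a column that never varies scores zero against any other column, which is one reason raw mutual information is usually corrected or complemented by other methods.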
The native state of a protein is better represented by an ensemble of conformers in equilibrium than by a single structure. The extent of the structural differences between conformers characterizes the conformational diversity of the protein. In this study, we found a negative correlation between conformational diversity and protein evolutionary rate. Conformational diversity was expressed as the maximum root mean square deviation (RMSD) between the available conformers in the Conformational Diversity of Native State database. Evolutionary rates were estimated by comparing human with 16 different species, each sharing at least 700 orthologous proteins of known conformational diversity. The negative correlation found is independent of the protein expression level and comparable in magnitude and sign to the correlation between gene expression level and evolutionary rate. Our findings suggest that the structural constraints underlying protein dynamism, essential for protein function, could modulate protein divergence.
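A correlation of this kind, between a per-protein maximum RMSD and a per-protein evolutionary rate, could be computed as in the sketch below. The abstract does not state which correlation coefficient was used, so the Spearman rank correlation here is an assumption, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Here `x` would hold one maximum RMSD per protein and `y` the matching evolutionary rate estimates; a value below zero corresponds to the negative correlation reported above.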