MISTIC (mutual information server to infer coevolution) is a web server for graphical representation of the information contained within a MSA (multiple sequence alignment) and a complete analysis tool for Mutual Information networks in protein families. The server outputs a graphical visualization of several information-related quantities using a circos representation. This provides an integrated view of the MSA in terms of (i) the mutual information (MI) between residue pairs, (ii) sequence conservation and (iii) the residue cumulative and proximity MI scores. Further, an interactive interface to explore and characterize the MI network is provided. Several tools are offered for selecting subsets of nodes from the network for visualization. Node coloring can be set to match different attributes, such as conservation, cumulative MI, proximity MI and secondary structure. Finally, a zip file containing all results can be downloaded. The server is available at http://mistic.leloir.org.ar. In summary, MISTIC allows for a comprehensive, compact, visually rich view of the information contained within an MSA in a manner unique to any other publicly available web server. In particular, the use of circos representation of MI networks and the visualization of the cumulative MI and proximity MI concepts is novel.
Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the catalytic site atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
Sialyltransferases are responsible for the synthesis of a diverse range of sialoglycoconjugates predicted to be pivotal to deuterostomes’ evolution. In this work, we reconstructed the evolutionary history of the metazoan α2,3-sialyltransferases family (ST3Gal), a subset of sialyltransferases encompassing six subfamilies (ST3Gal I–ST3Gal VI) functionally characterized in mammals. Exploration of genomic and expressed sequence tag databases and search of conserved sialylmotifs led to the identification of a large data set of st3gal-related gene sequences. Molecular phylogeny and large scale sequence similarity network analysis identified four new vertebrate subfamilies called ST3Gal III-r, ST3Gal VII, ST3Gal VIII, and ST3Gal IX. To address the issue of the origin and evolutionary relationships of the st3gal-related genes, we performed comparative syntenic mapping of st3gal gene loci combined to ancestral genome reconstruction. The ten vertebrate ST3Gal subfamilies originated from genome duplication events at the base of vertebrates and are organized in three distinct and ancient groups of genes predating the early deuterostomes. Inferring st3gal gene family history identified also several lineage-specific gene losses, the significance of which was explored in a functional context. Toward this aim, spatiotemporal distribution of st3gal genes was analyzed in zebrafish and bovine tissues. In addition, molecular evolutionary analyses using specificity determining position and coevolved amino acid predictions led to the identification of amino acid residues with potential implication in functional divergence of vertebrate ST3Gal. We propose a detailed scenario of the evolutionary relationships of st3gal genes coupled to a conceptual framework of the evolution of ST3Gal functions.
BackgroundA large panel of methods exists that aim to identify residues with critical impact on protein function based on evolutionary signals, sequence and structure information. However, it is not clear to what extent these different methods overlap, and if any of the methods have higher predictive potential compared to others when it comes to, in particular, the identification of catalytic residues (CR) in proteins. Using a large set of enzymatic protein families and measures based on different evolutionary signals, we sought to break up the different components of the information content within a multiple sequence alignment to investigate their predictive potential and degree of overlap.ResultsOur results demonstrate that the different methods included in the benchmark in general can be divided into three groups with a limited mutual overlap. One group containing real-value Evolutionary Trace (rvET) methods and conservation, another containing mutual information (MI) methods, and the last containing methods designed explicitly for the identification of specificity determining positions (SDPs): integer-value Evolutionary Trace (ivET), SDPfox, and XDET. In terms of prediction of CR, we find using a proximity score integrating structural information (as the sum of the scores of residues located within a given distance of the residue in question) that only the methods from the first two groups displayed a reliable performance. Next, we investigated to what degree proximity scores for conservation, rvET and cumulative MI (cMI) provide complementary information capable of improving the performance for CR identification. We found that integrating conservation with proximity scores for rvET and cMI achieved the highest performance. The proximity conservation score contained no complementary information when integrated with proximity rvET. Moreover, the signal from rvET provided only a limited gain in predictive performance when integrated with mutual information and conservation proximity scores. Combined, these observations demonstrate that the rvET and cMI scores add complementary information to the prediction system.ConclusionsThis work contributes to the understanding of the different signals of evolution and also shows that it is possible to improve the detection of catalytic residues by integrating structural and higher order sequence evolutionary information with sequence conservation.
Protein-protein interactions are essential to all aspects of life. Specific interactions result from evolutionary pressure at the interacting interfaces of partner proteins. However, evolutionary pressure is not homogeneous within the interface: for instance, each residue does not contribute equally to the binding energy of the complex. To understand functional differences between residues within the interface, we analyzed their properties in the core and rim regions. Here, we characterized protein interfaces with two evolutionary measures, conservation and coevolution, using a comprehensive dataset of 896 protein complexes. These scores can detect different selection pressures at a given position in a multiple sequence alignment. We also analyzed how the number of interactions in which a residue is involved influences those evolutionary signals. We found that the coevolutionary signal is higher in the interface core than in the interface rim region. Additionally, the difference in coevolution between core and rim regions is comparable to the known difference in conservation between those regions. Considering proteins with multiple interactions, we found that conservation and coevolution increase with the number of different interfaces in which a residue is involved, suggesting that more constraints (i.e., a residue that must satisfy a greater number of interactions) allow fewer sequence changes at those positions, resulting in higher conservation and coevolution values. These findings shed light on the evolution of protein interfaces and provide information useful for identifying protein interfaces and predicting protein-protein interactions.
The biosynthesis of sialylated molecules of crucial relevance for eukaryotic cell life is achieved by sialyltransferases (ST) of the CAZy family GT29. These enzymes are widespread in the Deuterostoma lineages and more rarely described in Protostoma, Viridiplantae and various protist lineages raising the question of their presence in the Last eukaryotes Common Ancestor (LECA). If so, it is expected that the main enzymes associated with sialic acids metabolism are also present in protists. We conducted phylogenomic and protein sequence analyses to gain insights into the origin and ancient evolution of ST and sialic acid pathway in eukaryotes, Bacteria and Archaea. Our study uncovered the unreported occurrence of bacterial GT29 ST and evidenced the existence of 2 ST groups in the LECA, likely originating from the endosymbiotic event that generated mitochondria. Furthermore, distribution of the major actors of the sialic acid pathway in the different eukaryotic phyla indicated that these were already present in the LECA, which could also access to this essential monosaccharide either endogenously or via a sialin/sialidase uptake mechanism involving vesicles. This pathway was lost in several basal eukaryotic lineages including Archaeplastida despite the presence of two different ST groups likely assigned to other functions.
Cell surface of eukaryotic cells is covered with a wide variety of sialylated molecules involved in diverse biological processes and taking part in cell–cell interactions. Although the physiological relevance of these sialylated glycoconjugates in vertebrates begins to be deciphered, the origin and evolution of the genetic machinery implicated in their biosynthetic pathway are poorly understood. Among the variety of actors involved in the sialylation machinery, sialyltransferases are key enzymes for the biosynthesis of sialylated molecules. This review focus on β-galactoside α2,3/6-sialyltransferases belonging to the ST3Gal and ST6Gal families. We propose here an outline of the evolutionary history of these two major ST families. Comparative genomics, molecular phylogeny and structural bioinformatics provided insights into the functional innovations in sialic acid metabolism and enabled to explore how ST-gene function evolved in vertebrates.
In higher vertebrates, sialyltransferases catalyze the transfer of sialic acid residues, either Neu5Ac or Neu5Gc or KDN from an activated sugar donor, which is mainly CMP-Neu5Ac in human tissues, to the hydroxyl group of another saccharide acceptor. In the human genome, 20 unique genes have been described that encode enzymes with remarkable specificity with regards to their acceptor substrates and the glycosidic linkage formed. A systematic search of sialyltransferase-related sequences in genome and EST databases and the use of bioinformatic tools enabled us to investigate the evolutionary history of animal sialyltransferases and propose original models of divergent evolution of animal sialyltransferases. In this chapter, we extend our phylogenetic studies to the comparative analysis of the environment of sialyltransferase gene loci (synteny and paralogy studies), the variations of tissue expression of these genes and the analysis of amino-acid position evolution after gene duplications, in order to assess their sequence-function relationships and the molecular basis underlying their functional divergence.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.