The dramatic increase in heterogeneous types of biological data, in particular the abundance of new protein sequences, requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while remaining accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies in the context of sequence similarity. Using three large groups of homologous proteins with varying types of structural and functional diversity (GPCRs and kinases from humans, and the crotonase superfamily of enzymes), we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.
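The idea behind a sequence similarity network can be illustrated with a minimal sketch: sequences are nodes, and an edge is kept only when a pairwise similarity score meets a chosen threshold. The abstract does not specify the scoring scheme or any software, so the function names, the "higher is more similar" score convention, and the toy scores below are all assumptions made for illustration.

```python
def build_network(pairwise_scores, threshold):
    """Return an adjacency map keeping only edges whose score meets the threshold.

    pairwise_scores maps (seq_a, seq_b) -> similarity score, where a higher
    score is assumed to mean greater similarity (hypothetical convention).
    """
    graph = {}
    for (a, b), score in pairwise_scores.items():
        # Register both nodes so that isolated sequences still appear.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph):
    """Return the connected components of the network as sorted lists."""
    seen = set()
    result = []
    for node in sorted(graph):
        if node in seen:
            continue
        stack = [node]
        component = set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Toy, made-up scores between five hypothetical sequences.
example_scores = {
    ("A", "B"): 50,
    ("B", "C"): 45,
    ("C", "D"): 5,
    ("D", "E"): 60,
    ("A", "D"): 3,
}

strict = clusters(build_network(example_scores, threshold=40))
relaxed = clusters(build_network(example_scores, threshold=5))
print(strict)   # [['A', 'B', 'C'], ['D', 'E']]
print(relaxed)  # [['A', 'B', 'C', 'D', 'E']]
```

Relaxing the threshold merges the two groups into one, which is why the choice of cutoff matters when reading clusters off such a network.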
The group of proteins that contain a thioredoxin (Trx) fold is huge and diverse. Assessment of the variation in catalytic machinery of Trx fold proteins is essential in providing a foundation for understanding their functional diversity and predicting the function of the many uncharacterized members of the class. The proteins of the Trx fold class retain common features, including variations on a dithiol CxxC active site motif, that underlie their function. We use protein similarity networks to guide an analysis of how structural and sequence motifs track with catalytic function and taxonomic categories for 4,082 representative sequences spanning the known superfamilies of the Trx fold. Domain structure in the fold class is varied and modular, with 2.8% of sequences containing more than one Trx fold domain. Most member proteins are bacterial. The fold class exhibits many modifications to the CxxC active site motif: only 56.8% of proteins have both cysteines, and no functional groupings have absolute conservation of the expected catalytic motif. Only a small fraction of Trx fold sequences have been functionally characterized. This work provides a global view of the complex distribution of domains and catalytic machinery throughout the fold class, showing that each superfamily contains remnants of the CxxC active site. The unifying context provided by this work can guide the comparison of members of different Trx fold superfamilies to gain insight about their structure-function relationships, illustrated here with the thioredoxins and peroxiredoxins.
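As a rough illustration of what checking for a CxxC motif involves, the sketch below scans one-letter amino acid sequences for a cysteine, any two residues, and a second cysteine. This is not the method used in the study: a real analysis would presumably examine the aligned active site position rather than any occurrence in the sequence. The function names and the toy sequences are hypothetical.

```python
import re

# A lookahead is used so that overlapping candidate motifs are all reported.
CXXC = re.compile(r"(?=(C..C))")


def find_cxxc(sequence):
    """Return (0-based position, motif) for every CxxC occurrence in a sequence."""
    return [(m.start(), m.group(1)) for m in CXXC.finditer(sequence.upper())]


def fraction_with_cxxc(sequences):
    """Return the fraction of sequences that contain at least one CxxC occurrence."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if find_cxxc(seq))
    return hits / len(sequences)


# Toy fragments: the first carries a CGPC motif, the second has the
# first cysteine replaced by serine (both invented for illustration).
toy_sequences = ["AWCGPCKM", "AWSGPCKM"]

print(find_cxxc(toy_sequences[0]))        # [(2, 'CGPC')]
print(fraction_with_cxxc(toy_sequences))  # 0.5
```

A scan like this only flags candidates; sequences in which one cysteine is substituted, as the abstract describes, would need alignment-based comparison to be classified meaningfully.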
Glutathione transferases (GSTs) are ubiquitous scavengers of toxic compounds that fall, structurally and functionally, within the thioredoxin fold suprafamily. The fundamental catalytic capability of GSTs is the nucleophilic addition or substitution of glutathione at electrophilic centers in a wide range of small electrophilic compounds. While specific GSTs have been studied in detail, little is known about the structural and functional relationships between different groupings of GSTs. Through a global analysis of sequence and structural similarity, it was determined that variation in the binding of glutathione between the two major subgroups of cytosolic (soluble) GSTs results in a different mode of glutathione activation. Additionally, the convergent features of glutathione binding between cytosolic GSTs and mitochondrial GST kappa are described. The identification of these structural and functional themes helps to illuminate some of the fundamental contributions of the thioredoxin fold to catalysis in the GSTs and to clarify how the thioredoxin fold can be modified to enable new functions.
The accumulation of sequenced genomes has expanded the already sizeable population of cysteine peptidases from parasites. Characterization of a few of these enzymes has ascribed key roles to peptidases in parasite life cycles and also shed light on mechanisms of pathogenesis. Here, we discuss recent observations on the physiological activities of cysteine peptidases of parasitic organisms, paired with a global view of all cysteine peptidases from the MEROPS database grouped by similarity. This snapshot of the landscape of parasite cysteine peptidases is complex and highly populated, which suggests that expansion of research beyond the few ‘model’ parasite peptidases is now timely.