Protein phosphorylation by eukaryotic protein kinases (ePKs) is a fundamental mechanism of cell signaling in all organisms. In model vertebrates, ~10% of ePKs are classified as pseudokinases, which have amino acid changes within the catalytic machinery of the kinase domain that distinguish them from their canonical kinase counterparts. However, pseudokinases still regulate various signaling pathways, usually doing so in the absence of their own catalytic output. To investigate the prevalence, evolutionary relationships, and biological diversity of these pseudoenzymes, we performed a comprehensive analysis of putative pseudokinase sequences in available eukaryotic, bacterial, and archaeal proteomes. We found that pseudokinases are present across all domains of life, and we classified nearly 30,000 eukaryotic, 1500 bacterial, and 20 archaeal pseudokinase sequences into 86 pseudokinase families, including ~30 families that were previously unknown. We uncovered a rich variety of pseudokinases with notable expansions not only in animals but also in plants, fungi, and bacteria, where pseudokinases have previously received cursory attention. These expansions are accompanied by domain shuffling, which suggests roles for pseudokinases in plant innate immunity, plant-fungal interactions, and bacterial signaling. Mechanistically, the ancestral kinase fold has diverged in many distinct ways through the enrichment of unique sequence motifs to generate new families of pseudokinases in which the kinase domain is repurposed for noncanonical nucleotide binding or to stabilize unique, inactive kinase conformations. We further provide a collection of annotated pseudokinase sequences in the Protein Kinase Ontology (ProKinO) as a new mineable resource for the signaling community.
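The abstract above describes pseudokinases as kinase domains with amino acid changes in the catalytic machinery. A minimal sketch of that idea follows. It assumes the commonly cited criterion that an active ePK carries a lysine in the β3 strand (VAIK motif) and aspartates in the HRD and DFG motifs. The position names and helper functions are hypothetical, and this is not presented as the authors' classification pipeline.

```python
# Canonical catalytic residues of an active ePK domain.
# Assumption: the widely used three-residue criterion; the keys are hypothetical labels.
CANONICAL = {
    "beta3_lys": "K",  # lysine of the VAIK motif (beta3 strand)
    "hrd_asp": "D",    # aspartate of the HRD motif (catalytic loop)
    "dfg_asp": "D",    # aspartate of the DFG motif (activation segment)
}


def missing_catalytic_residues(observed):
    """Return the canonical positions whose observed residue differs or is absent.

    `observed` maps a position label to the one-letter amino acid found at the
    aligned position in the query kinase domain.
    """
    return [name for name, aa in CANONICAL.items() if observed.get(name) != aa]


def is_putative_pseudokinase(observed):
    """Flag a domain as a putative pseudokinase if any canonical residue is changed."""
    return bool(missing_catalytic_residues(observed))


# Hypothetical query: the HRD aspartate is replaced by asparagine.
query = {"beta3_lys": "K", "hrd_asp": "N", "dfg_asp": "D"}
print(missing_catalytic_residues(query))
print(is_putative_pseudokinase(query))
```

In practice the residues would come from an alignment of each sequence to a kinase domain profile. That step is omitted here.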
Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular functions. The evolutionary basis for their complex and diverse modes of catalytic functions remains enigmatic. Here, based on deep mining of over half a million GT-A fold sequences, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and emergence of hypervariable loops extending from the core contributed to GT-A diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during evolution. Using evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 90% accuracy and deployed it for the annotation of understudied GTs. Our studies provide an evolutionary framework for investigating complex relationships connecting GT-A fold sequence, structure, function and regulation.
Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular functions by catalyzing synthesis of glycosidic linkages between diverse donor and acceptor substrates. Despite the availability of GT sequences from diverse organisms, the evolutionary basis for their complex and diverse modes of catalytic and regulatory functions remains enigmatic. Here, based on deep mining of over half a million GT-A fold sequences from diverse organisms, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and the emergence of hypervariable loops extending from the core contributed to the evolution of catalytic and functional diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during the course of evolution. We identify conserved modes of donor and acceptor recognition in evolutionarily divergent families and pinpoint the sequence and structural features for functional specialization. Using the evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 88% accuracy and deployed it for the annotation of understudied GTs in five model organisms. Our studies provide an evolutionary framework for investigating the complex relationships connecting GT-A fold sequence, structure, function and regulation.
Skp1 is a subunit of the SCF (Skp1/Cullin 1/F-box protein) class of E3 ubiquitin ligases that are important for eukaryotic protein degradation. Unlike its animal counterparts, Skp1 from Toxoplasma gondii is hydroxylated by an O2-dependent prolyl-4-hydroxylase (PhyA), and the resulting hydroxyproline can subsequently be modified by a five-sugar chain. A similar modification is found in the social amoeba Dictyostelium, where it regulates SCF assembly and O2-dependent development. Homologous glycosyltransferases assemble a similar core trisaccharide in both organisms, and a bifunctional α-galactosyltransferase from CAZy family GT77 mediates the addition of the final two sugars in Dictyostelium, generating Galα1,3Galα1,3Fucα1,2Galβ1,3GlcNAcα1-. Here, we found that Toxoplasma utilizes a cytoplasmic glycosyltransferase from an ancient clade of CAZy family GT32 to catalyze transfer of the fourth sugar. Catalytically active Glt1 was required for the addition of the terminal disaccharide in cells, and cytosolic extracts catalyzed transfer of [3H]glucose from UDP-[3H]glucose to the trisaccharide form of Skp1 in a glt1-dependent fashion. Recombinant Glt1 catalyzed the same reaction, confirming that it directly mediates Skp1 glucosylation, and NMR demonstrated formation of a Glcα1,3Fuc linkage. Recombinant Glt1 strongly preferred the full core trisaccharide attached to Skp1 and labeled only Skp1 in glt1Δ extracts, suggesting specificity for Skp1. glt1-knock-out parasites exhibited a growth defect not rescued by catalytically inactive Glt1, indicating that the glycan acts in concert with the first enzyme in the pathway, PhyA, in cells. A genomic bioinformatics survey suggested that Glt1 belongs to the ancestral Skp1 glycosylation pathway in protists and evolved separately from related Golgi-resident GT32 glycosyltransferases.
Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies.
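The abstract above describes a model that takes secondary structure representations derived from primary sequences as input. The sketch below shows one plausible way such a representation could be turned into a numeric matrix for a convolutional network. It assumes a three-state alphabet (H for helix, E for strand, C for coil) and one-hot encoding. Both are illustrative choices, not the encoding confirmed by the source, and `one_hot_ss` is a hypothetical helper.

```python
# Assumed three-state secondary structure alphabet: helix, strand, coil.
SS_STATES = "HEC"


def one_hot_ss(ss):
    """One-hot encode a secondary structure string.

    Returns a list of rows, one per residue, each of length len(SS_STATES),
    with a single 1 marking the state at that position.
    """
    index = {state: i for i, state in enumerate(SS_STATES)}
    rows = []
    for position, state in enumerate(ss):
        if state not in index:
            raise ValueError(f"unexpected state {state!r} at position {position}")
        row = [0] * len(SS_STATES)
        row[index[state]] = 1
        rows.append(row)
    return rows


# Hypothetical predicted secondary structure for a short segment.
encoded = one_hot_ss("CHHHEEC")
print(len(encoded), len(encoded[0]))
```

A matrix of this shape (sequence length by number of states) is the kind of input over which one-dimensional convolutions and an attention layer could then operate. Because it does not depend on aligning primary sequences, it is in keeping with the alignment-free framing in the abstract.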
PlantCAZyme is a database built upon dbCAN (database for automated carbohydrate active enzyme annotation), aiming to provide pre-computed sequence and annotation data of carbohydrate active enzymes (CAZymes) to plant carbohydrate and bioenergy research communities. The current version contains data of 43 790 CAZymes of 159 protein families from 35 plants (including angiosperms, gymnosperms, lycophyte and bryophyte mosses) and chlorophyte algae with fully sequenced genomes. Useful features of the database include: (i) a BLAST server and a HMMER server that allow users to search against our pre-computed sequence data for annotation purposes, (ii) a download page to allow batch downloading of data for a specific CAZyme family or species and (iii) protein browse pages to provide easy access to the most comprehensive sequence and annotation data. Database URL: http://cys.bios.niu.edu/plantcazyme/
The emergence of multicellularity is strongly correlated with the expansion of tyrosine kinases, a conserved family of signaling enzymes that regulates pathways essential for cell-to-cell communication. Although tyrosine kinases have been classified from several model organisms, a molecular-level understanding of tyrosine kinase evolution across all holozoans is currently lacking. Using a hierarchical sequence constraint-based classification of diverse holozoan tyrosine kinases, we construct a new phylogenetic tree that identifies two ancient clades of cytoplasmic and receptor tyrosine kinases separated by the presence of an extended insert segment in the kinase domain connecting the D and E-helices. Present in nearly all receptor tyrosine kinases, this fast-evolving insertion imparts diverse functionalities such as post-translational modification sites and regulatory interactions. Eph and EGFR receptor tyrosine kinases are two exceptions which lack this insert, each forming an independent lineage characterized by unique functional features. We also identify common constraints shared across multiple tyrosine kinase families which warrant the designation of three new subgroups: Src Module (SrcM), Insulin Receptor Kinase-Like (IRKL), and Fibroblast, Platelet-derived, Vascular, and growth factor Receptors (FPVR). Subgroup-specific constraints reflect shared autoinhibitory interactions involved in kinase conformational regulation. Conservation analyses describe how diverse tyrosine kinase signaling functions arose through the addition of family-specific motifs upon subgroup-specific features and co-evolving protein domains. We propose the oldest tyrosine kinases, IRKL, SrcM, and Csk, originated from unicellular pre-metazoans and were co-opted for complex multicellular functions. The increased frequency of oncogenic variants in more recent tyrosine kinases suggests that lineage-specific functionalities are selectively altered in human cancers.