Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular functions. The evolutionary basis for their complex and diverse modes of catalytic functions remains enigmatic. Here, based on deep mining of over half a million GT-A fold sequences, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and emergence of hypervariable loops extending from the core contributed to GT-A diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during evolution. Using evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 90% accuracy and deployed it for the annotation of understudied GTs. Our studies provide an evolutionary framework for investigating complex relationships connecting GT-A fold sequence, structure, function and regulation.
Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular functions by catalyzing synthesis of glycosidic linkages between diverse donor and acceptor substrates. Despite the availability of GT sequences from diverse organisms, the evolutionary basis for their complex and diverse modes of catalytic and regulatory functions remains enigmatic. Here, based on deep mining of over half a million GT-A fold sequences from diverse organisms, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and the emergence of hypervariable loops extending from the core contributed to the evolution of catalytic and functional diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during the course of evolution. We identify conserved modes of donor and acceptor recognition in evolutionarily divergent families and pinpoint the sequence and structural features for functional specialization. Using the evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 88% accuracy and deployed it for the annotation of understudied GTs in five model organisms. Our studies provide an evolutionary framework for investigating the complex relationships connecting GT-A fold sequence, structure, function and regulation.
The wrongful murders of Black individuals during 2020 (including George Floyd, Breonna Taylor, Ahmaud Arbery, and others), compounded by a long history of similar incidents, inspired protests around the world against racism and police brutality. The growing anti-racism movement sparked conversations within science, technology, engineering, mathematics, and medicine (STEMM) surrounding ways to combat racial bias in our respective fields. A spotlight was placed on the discriminatory history of scientific research and medical practice, as well as the problematic modern-day policies that perpetuate the lack of racial diversity and equity in STEMM. While observing and participating in recent discussions about the racism that pervades institutions, departments, and scientific discourse, we have noticed a set of standard arguments against anti-racism action within STEMM. Ten of these arguments are laid out in this manuscript and paired with evidence-based counterarguments. Notably, while this manuscript is primarily centered around a United States perspective, most of our arguments and suggested actions remain applicable to other countries as well. It is crucial for a STEMM anti-racism movement to extend beyond national borders, reflecting the international nature of scientific research and collaboration. This team of authors represents a collaboration between scientists from historically marginalized groups and their allies. By compiling published academic literature, we hope to directly
The emergence of multicellularity is strongly correlated with the expansion of tyrosine kinases, a conserved family of signaling enzymes that regulates pathways essential for cell-to-cell communication. Although tyrosine kinases have been classified from several model organisms, a molecular-level understanding of tyrosine kinase evolution across all holozoans is currently lacking. Using a hierarchical sequence constraint-based classification of diverse holozoan tyrosine kinases, we construct a new phylogenetic tree that identifies two ancient clades of cytoplasmic and receptor tyrosine kinases separated by the presence of an extended insert segment in the kinase domain connecting the D and E-helices. Present in nearly all receptor tyrosine kinases, this fast-evolving insertion imparts diverse functionalities such as post-translational modification sites and regulatory interactions. Eph and EGFR receptor tyrosine kinases are two exceptions which lack this insert, each forming an independent lineage characterized by unique functional features. We also identify common constraints shared across multiple tyrosine kinase families which warrant the designation of three new subgroups: Src Module (SrcM), Insulin Receptor Kinase-Like (IRKL), and Fibroblast, Platelet-derived, Vascular, and growth factor Receptors (FPVR). Subgroup-specific constraints reflect shared autoinhibitory interactions involved in kinase conformational regulation. Conservation analyses describe how diverse tyrosine kinase signaling functions arose through the addition of family-specific motifs upon subgroup-specific features and co-evolving protein domains. We propose the oldest tyrosine kinases, IRKL, SrcM, and Csk, originated from unicellular pre-metazoans and were co-opted for complex multicellular functions. The increased frequency of oncogenic variants in more recent tyrosine kinases suggests that lineage-specific functionalities are selectively altered in human cancers.
New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we present a metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject, as assessed by its publication volume, was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Because the number of publications produced in bioinformatics nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.