Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
Reprogrammed cellular metabolism is a common characteristic observed in various cancers [1,2]. However, whether metabolic changes directly regulate cancer development and progression remains poorly understood. Here we show that BCAT1, a cytosolic aminotransferase for the branched-chain amino acids (BCAAs), is aberrantly activated and functionally required for chronic myeloid leukemia (CML). BCAT1 is up-regulated during CML progression and promotes BCAA production in leukemia cells by aminating the branched-chain keto acids. Blocking BCAT1 expression or enzymatic activity induces cellular differentiation and impairs the propagation of blast crisis CML (BC-CML) both in vitro and in vivo. Stable isotope tracer experiments combined with NMR-based metabolic analysis demonstrate the intracellular production of BCAAs by BCAT1. Direct supplementation with BCAAs ameliorates the defects caused by BCAT1 knockdown, indicating that BCAT1 exerts its oncogenic function via BCAA production in BC-CML cells. Importantly, BCAT1 expression is not only activated in human BC-CML and de novo acute myeloid leukemia but also predicts disease outcome in patients. We identified Musashi2 (MSI2), an oncogenic RNA binding protein that is required for BC-CML, as an upstream regulator of BCAT1 expression. MSI2 is physically associated with the BCAT1 transcript and positively regulates its protein expression in leukemia. Taken together, this work reveals that altered BCAA metabolism activated through the MSI2-BCAT1 axis drives cancer progression in myeloid leukemia.
The catalytic activities of eukaryotic protein kinases (EPKs) are regulated by movement of the C-helix, movement of the N and C lobes upon ATP binding, and movement of the activation loop upon phosphorylation. Statistical analysis of the selective constraints associated with AGC kinase functional divergence reveals conserved interactions between these regulatory regions and three regions of the C-terminal tail (C-tail): the N-lobe tether (NLT), the active-site tether (AST), and the C-lobe tether (CLT). The NLT serves as a docking site for an upstream kinase PDK1 and, upon activation, positions the C-helix within the ATP binding pocket. The AST directly interacts with the ATP binding pocket, and the CLT interacts with the interlobe linker and the αC–β4 loop, which appears to serve as a hinge for C-helix movement. The C-tail is a hallmark of AGC functional divergence inasmuch as most of the conserved core residues that distinguish AGC kinases from other EPKs are associated with the NLT, AST, or CLT. Moreover, several AGC catalytic core conserved residues that interact with the C-tail strikingly diverge from the canonical residues observed at corresponding positions in nearly all other EPKs, suggesting that the catalytic core may have coevolved with the C-tail in AGC kinases. These observations, along with the fact that the C-tail is needed for catalytic activity, suggest that the C-tail is a cis-acting regulatory module that can also serve as a regulatory “handle,” to which trans-acting cellular components can bind to modulate activity.
The eukaryotic protein kinase (ePK) domain mediates the majority of signaling and coordination of complex events in eukaryotes. By contrast, most bacterial signaling is thought to occur through structurally unrelated histidine kinases, though some ePK-like kinases (ELKs) and small molecule kinases are known in bacteria. Our analysis of the Global Ocean Sampling (GOS) dataset reveals that ELKs are as prevalent as histidine kinases and may play an equally important role in prokaryotic behavior. By combining GOS and public databases, we show that the ePK is just one subset of a diverse superfamily of enzymes built on a common protein kinase–like (PKL) fold. We explored this huge phylogenetic and functional space to cast light on the ancient evolution of this superfamily, its mechanistic core, and the structural basis for its observed diversity. We cataloged 27,677 ePKs and 18,699 ELKs, and classified them into 20 highly distinct families whose known members suggest regulatory functions. GOS data more than tripled the count of ELK sequences and enabled the discovery of novel families and classification and analysis of all ELKs. Comparison between and within families revealed ten key residues that are highly conserved across families. However, all but one of the ten residues have been eliminated in one family or another, indicating great functional plasticity. We show that loss of a catalytic lysine in two families is compensated by distinct mechanisms both involving other key motifs. This diverse superfamily serves as a model for further structural and functional analysis of enzyme evolution.