Annotation of metabolites remains a major challenge in liquid chromatography-mass spectrometry (LC-MS) based untargeted metabolomics. The current gold standard for metabolite identification is to match the detected feature with an authentic standard analyzed on the same equipment and using the same method as the experimental samples. However, there are substantial practical challenges in applying this approach to large data sets. One widely used annotation approach is to search spectral libraries in reference databases for matching metabolites; however, this approach is limited by the incomplete coverage of these libraries. An alternative computational approach is to match the detected features to candidate chemical structures based on their mass and predicted fragmentation pattern. Unfortunately, both of these approaches can match multiple identities with a single feature. Another issue is that annotations from different tools often disagree. This paper presents a novel LC-MS data annotation method, termed Biologically Consistent Annotation (BioCAn), that combines the results from database searches and in silico fragmentation analyses and places these results into a relevant biological context for the sample as captured by a metabolic model. We demonstrate the utility of this approach through an analysis of CHO cell samples. The performance of BioCAn is evaluated against several currently available annotation tools, and the accuracy of BioCAn annotations is verified using high-purity analytical standards.
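The ambiguity noted above, where one feature matches several identities, can be illustrated with a minimal sketch of mass-based candidate matching. This is not BioCAn's implementation: the function names, the 10 ppm tolerance, the [M+H]+ adduct assumption, and the three-compound candidate table are all illustrative choices.

```python
PROTON_MASS = 1.007276  # Da; added to a neutral mass to model an [M+H]+ ion

# Illustrative candidate table: monoisotopic neutral masses in Da.
# Leucine and isoleucine are isomers (C6H13NO2), so they share one mass.
CANDIDATES = {
    "leucine": 131.094629,
    "isoleucine": 131.094629,
    "creatine": 131.069477,
}


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_feature(observed_mz, candidates, tol_ppm=10.0):
    """Return every candidate whose [M+H]+ m/z lies within tol_ppm of the feature."""
    hits = []
    for name, neutral_mass in candidates.items():
        theoretical_mz = neutral_mass + PROTON_MASS
        if abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm:
            hits.append(name)
    return sorted(hits)


# A single detected feature can return more than one identity.
hits = match_feature(132.1019, CANDIDATES)
print(hits)
```

Because isomers cannot be separated by mass alone, a mass search of this kind leaves several candidates per feature, which is the gap that fragmentation evidence and biological context are meant to narrow.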
Background Metabolic models are indispensable in guiding cellular engineering and in advancing our understanding of systems biology. As not all enzymatic activities are fully known and/or annotated, metabolic models remain incomplete, resulting in suboptimal computational analysis and leading to unexpected experimental results. We posit that one major source of unaccounted metabolism is promiscuous enzymatic activity. It is now well-accepted that most, if not all, enzymes are promiscuous; that is, they transform substrates other than their primary substrate. However, there have been no systematic analyses of genome-scale metabolic models to predict putative reactions and/or metabolites that arise from enzyme promiscuity. Results Our workflow utilizes PROXIMAL, a tool that uses reactant–product transformation patterns from the KEGG database, to predict putative structural modifications due to promiscuous enzymes. Using iML1515 as a model system, we first utilized a computational workflow, referred to as Extended Metabolite Model Annotation (EMMA), to predict promiscuous reactions catalyzed, and metabolites produced, by natively encoded enzymes in Escherichia coli. We predict hundreds of new metabolites that can be used to augment iML1515. We then validated our method by comparing predicted metabolites with the Escherichia coli Metabolome Database (ECMDB). Conclusions We utilized EMMA to augment the iML1515 metabolic model to more fully reflect cellular metabolic activity. This workflow uses enzyme promiscuity as a basis to predict hundreds of reactions and metabolites that may exist in E. coli but may not have been documented in iML1515 or other databases. We provide detailed analysis of 23 predicted reactions and 16 associated metabolites. Interestingly, nine of these metabolites, which are in ECMDB, have not previously been documented in any other E. coli databases. Four of the predicted reactions provide putative transformations parallel to those already in iML1515.
We suggest adding predicted metabolites and reactions to iML1515 to create an extended metabolic model (EMM) for E. coli. Electronic supplementary material The online version of this article (10.1186/s12934-019-1156-3) contains supplementary material, which is available to authorized users.
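The validation step described in this abstract, comparing predicted metabolites against the existing model and a reference database, can be sketched with plain set operations. This is only an illustration under stated assumptions: the function name `triage_predictions` and the placeholder identifiers (`met_A` and so on) are hypothetical and do not come from EMMA, iML1515, or ECMDB.

```python
def triage_predictions(predicted, model_metabolites, reference_db):
    """Split predicted metabolite IDs by whether the model or a reference DB lists them."""
    predicted = set(predicted)
    model_metabolites = set(model_metabolites)
    reference_db = set(reference_db)

    novel = predicted - model_metabolites  # not yet in the metabolic model
    return {
        "already_in_model": sorted(predicted & model_metabolites),
        "novel_supported": sorted(novel & reference_db),    # new to the model, seen in the DB
        "novel_unsupported": sorted(novel - reference_db),  # new to both
    }


# Placeholder identifiers, purely illustrative.
result = triage_predictions(
    predicted=["met_A", "met_B", "met_C", "met_D"],
    model_metabolites=["met_A", "met_X"],
    reference_db=["met_B", "met_C", "met_Y"],
)
print(result)
```

In this framing, the "novel_supported" group corresponds to predictions that are absent from the model but independently documented, which is the kind of evidence the abstract uses to argue for extending the model.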
Background Increasing understanding of metabolic and regulatory networks underlying microbial physiology has enabled creation of progressively more complex synthetic biological systems for biochemical, biomedical, agricultural, and environmental applications. However, despite best efforts, confounding phenotypes still emerge from unforeseen interplay between biological parts, and the design of robust and modular biological systems remains elusive. Such interactions are difficult to predict when designing synthetic systems and may manifest during experimental testing as inefficiencies that need to be overcome. Despite advances in tools and methodologies for strain engineering, there remains a lack of tools that can systematically identify incompatibilities between the native metabolism of the host and its engineered modifications. Results Transforming organisms such as Escherichia coli into microbial factories is achieved via a number of engineering strategies, used individually or in combination, with the goal of maximizing the production of chosen target compounds. One technique relies on suppressing or overexpressing selected genes; another involves introducing heterologous enzymes into a microbial host. These modifications steer mass flux towards the set of desired metabolites but may create unexpected interactions. In this work, we develop a computational method, termed Metabolic Disruption Workflow (MDFlow), for discovering interactions and network disruptions arising from enzyme promiscuity – the ability of enzymes to act on a wide range of molecules that are structurally similar to their native substrates. We apply MDFlow to two experimentally verified cases where strains with essential genes knocked out are rescued by interactions resulting from overexpression of one or more other genes. 
We then apply MDFlow to predict and evaluate a number of putative promiscuous reactions that can interfere with two heterologous pathways designed for 3-hydroxypropionic acid (3-HP) production. Conclusions Using MDFlow, we can identify putative enzyme promiscuity and the subsequent formation of unintended and undesirable byproducts that are not only disruptive to the host metabolism but also to the intended end-objective of high biosynthetic productivity and yield. In addition, we show how enzyme promiscuity can potentially be responsible for the adaptability of cells to the disruption of essential pathways in terms of biomass growth.
Increasing understanding of metabolic and regulatory networks underlying microbial physiology has enabled creation of progressively more complex synthetic biological systems for biochemical, biomedical, agricultural, and environmental applications. However, despite best efforts, confounding phenotypes still emerge from unforeseen interplay between biological parts, and the design of robust and modular biological systems remains elusive. Such interactions are difficult to predict when designing synthetic systems and may manifest during experimental testing as inefficiencies that need to be overcome. Transforming organisms such as Escherichia coli into microbial factories is achieved via several engineering strategies, used individually or in combination, with the goal of maximizing the production of chosen target compounds. One technique relies on suppressing or overexpressing selected genes; another involves introducing heterologous enzymes into a microbial host. These modifications steer mass flux towards the set of desired metabolites but may create unexpected interactions. In this work, we develop a computational method, termed Metabolic Disruption Workflow (MDFlow), for discovering interactions and network disruptions arising from enzyme promiscuity – the ability of enzymes to act on a wide range of molecules that are structurally similar to their native substrates. We apply MDFlow to two experimentally verified cases where strains with essential genes knocked out are rescued by interactions resulting from overexpression of one or more other genes. We demonstrate how enzyme promiscuity may aid cells in adapting to disruptions of essential metabolic functions. We then apply MDFlow to predict and evaluate a number of putative promiscuous reactions that can interfere with two heterologous pathways designed for 3-hydroxypropionic acid (3-HP) production. 
Using MDFlow, we can identify putative enzyme promiscuity and the subsequent formation of unintended and undesirable byproducts that are not only disruptive to the host metabolism but also to the intended end-objective of high biosynthetic productivity and yield. As we demonstrate, MDFlow provides an innovative workflow to systematically identify incompatibilities between the native metabolism of the host and its engineered modifications due to enzyme promiscuity.
Motivation Although Site-of-Metabolism (SOM) prediction has traditionally been used to identify site-specific metabolic activity within a compound in order to alter its interaction with a metabolizing enzyme, it is also essential in analyzing the promiscuity of enzymes on substrates. The successful prediction of SOMs and the relevant promiscuous products has a wide range of applications, including the creation of extended metabolic models that account for enzyme promiscuity and the construction of novel heterologous synthesis pathways. There is therefore a need to develop generalized methods that can predict molecular SOMs for a wide range of metabolizing enzymes. Results This paper develops a Graph Neural Network (GNN) model that classifies whether an atom (or a bond) is an SOM. Our model, GNN-SOM, is trained on enzymatic interactions, available in the KEGG database, that span all enzyme commission numbers. We demonstrate that GNN-SOM consistently outperforms baseline Machine Learning (ML) models, whether trained on all enzymes, on Cytochrome P450 (CYP) enzymes, or on non-CYP enzymes. We showcase the utility of GNN-SOM in prioritizing predicted enzymatic products due to enzyme promiscuity for two biological applications: the construction of Extended Metabolic Models (EMMs) and the construction of synthesis pathways. Availability A Python implementation of the trained SOM predictor model can be found at https://github.com/HassounLab/GNN-SOM Supplementary information Not applicable
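To make the idea of atom-level classification on a molecular graph concrete, the sketch below runs one generic message-passing round followed by a per-atom sigmoid score. It is not the GNN-SOM architecture: the toy graph, the two-element atom features, the sum aggregation, and the weights are arbitrary, untrained, illustrative assumptions.

```python
import math

# Toy graph: the three heavy atoms of ethanol, C0-C1-O2 (hydrogens omitted).
# Atom features are an illustrative one-hot encoding: [is_carbon, is_oxygen].
ATOM_FEATURES = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
BONDS = [(0, 1), (1, 2)]


def build_adjacency(n_atoms, bonds):
    """Undirected neighbor lists built from a list of bonded atom index pairs."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


def message_pass(features, adjacency):
    """One round: each atom's new vector is its own features plus the sum of its neighbors'."""
    updated = []
    for i, own in enumerate(features):
        total = list(own)
        for j in adjacency[i]:
            total = [t + x for t, x in zip(total, features[j])]
        updated.append(total)
    return updated


def atom_scores(hidden, weights, bias):
    """Per-atom sigmoid score in (0, 1); the weights here are arbitrary, not trained."""
    scores = []
    for h in hidden:
        z = sum(w * x for w, x in zip(weights, h)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


adjacency = build_adjacency(len(ATOM_FEATURES), BONDS)
hidden = message_pass(ATOM_FEATURES, adjacency)
scores = atom_scores(hidden, weights=[0.5, -0.25], bias=-1.0)
print(scores)
```

After one round, the two carbons no longer share the same vector because only one of them is bonded to oxygen. That neighborhood sensitivity is what lets a graph model score chemically distinct atoms of the same element differently.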