Knowing the catalytic turnover numbers of enzymes is essential for understanding the growth rate, proteome composition, and physiology of organisms, but experimental data on enzyme turnover numbers is sparse and noisy. Here, we demonstrate that machine learning can successfully predict catalytic turnover numbers in Escherichia coli based on integrated data on enzyme biochemistry, protein structure, and network context. We identify a diverse set of features that are consistently predictive for both in vivo and in vitro enzyme turnover rates, revealing novel protein structural correlates of catalytic turnover. We use our predictions to parameterize two mechanistic genome-scale modelling frameworks for proteome-limited metabolism, leading to significantly higher accuracy in the prediction of quantitative proteome data than previous approaches. The presented machine learning models thus provide a valuable tool for understanding metabolism and the proteome at the genome scale, and elucidate structural, biochemical, and network properties that underlie enzyme kinetics.
Genome-scale models of metabolism and macromolecular expression (ME-models) explicitly compute the optimal proteome composition of a growing cell. ME-models expand upon the well-established genome-scale models of metabolism (M-models), and they enable a new fundamental understanding of cellular growth. ME-models have increased predictive capabilities and accuracy due to their inclusion of the biosynthetic costs for the machinery of life, but they come with a significant increase in model size and complexity. This challenge results in models which are both difficult to compute and challenging to understand conceptually. As a result, ME-models exist for only two organisms (Escherichia coli and Thermotoga maritima) and are still used by relatively few researchers. To address these challenges, we have developed a new software framework called COBRAme for building and simulating ME-models. It is coded in Python and built on COBRApy, a popular platform for using M-models. COBRAme streamlines computation and analysis of ME-models. It provides tools to simplify constructing and editing ME-models to enable ME-model reconstructions for new organisms. We used COBRAme to reconstruct a condensed E. coli ME-model called iJL1678b-ME. This reformulated model gives functionally identical solutions to previous E. coli ME-models while using 1/6 the number of free variables and solving in less than 10 minutes, a marked improvement over the 6 hour solve time of previous ME-model formulations. Errors in previous ME-models were also corrected leading to 52 additional genes that must be expressed in iJL1678b-ME to grow aerobically in glucose minimal in silico media. This manuscript outlines the architecture of COBRAme and demonstrates how ME-models can be created, modified, and shared most efficiently using the new software framework.
Escherichia coli is considered to be the best-known microorganism given the large number of published studies detailing its genes, genome, and biochemical functions of its molecular components. This vast literature has been systematically assembled into a reconstruction of the biochemical reaction networks that underlie E. coli's functions; a process which is now being applied to an increasing number of microorganisms. Genome-scale reconstructed networks represent organized and systematized knowledge-bases that have multiple uses, including conversion into computational models that interpret and predict phenotypic states and the consequences of environmental and genetic perturbations. These genome-scale models (GEMs) now enable us to develop pan-genome analyses that provide mechanistic insights, detail the selection pressures on proteome allocation, and address stress phenotypes. In this Review, we first discuss the overall development of GEMs and their applications. Next, we review the evolution of the most complete GEM that has been developed to date: the E. coli GEM. Finally, we explore three emerging areas in genome-scale modeling of microbial phenotypes: collections of strain-specific models, metabolic and macromolecular expression models, and simulation of stress responses.
Transcriptional regulatory networks (TRNs) have been studied intensely for >25 y. Yet, even for the Escherichia coli TRN, probably the best characterized TRN, several questions remain. Here, we address three questions: (i) How complete is our knowledge of the E. coli TRN; (ii) how well can we predict gene expression using this TRN; and (iii) how robust is our understanding of the TRN? First, we reconstructed a high-confidence TRN (hiTRN) consisting of 147 transcription factors (TFs) regulating 1,538 transcription units (TUs) encoding 1,764 genes. The 3,797 high-confidence regulatory interactions were collected from published, validated chromatin immunoprecipitation (ChIP) data and RegulonDB. For 21 different TF knockouts, up to 63% of the differentially expressed genes in the hiTRN were traced to the knocked-out TF through regulatory cascades. Second, we trained supervised machine learning algorithms to predict the expression of 1,364 TUs given TF activities using 441 samples. The algorithms accurately predicted condition-specific expression for 86% (1,174 of 1,364) of the TUs, while 193 TUs (14%) were predicted better than random TRNs. Third, we identified 10 regulatory modules whose definitions were robust against changes to the TRN or expression compendium. Using surrogate variable analysis, we also identified three unmodeled factors that systematically influenced gene expression. Our computational workflow comprehensively characterizes the predictive capabilities and systems-level functions of an organism's TRN from disparate data types.
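The idea of tracing differentially expressed genes back to a knocked-out TF through regulatory cascades can be illustrated as graph reachability. The following is only a minimal sketch under stated assumptions, not the authors' workflow: the network, the gene names, and the helper functions `downstream_genes` and `fraction_traced` are all hypothetical.

```python
from collections import deque


def downstream_genes(trn, tf):
    """Return every node reachable from `tf` by following regulatory edges.

    `trn` maps each regulator to the set of its direct targets
    (a hypothetical toy representation, not the hiTRN data format).
    """
    seen = set()
    queue = deque([tf])
    while queue:
        regulator = queue.popleft()
        for target in trn.get(regulator, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def fraction_traced(trn, tf, de_genes):
    """Fraction of differentially expressed genes reachable from `tf`."""
    de_genes = set(de_genes)
    if not de_genes:
        return 0.0
    return len(de_genes & downstream_genes(trn, tf)) / len(de_genes)


# Hypothetical toy network: tfA regulates tfB, which regulates further genes.
toy_trn = {
    "tfA": {"tfB", "gene1"},
    "tfB": {"gene2", "gene3"},
    "tfC": {"gene4"},
}
# Hypothetical genes differentially expressed after knocking out tfA.
toy_de = {"gene1", "gene2", "gene4", "gene5"}

print(fraction_traced(toy_trn, "tfA", toy_de))
```

In this toy case, gene1 is a direct target of tfA and gene2 is reached through the cascade via tfB, while gene4 and gene5 cannot be traced to the knockout.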
Genome-scale metabolic models (GEMs) are mathematically structured knowledge bases of metabolism that provide phenotypic predictions from genomic information. GEM-guided predictions of growth phenotypes rely on the accurate definition of a biomass objective function (BOF) that is designed to include key cellular biomass components such as the major macromolecules (DNA, RNA, proteins), lipids, coenzymes, inorganic ions and species-specific components. Despite its importance, no standardized computational platform is currently available to generate species-specific biomass objective functions in a data-driven, unbiased fashion. To fill this gap in the metabolic modeling software ecosystem, we implemented BOFdat, a Python package for the definition of a Biomass Objective Function from experimental data. BOFdat has a modular implementation that divides the BOF definition process into three independent modules defined here as steps: 1) the coefficients for major macromolecules are calculated, 2) coenzymes and inorganic ions are identified and their stoichiometric coefficients estimated, 3) the remaining species-specific metabolic biomass precursors are algorithmically extracted in an unbiased way from experimental data. We used BOFdat to reconstruct the BOF of the Escherichia coli model iML1515, a gold standard in the field. The BOF generated by BOFdat resulted in the most concordant biomass composition, growth rate, and gene essentiality prediction accuracy when compared to other methods. Installation instructions for BOFdat are available in the documentation and the source code is available on GitHub (https://github.com/jclachance/BOFdat).
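The first step, calculating coefficients for major macromolecules, can be illustrated with a generic unit conversion from a weight fraction to mmol per gram dry weight. This is only a sketch of that arithmetic under stated assumptions; it does not use or reproduce the BOFdat API, and the function name `macromolecule_coefficients`, the building blocks, and all numbers are hypothetical.

```python
def macromolecule_coefficients(mole_fractions, molar_masses, weight_fraction):
    """Convert a macromolecule weight fraction into mmol/gDW coefficients.

    mole_fractions:  building block -> mole fraction in the polymer (sums to 1)
    molar_masses:    building block -> g/mol of the incorporated residue
    weight_fraction: g of macromolecule per g of cell dry weight
    Coefficients are returned as negative numbers because the building
    blocks are consumed by the biomass reaction.
    """
    # Average molar mass of one building block in the polymer (g/mol).
    mean_mass = sum(mole_fractions[b] * molar_masses[b] for b in mole_fractions)
    # Total mmol of building blocks contained in 1 g of dry weight.
    total_mmol = 1000.0 * weight_fraction / mean_mass
    return {b: -x * total_mmol for b, x in mole_fractions.items()}


# Hypothetical two-component polymer making up half of the dry weight.
toy_fractions = {"X": 0.5, "Y": 0.5}
toy_masses = {"X": 100.0, "Y": 300.0}
toy_coefficients = macromolecule_coefficients(toy_fractions, toy_masses, 0.5)
print(toy_coefficients)
```

A quick sanity check on such a conversion is mass balance: multiplying each coefficient's magnitude by its molar mass and summing should recover the input weight fraction.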
Adaptive laboratory evolution (ALE) experiments are often designed to maintain a static culturing environment to minimize confounding variables that could influence the adaptive process, but dynamic nutrient conditions occur frequently in natural and bioprocessing settings. To study the nature of carbon substrate fitness tradeoffs, we evolved batch cultures of Escherichia coli via serial propagation into tubes alternating between glucose and either xylose, glycerol, or acetate. Genome sequencing of evolved cultures revealed several genetic changes preferentially selected for under dynamic conditions and different adaptation strategies depending on the substrates being switched between; in some environments, a persistent "generalist" strain developed, while in another, two "specialist" subpopulations arose that alternated dominance. Diauxic lag phenotype varied across the generalists and specialists, in one case being completely abolished, while gene expression data distinguished the transcriptional strategies implemented by strains in pursuit of growth optimality. Genome-scale metabolic modeling techniques were then used to help explain the inherent substrate differences giving rise to the observed distinct adaptive strategies. This study gives insight into the population dynamics of adaptation in an alternating environment and into the underlying metabolic and genetic mechanisms. Furthermore, ALE-generated optimized strains have phenotypes with potential industrial bioprocessing applications.
IMPORTANCE Evolution and natural selection inexorably lead to an organism's improved fitness in a given environment, whether in a laboratory or natural setting. However, despite the frequent natural occurrence of complex and dynamic growth environments, laboratory evolution experiments typically maintain simple, static culturing environments so as to reduce selection pressure complexity. In this study, we investigated the adaptive strategies underlying evolution to fluctuating environments by evolving Escherichia coli to conditions of frequently switching growth substrate. Characterization of evolved strains via a number of different data types revealed the various genetic and phenotypic changes implemented in pursuit of growth optimality and how these differed across the different growth substrates and switching protocols. This work not only helps to establish general principles of adaptation to complex environments but also suggests strategies for experimental design to achieve desired evolutionary outcomes.
KEYWORDS adaptive laboratory evolution, Escherichia coli, adaptive mutations, phenotypic variation
In heterotrophs such as Escherichia coli, catabolism of carbon substrates is the driving force behind the energy generation and chemical synthesis necessary for homeostasis and anabolism (1). Although glucose is the most readily metabolized carbohydrate