Background: Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes.

Results: We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions. ITEP targets groups of related microbial species, whose combined core and variable genes constitute their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes. ITEP's capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution.

Conclusions: ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts.
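The distinction between core and variable genes mentioned in the abstract can be illustrated with a minimal sketch. It does not use ITEP's scripts or libraries. The function name, genome labels, and family identifiers are all hypothetical, and the input is assumed to be a table of protein family presence per genome that has already been computed.

```python
def classify_families(presence, genomes):
    """Split protein families into core and variable sets.

    presence: dict mapping family id -> set of genomes containing a member.
    genomes:  iterable of the genome ids under comparison.
    A family is "core" if it occurs in every genome and "variable" if it
    occurs in at least one genome but not all of them.
    """
    all_genomes = set(genomes)
    core, variable = [], []
    for family, members in sorted(presence.items()):
        found = set(members) & all_genomes
        if not found:
            continue  # family absent from the genomes being compared
        if found == all_genomes:
            core.append(family)
        else:
            variable.append(family)
    return core, variable


# Hypothetical presence/absence data for three genomes.
genomes = ["genome_A", "genome_B", "genome_C"]
presence = {
    "fam_1": {"genome_A", "genome_B", "genome_C"},
    "fam_2": {"genome_A", "genome_C"},
    "fam_3": {"genome_B"},
}

core, variable = classify_families(presence, genomes)
print("core:", core)
print("variable:", variable)
```

In this toy input, only the family present in all three genomes is reported as core; the union of core and variable families corresponds to the pan-genome of the group.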
Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. 
This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface.
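A minimal sketch of how gene-level likelihoods might be propagated to reactions and used to rank gap-filling candidates is given here. It rests on assumptions that the abstract does not confirm: a complex is scored by its weakest subunit (minimum), alternative complexes are combined by taking the maximum, and adding a reaction costs one minus its likelihood. Real gap filling solves an optimization over the whole network; this sketch only compares two pre-enumerated candidate solutions, and all gene and reaction identifiers are hypothetical.

```python
def reaction_likelihood(complexes, gene_likelihood):
    """Estimate a reaction likelihood from gene likelihoods.

    complexes: list of lists of gene ids; each inner list is one enzyme
    complex, and the outer list holds alternative complexes (isozymes).
    Assumed rule: min over subunits of a complex, max over alternatives.
    Genes without an entry are treated as likelihood 0.0.
    """
    best = 0.0
    for subunits in complexes:
        if not subunits:
            continue
        complex_score = min(gene_likelihood.get(g, 0.0) for g in subunits)
        best = max(best, complex_score)
    return best


def gapfill_cost(reactions, rxn_likelihood):
    """Assumed cost of a candidate solution: sum of (1 - likelihood)."""
    return sum(1.0 - rxn_likelihood[r] for r in reactions)


# Hypothetical gene likelihoods and gene-reaction associations.
gene_likelihood = {"g1": 0.9, "g2": 0.4, "g3": 0.7}
gpr = {
    "rxn1": [["g4"]],                # no genomic evidence for g4
    "rxn2": [["g1", "g2"], ["g3"]],  # a two-subunit complex or an isozyme
    "rxn3": [["g1"]],
}
rxn_likelihood = {r: reaction_likelihood(c, gene_likelihood) for r, c in gpr.items()}

# Two hypothetical ways of filling the same gap.
candidates = {"solution_A": ["rxn1"], "solution_B": ["rxn2", "rxn3"]}

parsimony_choice = min(candidates, key=lambda s: len(candidates[s]))
likelihood_choice = min(
    candidates, key=lambda s: gapfill_cost(candidates[s], rxn_likelihood)
)
print("parsimony picks:", parsimony_choice)
print("likelihood picks:", likelihood_choice)
```

With these made-up numbers, the parsimony criterion prefers the single reaction with no genomic support, while the likelihood-weighted cost prefers the two-reaction solution backed by sequence evidence, which is the kind of difference the abstract describes.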
Methanosarcina acetivorans strain C2A is a marine methanogenic archaeon notable for its substrate utilization, genetic tractability, and novel energy conservation mechanisms. To help probe the phenotypic implications of this organism's unique metabolism, we have constructed and manually curated a genome-scale metabolic model of M. acetivorans, iMB745, which accounts for 745 of the 4,540 predicted protein-coding genes (16%) in the M. acetivorans genome. The reconstruction effort has identified key knowledge gaps and differences in peripheral and central metabolism between methanogenic species. Using flux balance analysis, the model quantitatively predicts wild-type phenotypes and is 96% accurate in knockout lethality predictions compared to currently available experimental data. The model was used to probe the mechanisms and energetics of by-product formation and growth on carbon monoxide, as well as the nature of the reaction catalyzed by the soluble heterodisulfide reductase HdrABC in M. acetivorans. The genome-scale model provides quantitative and qualitative hypotheses that can be used to help iteratively guide additional experiments to further the state of knowledge about methanogenesis.

Methanogenic archaea are unique in their ability to grow on low-energy substrates, such as acetic acid, by converting them into methane and other by-products. Methanogens are a critical part of the global carbon cycle, consuming by-products of other natural bioprocesses that would otherwise be recalcitrant in sulfate-poor, anaerobic environments (12). They also play an important role in global warming, since methane is a greenhouse gas 20 times as potent as carbon dioxide (42) and methanogenesis is the primary mechanism for the emission of methane into the atmosphere (2).

Methanosarcina is the only known genus of methanogens with members that can utilize all of the known methanogenic pathways (acetoclastic, methylotrophic, hydrogenotrophic, and methyl reducing) (71).
This metabolic diversity makes these species more permissive to metabolic and genetic manipulations than other methanogens. To capitalize on this characteristic, the genomes of three Methanosarcina species have been sequenced (15, 22, 38). In addition, genetic tools have been developed for several of these species, including methods for directed mutagenesis and regulated expression of specific genes (3, 34, 73, 74).

The constraint-based reconstruction and analysis (COBRA) strategy is a powerful paradigm for consolidating large amounts of metabolic knowledge and synthesizing that knowledge into quantitative phenotypic predictions (45, 51). For the performance of constraint-based analysis on an individual organism, its metabolic network is reconstructed from the bottom up, beginning with a sequenced and annotated genome and ending with a network of reactions and reaction-gene associations that directly link genotype and phenotype (68). Many metabolic reconstructions have been curated by hand and have been used to make useful predictions, such as the identification of put...
Methanosarcina barkeri is an archaeon that produces methane anaerobically as the primary byproduct of its metabolism. M. barkeri can utilize several substrates for ATP and biomass production, including methanol, acetate, methylamines, and a combination of hydrogen and carbon dioxide. In 2006, a metabolic reconstruction of M. barkeri, iAF692, was generated based on a draft genome annotation. The iAF692 reconstruction enabled the first genome-scale simulations for Archaea. Since the publication of the first metabolic reconstruction of M. barkeri, additional genomic, biochemical, and phenotypic data have clarified several metabolic pathways. We have used these newly available data to improve the M. barkeri metabolic reconstruction. Simulations using the updated model, iMG746, have led to increased accuracy in predicting gene knockout phenotypes and in simulating batch growth behavior. We used the model to examine knockout lethality data and make predictions about metabolic regulation under different growth conditions. Thus, the updated reconstruction of M. barkeri metabolism is a useful tool for predicting cellular behavior, studying the methanogenic lifestyle, guiding experimental studies, and making predictions relevant to metabolic engineering applications.
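Accuracy in predicting gene knockout phenotypes, as reported for these models, can be understood as the fraction of tested genes for which the predicted lethality matches the experimental observation. The following is a minimal sketch of that calculation only; it does not run a metabolic model, and the gene names and lethality calls are invented for illustration.

```python
def knockout_accuracy(predicted, observed):
    """Fraction of genes whose predicted lethality matches the observation.

    predicted, observed: dicts mapping gene id -> True if the knockout is
    lethal, False if the mutant grows. Only genes present in both dicts
    are scored.
    """
    shared = sorted(set(predicted) & set(observed))
    if not shared:
        raise ValueError("no genes in common between predictions and data")
    correct = sum(predicted[g] == observed[g] for g in shared)
    return correct / len(shared)


# Hypothetical predictions and experimental calls for four knockouts.
predicted = {"gene_1": True, "gene_2": False, "gene_3": False, "gene_4": True}
observed = {"gene_1": True, "gene_2": False, "gene_3": True, "gene_4": True}

accuracy = knockout_accuracy(predicted, observed)
print(f"knockout prediction accuracy: {accuracy:.0%}")
```

In practice the predicted calls would come from simulating each deletion with the model, and a mismatch such as `gene_3` above would point to a reaction, gene association, or growth condition worth revisiting.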