Genome mining has become a key technology to exploit natural product diversity. While initially performed on a single-genome basis, the process is now being scaled up to mine entire genera, strain collections and microbiomes. However, no bioinformatic framework is currently available for effectively analyzing datasets of this size and complexity. Here, we provide a streamlined computational workflow consisting of two new software tools: The 'Biosynthetic Gene Similarity Clustering And Prospecting Engine' (BiG-SCAPE) facilitates fast and interactive sequence similarity network analysis of biosynthetic gene clusters and gene cluster families. 'CORe
Fueled by the explosion of (meta)genomic data, genome mining of specialized metabolites has become a major technology for drug discovery and studying microbiome ecology. In these efforts, computational tools like antiSMASH have played a central role through the analysis of Biosynthetic Gene Clusters (BGCs). Thousands of candidate BGCs from microbial genomes have been identified and stored in public databases. Interpreting the function and novelty of these predicted BGCs requires comparison with a well-documented set of BGCs of known function. The MIBiG (Minimum Information about a Biosynthetic Gene Cluster) Data Standard and Repository was established in 2015 to enable curation and storage of known BGCs. Here, we present MIBiG 2.0, which encompasses major updates to the schema, the data, and the online repository itself. Over the past five years, 851 new BGCs have been added. Additionally, we performed extensive manual data curation of all entries to improve the annotation quality of our repository. We also redesigned the data schema to ensure the compliance of future annotations. Finally, we improved the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases. The repository is accessible online at https://mibig.secondarymetabolites.org/.
Natural products from microbes have provided humans with beneficial antibiotics for millennia. However, a decline in the pace of antibiotic discovery exerts pressure on human health as antibiotic resistance spreads, a challenge that may be better faced by unveiling chemical diversity produced by microbes. Current microbial genome mining approaches have revitalized research into antibiotics, but the empirical nature of these methods limits the chemical space that is explored. Here, we address the problem of finding novel pathways by incorporating evolutionary principles into genome mining. We recapitulated the evolutionary history of twenty-three enzyme families previously uninvestigated in the context of natural product biosynthesis in Actinobacteria, the most proficient producers of natural products. Our genome evolutionary analyses were based on the assumption that expanded and repurposed enzyme families from central metabolism occur frequently and thus have the potential to catalyze new conversions in the context of natural products biosynthesis. Our analyses led to the discovery of biosynthetic gene clusters coding for hidden chemical diversity, as validated by comparing our predictions with those from state-of-the-art genome mining tools, as well as by experimentally demonstrating the existence of a biosynthetic pathway for arseno-organic metabolites in Streptomyces coelicolor and Streptomyces lividans using a combined gene knockout and metabolite profiling strategy. As our approach does not rely solely on sequence similarity searches of previously identified biosynthetic enzymes, these results establish the basis for the development of an evolutionary-driven genome mining tool termed EvoMining that complements current platforms. We anticipate that by doing so real ‘chemical dark matter’ will be unveiled.
We review known evolutionary mechanisms underlying the overwhelming chemical diversity of bacterial natural products biosynthesis, focusing on enzyme promiscuity and the evolution of enzymatic domains that enable metabolic traits.
Genome mining has become a key technology to explore and exploit natural product diversity through the identification and analysis of biosynthetic gene clusters (BGCs). Initially, this was performed on a single-genome basis; currently, the process is being scaled up to large-scale mining of pan-genomes of entire genera, complete strain collections and metagenomic datasets from which thousands of bacterial genomes can be extracted at once. However, no bioinformatic framework is currently available for the effective analysis of datasets of this size and complexity. Here, we provide a streamlined computational workflow, tightly integrated with antiSMASH and MIBiG, that consists of two new software tools, BiG-SCAPE and CORASON. BiG-SCAPE facilitates rapid calculation and interactive visual exploration of BGC sequence similarity networks, grouping gene clusters at multiple hierarchical levels, and includes a 'glocal' alignment mode that accurately groups both complete and fragmented BGCs. CORASON employs a phylogenomic approach to elucidate the detailed evolutionary relationships between gene clusters by computing high-resolution multi-locus phylogenies of all BGCs within and across gene cluster families (GCFs), and allows researchers to comprehensively identify all genomic contexts in which particular biosynthetic gene cassettes are found. We validate BiG-SCAPE by correlating its GCF output to metabolomic data across 403 actinobacterial strains. Furthermore, we demonstrate the discovery potential of the platform by using CORASON to comprehensively map the phylogenetic diversity of the large detoxin/rimosamide gene cluster clan, prioritizing three new detoxin families for subsequent characterization of six new analogs using isotopic labeling and analysis of tandem mass spectrometric data.
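The idea of grouping gene clusters at multiple hierarchical levels from a sequence similarity network can be illustrated with a toy sketch. This is not BiG-SCAPE's implementation: the abstract does not specify a similarity metric, so the sketch assumes a plain Jaccard index over each BGC's set of protein-domain labels, joins BGCs whose similarity meets a cutoff, and reports the connected components as families. The names `jaccard`, `group_into_families`, the domain labels and both cutoffs are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two collections of domain labels (assumed metric)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_into_families(bgc_domains, cutoff):
    """Link BGCs with similarity >= cutoff and return connected components.

    bgc_domains maps a BGC name to its set of domain labels.
    The result is a sorted list of sorted member lists.
    """
    # Union-find structure: every BGC starts as its own group.
    parent = {name: name for name in bgc_domains}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # An edge in the similarity network merges the two groups it connects.
    for a, b in combinations(bgc_domains, 2):
        if jaccard(bgc_domains[a], bgc_domains[b]) >= cutoff:
            parent[find(a)] = find(b)

    families = {}
    for name in bgc_domains:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical toy data: four BGCs described by made-up domain labels.
toy = {
    "bgc_A": {"d1", "d2", "d3", "d4"},
    "bgc_B": {"d1", "d2", "d3"},
    "bgc_C": {"d1", "d2", "d5", "d6"},
    "bgc_D": {"d7", "d8"},
}

strict = group_into_families(toy, cutoff=0.7)  # tighter, family-like groups
loose = group_into_families(toy, cutoff=0.3)   # broader, clan-like groups
```

With the strict cutoff only `bgc_A` and `bgc_B` are joined; with the loose cutoff `bgc_C` merges into the same group, which mimics how a single network can yield nested groupings at different thresholds.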
Desferrioxamines are hydroxamate siderophores widely conserved in both aquatic and soil-dwelling Actinobacteria. While the genetic and enzymatic bases of siderophore biosynthesis and their transport in model families of this phylum are well understood, evolutionary studies are lacking. Here, we perform a comprehensive desferrioxamine-centric (des genes) phylogenomic analysis, which includes the genomes of six novel strains isolated from an iron- and phosphorus-depleted oasis in the Chihuahuan desert of Mexico. Our analyses reveal previously unnoticed desferrioxamine evolutionary patterns, involving both biosynthetic and transport genes, likely to be related to desferrioxamine chemical diversity. The identified patterns were used to postulate experimentally testable hypotheses after phenotypic characterization, including profiling of siderophore production and growth stimulation of co-cultures under iron deficiency. Based on our results, we propose a novel des gene, which we term desG, as responsible for incorporation of phenylacetyl moieties during biosynthesis of previously reported arylated desferrioxamines. Moreover, a genomic-based classification of the siderophore-binding proteins responsible for specific and generalist siderophore assimilation is postulated. This report provides a much-needed evolutionary framework, with specific insights supported by experimental data, to direct the future ecological and functional analysis of desferrioxamines in the environment.
Cycads are the only early seed plants that have evolved a specialized root to host endophytic bacteria that fix nitrogen. To provide evolutionary and functional insights into this million-year-old symbiosis, we investigate endophytic bacterial sub-communities isolated from coralloid roots of species from Dioon (Zamiaceae) sampled from their natural habitats. We employed a sub-community co-culture experimental strategy to reveal both predominant and rare bacteria, which were characterized using phylogenomics and detailed metabolic annotation. Diazotrophic plant endophytes, including Bradyrhizobium, Burkholderia, Mesorhizobium, Rhizobium, and Nostoc species, dominated the epiphyte-free sub-communities. Draft genomes of six cyanobacteria species were obtained after shotgun metagenomics of selected sub-communities. These data were used for whole-genome inferences that suggest two Dioon-specific monophyletic groups, and a level of specialization characteristic of co-evolved symbiotic relationships. Furthermore, the genomes of these cyanobacteria were found to encode unique biosynthetic gene clusters, predicted to direct the synthesis of specialized metabolites, mainly involving peptides. After combining genome mining with detection of pigment emissions using multiphoton excitation fluorescence microscopy, we also show that Caulobacter species co-exist with cyanobacteria, and may interact with them by means of a novel indigoidine-like specialized metabolite. We provide an unprecedented view of the composition of the cycad coralloid root, including phylogenetic and functional patterns mediated by specialized metabolites that may be important for the evolution of ancient symbiotic adaptations.
With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.