Genome mining has become a key technology to exploit natural product diversity. While initially performed on a single-genome basis, the process is now being scaled up to mine entire genera, strain collections and microbiomes. However, no bioinformatic framework is currently available for effectively analyzing datasets of this size and complexity. Here, we provide a streamlined computational workflow consisting of two new software tools: The 'Biosynthetic Gene Similarity Clustering And Prospecting Engine' (BiG-SCAPE) facilitates fast and interactive sequence similarity network analysis of biosynthetic gene clusters and gene cluster families. 'CORe […]
Fueled by the explosion of (meta)genomic data, genome mining of specialized metabolites has become a major technology for drug discovery and studying microbiome ecology. In these efforts, computational tools like antiSMASH have played a central role through the analysis of Biosynthetic Gene Clusters (BGCs). Thousands of candidate BGCs from microbial genomes have been identified and stored in public databases. Interpreting the function and novelty of these predicted BGCs requires comparison with a well-documented set of BGCs of known function. The MIBiG (Minimum Information about a Biosynthetic Gene Cluster) Data Standard and Repository was established in 2015 to enable curation and storage of known BGCs. Here, we present MIBiG 2.0, which encompasses major updates to the schema, the data, and the online repository itself. Over the past five years, 851 new BGCs have been added. Additionally, we performed extensive manual data curation of all entries to improve the annotation quality of our repository. We also redesigned the data schema to ensure the compliance of future annotations. Finally, we improved the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases. The repository is accessible online at https://mibig.secondarymetabolites.org/.
Genome mining has become a key technology to explore and exploit natural product diversity through the identification and analysis of biosynthetic gene clusters (BGCs). Initially, this was performed on a single-genome basis; currently, the process is being scaled up to large-scale mining of pan-genomes of entire genera, complete strain collections and metagenomic datasets from which thousands of bacterial genomes can be extracted at once. However, no bioinformatic framework is currently available for the effective analysis of datasets of this size and complexity. Here, we provide a streamlined computational workflow, tightly integrated with antiSMASH and MIBiG, that consists of two new software tools, BiG-SCAPE and CORASON. BiG-SCAPE facilitates rapid calculation and interactive visual exploration of BGC sequence similarity networks, grouping gene clusters at multiple hierarchical levels, and includes a 'glocal' alignment mode that accurately groups both complete and fragmented BGCs. CORASON employs a phylogenomic approach to elucidate the detailed evolutionary relationships between gene clusters by computing high-resolution multi-locus phylogenies of all BGCs within and across gene cluster families (GCFs), and allows researchers to comprehensively identify all genomic contexts in which particular biosynthetic gene cassettes are found. We validate BiG-SCAPE by correlating its GCF output to metabolomic data across 403 actinobacterial strains. Furthermore, we demonstrate the discovery potential of the platform by using CORASON to comprehensively map the phylogenetic diversity of the large detoxin/rimosamide gene cluster clan, prioritizing three new detoxin families for subsequent characterization of six new analogs using isotopic labeling and analysis of tandem mass spectrometric data.
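The idea of grouping BGCs into gene cluster families through a similarity network can be illustrated with a small sketch. It is not the BiG-SCAPE algorithm: it assumes, purely for illustration, that each BGC is reduced to a set of protein domain identifiers, that similarity is the plain Jaccard index of those sets, and that families are the connected components left after applying a cutoff. The function names, the cutoff value and the toy data are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_into_families(bgc_domains, cutoff=0.5):
    """Group BGCs into families as connected components of a similarity network.

    bgc_domains maps a BGC name to its set of domain identifiers.
    Two BGCs are linked when their Jaccard index is >= cutoff
    (an assumed, simplified metric, not the one used by BiG-SCAPE).
    """
    parent = {name: name for name in bgc_domains}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(bgc_domains, 2):
        if jaccard(bgc_domains[a], bgc_domains[b]) >= cutoff:
            parent[find(a)] = find(b)

    families = {}
    for name in bgc_domains:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Toy input: three hypothetical BGCs described by placeholder domain IDs.
toy = {
    "bgc_A": {"PF00109", "PF02801", "PF00698"},
    "bgc_B": {"PF00109", "PF02801", "PF00550"},
    "bgc_C": {"PF00501", "PF00668"},
}
families = group_into_families(toy, cutoff=0.5)
print(families)
```

With this toy input, bgc_A and bgc_B share two of four distinct domains and therefore fall into one family at a cutoff of 0.5, while bgc_C remains a singleton. Raising the cutoff splits families more finely, which loosely mirrors the multiple hierarchical levels mentioned in the abstract.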
Cinnamic acid is an aromatic compound commonly found in plants and functions as a central intermediate in lignin synthesis. Filamentous fungi are able to degrade cinnamic acid through multiple metabolic pathways. One of the best studied pathways is the non-oxidative decarboxylation of cinnamic acid to styrene. In Aspergillus niger, the enzymes cinnamic acid decarboxylase (CdcA, formerly ferulic acid decarboxylase) and the flavin prenyltransferase (PadA) together catalyze the non-oxidative decarboxylation of cinnamic acid and sorbic acid. The corresponding genes, cdcA and padA, are clustered in the genome together with a putative transcription factor previously named sorbic acid decarboxylase regulator (SdrA). While SdrA was predicted to be involved in the regulation of the non-oxidative decarboxylation of cinnamic acid and sorbic acid, this was never functionally analyzed. In this study, A. niger deletion mutants of sdrA, cdcA, and padA were made to further investigate the role of SdrA in cinnamic acid metabolism. Phenotypic analysis revealed that cdcA, sdrA and padA are exclusively involved in the degradation of cinnamic acid and sorbic acid and are not required for other related aromatic compounds. Whole genome transcriptome analysis of ΔsdrA grown on different cinnamic acid-related compounds revealed additional target genes, which were also clustered with cdcA, sdrA, and padA in the A. niger genome. Synteny analysis using 30 Aspergillus genomes demonstrated a conserved cinnamic acid decarboxylation gene cluster in most Aspergilli of the Nigri clade. Aspergilli lacking certain genes in the cluster were unable to grow on cinnamic acid, but could still grow on related aromatic compounds, confirming the specific role of these three genes for cinnamic acid metabolism of A. niger.
With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.
Type III polyketide synthases (PKSs) produce secondary metabolites with diverse biological activities, including antimicrobials. While they have been extensively studied in plants and bacteria, only a handful of type III PKSs from fungi have been characterized in the last 15 years. The exploitation of fungal type III PKSs to produce novel bioactive compounds requires understanding the diversity of these enzymes, as well as of their biosynthetic pathways. Here, phylogenetic and reconciliation analyses of 522 type III PKSs from 1,193 fungal genomes revealed complex evolutionary histories with massive gene duplications and losses, explaining their discontinuous distribution in the fungal tree of life. In addition, horizontal gene transfer events from bacteria to fungi and, to a lesser extent, between fungi, could be inferred. Ancestral gene duplication events have resulted in the divergence of eight phylogenetic clades. In particular, two clades show ancestral linkage and functional co-evolution between a type III PKS gene and a reducing PKS gene. Investigation of the occurrence of protein domains in predicted fungal type III PKS gene clusters highlighted the diversity of biosynthetic pathways, likely reflecting a large chemical landscape. Type III PKS genes are most often located next to genes encoding cytochrome P450s, MFS transporters and transcription factors, defining ancestral core gene clusters. This analysis also allowed predicting gene clusters for the characterized fungal type III PKSs and provides working hypotheses for the elucidation of the full biosynthetic pathways. Altogether, our analyses provide the fundamental knowledge to motivate further characterization and exploitation of fungal type III PKS biosynthetic pathways.
To date little is known about the genetic background that drives the production and diversification of secondary metabolites in the Hypoxylaceae. With the recent availability of high-quality genome sequences for 13 representative species and one relative (Xylaria hypoxylon) we attempted to survey the diversity of biosynthetic pathways in these organisms to investigate their true potential as secondary metabolite producers. Manual search strategies based on the accumulated knowledge on biosynthesis in fungi enabled us to identify 783 biosynthetic pathways across 14 studied species, the majority of which were arranged in biosynthetic gene clusters (BGC). The similarity of BGCs was analysed with the BiG-SCAPE engine which organised the BGCs into 375 gene cluster families (GCF). Only ten GCFs were conserved across all of these fungi indicating that speciation is accompanied by changes in secondary metabolism. From the known compounds produced by the family members some can be directly correlated with identified BGCs which is highlighted herein by the azaphilone, dihydroxynaphthalene, tropolone, cytochalasan, terrequinone, terphenyl and brasilane pathways giving insights into the evolution and diversification of those compound classes. Vice versa, products of various BGCs can be predicted through homology analysis with known pathways from other fungi as shown for the identified ergot alkaloid, trigazaphilone, curvupallide, viridicatumtoxin and swainsonine BGCs. However, the majority of BGCs had no obvious links to known products from the Hypoxylaceae or other well-studied biosynthetic pathways from fungi. These findings highlight that the number of known compounds strongly underrepresents the biosynthetic potential in these fungi and that a tremendous number of unidentified secondary metabolites is still hidden.
Moreover, with increasing numbers of genomes for further Hypoxylaceae species becoming available, the likelihood of revealing new biosynthetic pathways that encode new, potentially useful compounds will significantly improve. Reaching a better understanding of the biology of these producers, and further development of genetic methods for their manipulation, will be crucial to access their treasures.