Secondary metabolites produced by bacteria and fungi are an important source of antimicrobials and other bioactive compounds. In recent years, genome mining has seen broad applications in identifying and characterizing new compounds as well as in metabolic engineering. Since 2011, the ‘antibiotics and secondary metabolite analysis shell—antiSMASH’ (https://antismash.secondarymetabolites.org) has assisted researchers in this, both as a web server and a standalone tool. It has established itself as the most widely used tool for identifying and analysing biosynthetic gene clusters (BGCs) in bacterial and fungal genome sequences. Here, we present an entirely redesigned and extended version 5 of antiSMASH. antiSMASH 5 adds detection rules for clusters encoding the biosynthesis of acyl-amino acids, β-lactones, fungal RiPPs, RaS-RiPPs, polybrominated diphenyl ethers, C-nucleosides, PPY-like ketones and lipolanthines. For type II polyketide synthase-encoding gene clusters, antiSMASH 5 now offers more detailed predictions. The HTML output visualization has been redesigned to improve the navigation and visual representation of annotations. We have again improved the runtime of analysis steps, making it possible to deliver comprehensive annotations for bacterial genomes within a few minutes. A new output file in the standard JavaScript object notation (JSON) format is aimed at downstream tools that process antiSMASH results programmatically.
Covering: 2006 to 2016The computational mining of genomes has become an important part in the discovery of novel natural products as drug leads. Thousands of bacterial genome sequences are publically available these days containing an even larger number and diversity of secondary metabolite gene clusters that await linkage to their encoded natural products. With the development of high-throughput sequencing methods and the wealth of DNA data available, a variety of genome mining methods and tools have been developed to guide discovery and characterisation of these compounds. This article reviews the development of these computational approaches during the last decade and shows how the revolution of next generation sequencing methods has led to an evolution of various genome mining approaches, techniques and tools. After a short introduction and brief overview of important milestones, this article will focus on the different approaches of mining genomes for secondary metabolites, from detecting biosynthetic genes to resistance based methods and "evo-mining" strategies including a short evaluation of the impact of the development of genome mining methods and tools on the field of natural products and microbial ecology.
New bioinformatic tools are needed to analyze the growing volume of DNA sequence data. This is especially true in the case of secondary metabolite biosynthesis, where the highly repetitive nature of the associated genes creates major challenges for accurate sequence assembly and analysis. Here we introduce the web tool Natural Product Domain Seeker (NaPDoS), which provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments. NaPDoS analyses are based on the phylogenetic relationships of sequence tags derived from polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) genes, respectively. The sequence tags correspond to PKS-derived ketosynthase domains and NRPS-derived condensation domains and are compared to an internal database of experimentally characterized biosynthetic genes. NaPDoS provides a rapid mechanism to extract and classify ketosynthase and condensation domains from PCR products, genomes, and metagenomic datasets. Close database matches provide a mechanism to infer the generalized structures of secondary metabolites while new phylogenetic lineages provide targets for the discovery of new enzyme architectures or mechanisms of secondary metabolite assembly. Here we outline the main features of NaPDoS and test it on four draft genome sequences and two metagenomic datasets. The results provide a rapid method to assess secondary metabolite biosynthetic gene diversity and richness in organisms or environments and a mechanism to identify genes that may be associated with uncharacterized biochemistry.
Significance
Microbial natural products are a major source of new drug leads, yet discovery efforts are constrained by the lack of information describing the diversity and distributions of the associated biosynthetic pathways among bacteria. Using the marine actinomycete genus
Salinispora
as a model, we analyzed genome sequence data from 75 closely related strains. The results provide evidence for high levels of pathway diversity, with most being acquired relatively recently in the evolution of the genus. The distributions and evolutionary histories of these pathways provide insight into the mechanisms that generate new chemical diversity and the strategies used by bacteria to maximize their population-level capacity to produce diverse secondary metabolites.
Understanding the evolutionary background of a bacterial isolate has applications for a wide range of research. However generating an accurate species phylogeny remains challenging. Reliance on 16S rDNA for species identification currently remains popular. Unfortunately, this widespread method suffers from low resolution at the species level due to high sequence conservation. Currently, there is now a wealth of genomic data that can be used to yield more accurate species designations via modern phylogenetic methods and multiple genetic loci. However, these often require extensive expertise and time. The Automated Multi-Locus Species Tree (autoMLST) was thus developed to provide a rapid ‘one-click’ pipeline to simplify this workflow at: https://automlst.ziemertlab.com. This server utilizes Multi-Locus Sequence Analysis (MLSA) to produce high-resolution species trees; this does not preform multi-locus sequence typing (MLST), a related classification method. The resulting phylogenetic tree also includes helpful annotations, such as species clade designations and secondary metabolite counts to aid natural product prospecting. Distinct from currently available web-interfaces, autoMLST can automate selection of reference genomes and out-group organisms based on one or more query genomes. This enables a wide range of researchers to perform rigorous phylogenetic analyses more rapidly compared to manual MLSA workflows.
Background: The colonial cyanobacterium Microcystis proliferates in a wide range of freshwater ecosystems and is exposed to changing environmental factors during its life cycle. Microcystis blooms are often toxic, potentially fatal to animals and humans, and may cause environmental problems. There has been little investigation of the genomics of these cyanobacteria.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.