The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry techniques are well-suited to high-throughput characterization of natural products, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social molecular networking (GNPS, http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of ‘living data’ through continuous reanalysis of deposited data.
A major goal in natural product discovery programs is to rapidly dereplicate known entities from complex biological extracts. We demonstrate here that molecular networking, an approach that organizes MS/MS data based on chemical similarity, is a powerful complement to traditional dereplication strategies. Successful dereplication with molecular networks requires MS/MS spectra of the natural product mixture along with MS/MS spectra of known standards, synthetic compounds, or well-characterized organisms, preferably organized into robust databases. This approach can accommodate different ionization platforms, enabling cross correlations of MS/MS data from ambient ionization, direct infusion, and LC-based methods. Molecular networking not only dereplicates known molecules from complex mixtures, it also captures related analogs, a challenge for many other dereplication strategies. To illustrate its utility as a dereplication tool, we apply mass spectrometry-based molecular networking to a diverse array of marine and terrestrial microbial samples, illustrating the dereplication of 58 molecules including analogs.
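The idea of organizing MS/MS spectra by similarity can be illustrated with a minimal sketch. It assumes a plain cosine score over fixed-width m/z bins, which is a simplification and not the scoring used in the work summarized above; the function names, the bin width, the 0.7 threshold, and the three toy spectra are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (assumed binning scheme)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine score between two binned spectra stored as {bin: intensity}."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_network(spectra, threshold=0.7, bin_width=1.0):
    """Return (name_a, name_b, score) edges for spectrum pairs scoring at or above threshold."""
    binned = {name: bin_spectrum(peaks, bin_width) for name, peaks in spectra.items()}
    edges = []
    for name_a, name_b in combinations(sorted(binned), 2):
        score = cosine_similarity(binned[name_a], binned[name_b])
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical toy data: each spectrum is a list of (m/z, intensity) peaks.
spectra = {
    "standard": [(100.1, 50.0), (150.2, 100.0), (200.3, 30.0)],
    "unknown_1": [(100.1, 45.0), (150.2, 100.0), (200.3, 35.0)],
    "unknown_2": [(310.5, 100.0), (420.7, 60.0)],
}
edges = build_network(spectra)
```

In this toy case the unknown spectrum that shares fragment peaks with the reference standard is linked to it, while the unrelated spectrum stays unconnected. A spectrum that clusters with a known standard is a candidate for dereplication, and neighbors with high but imperfect scores are candidate analogs.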
A screening program for bioactive compounds from marine cyanobacteria led to the isolation of jamaicamides A-C. Jamaicamide A is a novel and highly functionalized lipopeptide containing an alkynyl bromide, vinyl chloride, beta-methoxy eneone system, and pyrrolinone ring. The jamaicamides show sodium channel-blocking activity and fish toxicity. Precursor feeding to jamaicamide-producing cultures mapped out the series of acetate and amino acid residues and helped develop an effective cloning strategy for the biosynthetic gene cluster. The 58 kbp gene cluster is composed of 17 open reading frames that show an exact colinearity with their expected utilization. A novel cassette of genes appears to form a pendent carbon atom possessing the vinyl chloride functionality; at its core this contains an HMG-CoA synthase-like motif, giving insight into the mechanism by which this functional group is created.
Metabolomics experiments can employ non-targeted tandem mass spectrometry to detect hundreds to thousands of molecules in a biological sample. Structural annotation of molecules is typically carried out by searching their fragmentation spectra in spectral libraries or, recently, in structure databases. Annotations are limited to structures present in the library or database employed, prohibiting a thorough utilization of the experimental data. We present a computational tool for systematic compound class annotation: CANOPUS uses a deep neural network to predict 1,270 compound classes from fragmentation spectra, and explicitly targets compounds where neither spectral nor structural reference data are available. CANOPUS even predicts classes for which no MS/MS training data are available. We demonstrate the broad utility of CANOPUS by investigating the effect of the microbial colonization in the digestive system in mice, and through analysis of the chemodiversity of different Euphorbia plants; both uniquely revealing biological insights at the compound class level.
Marine life forms are an important source of structurally diverse and biologically active secondary metabolites, several of which have inspired the development of new classes of therapeutic agents. These success stories have had to overcome difficulties inherent to natural products-derived drugs, such as adequate sourcing of the agent and issues related to structural complexity. Nevertheless, several marine-derived agents are now approved, most as 'first-in-class' drugs, with 5 of 7 appearing in the past few years. Additionally, there is a rich pipeline of clinical and pre-clinical marine compounds to suggest their continued application in human medicine. Understanding of how these agents are biosynthetically assembled has accelerated in recent years, especially through interdisciplinary approaches, and innovative manipulations and re-engineering of some of these gene clusters are yielding novel agents of enhanced pharmaceutical properties compared with the natural product.
Understanding of the capacity of the natural world to produce secondary metabolites is important to a broad range of fields, including drug discovery, ecology, biosynthesis, and chemical biology, among others. Both the absolute number and the rate of discovery of natural products have increased significantly in recent years. However, there is a perception and concern that the fundamental novelty of these discoveries is decreasing relative to previously known natural products. This study presents a quantitative examination of the field from the perspective of both number of compounds and compound novelty using a dataset of all published microbial and marine-derived natural products. This analysis aimed to explore a number of key questions, such as how the rate of discovery of new natural products has changed over the past decades, how the average natural product structural novelty has changed as a function of time, whether exploring novel taxonomic space affords an advantage in terms of novel compound discovery, and whether it is possible to estimate how close we are to having described all of the chemical space covered by natural products. Our analyses demonstrate that most natural products being published today bear structural similarity to previously published compounds, and that the range of scaffolds readily accessible from nature is limited. However, the analysis also shows that the field continues to discover appreciable numbers of natural products with no structural precedent. Together, these results suggest that the development of innovative discovery methods will continue to yield compounds with unique structural and biological properties.
Curacin A (1) is a potent cancer cell toxin obtained from strains of the tropical marine cyanobacterium Lyngbya majuscula found in Curaçao. Its structure is unique in that it contains the sequential positioning of a thiazoline and cyclopropyl ring, and it exerts its potent cell toxicity through interaction with the colchicine drug binding site on microtubules. A series of stable isotope-labeled precursors were fed to cultures of curacin A-producing strains and, following NMR analysis, allowed determination of the metabolic origin of all atoms in the natural product (one cysteine, 10 acetate units, two S-adenosyl methionine-derived methyl groups) as well as several unique mechanistic insights. Moreover, these incorporation experiments facilitated an effective gene cloning strategy that allowed identification and sequencing of the approximately 64 kb putative curacin A gene cluster. The metabolic system is comprised of a nonribosomal peptide synthetase (NRPS) and multiple polyketide synthases (PKSs) and shows a very high level of collinearity between genes in the cluster and the predicted biochemical steps required for curacin biosynthesis. Unique features of the cluster include (1) all but one of the PKSs are monomodular multifunctional proteins, (2) a unique gene cassette that contains an HMG-CoA synthase likely responsible for formation of the cyclopropyl ring, and (3) a terminating motif that is predicted to function in both product release and terminal dehydrative decarboxylation.