The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry techniques are well-suited to high-throughput characterization of natural products, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social molecular networking (GNPS, http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social-networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of ‘living data’ through continuous reanalysis of deposited data.
Global Natural Product Social Molecular Networking (GNPS) is an interactive online small molecule-focused tandem mass spectrometry (MS²) data curation and analysis infrastructure. It is intended to provide as much chemical insight as possible into an untargeted MS² dataset and to connect this chemical insight to the user's underlying biological questions. This can be performed within one liquid chromatography (LC)-MS² experiment or at the repository scale. GNPS-MassIVE is a public data repository for untargeted MS² data with sample information (metadata) and annotated MS² spectra. These publicly accessible data can be annotated and updated with the GNPS infrastructure keeping a continuous record of all changes. This knowledge is disseminated across all public data; it is a living dataset. Molecular networking, one of the main analysis tools used within the GNPS platform, creates a structured data table that reflects the molecular diversity captured in tandem mass spectrometry experiments by computing the relationships of the MS² spectra as spectral similarity. This protocol provides step-by-step instructions for creating reproducible, high-quality molecular networks. For training purposes, the reader is led through a 90- to 120-min procedure that starts by recalling an example public dataset and its sample information and proceeds to creating and interpreting a molecular network. Each data analysis job can be shared or cloned to disseminate the knowledge gained, thus propagating information that can lead to the discovery of molecules, metabolic pathways, and ecosystem/community interactions.
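To make the idea of relating MS² spectra by spectral similarity concrete, the following is a minimal sketch, not the GNPS implementation. It assumes each spectrum is a list of (m/z, intensity) pairs, uses a plain greedy cosine score (GNPS itself uses a more elaborate scoring that the abstract does not detail), and connects spectra whose score exceeds a threshold. The function names, the 0.02 m/z tolerance, the 0.7 score cutoff, and the toy spectra are all illustrative assumptions.

```python
import math
from itertools import combinations


def normalize(peaks):
    """Scale intensities so the spectrum has unit Euclidean norm."""
    norm = math.sqrt(sum(i * i for _, i in peaks))
    if norm == 0:
        return []
    return [(mz, i / norm) for mz, i in peaks]


def cosine_score(spec_a, spec_b, tol=0.02):
    """Greedy cosine score: pair peaks whose m/z differ by <= tol.

    Each peak is used at most once; the highest intensity products
    are matched first. `tol` is a hypothetical tolerance in m/z units.
    """
    a, b = normalize(spec_a), normalize(spec_b)
    candidates = [
        (ia * ib, x, y)
        for x, (mza, ia) in enumerate(a)
        for y, (mzb, ib) in enumerate(b)
        if abs(mza - mzb) <= tol
    ]
    candidates.sort(reverse=True)
    used_a, used_b, score = set(), set(), 0.0
    for product, x, y in candidates:
        if x not in used_a and y not in used_b:
            used_a.add(x)
            used_b.add(y)
            score += product
    return score


def build_network(spectra, min_score=0.7, tol=0.02):
    """Return edges (name_a, name_b, score) for pairs scoring >= min_score."""
    edges = []
    for (name_a, sa), (name_b, sb) in combinations(spectra.items(), 2):
        score = cosine_score(sa, sb, tol)
        if score >= min_score:
            edges.append((name_a, name_b, round(score, 3)))
    return edges


# Toy spectra (invented values): A and B share fragments, C does not.
spectra = {
    "A": [(100.0, 10.0), (150.0, 50.0), (200.0, 100.0)],
    "B": [(100.01, 12.0), (150.0, 45.0), (200.01, 90.0)],
    "C": [(80.0, 100.0), (120.0, 30.0)],
}
edges = build_network(spectra)
for edge in edges:
    print(edge)
```

In this toy input only spectra A and B end up connected, which is the sense in which a network groups spectra of structurally related molecules. The resulting edge list is one simple form of the structured data table the abstract refers to.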
The ability to correlate the production of specialized metabolites to the genetic capacity of the organism that produces such molecules has become an invaluable tool in aiding the discovery of biotechnologically applicable molecules. Here, we accomplish this task by matching molecular families with gene cluster families, making these correlations to 60 microbes at one time instead of connecting one molecule to one organism at a time, as is traditionally done. We can correlate these families through the use of nanospray desorption electrospray ionization MS/MS, an ambient pressure MS technique, in conjunction with MS/MS networking and peptidogenomics. We matched the molecular families of peptide natural products produced by 42 bacilli and 18 pseudomonads through the generation of amino acid sequence tags from MS/MS data of specific clusters found in the MS/MS network. These sequence tags were then linked to biosynthetic gene clusters in publicly accessible genomes, providing us with the ability to link particular molecules with the genes that produced them. As an example of its use, this approach was applied to two unsequenced Pseudoalteromonas species, leading to the discovery of the gene cluster for a molecular family, the bromoalterochromides, in the previously sequenced strain P. piscicida JCM 20779ᵀ. The approach itself is not limited to 60 related strains, because spectral networking can be readily adopted to look at molecular family-gene cluster families of hundreds or more diverse organisms in one single MS/MS network.
MS/MS molecular networking | mass spectrometry | microbial ecology
Tens of thousands of sequenced microbial genomes or rough drafts of genomes are available at this time, and this number is predicted to grow into the millions over the next decades. This wealth of sequence data has the potential to be used for the discovery of small bioactive molecules through genome mining (1-6).
Genome mining is a process in which small molecules are discovered by predicting what compound will be genetically encoded based on the sequences of biosynthetic gene clusters. However, the process of mining genetically encoded small molecules is not keeping pace with the rate by which genome sequences are being obtained. In general, genome mining is still done one gene cluster at a time and requires many person-years of effort to annotate a single molecule. The time and significant expertise that current genome mining requires also make genome mining very expensive. In light of this extensive effort and cost, alternative approaches to genome mining and annotating specialized metabolites must be developed that not only take advantage of the sequenced resources available and make it efficient to perform genome mining on a more global scale but also enable the molecular analysis of unsequenced organisms. Such methods will then significantly reduce the cost of genome mining by increasing the speed with which molecules are connected to candidate genes and using resources already available. Here, we put fo...
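The abstract above mentions generating amino acid sequence tags from MS/MS data. As a rough illustration of that idea only, and not the authors' peptidogenomics workflow, the sketch below reads a tag from the mass gaps between consecutive fragment peaks. It assumes singly charged fragments from a single ion series, considers only the standard proteinogenic residues (natural product peptides often contain others), and uses a hypothetical 0.01 Da tolerance; the function names and peak values are invented for the example.

```python
# Monoisotopic residue masses (Da) of the standard amino acids.
# Leu and Ile are isobaric, so "L" stands for either.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_for(delta, tol=0.01):
    """Return the residue whose mass matches `delta` within `tol`, else None."""
    for residue, mass in RESIDUE_MASS.items():
        if abs(delta - mass) <= tol:
            return residue
    return None


def sequence_tag(mz_values, tol=0.01):
    """Longest run of consecutive peak gaps that each match a residue mass."""
    peaks = sorted(mz_values)
    best, current = "", ""
    for low, high in zip(peaks, peaks[1:]):
        residue = residue_for(high - low, tol)
        if residue is None:
            current = ""  # gap matches no residue: the run is broken
        else:
            current += residue
            if len(current) > len(best):
                best = current
    return best


# Invented fragment ladder: gaps correspond to Ala, Val, Ser, then no match.
fragments = [300.0, 371.03711, 470.10552, 557.13755, 700.0]
tag = sequence_tag(fragments)
print(tag)
```

A tag obtained this way is read in order of increasing fragment mass, so whether it corresponds to the forward or the reversed peptide sequence depends on which ion series the peaks belong to. Both orientations would need to be checked against predicted gene cluster products.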
Herein, we present a protocol for the use of Global Natural Products Social (GNPS) Molecular Networking, an interactive online chemistry-focused mass spectrometry data curation and analysis infrastructure. The goal of GNPS is to provide as much chemical insight for an untargeted tandem mass spectrometry data set as possible and to connect this chemical insight to the underlying biological questions a user wishes to address. This can be performed within one experiment or at the repository scale. GNPS not only serves as a public data repository for untargeted tandem mass spectrometry data with the sample information (metadata), but also captures community knowledge that is disseminated via living data across all public data. One of the main analysis tools used by the GNPS community is molecular networking. Molecular networking creates a structured data table that reflects the chemical space from tandem mass spectrometry experiments by computing the relationships of the tandem mass spectra through spectral similarity. This protocol provides step-by-step instructions for creating reproducible high-quality molecular networks. For training purposes, the reader is led through the protocol from recalling a public data set and its sample information to creating and interpreting a molecular network. Each data analysis job can be shared or cloned to disseminate the knowledge gained, thus propagating information that can lead to the discovery of molecules, metabolic pathways, and ecosystem/community interactions.