General conceptualization
RS, DP, LFN, MW, PCD conceptualized the idea of IIMN and its integration into GNPS and feature-finding software tools
RS, DP, LFN, PCD wrote the manuscript
RS, BA, FH, HUH conceptualized the MZmine feature grouping workflow
UK, HH provided discussion and feedback on IIMN and the MZmine workflow

Development
RS developed the IIMN modules in MZmine and the MS2 spectral library generation modules
MW, RS developed the "supplementary edges" format in the FBMN workflow to enable IIMN
MW programmed the IIMN workflow on GNPS
RS, MW developed the direct submission of MZmine data to run IIMN on GNPS
JR, MGA developed the XCMS/CAMERA IIMN integration in R
HT developed the MS-DIAL FBMN and IIMN integration
KD developed the MS2 spectral merge function into the export modules for FBMN, IIMN, and SIRIUS, which was coordinated by SB
TP, AK provided feedback and help for the development and integration of IIMN in MZmine
Metabolomics data are difficult to find and reuse, even in public repositories. We therefore developed the Reanalysis of Data User (ReDU) interface (https://redu.ucsd.edu/), a community- and data-driven approach that solves this problem at the repository scale. ReDU enables public data discovery and co- or re-analysis via uniformly formatted, publicly available MS/MS data and metadata in the Global Natural Product Social Molecular Networking Platform (GNPS), consistent with findable, accessible, interoperable, and reusable (FAIR) principles. 1

Many simple but important questions can be asked using repository-scale public data. For example, what human biospecimen or sampling location is best for detecting a given drug? Or what molecules are found in humans <2 years old? Current metabolomics repositories typically require manually navigating and converting thousands of vendor-formatted files with inconsistent metadata formats, and developing data integration algorithms, which greatly complicates analyses.

Results and Discussion

ReDU addresses FAIR principles by enabling users to find and choose files (Fig 1a). This is possible because ReDU formats sample information consistently via a template and drag-and-drop validator backed by standard controlled vocabularies and ontologies (e.g. NCBI taxonomy, 2 UBERON, 3 Disease Ontology 4 and MS ontology), and includes geographical location (important for natural products and environmental samples). ReDU automatically uses all public data in the GNPS/MassIVE repository that have the corresponding ReDU-compliant sample information. In total, 34,087 files in GNPS are ReDU-compatible, including natural and human-built environments, human and animal tissues, biofluids, food, and other data from around the world (Fig 1f), analyzed using different instruments, ionization methods, sample preparation methods, etc.
Of the 103,230,404 MS/MS spectra included in ReDU, 4,528,624 spectra were annotated (rate of 4.39% with settings yielding ~1% FDR) as one of 13,217 unique chemicals (Table S1). 5,6,7

Uniformity of data and sample information in ReDU enables metadata-based and repository-scale analyses (Fig. 1b-g). Chemical explorer enables selection of a molecule and retrieval of its associations with the metadata, i.e. the sample information. For instance, selecting 12-ketodeoxycholic acid (filtering to include human feces) revealed it was observed after infancy (Fig 1c), whereas cholic acid displayed the opposite trend, coupled to the developing microbiome. Similarly, rosuvastatin was found in adults, matching prescription demographics. Another approach enabled is chemical enrichment analysis. For example, human blood, feces, and urine differed by bilirubin, urobilin, and stercobilin (Fig 1d). Bilirubin was more frequently annotated in blood, and urobilin and stercobilin were most often annotated in feces. 8 Similarly, comparison of bacterial cultures revealed differences in annotati...
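The annotation rate quoted in the ReDU excerpt can be checked directly from the two spectrum counts it reports. The short sketch below only redoes that arithmetic; the variable names are illustrative and are not taken from ReDU or GNPS.

```python
# Counts as reported in the ReDU excerpt above.
total_spectra = 103_230_404      # MS/MS spectra included in ReDU
annotated_spectra = 4_528_624    # spectra annotated at ~1% FDR settings

# Annotation rate as a percentage of all included spectra.
annotation_rate_pct = 100 * annotated_spectra / total_spectra

print(f"Annotation rate: {annotation_rate_pct:.2f}%")
```

The result rounds to the 4.39% stated in the text, which is consistent with the total being 103,230,404 spectra rather than that number of millions.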
Correspondence: We introduce a web-enabled small-molecule mass spectrometry (MS) search engine. To date, no tool can query all the public small-molecule tandem MS data in metabolomics repositories, greatly limiting the utility of these resources in clinical, environmental and natural product applications. Therefore, we introduce a Mass Spectrometry Search Tool (MASST) (https://proteosafe-extensions.ucsd.edu/masst/) that enables the discovery of molecular relationships among accessible public metabolomics and natural product tandem mass spectrometry (MS/MS) data.

The ability to discover related sequences of proteins or genes in publicly accessible sequence data using the Basic Local Alignment Search Tool (BLAST), connected to public sequence data repositories through a web interface (WebBLAST, https://blast.ncbi.nlm.nih.gov/Blast.cgi), was introduced in the 1990s. 1 It has garnered more than 138,159 citations according to Google Scholar, placing it among the most widely used bioinformatics tools. WebBLAST enabled detection of the number of sequences in public repositories related to a given query, the organisms in which those sequences occur, and the evolutionary and inferred functional relationships among related sequences. It therefore permitted a broad community to answer simple but scientifically compelling questions such as: Is a protein or DNA sequence common or rare? How is this sequence distributed among different kinds of organisms? What other sequences are related to this sequence (evolutionary variants, new mutations, or synthetic constructs)? In the early days of making DNA or protein sequence data publicly available, the "metadata" (e.g., contextual information about the sample, population and location the sequence came from, and technical information about how it was produced) in the public repositories was limited and no standards existed. This situation is similar to the current status of much of the mass spectrometry data in the public domain.
However, when publicly deposited data have metadata available, such as organism, location of sampling, host phenotypes such as diseases, etc., it becomes possible to start building higher-level hypotheses regarding the evolutionary, ecological or functional relationships among these DNA, RNA or protein sequences. The ability to search data with added context continues to have profound impacts on fields including medicine, chemistry, genetics, molecular biology, genomics, microbiology, and ecology.

Algorithms developed for mass spectrometry data, including molecular networking 2 and fragmentation trees 3, enable similarity searches, while powerful metabolomics analysis software infrastructures, such as MS-DIAL 4, MetaboAnalyst 5, XCMS Online 6, and HMDB 7, some of which have been available for over a decade, focus on annotation of MS/MS spectra or finding statistical relationships between molecular features. However, none of the existing tools enable searching against public data in repositories. Finding the distribution of specific data of i...
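To make the idea of an MS/MS similarity search concrete, here is a minimal sketch that ranks a few reference spectra against a query using a plain binned cosine score. It is an illustration under stated assumptions, not MASST's or GNPS's actual scoring: the function names, the 0.01 m/z bin width, and the toy peak lists are all hypothetical.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=0.01):
    """Sum the intensities of (m/z, intensity) peaks that share an m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned

def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy data invented for illustration only (not real spectra).
query = [(85.03, 20.0), (127.04, 100.0), (163.06, 45.0)]
repository = {
    "file_A": [(85.03, 18.0), (127.04, 95.0), (163.06, 50.0)],
    "file_B": [(91.05, 100.0), (119.08, 30.0)],
}

# Rank repository spectra by similarity to the query, best match first.
ranked = sorted(
    ((name, cosine_similarity(query, peaks)) for name, peaks in repository.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

A repository-scale search additionally has to handle precursor mass filtering, peak tolerance, and indexing across many files; the sketch omits all of that and only shows the scoring and ranking step.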