We engineered a machine learning approach, MSHub, to enable auto-deconvolution of gas chromatography-mass spectrometry (GC-MS) data. We then designed workflows that enable the community to store, process, share, annotate, compare and perform molecular networking of GC-MS data within the Global Natural Product Social (GNPS) Molecular Networking analysis platform. MSHub/GNPS performs auto-deconvolution of compound fragmentation patterns via unsupervised non-negative matrix factorization and quantifies the reproducibility of fragmentation patterns across samples. Given its ease of use and low operational cost, GC-MS has applications with broad societal effect, such as detection of metabolic disease in newborns, toxicology, doping, forensics, food science and clinical testing. The predominant ionization technique in GC-MS is electron ionization (EI), in which all compounds are ionized by high-energy (70-eV) electrons. Because fragmentation occurs during ionization, EI GC-MS data are subjected to spectral deconvolution, a process that separates the fragment ion pattern of each eluting molecule into its own composite mass spectrum. An electron energy of 70 eV has long been the standard in GC-MS, which makes it possible to use decades-old EI reference spectra for annotation 1. About 1.2 million reference spectra have been accumulated and curated over more than 50 years 2. Many tools and repositories for GC-MS data have been introduced 3–15; however, much of GC-MS data processing is restricted to vendor-specific formats and software 8. Currently, deconvolution requires either setting multiple parameters manually 3–5 or possessing the computational skills to run the software 7.
Also, the lack of data sharing in a uniform format precludes data comparison between laboratories and prevents taking advantage of repository-scale information and community knowledge, resulting in infrequent reuse of GC-MS data 8,11–15. Although batch modes exist, deconvolution quality is currently not enhanced by using information from all other files. To leverage across-file information, improve the scalability of spectral deconvolution and eliminate the need to set deconvolution parameters manually (m/z error correction of the ions, peak shape as defined by the slopes of the rising and trailing edges, peak RT shifts and noise/intensity thresholds), we developed an algorithmic learning strategy for auto-deconvolution (Fig. 1a-f). We deployed this functionality within GNPS/MassIVE (https://gnps.ucsd.edu) 16 (Fig. 1f-i). To promote analysis reproducibility, all GNPS jobs are retained in the 'My User' space and can be shared as hyperlinks. This user-independent, 'automatic' parameter optimization is accomplished via a fast Fourier transform (FFT), multiplication and inverse Fourier transform for each ion across an entire data set, followed by unsupervised non-negative matrix factorization (NMF) (a one-layer neural network). Then, the compositional consistency of spectral patterns for each spec...
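The FFT, multiplication and inverse-FFT sequence described above is the standard way to compute a cross-correlation quickly. As an illustration only, the sketch below assumes the goal is to estimate the retention-time offset of a single ion trace between a reference file and another file; the helper names `estimate_shift` and `estimate_shifts`, the zero-padding choice and the equal-length traces are assumptions, and this is not the MSHub implementation.

```python
import numpy as np


def estimate_shift(reference: np.ndarray, sample: np.ndarray) -> int:
    """Return the scan offset (reference apex minus sample apex) that best
    aligns one ion trace in `sample` with the same ion in `reference`.

    Hypothetical helper: illustrates FFT-based cross-correlation only.
    Assumes both traces have the same length.
    """
    n = len(reference)
    size = 2 * n - 1  # zero-pad so the circular correlation equals the linear one
    f_ref = np.fft.rfft(reference, size)
    f_smp = np.fft.rfft(sample, size)
    # Multiplying by the complex conjugate in the frequency domain is
    # cross-correlation in the scan (time) domain.
    xcorr = np.fft.irfft(f_ref * np.conj(f_smp), size)
    lag = int(np.argmax(xcorr))
    # Indices past n - 1 wrap around and represent negative lags.
    return lag - size if lag > n - 1 else lag


def estimate_shifts(reference: np.ndarray, sample: np.ndarray) -> np.ndarray:
    """Apply estimate_shift to every ion (column) of two scans x m/z matrices."""
    return np.array([estimate_shift(reference[:, j], sample[:, j])
                     for j in range(reference.shape[1])])
```

For example, a Gaussian ion trace whose apex is at scan 50 in the reference and at scan 45 in the sample yields a shift of 5; swapping the arguments yields -5. Zero-padding to 2n - 1 points is what keeps the FFT result from wrapping the tail of one trace onto the head of the other.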
Global climate change and the expected increase in temperature are altering the relationship between geography and grapevine (V. vinifera) varietal performance, the implications of which are not yet fully understood. We investigated berry phenology and biochemistry of 30 cultivars, 20 red and 10 white, across three seasons (2017–2019) in response to a consistent average temperature difference of 1.5°C during the growing season between two experimental sites. The experiments were conducted at the Ramat Negev (RN) and Ramon (MR) vineyards, located in the Negev desert, Israel. A significant interaction among vineyard location, season, and variety affected phenology and berry indices. The warmer RN site was generally associated with an advanced phenological course for the white cultivars, which reached harvest up to 2 weeks earlier than at the MR site. The white cultivars also showed stronger correlations between non-consecutive phenological stages than the red ones did. In contrast, the harvest time of red cultivars varied considerably by season and site. Warmer conditions extended fruit developmental phases, causing berry shriveling and cluster collapse in a few cultivars, such as Pinot Noir, Ruby Cabernet, and Tempranillo. Analyses of organic acid content suggested that red and white cultivars differ in how their malate, tartrate, and citrate contents respond to the temperature difference between sites. However, in general, cultivars at lower temperatures exhibited lower concentrations of pulp organic acids at véraison, but their acid degradation until harvest was slower than the significant pace of acid decline at the warmer site. Sugars showed the greatest differences between sites in both white and red berries at véraison, but the differences were season-dependent. At harvest, cultivars of both groups exhibited significant variation in the hexose/sucrose ratio, with averages ranging from 1.6 to 2.9.
The hexose/sucrose ratio was significantly higher among the red cultivars at the warmer RN site, whereas this tendency was very slight among white cultivars. White cultivars seem to harbor considerable resilience owing to a combination of an earlier and shorter ripening phase, which avoids most of the summer heat. Taken together, our study demonstrates that the extensive genetic capacity of V. vinifera bears significant potential and plasticity to withstand the temperature increase associated with climate change.
Gas chromatography-mass spectrometry (GC-MS) represents an analytical technique with significant practical societal impact. Spectral deconvolution is an essential step for interpreting GC-MS data. No public GC-MS repositories that also enable repository-scale analysis exist, in part because deconvolution requires significant user input. We therefore engineered a scalable machine learning workflow for the Global Natural Product Social Molecular Networking (GNPS) analysis platform to enable the mass spectrometry community to store, process, share, annotate, compare, and perform molecular networking of GC-MS data. The workflow performs auto-deconvolution of compound fragmentation patterns via unsupervised non-negative matrix factorization, using a Fast Fourier Transform-based strategy to overcome scalability limitations. We introduce a “balance score” that quantifies the reproducibility of fragmentation patterns across all samples. We demonstrate the utility of the platform with breathomics analysis applied to the early detection of oesophago-gastric cancer, and by creating the first molecular spatial map of the human volatilome.
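To make the factorization step concrete, here is a minimal NMF sketch. It uses the textbook Lee-Seung multiplicative-update rules for the Frobenius-norm objective; that choice is an assumption, since the abstract does not state which update rule, initialization or number of components MSHub uses. A chromatographic window is arranged as a nonnegative matrix X (scans × m/z channels) and factored as X ≈ WH, where the columns of W act as elution profiles and the rows of H as deconvolved spectra. The function name `nmf` and the synthetic two-compound data are hypothetical.

```python
import numpy as np


def nmf(X: np.ndarray, k: int, n_iter: int = 2000, seed: int = 0,
        eps: float = 1e-12) -> tuple[np.ndarray, np.ndarray]:
    """Factor nonnegative X (scans x m/z) into W (scans x k) and H (k x m/z).

    Hypothetical helper using Lee-Seung multiplicative updates, which do not
    increase ||X - WH||_F; eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic window (hypothetical data): two co-eluting compounds
# whose spectra share one ion (index 2).
scans = np.arange(100)
profile_a = np.exp(-((scans - 40) ** 2) / (2 * 6.0 ** 2))
profile_b = np.exp(-((scans - 60) ** 2) / (2 * 6.0 ** 2))
spec_a = np.array([5, 0, 3, 0, 1, 0, 0, 2, 0, 0], dtype=float)
spec_b = np.array([0, 4, 1, 0, 0, 3, 0, 0, 2, 1], dtype=float)
X = np.outer(profile_a, spec_a) + np.outer(profile_b, spec_b)

W, H = nmf(X, k=2)
# Relative reconstruction error of the factorization.
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Printing `rel_err` shows how closely `W @ H` reproduces `X`. Each row of `H` should resemble `spec_a` or `spec_b` only up to scale and order, because NMF factors are defined only up to permutation and scaling; the unsupervised part is that no reference spectra are supplied to the factorization.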