Evaluation of GC-MS data may be challenging due to the high complexity of the data, which include overlapped, embedded, retention-time-shifted and low-S/N peaks. In this work, we demonstrate a new approach, the PARAFAC2-based Deconvolution and Identification System (PARADISe), for processing raw GC-MS data. PARADISe is freely available, platform-independent software that incorporates a number of newly developed algorithms in a coherent framework. It offers a solution for analysts dealing with complex chromatographic data and allows chemical/metabolite information to be extracted directly from the raw data. PARADISe requires only a few inputs from the analyst to process GC-MS data, and it converts raw netCDF data files into a compiled peak table. Furthermore, the method is generally robust to minor variations in the input parameters. It automatically performs peak identification based on the deconvoluted mass spectra using the integrated NIST search engine, and it generates an identification report. In this paper, we compare PARADISe with AMDIS and ChromaTOF in terms of peak quantification and show that PARADISe is more robust to user-defined settings, and that these settings are far fewer and easier to set. PARADISe is based on non-proprietary, scientifically evaluated approaches, and we show that it can handle more overlapping signals and lower signal-to-noise peaks while requiring only about an hour's worth of work regardless of the number of samples. We also show that there are no non-detects in PARADISe, meaning that all compounds are detected in all samples.
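The PARAFAC2 model that PARADISe builds on is commonly written as below. This is the standard formulation from the PARAFAC2 literature, with notation chosen here for illustration: for sample k, X_k (I_k × J) is the chromatogram (elution time × m/z), A (J × F) holds the mass spectra shared by all samples, B_k (I_k × F) holds the elution profiles of sample k (free to shift between samples), and the diagonal D_k (F × F) holds the relative amounts of the F compounds. The cross-product constraint is usually imposed by writing B_k = P_k H with orthonormal P_k, which turns the projected data into an ordinary (PARAFAC1) trilinear model.

```latex
\mathbf{X}_k = \mathbf{B}_k \mathbf{D}_k \mathbf{A}^{\top} + \mathbf{E}_k,
\qquad
\mathbf{B}_k^{\top}\mathbf{B}_k = \boldsymbol{\Phi},
\qquad k = 1, \dots, K

\mathbf{B}_k = \mathbf{P}_k \mathbf{H},\quad
\mathbf{P}_k^{\top}\mathbf{P}_k = \mathbf{I}_F
\;\Longrightarrow\;
\mathbf{P}_k^{\top}\mathbf{X}_k = \mathbf{H}\,\mathbf{D}_k \mathbf{A}^{\top} + \mathbf{P}_k^{\top}\mathbf{E}_k,
\qquad
\boldsymbol{\Phi} = \mathbf{H}^{\top}\mathbf{H}
```

Whether PARADISe uses exactly this parameterization internally is an assumption; the equations only show the general model structure.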
The capabilities of dynamic headspace entrainment followed by thermal desorption, in combination with gas chromatography (GC) coupled to single quadrupole mass spectrometry (MS), have been tested for the determination of volatile components of olive oil. This technique has shown great potential for olive oil quality classification using an untargeted approach. The data processing strategy consisted of three steps: component detection from the GC-MS data using the novel data treatment software PARADISe, multivariate analysis using EZ-Info, and the creation of statistical models. The large number of compounds determined enabled not only the development of a quality classification method as a complementary tool to the officially established method ("PANEL TEST"), but also a correlation between these compounds and different types of defect. The classification method was finally validated using blind samples. An accuracy of 85% in oil classification was obtained, with 100% of extra virgin samples correctly classified.
PARAFAC2 is applied in multiple research areas, for example where the data contain shifts, but it is a challenge to determine the appropriate number of components in the model. In this paper, it is hypothesized that the core consistency diagnostic, which is currently applied in, for example, PARAFAC1, can be used to determine model complexity in PARAFAC2. Theoretically, a PARAFAC1 model is fitted ‘inside’ the PARAFAC2 algorithm, and it should therefore be possible to apply the core consistency diagnostic from PARAFAC1 in PARAFAC2. To support this hypothesis, three different datasets, as well as simulated datasets, have been evaluated by means of PARAFAC2, and the core consistencies have been investigated. There is a general trend that, as in PARAFAC1, a low core consistency indicates an overfitted model. Also, core consistency captures the true variation in the data, whereas small peaks are easily overlooked by visual inspection of noisy models. However, for determining the number of components in a PARAFAC2 model, we suggest using the core consistency in combination with other model parameters such as residuals, loadings, and split-half analysis. Copyright © 2013 John Wiley & Sons, Ltd.
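The core consistency diagnostic (CORCONDIA) mentioned above can be sketched for a fitted PARAFAC1 model as follows. This is a minimal NumPy sketch, assuming a dense three-way array and already-fitted loadings; the function name `core_consistency` is hypothetical. It computes the least-squares Tucker3 core with the PARAFAC loadings held fixed and measures how far that core is from the ideal superdiagonal array: 100 means a perfectly trilinear structure, while low or negative values suggest overfitting. In PARAFAC2, the same calculation would be applied to the inner PARAFAC1 model of the projected data.

```python
import numpy as np


def core_consistency(X, A, B, C):
    """Core consistency (%) of a PARAFAC1 model X ~ sum_f a_f o b_f o c_f.

    X: array of shape (I, J, K); A, B, C: loadings of shapes (I, F), (J, F), (K, F).
    """
    F = A.shape[1]
    # Least-squares Tucker3 core with the loadings fixed:
    # G = X x1 pinv(A) x2 pinv(B) x3 pinv(C)
    G = np.einsum("ijk,ai,bj,ck->abc", X,
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C))
    # Ideal PARAFAC core: superdiagonal array of ones (F x F x F).
    T = np.zeros((F, F, F))
    for f in range(F):
        T[f, f, f] = 1.0
    # sum(T**2) == F, which normalizes the deviation.
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)


# Noise-free trilinear data fitted with its true loadings gives 100.
rng = np.random.default_rng(0)
A, B, C = (rng.random((n, 2)) for n in (10, 8, 6))
X = np.einsum("if,jf,kf->ijk", A, B, C)
print(round(core_consistency(X, A, B, C), 6))  # 100.0
```

With real, noisy data the value drops below 100, and a sharp drop when one more component is added is the overfitting signal described in the abstract.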
Isolates of the zoonotic pathogen Campylobacter are generally considered to be unable to metabolize glucose due to a lack of key glycolytic enzymes. However, the Entner-Doudoroff (ED) pathway has been identified in Campylobacter jejuni subsp. doylei and a few C. coli isolates. A systematic search for ED pathway genes in a wide range of Campylobacter isolates and in the C. jejuni/coli PubMLST database revealed that 1.7% of more than 6,000 genomes encoded a complete ED pathway, including both C. jejuni and C. coli from diverse clinical, environmental and animal sources. In rich media, glucose significantly enhanced the stationary-phase survival of a set of ED-positive C. coli isolates. Unexpectedly, glucose massively promoted floating biofilm formation in some of these ED-positive isolates. Metabolic profiling by gas chromatography–mass spectrometry revealed distinct responses to glucose in a low-biofilm strain (CV1257) compared to a high-biofilm strain (B13117), consistent with preferential diversion of hexose-6-phosphate to polysaccharide in B13117. We conclude that while the ED pathway is rare amongst Campylobacter isolates causing human disease (the majority of which would be of agricultural origin), some glucose-utilizing isolates exhibit specific fitness advantages, including stationary-phase survival and biofilm production, highlighting key physiological benefits of this pathway in addition to energy conservation.
Parallel factor analysis 2 (PARAFAC2) has been shown to be a powerful tool for resolving complex overlapping peaks in chromatographic analyses. It is particularly useful because it can handle shifts in the elution time mode and changes in peak shape. Like all curve resolution techniques, PARAFAC2 will only find chemically meaningful parameters (elution time profiles and mass spectra) if the correct number of factors is determined. So far, the primary way to determine an appropriate number of factors when using PARAFAC2 has been to calculate models with different numbers of factors and then inspect the models manually. This approach is time-consuming, and the result may be biased by the manual assessment of model quality, making PARAFAC2 inaccessible to analytical chemists in general. Here, we develop a method that determines an appropriate number of factors in an automated way. The automation is based on a number of model diagnostics (quality criteria) collected from models with different numbers of factors; combining these diagnostics makes it possible to assess the appropriate number of components. In this work, only gas chromatography-mass spectrometry data are considered. However, it will most likely be fairly straightforward to extend the work to liquid chromatography data (with a multivariate detector). Automating the evaluation of PARAFAC2 model quality enables both inexperienced and trained users to perform comprehensive and advanced analysis of chromatographic data with a minimum of manual work.
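The abstract does not state which diagnostics are used or how they are combined. Purely as an illustration of scanning models with increasing numbers of factors, a selection rule could look like the sketch below. The function `select_n_factors`, the choice of diagnostics (core consistency and explained variance) and both thresholds are hypothetical assumptions, not the authors' criteria.

```python
def select_n_factors(diagnostics, cc_min=80.0, min_gain=1.0):
    """Choose a number of factors from per-model diagnostics (hypothetical rule).

    diagnostics: {F: (core_consistency_percent, explained_variance_percent)}
    Walks F = 1, 2, ... and stops at the first model whose core consistency
    is below cc_min, or which adds less than min_gain percentage points of
    explained variance over the previously accepted model.
    """
    chosen = None
    prev_fit = 0.0
    for F in sorted(diagnostics):
        cc, fit = diagnostics[F]
        if cc < cc_min or (chosen is not None and fit - prev_fit < min_gain):
            break
        chosen, prev_fit = F, fit
    return chosen


# Illustrative (made-up) diagnostics for models with 1-4 factors.
diag = {1: (100.0, 80.0), 2: (98.0, 95.0), 3: (90.0, 97.0), 4: (20.0, 97.5)}
print(select_n_factors(diag))  # 3
```

A stop-at-first-failure scan keeps the rule conservative: once adding a factor breaks the trilinear structure or stops improving the fit, larger models are not considered.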
An automated method (FastChrom) for baseline correction, peak detection and assignment (grouping) of similar peaks across samples has been developed. The method has been tested both on artificial data and on a dataset obtained from gas chromatographic analysis of wine samples. As part of the automated approach, a new method for baseline estimation has been developed and compared with other methods. FastChrom has been shown to perform at least as well as conventional software. Moreover, compared to other approaches, FastChrom finds more peaks in the chromatograms, not only those with retention times defined by the user. FastChrom is fast and easy to use and offers the possibility of applying a retention time index, which facilitates the identification of peaks and the comparison between experiments.
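A common retention time index is the linear (van den Dool and Kratz) retention index, which interpolates a peak's retention time between those of the bracketing n-alkanes. The sketch below assumes that form; which index FastChrom actually applies is not stated here, and `linear_retention_index` is a hypothetical helper. For consecutive alkanes with carbon numbers n and n + 1 it reduces to 100 * (n + (t_x - t_n) / (t_(n+1) - t_n)).

```python
def linear_retention_index(t_x, alkane_times):
    """Linear retention index of a peak eluting at t_x.

    alkane_times: {carbon number: retention time of that n-alkane},
    all times in the same unit (e.g. minutes).
    """
    ns = sorted(alkane_times)
    for n, m in zip(ns, ns[1:]):
        t_n, t_m = alkane_times[n], alkane_times[m]
        if t_n <= t_x <= t_m:
            # Linear interpolation between the bracketing alkanes n and m.
            return 100.0 * (n + (m - n) * (t_x - t_n) / (t_m - t_n))
    raise ValueError("retention time lies outside the alkane ladder")


# A peak halfway between decane (5.0 min) and undecane (7.0 min).
print(linear_retention_index(6.0, {10: 5.0, 11: 7.0}))  # 1050.0
```

Because the index is anchored to reference compounds rather than absolute times, it stays comparable across runs and instruments, which is what makes it useful for matching peaks between experiments.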
Bilinear and multilinear models such as principal component analysis and PARAFAC have intrinsic sign indeterminacies. For example, any loading vector can be multiplied by −1 without affecting the loss function value, provided another vector of the same component is also multiplied by −1. This sometimes causes problems, for example with respect to interpretation. In this paper, a method is developed to fix the sign indeterminacy for the PARAFAC, Tucker3 and PARAFAC2 models. Copyright © 2013 John Wiley & Sons, Ltd.
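The sign indeterminacy, and one simple way to remove it, can be illustrated with PCA computed by SVD. The sketch below uses a common heuristic convention (make the largest-magnitude element of each loading vector positive and flip the paired score vector with it); it is not the method developed in the paper, and `pca_with_sign_convention` is a hypothetical name. Because each flip is applied to both vectors of a component, the reconstruction, and hence the loss, is unchanged.

```python
import numpy as np


def pca_with_sign_convention(X, n_components):
    """PCA via SVD with a heuristic sign convention (illustrative only)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]  # scores (I x F)
    P = Vt[:n_components].T.copy()              # loadings (J x F)
    for f in range(n_components):
        # Convention: the largest-magnitude loading element is positive.
        j = np.argmax(np.abs(P[:, f]))
        if P[j, f] < 0:
            # Flip BOTH vectors of component f, so T @ P.T is unchanged.
            P[:, f] *= -1
            T[:, f] *= -1
    return T, P


rng = np.random.default_rng(0)
X = rng.random((6, 4))
T, P = pca_with_sign_convention(X, 2)
# The rank-2 reconstruction equals the one from the raw SVD factors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(T @ P.T, (U[:, :2] * s[:2]) @ Vt[:2]))  # True
```

A purely loading-based convention like this one ignores how the data actually load on each component, which is one reason more principled, data-driven approaches are preferred for interpretation.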