Preprocessing of chromatographic and spectral data is an important aspect of the analytical sciences. In particular, recent advances in proteomics have generated large data sets that require analysis. To assist accurate comparison of chemical signals, we propose two methods for the alignment of multiple spectral data sets. As in previously described methods, each chromatogram or spectrum to be aligned is divided into segments, and each segment is aligned individually to a reference. Unlike those methods, ours use the fast Fourier transform to compute rapidly a cross-correlation function, which enables the alignment between samples to be optimized. The proposed methods are compared with an existing method on a chromatographic and a mass spectral data set. Our methods are shown to be faster and to require fewer input parameters. The software implementations of the proposed alignment methods are available under the downloads section at http://ptcl.chem.ox.ac.uk/~jwong/specalign.
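The idea of segment-wise alignment by FFT cross-correlation can be illustrated with a minimal sketch. This is not the authors' software: the function names (`estimate_lag`, `shift`, `align_by_segments`), the equal-length segmentation, and the zero-filling of shifted edges are all assumptions made for illustration, and only integer shifts are considered.

```python
import numpy as np

def estimate_lag(reference, sample):
    """Estimate by how many points `sample` is delayed relative to `reference`.

    The cross-correlation is computed via the FFT; both signals are
    zero-padded to twice their length so that circular wrap-around does
    not contaminate the result.
    """
    reference = np.asarray(reference, dtype=float)
    sample = np.asarray(sample, dtype=float)
    n = len(reference)
    size = 2 * n
    ref_f = np.fft.rfft(reference, size)
    smp_f = np.fft.rfft(sample, size)
    # Cross-correlation theorem: corr = IFFT(FFT(sample) * conj(FFT(reference)))
    xcorr = np.fft.irfft(smp_f * np.conj(ref_f), size)
    lag = int(np.argmax(xcorr))
    # Indices in the upper half of the padded array are negative lags.
    return lag - size if lag >= n else lag

def shift(signal, lag):
    """Move `signal` left by `lag` points (right if negative), zero-filling the gap."""
    signal = np.asarray(signal, dtype=float)
    out = np.zeros_like(signal)
    if lag > 0:
        out[:-lag] = signal[lag:]
    elif lag < 0:
        out[-lag:] = signal[:lag]
    else:
        out[:] = signal
    return out

def align_by_segments(reference, sample, n_segments):
    """Align each segment of `sample` to the matching segment of `reference`.

    Hypothetical segmentation: `n_segments` pieces of (nearly) equal length.
    """
    reference = np.asarray(reference, dtype=float)
    sample = np.asarray(sample, dtype=float)
    aligned = np.empty_like(sample)
    bounds = np.linspace(0, len(sample), n_segments + 1).astype(int)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        lag = estimate_lag(reference[start:stop], sample[start:stop])
        aligned[start:stop] = shift(sample[start:stop], lag)
    return aligned

# Synthetic example: one Gaussian peak, delayed by 12 points in the sample.
x = np.arange(512)
ref = np.exp(-0.5 * ((x - 200) / 5.0) ** 2)
smp = np.exp(-0.5 * ((x - 212) / 5.0) ** 2)
lag = estimate_lag(ref, smp)
aligned = align_by_segments(ref, smp, n_segments=1)
```

Computing the correlation through the FFT costs O(n log n) per segment instead of the O(n²) of a direct sum over all lags, which is the kind of speed-up the abstract refers to. A practical implementation would also need to handle segment boundaries and limit the maximum allowed shift, neither of which this sketch attempts.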
Background: Polychromatic flow cytometry (PFC) allows the simultaneous determination of multiple antigens in the same cell, which generates a large number of subsets. As a consequence, data analysis is the main difficulty with this technology. Here we show the use of cluster analysis (CA) and principal component analysis (PCA) to simplify multicolour data visualization and to allow classification of subjects. Methods: By eight-colour cytofluorimetric analysis, we investigated the T cell compartment in donors of different ages (young, middle-aged, and centenarians). T cell subsets were identified by combining positive and negative expression of antigens. The resulting data set was organized into a matrix and subjected to CA and PCA. Results: CA clustered people of different ages on the basis of their cytofluorimetric profile. PCA of the cellular subsets identified centenarians within a different cluster from
In the field of food research and production, system complexity is increasing and new challenges emerge every day. This implies an urgent need to extract information and to obtain models capable of inferring the underlying relationships that link the sources of variability characterizing food or its production process (e.g. compositional profile, processing conditions) to very general end-properties of foodstuffs, such as healthiness, consumer perception, the link to a territory and the effect of the production chain itself on food.
This makes a “deductive”, theory-driven research approach inefficient, since it is often difficult to formulate hypotheses. Explorative Multivariate Data Analysis methods, together with the most recent analytical instrumentation, make it possible to return to an “inductive”, data-driven attitude that requires a minimum of a priori hypotheses and instead helps to formulate new ones from the direct observation of data.
The aim of this chapter is to offer the reader an overview of the most significant tools that can be used in a preliminary, exploratory phase, ranging from the most classical descriptive statistics methods to Multivariate Analysis methods, with particular attention to Projection methods. For all techniques, examples are given so that the reader can immediately experience the main advantage of these techniques, namely a direct, graphical representation of data and their characteristics.
Multilinear PLS (NPLS) and its discriminant version (NPLS-DA) are widely used tools for modelling multi-way data arrays. Analysis of NPLS weights and NPLS regression coefficients allows data patterns, feature correlation and covariance structure to be depicted. In this study we propose an extension of the Variable Importance in Projection (VIP) parameter to multi-way arrays, in order to highlight the features most relevant for predicting the studied dependent properties, either for interpretative purposes or for feature selection. The VIPs are implemented for each mode of the data array. For multivariate dependent responses, two cases are considered: expressing VIP with respect to each single y-variable, and taking all y-variables into account together.
Three applications to real data are presented: i) NPLS modelling of the properties of bread loaves from near-infrared spectra of dough, acquired at different leavening times and corresponding to different flour formulations, where VIP values were used to assess the spectral regions mainly involved in determining flour performance; ii) assessment of the authenticity of extra virgin olive oils by NPLS-DA of gas chromatography/mass spectrometry (GC–MS) data, where VIP values were used to assess both GC and MS discriminant features; iii) NPLS analysis of an fMRI-BOLD experiment based on a paradigm of acute prolonged pain in healthy volunteers, in order to reproduce efficiently the corresponding psychophysical pain profiles, where VIP values were used to identify the brain regions mainly involved in determining the pain intensity profile.
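For orientation, the conventional VIP definition for ordinary two-way PLS with a single response can be written as below. This is the standard formula that the study takes as its starting point, not the multi-way extension the authors propose, whose exact form is not given in the abstract. The symbols are notation introduced here: J is the number of predictor variables, A the number of latent components, w_a the weight vector of component a with elements w_ja, and SSY_a the sum of squares of y explained by component a.

```latex
\mathrm{VIP}_j \;=\; \sqrt{\frac{J \,\sum_{a=1}^{A} \mathrm{SSY}_a \left( \dfrac{w_{ja}}{\lVert \mathbf{w}_a \rVert} \right)^{2}}{\sum_{a=1}^{A} \mathrm{SSY}_a}}
```

Because the squared VIP values sum to J over all variables, their mean is 1, which is why variables with VIP greater than 1 are commonly regarded as the most relevant.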