Ultraperformance liquid chromatography quadrupole time-of-flight mass spectrometry (UPLC-QTOF-MS) was used for geographical origin discrimination of hazelnuts (Corylus avellana L.). Four different LC-MS methods for polar and nonpolar metabolites were evaluated for their discrimination ability. The most suitable method was used to analyze 196 authentic samples from harvest years 2014 and 2015 (Germany, France, Italy, Turkey, Georgia), selecting and identifying 20 key metabolites with significant differences in abundance (5 phosphatidylcholines, 3 phosphatidylethanolamines, 4 diacylglycerols, 7 triacylglycerols, and γ-tocopherol). Classification models were created using soft independent modeling of class analogy (SIMCA), linear discriminant analysis based on principal component analysis (PCA-LDA), support vector machine classification (SVM), and a customized statistical model based on confidence intervals of selected metabolite levels. The best result, a training accuracy of 99.5%, was obtained by combining SVM and SIMCA. Forty nonauthentic hazelnut samples were subsequently used to estimate the prediction capacity of the models as realistically as possible.
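The abstract does not specify how the customized interval-based model works, so the following is only a minimal sketch of one plausible reading: each country gets a per-metabolite interval, and a sample is assigned to the country whose intervals contain most of its metabolite levels. The interval definition (mean ± k standard deviations), the multiplier `k`, the function names, and the toy values are all assumptions made for illustration and are not taken from the study.

```python
from statistics import mean, stdev

def fit_intervals(samples_by_class, k=2.0):
    """Build one (low, high) interval per feature and class.

    Assumption: the interval is mean +/- k * sample standard deviation;
    the study's actual interval definition is not given in the abstract.
    """
    intervals = {}
    for label, rows in samples_by_class.items():
        columns = list(zip(*rows))  # one tuple of values per feature
        intervals[label] = [
            (mean(c) - k * stdev(c), mean(c) + k * stdev(c)) for c in columns
        ]
    return intervals

def classify(sample, intervals):
    """Return the class whose intervals contain the most feature values."""
    def hits(label):
        return sum(lo <= x <= hi for x, (lo, hi) in zip(sample, intervals[label]))
    # On a tie, max() keeps the first class in insertion order.
    return max(intervals, key=hits)

# Toy data: two hypothetical classes, two hypothetical metabolite levels.
training = {
    "A": [[1.0, 5.0], [1.2, 5.2], [0.8, 4.8]],
    "B": [[3.0, 2.0], [3.2, 2.2], [2.8, 1.8]],
}
model = fit_intervals(training)
predicted = classify([1.1, 5.1], model)
```

A sample that falls outside every interval of every class still receives a label here; a real model would likely need a "no match" outcome for such cases, for example when screening nonauthentic samples.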
A total of 262 authentic samples were analyzed by ¹H NMR spectroscopy for the geographical discrimination of hazelnuts (Corylus avellana L.), covering samples from five countries (Germany, France, Georgia, Italy, and Turkey) and the harvest years 2013–2016. This article describes method development, starting with an extraction protocol suitable for separating polar and nonpolar metabolites and for reducing macromolecular components. Using the polar fraction for data analysis, principal component analysis was applied to monitor sample preparation and measurement. Several machine learning algorithms were tested to build a classification model. The best results were obtained by linear discriminant analysis combined with a random subspace algorithm. Dividing the samples into a training set and a test set yielded a cross-validation accuracy of 91% for the training set and an accuracy of 96% for the test set. Key features were identified by the Kruskal–Wallis test and the t test. A feature assigned to betaine is significant for the classification of all five countries and is considered a possible candidate for the development of targeted approaches. Further, the results were compared to a previously published study based on LC–MS analysis of nonpolar metabolites. In summary, this study shows the robustness and high accuracy of a discrimination model based on NMR analysis of polar metabolites.
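The Kruskal–Wallis screening of features mentioned above can be illustrated with a small sketch that computes the H statistic for one feature across several groups. This is not the authors' code: the function names and the toy values are hypothetical, and the tie correction factor is omitted for brevity (ties only receive average ranks). In practice an established implementation such as `scipy.stats.kruskal` would be the more usual choice.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic without the tie correction factor."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

# Toy intensities of one hypothetical spectral feature in three groups.
h_separated = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
h_mixed = kruskal_wallis_h([[1, 4], [2, 3]])
```

A large H indicates that at least one group differs in rank distribution from the others; when many features are screened this way, the resulting p-values would normally need a multiple-testing correction.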
Fourier-transform near-infrared (FT-NIR) spectroscopy was used to determine the geographical origin of 233 hazelnut samples of various varieties from five different countries (Germany, France, Georgia, Italy, Turkey). The experimental determination of the geographical origin of hazelnuts is important because there are usually large price differences between the producer countries and thus a risk of food fraud that should not be underestimated. The present work is a feasibility study using a low-cost method, as high-field NMR and UPLC-QTOF-MS have already been used for this question. Sample sets were split with repeated nested cross-validation, and an ensemble of discriminant classifiers with random subspaces was used to build the classification models. By using a preprocessing strategy consisting of multiplicative scatter correction, bucketing, and averaging of five measured spectra per sample, a test accuracy of 90.6 ± 3.9% was achieved, which rivals results obtained with much more expensive infrastructure. The application of the feature selection approach surrogate minimal depth showed that the successful classification is mainly caused by protein signals. In addition, a low-level data fusion of the NIR and NMR data was performed to assess how well the two methods complement each other. The data fusion was compared to a complementary approach, in which the classification results based on the individual NIR and NMR models were jointly examined. The data fusion performed better than the individual methods, with a test accuracy of 96.6 ± 2.8%. A comparison of the outliers in all classification models shows that the same samples are conspicuous in every case, indicating that robust classification models are obtained.
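The preprocessing chain named above (multiplicative scatter correction, bucketing, and averaging of replicate spectra) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's pipeline: the mean spectrum of the replicates is used as the MSC reference, buckets are fixed-width means, the steps are applied in the order listed, and all function names, the bucket width, and the toy spectra are hypothetical.

```python
from statistics import mean

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is fitted as x = intercept + slope * reference by
    least squares and then mapped back via (x - intercept) / slope.
    """
    reference = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum(
            (r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)
        ) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected

def bucket(spectrum, width):
    """Reduce resolution by averaging consecutive points in fixed-width bins."""
    return [mean(spectrum[i:i + width]) for i in range(0, len(spectrum), width)]

def average_replicates(spectra):
    """Point-wise mean of replicate spectra of one sample."""
    return [mean(col) for col in zip(*spectra)]

def preprocess(replicates, width):
    """Apply MSC, then bucketing, then replicate averaging."""
    bucketed = [bucket(s, width) for s in msc(replicates)]
    return average_replicates(bucketed)

# Toy replicates: one base shape with different offsets and scalings,
# mimicking additive and multiplicative scatter effects.
base = [1.0, 2.0, 4.0, 3.0]
replicates = [[a + b * x for x in base] for a, b in [(0.0, 1.0), (0.5, 2.0), (-0.2, 0.5)]]
corrected = msc(replicates)
features = preprocess(replicates, width=2)
```

Because the toy replicates differ only by an offset and a scale factor, MSC maps all of them onto the same corrected spectrum; real NIR spectra would retain residual differences after correction.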