In the last decade, multivariate statistical techniques originally developed for analytical chemistry have been widely adopted in food science and technology. Chemometrics is usually applied when a dataset is large and complex in terms of sample numbers, types, and responses. The results are used to authenticate geographical origin or farming systems, or to trace adulteration of high value-added commodities. In this article, we provide an extensive, practical, and pragmatic overview of the main chemometric tools in food science studies, focusing on the effects of process variables on chemical composition and on the authentication of foods based on chemical markers. Unsupervised pattern recognition methods, such as principal component analysis and cluster analysis, have been used to associate the level of bioactive components with in vitro functional properties, whereas supervised multivariate statistical methods have been used for authentication purposes. Overall, chemometrics is a useful aid when extensive, multiple, and complex real-life problems need to be addressed in a multifactorial and holistic context. Chemometrics should certainly be used by governmental bodies and industries that need to monitor the quality of foods, raw materials, and processes when high-dimensional data are available. We focus on practical examples and list the pros and cons of the most commonly used chemometric tools, to help the user choose the most appropriate statistical approach for the analysis of complex multivariate data.
Chemometrics has achieved major recognition and progress in the analytical chemistry field. In the first part of this tutorial, major achievements and contributions of chemometrics to some of the more important stages of the analytical process, like experimental design, sampling, and data analysis (including data pretreatment and fusion), are summarised. The tutorial is intended to give a general updated overview of the chemometrics field to further contribute to its dissemination and promotion in analytical chemistry.
In projection methods (PCA, PLS), two distance measures are of importance: the score distance (SD, a.k.a. leverage) and the orthogonal distance (OD, a.k.a. the residual variance). This paper shows that both distance measures can be modeled by the χ²-distribution. Each model includes a scaling factor that can be described by an explicit equation. Moreover, the models depend on an unknown number of degrees of freedom, which has to be estimated from a training dataset. This modeling is then applied to classification within the SIMCA framework, and various acceptance areas are built for a given significance level. A triangular area, constructed from the sum of the normalized SD and OD, is deemed the most practical. The theory is supported by three examples: the first is based on a simulated dataset, while the other two employ real-world data.
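The two distances described above can be sketched numerically. The following is a minimal illustration, not the paper's exact procedure: it fits a PCA model by SVD, computes each sample's score distance (a Mahalanobis-type distance in the score space) and orthogonal distance (the squared residual off the model plane), estimates the χ² degrees of freedom by a simple method-of-moments rule (an assumption standing in for the paper's estimators), and builds the triangular acceptance area by thresholding the sum of the normalized distances against a single χ² cutoff. The dataset, component count, and significance level are all illustrative choices.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Simulated training set: 100 samples, 5 correlated variables
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))

# Fit a k-component PCA model on mean-centered data via SVD
k = 2
mu = X.mean(axis=0)
Xc = X - mu
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:k].T                        # loadings (5 x k)
T = Xc @ P                          # scores (100 x k)
lam = (s[:k] ** 2) / (len(X) - 1)   # per-component score variances

# Score distance: variance-scaled squared distance in the score space
sd = np.sum(T**2 / lam, axis=1)

# Orthogonal distance: squared residual outside the model plane
E = Xc - T @ P.T
od = np.sum(E**2, axis=1)

def dof_estimate(d):
    # Method-of-moments stand-in for the paper's degrees-of-freedom
    # estimator: for a scaled chi-squared variable, df = 2*mean^2/var
    return max(1, round(2 * d.mean() ** 2 / d.var()))

n_sd, n_od = dof_estimate(sd), dof_estimate(od)

# Triangular acceptance area: bound the sum of the two normalized
# distances by one chi-squared quantile at significance level 0.05
c = n_sd * sd / sd.mean() + n_od * od / od.mean()
cutoff = chi2.ppf(0.95, df=n_sd + n_od)
accepted = c <= cutoff
```

Samples with `accepted == False` fall outside the acceptance area and would be rejected by the (one-class) SIMCA classifier at the chosen significance level; the same `sd`/`od` normalization would be applied to new samples projected onto the training model.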