Sample outlier detection is imperative before calculating a multivariate calibration model. Outliers, especially in high-dimensional space, can be difficult to detect. The outlier measures Hotelling's t-squared, Q-residuals, and Studentized residuals are standard in analytical chemistry with spectroscopic data. However, these and other merits depend on tuning parameters and are sensitive to the outliers themselves, i.e., the measures are susceptible to swamping and masking. Additionally, different samples are often flagged as outliers depending on the outlier measure used. Sum of ranking differences (SRD) is a new generic fusion tool that can simultaneously evaluate multiple outlier measures across windows of tuning parameter values, thereby simplifying and improving outlier detection. Presented in this paper is SRD to detect multiple outliers despite the effects of masking and swamping. Spectral outliers (x-outliers) and analyte outliers (y-outliers) can be detected separately or in tandem with SRD using respective merits. Unique to SRD are fusion verification processes to confirm samples flagged as outliers. The SRD process also allows for sample masking checks. Presented, and used by SRD, are several new outlier detection measures. These measures include atypical uses of Procrustes analysis and extended inverted signal correction (EISC). The methodologies are demonstrated on two near-infrared (NIR) data sets.
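The ranking step behind SRD can be illustrated with a minimal sketch. This is not the paper's procedure: the abstract does not say how samples, outlier measures, and tuning parameter windows are arranged, so the layout here (rows as ranked objects, columns as the items being compared), the row-mean reference, and the function names `average_ranks` and `sum_of_ranking_differences` are all assumptions made for illustration.

```python
import numpy as np

def average_ranks(v):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    # Replace the ranks inside each group of ties by the group mean.
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return (sums / counts)[inverse]

def sum_of_ranking_differences(values, reference=None):
    """Return one SRD value per column of `values`.

    values: array of shape (n_rows, n_columns).
    reference: optional reference vector of length n_rows; the row mean is
    used when none is given (an assumption, not taken from the abstract).
    """
    values = np.asarray(values, dtype=float)
    if reference is None:
        reference = values.mean(axis=1)
    ref_ranks = average_ranks(reference)
    col_ranks = np.column_stack([average_ranks(col) for col in values.T])
    # SRD: sum of absolute rank differences between each column and the reference.
    return np.abs(col_ranks - ref_ranks[:, None]).sum(axis=0)

if __name__ == "__main__":
    demo = np.array([[1.0, 1.0, 3.0],
                     [2.0, 2.0, 2.0],
                     [3.0, 3.0, 1.0]])
    print(sum_of_ranking_differences(demo))
```

A column whose ranking matches the reference gets an SRD of zero, and larger values indicate greater disagreement with the consensus ranking.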
Classifying samples into categories is a common problem in analytical chemistry and other fields. Classification is usually based on only one method, but numerous classifiers are available, with some being complex, such as neural networks, and others being simple, such as k-nearest neighbors. Regardless, most classification schemes require optimization of one or more tuning parameters for best classification accuracy, sensitivity, and specificity. A process not requiring exact selection of tuning parameter values would be useful. To improve classification, several ensemble approaches have been used in past work to combine classification results from multiple optimized single classifiers. The collection of classifications for a particular sample is then combined by a fusion process such as majority vote to form the final classification. Presented in this Article is a method to classify a sample by combining multiple classification methods without specifically classifying the sample by each method, that is, the classification methods are not optimized. The approach is demonstrated on three analytical data sets. The first is a beer authentication set with samples measured on five instruments, allowing fusion of multiple instruments in three ways. The second data set is composed of textile samples from three classes based on Raman spectra. This data set is used to demonstrate the ability to classify simultaneously with different data preprocessing strategies, thereby reducing the need to determine the ideal preprocessing method, a common prerequisite for accurate classification. The third data set contains three wine cultivars for three classes measured on 13 unique chemical and physical variables. In all cases, fusion of nonoptimized classifiers improves classification. Also presented are atypical uses of Procrustes analysis and extended inverted signal correction (EISC) for distinguishing sample similarities to respective classes.
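For contrast with the fusion method presented in the Article, the majority-vote step used by earlier ensemble approaches can be sketched as follows. This illustrates only that prior-work baseline, not the Article's method; the function name `majority_vote`, the integer label encoding, and the tie-breaking rule are assumptions.

```python
import numpy as np

def majority_vote(labels):
    """Fuse class labels from several classifiers by majority vote.

    labels: array of shape (n_classifiers, n_samples) with integer class labels.
    Returns one fused label per sample. Ties go to the smallest label, which is
    an arbitrary choice made for this sketch.
    """
    labels = np.asarray(labels)
    fused = []
    for votes in labels.T:  # one column of votes per sample
        classes, counts = np.unique(votes, return_counts=True)
        fused.append(classes[np.argmax(counts)])
    return np.array(fused)

if __name__ == "__main__":
    votes = np.array([[0, 1, 2],
                      [0, 1, 1],
                      [1, 1, 2]])
    print(majority_vote(votes))
```

Each single classifier must already have produced a label here, which is exactly the requirement the Article's approach is designed to avoid.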
Synchronous fluorescence spectroscopy (SFS) is used for quantitative analysis as well as for qualitative analysis, such as with classification methods. With SFS, determination of a useful wavelength interval between the excitation and emission wavelengths (Δλ) is required. There are a multitude of Δλ intervals that can be evaluated, and selecting the best one is complex. Presented here is a fusion approach for combining Δλ intervals, thereby negating the need for a skilled operator to perform the selection. To demonstrate the feasibility of omitting selection of the best Δλ interval, adulterated argan oil samples are studied. Argan oil is made from the argan tree, endemic to southwestern Morocco, and is well-known for its cosmetic, pharmaceutical, and nutritional applications. It is considered a luxury product and exported from Morocco around the world. Consequently, detection of argan oil adulteration followed by quantitative analysis of the adulterant concentration is important. This study uses fusion of SFS spectra obtained at ten Δλ intervals to first detect adulteration of argan oil by corn oil and then determine the corn oil content. For detection of adulteration, 15 one-class classification methods were used simultaneously over the ten Δλ sets of SFS spectra. For tuning parameter dependent classifiers such as Mahalanobis distance, non-optimized classifiers are used. Raw classification values are used, removing the need to set classifier-dependent threshold values, albeit, ultimately, a fusion decision rule is needed for classification. For quantitative analysis, two calibration approaches are evaluated with fusion of these ten Δλ SFS spectral data sets. One is multivariate calibration by partial least squares (PLS). The second approach is a univariate calibration process where the SFS spectra are summed over respective SFS spectral ranges, also known as the area under the curve (AUC). For adulteration detection and quantitation of the corn oil, prediction errors decrease with fusion compared to individually using the ten Δλ interval SFS specific data sets. For this argan oil data set, the AUC method generally provides equivalent prediction errors to PLS.
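The univariate AUC calibration described above can be sketched as summing each spectrum and fitting a straight line against concentration. This covers a single Δλ data set only; how the ten Δλ sets are fused is not described in the abstract and is not attempted here. The function names `fit_auc_calibration` and `predict_auc` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def fit_auc_calibration(spectra, concentrations):
    """Fit concentration = slope * AUC + intercept.

    spectra: array of shape (n_samples, n_wavelengths).
    The AUC is taken as the plain sum of intensities over the spectral range.
    """
    auc = np.asarray(spectra, dtype=float).sum(axis=1)
    slope, intercept = np.polyfit(auc, np.asarray(concentrations, dtype=float), 1)
    return slope, intercept

def predict_auc(spectra, slope, intercept):
    """Predict concentrations for new spectra from the fitted line."""
    return slope * np.asarray(spectra, dtype=float).sum(axis=1) + intercept

if __name__ == "__main__":
    # Synthetic spectra: intensity scales with concentration plus a constant offset.
    profile = np.array([1.0, 2.0, 3.0])
    conc = np.array([0.0, 1.0, 2.0, 3.0])
    spectra = conc[:, None] * profile + 0.5
    slope, intercept = fit_auc_calibration(spectra, conc)
    print(predict_auc(spectra, slope, intercept))
```

Because the model has only two parameters, it needs no latent-variable selection, which is one plausible reason it can be attractive when it performs comparably to PLS.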
Calibration maintenance is an important aspect of multivariate calibration. With spectral measurements, the goal of calibration maintenance is to sustain the predictability of a primary calibration model in new secondary conditions. Among the many methodologies, penalty‐based Tikhonov regularization variants have been successful by augmenting the primary calibration data with a matrix of just a few secondary samples as well as by operating with an additional sparse penalty to include wavelength selection. Studied in this paper is a new sample‐wise (local) Tikhonov regularization–based penalty calibration approach. Penalized is a diagonal matrix with the residual vector (relative to the primary calibration space) of the new secondary sample. Thus, the same full calibration set is used for each new sample. Changing for each secondary sample is the corresponding sample‐wise residual vector on the penalized diagonal matrix. The intent of the presented approach is to form sample‐wise regression vectors desensitized to characteristics of the new sample not present in the primary calibration set. The more distinct the secondary conditions are relative to the primary conditions, the less successful this local model updating becomes. Proposed is a sample‐wise outlier mechanism to discern when the residual penalty can or cannot be used to form a useful updated model. The residual penalty modeling and outlier detection processes require tuning parameter optimizations. A fusion approach is used to automatically select tuning parameter values. Simulated and near‐infrared data are evaluated, demonstrating the applicability of the method.
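One way to read the residual penalty is as a generalized ridge problem, minimizing ||Xb - y||² + λ²||diag(r)b||², where r is the new sample's residual relative to the primary calibration space. The sketch below follows that reading and is not the paper's formulation: defining the primary space by the k leading right singular vectors of X, omitting mean-centering, and the names `residual_to_primary_space` and `local_tikhonov` are all assumptions.

```python
import numpy as np

def residual_to_primary_space(X, x_new, k):
    """Residual of x_new after projection onto the space spanned by the
    k leading right singular vectors of X (assumed primary space)."""
    _, _, vt = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    V = vt[:k].T
    x_new = np.asarray(x_new, dtype=float)
    return x_new - V @ (V.T @ x_new)

def local_tikhonov(X, y, residual, lam):
    """Solve min ||X b - y||^2 + lam^2 ||diag(residual) b||^2.

    The penalty is added as extra rows of an augmented least-squares system,
    so wavelengths with a large residual receive heavily shrunk coefficients.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    penalty = lam * np.diag(np.asarray(residual, dtype=float))
    A = np.vstack([X, penalty])
    rhs = np.concatenate([y, np.zeros(X.shape[1])])
    b, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0])
    # Hypothetical residual: only the last wavelength is unlike the calibration set.
    r = np.array([0.0, 0.0, 0.0, 2.0])
    print(local_tikhonov(X, y, r, lam=0.0))
    print(local_tikhonov(X, y, r, lam=10.0))
```

With λ = 0 the solution reduces to ordinary least squares on the primary set, and a fresh regression vector is computed for every new sample because r changes while X and y stay fixed.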