In high reliability standards fields such as automotive, avionics or aerospace, the detection of anomalies is crucial. An efficient methodology for automatically detecting multivariate outliers is introduced. It takes advantage of the remarkable properties of the Invariant Coordinate Selection (ICS) method. Based on the simultaneous spectral decomposition of two scatter matrices, ICS leads to an affine invariant coordinate system in which the Euclidian distance corresponds to a Mahalanobis Distance (MD) in the original coordinates. The limitations of MD are highlighted using theoretical arguments in a context where the dimension of the data is large. Unlike MD, ICS makes it possible to select relevant components which removes the limitations. Owing to the resulting dimension reduction, the method is expected to improve the power of outlier detection rules such as MD-based criteria. It also greatly simplifies outliers interpretation. The paper includes practical guidelines for using ICS in the context of a small proportion of outliers which is relevant in high reliability standards fields. The choice of scatter matrices together with the selection of relevant invariant components through parallel analysis and normality tests are addressed. The use of the regular covariance matrix and the so called matrix of fourth moments as the scatter pair is recommended. This choice combines the simplicity of implementation together with the possibility to derive theoretical results. A simulation study confirms the good properties of the proposal and compares it with other scatter pairs. This study also provides a comparison with Principal Component Analysis and MD. The performance of our proposal is also 1 arXiv:1612.06118v3 [stat.ME] 2 Feb 2018 evaluated on several real data sets using a user-friendly R package accompanying the paper.
Detecting outliers in a multivariate and unsupervised context is an important and ongoing problem notably for quality control. Many statistical methods are already implemented in R and are briefly surveyed in the present paper. But only a few lead to the accurate identification of potential outliers in the case of a small level of contamination. In this particular context, the Invariant Coordinate Selection (ICS) method shows remarkable properties for identifying outliers that lie on a low-dimensional subspace in its first invariant components. It is implemented in the ICSOutlier package. The main function of the package, ics.outlier, offers the possibility of labelling potential outliers in a completely automated way. Four examples, including two real examples in quality control, illustrate the use of the function. Comparing with several other approaches, it appears that ICS is generally as efficient as its competitors and shows an advantage in the context of a small proportion of outliers lying in a low-dimensional subspace. In quality control, the method may help in properly identifying some defective products while not detecting too many false positives.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.