Using Angles to Identify Concentrated Multivariate Outliers

Ruiz, Jesús Juan; Prieto, Francisco J.

doi:10.1198/004017001316975907

Cited by 21 publications

(16 citation statements)

References 13 publications

“…The performance of these methods is documented in the article under discussion. Another attempt in this direction was given by Juan and Prieto (2001), who tried to find outliers by looking at the angles subtended by clusters of points projected on an ellipsoid. They showed reasonable performance of this method but provided computations only for very concentrated outliers (λ = .01).…”

Section: A3 An Expression For ƒ Z (mentioning)

confidence: 99%

Multivariate Outlier Detection and Robust Covariance Matrix Estimation

2001

Self Cite

In this article, we present a simple multivariate outlier-detection procedure and a robust estimator for the covariance matrix, based on the use of information obtained from projections onto the directions that maximize and minimize the kurtosis coefficient of the projected data. The properties of this estimator (computational cost, bias) are analyzed and compared with those of other robust estimators described in the literature through simulation studies. The performance of the outlier-detection procedure is analyzed by applying it to a set of well-known examples.

KEY WORDS: Kurtosis; Linear projection; Multivariate statistics.

The detection of outliers in multivariate data is recognized to be an important and difficult problem in the physical, chemical, and engineering sciences. Whenever multiple measurements are obtained, there is always the possibility that changes in the measurement process will generate clusters of outliers. Most standard multivariate analysis techniques rely on the assumption of normality and require the use of estimates for both the location and scale parameters of the distribution. The presence of outliers may distort arbitrarily the values of these estimators and render meaningless the results of the application of these techniques. According to Rocke and Woodruff (1996), the problem of the joint estimation of location and shape is one of the most difficult in robust statistics. Wilks (1963) proposed identifying sets of outliers of size j in normal multivariate data by checking the minimum values of the ratios $|A_{(I)}|/|A|$, where $|A_{(I)}|$ is the internal scatter of a modified sample in which the set of observations I of size j has been deleted and $|A|$ is the internal scatter of the complete sample. The internal scatter is proportional to the determinant of the covariance matrix and the ratios are computed for all possible sets of size j. Wilks computed the distribution of the statistic for j equal to 1 and 2. It is well known that this procedure is a likelihood ratio test and that for j = 1 the method is equivalent to selecting the observation with the largest Mahalanobis distance from the center of the data.

Because a direct extension of this idea to sets of outliers larger than 2 or 3 is not practical, Gnanadesikan and Kettenring (1972) proposed to reduce the multivariate detection problem to a set of univariate problems by looking at projections of the data onto some direction. They chose the direction of maximum variability of the data and, therefore, they proposed to obtain the principal components of the data and search for outliers in these directions. Although this method provides the correct solution when the outliers are located close to the directions of the principal components, it may fail to identify outliers in the general case.

An alternative approach is to use robust location and scale estimators. Maronna (1976) studied affinely equivariant M estimators for covariance matrices, and Campbell (1980) proposed using the Mahalanobis distance computed using M estimators for the mean...
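The equivalence stated in the abstract for j = 1 can be checked numerically. The sketch below is illustrative only: it is restricted to two variables so that it needs nothing beyond the Python standard library, the data are simulated, and all function names (`scatter_matrix`, `wilks_ratios`, `mahalanobis_sq`) are hypothetical rather than taken from any of the cited papers. It relies on the identity $|A_{(i)}|/|A| = 1 - n D_i^2/(n-1)^2$, where $D_i^2$ is the squared Mahalanobis distance of observation i computed with the sample mean and sample covariance.

```python
import random


def scatter_matrix(points):
    """Return the mean and the 2x2 internal scatter A = sum (x - mean)(x - mean)^T."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    return (mx, my), (sxx, sxy, syy)


def det2(a):
    """Determinant of a symmetric 2x2 matrix stored as (a11, a12, a22)."""
    sxx, sxy, syy = a
    return sxx * syy - sxy * sxy


def wilks_ratios(points):
    """Wilks ratio |A_(i)| / |A| for every single-observation deletion (j = 1)."""
    _, full = scatter_matrix(points)
    full_det = det2(full)
    ratios = []
    for i in range(len(points)):
        reduced = points[:i] + points[i + 1:]
        _, a_i = scatter_matrix(reduced)
        ratios.append(det2(a_i) / full_det)
    return ratios


def mahalanobis_sq(points):
    """Squared Mahalanobis distances using the sample mean and covariance A / (n - 1)."""
    n = len(points)
    (mx, my), (sxx, sxy, syy) = scatter_matrix(points)
    det = det2((sxx, sxy, syy))
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T A^{-1} d for the 2x2 case, written out explicitly
        q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        out.append((n - 1) * q)
    return out


# Simulated sample (an assumption for illustration): 30 clean points plus one outlier.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
data.append((6.0, -5.0))

ratios = wilks_ratios(data)
d2 = mahalanobis_sq(data)
flagged = ratios.index(min(ratios))  # observation with the smallest Wilks ratio
```

Because the ratio is a decreasing function of $D_i^2$, the observation with the smallest ratio is the one with the largest Mahalanobis distance. For sets of size j > 1 the loop would have to run over all subsets, which is the cost that motivates the projection-based alternatives discussed in the abstract.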

Section: A3 An Expression For ƒ Z (mentioning)

confidence: 99%

“…The greatest chance of success comes from use of multiple methods, at least one of which is a general-purpose method such as FAST-MCD and MULTOUT, and at least one of which is meant for clustered outliers, such as kurtosis1, the angle method of Juan and Prieto (2001), or our clustering method (Rocke and Woodruff 2001).…”

Section: A3 An Expression For ƒ Z (mentioning)

confidence: 99%

Multivariate Outlier Detection and Robust Covariance Matrix Estimation

2001

Self Cite

Section: Introduction (mentioning)

confidence: 99%

“…Requirements for such detectors include a low input-output processing time, low computational time, and minimal number of user-specified parameters. Several types of anomaly detectors that have been considered for real-time implementation are based on linear mixing models [3-8], kernel functions [3, 9, 10], robust distance [11-13], angle [14, 15], and statistical distance [1, 16-18]. One type of anomaly detectors based on linear mixing models requires prior knowledge of the spectra of the end members. One of the difficulties with this type of anomaly detectors is that the spectra of the end members for the complex background are contaminated with background spectra and do not really resemble the actual background spectra.…”

mentioning

confidence: 99%

Maximized subspace model for hyperspectral anomaly detection

2011

Pattern Anal Applic

An important application in remote sensing using hyperspectral imaging system is the detection of anomalies in a large background in real-time. A basic anomaly detector for hyperspectral imagery that performs reasonably well is the RX detector. In practice, the subspace RX (SSRX) detector which deletes the clutter subspace has been known to perform better than the RX detector. In this paper an anomaly detector that can do better than the SSRX detector without having to delete the clutter subspace is developed. The anomaly detector partials out the effect of the clutter subspace by predicting the background using a linear combination of the clutter subspace. The Mahalanobis distance of the resulting residual is defined as the anomaly detector. The coefficients of the linear combination are chosen to maximize a criterion based on squared correlation. The experimental results are obtained by implementing the anomaly detector as a global anomaly detector in unsupervised mode with background statistics computed from hyperspectral data cubes with wavelengths in the visible and near-infrared range. The results show that the anomaly detector has a better performance than the SSRX detector. In conclusion, the anomaly detector that is based on partialling out can achieve better performance than the conventional anomaly detectors.

Keywords: Anomaly detection · Hyperspectral imaging · Remote sensing

Background

Anomaly detection in hyperspectral imagery refers to a detection algorithm that identifies an abnormal pixel that deviates significantly from a population of normal pixels in a data cube. Hyperspectral imaging is particularly useful in detecting anomalous man-made objects in a large natural background, especially in real time. As hyperspectral imaging advances, the spectral bands will become narrower, the size of the data cube will increase, and the real-time processing challenge will continue. Hardware designs that result in higher rates of data throughput are one approach while better, more efficient anomaly detection algorithms are another. It is computationally intensive just to compute a sample covariance from a data cube. Developing efficient algorithms that are appropriate for anomaly detection will continue to be a challenging task.

Some common anomaly detectors for hyperspectral imagery are discussed in [1, 2]. Of particular interest in this paper are hyperspectral anomaly detectors that are appropriate for implementation in real-time mode. Requirements for such detectors include a low input-output processing time, low computational time, and minimal number of user-specified parameters. Several types of anomaly detectors that have been considered for real-time implementation are based on linear mixing models [3-8], kernel functions [3, 9, 10], robust distance [11-13], angle [14, 15], and statistical distance [1, 16-18]. One type of anomaly detectors based on linear mixing models requires prior knowledge of the spectra of the end members. One of the difficulties with this type of anomaly detectors is that the s...

“…It is well known that a few outliers in the data may arbitrarily distort the sample mean and the sample covariance matrix, therefore, the robust estimation of location and shape is a crucial problem in multivariate statistics. Several robust estimates have been proposed, see Gnanadesikan and Kettenring (1972), Maronna (1976), Stahel (1981), Donoho (1982), Rousseeuw (1985), Davies (1987), Rousseeuw and van Zomeren (1990), Tyler (1991, 1994), Hadi (1992), Cook, Hawkins, and Weisberg (1993), Rocke and Woodruff (1993, 1996), Atkinson (1994), Hawkins (1994), Maronna and Yohai (1995), Agulló (1996), Rousseeuw and van Driessen (1999), Becker and Gather (2001), Peña and Prieto (2001a), Juan and Prieto (2001), Hawkins and Olive (2002), and Maronna and Zamar (2002) and the references therein. For high-dimensional large datasets a useful way to avoid the curse of dimensionality in data mining applications is to search for outliers in univariate projections of the data.…”

Section: Introduction (mentioning)

confidence: 99%

Combining Random and Specific Directions for Outlier Detection and Robust Estimation in High-Dimensional Multivariate Data

Peña

Prieto

2007

Journal of Computational and Graphical Statistics

Self Cite

A powerful procedure for outlier detection and robust estimation of shape and location with multivariate data in high dimension is proposed. The procedure searches for outliers in univariate projections on directions that are obtained both randomly, as in the Stahel-Donoho method, and by maximizing and minimizing the kurtosis coefficient of the projected data, as in the Peña and Prieto method. We propose modifications of both methods to improve their computational efficiency and combine them in a procedure which is affine equivariant, has a high breakdown point, is fast to compute and can be applied when the dimension is large. Its performance is illustrated with a Monte Carlo experiment and in a real dataset.
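The random-direction half of this idea can be illustrated with a simplified Stahel-Donoho style outlyingness score. This is a rough sketch under stated assumptions, not the procedure proposed in the article: it omits the kurtosis-based directions and the authors' efficiency modifications, the data are simulated, and the names `random_direction` and `projection_outlyingness` as well as the default of 200 directions are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def random_direction(dim, rng):
    """Draw a unit vector uniformly on the sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def projection_outlyingness(points, n_directions=200, seed=0):
    """For each point, the largest robust z-score |proj - median| / MAD
    observed over a set of random univariate projections."""
    rng = random.Random(seed)
    dim = len(points[0])
    worst = [0.0] * len(points)
    for _ in range(n_directions):
        d = random_direction(dim, rng)
        proj = [sum(c * w for c, w in zip(p, d)) for p in points]
        med = statistics.median(proj)
        mad = statistics.median([abs(z - med) for z in proj])
        if mad == 0.0:
            continue  # degenerate projection, skip this direction
        for i, z in enumerate(proj):
            worst[i] = max(worst[i], abs(z - med) / mad)
    return worst


# Simulated sample (an assumption for illustration): 40 clean points in three
# dimensions followed by a tight cluster of 5 concentrated outliers.
rng = random.Random(1)
clean = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
cluster = [[8.0 + rng.gauss(0, 0.1) for _ in range(3)] for _ in range(5)]
scores = projection_outlyingness(clean + cluster)
```

Points whose score is large relative to the bulk of the sample are candidate outliers. Random directions alone become less reliable as the dimension grows, since the chance of drawing a direction close to the one that separates the outliers shrinks, which is consistent with the abstract's motivation for adding directions chosen by a specific criterion.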
