Detecting Deviating Data Cells

Rousseeuw, Peter J.; Bossche, Wannes Van den

doi:10.1080/00401706.2017.1340909

Cited by 88 publications

(68 citation statements)

References 25 publications


“…We have shown that exact affine equivariance must be lost, but it is a reasonable price to be paid in order to achieve an arbitrarily high breakdown for the resulting trimmed estimators. This conclusion parallels similar findings in other situations where contamination produces only a minority of “good” observations, as in the case of cellwise contamination (see, e.g., Farcomeni; Agostinelli, Leung, Yohai, & Zamar; Rousseeuw & Van den Bossche). We also support the use of adaptive trimming schemes, in order to explore the effect of different levels of trimming and to find a sensible trade‐off between robustness and efficiency.…”

Section: Discussion (supporting)

confidence: 88%

Wild adaptive trimming for robust estimation and cluster analysis

Cerioli, Farcomeni, Riani (2018), Scandinavian J Statistics


Trimming principles play an important role in robust statistics. However, their use for clustering typically requires some preliminary information about the contamination rate and the number of groups. We suggest a fresh approach to trimming that does not rely on this knowledge and that proves to be particularly suited for solving problems in robust cluster analysis. Our approach replaces the original K‐population (robust) estimation problem with K distinct one‐population steps, which take advantage of the good breakdown properties of trimmed estimators when the trimming level exceeds the usual bound of 0.5. In this setting, we prove that exact affine equivariance is lost on one hand but, on the other hand, an arbitrarily high breakdown point can be achieved by “anchoring” the robust estimator. We also support the use of adaptive trimming schemes, in order to infer the contamination rate from the data. A further bonus of our methodology is its ability to provide a reliable choice of the usually unknown number of groups.
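The basic trimming idea mentioned in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the wild adaptive trimming procedure of the paper; it is a plain trimmed mean that discards the fraction `alpha` of points farthest from the median. The function name `trimmed_mean` and the sample values are assumptions made for illustration only.

```python
import statistics


def trimmed_mean(values, alpha):
    """Mean of the points kept after discarding the fraction `alpha`
    of points that lie farthest from the sample median.

    Illustrative only: a simple 1-D trimmed location estimate,
    not the adaptive scheme described in the abstract.
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must be in [0, 1)")
    n = len(values)
    center = statistics.median(values)
    # Always keep at least one point.
    keep = max(1, n - int(alpha * n))
    # Rank points by distance to the median and keep the closest ones.
    kept = sorted(values, key=lambda v: abs(v - center))[:keep]
    return sum(kept) / len(kept)


# Hypothetical sample: five points near 10 and three gross outliers.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0, 60.0, -40.0]
plain = trimmed_mean(sample, 0.0)    # no trimming: ordinary mean
robust = trimmed_mean(sample, 0.5)   # half of the points trimmed
```

With no trimming the estimate is pulled away from the bulk of the data by the three outliers, whereas trimming half of the points leaves only values close to 10. Choosing `alpha` in advance is exactly the kind of preliminary knowledge the abstract says adaptive schemes try to avoid.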


“…Detecting cellwise outliers is a hard problem, since the outlyingness of a cell depends on the relation of its column to the other columns of the data, and on the values of the other cells in its row (some of which may be outlying themselves). The DetectDeviatingCells algorithm addresses these issues, and apart from flagging cells it also provides a graphical output called a cellmap.…”

Section: Detecting Outlying Cells (mentioning)

confidence: 99%

Anomaly detection by robust statistics

Rousseeuw, Hubert (2017), WIREs Data Min & Knowl

Self Cite

167


Real data often contain anomalous cases, also known as outliers. These may spoil the resulting analysis but they may also contain valuable information. In either case, the ability to detect such anomalies is essential. A useful tool for this purpose is robust statistics, which aims to detect the outliers by first fitting the majority of the data and then flagging data points that deviate from it. We present an overview of several robust methods and the resulting graphical outlier detection tools. We discuss robust procedures for univariate, low-dimensional, and high-dimensional data, such as estimating location and scatter, linear regression, principal component analysis, classification, clustering, and functional data analysis. Also the challenging new topic of cellwise outliers is introduced.


“…A not-so-large contaminated cell that passes the univariate filter could be flagged when viewed together with other correlated components, especially for highly correlated data. To overcome this deficiency, we introduce a consistent bivariate filter and use it in combination with UF and a new filter developed by Rousseeuw and Van den Bossche (2016) in the first step of the two-step procedure. Maronna (2015) made a remark that UF-GSE, which uses a fixed loss function ρ in the second step, cannot handle well high-dimensional casewise outliers.…”

Section: Introduction (mentioning)

confidence: 99%

Multivariate location and scatter matrix estimation under cellwise and casewise contamination

Leung, Yohai, Zamar (2017), Computational Statistics & Data Analysis


We consider the problem of multivariate location and scatter matrix estimation when the data contain cellwise and casewise outliers. Agostinelli et al. (2015b) propose a two-step approach to deal with this problem: first apply a univariate filter to remove cellwise outliers and second apply a generalized S-estimator to downweight casewise outliers. We improve this proposal in three main directions. First, we introduce a consistent bivariate filter to be used in combination with the univariate filter in the first step. Second, we propose a new fast subsampling procedure to generate starting points for the generalized S-estimator in the second step. Third, we consider a non-monotonic weight function for the generalized S-estimator to better deal with casewise outliers in high dimension. A simulation study and real data example show that, unlike the original two-step procedure, the modified two-step approach performs and scales well for high dimension. Moreover, the modified procedure outperforms the original one and other state of the art robust procedures under cellwise and casewise data contamination.

arXiv:1609.00402v2 [math.ST]

Supplementary material abstract: This supplementary material contains all the proofs, additional simulation results, and related supplementary material referenced in the article "Robust Estimation of Multivariate Location and Scatter in the Presence of Cellwise and Casewise Contamination".
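The first step of the two-step approach, a univariate filter that flags individual cells, can be sketched roughly as follows. This is a simplified stand-in and not the filter of Agostinelli et al. or the DetectDeviatingCells algorithm: each column is standardized with its median and MAD, and cells whose robust z-score exceeds a cutoff are flagged. The function name `flag_cells`, the cutoff of 3.0, and the toy data are assumptions for illustration.

```python
import statistics


def flag_cells(rows, cutoff=3.0):
    """Return a boolean matrix marking cells whose robust z-score,
    computed column by column from the median and MAD, exceeds `cutoff`.

    Simplified illustration of a univariate cellwise filter; it ignores
    correlations between columns, which bivariate filters address.
    """
    columns = list(zip(*rows))
    flags = [[False] * len(columns) for _ in rows]
    for j, col in enumerate(columns):
        med = statistics.median(col)
        # 1.4826 makes the MAD consistent for the normal distribution.
        mad = 1.4826 * statistics.median(abs(v - med) for v in col)
        if mad == 0:
            continue  # constant column: nothing to flag in this sketch
        for i, v in enumerate(col):
            flags[i][j] = abs(v - med) / mad > cutoff
    return flags


# Hypothetical data: two rows each contain a single contaminated cell.
data = [
    [1.0, 10.0],
    [1.1, 10.2],
    [0.9, 9.9],
    [1.2, 50.0],   # second cell is contaminated
    [8.0, 10.1],   # first cell is contaminated
    [1.0, 9.8],
]
flags = flag_cells(data)
```

In this toy example only the two contaminated cells are flagged, so the clean cells in the same rows remain usable, which is the motivation for cellwise rather than casewise treatment. A cell that is moderate in its own column but inconsistent with a correlated column would pass this filter, which is the deficiency the bivariate filter in the abstract is meant to address.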
