To perform multiple regression, the least squares estimator is commonly used. However, this estimator is not robust to outliers. Therefore, robust methods such as S-estimation have been proposed. These estimators flag any observation with a large residual as an outlier and downweight it in the further procedure. However, a large residual may be caused by an outlier in only one single predictor variable, and downweighting the complete observation results in a loss of information. Therefore, we propose the shooting S-estimator, a regression estimator that is especially designed for situations where a large number of observations suffer from contamination in a small number of predictor variables. The shooting S-estimator combines the ideas of the coordinate descent algorithm with simple S-regression, which makes it robust against componentwise contamination, at the cost of failing the regression equivariance property.
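The coordinate descent ("shooting") loop that this estimator builds on can be sketched as follows. This is only an illustration under stated assumptions: each one-dimensional update here uses ordinary least squares through the origin, which is not robust, and the names `shooting`, `ls_slope` and `simple_fit` are hypothetical. In the shooting S-estimator the `simple_fit` step would be a simple S-regression, which is not implemented here.

```python
import random

def ls_slope(x, r):
    """Least squares slope of r on x through the origin (non-robust placeholder)."""
    return sum(xi * ri for xi, ri in zip(x, r)) / sum(xi * xi for xi in x)

def shooting(columns, y, simple_fit=ls_slope, sweeps=200):
    """Coordinate descent: update one coefficient at a time via a simple regression.

    columns: list of p predictor columns, each a list of n values.
    simple_fit: hypothetical hook; a robust simple regression would be
                plugged in here instead of least squares.
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta, with beta = 0 initially
    for _ in range(sweeps):
        for j in range(p):
            xj = columns[j]
            # Partial residuals: add back the contribution of predictor j.
            partial = [resid[i] + beta[j] * xj[i] for i in range(n)]
            beta[j] = simple_fit(xj, partial)
            resid = [partial[i] - beta[j] * xj[i] for i in range(n)]
    return beta

# Noise-free toy data, so the least squares solution equals true_beta.
rng = random.Random(0)
n = 60
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
true_beta = [1.5, -2.0, 0.5]
y = [sum(b * col[i] for b, col in zip(true_beta, columns)) for i in range(n)]
beta_hat = shooting(columns, y)
```

Because every update touches a single predictor, a robust `simple_fit` only has to deal with outliers in that one variable, which is the intuition behind robustness to componentwise contamination.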
Spearman's rank correlation is a robust alternative to the standard correlation coefficient. By using ranks instead of the actual values of the observations, the impact of outliers remains limited. In this paper, we study an estimator based on this rank correlation measure for estimating covariance matrices and their inverses. The resulting estimator is robust and consistent at the normal distribution. By applying the graphical lasso, the inverse covariance matrix estimator is positive definite even if more variables than observations are available in the data set. Moreover, it will contain many zeros, and is therefore said to be sparse. Instead of Spearman's rank correlation, one can use the quadrant correlation or Gaussian rank scores. A simulation study compares the different estimators. This type of estimator is particularly useful for estimating (inverse) covariance matrices in high dimensions, when the data may contain several outliers in many cells of the data matrix. More traditional robust estimators are not well defined or computable in this setting. An important feature of the proposed estimators is their simplicity and ease of computation using existing software.
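A rank-based covariance estimate of this kind can be sketched in a few lines. The sketch assumes two details the abstract does not spell out: the transformation 2 sin(πr/6), which is the usual correction that makes Spearman's correlation consistent at the bivariate normal, and the MAD as the robust scale for each variable. The function names are hypothetical, and the graphical lasso step is not included.

```python
import math
import statistics

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(a, b):
    """Ordinary correlation coefficient of two equally long lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def mad(values):
    """Median absolute deviation, scaled for consistency at the normal."""
    med = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - med) for v in values])

def spearman_covariance(columns):
    """Covariance matrix from Spearman correlations and MAD scales (assumed)."""
    p = len(columns)
    r = [ranks(col) for col in columns]
    s = [mad(col) for col in columns]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            if j == k:
                rho = 1.0
            else:
                # Consistency correction of Spearman's rho at the normal.
                rho = 2 * math.sin(math.pi * pearson(r[j], r[k]) / 6)
            cov[j][k] = s[j] * s[k] * rho
    return cov

# One wild cell in y: it stays the largest value, so the ranks do not change.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2 * v for v in x]
y[8] = 1000
cov = spearman_covariance([x, y])
```

In the toy data the outlying cell leaves both the ranks and the MAD of `y` untouched, so the estimated covariance is the same as it would be for the clean data.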
The dependency structure of multivariate data can be analyzed using the covariance matrix Σ. In many fields the precision matrix Σ⁻¹ is even more informative. As the sample covariance estimator is singular in high dimensions, it cannot be used to obtain a precision matrix estimator. A popular high-dimensional estimator is the graphical lasso, but it lacks robustness. We consider the high-dimensional independent contamination model. Here, even a small percentage of contaminated cells in the data matrix may lead to a high percentage of contaminated rows. Downweighting entire observations, which is done by traditional robust procedures, would then result in a loss of information. In this paper, we formally prove that replacing the sample covariance matrix in the graphical lasso with an elementwise robust covariance matrix leads to an elementwise robust, sparse precision matrix estimator computable in high dimensions. Examples of such elementwise robust covariance estimators are given. The final precision matrix estimator is positive definite, has a high breakdown point under elementwise contamination and can be computed fast.
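The claim about contaminated rows can be checked with a short calculation. Assuming each cell is contaminated independently with probability ε (written `eps` below; the function name is hypothetical), a row with p cells is clean with probability (1 − ε)^p, so the expected fraction of rows containing at least one contaminated cell is 1 − (1 − ε)^p.

```python
def contaminated_row_fraction(eps, p):
    """Expected fraction of rows with at least one contaminated cell,
    assuming cells are contaminated independently with probability eps."""
    return 1.0 - (1.0 - eps) ** p

# 1% contaminated cells, growing number of variables.
for p in (10, 100, 1000):
    print(p, round(contaminated_row_fraction(0.01, p), 3))
```

With 1% contaminated cells and 100 variables, roughly 63% of the rows contain at least one contaminated cell, which is why procedures that downweight whole rows discard so much clean data.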
To perform regression analysis in high dimensions, lasso and ridge estimation are common choices. However, it has been shown that these methods are not robust to outliers. Therefore, alternatives such as penalized M-estimation or the sparse least trimmed squares (LTS) estimator have been proposed. The robustness of these regression methods can be measured with the influence function. It quantifies the effect of infinitesimal perturbations in the data. Furthermore, it can be used to compute the asymptotic variance and the mean squared error. In this paper we compute the influence function, the asymptotic variance and the mean squared error for penalized M-estimators and the sparse LTS estimator. The asymptotic bias of the estimators makes the calculations nonstandard. We show that only M-estimators with a loss function with a bounded derivative are robust against regression outliers. In particular, the lasso has an unbounded influence function.
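The role of a bounded derivative can be illustrated by comparing two loss functions as the residual grows. The squared loss is the one the lasso uses; the Huber loss with tuning constant 1.345 is an assumed example of an M-estimator loss with a bounded derivative and is not named in the abstract. The function names are hypothetical.

```python
def psi_squared(r):
    """Derivative of the squared loss r**2: grows without bound in r."""
    return 2.0 * r

def psi_huber(r, c=1.345):
    """Derivative of the Huber loss: equal to r near zero, clipped at +/- c."""
    return max(-c, min(c, r))

# A residual from a gross regression outlier keeps increasing its pull
# under the squared loss, but is capped under the Huber loss.
for r in (0.5, 5.0, 500.0):
    print(r, psi_squared(r), psi_huber(r))
```

Under the squared loss the contribution of a single residual to the estimating equation is unlimited, whereas the Huber derivative never exceeds the constant `c` in absolute value.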