Cronbach's alpha is a popular measure of reliability, e.g., for quantifying how reliably a score summarizes the information of several items in a questionnaire. The alpha coefficient is known to be non-robust. We study the behavior of this coefficient in different settings to identify situations where Cronbach's alpha is extremely sensitive to violations of the classical model assumptions. Furthermore, we construct a robust version of Cronbach's alpha that is insensitive to a small proportion of data stemming from a different source. The idea is that the robust Cronbach's alpha reflects the reliability of the bulk of the data. For example, a small number of outliers should not be able to make a score look reliable if it is not.
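For reference, with k items and a k × k covariance matrix S of the item scores, the coefficient can be written as α = k/(k−1) · (1 − tr(S)/Σ_{i,j} S_{ij}), so robustifying it amounts to plugging in a robust scatter estimate. Below is a minimal Python sketch of this plug-in view; the function names are ours, and the MCD-based variant is one illustrative way to robustify the scatter estimate, not necessarily the exact estimator constructed in the paper.

```python
import numpy as np
from sklearn.covariance import MinCovDet

def alpha_from_cov(S):
    """Cronbach's alpha as a functional of a k x k covariance matrix S:
    alpha = k/(k-1) * (1 - trace(S) / sum of all entries of S)."""
    k = S.shape[0]
    return (k / (k - 1.0)) * (1.0 - np.trace(S) / S.sum())

def classical_alpha(X):
    """Plug in the ordinary sample covariance (non-robust)."""
    return alpha_from_cov(np.cov(X, rowvar=False))

def robust_alpha(X, random_state=0):
    """Plug in a robust scatter estimate (here: the MCD) so that a small
    fraction of contaminated rows cannot drive the coefficient.
    Illustrative choice, not necessarily the paper's estimator."""
    S = MinCovDet(random_state=random_state).fit(X).covariance_
    return alpha_from_cov(S)
```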
The so-called pinball loss for estimating conditional quantiles is a well-known tool in both statistics and machine learning. So far, however, little work has been done to quantify the efficiency of this tool for nonparametric approaches. We fill this gap by establishing inequalities that describe how close approximate pinball risk minimizers are to the corresponding conditional quantile. These inequalities, which hold under mild assumptions on the data-generating distribution, are then used to establish so-called variance bounds, which have recently turned out to play an important role in the statistical analysis of (regularized) empirical risk minimization approaches. Finally, we use both types of inequalities to establish an oracle inequality for support vector machines that use the pinball loss. The resulting learning rates are minimax optimal under some standard regularity assumptions on the conditional quantile. Published in Bernoulli (http://dx.doi.org/10.3150/10-BEJ267, journal site: http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
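For concreteness, the pinball loss at quantile level τ ∈ (0, 1) is L_τ(y, t) = τ(y − t) if y ≥ t and (1 − τ)(t − y) otherwise, and its risk minimizer among constants is the τ-quantile. A minimal NumPy sketch (the names and the sanity check are ours):

```python
import numpy as np

def pinball_loss(y, t, tau):
    """Pinball loss L_tau(y, t): tau * (y - t) for y >= t,
    (1 - tau) * (t - y) otherwise."""
    r = np.asarray(y) - np.asarray(t)
    return np.where(r >= 0, tau * r, (tau - 1.0) * r)

# Sanity check: among constants, the empirical tau-quantile (nearly)
# minimizes the average pinball loss.
y = np.random.default_rng(0).exponential(size=10_000)
tau = 0.9
grid = np.linspace(0.0, 5.0, 501)
best = grid[np.argmin([pinball_loss(y, t, tau).mean() for t in grid])]
print(best, np.quantile(y, tau))  # the two values should be close
```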
We investigate statistical properties of a broad class of modern kernel-based regression (KBR) methods. These kernel methods were developed during the last decade and are inspired by convex risk minimization in infinite-dimensional Hilbert spaces; one leading example is support vector regression. We first describe the relationship between the loss function L of the KBR method and the tail of the response variable. We then establish L-risk consistency for KBR, which provides the mathematical justification for the statement that these methods are able to "learn". Next, we consider robustness properties of such kernel methods. In particular, our results allow us to choose the loss function and the kernel so as to obtain computationally tractable and consistent KBR methods that have bounded influence functions. Furthermore, we develop bounds for the bias and for the sensitivity curve, a finite-sample version of the influence function, and discuss the relationship between KBR and classical M-estimators.
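As a concrete instance of the class just described (our own toy illustration, not an experiment from the paper): support vector regression pairs a Lipschitz continuous loss, the ε-insensitive loss, with a bounded Gaussian kernel, which is the kind of loss/kernel combination under which bounded influence functions can be obtained.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)
y[:5] += 25.0  # a few gross outliers in the response

# epsilon-insensitive loss (Lipschitz) + Gaussian RBF kernel (bounded):
# the type of combination the robustness results above concern.
kbr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)

# A non-Lipschitz loss such as least squares would be far more
# sensitive to the contaminated responses.
print(kbr.predict([[0.0]]))  # close to sin(0) = 0 despite the outliers
```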
Support Vector Machines (SVMs) are known to be consistent and robust for classification and regression if they are based on a Lipschitz continuous loss function and on a bounded kernel with a dense and separable reproducing kernel Hilbert space. In the regression context these facts remain true even for unbounded output spaces, provided that the target function f is integrable with respect to the marginal distribution of the input variable X and that the output variable Y has a finite first absolute moment. The latter assumption clearly excludes distributions with heavy tails, e.g., several stable distributions or some extreme value distributions that occur in financial or insurance problems. The main point of this paper is that the applicability of SVMs can be enlarged even to heavy-tailed distributions, which violate this moment condition. Results on existence, uniqueness, representation, consistency, and statistical robustness are given.
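A quick numerical illustration of why the first-moment condition matters (a toy example of ours, not from the paper): for standard Cauchy outputs E|Y| = ∞, so mean-type summaries do not stabilize, whereas quantile-type targets, which Lipschitz losses such as the pinball loss estimate, remain well behaved.

```python
import numpy as np

for seed in range(3):
    # Standard Cauchy: heavy tails, no finite first absolute moment,
    # so the classical condition E|Y| < infinity fails.
    y = np.random.default_rng(seed).standard_cauchy(100_000)
    print(f"seed {seed}: mean = {y.mean():10.3f}, median = {np.median(y):7.4f}")

# The sample mean fluctuates wildly across seeds; the median (the target
# of the pinball loss with tau = 0.5) stays near the true value 0.
```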