In statistical learning, the excess risk of empirical risk minimization (ERM) is controlled by (COMP_n(F)/n)^α, where n is the size of the learning sample, COMP_n(F) is a complexity term associated with a given class F, and α ∈ [1/2, 1] interpolates between slow and fast learning rates. In this paper we introduce an alternative localization approach for binary classification that leads to a novel complexity measure: fixed points of the local empirical entropy. We show that this complexity measure gives a tight control over COMP_n(F) in the upper bounds under bounded noise. Our results are accompanied by a minimax lower bound that involves the same quantity. In particular, we give an essentially complete answer to the question of the optimality of ERM under bounded noise for general VC classes.
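For concreteness, the endpoints of this interpolation are the familiar slow and fast rates. The display below merely restates the bound above in LaTeX; the notation R for the risk and f̂ for the empirical risk minimizer is ours, not the paper's:

\mathbb{E}\, R(\hat{f}) - \inf_{f \in \mathcal{F}} R(f)
  \;\lesssim\; \left( \frac{\mathrm{COMP}_n(\mathcal{F})}{n} \right)^{\alpha},
\qquad
\alpha = \tfrac{1}{2}: \ \text{slow rate } \sqrt{\mathrm{COMP}_n(\mathcal{F})/n},
\qquad
\alpha = 1: \ \text{fast rate } \mathrm{COMP}_n(\mathcal{F})/n.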
We investigate the noise sensitivity of the top eigenvector of a Wigner matrix in the following sense. Let v be the top eigenvector of an N × N Wigner matrix. Suppose that k randomly chosen entries of the matrix are resampled, resulting in another realization of the Wigner matrix with top eigenvector v^[k]. We prove that, with high probability, when k ≪ N^{5/3−o(1)} the vectors v and v^[k] are almost collinear, whereas when k ≫ N^{5/3} the vector v^[k] is almost orthogonal to v.
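A minimal numerical sketch of this resampling experiment (ours, not from the paper): draw a Gaussian Wigner matrix, resample k randomly chosen off-diagonal entries while preserving symmetry, and measure the overlap |<v, v^[k]>|. The matrix size N, the two values of k, and the restriction to off-diagonal entries are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def wigner(N):
    # Symmetric matrix with i.i.d. standard Gaussian entries above the diagonal.
    A = rng.standard_normal((N, N))
    return (A + A.T) / np.sqrt(2)

def top_eigvec(M):
    # np.linalg.eigh sorts eigenvalues in ascending order.
    _, V = np.linalg.eigh(M)
    return V[:, -1]

def resample(M, k):
    # Redraw k uniformly chosen strictly upper-triangular entries
    # (and their mirror images), so the matrix stays symmetric.
    M = M.copy()
    iu, ju = np.triu_indices(M.shape[0], 1)
    pick = rng.choice(len(iu), size=k, replace=False)
    fresh = rng.standard_normal(k)
    M[iu[pick], ju[pick]] = fresh
    M[ju[pick], iu[pick]] = fresh
    return M

N = 300                      # N^{5/3} is roughly 1.3e4 here
M = wigner(N)
v = top_eigvec(M)
for k in (500, 40_000):      # k << N^{5/3}, then k >> N^{5/3}
    vk = top_eigvec(resample(M, k))
    print(k, abs(v @ vk))    # overlap near 1, then near 0

With these parameters the first overlap should be close to 1 and the second close to 0, matching the two regimes of the theorem.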
This paper is devoted to uniform versions of the Hanson-Wright inequality for a random vector X ∈ R^n with independent subgaussian components. The core technique of the paper is based on the entropy method combined with truncations of both the gradients of the functions of interest and the components of X itself. Our results recover, in particular, the classic uniform bound of Talagrand (1996) for Rademacher chaoses and the more recent uniform result of Adamczak (2015), which holds under certain rather strong assumptions on the distribution of X. We provide several applications of our techniques: we establish a version of the standard Hanson-Wright inequality, which is tighter in some regimes. Extending our results, we show a version of the dimension-free matrix Bernstein inequality that holds for random matrices with a subexponential spectral norm. We apply the derived inequality to the problem of covariance estimation with missing observations and prove an almost optimal high-probability version of the recent result of Lounici (2014). Finally, we show a uniform Hanson-Wright-type inequality in the Ising model under Dobrushin's condition. A closely related question was posed by Marton (2003).
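As a concrete illustration of the covariance application, the following is a hedged Python sketch of the standard debiased covariance estimator under Bernoulli(δ) missing-at-random observations, in the spirit of Lounici (2014); the function name, the zero-mean assumption, and all parameters below are ours.

import numpy as np

def covariance_with_missing(Y, delta):
    # Y: n x d data matrix, zero-mean model, unobserved entries set to 0.
    # Under independent Bernoulli(delta) masks, E[Y_i Y_j] = delta^2 * Sigma_ij
    # for i != j (two independent masks) and delta * Sigma_ii on the diagonal,
    # so rescaling the naive sample covariance removes the bias.
    n, _ = Y.shape
    S = Y.T @ Y / n
    Sigma = S / delta**2
    np.fill_diagonal(Sigma, np.diag(S) / delta)
    return Sigma

rng = np.random.default_rng(1)
n, d, delta = 5000, 10, 0.7
A = rng.standard_normal((d, d)) / np.sqrt(d)
Sigma_true = A @ A.T + np.eye(d)
X = rng.multivariate_normal(np.zeros(d), Sigma_true, size=n)
Y = X * (rng.random((n, d)) < delta)     # zero out unobserved entries
err = np.linalg.norm(covariance_with_missing(Y, delta) - Sigma_true, 2)
print(err)                               # operator-norm error; shrinks as n grows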
We study the problem of predicting as well as the best linear predictor in a bounded Euclidean ball with respect to the squared loss. When only boundedness of the data-generating distribution is assumed, we establish that the least squares estimator constrained to a bounded Euclidean ball does not attain the classical O(d/n) excess risk rate, where d is the dimension of the covariates and n is the number of samples. In particular, we construct a bounded distribution such that the constrained least squares estimator incurs an excess risk of order Ω(d^{3/2}/n), hence refuting a recent conjecture of Ohad Shamir [JMLR 2015]. In contrast, we observe that nonlinear predictors can achieve the optimal rate O(d/n) with no assumptions on the distribution of the covariates. We discuss additional distributional assumptions sufficient to guarantee an O(d/n) excess risk rate for the least squares estimator. Among them are certain moment equivalence assumptions often used in the robust statistics literature. While such assumptions are central in the analysis of unbounded and heavy-tailed settings, our work indicates that in some cases they also rule out unfavorable bounded distributions.
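To make the estimator under study concrete, here is a minimal sketch (ours, with illustrative names and parameters) of least squares constrained to a Euclidean ball, computed by projected gradient descent on the empirical squared loss:

import numpy as np

def constrained_ls(X, y, radius=1.0, steps=2000):
    # Minimize (1/2n) * ||X w - y||^2 subject to ||w||_2 <= radius.
    n, d = X.shape
    lr = n / np.linalg.norm(X, ord=2) ** 2   # step 1/L, with L = ||X||_2^2 / n
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n      # gradient step
        nrm = np.linalg.norm(w)
        if nrm > radius:                     # Euclidean projection onto the ball
            w *= radius / nrm
    return w

rng = np.random.default_rng(2)
n, d = 200, 5
w_star = rng.standard_normal(d)
w_star /= 2 * np.linalg.norm(w_star)         # true predictor inside the ball
X = rng.standard_normal((n, d))
y = X @ w_star + 0.1 * rng.standard_normal(n)
print(np.linalg.norm(constrained_ls(X, y) - w_star))

Projected gradient descent is only one of several ways to compute this estimator; the point of the sketch is that the constraint set is exactly the bounded Euclidean ball from the abstract.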