Suppose that we observe entries or, more generally, linear combinations of
entries of an unknown $m\times T$-matrix $A$ corrupted by noise. We are
particularly interested in the high-dimensional setting where the number $mT$
of unknown entries can be much larger than the sample size $N$. Motivated by
several applications, we consider estimation of the matrix $A$ under the assumption
that it has small rank. This can be viewed as a dimension reduction or sparsity
assumption. In order to shrink toward a low-rank representation, we investigate
penalized least squares estimators with a Schatten-$p$ quasi-norm penalty term,
$p\leq1$. We study these estimators under two possible assumptions---a modified
version of the restricted isometry condition and a uniform bound on the ratio
"empirical norm induced by the sampling operator/Frobenius norm." The main
results are stated as nonasymptotic upper bounds on the prediction risk and on
the Schatten-$q$ risk of the estimators, where $q\in[p,2]$. The rates that we
obtain for the prediction risk are of the form $rm/N$ (for $m=T$), up to
logarithmic factors, where $r$ is the rank of $A$. The particular examples of
multi-task learning and matrix completion are worked out in detail. The proofs
are based on tools from the theory of empirical processes. As a by-product, we
derive bounds for the $k$th entropy numbers of the quasi-convex Schatten class
embeddings $S_p^M\hookrightarrow S_2^M$, $p<1$, which are of independent
interest. Published in the Annals of Statistics (http://www.imstat.org/aos/) by
the Institute of Mathematical Statistics (http://www.imstat.org),
http://dx.doi.org/10.1214/10-AOS860.
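
As a small illustration of why a Schatten-$p$ quasi-norm penalty with $p\leq1$ shrinks toward low rank, the following minimal sketch evaluates $(\sum_j \sigma_j^p)^{1/p}$ on two hypothetical spectra that have the same Frobenius norm. The function name and the example singular values are assumptions made for illustration; the singular values are taken as given (in practice they would come from an SVD routine), and this is not the estimator studied in the paper.

```python
def schatten_quasi_norm(singular_values, p):
    """Return (sum_j s_j^p)^(1/p) for p > 0.

    For p >= 1 this is the Schatten-p norm; for p < 1 it is only a quasi-norm.
    `singular_values` is assumed to hold the nonnegative singular values of a matrix.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    # Zero singular values contribute nothing to the sum.
    total = sum(s ** p for s in singular_values if s > 0)
    return total ** (1.0 / p)


# Two hypothetical spectra with the same Frobenius (p = 2) norm, equal to 2.
rank_one = [2.0, 0.0, 0.0, 0.0]   # rank 1
full_rank = [1.0, 1.0, 1.0, 1.0]  # rank 4

for p in (2.0, 1.0, 0.5):
    low = schatten_quasi_norm(rank_one, p)
    high = schatten_quasi_norm(full_rank, p)
    print(f"p={p}: rank-one={low:.3f}, full-rank={high:.3f}")
```

The two spectra are indistinguishable at $p=2$, while for smaller $p$ the full-rank spectrum is charged increasingly more than the rank-one spectrum, which is the sense in which the penalty favors low-rank candidates.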
A scheme for locally adaptive bandwidth selection is proposed which sensitively shrinks the bandwidth of a kernel estimator at lowest density regions, such as the support boundary, which are unknown to the statistician. In the case of a Hölder continuous density, this locally minimax-optimal bandwidth is shown to be smaller than the usual rate, even in the case of homogeneous smoothness. A new type of risk bound with respect to a density-dependent standardized loss of this estimator is established. This bound is fully nonasymptotic and allows one to deduce convergence rates at lowest density regions that can be substantially faster than $n^{-1/2}$. It is complemented by a weighted minimax lower bound which splits into two regimes depending on the value of the density. The new estimator adapts to the second regime, and it is shown that simultaneous adaptation to the fastest regime is not possible in principle as long as the Hölder exponent is unknown. Consequences for plug-in rules for support recovery are worked out in detail. In contrast to those with classical density estimators, the plug-in rules based on the new construction are minimax-optimal, up to some logarithmic factor.
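
To make concrete what a location-dependent bandwidth means for a kernel estimator, here is a minimal sketch of a Gaussian kernel density estimate whose bandwidth is supplied as a function of the evaluation point. The names `kde_local_bandwidth` and `bandwidth_at`, the choice of a Gaussian kernel, and the example bandwidth rule are all assumptions made for illustration; the data-driven selection scheme proposed in the paper is not reproduced here.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_local_bandwidth(x, sample, bandwidth_at):
    """Kernel density estimate at x using the bandwidth h = bandwidth_at(x).

    `bandwidth_at` is a hypothetical user-supplied rule mapping a location
    to a positive bandwidth; a constant rule gives the classical estimator.
    """
    h = bandwidth_at(x)
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


# Hypothetical sample and a made-up rule that uses a smaller bandwidth
# far from the bulk of the data (purely illustrative, not data-driven).
sample = [0.1, 0.2, 0.25, 0.4, 0.45, 0.5, 0.55, 0.7, 0.9]
shrink_in_tails = lambda x: 0.05 if abs(x - 0.5) > 0.4 else 0.15

for x in (0.0, 0.5, 1.0):
    print(x, kde_local_bandwidth(x, sample, shrink_in_tails))
```

Passing a constant function for `bandwidth_at` recovers the usual fixed-bandwidth estimator, so the two can be compared on the same sample.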