This paper deals with the trace regression model, in which n entries or linear combinations of entries of an unknown m1 × m2 matrix A0, corrupted by noise, are observed. We propose a new nuclear-norm penalized estimator of A0 and establish a general sharp oracle inequality for this estimator for arbitrary values of n, m1, m2 under a condition of isometry in expectation. The method is then applied to the matrix completion problem. In this case, the estimator admits a simple explicit form, and we prove that it satisfies oracle inequalities with faster rates of convergence than in previous works. These inequalities are valid, in particular, in the high-dimensional setting m1m2 ≫ n. We show that the obtained rates are optimal up to logarithmic factors in a minimax sense and also derive, for any fixed matrix A0, a nonminimax lower bound on the rate of convergence of our estimator, which coincides with the upper bound up to a constant factor. Finally, we show that our procedure provides exact recovery of the rank of A0 with probability close to 1. We also discuss the statistical learning setting, where there is no underlying model determined by A0 and the aim is to find the best trace regression model approximating the data. As a by-product, we show that, under the Restricted Eigenvalue condition, the usual vector Lasso estimator satisfies a sharp oracle inequality (i.e., an oracle inequality with leading constant 1).
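For intuition, the "simple explicit form" of a nuclear-norm penalized estimator can be illustrated by soft-thresholding of singular values, the proximal operator of the nuclear norm. The numpy sketch below is illustrative only: the uniform sampling, inverse-propensity fill-in, and threshold level are assumptions for the toy example, not the paper's exact construction.

```python
import numpy as np

def svd_soft_threshold(M, lam):
    """Soft-threshold the singular values of M at level lam
    (the proximal operator of lam * nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)   # small singular values are zeroed
    return U @ np.diag(s_shrunk) @ Vt

# Toy matrix-completion example (hypothetical setup): observe ~70% of the
# entries of a rank-1 matrix, zero-fill and rescale, then shrink.
rng = np.random.default_rng(0)
A0 = np.outer(rng.normal(size=8), rng.normal(size=6))   # rank-1 target
mask = rng.random(A0.shape) < 0.7                       # observed entries
X = np.where(mask, A0, 0.0) / 0.7                       # inverse-propensity fill
A_hat = svd_soft_threshold(X, lam=1.0)                  # low-rank estimate
```

Soft-thresholding kills small singular values outright, so the estimate is exactly low-rank for a sufficiently large threshold.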
We consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern: the set of variables is partitioned into prescribed groups, only few of which are relevant in the estimation process. This group sparsity assumption motivates the use of the Group Lasso method to estimate β*. We establish oracle inequalities for the prediction and ℓ2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger coherence condition, we derive bounds for the estimation error in mixed (2, p)-norms with 1 ≤ p ≤ ∞. When p = ∞, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of β* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, for all estimators over a class of group sparse vectors. Furthermore, we establish lower bounds for the prediction and ℓ2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation properties as compared to the Lasso. An important application of our results is provided by the problem of estimating multiple regression equations simultaneously, or multi-task learning. In this case, our results lead to refinements of the results in [22] and allow one to establish the quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite. To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest.
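The group-sparsity mechanism can be sketched via block soft-thresholding, the proximal operator of the Group Lasso penalty λ ∑_g ‖β_g‖₂: whole groups whose norm falls below λ are zeroed. The grouping and λ below are illustrative assumptions; the paper analyzes the estimator itself, not a particular solver.

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Proximal operator of lam * sum_g ||beta_g||_2: each group is
    shrunk toward zero, and groups with norm <= lam are set to zero."""
    out = np.zeros_like(beta)
    for g in groups:                       # each g is an array of indices
        norm = np.linalg.norm(beta[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * beta[g]
    return out

# Usage: the first group survives (shrunk), the second is zeroed entirely.
groups = [np.array([0, 1]), np.array([2, 3])]
beta = np.array([3.0, 4.0, 0.3, 0.4])
shrunk = group_soft_threshold(beta, groups, lam=1.0)
```

Because entire groups are zeroed at once, the sparsity pattern of the estimate is a union of groups, which is exactly the structure assumed on β*.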
1 The phrase "β* is sparse" means that most of the components of this vector are equal to zero.

Settings where this problem is relevant range from multi-task learning [2, 23, 28] and conjoint analysis [14, 20] to longitudinal data analysis [11] and the analysis of panel data [15, 38], among others. We briefly review these different settings in the course of the paper. In particular, multi-task learning provides a main motivation for our study. In that setting, each regression equation corresponds to a different learning task; in addition to the requirement that M ≫ n, we also allow the number of tasks T to be much larger than n. Following [2], we assume that there are only a few common important variables which are shared by the tasks. That is, we assume that the vectors β*_1, …, β*_T are not only sparse but also have their sparsity patterns included in the same set of small cardinality. This group sparsity assumption induces a relationship between the responses and, as we shall see, can be used to improve estimation. The model (1.2) can be reformulated as a single regression problem of th...
In this paper, we study the problem of high-dimensional, approximately low-rank covariance matrix estimation with missing observations. We propose a simple procedure that is computationally tractable in high dimensions and does not require imputation of the missing data. We establish non-asymptotic sparsity oracle inequalities for the estimation of the covariance matrix in the Frobenius and spectral norms, valid for any setting of the sample size and the dimension of the observations. We further establish minimax lower bounds showing that our rates are minimax optimal up to a logarithmic factor.
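One simple way to avoid imputation, sketched below under the illustrative assumptions of zero-mean data and entries missing independently with a known probability δ, is to rescale the empirical covariance of the zero-filled data so that it is unbiased: off-diagonal entries of the masked covariance scale as δ², diagonal entries as δ.

```python
import numpy as np

def cov_missing(Y, delta):
    """Unbiased covariance estimator from zero-imputed data Y (n x d),
    assuming zero-mean rows and entries observed independently with
    known probability delta (Bernoulli masking). No imputation needed."""
    n = Y.shape[0]
    S = Y.T @ Y / n                            # covariance of the masked data
    corrected = S / delta**2                   # off-diagonal scales as delta^2
    np.fill_diagonal(corrected, np.diag(S) / delta)  # diagonal scales as delta
    return corrected
```

With delta = 1 (no missingness) the correction reduces to the usual empirical covariance, which is a quick sanity check on the scaling.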
We derive the ℓ∞ convergence rate simultaneously for the Lasso and Dantzig estimators in a high-dimensional linear regression model under a mutual coherence assumption on the Gram matrix of the design and under two different assumptions on the noise: Gaussian noise and general noise with finite variance. We then prove that the thresholded Lasso and Dantzig estimators, with a proper choice of the threshold, simultaneously enjoy a sign concentration property provided that the non-zero components of the target vector are not too small.
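The thresholding step behind such sign-recovery results can be sketched in a few lines: hard-threshold the estimator and read off the sign pattern. The threshold level τ is an assumed tuning parameter here (in theory it is calibrated to the ℓ∞ rate).

```python
import numpy as np

def threshold_signs(beta_hat, tau):
    """Hard-threshold an estimator at level tau and return its sign
    pattern; coefficients with |beta_hat_j| <= tau are declared zero."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    return np.sign(np.where(np.abs(beta_hat) > tau, beta_hat, 0.0))

# Usage: a small coefficient is set to zero, the others keep their signs.
pattern = threshold_signs([0.05, -0.8, 1.2], tau=0.1)
```

If the ℓ∞ error of the initial estimator is below τ and every nonzero coefficient exceeds 2τ in magnitude, the recovered sign pattern matches that of the target vector.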
Let X, X_1, …, X_n, … be i.i.d. centered Gaussian random variables in a separable Banach space E with covariance operator Σ := E(X ⊗ X) : E* → E. The sample covariance operator Σ̂ : E* → E is defined as Σ̂ := n^{-1} ∑_{j=1}^n X_j ⊗ X_j. The goal of the paper is to obtain concentration inequalities and expectation bounds for the operator norm ‖Σ̂ − Σ‖ of the deviation of the sample covariance operator from the true covariance operator. In particular, it is shown that E‖Σ̂ − Σ‖ ≍ ‖Σ‖ (√(r(Σ)/n) ∨ r(Σ)/n), where r(Σ) := (E‖X‖)² / ‖Σ‖ is the effective rank of Σ. Moreover, it is proved that, under the assumption that r(Σ) ≤ n, for all t ≥ 1, with probability at least 1 − e^{−t}, |‖Σ̂ − Σ‖ − M| ≲ ‖Σ‖ (√(t/n) + t/n), where M is either the median or the expectation of ‖Σ̂ − Σ‖. On the other hand, under the assumption that r(Σ) ≥ n, for all t ≥ 1, with probability at least 1 − e^{−t}, |‖Σ̂ − Σ‖ − M| ≲ ‖Σ‖ (√(r(Σ)/n) √(t/n) + t/n).
Let X, X_1, …, X_n be i.i.d. Gaussian random variables with zero mean and covariance operator Σ = E(X ⊗ X) taking values in a separable Hilbert space H. Let r(Σ) := tr(Σ)/‖Σ‖∞ be the effective rank of Σ, where tr(Σ) is the trace of Σ and ‖Σ‖∞ is its operator norm. Let Σ̂_n be the sample (empirical) covariance operator based on (X_1, …, X_n). The paper deals with the problem of estimation of spectral projectors of the covariance operator Σ by their empirical counterparts, the spectral projectors of Σ̂_n (empirical spectral projectors). The focus is on problems where both the sample size n and the effective rank r(Σ) are large. This framework includes and generalizes well-known high-dimensional spiked covariance models. Given a spectral projector P_r corresponding to an eigenvalue µ_r of the covariance operator Σ and its empirical counterpart P̂_r, we derive sharp concentration bounds for bilinear forms of the empirical spectral projector P̂_r in terms of the sample size n and the effective dimension r(Σ). Building upon these concentration bounds, we prove the asymptotic normality of bilinear forms of the random operators P̂_r − E P̂_r under the assumptions that n → ∞ and r(Σ) = o(n). In the special case of eigenvalues of multiplicity one, these results are rephrased as concentration bounds and asymptotic normality for linear forms of empirical eigenvectors. Other results include bounds on the bias E P̂_r − P_r and a method of bias reduction, as well as a discussion of possible applications to statistical inference in high-dimensional principal component analysis.
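The effective rank tr(Σ)/‖Σ‖∞ used above is straightforward to compute, and it quantifies how a spiked spectrum can make a high-dimensional problem effectively low-dimensional. A small numpy sketch:

```python
import numpy as np

def effective_rank(Sigma):
    """Effective rank r(Sigma) = tr(Sigma) / ||Sigma||_op, a dimension
    surrogate that can be far smaller than the ambient dimension."""
    eigvals = np.linalg.eigvalsh(Sigma)   # symmetric eigenvalues, ascending
    return eigvals.sum() / eigvals.max()

# For the identity, r equals the ambient dimension; for a spiked
# covariance diag(10, 1, 1, 1), r = 13/10 = 1.3 despite dimension 4.
r_spiked = effective_rank(np.diag([10.0, 1.0, 1.0, 1.0]))
```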
We consider the statistical deconvolution problem where one observes $n$ replications from the model $Y=X+\epsilon$, where $X$ is the unobserved random signal of interest and $\epsilon$ is an independent random error with distribution $\phi$. Under weak assumptions on the decay of the Fourier transform of $\phi$, we derive upper bounds for the finite-sample sup-norm risk of wavelet deconvolution density estimators $f_n$ for the density $f$ of $X$, where $f:\mathbb{R}\to \mathbb{R}$ is assumed to be bounded. We then derive lower bounds for the minimax sup-norm risk over Besov balls in this estimation problem and show that wavelet deconvolution density estimators attain these bounds. We further show that linear estimators adapt to the unknown smoothness of $f$ if the Fourier transform of $\phi$ decays exponentially, and that a corresponding result holds true for the hard thresholding wavelet estimator if $\phi$ decays polynomially. We also analyze the case where $f$ is a "supersmooth"/analytic density. We finally show how our results and recent techniques from Rademacher processes can be applied to construct global confidence bands for the density $f$.
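For intuition, a spectral-cutoff (Fourier-inversion) deconvolution estimator is a simpler cousin of the wavelet estimators studied in the paper: divide the empirical characteristic function of Y by the known error characteristic function and invert over a bounded frequency band. The function name, the assumed known `phi_eps`, and the cutoff choice are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def deconvolution_density(y, phi_eps, grid, cutoff, n_t=512):
    """Spectral-cutoff deconvolution density estimate of f at points
    `grid`, from samples y of Y = X + eps, with known error
    characteristic function phi_eps (a callable) and frequency cutoff."""
    t = np.linspace(-cutoff, cutoff, n_t)
    ecf = np.exp(1j * np.outer(t, y)).mean(axis=1)   # empirical char. fn of Y
    ratio = ecf / phi_eps(t)                          # deconvolve in Fourier domain
    dt = t[1] - t[0]
    # Inverse Fourier transform: f(x) = (1/2pi) * integral e^{-itx} ratio(t) dt
    return (np.exp(-1j * np.outer(grid, t)) @ ratio).real * dt / (2 * np.pi)

# Noise-free sanity check: with phi_eps == 1, the estimator is just a
# sinc-kernel density estimate; for N(0,1) data it should be near 0.4 at 0.
rng = np.random.default_rng(0)
y = rng.normal(size=2000)
f0 = deconvolution_density(y, lambda t: np.ones_like(t),
                           np.array([0.0]), cutoff=4.0)[0]
```

The cutoff trades bias against the variance inflation caused by dividing by a decaying phi_eps; the paper's rates quantify exactly this trade-off for wavelet bands.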
This paper considers the problem of estimation of a low-rank matrix when most of its entries are not observed and some of the observed entries are corrupted. The observations are noisy realizations of a sum of a low-rank matrix, which we wish to estimate, and a second matrix having a complementary sparse structure such as elementwise sparsity or columnwise sparsity. We analyze a class of estimators obtained as solutions of a constrained convex optimization problem combining the nuclear norm penalty and a convex relaxation penalty for the sparse constraint. Our assumptions allow for simultaneous presence of random and deterministic patterns in the sampling scheme. We establish rates of convergence for the low-rank component from partial and corrupted observations in the presence of noise and we show that these rates are minimax optimal up to logarithmic factors.