This paper deals with the trace regression model in which n entries or linear combinations of entries of an unknown m1×m2 matrix A0, corrupted by noise, are observed. We propose a new nuclear-norm penalized estimator of A0 and establish a general sharp oracle inequality for this estimator, for arbitrary values of n, m1, m2, under the condition of isometry in expectation. The method is then applied to the matrix completion problem. In this case, the estimator admits a simple explicit form, and we prove that it satisfies oracle inequalities with faster rates of convergence than in previous works. These inequalities are valid, in particular, in the high-dimensional setting m1m2 ≫ n. We show that the obtained rates are optimal up to logarithmic factors in a minimax sense and also derive, for any fixed matrix A0, a non-minimax lower bound on the rate of convergence of our estimator, which coincides with the upper bound up to a constant factor. Finally, we show that our procedure provides exact recovery of the rank of A0 with probability close to 1. We also discuss the statistical learning setting where there is no underlying model determined by A0 and the aim is to find the best trace regression model approximating the data. As a by-product, we show that, under the Restricted Eigenvalue condition, the usual vector Lasso estimator satisfies a sharp oracle inequality (i.e., an oracle inequality with leading constant 1).
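To give a rough feel for the explicit form mentioned above: under uniform-at-random sampling of entries, a rescaled sum of the observations is an unbiased estimator of A0, and nuclear-norm penalization then amounts to soft-thresholding its singular values. The following is a minimal sketch of this shape of procedure, not the paper's exact estimator; the helper `soft_threshold_svd` and the tuning parameter `lam` are our own illustrative choices, not the theoretically prescribed ones.

```python
import numpy as np

def soft_threshold_svd(M, lam):
    """Solve min_A 0.5*||A - M||_F^2 + lam*||A||_* by soft-thresholding singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

# Toy matrix completion: observe n uniformly sampled entries of a rank-2 matrix plus noise.
rng = np.random.default_rng(0)
m1, m2, n = 50, 40, 800
A0 = rng.normal(size=(m1, 2)) @ rng.normal(size=(2, m2))   # rank-2 target matrix
rows = rng.integers(0, m1, n)
cols = rng.integers(0, m2, n)
Y = A0[rows, cols] + 0.1 * rng.normal(size=n)              # noisy observed entries

M = np.zeros((m1, m2))
np.add.at(M, (rows, cols), Y)
M *= m1 * m2 / n                 # unbiased estimator of A0 under uniform sampling
A_hat = soft_threshold_svd(M, lam=2.0)       # lam: illustrative tuning parameter
rank_hat = np.linalg.matrix_rank(A_hat)      # thresholding also yields a rank estimate
```

The number of singular values surviving the threshold serves as the rank estimate, echoing the exact rank recovery result stated in the abstract.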
We consider the problem of estimating a sparse linear regression vector β* (the phrase "β* is sparse" means that most of the components of this vector are equal to zero) under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern: the set of variables is partitioned into prescribed groups, only a few of which are relevant in the estimation process. This group sparsity assumption suggests using the Group Lasso method to estimate β*. We establish oracle inequalities for the prediction and ℓ2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger coherence condition, we derive bounds for the estimation error in mixed (2, p)-norms with 1 ≤ p ≤ ∞. When p = ∞, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of β* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, for all estimators over a class of group-sparse vectors. Furthermore, we establish lower bounds for the prediction and ℓ2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation properties as compared to the Lasso. An important application of our results is the problem of estimating multiple regression equations simultaneously, or multi-task learning. In this case, our results refine those of [22] and allow one to establish the quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite. To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest. Settings where this problem is relevant range from multi-task learning [2, 23, 28] and conjoint analysis [14, 20] to longitudinal data analysis [11] and the analysis of panel data [15, 38], among others. We briefly review these different settings in the course of the paper. In particular, multi-task learning provides a main motivation for our study. In that setting, each regression equation corresponds to a different learning task; in addition to the requirement that M ≫ n, we also allow the number of tasks T to be much larger than n. Following [2], we assume that there are only a few common important variables which are shared by the tasks. That is, we assume that the vectors β*_1, …, β*_T are not only sparse but also have their sparsity patterns included in the same set of small cardinality. This group sparsity assumption induces a relationship between the responses and, as we shall see, can be used to improve estimation. The model (1.2) can be reformulated as a single regression problem …
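To make the multi-task group-sparsity setting concrete, here is a minimal sketch using scikit-learn's MultiTaskLasso, which penalizes the ℓ2,1 mixed norm of the coefficient matrix, a close relative of the Group Lasso analyzed here; the regularization level `alpha` and the support threshold are illustrative choices, not the theoretically prescribed ones.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(1)
n, M, T = 100, 200, 5                       # samples, variables, tasks (M >> n allowed)
X = rng.normal(size=(n, M))
B = np.zeros((M, T))
B[:4] = rng.normal(size=(4, T))             # 4 relevant variables, shared by all tasks
Y = X @ B + 0.1 * rng.normal(size=(n, T))

est = MultiTaskLasso(alpha=0.1).fit(X, Y)   # est.coef_ has shape (T, M)
group_norms = np.linalg.norm(est.coef_, axis=0)   # per-variable l2 norm across tasks
support = np.flatnonzero(group_norms > 1e-8)      # thresholded support estimate
print(support)                               # ideally [0 1 2 3]
```

Thresholding the per-variable group norms, as in the last two lines, mirrors the thresholded Group Lasso that the abstract shows recovers the sparsity pattern with high probability.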
In this paper, we study the problem of high-dimensional, approximately low-rank covariance matrix estimation with missing observations. We propose a simple procedure that is computationally tractable in high dimensions and does not require imputation of the missing data. We establish non-asymptotic sparsity oracle inequalities for the estimation of the covariance matrix in the Frobenius and spectral norms, valid for any setting of the sample size and the dimension of the observations. We further establish minimax lower bounds showing that our rates are minimax optimal up to a logarithmic factor.
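A hedged sketch of the plug-in idea for data missing completely at random: if each entry is observed independently with probability δ, then off-diagonal entries of the masked sample covariance are biased by δ² and diagonal entries by δ, so an entrywise rescaling debiases it without imputation; soft-thresholding the eigenvalues then exploits approximate low rank. The function name `cov_missing` and the threshold `lam` are illustrative, not the paper's exact procedure.

```python
import numpy as np

def cov_missing(Y, mask, lam):
    """Covariance estimate from masked, centered data Y (zeros where missing)."""
    n = Y.shape[0]
    delta = mask.mean()                      # estimated observation probability
    S = (Y.T @ Y) / n                        # sample covariance of the masked data
    Sigma_tilde = S / delta**2               # off-diagonal entries are biased by delta^2
    np.fill_diagonal(Sigma_tilde, np.diag(S) / delta)   # diagonal is biased by delta only
    w, V = np.linalg.eigh(Sigma_tilde)
    return (V * np.maximum(w - lam, 0.0)) @ V.T         # eigenvalue soft-thresholding

rng = np.random.default_rng(2)
A = rng.normal(size=(60, 3))
X = rng.normal(size=(400, 3)) @ A.T          # centered data with low-rank covariance A @ A.T
mask = rng.random(X.shape) < 0.7             # each entry observed independently w.p. 0.7
Sigma_hat = cov_missing(X * mask, mask, lam=0.5)
```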
We derive the ℓ∞ convergence rate simultaneously for the Lasso and Dantzig estimators in a high-dimensional linear regression model, under a mutual coherence assumption on the Gram matrix of the design and two different assumptions on the noise: Gaussian noise and general noise with finite variance. We then prove that, with a proper choice of the threshold, the thresholded Lasso and Dantzig estimators simultaneously enjoy a sign concentration property, provided that the non-zero components of the target vector are not too small.
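The thresholding step is simple to state in code: fit the Lasso, then zero out coefficients whose magnitude falls below a threshold, so that (under the conditions of the paper) the signs of the surviving coefficients match those of the target vector. In the sketch below, `alpha` and `tau` are illustrative values; the theory prescribes a threshold exceeding the ℓ∞ estimation error.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 200, 500
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2, -2, 1.5, -1.5, 1]            # non-zero components well above the noise level
y = X @ beta + 0.5 * rng.normal(size=n)

b = Lasso(alpha=0.1).fit(X, y).coef_
tau = 0.3                                   # illustrative threshold
b_thr = np.where(np.abs(b) > tau, b, 0.0)   # thresholded Lasso
print(np.array_equal(np.sign(b_thr), np.sign(beta)))   # sign pattern recovered?
```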
Let X, X1, …, Xn, … be i.i.d. centered Gaussian random variables in a separable Banach space E with covariance operator Σ : E* → E, Σu = E⟨X, u⟩X, u ∈ E*. The sample covariance operator Σ̂ : E* → E is defined as Σ̂u := n⁻¹ ∑_{j=1}^{n} ⟨Xj, u⟩Xj, u ∈ E*. The goal of the paper is to obtain concentration inequalities and expectation bounds for the operator norm ‖Σ̂ − Σ‖ of the deviation of the sample covariance operator from the true covariance operator. In particular, it is shown that

E‖Σ̂ − Σ‖ ≍ ‖Σ‖ ( √(r(Σ)/n) ∨ r(Σ)/n ),

where r(Σ) := (E‖X‖)² / ‖Σ‖. Moreover, it is proved that, under the assumption that r(Σ) ≤ n, for all t ≥ 1, with probability at least 1 − e⁻ᵗ,

| ‖Σ̂ − Σ‖ − M | ≲ ‖Σ‖ ( √(t/n) ∨ t/n ),

where M is either the median or the expectation of ‖Σ̂ − Σ‖. On the other hand, under the assumption that r(Σ) ≥ n, for all t ≥ 1, with probability at least 1 − e⁻ᵗ,

| ‖Σ̂ − Σ‖ − M | ≲ ‖Σ‖ ( √(r(Σ)/n) √(t/n) ∨ t/n ).
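A quick numerical illustration of the expectation bound in finite dimension: for Gaussian X, the effective rank r(Σ) = (E‖X‖)²/‖Σ‖ is comparable to the proxy tr(Σ)/‖Σ‖ used below, and the average spectral-norm error of the sample covariance should scale like ‖Σ‖·max(√(r/n), r/n). The spectrum, dimensions, and Monte Carlo sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 100, 400
evals = 1.0 / np.arange(1, d + 1)            # a spectrum with small effective rank
Sigma = np.diag(evals)
r = evals.sum() / evals.max()                # tr(Sigma)/||Sigma||, a proxy for r(Sigma)

errs = []
for _ in range(50):
    X = rng.normal(size=(n, d)) * np.sqrt(evals)   # rows X_i ~ N(0, Sigma)
    Sigma_hat = X.T @ X / n
    errs.append(np.linalg.norm(Sigma_hat - Sigma, 2))   # spectral-norm deviation

bound = evals.max() * max(np.sqrt(r / n), r / n)
print(np.mean(errs), bound)                  # same order of magnitude
```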