convergence to the limit is not uniform. Furthermore, bootstrap and even subsampling techniques are plagued by the non-continuity of limiting distributions. Nevertheless, in the low-dimensional setting, a modified bootstrap scheme has been proposed; [13] and [14] have recently proposed a residual-based bootstrap scheme. They provide consistency guarantees for the high-dimensional setting; we consider this method in an empirical analysis in Section 4.

Some approaches for quantifying uncertainty include the following. The work in [50] implicitly contains the idea of sample splitting and the corresponding construction of p-values and confidence intervals; the procedure has been improved by using multiple sample splitting and aggregation of dependent p-values from multiple sample splits [32]. Stability selection [31] and its modification [41] provide another route to estimating error measures for false positive selections in general high-dimensional settings. An alternative method for obtaining confidence sets appears in the recent work [29]. From another, mainly theoretical perspective, the work in [24] presents necessary and sufficient conditions for recovery with the lasso β̂ in terms of ‖β̂ − β⁰‖∞, where β⁰ denotes the true parameter: bounds on the latter, which hold with probability at least, say, 1 − α, could in principle be used to construct (very) conservative confidence regions. At a theoretical level, the paper [35] derives confidence intervals in ℓ2 for the case of two possible sparsity levels. Other recent work is discussed in Section 1.1 below.

We propose here a method which enjoys optimality properties when making assumptions on the sparsity and design matrix of the model. For a linear model, the procedure is the same as the one in [52] and closely related to the method in [23]. It is based on the lasso and "inverts" the corresponding KKT conditions. This yields a non-sparse estimator which has a Gaussian (limiting) distribution. We show, within a sparse linear model setting, that the estimator is optimal in the sense that it reaches the semiparametric efficiency bound. The procedure can be used, and is analyzed, for high-dimensional sparse linear and generalized linear models and for regression problems with general convex (robust) loss functions.

1.1. Related work. Our work is closest to [52], who proposed the semiparametric approach for distributional inference in a high-dimensional linear model. We take here a slightly different viewpoint, namely inverting the KKT conditions from the lasso, while relaxed projections are used in [52]. Furthermore, our paper extends the results in [52] by: (i) treating generalized linear models and general convex loss functions; (ii) for linear models, giving conditions under which the procedure achieves the semiparametric efficiency bound, with an analysis that allows for rather general Gaussian, sub-Gaussian and bounded design. An approach related to [52] was proposed in [8] based on ridge regression, which is clearly suboptimal and inefficient…
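The KKT-inversion construction described above lends itself to a short illustration. Below is a minimal Python sketch of a de-sparsified lasso of this flavor: a lasso fit is corrected using an approximate inverse Θ̂ of the sample covariance, obtained here from nodewise lasso regressions. The helper names (`nodewise_inverse`, `desparsified_lasso`) and all tuning values are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a de-sparsified ("de-biased") lasso for a sparse linear model
# y = X beta0 + eps, assuming the KKT-inversion idea described above.
# Tuning parameters below are arbitrary illustrative choices.
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_inverse(X, lam):
    """Approximate inverse of Sigma_hat = X'X/n via nodewise lasso regressions."""
    n, p = X.shape
    Theta = np.zeros((p, p))
    for j in range(p):
        mask = np.arange(p) != j
        # Regress column j on all other columns with the lasso.
        gamma = Lasso(alpha=lam, fit_intercept=False).fit(X[:, mask], X[:, j]).coef_
        resid = X[:, j] - X[:, mask] @ gamma
        tau2 = resid @ X[:, j] / n  # tau_j^2 in the nodewise construction
        row = np.zeros(p)
        row[j] = 1.0
        row[mask] = -gamma
        Theta[j] = row / tau2
    return Theta

def desparsified_lasso(X, y, lam_beta, lam_node):
    n, p = X.shape
    beta_hat = Lasso(alpha=lam_beta, fit_intercept=False).fit(X, y).coef_
    Theta = nodewise_inverse(X, lam_node)
    # One-step correction: b = beta_hat + Theta X'(y - X beta_hat)/n.
    # The corrected b is non-sparse and approximately Gaussian around beta0.
    return beta_hat + Theta @ X.T @ (y - X @ beta_hat) / n

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta0 = np.zeros(p); beta0[:3] = 1.0
y = X @ beta0 + 0.5 * rng.standard_normal(n)
b = desparsified_lasso(X, y, lam_beta=0.1, lam_node=0.1)
```

Componentwise confidence intervals then follow from the (limiting) Gaussian distribution of the corrected estimator, with variances driven by Θ̂ and a noise-level estimate.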
The group lasso is an extension of the lasso that performs variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm for solving the corresponding convex optimization problem; the algorithm is especially suitable for high-dimensional problems and can also be applied to generalized linear models. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than the sample size, provided the true underlying structure is sparse. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance in some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are applied to simulated and real data sets on splice site detection in DNA sequences.
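To make the estimator concrete, here is a hedged sketch of the criterion: the logistic negative log-likelihood plus a groupwise ℓ2 penalty. The √(df_g) rescaling by group size is one common convention and is an assumption here, not a quote from the paper.

```latex
% Group lasso for logistic regression (sketch). For binary y_i in {0,1},
% linear predictor x_i' beta, and coefficient groups beta_1, ..., beta_G:
\hat{\beta} \;=\; \arg\min_{\beta}\;
  -\sum_{i=1}^{n} \Big[\, y_i\, x_i^{\top}\beta
      - \log\!\big( 1 + \exp(x_i^{\top}\beta) \big) \Big]
  \;+\; \lambda \sum_{g=1}^{G} \sqrt{\mathrm{df}_g}\;\lVert \beta_g \rVert_2
```

The ‖·‖2 penalty (not squared) is what zeroes out whole groups at once, and the invariance under groupwise orthogonal reparameterizations mentioned above follows since ‖Qβ_g‖2 = ‖β_g‖2 for orthogonal Q.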
We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial both for the mathematical theory and for finite-sample performance. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.
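One way sparsity and smoothness can be combined in a single penalty is sketched below; the exact placement of the tuning parameters in the paper may differ, so treat this as an illustrative form rather than the authors' definition.

```latex
% Sparsity-smoothness penalized additive model f(x) = sum_j f_j(x_j) (sketch):
\hat{f} \;=\; \arg\min_{f}\; \frac{1}{n}\sum_{i=1}^{n}\big( Y_i - f(X_i) \big)^2
  \;+\; \lambda_1 \sum_{j=1}^{p}
     \sqrt{\, \lVert f_j \rVert_n^2 \;+\; \lambda_2 \int f_j''(x)^2 \, dx \,}
```

Taking the square root of the sum makes the penalty act like a group lasso on each component f_j, so whole components can be set to zero, while the ∫ f_j''² term inside controls the smoothness of the components that survive.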
We consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern: the set of variables is partitioned into prescribed groups, only few of which are relevant in the estimation process. This group sparsity assumption suggests considering the Group Lasso method as a means to estimate β*. We establish oracle inequalities for the prediction and ℓ2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger coherence condition, we derive bounds for the estimation error in mixed (2, p)-norms with 1 ≤ p ≤ ∞. When p = ∞, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of β* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, over a class of group-sparse vectors. Furthermore, we establish lower bounds for the prediction and ℓ2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation properties as compared to the Lasso. An important application of our results is the problem of estimating multiple regression equations simultaneously, or multi-task learning. In this case, our results lead to refinements of the results in [22] and allow one to establish the quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite. To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest. (The phrase "β* is sparse" means that most of the components of this vector are equal to zero.) Settings where this problem is relevant range from multi-task learning [2, 23, 28] and conjoint analysis [14, 20] to longitudinal data analysis [11] as well as the analysis of panel data [15, 38], among others. We briefly review these different settings in the course of the paper. In particular, multi-task learning provides a main motivation for our study. In that setting each regression equation corresponds to a different learning task; in addition to the requirement that M ≫ n, we also allow the number of tasks T to be much larger than n. Following [2], we assume that there are only a few common important variables which are shared by the tasks. That is, we assume that the vectors β*_1, …, β*_T are not only sparse but also have their sparsity patterns included in the same set of small cardinality. This group sparsity assumption induces a relationship between the responses and, as we shall see, can be used to improve estimation. The model (1.2) can be reformulated as a single regression problem of th…
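For reference, a standard form of the Group Lasso criterion and of the mixed (2, p)-norm appearing in the bounds is sketched below; normalization conventions (e.g., group-size weights) vary across papers and are an assumption here.

```latex
% Group Lasso over prescribed groups G_1, ..., G_M (sketch), with the
% mixed (2,p)-norm used to measure estimation error:
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{n}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda \sum_{m=1}^{M} \lVert \beta_{G_m} \rVert_2 ,
\qquad
\lVert \beta \rVert_{2,p} \;=\;
  \Big( \sum_{m=1}^{M} \lVert \beta_{G_m} \rVert_2^{\,p} \Big)^{1/p}
```

The p = ∞ case of the norm is the maximal group norm, which is what makes the thresholding-based recovery of the sparsity pattern possible.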
Least-squares penalized regression estimates with total variation penalties are considered. It is shown that these estimators are least-squares splines with locally data-adaptive placement of knot points. The definition of these variable-knot splines as minimizers of global functionals can be used to study their asymptotic properties. In particular, these results imply that the estimates adapt well to spatially inhomogeneous smoothness. We show rates of convergence in bounded variation function classes and discuss pointwise limiting distributions. An iterative algorithm based on stepwise addition and deletion of knot points is proposed and its consistency is proved.
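A hedged sketch of the criterion: for some derivative order k (which governs the degree of the resulting spline), the estimator penalizes the total variation of a derivative of f. The exact order conventions in the paper may differ from this illustrative form.

```latex
% Total-variation penalized least squares (sketch):
\hat{f} \;=\; \arg\min_{f}\;
  \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2
  \;+\; \lambda \, \mathrm{TV}\!\big( f^{(k)} \big)
```

Because TV(f^{(k)}) is an ℓ1-type functional, minimizers are piecewise-polynomial splines whose knots concentrate where the data demand more flexibility, which is exactly the spatial adaptivity discussed above.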
We consider a finite mixture of regressions (FMR) model for high-dimensional inhomogeneous data where the number of covariates may be much larger than the sample size. We propose an ℓ1-penalized maximum likelihood estimator in an appropriate parameterization. This kind of estimation belongs to a class of problems where optimization and theory for non-convex functions are needed. This distinguishes it very clearly from high-dimensional estimation with convex loss or objective functions, as for example with the Lasso in linear or generalized linear models. Mixture models represent a prime and important example where non-convexity arises.

For FMR models, we develop an efficient EM algorithm for numerical optimization with provable convergence properties. Our penalized estimator is numerically better posed (e.g., boundedness of the criterion function) than unpenalized maximum likelihood estimation, and it allows for effective statistical regularization including variable selection. We also present some asymptotic theory and oracle inequalities: due to non-convexity of the negative log-likelihood function, different mathematical arguments are needed than for problems with convex losses. Finally, we apply the new method to both simulated and real data.
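A sketch of the penalized criterion for a K-component FMR model with Gaussian components is given below; the paper works in a reparameterization (roughly, rescaling the coefficients by the component standard deviations), which is glossed over here, so the display is illustrative only.

```latex
% l1-penalized finite mixture of regressions (sketch), phi = Gaussian density:
\hat{\theta} \;=\; \arg\min_{\theta}\;
  -\frac{1}{n} \sum_{i=1}^{n} \log\!\Big( \sum_{k=1}^{K} \pi_k\,
      \phi\big( y_i ;\, x_i^{\top}\beta_k ,\, \sigma_k^2 \big) \Big)
  \;+\; \lambda \sum_{k=1}^{K} \lVert \beta_k \rVert_1
```

The log of a sum of component densities is what makes the objective non-convex, in contrast to the Lasso in (generalized) linear models; the EM algorithm handles it by alternating between soft component assignments and penalized weighted regressions.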
We propose an ℓ1-penalized estimation procedure for high-dimensional linear mixed-effects models. The models are useful whenever there is a grouping structure among high-dimensional observations, i.e., for clustered data. We prove a consistency and an oracle optimality result, and we develop an algorithm with provable numerical convergence. Furthermore, we demonstrate the performance of the method on simulated data and on a real high-dimensional data set.
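For grouped observations y_i = X_i β + Z_i b_i + ε_i with random effects b_i ~ N(0, Ψ) and noise ε_i ~ N(0, σ²I), a sketch of an ℓ1-penalized marginal-likelihood criterion of this type reads as follows (an illustrative form, not necessarily the paper's exact parameterization):

```latex
% l1-penalized linear mixed-effects model (sketch),
% with marginal covariance V_i = Z_i Psi Z_i' + sigma^2 I:
(\hat{\beta}, \hat{\Psi}, \hat{\sigma}^2) \;=\; \arg\min\;
  \frac{1}{2} \sum_{i=1}^{N} \Big[ \log\det V_i
      \;+\; (y_i - X_i\beta)^{\top} V_i^{-1} (y_i - X_i\beta) \Big]
  \;+\; \lambda \lVert \beta \rVert_1
```

Only the fixed effects β are penalized; the covariance parameters Ψ and σ² enter through the marginal likelihood of each cluster.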