convergence to the limit is not uniform. Furthermore, bootstrap and even subsampling techniques are plagued by the non-continuity of limiting distributions. Nevertheless, in the low-dimensional setting, a modified bootstrap scheme has been proposed; [13] and [14] have recently proposed a residual-based bootstrap scheme. They provide consistency guarantees for the high-dimensional setting; we consider this method in an empirical analysis in Section 4.

Some approaches for quantifying uncertainty include the following. The work in [50] implicitly contains the idea of sample splitting and the corresponding construction of p-values and confidence intervals; the procedure has been improved by using multiple sample splitting and aggregation of dependent p-values from multiple sample splits [32]. Stability selection [31] and its modification [41] provide another route to estimating error measures for false positive selections in general high-dimensional settings. An alternative method for obtaining confidence sets appears in the recent work [29]. From another, mainly theoretical perspective, the work in [24] presents necessary and sufficient conditions for recovery with the lasso β̂ in terms of ‖β̂ − β⁰‖∞, where β⁰ denotes the true parameter: bounds on the latter, which hold with probability at least, say, 1 − α, could in principle be used to construct (very) conservative confidence regions. At a theoretical level, the paper [35] derives confidence intervals in ℓ2 for the case of two possible sparsity levels. Other recent work is discussed in Section 1.1 below.

We propose here a method which enjoys optimality properties when making assumptions on the sparsity and design matrix of the model. For a linear model, the procedure is the same as the one in [52] and closely related to the method in [23]. It is based on the lasso and "inverts" the corresponding KKT conditions. This yields a non-sparse estimator which has a Gaussian (limiting) distribution. We show, within a sparse linear model setting, that the estimator is optimal in the sense that it reaches the semiparametric efficiency bound. The procedure can be used, and is analyzed, for high-dimensional sparse linear and generalized linear models and for regression problems with general convex (robust) loss functions.

1.1. Related work. Our work is closest to [52], who proposed the semiparametric approach for distributional inference in a high-dimensional linear model. We take here a slightly different viewpoint, namely inverting the KKT conditions from the lasso, while relaxed projections are used in [52]. Furthermore, our paper extends the results in [52] by: (i) treating generalized linear models and general convex loss functions; (ii) for linear models, giving conditions under which the procedure achieves the semiparametric efficiency bound, with an analysis that allows for rather general Gaussian, sub-Gaussian and bounded design. An approach related to [52] was proposed in [8] based on ridge regression, which is clearly suboptimal and inefficient…
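The KKT-inversion construction described above lends itself to a short illustration. Below is a minimal Python sketch of a de-sparsified lasso of this flavor: a lasso fit is corrected using an approximate inverse Θ̂ of the sample covariance, obtained here from nodewise lasso regressions. The helper names (`nodewise_inverse`, `desparsified_lasso`) and all tuning values are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a de-sparsified ("de-biased") lasso for a sparse linear model
# y = X beta0 + eps, assuming the KKT-inversion idea described above.
# Tuning parameters below are arbitrary illustrative choices.
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_inverse(X, lam):
    """Approximate inverse of Sigma_hat = X'X/n via nodewise lasso regressions."""
    n, p = X.shape
    Theta = np.zeros((p, p))
    for j in range(p):
        mask = np.arange(p) != j
        # Regress column j on all other columns with the lasso.
        gamma = Lasso(alpha=lam, fit_intercept=False).fit(X[:, mask], X[:, j]).coef_
        resid = X[:, j] - X[:, mask] @ gamma
        tau2 = resid @ X[:, j] / n  # tau_j^2 in the nodewise construction
        row = np.zeros(p)
        row[j] = 1.0
        row[mask] = -gamma
        Theta[j] = row / tau2
    return Theta

def desparsified_lasso(X, y, lam_beta, lam_node):
    n, p = X.shape
    beta_hat = Lasso(alpha=lam_beta, fit_intercept=False).fit(X, y).coef_
    Theta = nodewise_inverse(X, lam_node)
    # One-step correction: b = beta_hat + Theta X'(y - X beta_hat)/n.
    # The corrected b is non-sparse and approximately Gaussian around beta0.
    return beta_hat + Theta @ X.T @ (y - X @ beta_hat) / n

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta0 = np.zeros(p); beta0[:3] = 1.0
y = X @ beta0 + 0.5 * rng.standard_normal(n)
b = desparsified_lasso(X, y, lam_beta=0.1, lam_node=0.1)
```

Componentwise confidence intervals then follow from the (limiting) Gaussian distribution of the corrected estimator, with variances driven by Θ̂ and a noise-level estimate.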
The group lasso is an extension of the lasso that performs variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm for solving the corresponding convex optimization problem; the algorithm is especially suitable for high-dimensional problems and can also be applied to generalized linear models. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than the sample size, provided the true underlying structure is sparse. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance in some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are applied to simulated and real data sets on splice site detection in DNA sequences.
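To make the estimator concrete, here is a hedged sketch of the criterion: the logistic negative log-likelihood plus a groupwise ℓ2 penalty. The √(df_g) rescaling by group size is one common convention and is an assumption here, not a quote from the paper.

```latex
% Group lasso for logistic regression (sketch). For binary y_i in {0,1},
% linear predictor x_i' beta, and coefficient groups beta_1, ..., beta_G:
\hat{\beta} \;=\; \arg\min_{\beta}\;
  -\sum_{i=1}^{n} \Big[\, y_i\, x_i^{\top}\beta
      - \log\!\big( 1 + \exp(x_i^{\top}\beta) \big) \Big]
  \;+\; \lambda \sum_{g=1}^{G} \sqrt{\mathrm{df}_g}\;\lVert \beta_g \rVert_2
```

The ‖·‖2 penalty (not squared) is what zeroes out whole groups at once, and the invariance under groupwise orthogonal reparameterizations mentioned above follows since ‖Qβ_g‖2 = ‖β_g‖2 for orthogonal Q.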
We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial both for the mathematical theory and for finite-sample performance. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.
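One way sparsity and smoothness can be combined in a single penalty is sketched below; the exact placement of the tuning parameters in the paper may differ, so treat this as an illustrative form rather than the authors' definition.

```latex
% Sparsity-smoothness penalized additive model f(x) = sum_j f_j(x_j) (sketch):
\hat{f} \;=\; \arg\min_{f}\; \frac{1}{n}\sum_{i=1}^{n}\big( Y_i - f(X_i) \big)^2
  \;+\; \lambda_1 \sum_{j=1}^{p}
     \sqrt{\, \lVert f_j \rVert_n^2 \;+\; \lambda_2 \int f_j''(x)^2 \, dx \,}
```

Taking the square root of the sum makes the penalty act like a group lasso on each component f_j, so whole components can be set to zero, while the ∫ f_j''² term inside controls the smoothness of the components that survive.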
We consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern: the set of variables is partitioned into prescribed groups, only few of which are relevant in the estimation process. This group sparsity assumption suggests considering the Group Lasso method as a means to estimate β*. We establish oracle inequalities for the prediction and ℓ2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger coherence condition, we derive bounds for the estimation error in mixed (2, p)-norms with 1 ≤ p ≤ ∞. When p = ∞, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of β* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, over a class of group-sparse vectors. Furthermore, we establish lower bounds for the prediction and ℓ2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation properties as compared to the Lasso. An important application of our results is the problem of estimating multiple regression equations simultaneously, or multi-task learning. In this case, our results lead to refinements of the results in [22] and allow one to establish the quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite. To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest. (The phrase "β* is sparse" means that most of the components of this vector are equal to zero.) Settings where this problem is relevant range from multi-task learning [2, 23, 28] and conjoint analysis [14, 20] to longitudinal data analysis [11] as well as the analysis of panel data [15, 38], among others. We briefly review these different settings in the course of the paper. In particular, multi-task learning provides a main motivation for our study. In that setting each regression equation corresponds to a different learning task; in addition to the requirement that M ≫ n, we also allow the number of tasks T to be much larger than n. Following [2], we assume that there are only a few common important variables which are shared by the tasks. That is, we assume that the vectors β*_1, …, β*_T are not only sparse but also have their sparsity patterns included in the same set of small cardinality. This group sparsity assumption induces a relationship between the responses and, as we shall see, can be used to improve estimation. The model (1.2) can be reformulated as a single regression problem of th…
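For reference, a standard form of the Group Lasso criterion and of the mixed (2, p)-norm appearing in the bounds is sketched below; normalization conventions (e.g., group-size weights) vary across papers and are an assumption here.

```latex
% Group Lasso over prescribed groups G_1, ..., G_M (sketch), with the
% mixed (2,p)-norm used to measure estimation error:
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{n}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda \sum_{m=1}^{M} \lVert \beta_{G_m} \rVert_2 ,
\qquad
\lVert \beta \rVert_{2,p} \;=\;
  \Big( \sum_{m=1}^{M} \lVert \beta_{G_m} \rVert_2^{\,p} \Big)^{1/p}
```

The p = ∞ case of the norm is the maximal group norm, which is what makes the thresholding-based recovery of the sparsity pattern possible.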
Least-squares penalized regression estimates with total variation penalties are considered. It is shown that these estimators are least-squares splines with locally data-adaptive placement of knot points. The definition of these variable-knot splines as minimizers of global functionals can be used to study their asymptotic properties. In particular, these results imply that the estimates adapt well to spatially inhomogeneous smoothness. We show rates of convergence in bounded variation function classes and discuss pointwise limiting distributions. An iterative algorithm based on stepwise addition and deletion of knot points is proposed and its consistency is proved.
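A hedged sketch of the criterion: for some derivative order k (which governs the degree of the resulting spline), the estimator penalizes the total variation of a derivative of f. The exact order conventions in the paper may differ from this illustrative form.

```latex
% Total-variation penalized least squares (sketch):
\hat{f} \;=\; \arg\min_{f}\;
  \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2
  \;+\; \lambda \, \mathrm{TV}\!\big( f^{(k)} \big)
```

Because TV(f^{(k)}) is an ℓ1-type functional, minimizers are piecewise-polynomial splines whose knots concentrate where the data demand more flexibility, which is exactly the spatial adaptivity discussed above.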
We consider a finite mixture of regressions (FMR) model for high-dimensional inhomogeneous data where the number of covariates may be much larger than the sample size. We propose an ℓ1-penalized maximum likelihood estimator in an appropriate parameterization. This kind of estimation belongs to a class of problems where optimization and theory for non-convex functions are needed. This distinguishes it very clearly from high-dimensional estimation with convex loss or objective functions, as for example with the Lasso in linear or generalized linear models. Mixture models represent a prime and important example where non-convexity arises.

For FMR models, we develop an efficient EM algorithm for numerical optimization with provable convergence properties. Our penalized estimator is numerically better posed (e.g., boundedness of the criterion function) than unpenalized maximum likelihood estimation, and it allows for effective statistical regularization including variable selection. We also present some asymptotic theory and oracle inequalities: due to non-convexity of the negative log-likelihood function, different mathematical arguments are needed than for problems with convex losses. Finally, we apply the new method to both simulated and real data.
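A sketch of the penalized criterion for a K-component FMR model with Gaussian components is given below; the paper works in a reparameterization (roughly, rescaling the coefficients by the component standard deviations), which is glossed over here, so the display is illustrative only.

```latex
% l1-penalized finite mixture of regressions (sketch), phi = Gaussian density:
\hat{\theta} \;=\; \arg\min_{\theta}\;
  -\frac{1}{n} \sum_{i=1}^{n} \log\!\Big( \sum_{k=1}^{K} \pi_k\,
      \phi\big( y_i ;\, x_i^{\top}\beta_k ,\, \sigma_k^2 \big) \Big)
  \;+\; \lambda \sum_{k=1}^{K} \lVert \beta_k \rVert_1
```

The log of a sum of component densities is what makes the objective non-convex, in contrast to the Lasso in (generalized) linear models; the EM algorithm handles it by alternating between soft component assignments and penalized weighted regressions.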
We propose an ℓ1-penalized estimation procedure for high-dimensional linear mixed-effects models. The models are useful whenever there is a grouping structure among high-dimensional observations, i.e., for clustered data. We prove a consistency and an oracle optimality result, and we develop an algorithm with provable numerical convergence. Furthermore, we demonstrate the performance of the method on simulated data and on a real high-dimensional data set.
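For grouped observations y_i = X_i β + Z_i b_i + ε_i with random effects b_i ~ N(0, Ψ) and noise ε_i ~ N(0, σ²I), a sketch of an ℓ1-penalized marginal-likelihood criterion of this type reads as follows (an illustrative form, not necessarily the paper's exact parameterization):

```latex
% l1-penalized linear mixed-effects model (sketch),
% with marginal covariance V_i = Z_i Psi Z_i' + sigma^2 I:
(\hat{\beta}, \hat{\Psi}, \hat{\sigma}^2) \;=\; \arg\min\;
  \frac{1}{2} \sum_{i=1}^{N} \Big[ \log\det V_i
      \;+\; (y_i - X_i\beta)^{\top} V_i^{-1} (y_i - X_i\beta) \Big]
  \;+\; \lambda \lVert \beta \rVert_1
```

Only the fixed effects β are penalized; the covariance parameters Ψ and σ² enter through the marginal likelihood of each cluster.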