Variable Selection Using Adaptive Nonlinear Interaction Structures in High Dimensions

Radchenko, Peter; James, Gareth

doi:10.1198/jasa.2010.tm10130

Cited by 104 publications

(127 citation statements)

References 28 publications

Supporting

Mentioning

124

Contrasting


“…The CAP can also be seen as a part of the composite penalties (5), so we added the CAP to Table 1. Choi et al (2010), Radchenko and James (2010), Bach et al (2012) and Bien et al (2013) also considered methods for hierarchical selection. This paper focuses on the setting where there are grouped structures without any overlap.…”

Section: Penalties For Other Settings (mentioning)

confidence: 99%

Sparse Regularization for Bi-Level Variable Selection

Matsui

2015

Journal of the Japanese Society of Computational Statistics


Sparse regularization provides solutions in which some parameters are exactly zero, so it can be used for selecting variables in regression and related models. The lasso was proposed as a method for selecting individual variables in regression models. The group lasso, by contrast, selects groups of variables rather than individual ones, and it has been used in various fields of application. More recently, penalties that select variables at both the group and individual levels have been considered; this is called bi-level selection. In this paper we focus on penalties that aim for bi-level selection. We give an overview of these penalties and their estimation algorithms, and then compare their effectiveness in terms of prediction accuracy and selection of variables and groups through simulation studies.
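The contrast between individual-level and group-level selection described in this abstract can be illustrated with the standard shrinkage (proximal) steps for the two penalties. This is a minimal sketch and not code from the cited paper: the function names, the unweighted group penalty, the coefficient values, and the grouping are all assumptions chosen for illustration.

```python
import math

def soft_threshold(beta, lam):
    """Lasso-type step: shrink each coefficient toward zero individually."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in beta]

def group_soft_threshold(beta, groups, lam):
    """Group-lasso-type step (unweighted penalty, an assumption):
    shrink each group as a block, so a group is kept or zeroed as a whole."""
    out = [0.0] * len(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        if norm > lam:
            scale = 1.0 - lam / norm
            for i in idx:
                out[i] = scale * beta[i]
    return out

# Hypothetical coefficients and grouping, for illustration only.
beta = [3.0, 0.5, 0.4, 0.3]
groups = [[0, 1], [2, 3]]

lasso_result = soft_threshold(beta, 1.0)
group_result = group_soft_threshold(beta, groups, 1.0)
```

In this example the individual step zeroes the small coefficient at index 1, whereas the group step keeps it because it shares a group with a large coefficient; the second group is removed entirely by both. Bi-level penalties aim to combine the two behaviors.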

“…) and h(j) is the vector of the bandwidths except h_j, with a(x, K, h(j)) and b_j(x, K) defined in (13). This assumption has a similar role as the assumption (4) in [19].…”

Section: Assumption (B) (mentioning)

confidence: 92%

“…1(a). Now consider the bias functional in (13). As a function of h j , its behavior is depicted in Fig.…”

Section: The Optimal Bandwidth Matrix (mentioning)

confidence: 99%

“…The derivations are more difficult, above all, in order to derive the bias of our estimators. In literature, there are papers as [13,14], which can allow correlated covariates for nonparametric models in order to make variable selection. However, they use an additive way to deal with the general nonparametric unknown function and the methods are based on penalized approach with regularization parameters to estimate.…”

Section: Extension To Non-uniform Designs (mentioning)

confidence: 99%


Bias-corrected inference for multivariate nonparametric regression: Model selection and oracle property

Giordano

Parrella

2016

Journal of Multivariate Analysis


The local polynomial estimator is particularly affected by the curse of dimensionality, which reduces the potential of this tool for large-dimensional applications. We propose an estimation procedure based on the local linear estimator and a sparseness condition that focuses on nonlinearities in the model. Our procedure, called BID (bias inflation--deflation), is automatic and easily applicable to models with many covariates without requiring any additivity assumption. It is an extension of the RODEO method, and introduces important new contributions: consistent estimation of the multivariate optimal bandwidth (the tuning parameter of the estimator); consistent estimation of the multivariate bias-corrected regression function and confidence bands; and automatic identification and separation of nonlinear and linear effects. Some theoretical properties of the method are discussed. In particular, we show the nonparametric oracle property. For linear models, BID automatically reaches the optimal rate $O_p(n^{-1/2})$, equivalent to the parametric case. A simulation study shows the performance of the procedure for finite samples.


“…However, the all-pairs Lasso estimator does not account for any structural information which has been shown to be important for prediction and interpretation of the high dimensional interaction regression model [2,30,25,29,6]. In statistics, a hierarchical structure between main effects and interaction effects has been shown to be very effective in constraining the search space and identifying important individual features and interactions [2,30,25,29,6]. Specifically, the hierarchical constraint requires that an interaction term x_i x_j is selected in the model only if the main effects x_i and/or x_j are included.…”

Section: Introduction (mentioning)

confidence: 99%

An efficient algorithm for weak hierarchical lasso

Liu

Wang

2014

Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining


Linear regression is a widely used tool in data mining and machine learning. In many applications, fitting a regression model with only linear effects may not be sufficient for predictive or explanatory purposes. One strategy which has recently received increasing attention in statistics is to include feature interactions to capture the nonlinearity in the regression model. Such models have been applied successfully in many biomedical applications. One major challenge in the use of such models is that the data dimensionality is significantly higher than that of the original data, resulting in the small sample size, large dimension problem. Recently, the weak hierarchical Lasso, a sparse interaction regression model, was proposed; it produces a sparse and hierarchically structured estimator by exploiting the Lasso penalty and a set of hierarchical constraints. However, the hierarchical constraints make it a non-convex problem, and the existing method finds the solution of its convex relaxation, which needs additional conditions to guarantee the hierarchical structure. In this paper, we propose to directly solve the non-convex weak hierarchical Lasso by making use of the GIST (General Iterative Shrinkage and Thresholding) optimization framework, which has been shown to be efficient for solving non-convex sparse formulations. The key step in GIST is to compute a sequence of proximal operators. One of our key technical contributions is to show that the proximal operator associated with the non-convex weak hierarchical Lasso admits a closed form solution. However, a naive approach for solving each subproblem of the proximal operator leads to a quadratic time complexity, which is not desirable for large-size problems. To this end, we further develop an efficient algorithm for computing the subproblems with a linearithmic time complexity. We have conducted extensive experiments on both synthetic and real data sets. Results show that our proposed algorithm is much more efficient and effective than its convex relaxation.
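The hierarchical constraint quoted in the citation statement above (an interaction is selected only if its main effects are included) can be made concrete with a small check. This is a hedged sketch and not the algorithm from the cited paper: the function name, the `strong` flag, and the example index sets are hypothetical. It assumes the usual reading in which weak hierarchy requires at least one of the two main effects and strong hierarchy requires both.

```python
def satisfies_hierarchy(main_effects, interactions, strong=False):
    """Return True if every selected interaction (i, j) respects the hierarchy.

    Weak hierarchy (assumed default): at least one of i, j is a selected main effect.
    Strong hierarchy: both i and j are selected main effects.
    """
    selected = set(main_effects)
    needed = 2 if strong else 1
    for i, j in interactions:
        present = (i in selected) + (j in selected)
        if present < needed:
            return False
    return True

# Hypothetical selection: main effects x_0 and x_2, interactions x_0*x_1 and x_0*x_2.
main_effects = {0, 2}
interactions = [(0, 1), (0, 2)]

weak_ok = satisfies_hierarchy(main_effects, interactions)
strong_ok = satisfies_hierarchy(main_effects, interactions, strong=True)
```

Here the selection passes the weak check but fails the strong one, because the interaction between x_0 and x_1 has only one of its main effects in the model.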

