SUMMARY

Two important goals of high-dimensional modeling are prediction and variable selection. In this article, we consider regularization with combined L1 and concave penalties, and study the sampling properties of the global optimum of the suggested method in ultra-high dimensional settings. The L1-penalty provides the minimum regularization needed to remove noise variables and achieve the oracle prediction risk, while the concave penalty imposes additional regularization to control model sparsity. In the linear model setting, we prove that the global optimum of our method enjoys the same oracle inequalities as the lasso estimator and admits an explicit bound on the false sign rate, which can be asymptotically vanishing. Moreover, we establish oracle risk inequalities for the method and the sampling properties of computable solutions. Numerical studies suggest that our method yields more stable estimates than using a concave penalty alone.

Some key words: Concave penalty; Global optimum; Lasso penalty; Prediction and variable selection.

1. INTRODUCTION

Prediction and variable selection are two important goals in many contemporary large-scale problems. Many regularization methods in the context of penalized empirical risk minimization have been proposed to select important covariates; see, for example, Fan & Lv (2010) for a review of some recent developments in high-dimensional variable selection. Penalized empirical risk minimization has two components: the empirical risk for a chosen loss function, which governs prediction, and a penalty function on the magnitude of the parameters, which reduces model complexity. The loss function is often chosen to be convex. The inclusion of the regularization term helps prevent overfitting when the number of covariates p is comparable to or exceeds the number of observations n. Generally speaking, two classes of penalty functions have been proposed in the literature: convex ones and concave ones.
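To make the combined-penalty objective concrete, the following is a minimal sketch of a penalized least-squares criterion with both an L1 term and a concave term. The concave component is illustrated here with the SCAD penalty of Fan & Li (2001) as one common choice; the function names and the specific tuning parameters are illustrative, not part of the method described in this article.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD concave penalty (Fan & Li, 2001), summed over coordinates.

    p_lam(t) = lam*|t|                              if |t| <= lam
             = (2*a*lam*|t| - t^2 - lam^2)/(2(a-1)) if lam < |t| <= a*lam
             = lam^2*(a+1)/2                        if |t| > a*lam
    """
    b = np.abs(beta)
    small = lam * b
    mid = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))
    large = np.full_like(b, lam**2 * (a + 1) / 2)
    return np.sum(np.where(b <= lam, small, np.where(b <= a * lam, mid, large)))

def combined_objective(beta, X, y, lam1, lam2, a=3.7):
    """Least-squares empirical risk plus combined L1 and concave penalties.

    lam1 scales the L1 (lasso) term; lam2 scales the concave (SCAD) term.
    """
    n = X.shape[0]
    rss = np.sum((y - X @ beta) ** 2) / (2 * n)  # empirical risk
    return rss + lam1 * np.sum(np.abs(beta)) + scad_penalty(beta, lam2, a)
```

Note how the SCAD term flattens to a constant for large coefficients, so unlike the L1 term it does not keep shrinking strong signals; the L1 component supplies the baseline regularization while the concave component sharpens sparsity.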
When a convex penalty such as the lasso penalty (Tibshirani, 1996) is used, the resulting estimator is a well-defined global optimizer. For the properties of L1-regularization methods, see, for example, Chen et al. (1999), Efron et al. (2004), Zou (2006), Candès & Tao (2007), Rosset & Zhu (2007), and Bickel et al. (2009). In particular, Bickel et al. (2009) proved that using the L1-penalty leads to estimators satisfying the oracle inequalities under the prediction loss and Lq-loss, with 1 ≤ q ≤ 2, in high-dimensional nonparametric regression models. An oracle inequality means that, with overwhelming probability, the loss of the regularized estimator is within a logarithmic factor, a power of log p, of that of the oracle estimator, with the power depending on the chosen estimation loss. Despite these nice properties,