2020
DOI: 10.1287/opre.2019.1919

Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

Abstract: In several scientific and industrial applications, it is desirable to build compact, interpretable learning models where the output depends on a small number of input features. Recent work has shown that such best-subset selection-type problems can be solved with modern mixed integer optimization solvers. Despite their promise, such solvers often come at a steep computational price when compared with open-source, efficient specialized solvers based on convex optimization and greedy heuristics. In “Fast Best-Su…
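For reference, the problem class described in the abstract is usually written as ℓ0-penalized (or cardinality-constrained) least squares; a standard formulation, which the paper further combines with optional ℓ1 or ℓ2 regularization, is

$$\min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 + \lambda_0 \lVert \beta \rVert_0, \qquad \lVert \beta \rVert_0 = \#\{\, j : \beta_j \neq 0 \,\}.$$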

Cited by 104 publications (145 citation statements). References 57 publications. Citing publications span 2020–2024.

Citation statements (ordered by relevance):
“…Our Automated approach consistently produces sparser models with interpretable coefficients when compared with the current industrial approach, improving prediction accuracy, especially on short-term horizons. As such, for the datasets considered, our procedure does not suffer from the problem of overfitting observed for best subset methods by some works when the signal-to-noise ratio is low; see, for example, Hastie and Tibshirani (2017), Mazumder et al. (2017), and Hazimeh and Mazumder (2018). In other contexts, this may be the case; we leave this as an area for future work.…”
Section: Telecommunications Data Study (mentioning)
confidence: 82%
“…To this end, rlasso shows robust performance and sparsity. We can expect more sparsity with a modest increase in computational time by performing regularized … Importantly, Hazimeh et al. [16] report that the faster ‘Algorithm 1’ achieved notable speedups of 25-300% over glmnet and ncvreg for very large instances and performed comparatively well on real problems. We did not test this algorithm because we wanted to maximize the regression performance metrics and, in [16], the more intensive CDPSI algorithm performed substantially better than Algorithm 1 on synthetic data.…”
Section: Discussion (mentioning)
confidence: 99%
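The ‘Algorithm 1’ and CDPSI referenced in this quote are, respectively, the paper's cyclic coordinate descent and coordinate descent combined with local combinatorial (swap) search. As a rough illustration only, here is a minimal sketch of the hard-thresholding coordinate update that underlies such ℓ0 coordinate descent, assuming the columns of X are scaled to unit ℓ2 norm; the function name is hypothetical and this is not the L0Learn API.

```python
import numpy as np

def l0_coordinate_descent(X, y, lam, max_iter=100, tol=1e-8):
    """Sketch: cyclic coordinate descent for
    0.5 * ||y - X @ beta||^2 + lam * ||beta||_0,
    assuming each column of X has unit l2 norm."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()     # residual y - X @ beta
    thresh = np.sqrt(2.0 * lam)        # hard-threshold level for unit-norm columns
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            b_old = beta[j]
            rho = X[:, j] @ resid + b_old              # univariate least-squares fit on the partial residual
            b_new = rho if abs(rho) > thresh else 0.0  # keep the coordinate only if it pays for the lam penalty
            if b_new != b_old:
                resid += X[:, j] * (b_old - b_new)     # keep the residual consistent with beta
                beta[j] = b_new
                max_change = max(max_change, abs(b_new - b_old))
        if max_change < tol:
            break
    return beta
```

CDPSI, as the quote notes, is more expensive because after such coordinate sweeps it also performs local combinatorial search, swapping coordinates in and out of the support, which is what improves its accuracy on hard synthetic instances.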
“…L0Learn [16] (unregularized and ℓ1-regularized); SparseNet [21]; Lasso/ENet [33]; MCP [31]; SCAD [11]; Best Subset CIO (CIO) [2] (cardinality-constrained, ℓ2-regularized); Boolean Relaxation of Best Subset (SS) [24] (cardinality-constrained, ℓ2-regularized). Problem sizes: p = 10, 100, 10^3 (both n > p and p > n) and p = 2 × 10^4, 10^4, 2 × 10^3 (p > n); n = 50 (p = 10^3), 100 (p = {10, 10^3}), 500 (p = 10^2), and 500 to thousands; k_true = 5 (p = {10, 10^2, 10^3}) and 10 (p = 10^3).…”
Section: Estimators (mentioning)
confidence: 99%
“…And there is reason to suspect that these may have competitive performance. For example, while best subset selection (a.k.a., the ℓ0 penalty) is computationally intensive, as it requires fitting every possible model (but see Hazimeh & Mazumder, 2020), it is often considered the gold standard for model selection. This is because the penalty is applied directly to the number of parameters, that is, the ℓ0 (pseudo) norm.…”
(mentioning)
confidence: 99%
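To make the cost point in the last quote concrete, the following is a minimal sketch (the function name is ours, not from any cited package) of naive best-subset selection with a fixed support size k: it fits ordinary least squares on every size-k subset of columns and keeps the one with the smallest residual sum of squares. The number of fits grows as C(p, k), which is exactly the combinatorial burden that the coordinate-descent and local-search algorithms in this paper are designed to sidestep.

```python
import numpy as np
from itertools import combinations

def best_subset_exhaustive(X, y, k):
    """Sketch: exhaustive best-subset selection of size k by brute force."""
    n, p = X.shape
    best_rss, best_support, best_coef = np.inf, None, None
    for support in combinations(range(p), k):
        Xs = X[:, list(support)]
        coef, rss, rank, _ = np.linalg.lstsq(Xs, y, rcond=None)
        # lstsq returns an empty residual array for rank-deficient subproblems
        rss = float(rss[0]) if rss.size else float(np.sum((y - Xs @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_support, best_coef = rss, support, coef
    return best_support, best_coef, best_rss
```

With p = 30 and k = 5, for instance, this already requires C(30, 5) = 142,506 separate least-squares fits, which is why exhaustive enumeration is only viable for small problems.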