2020
DOI: 10.1287/opre.2019.1919

Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

Abstract: In several scientific and industrial applications, it is desirable to build compact, interpretable learning models where the output depends on a small number of input features. Recent work has shown that such best-subset selection-type problems can be solved with modern mixed integer optimization solvers. Despite their promise, such solvers often come at a steep computational price when compared with open-source, efficient specialized solvers based on convex optimization and greedy heuristics. In “Fast Best-Su…
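For reference, the problem class described in the abstract is usually written as ℓ0-penalized (or cardinality-constrained) least squares; a standard formulation, which the paper further combines with optional ℓ1 or ℓ2 regularization, is

$$\min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 + \lambda_0 \lVert \beta \rVert_0, \qquad \lVert \beta \rVert_0 = \#\{\, j : \beta_j \neq 0 \,\}.$$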

Cited by 104 publications (145 citation statements). References 57 publications. Citing publications span 2020–2024.

Citation statements (ordered by relevance):
“…Our Automated approach consistently produces sparser models with interpretable coefficients when compared with the current industrial approach, improving prediction accuracy, especially on short-term horizons. As such, for the datasets considered, our procedure does not suffer from the problem of overfitting observed for best subset methods by some works when the signal-to-noise ratio is low; see, for example, Hastie and Tibshirani (2017), Mazumder et al. (2017), and Hazimeh and Mazumder (2018). In other contexts, this may be the case; we leave this as an area for future work.…”
Section: Telecommunications Data Study (mentioning)
confidence: 82%
“…To this end, rlasso shows robust performance and sparsity. We can expect more sparsity with a modest increase in computational time by performing regularized … Importantly, Hazimeh et al. [16] report that the faster ‘Algorithm 1’ achieved notable speedups of 25-300% over glmnet and ncvreg for very large instances and performed comparatively well on real problems. We did not test this algorithm because we wanted to maximize the regression performance metrics and, in [16], the more intensive CDPSI algorithm performed substantially better than Algorithm 1 on synthetic data.…”
Section: Discussion (mentioning)
confidence: 99%
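The ‘Algorithm 1’ and CDPSI referenced in this quote are, respectively, the paper's cyclic coordinate descent and coordinate descent combined with local combinatorial (swap) search. As a rough illustration only, here is a minimal sketch of the hard-thresholding coordinate update that underlies such ℓ0 coordinate descent, assuming the columns of X are scaled to unit ℓ2 norm; the function name is hypothetical and this is not the L0Learn API.

```python
import numpy as np

def l0_coordinate_descent(X, y, lam, max_iter=100, tol=1e-8):
    """Sketch: cyclic coordinate descent for
    0.5 * ||y - X @ beta||^2 + lam * ||beta||_0,
    assuming each column of X has unit l2 norm."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()     # residual y - X @ beta
    thresh = np.sqrt(2.0 * lam)        # hard-threshold level for unit-norm columns
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            b_old = beta[j]
            rho = X[:, j] @ resid + b_old              # univariate least-squares fit on the partial residual
            b_new = rho if abs(rho) > thresh else 0.0  # keep the coordinate only if it pays for the lam penalty
            if b_new != b_old:
                resid += X[:, j] * (b_old - b_new)     # keep the residual consistent with beta
                beta[j] = b_new
                max_change = max(max_change, abs(b_new - b_old))
        if max_change < tol:
            break
    return beta
```

CDPSI, as the quote notes, is more expensive because after such coordinate sweeps it also performs local combinatorial search, swapping coordinates in and out of the support, which is what improves its accuracy on hard synthetic instances.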
“…L0Learn [16] (unregularized and ℓ1-regularized); SparseNet [21]; Lasso/ENet [33]; MCP [31]; SCAD [11]; Best Subset CIO (CIO) [2] (cardinality-constrained, ℓ2-regularized); Boolean Relaxation of Best Subset (SS) [24] (cardinality-constrained, ℓ2-regularized). Problem sizes: p = 10, 100, 10^3 (both n > p and p > n) and p = 2 × 10^4, 10^4, 2 × 10^3 (p > n); n = 50 (p = 10^3), 100 (p = {10, 10^3}), 500 (p = 10^2), and 500 to thousands; k_true = 5 (p = {10, 10^2, 10^3}) and 10 (p = 10^3).…”
Section: Estimators (mentioning)
confidence: 99%
“…And there is reason to suspect that these may have competitive performance. For example, while best subset selection (a.k.a., the ℓ0 penalty) is computationally intensive, as it requires fitting every possible model (but see Hazimeh & Mazumder, 2020), it is often considered the gold standard for model selection. This is because the penalty is applied directly to the number of parameters, that is, the ℓ0 (pseudo) norm.…”
(mentioning)
confidence: 99%
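To make the cost point in the last quote concrete, the following is a minimal sketch (the function name is ours, not from any cited package) of naive best-subset selection with a fixed support size k: it fits ordinary least squares on every size-k subset of columns and keeps the one with the smallest residual sum of squares. The number of fits grows as C(p, k), which is exactly the combinatorial burden that the coordinate-descent and local-search algorithms in this paper are designed to sidestep.

```python
import numpy as np
from itertools import combinations

def best_subset_exhaustive(X, y, k):
    """Sketch: exhaustive best-subset selection of size k by brute force."""
    n, p = X.shape
    best_rss, best_support, best_coef = np.inf, None, None
    for support in combinations(range(p), k):
        Xs = X[:, list(support)]
        coef, rss, rank, _ = np.linalg.lstsq(Xs, y, rcond=None)
        # lstsq returns an empty residual array for rank-deficient subproblems
        rss = float(rss[0]) if rss.size else float(np.sum((y - Xs @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_support, best_coef = rss, support, coef
    return best_support, best_coef, best_rss
```

With p = 30 and k = 5, for instance, this already requires C(30, 5) = 142,506 separate least-squares fits, which is why exhaustive enumeration is only viable for small problems.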