2013
DOI: 10.1080/10618600.2012.681213

Automatic Feature Selection via Weighted Kernels and Regularization

Abstract: Selecting important features in nonlinear kernel spaces is a difficult challenge in both classification and regression problems. This article proposes to achieve feature selection by optimizing a simple criterion: a feature-regularized loss function. Features within the kernel are weighted, and a lasso penalty is placed on these weights to encourage sparsity. This feature-regularized loss function is minimized by estimating the weights in conjunction with the coefficients of the original classification or regression…
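The abstract sketches KNIFE's core idea: place a weight on each feature inside the kernel, add a lasso penalty on those weights, and minimize the resulting feature-regularized loss jointly over the weights and the kernel coefficients. Below is a minimal sketch of that idea for kernel ridge regression with a weighted radial kernel. It is an illustration under assumptions, not Allen's implementation: the names weighted_rbf and knife_sketch, the finite-difference gradient, and the proximal-gradient update on nonnegative weights are all simplifications introduced here.

```python
import numpy as np

def weighted_rbf(X, Z, w, gamma):
    """Radial kernel on feature-weighted inputs: exp(-gamma * ||w * (x - z)||^2)."""
    diff = (X * w)[:, None, :] - (Z * w)[None, :, :]
    return np.exp(-gamma * (diff ** 2).sum(axis=-1))

def objective(alpha, w, X, y, lam1, lam2, gamma):
    """Feature-regularized loss: squared error, a ridge penalty on alpha,
    and a lasso penalty on the feature weights w."""
    K = weighted_rbf(X, X, w, gamma)
    resid = y - K @ alpha
    return resid @ resid + lam2 * alpha @ K @ alpha + lam1 * np.abs(w).sum()

def knife_sketch(X, y, lam1=1.0, lam2=1.0, gamma=None, n_outer=15,
                 step=1e-3, eps=1e-5):
    """Alternate a closed-form solve for alpha with one proximal-gradient step
    on nonnegative feature weights (gradient taken by finite differences)."""
    n, p = X.shape
    gamma = 1.0 / p if gamma is None else gamma
    w = np.ones(p)
    for _ in range(n_outer):
        # alpha-step: kernel ridge solution with the weights w held fixed
        K = weighted_rbf(X, X, w, gamma)
        alpha = np.linalg.solve(K + lam2 * np.eye(n), y)
        # w-step: finite-difference gradient of the smooth part of the loss
        base = objective(alpha, w, X, y, 0.0, lam2, gamma)
        grad = np.empty(p)
        for j in range(p):
            w_eps = w.copy()
            w_eps[j] += eps
            grad[j] = (objective(alpha, w_eps, X, y, 0.0, lam2, gamma) - base) / eps
        # lasso prox restricted to nonnegative weights: shift and clip at zero
        w = np.maximum(w - step * grad - step * lam1, 0.0)
    return alpha, w
```

The alternating structure mirrors the abstract's description of estimating the weights in conjunction with the coefficients of the original problem; a zero weight removes its feature from the kernel entirely.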

Cited by 56 publications (69 citation statements)
References 25 publications
“…SPAM (sparse additive models) is similar to COSSO in that it truncates complexity, but it allows p ≫ n (Ravikumar et al., 2009). Kernel iterative feature extraction (KNIFE) by Allen (2013) imposes L1-regularization on L2-penalized splines.…”
Section: Methods Comparison and Numerical Results (mentioning)
Confidence: 99%
“…We use the default GCV criterion for MARS. For KNIFE, we fix λ1 = 1 and use a radial kernel with γ = 1/p as suggested in Allen (2013). The weight power for the ACOSSO is fixed at γ = 2, as suggested by Storlie et al. (2011).…”
Section: Methods Comparison and Numerical Results (mentioning)
Confidence: 99%
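For concreteness, the settings quoted above (λ1 = 1 and a radial kernel with γ = 1/p) drop straight into the sketch given after the abstract. The toy data below is an assumed example, not the cited study's benchmark, so the output is only indicative.

```python
# Toy run with the quoted settings: lambda_1 = 1 and a radial kernel with gamma = 1/p.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))                    # n = 100, p = 10
y = np.sin(2 * X[:, 0]) + X[:, 1] + 0.1 * rng.standard_normal(100)  # 2 active features
alpha, w = knife_sketch(X, y, lam1=1.0, gamma=1.0 / X.shape[1])
print(np.round(w, 2))  # weights on the eight inactive features should shrink toward zero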
“…Note that this is not the case in classical kernel-based approaches, where prediction is the only goal and feature selection is not addressed. Previously demonstrated methods for feature selection using kernel machines [52] lack the probabilistic model required by our approach. Further extension of our model to those cases is possible but beyond the scope of this paper.…”
Section: Discussion (mentioning)
Confidence: 99%
“…To achieve that goal, one may consider more flexible forms of kernel functions to allow the method to automatically remove variables. Existing literature in nonlinear variable selection, for example, the COSSO (Lin and Zhang, 2007) and KNIFE (Allen, 2013), can be useful here. In that case, the corresponding computational algorithm can be much more challenging.…”
Section: Discussion (mentioning)
Confidence: 99%