2009
DOI: 10.1007/s10994-009-5126-6

Sparse kernel SVMs via cutting-plane training

Abstract: We explore an algorithm for training SVMs with kernels that can represent the learned rule using arbitrary basis vectors, not just the support vectors (SVs) from the training set. This results in two benefits. First, the added flexibility makes it possible to find sparser solutions of good quality, substantially speeding up prediction. Second, the improved sparsity can also make training of kernel SVMs more efficient, especially for high-dimensional and sparse data (e.g. text classification). This has the pote…
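A minimal sketch of the representation the abstract describes: the learned rule is a weighted sum of kernel evaluations against a small set of basis vectors that need not be training examples, so prediction cost scales with the number of basis vectors rather than the number of support vectors. The basis vectors, weights, and RBF kernel below are illustrative placeholders, not the authors' trained model.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian RBF kernel between two vectors."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def predict(x, basis_vectors, weights, bias=0.0, gamma=0.5):
    """Sparse kernel SVM prediction: f(x) = sum_j w_j * K(b_j, x) + b.

    Cost is one kernel evaluation per basis vector, independent of how
    many training examples would have been support vectors.
    """
    scores = np.array([rbf_kernel(x, b, gamma) for b in basis_vectors])
    return float(weights @ scores + bias)

# Illustrative numbers only: a handful of basis vectors stands in for
# what would otherwise be hundreds or thousands of support vectors.
rng = np.random.default_rng(0)
basis_vectors = rng.normal(size=(5, 10))   # 5 basis vectors in R^10
weights = rng.normal(size=5)
x_test = rng.normal(size=10)
print(predict(x_test, basis_vectors, weights))
```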

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
5

Citation Types

0
90
0

Year Published

2010
2010
2019
2019

Publication Types

Select...
4
3
2

Relationship

0
9

Authors

Journals

citations
Cited by 108 publications
(90 citation statements)
references
References 6 publications
0
90
0
Order By: Relevance
“…Several researchers also explore how to train the primal form of (4) and the extended models fast. The existing algorithms can be broadly categorized into two categories: the cutting-plane methods [11,5,12,13,25] and subgradient methods [3,17]. For example, in [17], Shalev-Shwartz et al. described and analyzed a simple and effective stochastic sub-gradient descent algorithm and proved that the number of iterations required to obtain a solution of accuracy ε is O(1/ε).…”
Section: Introduction (mentioning)
confidence: 99%
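A minimal sketch of the stochastic sub-gradient descent approach mentioned in the passage above (a Pegasos-style update on the primal SVM objective), assuming a linear kernel and hinge loss; the regularization constant, step-size schedule, and iteration count are illustrative choices, not the setting analyzed in [17].

```python
import numpy as np

def pegasos_sgd(X, y, lam=0.01, n_iters=10000, seed=0):
    """Stochastic sub-gradient descent on the primal SVM objective
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>).

    One random example per iteration; step size eta_t = 1 / (lam * t).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)
        margin = y[i] * (w @ X[i])
        # Sub-gradient of the regularized hinge loss at the sampled example.
        if margin < 1:
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:
            w = (1 - eta * lam) * w
    return w

# Toy usage on a linearly separable problem with labels in {-1, +1}.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w = pegasos_sgd(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```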
“…Alternative approaches, which involve dimensionality reduction in feature spaces, such as kernel principal component analysis [3] and kernel Fisher discriminant analysis [4], can identify efficient basis vectors; however, these kernel-based dimensionality reduction methods require O(N³) computational time for pre-processing. Recently, novel methods based on the addition of efficient basis vectors, rather than selection/limitation, have been proposed [5][6][7]. In contrast to the dimensionality reduction methods, these methods search for efficient basis vectors and increase the dimensionality by adding these vectors iteratively.…”
Section: Introduction (mentioning)
confidence: 99%
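To make the O(N³) pre-processing cost mentioned in the passage above concrete, here is a bare-bones kernel PCA sketch: the dominant cost is the eigendecomposition of the N×N centered Gram matrix. The RBF kernel and component count are illustrative choices, not the setup of [3] or [4].

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=0.5):
    """Bare-bones kernel PCA with an RBF kernel.

    Building the Gram matrix costs O(N^2 d); the eigendecomposition of
    the N x N centered Gram matrix is the O(N^3) step referenced above.
    """
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)
    # Double-center the Gram matrix in feature space.
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_centered)   # the O(N^3) step
    # Keep the leading components (eigh returns ascending eigenvalues).
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    return K_centered @ alphas   # projections of the training points

X = np.random.default_rng(2).normal(size=(100, 5))
Z = kernel_pca(X)
print(Z.shape)   # (100, 2)
```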
“…For example, the kernel matching pursuit (KMP) method [5] selects basis vectors from a given training dataset to approximate a solution by employing matching pursuit theory [8], which greedily selects the basis vector that maximizes the projection of the optimal parameter vector. The cutting plane subspace pursuit (CPSP) method [6] extends the cutting-plane-based training of support vector machines (SVMs) [9] by incorporating pre-image optimization that finds y ∈ R^D such that φ(y) represents an essential basis vector, where φ is a feature mapping function implicitly defined by a given kernel function. The kernel gradient matching pursuit (KGMP) method [7], which is briefly described in the following section, generalizes the CPSP method by incorporating pre-image optimization to find pre-image vectors that approximate gradient vectors of a performance function of model training.…”
Section: Introduction (mentioning)
confidence: 99%
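A simplified sketch of the greedy basis-selection idea behind kernel matching pursuit as described in the passage above: at each step, pick the training point whose kernel column best explains the current residual, then refit the coefficients on the selected columns. This is an illustrative back-fitting variant under those assumptions, not the exact KMP, CPSP, or KGMP procedures.

```python
import numpy as np

def greedy_kernel_basis(K, y, n_basis=10):
    """Greedy selection of basis vectors from the training set.

    K: (n, n) kernel (Gram) matrix over training points.
    y: (n,) targets to approximate with a sparse kernel expansion.
    At each step, add the column most correlated with the residual,
    then refit all coefficients on the selected columns (least squares).
    """
    selected = []
    residual = y.astype(float).copy()
    coef = None
    norms = np.linalg.norm(K, axis=0) + 1e-12
    for _ in range(n_basis):
        # Normalized correlation of each column with the current residual.
        scores = np.abs(K.T @ residual) / norms
        scores[selected] = -np.inf          # never pick a column twice
        selected.append(int(np.argmax(scores)))
        # Refit coefficients on the selected columns.
        K_sel = K[:, selected]
        coef, *_ = np.linalg.lstsq(K_sel, y, rcond=None)
        residual = y - K_sel @ coef
    return selected, coef

# Toy usage with an RBF Gram matrix over random data.
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 3))
y = np.sign(X[:, 0])
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq)
basis_idx, coef = greedy_kernel_basis(K, y, n_basis=5)
print("selected basis vectors:", basis_idx)
```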
“…However, a significant part of the handy SVM theory (such as some learning guarantees in the projection space given by a nonlinear kernel and the notion of a reproducing kernel Hilbert space) does not hold for the 1-norm SVM. For this reason, other methods aim at developing sparse versions of the standard (L₂) SVM [19], [20], [21], [22], [23]. Unfortunately, most of these methods are expensive to compute, and applying them to structured data, where kernel evaluations are often costly, would slow down the computation even more.…”
Section: Introduction (mentioning)
confidence: 99%
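For the linear case, the 1-norm SVM contrasted in the passage above corresponds to an L1-penalized hinge-loss classifier; the sketch below uses scikit-learn's LinearSVC, which pairs the L1 penalty with the squared hinge loss and the primal (dual=False) solver. The dataset and regularization strength are placeholders, and this is only one of several possible 1-norm SVM formulations.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy data: y depends on only 2 of 20 features, so an L1 penalty
# should drive most weights to exactly zero (a sparse linear model).
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 20))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)

# L1-penalized linear SVM; scikit-learn requires penalty="l1" to be
# combined with loss="squared_hinge" and dual=False.
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1)
clf.fit(X, y)

n_nonzero = int(np.sum(np.abs(clf.coef_) > 1e-8))
print("non-zero weights:", n_nonzero, "of", X.shape[1])
```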