2009
DOI: 10.1007/s10994-009-5126-6

Sparse kernel SVMs via cutting-plane training

Abstract: We explore an algorithm for training SVMs with kernels that can represent the learned rule using arbitrary basis vectors, not just the support vectors (SVs) from the training set. This results in two benefits. First, the added flexibility makes it possible to find sparser solutions of good quality, substantially speeding up prediction. Second, the improved sparsity can also make training of kernel SVMs more efficient, especially for high-dimensional and sparse data (e.g. text classification). This has the pote…
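A minimal sketch of the representation the abstract describes: the learned rule is a weighted sum of kernel evaluations against a small set of basis vectors that need not be training examples, so prediction cost scales with the number of basis vectors rather than the number of support vectors. The basis vectors, weights, and RBF kernel below are illustrative placeholders, not the authors' trained model.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian RBF kernel between two vectors."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def predict(x, basis_vectors, weights, bias=0.0, gamma=0.5):
    """Sparse kernel SVM prediction: f(x) = sum_j w_j * K(b_j, x) + b.

    Cost is one kernel evaluation per basis vector, independent of how
    many training examples would have been support vectors.
    """
    scores = np.array([rbf_kernel(x, b, gamma) for b in basis_vectors])
    return float(weights @ scores + bias)

# Illustrative numbers only: a handful of basis vectors stands in for
# what would otherwise be hundreds or thousands of support vectors.
rng = np.random.default_rng(0)
basis_vectors = rng.normal(size=(5, 10))   # 5 basis vectors in R^10
weights = rng.normal(size=5)
x_test = rng.normal(size=10)
print(predict(x_test, basis_vectors, weights))
```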

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
5

Citation Types

0
90
0

Year Published

2010
2010
2019
2019

Publication Types

Select...
4
3
2

Relationship

0
9

Authors

Journals

citations
Cited by 108 publications
(90 citation statements)
references
References 6 publications
0
90
0
Order By: Relevance
“…Several researchers also explore how to train the primal form of (4) and the extended models fast. The existing algorithms can be broadly categorized into two categories: the cutting-plane methods [11,5,12,13,25] and subgradient methods [3,17]. For example, in [17], Shalev-Shwartz et al. described and analyzed a simple and effective stochastic sub-gradient descent algorithm and proved that the number of iterations required to obtain a solution of accuracy ε is O(1/ε).…”
Section: Introduction (mentioning)
confidence: 99%
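A minimal sketch of the stochastic sub-gradient descent approach mentioned in the passage above (a Pegasos-style update on the primal SVM objective), assuming a linear kernel and hinge loss; the regularization constant, step-size schedule, and iteration count are illustrative choices, not the setting analyzed in [17].

```python
import numpy as np

def pegasos_sgd(X, y, lam=0.01, n_iters=10000, seed=0):
    """Stochastic sub-gradient descent on the primal SVM objective
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>).

    One random example per iteration; step size eta_t = 1 / (lam * t).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)
        margin = y[i] * (w @ X[i])
        # Sub-gradient of the regularized hinge loss at the sampled example.
        if margin < 1:
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:
            w = (1 - eta * lam) * w
    return w

# Toy usage on a linearly separable problem with labels in {-1, +1}.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w = pegasos_sgd(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```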
“…Alternative approaches, which involve dimensionality reduction in feature spaces, such as kernel principal component analysis [3] and kernel Fisher discriminant analysis [4], can identify efficient basis vectors; however, these kernel-based dimensionality reduction methods require O(N³) computational time for pre-processing. Recently, novel methods based on the addition of efficient basis vectors, rather than selection/limitation, have been proposed [5][6][7]. In contrast to the dimensionality reduction methods, these methods search for efficient basis vectors and increase the dimensionality by adding these vectors iteratively.…”
Section: Introduction (mentioning)
confidence: 99%
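To make the O(N³) pre-processing cost mentioned in the passage above concrete, here is a bare-bones kernel PCA sketch: the dominant cost is the eigendecomposition of the N×N centered Gram matrix. The RBF kernel and component count are illustrative choices, not the setup of [3] or [4].

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=0.5):
    """Bare-bones kernel PCA with an RBF kernel.

    Building the Gram matrix costs O(N^2 d); the eigendecomposition of
    the N x N centered Gram matrix is the O(N^3) step referenced above.
    """
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)
    # Double-center the Gram matrix in feature space.
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_centered)   # the O(N^3) step
    # Keep the leading components (eigh returns ascending eigenvalues).
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    return K_centered @ alphas   # projections of the training points

X = np.random.default_rng(2).normal(size=(100, 5))
Z = kernel_pca(X)
print(Z.shape)   # (100, 2)
```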
“…For example, the kernel matching pursuit (KMP) method [5] selects basis vectors from a given training dataset to approximate a solution by employing matching pursuit theory [8], which greedily selects the basis vector that maximizes the projection of the optimal parameter vector. The cutting plane subspace pursuit (CPSP) method [6] extends the cutting-plane-based training of support vector machines (SVMs) [9] by incorporating pre-image optimization that finds y ∈ R^D such that φ(y) represents an essential basis vector, where φ is a feature mapping function implicitly defined by a given kernel function. The kernel gradient matching pursuit (KGMP) method [7], which is briefly described in the following section, generalizes the CPSP method by incorporating pre-image optimization to find pre-image vectors that approximate gradient vectors of a performance function of model training.…”
Section: Introduction (mentioning)
confidence: 99%
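A simplified sketch of the greedy basis-selection idea behind kernel matching pursuit as described in the passage above: at each step, pick the training point whose kernel column best explains the current residual, then refit the coefficients on the selected columns. This is an illustrative back-fitting variant under those assumptions, not the exact KMP, CPSP, or KGMP procedures.

```python
import numpy as np

def greedy_kernel_basis(K, y, n_basis=10):
    """Greedy selection of basis vectors from the training set.

    K: (n, n) kernel (Gram) matrix over training points.
    y: (n,) targets to approximate with a sparse kernel expansion.
    At each step, add the column most correlated with the residual,
    then refit all coefficients on the selected columns (least squares).
    """
    selected = []
    residual = y.astype(float).copy()
    coef = None
    norms = np.linalg.norm(K, axis=0) + 1e-12
    for _ in range(n_basis):
        # Normalized correlation of each column with the current residual.
        scores = np.abs(K.T @ residual) / norms
        scores[selected] = -np.inf          # never pick a column twice
        selected.append(int(np.argmax(scores)))
        # Refit coefficients on the selected columns.
        K_sel = K[:, selected]
        coef, *_ = np.linalg.lstsq(K_sel, y, rcond=None)
        residual = y - K_sel @ coef
    return selected, coef

# Toy usage with an RBF Gram matrix over random data.
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 3))
y = np.sign(X[:, 0])
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq)
basis_idx, coef = greedy_kernel_basis(K, y, n_basis=5)
print("selected basis vectors:", basis_idx)
```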
“…However, a significant part of the handy SVM theory (such as some learning guarantees in the projection space given by a nonlinear kernel and the notion of a reproducing kernel Hilbert space) does not hold for the 1-norm SVM. For this reason, other methods aim at developing sparse versions of the standard (L₂) SVM [19], [20], [21], [22], [23]. Unfortunately, most of these methods are expensive to compute, and applying them to structured data, where kernel evaluations are often costly, would slow down the computation even more.…”
Section: Introduction (mentioning)
confidence: 99%
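For the linear case, the 1-norm SVM contrasted in the passage above corresponds to an L1-penalized hinge-loss classifier; the sketch below uses scikit-learn's LinearSVC, which pairs the L1 penalty with the squared hinge loss and the primal (dual=False) solver. The dataset and regularization strength are placeholders, and this is only one of several possible 1-norm SVM formulations.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy data: y depends on only 2 of 20 features, so an L1 penalty
# should drive most weights to exactly zero (a sparse linear model).
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 20))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)

# L1-penalized linear SVM; scikit-learn requires penalty="l1" to be
# combined with loss="squared_hinge" and dual=False.
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1)
clf.fit(X, y)

n_nonzero = int(np.sum(np.abs(clf.coef_) > 1e-8))
print("non-zero weights:", n_nonzero, "of", X.shape[1])
```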