2010
DOI: 10.1007/s10107-010-0420-4

Pegasos: primal estimated sub-gradient solver for SVM

Abstract: We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ε is Õ(1/ε), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ε²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/…
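To make the abstract's description concrete, here is a minimal Python sketch of the single-example Pegasos update: a sub-gradient step on the regularized hinge-loss objective with step size 1/(λt), followed by the projection used in the paper's analysis. The function name, the NumPy implementation, and the default parameter values are illustrative assumptions, not taken from the paper.

import numpy as np

def pegasos(X, y, lam=0.1, T=1000, seed=0):
    # Sketch of Pegasos: stochastic sub-gradient descent on the SVM
    # objective (lam/2)*||w||^2 + mean hinge loss, touching one random
    # training example per iteration. Defaults here are illustrative.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)              # pick a single training example
        eta = 1.0 / (lam * t)            # step size 1/(lambda * t)
        if y[i] * X[i].dot(w) < 1:       # hinge loss active: loss sub-gradient
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                            # only the regularizer contributes
            w = (1 - eta * lam) * w
        # Optional projection onto the ball of radius 1/sqrt(lambda),
        # used in the paper's convergence analysis.
        norm = np.linalg.norm(w)
        if norm > 1.0 / np.sqrt(lam):
            w *= (1.0 / np.sqrt(lam)) / norm
    return w

Each iteration costs O(d) for dense features, so the Õ(1/ε) iteration bound quoted above translates directly into a running time independent of the training set size.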


Cited by 1,243 publications (1,167 citation statements)
References 27 publications (47 reference statements)
“…It can be checked easily that the above chosen u and v, together with α satisfy the KKT condition (17).…”
Section: Algorithm 4: The Practical Second-order Working Set Selection
Mentioning confidence: 99%
“…The existing algorithms can be broadly categorized into two categories: the cutting-plane methods [11,5,12,13,25], and subgradient methods [3,17]. For example, in [17], Shalev-Shwartz et al. described and analyzed a simple and effective stochastic sub-gradient descent algorithm and proved that the number of iterations required to obtain a solution of accuracy ε is O(1/ε). Generally speaking, without counting the loading time, these recent advances on linear classification have shown that training one million instances takes only a few seconds [22].…”
Section: Introduction
Mentioning confidence: 99%
“…We used a primal projected sub-gradient algorithm [19], and tuned the regularization constants on the validation data. The mixture model took typically less than 10 global iterations to converge.…”
Section: Comparison of Models
Mentioning confidence: 99%
“…This corresponds to a ranking SVM [11]. One simple strategy to minimize this objective is to use a primal sub-gradient method [19], which is the approach we use in this paper.…”
Section: Introduction
Mentioning confidence: 99%
“…However, the problem with bottom-q sketches is that the samples lose their alignment. In applications of large-scale machine learning this alignment is needed in order to efficiently construct a dot-product for use with a linear support vector machine (SVM) such as LIBLINEAR [9] or Pegasos [19]. Using the alignment of kˆminwise, it was shown how to construct such a dot-product in [13] based on this scheme.…”
Section: Introduction
Mentioning confidence: 99%