Yoram Singer scite author profile

Improved boosting algorithms using confidence-rated predictions

We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a simplified analysis of AdaBoost in this setting, and we show how this analysis can be used to find improved parameter settings as well as a refined criterion for training weak hypotheses. We give a specific method for assigning confidences to the predictions of decision trees, a method closely related to one used by Quinlan. This method also suggests a technique for growing decision trees which turns out to be identical to one proposed by Kearns and Mansour. We focus next on how to apply the new boosting algorithms to multiclass classification problems, particularly to the multi-label case in which each example may belong to more than one class. We give two boosting methods for this problem. One of these leads to a new method for handling the single-label case which is simpler but as effective as techniques suggested by Freund and Schapire. Finally, we give some experimental results comparing a few of the algorithms discussed in this paper.
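The abstract does not state any formulas, so the following is only a minimal sketch of how confidences might be assigned to the blocks of a partition, such as the leaves of a decision tree. It assumes the rule commonly associated with confidence-rated boosting: each block predicts half the log-ratio of the positive to the negative weight falling in it, and a partition is scored by Z = 2 Σ_j sqrt(W+_j · W−_j), with smaller values being better. The function name, its arguments, and the `smoothing` constant are hypothetical.

```python
import math

def block_confidences(blocks, labels, weights, smoothing=1e-9):
    """Sketch: confidence-rated predictions for a partition of the examples.

    blocks[i]  -- index of the block (e.g. tree leaf) containing example i
    labels[i]  -- label of example i, either +1 or -1
    weights[i] -- current boosting weight of example i (assumed to sum to 1)
    smoothing  -- assumed small constant that keeps the log-ratio finite
    """
    w_pos, w_neg = {}, {}
    for b, y, w in zip(blocks, labels, weights):
        bucket = w_pos if y > 0 else w_neg
        bucket[b] = bucket.get(b, 0.0) + w

    block_ids = set(blocks)
    # Confidence of block j: half the log-ratio of positive to negative weight.
    confidence = {
        b: 0.5 * math.log((w_pos.get(b, 0.0) + smoothing)
                          / (w_neg.get(b, 0.0) + smoothing))
        for b in block_ids
    }
    # Score of the whole partition; purer blocks give a smaller value.
    z = 2.0 * sum(math.sqrt(w_pos.get(b, 0.0) * w_neg.get(b, 0.0))
                  for b in block_ids)
    return confidence, z
```

Under these assumptions, a tree-growing procedure could compare candidate splits by the value of `z` they produce and keep the split with the smallest one.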

Feature-rich part-of-speech tagging with a cyclic dependency network

Toutanova

et al. 2003

We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features. Using these ideas together, the resulting tagger gives a 97.24% accuracy on the Penn Treebank WSJ, an error reduction of 4.4% on the best previous single automatically learned tagging result.

Pegasos: primal estimated sub-gradient solver for SVM

et al. 2010

We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ε is Õ(1/ε), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ε²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λε)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
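The kind of update the abstract describes, one randomly chosen training example per iteration, can be sketched as follows. This is a minimal illustration and not the authors' reference implementation: it assumes the step size 1/(λt) usually associated with this method, uses dense Python lists rather than sparse features, and omits any optional projection step. The function name and parameters are hypothetical.

```python
import random

def pegasos_sketch(examples, lam, num_iters, seed=0):
    """Stochastic sub-gradient descent on the primal linear SVM objective.

    examples  -- list of (x, y) pairs, x a list of floats, y in {-1, +1}
    lam       -- regularization parameter (lambda in the abstract)
    num_iters -- number of single-example iterations
    """
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [0.0] * dim
    for t in range(1, num_iters + 1):
        x, y = rng.choice(examples)          # one training example per step
        eta = 1.0 / (lam * t)                # assumed step-size schedule
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Sub-gradient of the regularizer: shrink the weight vector.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Sub-gradient of the hinge loss, active only on a margin violation.
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w
```

Each iteration touches only the features of a single example, which is why the per-step cost in this sketch does not grow with the size of the training set.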

Efficient projections onto the ℓ₁-ball for learning in high dimensions

Duchi¹,

Shalev‐Shwartz²,

Singer³

et al. 2008

910

869

We describe efficient algorithms for projecting a vector onto the ℓ₁-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors k of whose elements are perturbed outside the ℓ₁-ball, projecting in O(k log(n)) time. This setting is especially useful for online learning in sparse feature spaces such as text categorization applications. We demonstrate the merits and effectiveness of our algorithms in numerous batch and online learning tasks. We show that variants of stochastic gradient projection methods augmented with our efficient projection procedures outperform interior point methods, which are considered state-of-the-art optimization techniques. We also show that in online settings gradient updates with ℓ₁ projections outperform the exponentiated gradient algorithm while obtaining models with high degrees of sparsity.
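To make the projection concrete, here is a minimal sketch of Euclidean projection onto the ℓ₁-ball. Note that it uses the simpler sorting-based procedure, which runs in O(n log n) time, and is neither of the two faster methods the abstract announces. The function name and the `radius` parameter are hypothetical.

```python
import math

def project_onto_l1_ball(v, radius=1.0):
    """Return the closest point to v (in Euclidean distance) with l1 norm <= radius."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    abs_v = [abs(x) for x in v]
    if sum(abs_v) <= radius:
        return list(v)                        # already inside the ball

    # Sort magnitudes in decreasing order and find the soft threshold theta.
    u = sorted(abs_v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - radius) / j
        if u_j - candidate > 0:               # holds exactly for the first rho entries
            theta = candidate

    # Shrink every coordinate toward zero by theta, keeping its sign.
    return [math.copysign(max(a - theta, 0.0), x) for a, x in zip(abs_v, v)]
```

The shrinkage step sets small coordinates exactly to zero, which is one way to see why gradient updates combined with ℓ₁ projections tend to produce sparse models.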

Learning to Order Things

Cohen¹,

Schapire²,

Singer³

1999

JAIR

476

438

There are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order instances given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another. We outline a two-stage approach in which one first learns by conventional means a binary preference function indicating whether it is advisable to rank one instance before another. Here we consider an on-line algorithm for learning preference functions that is based on Freund and Schapire's 'Hedge' algorithm. In the second stage, new instances are ordered so as to maximize agreement with the learned preference function. We show that the problem of finding the ordering that agrees best with a learned preference function is NP-complete. Nevertheless, we describe simple greedy algorithms that are guaranteed to find a good approximation. Finally, we show how metasearch can be formulated as an ordering problem, and present experimental results on learning a combination of 'search experts', each of which is a domain-specific query expansion strategy for a web search engine.
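The second stage can be illustrated with one simple greedy scheme. The abstract does not spell out its algorithms, so this sketch is an assumption about their general shape: each remaining item gets a potential equal to its outgoing preference minus its incoming preference, the item with the highest potential is ranked next, and the potentials of the others are updated. The names `greedy_order` and `pref` are hypothetical, and `pref(a, b)` is assumed to return a value in [0, 1] expressing how strongly `a` should precede `b`.

```python
def greedy_order(items, pref):
    """Greedily order hashable items to agree with a pairwise preference function."""
    remaining = list(items)
    # Potential of v: preference for v over the others minus the reverse.
    potential = {
        v: sum(pref(v, u) - pref(u, v) for u in remaining if u != v)
        for v in remaining
    }
    ordering = []
    while remaining:
        best = max(remaining, key=lambda v: potential[v])
        remaining.remove(best)
        ordering.append(best)
        # Remove the contribution of the item that was just ranked.
        for v in remaining:
            potential[v] += pref(best, v) - pref(v, best)
    return ordering
```

With a consistent preference function such as `lambda a, b: 1.0 if a < b else 0.0`, the sketch recovers the sorted order; with inconsistent preferences it returns a heuristic ordering rather than an optimal one.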

The power of amnesia: Learning probabilistic automata with variable memory length

Ron¹,

1997

We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second application we construct a simple stochastic model for E. coli DNA.

Ultraconservative Online Algorithms for Multiclass Problems

2001

A Stochastic Quasi-Newton Method for Large-Scale Optimization

Byrd¹,

Hansen²,

Nocedal³

et al. 2016

SIAM J. Optim.

319

299

The question of how to incorporate curvature information in stochastic approximation methods is challenging. The direct application of classical quasi-Newton updating techniques for deterministic optimization leads to noisy curvature estimates that have harmful effects on the robustness of the iteration. In this paper, we propose a stochastic quasi-Newton method that is efficient, robust and scalable. It employs the classical BFGS update formula in its limited memory form, and is based on the observation that it is beneficial to collect curvature information pointwise, and at regular intervals, through (sub-sampled) Hessian-vector products. This technique differs from the classical approach that would compute differences of gradients at every iteration, and where controlling the quality of the curvature estimates can be difficult. We present numerical results on problems arising in machine learning that suggest that the proposed method shows much promise.
