2016
DOI: 10.1016/j.tcs.2016.07.033

Bandit online optimization over the permutahedron

Abstract: The permutahedron is the convex polytope with vertex set consisting of the vectors (π(1), …, π(n)) for all permutations (bijections) π over {1, …, n}. We study a bandit game in which, at each step t, an adversary chooses a hidden weight vector s_t, a player chooses a vertex π_t of the permutahedron and suffers an observed instantaneous loss of ∑_{i=1}^{n} π_t(i) s_t(i). We study the problem in two regimes. In the first regime, s_t is a point in the polytope dual to the permutahedron. Algorithm CombBand of Cesa-Bianchi et al…
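
Below is a minimal sketch, in Python, of the per-round loss described in the abstract: the player picks a vertex π_t of the permutahedron, the adversary holds a hidden weight vector s_t, and only the scalar loss ∑_i π_t(i) s_t(i) is observed. The sampling of s_t and π_t here is arbitrary; this is a toy illustration, not the paper's CombBand-style algorithm.

```python
# Toy illustration of one round of the permutahedron bandit game (not the
# paper's algorithm). Only the scalar loss is observed by the player.
import numpy as np

rng = np.random.default_rng(0)
n = 5

def permutahedron_vertex(pi):
    """Map a 0-based permutation pi of {0,...,n-1} to the vertex
    (pi(1),...,pi(n)) with values in {1,...,n}, as in the abstract."""
    return np.asarray(pi) + 1

s_t = rng.random(n)        # hidden weight vector chosen by the adversary
pi_t = rng.permutation(n)  # permutation (vertex) chosen by the player
loss_t = float(permutahedron_vertex(pi_t) @ s_t)  # observed instantaneous loss
print(loss_t)
```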

Cited by 9 publications (14 citation statements) | References 13 publications

Citation statements (ordered by relevance):
“…This is not surprising, as these choices do not produce valid permutation matrices. • Using a squared loss ½‖ϕ(y) − θ‖² (C = ℝ^{k×k}, no projection) works relatively well when combined with permutation decoding. Using supersets of the Birkhoff polytope as projection set C, such as [0, 1]^{k×k}, improves accuracy substantially.…”
Section: Results
Citation type: mentioning (confidence: 99%)
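
For context on "permutation decoding" in the quoted passage: finding the permutation matrix closest to a score matrix θ in squared norm reduces to a linear assignment problem, since ‖P‖² is the same for every permutation matrix P. A minimal sketch, assuming SciPy is available (the names below are illustrative, not the cited papers' code):

```python
# Squared-loss permutation decoding: argmin_P ||P - theta||^2 over permutation
# matrices equals argmax_P <P, theta>, i.e. a linear assignment problem.
import numpy as np
from scipy.optimize import linear_sum_assignment

def decode_permutation(theta):
    """Return the permutation matrix P maximizing <P, theta>."""
    rows, cols = linear_sum_assignment(-theta)  # negate the cost to maximize
    P = np.zeros_like(theta)
    P[rows, cols] = 1.0
    return P

theta = np.random.default_rng(1).normal(size=(4, 4))  # unconstrained scores
print(decode_permutation(theta))
```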
“…The structured perceptron, hinge and CRF losses are generally not consistent when using MAP as decoder d [38]. Inspired by kernel dependency estimation [55, 17, 25], several works [15, 26, 31] showed good empirical results and proved consistency by combining a squared loss S_sq(θ, y) := ½‖ϕ(y) − θ‖₂² with calibrated decoding (no oracle is needed during training). A drawback of this loss, however, is that it does not make use of the output space Y during training, ignoring precious structural information.…”
Section: Background and Related Work
Citation type: mentioning (confidence: 99%)
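
Read concretely, and assuming ϕ(y) encodes a ground-truth permutation y as a permutation matrix (an assumption made only for this illustration), the quoted squared loss can be computed as:

```python
# Illustration of S_sq(theta, y) = 1/2 * ||phi(y) - theta||_2^2, reading phi(y)
# as the permutation-matrix encoding of a ground-truth permutation y (assumed).
import numpy as np

def phi(y):
    """Encode a permutation y (y[i] = column assigned to row i) as a matrix."""
    k = len(y)
    P = np.zeros((k, k))
    P[np.arange(k), y] = 1.0
    return P

def squared_loss(theta, y):
    return 0.5 * np.sum((phi(y) - theta) ** 2)

theta = np.full((3, 3), 1.0 / 3.0)  # e.g. a uniform doubly stochastic prediction
print(squared_loss(theta, np.array([2, 0, 1])))
```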
“…It has been used for specific convex polytopes, most importantly in the optimal transport literature (Cuturi, 2013; Peyré and Cuturi, 2017) but also for learning to predict permutation matrices (Helmbold and Warmuth, 2009) or permutations (Yasutake et al., 2011; Ailon et al., 2016). The mean regularization counterpart of sparsemax is known as SparseMAP (Niculae et al., 2018):…”
Section: Mean Regularization and SparseMAP
Citation type: mentioning (confidence: 99%)
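
The entropic regularization mentioned here (Cuturi, 2013) is commonly computed with Sinkhorn iterations, which map a score matrix to a doubly stochastic matrix, i.e. a point in the Birkhoff polytope. A minimal sketch, not the cited implementations:

```python
# Minimal Sinkhorn sketch: entropy-regularized mapping of a score matrix onto
# the Birkhoff polytope (doubly stochastic matrices). Illustrative only.
import numpy as np

def sinkhorn(theta, reg=0.1, n_iter=200):
    K = np.exp(theta / reg)              # positive kernel built from the scores
    u = np.ones(K.shape[0])
    v = np.ones(K.shape[1])
    for _ in range(n_iter):
        u = 1.0 / (K @ v)                # rescale rows toward sum 1
        v = 1.0 / (K.T @ u)              # rescale columns toward sum 1
    return u[:, None] * K * v[None, :]

P = sinkhorn(np.random.default_rng(2).normal(size=(4, 4)))
print(P.sum(axis=0), P.sum(axis=1))      # both close to vectors of ones
```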
“…Permutahedra have been used to derive online learning-to-rank algorithms (Yasutake et al., 2011; Ailon et al., 2016) but it is not obvious how to extract a loss from these works. Ordered weighted averaging (OWA) operators have been used to define related top-k multiclass losses (Usunier et al., 2009; Lapin et al., 2015) but without identifying the connection. We set θ = y and inspect how the loss changes when varying each θ_i.…”
Section: Examples
Citation type: mentioning (confidence: 99%)
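
For reference, an ordered weighted averaging (OWA) operator of the kind used in the cited top-k losses applies a fixed weight vector to the sorted scores; the weights below are just an example:

```python
# Sketch of an OWA operator: sort scores in decreasing order, then take a fixed
# weighted combination. Weights (1/k,...,1/k,0,...,0) average the top-k entries.
import numpy as np

def owa(scores, weights):
    return float(np.sort(scores)[::-1] @ weights)

scores = np.array([0.2, 0.9, 0.5, 0.1])
top2_weights = np.array([0.5, 0.5, 0.0, 0.0])  # average of the two largest scores
print(owa(scores, top2_weights))               # 0.7
```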
“…Some bandit algorithms propose drawing the arms in an order determined by each arm's estimated expected reward for a given user. Ailon et al. (2014) propose the BanditRank algorithm to address this ranking problem. To do so, at each time step the algorithm submits a permutation of the set of available arms, with the main objective of sorting the arms by relevance according to the rewards obtained.…”
Section: Contextual Strategies (Stratégies Contextuelles)
Citation type: unclassified
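
As a rough caricature of the ranking idea described in this quote (not the BanditRank algorithm itself, and assuming full reward feedback each round rather than bandit feedback), one can submit at each step the permutation that sorts the arms by empirical mean reward:

```python
# Greedy toy version of ranking arms by estimated relevance. Real algorithms
# such as BanditRank add exploration and handle bandit feedback; this sketch
# assumes every arm's reward is observed at every round (a simplification).
import numpy as np

rng = np.random.default_rng(3)
n_arms, n_rounds = 5, 100
true_relevance = rng.random(n_arms)          # hypothetical reward probabilities
counts = np.zeros(n_arms)
sums = np.zeros(n_arms)

for t in range(n_rounds):
    means = np.divide(sums, counts, out=np.ones(n_arms), where=counts > 0)
    ranking = np.argsort(-means)             # most promising arm ranked first
    rewards = rng.binomial(1, true_relevance)
    sums += rewards
    counts += 1

print(ranking)                               # estimated relevance ordering
```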