2005
DOI: 10.1007/11503415_15

Improved Second-Order Bounds for Prediction with Expert Advice

Abstract: This work studies external regret in sequential prediction games with both positive and negative payoffs. External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. In this setting, we derive new and sharper regret bounds for the well-known exponentially weighted average forecaster and for a new forecaster with a different multiplicative update rule. Our analysis has two main advantages: first, no preliminary knowledge about the payoff seq…
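As a rough illustration of the setting described in the abstract, the sketch below implements an exponentially weighted average forecaster over N experts with signed payoffs. The fixed learning rate eta is an assumption made only for this sketch; the paper's contribution is precisely the sharper, adaptive tunings, which are not reproduced here.

```python
import numpy as np

def exp_weighted_forecaster(payoffs, eta=0.1):
    """Exponentially weighted average forecaster over N experts.

    payoffs : (T, N) array of signed payoffs x_{i,t}, revealed after each round.
    eta     : fixed learning rate (an assumption; the paper studies adaptive tunings).
    Returns the forecaster's cumulative expected payoff and its external regret,
    i.e. the gap to the payoff of the best single expert in hindsight.
    """
    T, N = payoffs.shape
    log_w = np.zeros(N)                  # log-weights, start uniform
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())  # stabilized exponential weights
        p /= p.sum()                     # prediction: distribution over experts
        total += float(p @ payoffs[t])   # forecaster's expected payoff this round
        log_w += eta * payoffs[t]        # multiplicative (exponential) update
    best = payoffs.sum(axis=0).max()     # best expert's cumulative payoff
    return total, best - total           # (payoff, external regret)

# Example: 10 experts, 1000 rounds of payoffs in [-1, 1]
# total, regret = exp_weighted_forecaster(np.random.uniform(-1, 1, size=(1000, 10)))
```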

Cited by 100 publications (207 citation statements)
References 15 publications
“…For example, using the algorithms of Cesa-Bianchi et al. (2005) one can get a more refined regret bound, which depends on the second moment.…”
Section: Corollary 6 Using An Optimized Experts Algorithm As the A I (mentioning)
confidence: 99%
“…We need to use here an external regret algorithm which does not need to have as an input the value of L^i_min. An example of such an algorithm is Corollary 2 in Cesa-Bianchi et al. (2005), which guarantees an external regret of at most O(√(L_min log N) + log N).…”
Section: Lower Bounds On Swap Regret (mentioning)
confidence: 99%
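The point of the quoted remark is that the learning rate can be tuned on the fly from losses observed so far, so L_min need not be supplied in advance. The schedule below is only a hedged sketch of that idea (a generic "self-confident" tuning driven by the best cumulative loss seen so far); it is not necessarily the exact tuning of Corollary 2 in Cesa-Bianchi et al. (2005), and the constants are assumptions.

```python
import math

def anytime_eta(best_cumulative_loss, n_experts):
    """Learning rate for the current round, computed from observed quantities only.

    best_cumulative_loss : min_i L_{i,t-1}, the smallest cumulative loss of any
                           expert before this round (known to the forecaster).
    n_experts            : N, the number of experts.
    The exact constants are assumptions; the point is that no bound L_min on the
    final losses is needed as an input.
    """
    return min(1.0, math.sqrt(math.log(n_experts) / (1.0 + best_cumulative_loss)))
```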
“…These quantities are of importance, because they are used in schemes for adaptively tuning the learning rate online. In particular, [6] introduces a parameter-free online tuning scheme based on the variance, for which the expected regret is at most of the order…”
Section: Computing Expected Loss and Variance (mentioning)
confidence: 99%
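For context on the quoted section title, the two quantities in question are the expected loss and the variance of the loss under the forecaster's own distribution in a given round; variance-based tuning schemes accumulate them over rounds to set the learning rate. The sketch below only computes the two per-round quantities; the tuning rule of [6] itself is not reproduced, and the function name is made up for illustration.

```python
import numpy as np

def expected_loss_and_variance(weights, losses):
    """Per-round expected loss and variance under the forecaster's distribution.

    weights : (N,) nonnegative weights currently assigned to the experts.
    losses  : (N,) losses of the experts in the current round.
    Returns (E[loss], Var[loss]) with the expectation taken over an expert
    drawn from the normalized weights.
    """
    p = np.asarray(weights, dtype=float)
    p /= p.sum()
    losses = np.asarray(losses, dtype=float)
    mean = float(p @ losses)               # E[loss] under p
    var = float(p @ (losses - mean) ** 2)  # Var[loss] under p
    return mean, var
```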
“…However, the overall running time changes from O(T) to O(T^2). We don't know how to do the fancier method efficiently, which mixes in a bit of the past average distribution. The reason is that the exponential weight updates on the combined lists seem to be at loggerheads with mixing in the past average weight.…”
Section: Open Problems (mentioning)
confidence: 99%
“…al. [7] studied second-order bounds for the exponentially weighted average forecaster and analyzed the expected regret of the algorithm in the full monitoring case when the bound on the loss function is unknown. They also indicated their results for the partial monitoring case.…”
Section: Introduction (mentioning)
confidence: 99%