Online Learning with Graph-Structured Feedback against Adaptive Adversaries

Feng, Zhili; Loh, Po-Ling

doi:10.1109/isit.2018.8437541

Cited by 6 publications

(6 citation statements)

References 15 publications

(30 reference statements)

“…In EXP3.G, the loss feedback is not strictly bandit like the proposed algorithm, and each weakly connected agent is allowed to observe the losses of its neighbors. Also, the proposed regret bound is comparable to the regret bound of Lazy Revealing Action algorithm with order $\tilde{O}(T^{2/3})$ for the full information setting [34], where each weakly connected agents can observe the losses of all agents in the graph network. Despite the restrictions of the bandit setting, the proposed algorithm has the same regret bound as the EXP3.G and Lazy Revealing Action algorithms with less restrictive settings.…”

Section: Theoretical Results (mentioning)

confidence: 86%

Learning the Truth by Weakly Connected Agents in Social Networks Using Multi-Armed Bandit

Odeyomi

2020

IEEE Access

“…Regret bounds that scale with the loss of the best action have been obtained by Lykouris et al (2018). Other variants include sleeping experts (Cortes et al, 2019), switching experts (Arora et al, 2019), and adaptive adversaries (Feng and Loh, 2018). Some works use feedback graphs to bound the regret in auctions (Cesa-Bianchi et al, 2017; Han et al, 2020).…”

Section: Additional Related Work (mentioning)

confidence: 99%

Beyond Bandit Feedback in Online Multiclass Classification

van der Hoeven, Fusco, Cesa-Bianchi

2021

Preprint


We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph. While including bandit feedback as a special case, feedback graphs allow a much richer set of applications, including filtering and label efficient classification. We introduce Gappletron, the first online multiclass algorithm that works with arbitrary feedback graphs. For this new algorithm, we prove surrogate regret bounds that hold, both in expectation and with high probability, for a large class of surrogate losses. Our bounds are of order $B\sqrt{\rho K T}$, where $B$ is the diameter of the prediction space, $K$ is the number of classes, $T$ is the time horizon, and $\rho$ is the domination number (a graph-theoretic parameter affecting the amount of exploration). In the full information case, we show that Gappletron achieves a constant surrogate regret of order $B^2 K$. We also prove a general lower bound of order $\max\{B^2 K, \sqrt{T}\}$, showing that our upper bounds are not significantly improvable. Experiments on synthetic data show that for various feedback graphs our algorithm is competitive against known baselines.


“…Other similar works [LBS18, TDD17, BES14] mainly focused on stochastic settings. The follow-up works related to the weakly observable graph mainly considered harder settings including the time-varying graphs [ACBDK15b, CHK16, ACBG+17], bounded-memory adversaries [FL18] and the feedback graphs with switching costs [RF19, AMM19]. The recent work of [LLZ20] considered the bound with respect to cumulative losses of the best arm.…”

Section: Graph Type (mentioning)

confidence: 99%

Understanding Bandits with Graph Feedback

Chen, Huang, Li et al.

2021

Preprint


The bandit problem with graph feedback, proposed in [Mannor and Shamir, NeurIPS 2011], is modeled by a directed graph $G = (V, E)$ where $V$ is the collection of bandit arms, and once an arm is triggered, all its incident arms are observed. A fundamental question is how the structure of the graph affects the min-max regret. We propose the notions of the fractional weak domination number $\delta^*$ and the $k$-packing independence number capturing upper bound and lower bound for the regret respectively. We show that the two notions are inherently connected via aligning them with the linear program of the weakly dominating set and its dual, the fractional vertex packing set, respectively. Based on this connection, we utilize the strong duality theorem to prove a general regret upper bound $O\big((\delta^* \log |V|)^{1/3} T^{2/3}\big)$ and a lower bound $\Omega\big((\delta^*/\alpha)^{1/3} T^{2/3}\big)$ where $\alpha$ is the integrality gap of the dual linear program. Therefore, our bounds are tight up to a $(\log |V|)^{1/3}$ factor on graphs with bounded integrality gap for the vertex packing problem including trees and graphs with bounded degree. Moreover, we show that for several special families of graphs, we can get rid of the $(\log |V|)^{1/3}$ factor and establish optimal regret.
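The feedback model in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical adjacency-set representation (`graph[u]` is the set of arms revealed when arm `u` is played) and uses the standard strong/weak observability classification from the feedback-graph literature rather than anything defined in the abstract itself. The names `observed_arms`, `in_neighbors`, and `classify` are illustrative only.

```python
from typing import Dict, Set

# Assumed representation: graph[u] is the set of arms whose losses are
# revealed when arm u is played (may include u itself). Every arm must
# appear as a key, even if it reveals nothing.
Graph = Dict[int, Set[int]]


def observed_arms(graph: Graph, played: int) -> Set[int]:
    """Arms whose losses the learner sees after playing `played`."""
    return set(graph.get(played, set()))


def in_neighbors(graph: Graph, v: int) -> Set[int]:
    """Arms that reveal the loss of arm v when played."""
    return {u for u, outs in graph.items() if v in outs}


def classify(graph: Graph) -> str:
    """Classify a feedback graph by observability.

    An arm is strongly observable if it has a self-loop or is revealed
    by every other arm; it is weakly observable if it is revealed by at
    least one arm but is not strongly observable.
    """
    arms = set(graph)
    all_strong = True
    for v in arms:
        preds = in_neighbors(graph, v)
        if not preds:
            return "unobservable"
        if not (v in preds or arms - {v} <= preds):
            all_strong = False
    return "strongly observable" if all_strong else "weakly observable"


bandit = {0: {0}, 1: {1}, 2: {2}}              # each arm reveals only itself
revealing = {0: {0, 1, 2}, 1: set(), 2: set()}  # only arm 0 reveals anything
print(classify(bandit), "/", classify(revealing))
```

In the second graph, arms 1 and 2 can only be observed by playing arm 0, which is the weakly observable regime where regret of order $T^{2/3}$ arises.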
