2018 IEEE International Symposium on Information Theory (ISIT) 2018
DOI: 10.1109/isit.2018.8437541

Online Learning with Graph-Structured Feedback against Adaptive Adversaries

Abstract: We derive upper and lower bounds for the policy regret of T-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of O(T^{2/3}) and O(T^{3/4}) for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of Ω(T^{2/3}) is achieved in the case of full-informat…
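For readers unfamiliar with graph-structured feedback, the following is a minimal sketch of one round of a generic Exp3-style update with a feedback graph. It is not the variant analyzed in the paper, whose details are not given in the abstract. The function name `exp3_graph_step`, the parameters `eta` and `gamma`, and the uniform exploration term are assumptions made for illustration only.

```python
import math
import random


def exp3_graph_step(weights, graph, losses, eta, gamma, rng):
    """One round of an illustrative Exp3-style update with graph feedback.

    graph[i][j] is True when playing action i reveals the loss of action j.
    All names and parameters here are hypothetical, not taken from the paper.
    """
    k = len(weights)
    total = sum(weights)
    # Mix the exponential-weights distribution with uniform exploration.
    probs = [(1 - gamma) * w / total + gamma / k for w in weights]
    action = rng.choices(range(k), weights=probs)[0]

    new_weights = []
    for j in range(k):
        # Probability that the loss of action j is revealed this round.
        observe_prob = sum(probs[i] for i in range(k) if graph[i][j])
        observed = graph[action][j]
        # Importance-weighted loss estimate; zero when j was not observed.
        estimate = losses[j] / observe_prob if observed else 0.0
        new_weights.append(weights[j] * math.exp(-eta * estimate))
    return action, probs, new_weights


# Full-information graph: every action reveals every loss.
full_info = [[True, True], [True, True]]
action, probs, weights = exp3_graph_step(
    [1.0, 1.0], full_info, [0.0, 1.0], eta=1.0, gamma=0.1, rng=random.Random(0)
)
```

With the identity matrix as `graph`, the same step reduces to a bandit-style update in which only the played action's loss is estimated.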

Citations: cited by 6 publications (6 citation statements)
References: 15 publications (30 reference statements)
“…In EXP3.G, the loss feedback is not strictly bandit like the proposed algorithm, and each weakly connected agent is allowed to observe the losses of its neighbors. Also, the proposed regret bound is comparable to the regret bound of Lazy Revealing Action algorithm with order Õ(T^{2/3}) for the full information setting [34], where each weakly connected agents can observe the losses of all agents in the graph network. Despite the restrictions of the bandit setting, the proposed algorithm has the same regret bound as the EXP3.G and Lazy Revealing Action algorithms with less restrictive settings.…”
Section: Theoretical Results (mentioning)
confidence: 86%
“…Regret bounds that scale with the loss of the best action have been obtained by Lykouris et al (2018). Other variants include sleeping experts (Cortes et al, 2019), switching experts (Arora et al, 2019), and adaptive adversaries (Feng and Loh, 2018). Some works use feedback graphs to bound the regret in auctions (Cesa-Bianchi et al, 2017; Han et al, 2020).…”
Section: Additional Related Work (mentioning)
confidence: 99%
“…Other similar works [LBS18, TDD17, BES14] mainly focused on stochastic settings. The follow-up works related to the weakly observable graph mainly considered harder settings including the time-varying graphs [ACBDK15b, CHK16, ACBG+17], bounded-memory adversaries [FL18] and the feedback graphs with switching costs [RF19, AMM19]. The recent work of [LLZ20] considered the bound with respect to cumulative losses of the best arm.…”
Section: Graph Type (mentioning)
confidence: 99%