2017
DOI: 10.1137/140989455

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

Abstract: We present and study a partial-information model of online learning, where a decision maker repeatedly chooses from a finite set of actions, and observes some subset of the associated losses. This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions. Moreover, it generalizes and interpolates between the well studied full-information setting (where all losses are revealed) and the bandit settin…

Citations: cited by 50 publications (51 citation statements)
References: 26 publications (32 reference statements)
“…Examples include combinatorial bandits [5], [6], [7], [8], linearly parameterized bandits [9], [10], [11], and spectral bandits for smooth graph functions [12], [13]. The second type of arm relation can be termed as observation-based relation [14], [15], [16]. Specifically, playing an arm provides additional side observations about its neighboring arms.…”
Section: Related Work (mentioning)
confidence: 99%
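
As a rough illustration of the observation-based relation described in the statement above, here is a minimal sketch of which losses become visible when an arm is played. The graph, the loss values, and the function name `observed_losses` are hypothetical, and the sketch assumes the played arm always sees its own loss in addition to those of its out-neighbors.

```python
def observed_losses(graph, losses, played_arm):
    """Return {arm: loss} for the played arm and its out-neighbors.

    Assumption (not taken from the cited papers): the played arm always
    observes its own loss, and `graph` maps each arm to the arms whose
    losses it reveals as side observations.
    """
    revealed = {played_arm} | set(graph.get(played_arm, ()))
    return {arm: losses[arm] for arm in sorted(revealed)}


# Hypothetical 3-arm feedback graph: arm 1 reveals arms 0 and 2,
# arm 0 reveals arm 1, and arm 2 reveals nothing beyond itself.
feedback_graph = {0: [1], 1: [0, 2], 2: []}
round_losses = [0.3, 0.7, 0.1]

print(observed_losses(feedback_graph, round_losses, 1))
print(observed_losses(feedback_graph, round_losses, 2))
```

With an empty neighbor list for every arm this reduces to bandit feedback, and with every arm listed as a neighbor of every other arm it reduces to full information.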
“…where α is an input parameter, the regret analysis in Theorem 2 still applies and the upper bound on regret is only affected up to a constant scaling factor, as long as α > 6σ². A similar extension also applies to LSDT-PSI if we change the second terms of the UCB indices in (16) to…”
Section: Extensions To Other Distributions (mentioning)
confidence: 99%
“…The information sharing among devices can be modeled through directed graphs [30]-[32]. Consider a single IoT device j, which at the end of slot t obtains the security risk of the selected edge server a j t as well as other servers' security risk (a.k.a.…”
Section: A Cooperation Via a Graph-encoded Feedback (mentioning)
confidence: 99%
“…To facilitate the analysis and explicitly quantify the impact of information sharing, a few notations from graph theory are introduced [32]. An independent set of an undirected graph is a set of vertices that are not connected by any edges; while the so-termed independence number α j t is the cardinality of the maximum independent set.…”
Section: A Cooperation Via a Graph-encoded Feedback (mentioning)
confidence: 99%
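
The two graph-theoretic notions defined in the statement above (independent set and independence number) can be made concrete with a small sketch. This is a brute-force enumeration that is only practical for tiny graphs; the function names and the example graph are hypothetical and are not taken from the cited papers.

```python
from itertools import combinations


def is_independent(vertices, edges):
    """Return True if no two vertices in `vertices` are joined by an edge."""
    return all(frozenset(pair) not in edges for pair in combinations(vertices, 2))


def independence_number(n, edge_list):
    """Brute-force independence number of an undirected graph on vertices 0..n-1.

    Tries candidate subsets from largest to smallest, so the first
    independent subset found has maximum cardinality. Exponential in n.
    """
    edges = {frozenset(e) for e in edge_list}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if is_independent(subset, edges):
                return size
    return 0


# Hypothetical example: a cycle on 5 vertices.
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(independence_number(5, cycle5))
```

For a graph with no edges the function returns the number of vertices, and for a complete graph it returns 1, which matches the two extremes (bandit feedback and full information) that the feedback-graph model interpolates between.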
“…Although the minimax regret was shown to be Θ(T^{1/2}) in the case of full-information games and Θ(T^{2/3}) in the case of bandit feedback [5,6], the gap between O(T^{2/3}) upper bounds and Ω(T^{1/2}) lower bounds for the more general class of adversaries with unit memory in the case of full-information feedback has remained unaddressed. For the problem of general feedback graphs with oblivious adversaries, Alon et al. [7,8] showed that the regret is characterized by certain characteristics of the graph structure involving domination numbers and independent sets. This leads to three different regret regimes for minimax regret: Θ(T^{1/2}), Θ(T^{2/3}), and Θ(T), which may be compared with the different rates of learning for partial monitoring games [9].…”
Section: Introduction (mentioning)
confidence: 99%