Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

Alon, Noga; Cesa-Bianchi, Nicolò; Gentile, Claudio; Mannor, Shie; Mansour, Yishay; Shamir, Ohad

doi:10.1137/140989455

Cited by 50 publications

(51 citation statements)

References 26 publications

(32 reference statements)

“…Examples include combinatorial bandits [5], [6], [7], [8], linearly parameterized bandits [9], [10], [11], and spectral bandits for smooth graph functions [12], [13]. The second type of arm relation can be termed as observation-based relation [14], [15], [16]. Specifically, playing an arm provides additional side observations about its neighboring arms.…”

Section: Related Work (mentioning)

confidence: 99%
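The observation-based relation quoted above can be illustrated with a minimal sketch. It assumes a hypothetical adjacency-dictionary layout in which playing an arm reveals its own loss plus the losses of its neighbors; the names `observed_losses`, `graph`, and `losses` are illustrative and do not come from the cited papers.

```python
def observed_losses(graph, played_arm, losses):
    """Return the losses revealed when `played_arm` is pulled.

    graph:  dict mapping each arm to the set of neighboring arms whose
            losses are revealed as side observations (assumed layout).
    losses: dict mapping each arm to its loss in the current round.
    """
    # The played arm is always observed; neighbors come for free.
    revealed = {played_arm} | set(graph.get(played_arm, ()))
    return {arm: losses[arm] for arm in sorted(revealed)}


# Three arms on a path 0 - 1 - 2 (hypothetical example data).
graph = {0: {1}, 1: {0, 2}, 2: {1}}
losses = {0: 0.9, 1: 0.2, 2: 0.5}
feedback = observed_losses(graph, 1, losses)  # arm 1 sees all three losses
```

Playing the middle arm reveals every loss, while playing an end arm reveals only two of them, which is the asymmetry that graph-structured feedback models.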

Multi-Armed Bandits on Partially Revealed Unit Interval Graphs

Vakili, Zhao, et al. 2020

IEEE Trans. Netw. Sci. Eng.

A stochastic multi-armed bandit problem with side information on the similarity and dissimilarity across different arms is considered. The action space of the problem can be represented by a unit interval graph (UIG) where each node represents an arm and the presence (absence) of an edge between two nodes indicates similarity (dissimilarity) between their mean rewards. Two settings of complete and partial side information based on whether the UIG is fully revealed are studied and a general two-step learning structure consisting of an offline reduction of the action space and online aggregation of reward observations from similar arms is proposed to fully exploit the topological structure of the side information. In both cases, the computation efficiency and the order optimality of the proposed learning policies in terms of both the size of the action space and the time length are established.

Index Terms: multi-armed bandits, unit interval graph, side information.

“…where α is an input parameter, the regret analysis in Theorem 2 still applies and the upper bound on regret is only affected up to a constant scaling factor, as long as α > 6σ². A similar extension also applies to LSDT-PSI if we change the second terms of the UCB indices in (16) to…”

Section: Extensions To Other Distributions (mentioning)

confidence: 99%

Multi-Armed Bandits on Partially Revealed Unit Interval Graphs

Vakili, Zhao, et al. 2020

IEEE Trans. Netw. Sci. Eng.

“…The information sharing among devices can be modeled through directed graphs [30]-[32]. Consider a single IoT device j, which at the end of slot t obtains the security risk of the selected edge server a j t as well as other servers' security risk (a.k.a.…”

Section: A Cooperation Via a Graph-encoded Feedback (mentioning)

confidence: 99%

“…To facilitate the analysis and explicitly quantify the impact of information sharing, a few notations from graph theory are introduced [32]. An independent set of an undirected graph is a set of vertices that are not connected by any edges; while the so-termed independence number α j t is the cardinality of the maximum independent set.…”

Section: A Cooperation Via a Graph-encoded Feedback (mentioning)

confidence: 99%
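The definition quoted above (an independent set has no two vertices joined by an edge, and the independence number is the size of the largest such set) can be checked on small graphs with a brute-force sketch. This is an illustration only, not a method from the cited paper: the function name `independence_number` is hypothetical, and the exhaustive search is exponential in the number of vertices, so it is suitable only for toy examples.

```python
from itertools import combinations


def independence_number(vertices, edges):
    """Size of the largest independent set of an undirected graph.

    Brute force: try subsets from largest to smallest and return the
    first size for which some subset contains no edge.
    """
    edge_set = {frozenset(edge) for edge in edges}
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            pairs = combinations(subset, 2)
            if all(frozenset(pair) not in edge_set for pair in pairs):
                return size
    return 0  # graph with no vertices


# Path 0 - 1 - 2 - 3: {0, 2} is a largest independent set.
path_alpha = independence_number([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
# Triangle: any two vertices are adjacent, so only singletons qualify.
triangle_alpha = independence_number([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
```

On these two toy graphs the values are 2 for the path and 1 for the triangle; a graph with no edges has independence number equal to its vertex count.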

Secure Mobile Edge Computing in IoT via Collaborative Online Learning

Chen, Giannakis 2019

IEEE Trans. Signal Process.

To accommodate heterogeneous tasks in Internet of Things (IoT), a new communication and computing paradigm termed mobile edge computing emerges that extends computing services from the cloud to edge, but at the same time exposes new challenges on security. The present paper studies online security-aware edge computing under jamming attacks. Leveraging online learning tools, novel algorithms abbreviated as SAVE-S and SAVE-A are developed to cope with the stochastic and adversarial forms of jamming, respectively. Without utilizing extra resources such as spectrum and transmission power to evade jamming attacks, SAVE-S and SAVE-A can select the most reliable server to offload computing tasks with minimal privacy and security concerns. It is analytically established that without any prior information on future jamming and server security risks, the proposed schemes can achieve O(√T) regret. Information sharing among devices can accelerate the security-aware computing tasks. Incorporating the information shared by other devices, SAVE-S and SAVE-A offer impressive improvements on the sublinear regret, which is guaranteed by what is termed "value of cooperation." Effectiveness of the proposed schemes is tested on both synthetic and real datasets.

“…Although the minimax regret was shown to be Θ(T^{1/2}) in the case of full-information games and Θ(T^{2/3}) in the case of bandit feedback [5,6], the gap between O(T^{2/3}) upper bounds and Ω(T^{1/2}) lower bounds for the more general class of adversaries with unit memory in the case of full-information feedback has remained unaddressed. For the problem of general feedback graphs with oblivious adversaries, Alon et al. [7,8] showed that the regret is characterized by certain characteristics of the graph structure involving domination numbers and independent sets. This leads to three different regret regimes for minimax regret: Θ(T^{1/2}), Θ(T^{2/3}), and Θ(T), which may be compared with the different rates of learning for partial monitoring games [9].…”

Section: Introduction (mentioning)

confidence: 99%
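The three regret regimes mentioned above can be tied to a graph classification with a small sketch. The definitions used here are not stated on this page; they are recalled from the feedback-graph literature and should be treated as assumptions: an arm is observable if at least one arm (possibly itself) reveals its loss, and strongly observable if it has a self-loop or every other arm reveals it. The function name `classify_feedback_graph` and the `REGIME` table are hypothetical, and the sketch assumes at least two arms.

```python
def classify_feedback_graph(num_arms, edges):
    """Classify a directed feedback graph over arms 0..num_arms-1.

    edges: iterable of (src, dst) pairs meaning that playing `src`
           reveals the loss of `dst`; (i, i) is a self-loop.
    Assumes num_arms >= 2 and the observability definitions in the
    lead-in, which are recalled rather than taken from this page.
    """
    in_nbrs = {arm: set() for arm in range(num_arms)}
    for src, dst in edges:
        in_nbrs[dst].add(src)

    all_arms = set(range(num_arms))
    # An arm that nobody reveals makes the whole graph unobservable.
    if any(not in_nbrs[arm] for arm in all_arms):
        return "unobservable"

    strongly = all(
        (arm in in_nbrs[arm]) or ((all_arms - {arm}) <= in_nbrs[arm])
        for arm in all_arms
    )
    return "strongly observable" if strongly else "weakly observable"


# Growth of minimax regret in T per class (constants and log factors omitted).
REGIME = {
    "strongly observable": "T^(1/2)",
    "weakly observable": "T^(2/3)",
    "unobservable": "T",
}

bandit = classify_feedback_graph(3, [(0, 0), (1, 1), (2, 2)])
revealing = classify_feedback_graph(3, [(0, 0), (0, 1), (0, 2)])
```

With self-loops only (classical bandit feedback) the graph is strongly observable; when a single arm reveals the other two, which cannot observe themselves, it is weakly observable.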

Online Learning with Graph-Structured Feedback against Adaptive Adversaries

Feng, Loh 2018

2018 IEEE International Symposium on Information Theory (ISIT)

We derive upper and lower bounds for the policy regret of T-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of O(T^{2/3}) and O(T^{3/4}) for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of Ω(T^{2/3}) is achieved in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in such a setting, non-revealing strongly-observable feedback graphs achieve a lower bound of Ω(T^{2/3}), as well.
