Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Netw 2022
DOI: 10.1145/3492866.3549720

Power-of-2-arms for bandit learning with switching costs

Cited by 5 publications (20 citation statements)
References 4 publications
“…Specifically, we divide the state space S and construct special state transitions, such that the episodic reinforcement learning is reduced to Θ(S/H) chains of bandit learning. Notice that the lower-bound analysis in [21] implies that, with the loss function l_t upper-bounded by H, and with A arms and T time-slots, the regret of any bandit-learning algorithm with switching costs is at least Ω(β^{1/3} A^{1/3} (HT)^{2/3}) when T ≥ max{6H^2 A, β}. Hence, the total regret from all Θ(S/H) chains of bandit learning…”
Section: Discussion (mentioning)
confidence: 99%
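
As a rough numeric illustration of the bound quoted above, the following Python sketch evaluates the order term β^{1/3} A^{1/3} (HT)^{2/3} and checks the stated condition T ≥ max{6H^2 A, β}. The function name and the choice to return None outside the stated regime are assumptions made for illustration. The constant hidden by Ω is not given in the excerpt and is omitted, and β is treated simply as a positive parameter because the excerpt does not define it.

```python
def switching_cost_lower_bound(beta, num_arms, horizon, num_slots):
    """Order term beta^(1/3) * A^(1/3) * (H*T)^(2/3), with constants omitted.

    Hypothetical helper: returns None when the quoted condition
    T >= max(6 * H^2 * A, beta) does not hold, since the excerpt
    only states the bound in that regime.
    """
    threshold = max(6 * horizon**2 * num_arms, beta)
    if num_slots < threshold:
        return None
    return (
        beta ** (1 / 3)
        * num_arms ** (1 / 3)
        * (horizon * num_slots) ** (2 / 3)
    )


if __name__ == "__main__":
    # Example values chosen only so the cube roots are easy to follow.
    print(switching_cost_lower_bound(beta=8, num_arms=27, horizon=1, num_slots=1000))
    # Outside the stated regime (T is too small), so no value is reported.
    print(switching_cost_lower_bound(beta=8, num_arms=27, horizon=1, num_slots=100))
```

The value grows as T^{2/3}, so multiplying T by 8 multiplies the order term by 4.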
“…As we discussed above, the idea for reducing switching in static RL does not work well here. To handle the losses that can change arbitrarily, our design is inspired by the approach in [21] for bandit learning, but with two novel ideas. (a) We delay each switch by a fixed (but tunable) number of episodes, which ensures that switch occurs only every Õ(T^{1/3}) episodes.…”
Section: Our Contributions (mentioning)
confidence: 99%
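
The delayed-switching idea in item (a) can be pictured with a small Python sketch. This is not the citing paper's algorithm: the function names, the `delay` parameter, and the rule "hold the current choice until at least `delay` rounds have passed since the last switch" are assumptions chosen to show how a fixed delay caps the number of switches.

```python
def delayed_switching(proposed, delay):
    """Follow a sequence of proposed choices, but allow a switch only
    when at least `delay` rounds have passed since the previous switch.

    Hypothetical illustration: `proposed` stands in for whatever an
    underlying learner would like to play in each round.
    """
    played = []
    current = None
    last_switch = None
    for t, choice in enumerate(proposed):
        may_switch = current is None or t - last_switch >= delay
        if may_switch and choice != current:
            current = choice
            last_switch = t
        played.append(current)
    return played


def count_switches(played):
    """Number of rounds in which the played choice differs from the previous one."""
    return sum(1 for prev, nxt in zip(played, played[1:]) if prev != nxt)


if __name__ == "__main__":
    proposed = [0, 1, 0, 1, 0, 1, 0, 1]  # a learner that wants to switch every round
    played = delayed_switching(proposed, delay=3)
    print(played, count_switches(played))
```

With T rounds and a delay of d, this rule permits at most (T - 1) // d switches, which is the sense in which a longer delay trades responsiveness for fewer switches.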