2013
DOI: 10.1007/978-3-642-40935-6_16

Robust Risk-Averse Stochastic Multi-armed Bandits

Abstract: We study a variant of the standard stochastic multi-armed bandit problem in which one is interested not in the arm with the best mean, but in the arm maximising some coherent risk measure criterion. Further, we study the deviations of the regret instead of the less informative expected regret. We provide an algorithm, called RA-UCB, to solve this problem, together with a high-probability bound on its regret.
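To make the difference between "best mean" and "best risk criterion" concrete, here is a minimal sketch. It is not the paper's RA-UCB algorithm. It assumes Gaussian arms with known parameters and uses the normalised log moment generating function as the risk criterion (the form one citing statement on this page attributes to the paper); the function name `entropic_risk_gaussian`, the arm parameters, and the value of `lam` are all hypothetical choices for illustration.

```python
def entropic_risk_gaussian(mean, std, lam):
    """Closed form of (1/lam) * log E[exp(lam * X)] for X ~ N(mean, std^2).

    Assumption: lam < 0 encodes risk aversion, so variance is penalised.
    """
    return mean + lam * std**2 / 2.0


# Hypothetical arms, given as (mean, standard deviation).
arms = {"A": (1.0, 2.0), "B": (0.8, 0.5)}
lam = -1.0  # assumed risk-aversion parameter

# Arm chosen when only the mean matters.
best_by_mean = max(arms, key=lambda a: arms[a][0])

# Arm chosen under the risk-averse criterion.
best_by_risk = max(arms, key=lambda a: entropic_risk_gaussian(*arms[a], lam))

print(best_by_mean, best_by_risk)
```

With these illustrative numbers, arm A has the higher mean but a much larger variance, so the two criteria select different arms.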

Cited by 34 publications (25 citation statements)
References 23 publications
“…PROOF OF PROPOSITION 3.3. From the lower bound in (20) in the proof of Theorem 3.2, there exists c > 0 such that for all x in the range T ≤ x ≤ (1 − )T and T sufficiently large,…”
Section: 1 | mentioning
confidence: 99%
“…There is also a growing literature on risk-averse formulations of the MAB problem, with a non-comprehensive list being: [23,20,29,24,26,11,7,25,28,21,4,15]. As noted earlier, risk-averse formulations involve defining arm optimality using criteria other than the expected value.…”
mentioning
confidence: 99%
“…However, the performance guarantees were still within the risk-neutral framework (in terms of the loss in the expected total reward) under the assumption that the best action in terms of the mean value is also the best action in terms of the conditional value at risk. Logarithm of moment generating function was considered as a risk measure for bandit problems in [20] and high probability bounds on regret were obtained. We point out that the logarithm of the moment generating function reduces to mean-variance for a random variable with Gaussian distribution.…”
Section: Related Work | mentioning
confidence: 99%
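The Gaussian remark in the statement above can be checked directly. The derivation below is a sketch under one assumption not stated in the quote: the risk measure is normalised by $1/\lambda$, where $\lambda \neq 0$ is a risk parameter (a symbol introduced here for illustration). For $X \sim \mathcal{N}(\mu, \sigma^2)$:

```latex
\[
\mathbb{E}\!\left[e^{\lambda X}\right] = \exp\!\left(\lambda\mu + \tfrac{1}{2}\lambda^{2}\sigma^{2}\right)
\quad\Longrightarrow\quad
\frac{1}{\lambda}\log \mathbb{E}\!\left[e^{\lambda X}\right] = \mu + \frac{\lambda}{2}\,\sigma^{2}.
\]
```

For $\lambda < 0$ this is the mean minus a multiple of the variance, which is a mean-variance criterion.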
“…Other risk-averse MAB papers also considered the CVaR. Upper confidence bound algorithms in this context are studied by Maillard (2013), Cassel et al (2018), Khajonchotpanya et al (2021). Alternative arm selection approaches in the context of risk-averse bandits include the max-min approach discussed in Galichet et al (2013), the successive rejects relying on concentration bound guarantees of Kolla et al (2019a), robust estimation-based algorithms in Kagrecha et al (2020), or Thompson Sampling approaches in Chang et al (2020) and Baudry et al (2021).…”
Section: Introduction | mentioning
confidence: 99%