Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed Bandits

Kagrecha, Anmol; Nair, Jayakrishnan; Jagannathan, Krishna

doi:10.48550/arxiv.2008.13629

Cited by 1 publication

(1 citation statement)

References 16 publications

“…Upper confidence bound algorithms in this context are studied by Maillard (2013), Cassel et al (2018), Khajonchotpanya et al (2021). Alternative arm selection approaches in the context of risk-averse bandits include the max-min approach discussed in Galichet et al (2013), the successive rejects relying on concentration bound guarantees of Kolla et al (2019a), robust estimation-based algorithms in Kagrecha et al (2020), or Thompson Sampling approaches in Chang et al (2020) and Baudry et al (2021).…”

Section: Introduction (citation type: mentioning)

confidence: 99%

Risk averse non-stationary multi-armed bandits

Benac, Godin

2021

Preprint

This paper tackles the risk-averse multi-armed bandit problem when incurred losses are non-stationary. The conditional value-at-risk (CVaR) is used as the objective function. Two estimation methods are proposed for this objective function in the presence of non-stationary losses, one relying on a weighted empirical distribution of losses and another on the dual representation of the CVaR. Such estimates can then be embedded into classic arm selection methods such as ε-greedy policies. Simulation experiments assess the performance of the arm selection algorithms based on the two novel estimation approaches, and such policies are shown to outperform naive benchmarks that do not take non-stationarity into account.
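As a rough illustration of the first idea in the abstract, the sketch below computes a CVaR estimate from a weighted empirical distribution of losses and plugs it into an ε-greedy rule. It is not the authors' method: the exponential discounting scheme, the function names (`weighted_cvar`, `discount_weights`, `epsilon_greedy_arm`), and the default values of `alpha`, `gamma`, and `epsilon` are all assumptions made for this example.

```python
import numpy as np


def weighted_cvar(losses, weights, alpha=0.9):
    """CVaR at level alpha of a weighted empirical loss distribution.

    Returns the average loss over the worst (1 - alpha) share of
    probability mass. Requires 0 <= alpha < 1.
    """
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalise to a distribution
    order = np.argsort(losses)[::-1]           # largest losses first
    l, w = losses[order], weights[order]
    tail = 1.0 - alpha
    # Fill the tail budget starting from the worst loss; the last atom
    # that fits may contribute only part of its mass.
    mass_before = np.cumsum(w) - w
    take = np.clip(tail - mass_before, 0.0, w)
    return float(np.dot(take, l) / tail)


def discount_weights(n, gamma=0.95):
    """Assumed weighting: the most recent of n losses gets weight 1,
    older losses are discounted geometrically by gamma."""
    return gamma ** np.arange(n - 1, -1, -1)


def epsilon_greedy_arm(loss_history, rng, epsilon=0.1, alpha=0.9, gamma=0.95):
    """Pick an arm given per-arm loss lists in chronological order."""
    # Play every arm once before comparing estimates.
    for arm, arm_losses in enumerate(loss_history):
        if len(arm_losses) == 0:
            return arm
    # Explore uniformly with probability epsilon.
    if rng.random() < epsilon:
        return int(rng.integers(len(loss_history)))
    # Otherwise exploit: choose the arm with the smallest estimated CVaR.
    estimates = [
        weighted_cvar(arm_losses, discount_weights(len(arm_losses), gamma), alpha)
        for arm_losses in loss_history
    ]
    return int(np.argmin(estimates))
```

With `gamma` below 1, recent losses dominate the estimate, which is one simple way to track a drifting loss distribution; setting `gamma=1.0` recovers the ordinary unweighted empirical CVaR.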
