Multi-objective multi-armed bandit with lexicographically ordered and satisficing objectives

Hüyük, Alihan; Tekin, Cem

doi:10.1007/s10994-021-05956-1

Cited by 3 publications

(13 citation statements)

References 18 publications

(37 reference statements)


“…To deal with these real-world applications, a natural idea is to utilize lexicographic ordering, as it ranks the objectives according to their importance (Ehrgott 2005; Wray, Zilberstein, and Mouaddib 2015; Hüyük and Tekin 2021; Hosseini et al. 2021; Skalse et al. 2022). Let X represent an arm space, and the expected payoffs for a, b ∈ X are [µ_1(a), µ_2(a), .…”

Section: Introduction (mentioning)

confidence: 99%

“…, m}, such that µ_i(a) = µ_i(b) for 1 ≤ i ≤ i* − 1 and µ_{i*}(a) > µ_{i*}(b). The lexicographically optimal arm is the one that is not lexicographically dominated by any other arms (Hüyük and Tekin 2021).…”

Section: Introduction (mentioning)

confidence: 99%
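
The dominance rule quoted above can be sketched in a few lines of Python. This is only an illustration, not the algorithm of Hüyük and Tekin (2021): it assumes the expected payoff vectors are known exactly, which a bandit learner does not have, and the names `lex_dominates`, `lex_optimal_arms`, and the sample `payoffs` are hypothetical.

```python
from typing import Dict, List, Sequence


def lex_dominates(mu_a: Sequence[float], mu_b: Sequence[float]) -> bool:
    """Return True if arm a lexicographically dominates arm b.

    Objectives are listed in decreasing priority. Arm a dominates arm b
    if the payoffs tie on the first i* - 1 objectives and a is strictly
    better on objective i*.
    """
    for pa, pb in zip(mu_a, mu_b):
        if pa == pb:
            continue  # tie on this objective, check the next priority
        return pa > pb  # first objective where the arms differ decides
    return False  # identical payoff vectors: neither arm dominates


def lex_optimal_arms(mu: Dict[str, Sequence[float]]) -> List[str]:
    """Return the arms that no other arm lexicographically dominates."""
    return [
        a for a in mu
        if not any(lex_dominates(mu[b], mu[a]) for b in mu if b != a)
    ]


# Hypothetical expected payoffs for three arms and two objectives.
payoffs = {"a": [0.9, 0.2], "b": [0.9, 0.5], "c": [0.7, 0.9]}
best = lex_optimal_arms(payoffs)
print(best)
```

Here arms "a" and "b" tie on the first objective, so the second objective decides between them, while arm "c" loses on the first objective regardless of its high second payoff.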

“…The only existing algorithm for multiobjective bandits under lexicographic ordering is specifically designed for the MOMAB model (Hüyük and Tekin 2021), whose arm set is finite, i.e., X = [K]. Let x* denote the lexicographically optimal arm among X and x_t be the arm chosen at t-th epoch.…”

Section: Introduction (mentioning)

confidence: 99%

“…Let x* denote the lexicographically optimal arm among X and x_t be the arm chosen at t-th epoch. Hüyük and Tekin (2021) defined a priority-based regret to evaluate the performance of their algorithm, given by…”

Section: Introduction (mentioning)

confidence: 99%

“…event that the previous i − 1 expected payoffs of the chosen arm are optimal, i.e., A_i(x_t) = {µ_j(x*) − µ_j(x_t) = 0, j ∈ [i−1]}. Hüyük and Tekin (2021) proposed an algorithm with a bound of O((KT)^(2/3)) under this priority-based regret.…”

Section: Introduction (mentioning)

confidence: 99%


Multiobjective Lipschitz Bandits under Lexicographic Ordering

Xue, Cheng, Liu et al. 2024

AAAI

This paper studies the multiobjective bandit problem under lexicographic ordering, wherein the learner aims to simultaneously maximize m objectives hierarchically. The only existing algorithm for this problem considers the multi-armed bandit model, and its regret bound is O((KT)^(2/3)) under a metric called priority-based regret. However, this bound is suboptimal, as the lower bound for single objective multi-armed bandits is Ω(K log T). Moreover, this bound becomes vacuous when the arm number K is infinite. To address these limitations, we investigate the multiobjective Lipschitz bandit model, which allows for an infinite arm set. Utilizing a newly designed multi-stage decision-making strategy, we develop an improved algorithm that achieves a general regret bound of O(T^((d_z^i+1)/(d_z^i+2))) for the i-th objective, where d_z^i is the zooming dimension for the i-th objective, with i in {1,2,...,m}. This bound matches the lower bound of the single objective Lipschitz bandit problem in terms of T, indicating that our algorithm is almost optimal. Numerical experiments confirm the effectiveness of our algorithm.
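
As a quick arithmetic check of the exponent in the bound above, the following evaluates it for a few values of the zooming dimension. The values 0, 1, and 2 are illustrative assumptions, not figures from the paper, and constants and any logarithmic factors are ignored.

```latex
\[
\frac{d_z^i + 1}{d_z^i + 2} =
\begin{cases}
1/2, & d_z^i = 0,\\[2pt]
2/3, & d_z^i = 1,\\[2pt]
3/4, & d_z^i = 2.
\end{cases}
\]
```

The exponent increases toward 1 as d_z^i grows, and it equals the 2/3 exponent of the earlier O((KT)^(2/3)) bound exactly when d_z^i = 1.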


A New Framework: Short-Term and Long-Term Returns in Stochastic Multi-Armed Bandit

Sawwan, 2023

IEEE INFOCOM 2023 - IEEE Conference on Computer Communications

Hierarchize Pareto Dominance in Multi-Objective Stochastic Linear Bandits

Cheng, Xue, et al. 2024

AAAI

Multi-objective Stochastic Linear bandit (MOSLB) plays a critical role in the sequential decision-making paradigm; however, most existing methods focus on the Pareto dominance among different objectives without considering any priority. In this paper, we study bandit algorithms under mixed Pareto-lexicographic orders, which can reflect decision makers' preferences. We adopt the Grossone approach to deal with these orders and develop the notion of Pareto-lexicographic optimality to evaluate the learners' performance. Our work represents a first attempt to address these important and realistic orders in bandit algorithms. To design algorithms under these orders, the upper confidence bound (UCB) policy and the prior free lexicographical filter are adapted to approximate the optimal arms at each round. Moreover, the framework of the algorithms involves two stages in pursuit of the balance between exploration and exploitation. Theoretical analysis as well as numerical experiments demonstrate the effectiveness of our algorithms.
