2021
DOI: 10.48550/arxiv.2102.05858
Preprint

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously

Abstract: In this work, we develop linear bandit algorithms that automatically adapt to different environments. By plugging a novel loss estimator into the optimization problem that characterizes the instance-optimal strategy, our first algorithm not only achieves nearly instance-optimal regret in stochastic environments, but also works in corrupted environments with additional regret being the amount of corruption, while the state-of-the-art (Li et al., 2019) achieves neither instance-optimality nor the optimal dependence on the amount of corruption.
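For context, the "optimization problem that characterizes the instance-optimal strategy" mentioned in the abstract is, in the standard formulation of Lattimore & Szepesvári (2017) for stochastic linear bandits with a fixed finite action set, the following lower-bound program. This is a sketch with notation assumed from that literature rather than taken from this page: Δ_x is the suboptimality gap of arm x and α(x) its allocation.

```latex
\begin{aligned}
\min_{\alpha:\mathcal{X}\to[0,\infty)} \quad & \sum_{x\in\mathcal{X}} \alpha(x)\,\Delta_x \\
\text{s.t.} \quad & \|x\|^2_{H(\alpha)^{-1}} \le \frac{\Delta_x^2}{2}
    \quad \text{for all } x\in\mathcal{X} \text{ with } \Delta_x>0, \\
& H(\alpha) = \sum_{x\in\mathcal{X}} \alpha(x)\, x x^{\top}.
\end{aligned}
```

An asymptotically instance-optimal algorithm plays each suboptimal arm x roughly α*(x) log T times, where α* solves this program; as the abstract describes, the paper's first algorithm plugs a novel (robust) loss estimator into this program so that the resulting plan remains valid in corrupted environments.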

Cited by 7 publications (21 citation statements)
References 6 publications
“…Compared with Lee et al. (2021), our result has a multiplicative quadratic dependence on C, which seems to be worse. Nevertheless, we want to emphasize that we focus on the linear contextual bandit setting, where the decision sets D_t at each round are not identical, which is more challenging than the stochastic linear bandit setting of Lee et al. (2021), where the decision set is given in advance and stays fixed throughout the execution of the algorithm. Therefore, our result and that in Lee et al. (2021) are not directly comparable.…”
Section: Results (mentioning)
confidence: 72%
“…data, Wei et al (2020) show that a martingale version of Catoni is possible, which is what we apply in this work. We remark that several applications of the Catoni estimator to linear bandits have been proposed recently (Camilleri et al, 2021;Lee et al, 2021). We refer the reader to the survey Lugosi & Mendelson (2019) for a discussion of other robust mean estimators.…”
Section: Related Work (mentioning)
confidence: 94%
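The Catoni estimator discussed in the statement above is the root of an influence-function equation rather than a sample average, which is what makes it robust to heavy-tailed rewards. Below is a minimal Python sketch (my own illustration, not code from any of the cited papers; the variance-based tuning of alpha in the usage example is an assumption following the usual choice):

```python
import numpy as np

def psi(x):
    # Catoni's influence function: psi(x) = sign(x) * log(1 + |x| + x^2 / 2).
    return np.sign(x) * np.log1p(np.abs(x) + 0.5 * x ** 2)

def catoni_mean(samples, alpha, tol=1e-9, max_iter=200):
    """Catoni's robust mean estimate: the root theta of
    sum_i psi(alpha * (X_i - theta)) = 0, found by bisection
    (the map theta -> sum is strictly decreasing)."""
    lo, hi = samples.min(), samples.max()
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        s = psi(alpha * (samples - mid)).sum()
        if abs(s) < tol or hi - lo < tol:
            return mid
        if s > 0:      # estimate too small: move the bracket up
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Usage: 1000 clean samples plus 10 gross outliers.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 1000), np.full(10, 50.0)])
# A common tuning, assuming a known variance bound v = 1 and failure prob 0.01.
alpha = np.sqrt(2 * np.log(2 / 0.01) / (x.size * 1.0))
print(catoni_mean(x, alpha))   # close to 0, unlike x.mean()
```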
“…For MAB, there are a number of studies on best-of-both-worlds algorithms [Bubeck and Slivkins, 2012, Zimmert and Seldin, 2021, Seldin and Slivkins, 2014, Seldin and Lugosi, 2017, Pogodin and Lattimore, 2020, Auer and Chiang, 2016, Wei and Luo, 2018, Zimmert et al, 2019, Lee et al, 2021, Ito, 2021]. Among these, the studies by Wei and Luo [2018], Zimmert and Seldin [2021], and Zimmert et al [2019] are closely related to this work.…”
Section: Related Work (mentioning)
confidence: 99%
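Among the best-of-both-worlds MAB algorithms cited above, Tsallis-INF of Zimmert and Seldin [2021] is the canonical example: follow-the-regularized-leader over the simplex with a 1/2-Tsallis-entropy regularizer and importance-weighted loss estimates. A minimal Python sketch follows (my own illustration, not the authors' code; the learning-rate schedule eta_t = 1/sqrt(t), the constant scaling of the regularizer, and the plain importance-weighted estimator are common choices, and the bandit loop is hypothetical):

```python
import numpy as np

def tsallis_inf_weights(L_hat, eta, iters=60):
    """One FTRL step with a 1/2-Tsallis-entropy regularizer:
    p_i = 1 / (eta * (L_hat_i + c))^2, with the normalizer c found by
    bisection so that p sums to 1 (the sum is decreasing in c)."""
    K = len(L_hat)
    lo = -L_hat.min() + 1e-12            # p blows up as c -> -min(L_hat)
    hi = np.sqrt(K) / eta - L_hat.min()  # here every p_i <= 1/K, so sum <= 1
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        p = 1.0 / (eta * (L_hat + c)) ** 2
        if p.sum() > 1.0:
            lo = c
        else:
            hi = c
    p = 1.0 / (eta * (L_hat + lo)) ** 2
    return p / p.sum()                   # tiny renormalization for safety

# Hypothetical loop: losses[t, i] is the loss of arm i at round t.
rng = np.random.default_rng(0)
T, K = 10_000, 5
losses = rng.uniform(0, 1, (T, K)) * np.linspace(0.2, 1.0, K)  # arm 0 is best
L_hat = np.zeros(K)                      # cumulative loss estimates
for t in range(1, T + 1):
    p = tsallis_inf_weights(L_hat, eta=1.0 / np.sqrt(t))
    a = rng.choice(K, p=p)
    L_hat[a] += losses[t - 1, a] / p[a]  # importance-weighted estimate
```

The same update is played in both stochastic and adversarial environments; the best-of-both-worlds guarantees come from the regularizer and learning-rate schedule, not from detecting which regime the algorithm is in.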